CN107507628B

CN107507628B - Singing scoring method, singing scoring device and terminal

Info

Publication number: CN107507628B
Application number: CN201710770576.XA
Authority: CN
Inventors: 梁衍鹏
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2017-08-31
Filing date: 2017-08-31
Publication date: 2021-01-15
Anticipated expiration: 2037-08-31
Also published as: CN107507628A

Abstract

The invention discloses a singing scoring method, a singing scoring device and a terminal, and belongs to the technical field of voice signal processing. The method comprises the following steps: acquiring, in real time, a human voice data stream generated when a user sings a song; converting the human voice data stream into a pitch data stream; offsetting the pitch data stream based on n offset durations respectively to obtain n offset data streams, wherein the n offset durations are different from each other; calculating the singing score of each lyric according to the n offset data streams; and calculating the singing score of the song according to the singing scores of all the lyrics in the song. Because each lyric is scored several times against different offset data streams rather than once against a single pitch data stream, the singing score of each lyric, and therefore of the song, is improved. This solves the problem that the singing score of the song is low when the singing score of a lyric is inaccurate.

Description

Singing scoring method, singing scoring device and terminal

Technical Field

The invention relates to the technical field of voice signal processing, in particular to a singing scoring method, a singing scoring device and a singing scoring terminal.

Background

When a user sings, the terminal usually plays the accompaniment of the song and the user sings along with it, while the sound card in the terminal records the external sound to obtain a live audio stream. The terminal can also score the live audio stream, and the user can judge his or her singing level from the singing score.

In the related technology, a terminal acquires a human voice data stream generated when a user sings a song, converts the human voice data stream into a pitch data stream, calculates the singing score of each lyric according to the pitch data stream, and calculates the singing score of the song according to the singing scores of all the lyrics in the song.

Because the singing score of each lyric is calculated from a single pitch data stream, the scoring is strict, and the singing score of the song tends to be low.

Disclosure of Invention

In order to solve the problem that the singing score of a user is low because a pitch segment is scored only against the standard pitch file aligned from the initial moment of that pitch segment, the embodiment of the invention provides a singing scoring method, a singing scoring device and a terminal. The technical scheme is as follows:

In a first aspect, a singing scoring method is provided, the method comprising:

acquiring a voice data stream generated when a user sings a song in real time;

converting the human voice data stream into a pitch data stream;

offsetting the pitch data stream based on n offset durations respectively to obtain n offset data streams, wherein the n offset durations are different from each other, and n is greater than or equal to 2;

respectively calculating the highest singing score of each lyric according to the n offset data streams;

and calculating the singing score of the song according to the singing scores of all the lyrics in the song.

In a second aspect, there is provided a singing scoring apparatus, the apparatus comprising:

the acquisition module is used for acquiring a voice data stream generated when a user sings a song in real time;

the conversion module is used for converting the human voice data stream obtained by the acquisition module into a pitch data stream;

the offset module is used for offsetting the pitch data stream based on n offset durations respectively to obtain n offset data streams, wherein the n offset durations are different from each other, and n is greater than or equal to 2;

the first calculation module is used for calculating the highest singing score of each lyric according to the n offset data streams obtained by the offset module;

and the second calculation module is used for calculating the singing score of the song according to the singing scores of all the lyrics in the song.

In a third aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the singing scoring method according to the first aspect.

In a fourth aspect, there is provided a singing scoring apparatus comprising a processor and a memory, wherein the memory has stored therein at least one instruction, wherein the instruction is loaded by the processor and executes the singing scoring method according to the first aspect.

The technical scheme provided by the embodiment of the invention has the beneficial effects that:

Since the n offset durations are different from each other, offsetting the pitch data stream by them yields n different offset data streams, and the terminal can calculate the highest singing score of each lyric from these n offset data streams. Scoring each lyric several times against different offset data streams, rather than once against a single pitch data stream, improves the singing score of each lyric and ultimately of the song. This solves the problem that the singing score of the song is low when the singing score of a lyric is inaccurate.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.

Fig. 1 is a flowchart of a singing scoring method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a singing scoring method according to another embodiment of the present invention;

Fig. 3 is a schematic flow chart of singing scoring according to an embodiment of the present invention;

Fig. 4 is a block diagram illustrating a structure of a singing scoring apparatus according to an embodiment of the present invention;

Fig. 5 is a block diagram illustrating a structure of a singing scoring apparatus according to still another embodiment of the present invention;

Fig. 6 is a block diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a singing scoring method according to an embodiment of the present invention is shown, where the singing scoring method can be applied to a terminal, where the terminal can be a smart television, a smart phone, or a tablet computer. The singing scoring method comprises the following steps:

Step 101, acquiring, in real time, a human voice data stream generated when a user sings a song.

Step 102, converting the human voice data stream into a pitch data stream.

Step 103, offsetting the pitch data stream based on n offset durations respectively to obtain n offset data streams.

The n offset durations are different from each other, and n is greater than or equal to 2.

Step 104, calculating the singing score of each lyric according to the n offset data streams.

Step 105, calculating the singing score of the song according to the singing scores of all the lyrics in the song.

In summary, in the singing scoring method provided in the embodiment of the present invention, since the n offset durations are different from each other, offsetting the pitch data stream by them yields n different offset data streams, and the terminal can calculate the highest singing score of each lyric from these n offset data streams. Scoring each lyric several times against different offset data streams, rather than once against a single pitch data stream, improves the singing score of each lyric and ultimately of the song. This solves the problem that the singing score of the song is low when the singing score of a lyric is inaccurate.

Referring to fig. 2, a flowchart of a singing scoring method according to another embodiment of the present invention is shown, where the singing scoring method can be applied to a terminal, where the terminal can be a smart television, a smart phone, or a tablet computer. The singing scoring method comprises the following steps:

Step 201, recording external sound in real time when a user sings a song to obtain a live audio stream.

When the user sings, the terminal plays the accompaniment of the song and the user sings along with it. Meanwhile, the sound card in the terminal records the external sound in real time, and this embodiment refers to the recorded data stream as the live audio stream. The live audio stream includes at least the human voice data stream and the accompaniment, where the human voice data stream is the data stream formed by the sound produced by the user.

Step 202, extracting a voice data stream from the live audio stream according to the accompaniment of the song.

On the premise of acquiring the accompaniment, the technology for extracting the vocal data stream from the live audio stream is mature, and is not described in detail in this embodiment.

Step 203, converting the human voice data stream into a pitch data stream.

Pitch is how high or low a sound is, and is determined by the vibration frequency of the sounding body. In this embodiment, the terminal scores the singing level of the user by pitch.

The technology for converting the human voice data stream into the pitch data stream is already mature, and is not described in detail in this embodiment.

It should be noted that, the terminal starts timing when the sound card starts recording, so the voice data stream corresponds to a time axis, and correspondingly, the pitch data stream converted according to the voice data stream also corresponds to the time axis. That is, the pitch data stream may represent the pitch of the sound at each time instant.
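As a minimal sketch of a pitch data stream on a time axis, each frame can be stored as a (time in ms, pitch) pair. The MIDI-style pitch scale (A4 = 440 Hz = 69), the function names and the fixed frame interval are assumptions made for illustration; the embodiment does not specify the pitch representation or the conversion technique.

```python
import math
from typing import List, Tuple

# One frame of the pitch data stream: (time on the recording time axis in ms, pitch).
PitchFrame = Tuple[int, float]


def frequency_to_pitch(freq_hz: float) -> float:
    """Map a frequency in Hz to a MIDI-style pitch number (assumed scale: A4 = 440 Hz = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def to_pitch_stream(freqs_hz: List[float], start_ms: int, frame_ms: int) -> List[PitchFrame]:
    """Attach a timestamp to the pitch of each frame, starting at start_ms."""
    return [
        (start_ms + i * frame_ms, frequency_to_pitch(f))
        for i, f in enumerate(freqs_hz)
    ]
```

With this layout, the stream can be queried or shifted by time without touching the pitch values.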

Step 204, offsetting the pitch data stream based on the n offset durations respectively to obtain n offset data streams.

The n offset durations are different from each other, and n is greater than or equal to 2. The offset includes a forward offset and a backward offset: for a forward offset the offset duration is negative, and for a backward offset it is positive. When the offset duration is 0, the offset data stream is the pitch data stream obtained in step 203.

When setting the offset durations, the terminal may set an offset step, and the n offset durations are expressed as multiples of the offset step. The offset step may be set between 50ms and 60ms, or to another value, which is not limited in this embodiment. In one implementation, if n is 5 and the offset step is 60ms, the offset durations are -120ms, -60ms, 0, 60ms and 120ms, respectively.
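The n offset durations can be generated from the offset step as in the following sketch. The function name and the choice to centre the durations around 0 are assumptions; the embodiment only gives the n = 5 example and requires that the durations differ from each other.

```python
from typing import List


def offset_durations(n: int, step_ms: int) -> List[int]:
    """Return n distinct offset durations in ms, as consecutive multiples of step_ms.

    Negative values are forward offsets, positive values are backward offsets.
    """
    if n < 2:
        raise ValueError("n must be greater than or equal to 2")
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    first_multiple = -(n // 2)  # assumption: centre the durations around 0
    return [(first_multiple + i) * step_ms for i in range(n)]
```

For n = 5 and a 60ms step this reproduces the durations listed above; for an even n the list contains one more forward offset than backward offsets.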

Optionally, the terminal may further set a maximum value of the offset duration, where the absolute value of a forward or backward offset duration is less than or equal to this maximum value.

In implementation, the terminal may offset the initial time of the pitch data stream along the time axis, thereby offsetting the pitch data stream. For example, if the initial time of the pitch data stream is 50ms before the offset and the offset duration is 60ms, the initial time of the pitch data stream is 110ms after the offset.

It should be noted that the value of the initial sampling time of the live audio stream is greater than the maximum value of the n offset durations. That is, the terminal may set the initial sampling time of the live audio stream to a value greater than the maximum of the n offset durations, so that when the pitch data stream corresponding to the live audio stream is offset forward, the offset initial time does not fall on the negative half of the time axis and cause an abnormal offset. This ensures that the offset is executed normally.
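The offset itself can be sketched as adding the offset duration to every timestamp of the pitch data stream. The (time, pitch) frame layout and the function names are assumptions, and the check below is one possible reading of the constraint on the initial sampling time.

```python
from typing import List, Tuple

PitchFrame = Tuple[int, float]  # assumed layout: (time in ms, pitch)


def shift_stream(pitch_stream: List[PitchFrame], offset_ms: int) -> List[PitchFrame]:
    """Offset every timestamp by offset_ms (negative = forward, positive = backward)."""
    return [(t + offset_ms, p) for t, p in pitch_stream]


def shift_all(pitch_stream: List[PitchFrame], offsets_ms: List[int]) -> List[List[PitchFrame]]:
    """Produce one offset data stream per offset duration."""
    if pitch_stream:
        start_ms = pitch_stream[0][0]
        largest = max((abs(o) for o in offsets_ms), default=0)
        # The initial time must exceed the largest offset so that a forward
        # offset never moves it onto the negative half of the time axis.
        if start_ms <= largest:
            raise ValueError("initial time must be greater than the largest offset")
    return [shift_stream(pitch_stream, o) for o in offsets_ms]
```

For instance, a stream whose initial time is 200ms, offset by -120ms, -60ms, 0, 60ms and 120ms, yields streams starting at 80ms, 140ms, 200ms, 260ms and 320ms.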

Step 205, extracting n pitch segments corresponding to each lyric from the n offset data streams.

The pitch segment is obtained by converting a human voice data segment, and the human voice data segment is human voice data of a sentence of lyrics.

In this embodiment, extracting n pitch segments corresponding to each lyric from n offset data streams may include the following two sub-steps:

Step 205a, acquiring a lyric file of the song.

The lyric file includes all the lyrics of the song, divided into sentences.

Step 205b, for each offset data stream, extracting a pitch segment corresponding to the lyrics from the offset data stream when the current data of the offset data stream corresponds to the end of a sentence of lyrics in the lyrics file.

In implementation, whenever the current data of an offset data stream corresponds to the end of a lyric in the lyric file, the terminal extracts the pitch segment corresponding to that lyric from the offset data stream in real time. For example, when the current data of the offset data stream corresponds to the end of the first lyric, the pitch segment corresponding to the first lyric is extracted; when it corresponds to the end of the second lyric, the pitch segment corresponding to the second lyric is extracted; and so on until the pitch segment corresponding to the last lyric is extracted, and then step 206 is executed.
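The extraction of the n pitch segments for one lyric can be sketched as a time-window filter over each offset data stream. This is an offline simplification of the real-time extraction described above, and it assumes that the lyric file provides a start and end time in ms for each lyric; the embodiment does not specify the lyric file format.

```python
from typing import List, Tuple

PitchFrame = Tuple[int, float]  # assumed layout: (time in ms, pitch)


def extract_segment(offset_stream: List[PitchFrame],
                    line_start_ms: int, line_end_ms: int) -> List[PitchFrame]:
    """Keep the frames whose offset time falls inside one lyric's time window."""
    return [(t, p) for t, p in offset_stream if line_start_ms <= t < line_end_ms]


def extract_segments(offset_streams: List[List[PitchFrame]],
                     line_start_ms: int, line_end_ms: int) -> List[List[PitchFrame]]:
    """Extract one pitch segment per offset data stream for the same lyric."""
    return [extract_segment(s, line_start_ms, line_end_ms) for s in offset_streams]
```

Because the same window is applied to differently offset streams, each of the n segments covers a different part of the original recording.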

Step 206, scoring the n pitch segments corresponding to each lyric, and taking the highest of the resulting singing scores as the highest singing score of the lyric.

In this embodiment, scoring the n pitch segments corresponding to each lyric may include the following two sub-steps:

Step 206a, acquiring a standard pitch file of the song.

The standard pitch file is set at the time of composition and can indicate a standard pitch at each time.

Step 206b, for each lyric, scoring the n pitch segments corresponding to the lyric through the standard pitch file to obtain n singing scores.

Since each offset data stream can extract a pitch segment corresponding to the lyric, the terminal can finally extract n pitch segments corresponding to the lyric from the n offset data streams, and the terminal scores the n pitch segments to obtain n singing scores.

In implementation, the terminal obtains the standard pitch segment corresponding to the lyric from the standard pitch file and uses it to score the n pitch segments respectively, obtaining n singing scores. In this way, when the pitch of the pitch segment does not line up with the pitch of the standard pitch segment at the same moment, because of production errors, because the user does not sing exactly on time, or because the sound card introduces a delay, the timing of the pitch segment can be adjusted by the n offset durations. When the adjusted pitch segment lines up with the standard pitch segment at the same moments, the singing score of that pitch segment is the highest and reflects the real singing level of the user. That singing score is therefore used as the singing score of the lyric, and the singing score is improved by relaxing the scoring standard.
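Selecting the highest of the n singing scores can be sketched as follows. The scoring rule used here (the percentage of frames whose pitch lies within a tolerance of the standard pitch at the same time) and the dictionary layout of the standard pitch segment are assumptions for illustration only; the embodiment does not disclose the scoring algorithm.

```python
from typing import Dict, List, Tuple

PitchFrame = Tuple[int, float]  # assumed layout: (time in ms, pitch)


def score_segment(segment: List[PitchFrame], standard: Dict[int, float],
                  tolerance: float = 0.5) -> float:
    """Hypothetical rule: percentage of frames matching the standard pitch at the same time."""
    if not segment:
        return 0.0
    hits = sum(
        1 for t, p in segment
        if t in standard and abs(p - standard[t]) <= tolerance
    )
    return 100.0 * hits / len(segment)


def best_line_score(segments: List[List[PitchFrame]], standard: Dict[int, float],
                    tolerance: float = 0.5) -> float:
    """Score each of the n pitch segments and keep the highest singing score."""
    return max(score_segment(s, standard, tolerance) for s in segments)
```

In this sketch, a lyric sung correctly but 60ms late scores 0 against the unshifted segment and 100 against the segment offset forward by 60ms, so the highest score is the one that reflects the pitch accuracy.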

It should be noted that, in the related art, the time interval between adjacent pitches is 50-60ms. In this embodiment the offset step is therefore set between 50ms and 60ms, so that adjusting a pitch segment in units of the offset step actually adjusts it in units of one pitch, which makes the result of a pitch-based scoring algorithm more accurate.

After steps 205 and 206 are executed to obtain the singing score of one lyric, the terminal executes steps 205 and 206 again to obtain the singing score of the next lyric, and so on until the singing scores of all the lyrics are obtained, and then executes step 207.

Step 207, calculating the singing score of the song according to the singing scores of all the lyrics in the song.

The terminal performs a weighted average of the singing scores of all the lyrics and takes the result as the singing score of the song.
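A weighted average of the per-lyric scores can be sketched as below. The embodiment does not state how the weights are chosen, so the weights parameter and the equal-weight default are assumptions.

```python
from typing import List, Optional


def song_score(line_scores: List[float], weights: Optional[List[float]] = None) -> float:
    """Weighted average of the singing scores of all lyrics (equal weights by default)."""
    if not line_scores:
        raise ValueError("at least one lyric score is required")
    if weights is None:
        weights = [1.0] * len(line_scores)
    if len(weights) != len(line_scores):
        raise ValueError("one weight per lyric score is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(line_scores, weights)) / total_weight
```

For example, scores of 80 and 100 average to 90 with equal weights, and to 95 when the second lyric carries three times the weight of the first.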

Please refer to the schematic flow chart of singing scoring shown in fig. 3. n scoring instances are set in the terminal. Each scoring instance obtains the lyric file, the standard pitch file and one of the offset data streams, extracts the pitch segment corresponding to a lyric from its offset data stream according to the lyric file, scores the pitch segment according to the standard pitch file, and outputs a singing score of the lyric. The terminal receives the singing score output by each scoring instance, stores the n singing scores obtained, and selects the highest one as the highest singing score of the lyric. Finally, the terminal performs a weighted average of the singing scores of all the lyrics to obtain the final singing score of the song.

For each lyric, extracting n pitch segments corresponding to the lyric from n offset data streams, wherein the initial moments of the n pitch segments are different, then obtaining a standard pitch segment corresponding to the lyric from a standard pitch file, scoring the n pitch segments by using the standard pitch segment to obtain n singing scores, and then using the singing score with the highest score as the highest singing score of the lyric by the terminal.
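The arrangement of n scoring instances can be sketched as a small class that receives pitch frames one at a time and emits a score whenever its offset stream passes the end of a lyric. The class and function names, the (start, end) lyric windows, the dictionary form of the standard pitch file and the tolerance-based scoring rule are all assumptions for illustration, not details from the embodiment.

```python
from typing import Dict, List, Tuple

LyricLine = Tuple[int, int]  # assumed: (start_ms, end_ms) of one lyric


class ScoringInstance:
    """One scoring instance, bound to a single offset duration."""

    def __init__(self, offset_ms: int, lyric_lines: List[LyricLine],
                 standard: Dict[int, float], tolerance: float = 0.5) -> None:
        self.offset_ms = offset_ms
        self.lyric_lines = lyric_lines
        self.standard = standard
        self.tolerance = tolerance
        self._line = 0  # index of the lyric currently being collected
        self._segment: List[Tuple[int, float]] = []

    def _score(self) -> float:
        # Hypothetical rule: percentage of frames within tolerance of the standard pitch.
        if not self._segment:
            return 0.0
        hits = sum(1 for t, p in self._segment
                   if t in self.standard and abs(p - self.standard[t]) <= self.tolerance)
        return 100.0 * hits / len(self._segment)

    def feed(self, time_ms: int, pitch: float) -> List[Tuple[int, float]]:
        """Consume one frame; return (lyric index, score) for every lyric that just ended."""
        t = time_ms + self.offset_ms
        finished = []
        while self._line < len(self.lyric_lines) and t >= self.lyric_lines[self._line][1]:
            finished.append((self._line, self._score()))
            self._segment = []
            self._line += 1
        if self._line < len(self.lyric_lines) and t >= self.lyric_lines[self._line][0]:
            self._segment.append((t, pitch))
        return finished


def run_instances(pitch_stream: List[Tuple[int, float]], lyric_lines: List[LyricLine],
                  standard: Dict[int, float], offsets_ms: List[int]) -> Dict[int, float]:
    """Feed every frame to all instances and keep the highest score per lyric."""
    instances = [ScoringInstance(o, lyric_lines, standard) for o in offsets_ms]
    best: Dict[int, float] = {}
    for time_ms, pitch in pitch_stream:
        for inst in instances:
            for line, score in inst.feed(time_ms, pitch):
                best[line] = max(best.get(line, 0.0), score)
    return best
```

In this sketch the last lyric is only scored once a later frame arrives, so the stream is assumed to continue briefly after the final lyric ends.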

The numerical value of the initial sampling time of the live audio stream is greater than the maximum value of the n offset time durations, so that when the pitch data stream corresponding to the live audio stream is offset forwards, the problem of abnormal offset caused by the fact that the offset initial time is on the negative axis of the time axis is avoided, and normal execution of the offset is guaranteed.

Referring to fig. 4, a block diagram of a singing scoring device according to an embodiment of the present invention is shown, where the singing scoring device may be applied to a terminal, where the terminal may be a smart television, a smart phone, or a tablet computer. The singing scoring device includes:

the obtaining module 410 is configured to obtain a voice data stream generated when a user sings a song in real time;

a conversion module 420, configured to convert the human voice data stream obtained by the obtaining module 410 into a pitch data stream;

the offset module 430 is configured to offset the pitch data stream based on n offset durations, respectively, to obtain n offset data streams, where the n offset durations are different from each other, and n is greater than or equal to 2;

a first calculating module 440, configured to calculate a singing score of each lyric according to the n offset data streams obtained by the offset module 430;

and the second calculating module 450 is configured to calculate a singing score of the song according to the singing scores of all the lyrics in the song.

In summary, in the singing scoring device provided in the embodiment of the present invention, since the n offset durations are different from each other, offsetting the pitch data stream by them yields n different offset data streams, and the terminal can calculate the highest singing score of each lyric from these n offset data streams. Scoring each lyric several times against different offset data streams, rather than once against a single pitch data stream, improves the singing score of each lyric and ultimately of the song. This solves the problem that the singing score of the song is low when the singing score of a lyric is inaccurate.

Referring to fig. 5, a block diagram of a singing scoring device according to still another embodiment of the present invention is shown, where the singing scoring device may be applied to a terminal, where the terminal may be a smart television, a smart phone, or a tablet computer. The singing scoring device includes:

an obtaining module 510, configured to obtain, in real time, a voice data stream generated when a user sings a song;

a conversion module 520, configured to convert the human voice data stream obtained by the obtaining module 510 into a pitch data stream;

an offset module 530, configured to offset the pitch data stream based on n offset durations, respectively, to obtain n offset data streams, where the n offset durations are different from each other, and n is greater than or equal to 2;

a first calculating module 540, configured to calculate a singing score of each lyric according to the n offset data streams obtained by the offset module 530;

and a second calculating module 550, configured to calculate a singing score of the song according to the singing scores of all the lyrics in the song.

Optionally, the first calculating module 540 includes:

a first extraction unit 541, configured to extract n pitch segments corresponding to each lyric from the n offset data streams;

the calculating unit 542 is configured to score the n pitch segments corresponding to each lyric obtained by the first extracting unit 541, and use the highest of the resulting singing scores as the highest singing score of the lyric.

Optionally, the first extracting unit 541 is specifically configured to:

acquiring a lyric file of a song;

for each offset data stream, when the current data of the offset data stream corresponds to the end of a sentence of lyrics in the lyrics file, a pitch segment corresponding to the lyrics is extracted from the offset data stream.

Optionally, the calculating unit 542 is specifically configured to:

acquiring a standard pitch file of a song;

and for each sentence of lyrics, scoring n pitch segments corresponding to the lyrics through a standard pitch file to obtain n singing scores.

Optionally, the obtaining module 510 includes:

the recording unit 511 is configured to record external sound in real time when the user sings a song, so as to obtain a live broadcast audio stream, where a value of an initial sampling time of the live broadcast audio stream is greater than a maximum value of the n offset durations;

a second extracting unit 512, configured to extract a vocal data stream from the live audio stream obtained by the recording unit 511 according to the accompaniment of the song.

Referring to fig. 6, a block diagram of a terminal 600 according to an embodiment of the present invention is shown, where the terminal may include a Radio Frequency (RF) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a Wireless Fidelity (WiFi) module 607, a processor 608 including one or more processing cores, and a power supply 609. Those skilled in the art will appreciate that the terminal structure shown in fig. 6 does not limit the terminal, which may include more or fewer components than those shown, combine some components, or arrange the components differently. Wherein:

The RF circuit 601 may be used for receiving and transmitting signals during message transmission or a call. In particular, it receives downlink information from a base station and hands it to one or more processors 608 for processing; in addition, it transmits uplink data to the base station. In general, the RF circuit 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 601 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.

The memory 602 may be used to store software programs and modules, and the processor 608 executes various functional applications and data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal device, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 608 and the input unit 603 access to the memory 602.

The input unit 603 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, input unit 603 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user (e.g., operations by a user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment) thereon or nearby, and drive the corresponding connection device according to a predetermined program. Alternatively, the touch sensitive surface may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it to touch point coordinates, which are provided to processor 608, and can receive and execute commands from processor 608. In addition, touch sensitive surfaces may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. The input unit 603 may include other input devices in addition to the touch-sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 604 may be used to display information input by or provided to the user and various graphical user interfaces of the terminal device, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit 604 may include a Display panel, and optionally, the Display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay the display panel, and when a touch operation is detected on or near the touch-sensitive surface, the touch operation is transmitted to the processor 608 to determine the type of touch event, and the processor 608 then provides a corresponding visual output on the display panel according to the type of touch event. Although in FIG. 6 the touch-sensitive surface and the display panel are two separate components to implement input and output functions, in some embodiments the touch-sensitive surface may be integrated with the display panel to implement input and output functions.

The terminal may also include at least one sensor 605, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or the backlight when the terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured in the terminal, detailed description is omitted here.

Audio circuitry 606, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit 606 may transmit the electrical signal converted from the received audio data to the speaker, which converts the electrical signal into a sound signal for output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit 606 and converted into audio data. The audio data is then output to the processor 608 for processing and transmitted to, for example, another terminal via the RF circuit 601, or output to the memory 602 for further processing. The audio circuit 606 may also include an earbud jack to provide communication of peripheral headphones with the terminal.

WiFi belongs to short-distance wireless transmission technology, and the terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 607, and provides wireless broadband internet access for the user. Although fig. 6 shows the WiFi module 607, it is understood that it does not belong to the essential constitution of the terminal, and may be omitted entirely as needed within the scope not changing the essence of the invention.

The processor 608 is a control center of the terminal, connects various parts of the entire handset using various interfaces and lines, and performs various functions of the terminal and processes data by operating or executing software programs and/or modules stored in the memory 602 and calling data stored in the memory 602, thereby performing overall monitoring of the handset. Optionally, processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 608.

The terminal also includes a power supply 609 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically connected to the processor 608 via a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 609 may also include one or more of the following components: DC or AC power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.

Although not shown, the terminal may further include a camera, a bluetooth module, and the like, which will not be described herein. Specifically, in this embodiment, the processor 608 in the terminal may execute one or more program instructions stored in the memory 602 to implement the singing scoring method provided in the above-described method embodiments.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.

An embodiment of the present invention provides a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, loaded and executed by the processor to implement the singing scoring method as described above.

One embodiment of the present invention provides a terminal comprising a processor and a memory, wherein the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the singing scoring method described above.

It should be noted that, when the singing scoring device provided by the above embodiment performs singing scoring, its functions may be allocated to different functional modules as needed; that is, the internal structure of the singing scoring device may be divided into different functional modules to complete all or part of the functions described above. In addition, the singing scoring device and the singing scoring method provided by the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A singing scoring method, comprising:

acquiring, in real time, a voice data stream generated when a user sings a song;

converting the voice data stream into a pitch data stream, wherein the pitch data stream indicates the pitch of the sound at each time on a time axis;

offsetting the initial time of the pitch data stream along the time axis according to each of n offset durations to obtain n offset data streams, wherein each offset duration corresponds to one offset data stream, the n offset durations are different from one another, and n is greater than or equal to 2;
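By way of illustration only, the shifting step recited above might be sketched as follows. The sample layout (a time in milliseconds paired with a pitch value), the function name `make_offset_streams`, and the direction of the shift (timestamps moved earlier by the offset) are assumptions made for this sketch and are not specified by the claim.

```python
from typing import List, Sequence, Tuple

# One pitch sample: (time in milliseconds, pitch value). Hypothetical layout.
PitchSample = Tuple[int, float]


def make_offset_streams(
    pitch_stream: Sequence[PitchSample],
    offset_durations_ms: Sequence[int],
) -> List[List[PitchSample]]:
    """Return one shifted copy of pitch_stream per offset duration."""
    if len(offset_durations_ms) < 2:
        raise ValueError("n must be greater than or equal to 2")
    if len(set(offset_durations_ms)) != len(offset_durations_ms):
        raise ValueError("the n offset durations must all differ")
    # Assumption: shifting moves every timestamp earlier by the offset.
    return [
        [(t - offset, pitch) for t, pitch in pitch_stream]
        for offset in offset_durations_ms
    ]


stream = [(100, 60.0), (110, 62.0), (120, 64.0)]
offset_streams = make_offset_streams(stream, [0, 20, 40])
```

Each offset duration yields exactly one offset data stream, so three durations produce three streams here.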

2. The method of claim 1, wherein calculating the highest singing score for each lyric from the n offset data streams comprises:

extracting n pitch segments corresponding to each sentence of lyrics from the n offset data streams;

and respectively scoring the n pitch segments corresponding to each lyric, and taking the highest of the resulting singing scores as the highest singing score of that lyric.
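As a non-limiting sketch, selecting the highest singing score for one lyric might look like the following. The name `best_lyric_score` and the callable `score_fn` are hypothetical; the toy scoring rule in the usage lines is a placeholder only and is not the scoring rule of the invention.

```python
from typing import Callable, List, Sequence, Tuple

PitchSample = Tuple[int, float]  # (time_ms, pitch), hypothetical layout


def best_lyric_score(
    segments: Sequence[Sequence[PitchSample]],
    score_fn: Callable[[Sequence[PitchSample]], float],
) -> float:
    """Score each of the n pitch segments of one lyric and keep the maximum."""
    if len(segments) < 2:
        raise ValueError("expected n >= 2 pitch segments per lyric")
    scores: List[float] = [score_fn(segment) for segment in segments]
    return max(scores)


# Placeholder rule for demonstration: percentage of samples whose pitch is 60.0.
def toy_score(segment: Sequence[PitchSample]) -> float:
    return 100.0 * sum(1 for _, p in segment if p == 60.0) / len(segment)


segments_for_lyric = [
    [(0, 60.0), (10, 61.0)],   # segment from the first offset data stream
    [(0, 60.0), (10, 60.0)],   # segment from the second offset data stream
]
best = best_lyric_score(segments_for_lyric, toy_score)
```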

3. The method of claim 2, wherein the extracting n pitch segments for each lyric from the n offset data streams comprises:

acquiring a lyric file of the song;

for each offset data stream, when the current data of the offset data stream corresponds to the end of a lyric in the lyric file, extracting a pitch segment corresponding to the lyric from the offset data stream.
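The extraction step above might, for example, be sketched as follows for a single offset data stream. The lyric file is assumed here to reduce to a list of (start, end) times in milliseconds per lyric, and the names `extract_segments` and `lyric_lines` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

PitchSample = Tuple[int, float]  # (time_ms, pitch), hypothetical layout
LyricLine = Tuple[int, int]      # (start_ms, end_ms), assumed lyric-file entry


def extract_segments(
    offset_stream: Sequence[PitchSample],
    lyric_lines: Sequence[LyricLine],
) -> Dict[int, List[PitchSample]]:
    """Cut out one pitch segment per lyric as the stream reaches each lyric end."""
    segments: Dict[int, List[PitchSample]] = {}
    buffered: List[PitchSample] = []
    next_line = 0
    for sample in offset_stream:
        buffered.append(sample)
        current_time = sample[0]
        # When the current data reaches the end of a lyric, extract its segment.
        while next_line < len(lyric_lines) and current_time >= lyric_lines[next_line][1]:
            start, end = lyric_lines[next_line]
            segments[next_line] = [s for s in buffered if start <= s[0] < end]
            next_line += 1
    return segments


stream = [(t, 60.0) for t in range(0, 50, 10)]
segments = extract_segments(stream, [(0, 20), (20, 40)])
```

Running the same function over each of the n offset data streams would give the n pitch segments per lyric.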

4. The method of claim 2, wherein the scoring n pitch segments for each lyric comprises:

acquiring a standard pitch file of the song;

and for each sentence of lyrics, scoring n pitch segments corresponding to the lyrics through the standard pitch file to obtain n singing scores.
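Purely as an illustrative assumption, scoring against the standard pitch file could be sketched as below. The claim does not state the scoring rule; this sketch assumes the standard pitch file reduces to (start, end, pitch) notes and scores a segment as the percentage of samples within a tolerance of the reference pitch. The names `score_segment`, `score_lyric`, and `tolerance` are hypothetical.

```python
from typing import List, Sequence, Tuple

PitchSample = Tuple[int, float]         # (time_ms, pitch), hypothetical layout
StandardNote = Tuple[int, int, float]   # (start_ms, end_ms, pitch), assumed format


def score_segment(
    segment: Sequence[PitchSample],
    standard_notes: Sequence[StandardNote],
    tolerance: float = 0.5,
) -> float:
    """Assumed rule: percentage of samples within `tolerance` of the reference."""
    if not segment:
        return 0.0
    hits = 0
    for time_ms, pitch in segment:
        for start, end, reference in standard_notes:
            if start <= time_ms < end:
                if abs(pitch - reference) <= tolerance:
                    hits += 1
                break  # at most one reference note covers a given time
    return 100.0 * hits / len(segment)


def score_lyric(
    segments: Sequence[Sequence[PitchSample]],
    standard_notes: Sequence[StandardNote],
) -> List[float]:
    """Score the n pitch segments of one lyric, giving n singing scores."""
    return [score_segment(segment, standard_notes) for segment in segments]


notes = [(0, 20, 60.0), (20, 40, 62.0)]
seg_a = [(0, 60.0), (10, 60.0), (20, 62.0), (30, 62.0)]
seg_b = [(0, 60.0), (10, 62.0), (20, 62.0), (30, 64.0)]
scores = score_lyric([seg_a, seg_b], notes)
```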

5. The method according to any one of claims 1 to 4, wherein acquiring, in real time, the voice data stream generated when the user sings the song comprises:

recording external sound when the user sings the song in real time to obtain a live broadcast audio stream, wherein the value of the initial sampling time of the live broadcast audio stream is greater than the maximum value of the n offset durations;

and extracting the voice data stream from the live broadcast audio stream according to the accompaniment of the song.

6. A singing scoring device, the device comprising:

a conversion module, configured to convert the human voice data stream obtained by the obtaining module into a pitch data stream, where the pitch data stream is used to indicate a pitch of a sound at each time on a time axis;

an offset module, configured to offset the initial time of the pitch data stream along the time axis according to each of n offset durations to obtain n offset data streams, wherein each offset duration corresponds to one offset data stream, the n offset durations are different from one another, and n is greater than or equal to 2;

7. The apparatus of claim 6, wherein the first computing module comprises:

a first extraction unit, configured to extract n pitch segments corresponding to each lyric from the n offset data streams;

and a calculation unit, configured to respectively score the n pitch segments corresponding to each lyric obtained by the first extraction unit, and to take the highest of the resulting singing scores as the highest singing score of that lyric.

8. The apparatus according to claim 7, wherein the first extraction unit is specifically configured to:

acquiring a lyric file of the song;

9. The apparatus according to claim 7, wherein the computing unit is specifically configured to:

acquiring a standard pitch file of the song;

10. The apparatus according to any one of claims 6 to 9, wherein the obtaining module comprises:

the recording unit is used for recording external sound when the user sings the song in real time to obtain a live broadcast audio stream, and the numerical value of the initial sampling time of the live broadcast audio stream is greater than the maximum value of the n offset durations;

and the second extraction unit is used for extracting the voice data stream from the live broadcast audio stream obtained by the recording unit according to the accompaniment of the song.

11. A computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a singing scoring method according to any one of claims 1 to 5.

12. A terminal, characterized in that it comprises a processor and a memory, said memory having stored therein at least one instruction, said instruction being loaded by said processor and executing the singing scoring method according to any one of claims 1 to 5.