Project name: sly clear voice
> DTW for Speech Recognition
>
>
> --------------------------------------------------------------------------------
>
> 1. Parameters for Speech Recognition
>
>
> LPCC
> LPCC = Linear Prediction Cepstral Coefficients
>
> Low complexity; easy to implement on a low-speed DSP or MCU.
>
>
> MFCC
> MFCC = Mel-Frequency Cepstral Coefficients
>
> The most widely used feature for speech recognition in recent years. Because it requires a DFT, MFCC needs relatively more MIPS.
>
> Extracting MFCC from .WAV files
> Use 'melcepst' from the VOICEBOX toolbox.
>
> Example:
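>     x = wavread('xxx.wav');                  % read the samples (newer MATLAB releases use audioread)
>     x = filter([1 -0.9375], 1, x);           % pre-emphasis: y(n) = x(n) - 0.9375*x(n-1)
>     m = melcepst(x,8000,'M',12,24,256,80);   % fs = 8000 Hz, Hamming window, 12 coefficients, 24 filters, 256-point frames, 80-point step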
>    
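> The pre-emphasis step in the example can be checked on a toy signal. The sketch below uses made-up sample values (an assumption for illustration only) to show that the filter passes the first sample unchanged and strongly attenuates a constant, low-frequency input.
>
> ```matlab
> x = [1; 1; 1; 1];                 % toy constant signal (illustrative values)
> y = filter([1 -0.9375], 1, x);    % y(n) = x(n) - 0.9375*x(n-1), with x(0) taken as 0
> disp(y');                         % the constant part is reduced to 1 - 0.9375 = 0.0625
> ```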
> Endpoint detection
> Endpoint detection removes silence and noise segments and keeps only the speech segment. In speech coding the same task is known as Voice Activity Detection (VAD). Speech recognition needs more accurate endpoints than speech coding does, because DTW aligns the two utterances from first frame to last and is therefore highly dependent on good endpoints.
>
> vad.m is a VAD program based on short-time energy and zero-crossing rate.
> See vad.m for the source.
>
> To find the start frame x1 and end frame x2 from speech signal x, use:
>
>     [x1 x2] = vad(x);
>    
> Note: in this demo, x1 and x2 are frame indices, not sample indices; 1 frame = 80 points (10 ms at 8000 Hz).
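>
> As a rough illustration of the idea (not the contents of vad.m), the sketch below marks a frame as speech when its short-time energy is high, or when its energy is moderate and its zero-crossing rate is high. The toy signal, the 256-point frame length, and all three thresholds are assumptions chosen for the example.
>
> ```matlab
> % Toy signal at 8000 Hz: silence, a 440 Hz tone, then silence (assumed test data).
> fs = 8000;
> n  = (0:2399)';
> x  = [zeros(1600,1); 0.5*sin(2*pi*440*n/fs); zeros(1600,1)];
>
> flen = 256;                                  % frame length (assumed)
> finc = 80;                                   % frame step, as in the note above
> nf   = floor((length(x) - flen)/finc) + 1;   % number of complete frames
>
> energy = zeros(nf,1);
> zcr    = zeros(nf,1);
> for k = 1:nf
>     seg       = x((k-1)*finc + (1:flen));
>     energy(k) = sum(seg.^2);                 % short-time energy
>     s         = double(seg >= 0);
>     zcr(k)    = sum(abs(diff(s)));           % number of sign changes in the frame
> end
>
> ehigh = 0.10*max(energy);                    % assumed thresholds
> elow  = 0.01*max(energy);
> zth   = 50;
> active = (energy > ehigh) | (energy > elow & zcr > zth);
>
> x1 = find(active, 1, 'first');               % start frame
> x2 = find(active, 1, 'last');                % end frame
> ```
>
> A practical detector would usually estimate the thresholds from the background noise and smooth the frame decisions; this sketch does neither.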
> 2. Preparing Reference Models
> (1) Read the .WAV files
> (2) Call vad.m to find the start and end frames of the speech signal
> (3) Call melcepst.m to calculate the MFCC parameters
> (4) Save the parameters in a structure
>
>
> Example:
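>     for i=1:10
>         fname = sprintf('%da.wav',i-1);           % 0a.wav ... 9a.wav
>         x = wavread(fname);
>         [x1 x2] = vad(x);                         % start and end frames of the speech
>         x = filter([1 -0.9375], 1, x);            % pre-emphasis
>         m = melcepst(x,8000,'M',12,24,256,80);    % one row of 12 MFCCs per frame
>         m = m(x1:x2,:);                           % keep only the speech frames
>         ref(i).mfcc = m;                          % reference model for digit i-1
>     end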
>    
> 3. Standard DTW
> Standard DTW uses an MxN matrix to hold the matching scores, where M and N are the lengths (in frames) of the reference model and the test model.
> See dtw.m for the source.
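>
> A minimal sketch of the full-matrix approach follows. It is not the code in dtw.m: the squared Euclidean local distance, the three-step pattern, and the toy one-dimensional sequences are all assumptions made for illustration. With real data, r and t would be MFCC matrices with one row per frame.
>
> ```matlab
> r = [1; 2; 3; 4; 5; 6];        % reference, M frames (toy values)
> t = [1; 2; 3; 3; 4; 5; 6];     % test, N frames (toy values)
> M = size(r,1);
> N = size(t,1);
>
> % Local distance between reference frame i and test frame j.
> d = zeros(M,N);
> for i = 1:M
>     for j = 1:N
>         d(i,j) = sum((r(i,:) - t(j,:)).^2);
>     end
> end
>
> % Accumulated distance, allowing steps from (i-1,j), (i,j-1) and (i-1,j-1).
> D = inf(M,N);
> D(1,1) = d(1,1);
> for i = 1:M
>     for j = 1:N
>         if i == 1 && j == 1, continue; end
>         best = inf;
>         if i > 1,          best = min(best, D(i-1,j));   end
>         if j > 1,          best = min(best, D(i,j-1));   end
>         if i > 1 && j > 1, best = min(best, D(i-1,j-1)); end
>         D(i,j) = d(i,j) + best;
>     end
> end
> dist = D(M,N);                 % matching score of the two sequences
> ```
>
> Here t is r with one frame repeated, so a zero-cost warping path exists and dist is 0.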
>
> 4. Efficient DTW
> Efficient DTW uses two Mx1 vectors to hold the accumulated distances instead of a full MxN matrix. Warping is constrained to a rhombus.
> This DTW is fast and memory-efficient, and is easy to implement on a DSP system.
>
> See dtw2.m for the source.
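>
> The two-vector idea can be sketched as below. This is not the code in dtw2.m: it assumes a rhombus whose sides have slopes 1/2 and 2, a local step that keeps, advances, or skips one reference frame per test frame, and the same toy sequences as an illustration. Cells outside the rhombus stay at Inf, so the result is Inf when no path fits (for example when one sequence is more than about twice as long as the other).
>
> ```matlab
> r = [1; 2; 3; 4; 5; 6];        % reference, M frames (toy values)
> t = [1; 2; 3; 3; 4; 5; 6];     % test, N frames (toy values)
> M = size(r,1);
> N = size(t,1);
>
> prev = inf(M,1);                          % accumulated distances for test frame j-1
> prev(1) = sum((r(1,:) - t(1,:)).^2);      % the path must start at (1,1)
> for j = 2:N
>     cur = inf(M,1);                       % accumulated distances for test frame j
>     % Rows of this column that lie inside the rhombus.
>     lo = max(ceil((j-1)/2) + 1, M - 2*(N-j));
>     hi = min(2*(j-1) + 1,       M - ceil((N-j)/2));
>     for i = max(lo,1):min(hi,M)
>         best = prev(i);                              % reference frame repeated
>         if i > 1, best = min(best, prev(i-1)); end   % advance one reference frame
>         if i > 2, best = min(best, prev(i-2)); end   % skip one reference frame
>         cur(i) = sum((r(i,:) - t(j,:)).^2) + best;
>     end
>     prev = cur;                           % only two columns are ever kept
> end
> dist = prev(M);                           % the path must end at (M,N)
> ```
>
> Memory use is 2M values regardless of N, and only the cells inside the rhombus are evaluated.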
>
> 5. Example
> Download dtw.zip
> This zip file includes:
>
> dtw.m
> dtw2.m
> vad.m
> testdtw.m
> Download wav.zip and extract it.
> This zip file includes:
>
> 0a.wav ~ 9a.wav, 10 Mandarin digits used for reference
> 0b.wav ~ 9b.wav, 10 Mandarin digits used for test
> Extract both archives into the same folder. To use dtw2.m instead of dtw.m, edit testdtw.m accordingly.
> In MATLAB, run testdtw to get the results.
>
>
>     >> testdtw
>     Prepare reference model...
>     Calculate test model...
>     Matching...
>     Result...
>     Test model 1 is recognized as:1
>     Test model 2 is recognized as:2
>     Test model 3 is recognized as:3
>     Test model 4 is recognized as:4
>     Test model 5 is recognized as:5
>     Test model 6 is recognized as:6
>     Test model 7 is recognized as:7
>     Test model 8 is recognized as:8
>     Test model 9 is recognized as:9
>     Test model 10 is recognized as:10
>     >>
>    
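> The "Result" step presumably picks, for each test model, the reference model with the smallest DTW distance. testdtw.m is not reproduced here, so the following is only a sketch of that decision rule; the 3x3 distance matrix contains made-up numbers.
>
> ```matlab
> % dist(i,j): DTW distance from test model i to reference model j (made-up values).
> dist = [ 10 42 57;
>          39 12 48;
>          61 44  9 ];
> [mindist, idx] = min(dist, [], 2);        % nearest reference for each test model
> for i = 1:size(dist,1)
>     fprintf('Test model %d is recognized as:%d\n', i, idx(i));
> end
> ```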
> 6. Algorithm Description
> Will be uploaded soon.
>
> --------------------------------------------------------------------------------
>