May 17, 2004
Project name: sly clear voice
>
> DTW for Speech Recognition
>
>
> --------------------------------------------------------------------------------
>
> 1. Parameters for Speech Recognition
>
>
> LPCC
> LPCC = Linear Prediction Cepstral Coefficients
>
> Low computational complexity; easy to implement on a low-speed DSP or MCU.
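A minimal plain-Python sketch of one common way to get LPCC from a frame: autocorrelation, the Levinson-Durbin recursion for the predictor coefficients, then the standard LPC-to-cepstrum recursion. It needs no DFT, only multiply-adds and a few divisions per frame, which is why LPCC suits a low-speed DSP. The order (12) and the number of cepstral coefficients (12) are assumed values, and windowing and pre-emphasis are left out.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] * x[n+k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, err); a[0] is unused (kept at 0 so indices match the maths)
    and err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # perfectly predictable signal: stop the recursion
        # reflection coefficient of stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, num_ceps):
    """c_n = a_n + sum_{k} (k/n) c_k a_{n-k}; a[1..p] are predictor coefficients."""
    p = len(a) - 1
    c = [0.0] * (num_ceps + 1)  # c[0] unused
    for n in range(1, num_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, num_ceps=12):
    """LPCC of one frame (assumed order and length; not taken from the original code)."""
    r = autocorrelation(frame, order)
    if r[0] == 0:
        return [0.0] * num_ceps  # digital silence
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, num_ceps)
```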
>
>
> MFCC
> MFCC = Mel-Frequency Cepstral Coefficients
>
> The most widely used feature for speech recognition in recent years. Because every frame requires a DFT (normally computed as an FFT), MFCC needs relatively more MIPS than LPCC.
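The per-frame MFCC pipeline can be sketched in plain Python: Hamming window, DFT power spectrum, triangular mel filters, log, DCT-II. The mel formula 2595·log10(1 + f/700), the filter placement and the DCT scaling are common textbook choices assumed here, so the numbers will not match melcepst exactly. The direct DFT below is O(n²) to keep the sketch short; it is exactly this per-frame transform that makes MFCC more expensive than LPCC, and real code would use an FFT.

```python
import cmath
import math


def hz_to_mel(f):
    # One common mel-scale formula (assumption; melcepst may differ in detail)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..n/2 via a direct O(n^2) DFT (use an FFT in practice)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale from 0 to fs/2."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / fs  # edge position in FFT bins
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                weights.append((k - left) / (centre - left))    # rising slope
            elif centre < k < right:
                weights.append((right - k) / (right - centre))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc_frame(frame, fs, num_filters=24, num_ceps=12):
    """c1..c_num_ceps of one frame: Hamming window, power spectrum, mel filters, log, DCT-II."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    spec = power_spectrum([s * w for s, w in zip(frame, window)])
    # Floor the filter energies so log() never sees zero (e.g. digital silence)
    energies = [max(sum(w * p for w, p in zip(filt, spec)), 1e-12)
                for filt in mel_filterbank(num_filters, n, fs)]
    log_e = [math.log(e) for e in energies]
    # DCT-II of the log energies; c0 (overall level) is skipped
    return [sum(log_e[m] * math.cos(math.pi * q * (m + 0.5) / num_filters)
                for m in range(num_filters))
            for q in range(1, num_ceps + 1)]
```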
>
> Extracting MFCC from .WAV files
> Use the 'melcepst' function from the VOICEBOX toolbox.
>
> Example:
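>     [x, fs] = audioread('xxx.wav');              % wavread('xxx.wav') in older MATLAB releases
>     x = filter([1 -0.9375], 1, x);               % pre-emphasis
>     m = melcepst(x, fs, 'M', 12, 24, 256, 80);   % one row of 12 MFCCs per frame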
>    
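In the example, filter([1 -0.9375], 1, x) is a first-order pre-emphasis filter, y(n) = x(n) - 0.9375·x(n-1), and melcepst(x,8000,'M',12,24,256,80) asks for a Hamming window ('M'), 12 cepstral coefficients, 24 mel filters, 256-sample frames and an 80-sample frame increment. A plain-Python sketch of the filter and the framing follows; keeping only whole frames (no padding) is an assumption about how the frames are cut.

```python
def pre_emphasis(x, alpha=0.9375):
    """y[n] = x[n] - alpha * x[n-1], like MATLAB filter([1 -alpha], 1, x) with zero initial state."""
    return [x[n] - alpha * x[n - 1] if n > 0 else x[0] for n in range(len(x))]


def split_frames(x, frame_len=256, hop=80):
    """Whole frames of frame_len samples, one starting every hop samples (no padding)."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]
```

For one second of 8 kHz audio, split_frames returns 97 frames, one every 10 ms.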
> Endpoint detection
> Endpoint detection removes the silence and noise segments and keeps only the speech segment. In speech coding it is known as Voice Activity Detection (VAD), but speech recognition needs more accurate endpoints than speech coding does, because DTW-based recognition depends heavily on good endpoints.
>
> vad.m is a VAD program based on short-time energy and zero-crossing rate.
> Its source is included in dtw.zip (see section 5).
>
> To find the start frame x1 and the end frame x2 of the speech signal x, use:
>
>     [x1 x2] = vad(x);
>    
> Note: in this demo, x1 and x2 are frame indices, and 1 frame = 80 samples (10 ms at 8 kHz).
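The source of vad.m is not reproduced here, so the following is only a plain-Python sketch of an energy plus zero-crossing-rate endpoint detector in the same spirit. Energy thresholds relative to the loudest frame locate the segment, then a few extra frames with a high zero-crossing rate (weak fricatives) are allowed. The 240-sample frame, 80-sample hop and every threshold are hypothetical values, and the returned indices are 0-based, whereas vad.m returns MATLAB's 1-based frame numbers.

```python
def split_frames(x, frame_len, hop):
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    return sum(v * v for v in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def simple_vad(x, frame_len=240, hop=80, high_ratio=0.1, low_ratio=0.02,
               zcr_thresh=0.3, max_zcr_frames=3):
    """Return (start, end) frame indices, 0-based and inclusive, or None if no speech.

    All thresholds are hypothetical; high_ratio and low_ratio are fractions of the peak energy.
    """
    frames = split_frames(x, frame_len, hop)
    energy = [short_time_energy(f) for f in frames]
    if not energy or max(energy) == 0:
        return None
    zcr = [zero_crossing_rate(f) for f in frames]
    high, low = high_ratio * max(energy), low_ratio * max(energy)
    loud = [i for i, e in enumerate(energy) if e >= high]
    start, end = loud[0], loud[-1]
    # 1) grow the segment outwards while the energy stays above the low threshold
    while start > 0 and energy[start - 1] > low:
        start -= 1
    while end < len(frames) - 1 and energy[end + 1] > low:
        end += 1
    # 2) then allow a few more frames with a high zero-crossing rate
    for _ in range(max_zcr_frames):
        if start == 0 or zcr[start - 1] <= zcr_thresh:
            break
        start -= 1
    for _ in range(max_zcr_frames):
        if end == len(frames) - 1 or zcr[end + 1] <= zcr_thresh:
            break
        end += 1
    return start, end
```

Splitting the decision into a strict energy threshold and looser extension rules is what keeps silence out while still catching quiet word onsets and endings.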
> 2. Preparing Reference Models
> (1) Read the .WAV files.
> (2) Call vad.m to find the start point and endpoint of the speech signal.
> (3) Apply pre-emphasis and call melcepst.m to calculate the MFCC parameters.
> (4) Save the parameters of the speech frames in a structure array.
>
>
> Example:
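>     for i = 1:10
>         fname = sprintf('%da.wav', i-1);             % 0a.wav ... 9a.wav
>         [x, fs] = audioread(fname);                  % wavread(fname) in older MATLAB releases
>         [x1, x2] = vad(x);                           % start and end frames of the speech
>         x = filter([1 -0.9375], 1, x);               % pre-emphasis
>         m = melcepst(x, fs, 'M', 12, 24, 256, 80);   % one row of 12 MFCCs per frame
>         m = m(x1:x2, :);                             % keep only the speech frames
>         ref(i).mfcc = m;
>     end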
>    
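The same four preparation steps can be sketched in plain Python. The reader handles only 16-bit mono PCM WAV files and scales samples to [-1, 1) like MATLAB's audioread. The endpoint detector and feature extractor are passed in as hypothetical callables (vad_fn, feature_fn) so the loop stays independent of any particular VAD or MFCC code; vad_fn is assumed to return 0-based, inclusive frame indices.

```python
import struct
import wave


def read_wav_mono16(source):
    """Read 16-bit mono PCM WAV from a path or binary file object.

    Returns (samples scaled to [-1, 1), sample_rate).
    """
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        fs = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)  # WAV data is little-endian
    return [s / 32768.0 for s in samples], fs


def make_template(x, vad_fn, feature_fn, alpha=0.9375):
    """Steps (2)-(4) for one recording: endpoints, pre-emphasis, features, trimming."""
    x1, x2 = vad_fn(x)          # hypothetical VAD: 0-based, inclusive frame indices
    y = [x[n] - alpha * x[n - 1] if n > 0 else x[0] for n in range(len(x))]
    feats = feature_fn(y)       # hypothetical extractor: one feature vector per frame
    return feats[x1:x2 + 1]     # same frames as MATLAB m(x1:x2,:)


def prepare_references(sources, vad_fn, feature_fn):
    """One reference model per recording, like the ref(i).mfcc structure array."""
    refs = []
    for src in sources:
        x, fs = read_wav_mono16(src)
        refs.append({"fs": fs, "mfcc": make_template(x, vad_fn, feature_fn)})
    return refs
```

For the ten reference digits, sources would be the file names 0a.wav to 9a.wav.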
> 3. Standard DTW
> Uses an MxN matrix to store the accumulated matching scores, where M and N are the lengths (in frames) of the reference model and the test model.
> The source is in dtw.m.
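The recurrence that fills the MxN matrix can be sketched in plain Python. This sketch assumes Euclidean local distances between feature vectors and the three usual predecessors (i-1, j), (i, j-1) and (i-1, j-1) with equal weights; dtw.m may use other step weights or normalise the final score.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw(ref, test, dist=euclidean):
    """Full-matrix DTW between ref (M frames) and test (N frames).

    D[i][j] = d(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]); returns D[M-1][N-1].
    """
    m, n = len(ref), len(test)
    inf = float("inf")
    D = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            cost = dist(ref[i], test[j])
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            best = min(D[i - 1][j] if i > 0 else inf,
                       D[i][j - 1] if j > 0 else inf,
                       D[i - 1][j - 1] if i > 0 and j > 0 else inf)
            D[i][j] = cost + best
    return D[m - 1][n - 1]
```

For example, dtw([[1], [2], [3]], [[1], [1], [2], [2], [3], [3]]) is 0: the warping path absorbs the slower speaking rate.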
>
> 4. Efficient DTW
> Uses two Mx1 vectors to store the accumulated distances, and constrains the warping path to a rhombus-shaped region.
> This makes the DTW fast and memory efficient, and easy to implement on a DSP system.
>
> The source is in dtw2.m.
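One way to keep only two Mx1 vectors is to fill the accumulated-distance matrix one test frame (column) at a time, reusing the previous column. The plain-Python sketch below does that and skips cells outside an Itakura-style parallelogram whose sides have slopes 1/2 and 2; that region is an assumption about the shape of dtw2.m's rhombus. Skipped cells stay infinite, so recordings whose lengths differ by more than about a factor of 2 get an infinite score.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def in_parallelogram(i, j, m, n):
    """Cell (i, j) lies within slopes 1/2..2 of both the start and the end corner."""
    return (2 * i >= j and i <= 2 * j and
            2 * (m - 1 - i) >= (n - 1 - j) and (m - 1 - i) <= 2 * (n - 1 - j))


def dtw_two_vectors(ref, test, dist=euclidean):
    """Constrained DTW storing only two columns of length M = len(ref)."""
    m, n = len(ref), len(test)
    inf = float("inf")
    prev = [inf] * m                 # accumulated distances for test frame j-1
    for j in range(n):
        cur = [inf] * m              # accumulated distances for test frame j
        for i in range(m):
            if not in_parallelogram(i, j, m, n):
                continue             # outside the allowed region
            cost = dist(ref[i], test[j])
            if i == 0 and j == 0:
                cur[i] = cost
                continue
            cur[i] = cost + min(prev[i],                         # from (i, j-1)
                                cur[i - 1] if i > 0 else inf,    # from (i-1, j)
                                prev[i - 1] if i > 0 else inf)   # from (i-1, j-1)
        prev = cur
    return prev[m - 1]
```

Memory drops from M·N to 2·M values, and the region limit skips most cells, which is what makes this form attractive on a DSP.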
>
> 5. Example
> Download dtw.zip
> This zip file includes:
>
> dtw.m
> dtw2.m
> vad.m
> testdtw.m
> Download wav.zip and extract it.
> This zip file includes:
>
> 0a.wav ~ 9a.wav: 10 Mandarin digits used as references
> 0b.wav ~ 9b.wav: 10 Mandarin digits used for testing
> Extract both zip files into the same folder. To use dtw2.m instead of dtw.m, edit testdtw.m accordingly.
> In MATLAB, run testdtw to get the results.
>
>
>     >> testdtw
>     Prepare reference model...
>     Calculate test model...
>     Matching...
>     Result...
>     Test model 1 is recognized as:1
>     Test model 2 is recognized as:2
>     Test model 3 is recognized as:3
>     Test model 4 is recognized as:4
>     Test model 5 is recognized as:5
>     Test model 6 is recognized as:6
>     Test model 7 is recognized as:7
>     Test model 8 is recognized as:8
>     Test model 9 is recognized as:9
>     Test model 10 is recognized as:10
>     >>
>    
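The matching step can be read as a nearest-template decision: compute the distance from each test model to all ten reference models and choose the smallest. A minimal Python sketch of that rule, assuming testdtw.m works this way; distance can be any template distance, such as the score from dtw.m or dtw2.m.

```python
def recognize(tests, refs, distance):
    """For each test model, the index of the reference with the smallest distance."""
    results = []
    for t in tests:
        scores = [distance(r, t) for r in refs]
        results.append(min(range(len(refs)), key=lambda k: scores[k]))
    return results


def format_results(results):
    """Report lines in the same form as the testdtw output (1-based numbering)."""
    return ["Test model %d is recognized as:%d" % (i + 1, r + 1) for i, r in enumerate(results)]
```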
> 6. Algorithm Description
> Will be uploaded soon.
>
> --------------------------------------------------------------------------------

