zuoshu

Offline speech recognition on Android with PocketSphinx: 99% recognition rate on a small command set

 

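Speech recognition has been popular lately, but nearly everything uses online recognition. I looked into offline recognition and found that the recognition rate for a small set of phrases is quite good, so I am recording the process here.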

This article rests on two premises: 1. offline speech recognition on the Android platform; 2. a small vocabulary.

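A small vocabulary means a relatively fixed set of commands. The example in this article implements about 20 voice commands; anything outside that set cannot be recognized. The offline approach described here is therefore limited in scope, but it can be useful for fixed inputs, for example replacing simple commands such as open, play, and reboot with voice commands.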

Let's start with an example.

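1. Follow the method at http://leiwuluan.iteye.com/blog/1287305 to get the PocketSphinxDemo example running first. Once it runs, you will find that the recognition rate is very low, under about 20%. The steps below improve it.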

2. Write your own command set

<s>百度</s>
<s>谷歌</s>
<s>音乐</s>
<s>抬头</s>
<s>低头</s>

Save this as command.txt.
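
As a minimal sketch of how command.txt could be produced from a list of commands rather than typed by hand: the class and method names below (`CommandFileWriter`, `toCorpusLines`, `write`) are hypothetical and not part of PocketSphinx or the demo project. It only assumes the format shown above, one command per line wrapped in `<s>` and `</s>`, written as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PocketSphinx or the demo project.
final class CommandFileWriter {

    /** Wraps each non-blank command in <s>...</s>, one sentence per line. */
    static List<String> toCorpusLines(List<String> commands) {
        List<String> lines = new ArrayList<>();
        for (String command : commands) {
            String trimmed = command.trim();
            if (!trimmed.isEmpty()) {
                lines.add("<s>" + trimmed + "</s>");
            }
        }
        return lines;
    }

    /** Writes the corpus lines to the target file as UTF-8. */
    static void write(Path target, List<String> commands) throws IOException {
        Files.write(target, toCorpusLines(commands), StandardCharsets.UTF_8);
    }
}
```

Calling `CommandFileWriter.write(Paths.get("command.txt"), commands)` with the five words listed above would give a file in the same shape as the hand-written one.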

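On http://www.speech.cs.cmu.edu/tools/lmtool.html, click Browse and submit command.txt to generate the language model files online. Only the generated lm file is needed here; name it test.lm. Download pocketsphinx-win32 (linked in the original post) and unpack it. Under /model/lm/zh_cn there is a file called mandarin_notone.dic. Open it, search for each word from command.txt, and use what you find to replace the corresponding dictionary entries. The content after replacement is as follows (the listing from the original post is not preserved here):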

Save it as test.dic.
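
Before copying the files to the device, it may help to confirm that every command has a dictionary entry. The sketch below is one way to do that. It assumes each dictionary line starts with the word followed by whitespace and its phones, and that each command is a single token with no spaces; the class `DictCoverage` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PocketSphinx or the demo project.
final class DictCoverage {

    /** Strips the <s>...</s> wrapper used in command.txt, if present. */
    static String stripSentenceTags(String line) {
        return line.replaceAll("(?i)</?s>", "").trim();
    }

    /** Collects the first whitespace-separated token of every non-blank dictionary line. */
    static Set<String> dictionaryWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                words.add(trimmed.split("\\s+")[0]);
            }
        }
        return words;
    }

    /** Returns the commands that have no dictionary entry, in input order. */
    static List<String> missingCommands(List<String> commandLines, List<String> dictLines) {
        Set<String> known = dictionaryWords(dictLines);
        List<String> missing = new ArrayList<>();
        for (String line : commandLines) {
            String word = stripSentenceTags(line);
            if (!word.isEmpty() && !known.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

An empty result from `missingCommands` only shows that each command appears in the dictionary. It does not check that the phones in each entry exist in the acoustic model.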

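3. Replace the language model files. Download data.zip from the attachments and unpack it; it contains the files listed below (the data archive previously attached did not work and has been updated).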

Place them in the following locations:

/sdcard/Android/data/test/hmm/tdt_sc_8k
/sdcard/Android/data/test/lm/test.dic
/sdcard/Android/data/test/lm/test.lm

To use a different directory, change the following code in RecognizerTask.RecognizerTask() accordingly:

  c.setString("-hmm", "/sdcard/Android/data/test/hmm/tdt_sc_8k");
  c.setString("-dict", "/sdcard/Android/data/test/lm/test.dic");
  c.setString("-lm", "/sdcard/Android/data/test/lm/test.lm");

The lm and dic files are the ones generated in step 2. tdt_sc_8k can also be downloaded separately (linked in the original post).
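
If the directory is likely to change, the three paths can be derived from a single base directory and checked before the recognizer starts. This is a sketch only: `ModelPaths`, `forBaseDir`, and `missingFiles` are hypothetical names, and the sub-paths simply mirror the layout listed above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PocketSphinx or the demo project.
final class ModelPaths {

    /** Builds the -hmm, -dict and -lm option values under one base directory. */
    static Map<String, String> forBaseDir(String baseDir) {
        // Drop a single trailing slash so the joined paths have no "//".
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        Map<String, String> options = new LinkedHashMap<>();
        options.put("-hmm", base + "/hmm/tdt_sc_8k");
        options.put("-dict", base + "/lm/test.dic");
        options.put("-lm", base + "/lm/test.lm");
        return options;
    }

    /** Returns the option paths that do not exist on the file system. */
    static List<String> missingFiles(Map<String, String> options) {
        List<String> missing = new ArrayList<>();
        for (String path : options.values()) {
            if (!new File(path).exists()) {
                missing.add(path);
            }
        }
        return missing;
    }
}
```

In RecognizerTask, each entry of the map could then be passed to `c.setString(key, value)` in place of the three hard-coded calls, after confirming that `missingFiles` returns an empty list.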

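4. With the files in place, run the demo from step 1 again. Speaking the commands from step 2 gives a recognition rate above 99%, but input outside the command set cannot be recognized.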



 

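5. The attachment contains the project files. Unpack data and copy it to the SD card locations listed in step 3. The words in the following dictionary can be recognized (the listing from the original post is not preserved here):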

 

 

Comments
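#18 纯洁的坏蛋 2012-11-04
Hi, I need some help. It runs for me but recognizes nothing. The log says the phone 'EH' cannot be recognized, so commands such as 左转 (turn left) and 右转 (turn right) are ignored.
#17 gcj55586 2012-10-25
Hi, the two URLs you gave show content different from what you describe. Submitting command.txt does not involve clicking Browse, and there is no model/hmm/zh/tdt_sc_8k directory under pocketsphinx.
#16 alen_feng 2012-08-16
As comment #10 said,

the content of test.dic is as follows: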

<S>RIGHT</S> S R AY T S
<S>右转</S> EH S EH S
<S>向右转</S> EH S EH S
<S>向左转</S> EH S EH S
<S>左转</S> EH S EH S
<S>打开</S> EH S EH S
<S>搜索</S> EH S EH S
<S>播放</S> EH S EH S

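Is this where the problem is?
#15 alen_feng 2012-08-16
Below is the pocketsphinx.log output after running. Nothing was recognized. I spoke 向右转 (turn right), 123, and so on.

Could the original poster check whether the log shows any errors?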


INFO: acmod.c(242): Parsed model-specific feature parameters from /sdcard/Android/data/test/hmm/tdt_sc_8k/feat.params
INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition: /sdcard/Android/data/test/hmm/tdt_sc_8k/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(330): Reading binary model definition: /sdcard/Android/data/test/hmm/tdt_sc_8k/mdef
INFO: bin_mdef.c(507): 70 CI-phone, 65021 CD-phone, 3 emitstate/phone, 210 CI-sen, 5210 Sen, 11271 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /sdcard/Android/data/test/hmm/tdt_sc_8k/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/test/hmm/tdt_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/test/hmm/tdt_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(908): Loading senones from dump file /sdcard/Android/data/test/hmm/tdt_sc_8k/sendump
INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: phone_loop_search.c(105): State beam -230231 Phone exit beam -115115 Insertion penalty 0
INFO: dict.c(306): Allocating 4111 * 20 bytes (80 KiB) for word entries
INFO: dict.c(321): Reading main dictionary: /sdcard/Android/data/test/lm/test.dic
ERROR: "dict.c", line 194: Line 1: Phone 'S' is mising in the acoustic model; word '<S>RIGHT</S>' ignored
ERROR: "dict.c", line 194: Line 2: Phone 'EH' is mising in the acoustic model; word '<S>右转</S>' ignored
ERROR: "dict.c", line 194: Line 3: Phone 'EH' is mising in the acoustic model; word '<S>向右转</S>' ignored
ERROR: "dict.c", line 194: Line 4: Phone 'EH' is mising in the acoustic model; word '<S>向左转</S>' ignored
ERROR: "dict.c", line 194: Line 5: Phone 'EH' is mising in the acoustic model; word '<S>左转</S>' ignored
ERROR: "dict.c", line 194: Line 6: Phone 'EH' is mising in the acoustic model; word '<S>打开</S>' ignored
ERROR: "dict.c", line 194: Line 7: Phone 'EH' is mising in the acoustic model; word '<S>搜索</S>' ignored
ERROR: "dict.c", line 194: Line 8: Phone 'EH' is mising in the acoustic model; word '<S>播放</S>' ignored
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(324): 0 words read
INFO: dict.c(330): Reading filler dictionary: /sdcard/Android/data/test/hmm/tdt_sc_8k/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(333): 7 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 70^3 * 2 bytes (669 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 59080 bytes (57 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 59080 bytes (57 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=10, 2=16, 3=8
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516):       10 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533):       16 = #bigrams created
INFO: ngram_model_arpa.c(534):        3 = #prob2 entries
INFO: ngram_model_arpa.c(542):        3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555):        8 = #trigrams created
INFO: ngram_model_arpa.c(556):        2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 0 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 8 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 8 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128
ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary
INFO: ngram_search_fwdtree.c(338): after: 0 root, 0 non-root channels, 7 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: pocketsphinx.c(299): zslog use ngs
#14 porenasckx 2012-08-06
The project won't run!

08-06 17:18:53.457: I/AndroidRuntime(4060): Heap size: -Xmx32m
08-06 17:18:54.167: W/ActivityManager(185): No content provider found for: 
08-06 17:18:54.197: I/PackageHelper(3197): Size of container 3 MB 1244286 bytes
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): /system/bin/newfs_msdos: warning, /dev/block/dm-1 is not a character device
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): /system/bin/newfs_msdos: Skipping mount checks
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): Bogus heads from kernel - setting sane value
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): Bogus sectors from kernel - setting sane value
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): /dev/block/dm-1: 6248 sectors in 781 FAT32 clusters (4096 bytes/cluster)
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): bps=512 spc=8 res=32 nft=2 sec=6300 mid=0xf0 spt=63 hds=64 hid=0 bspf=7 rdcl=2 infs=1 bkbs=2
08-06 17:18:54.317: I/Vold(131): Filesystem formatted OK
08-06 17:18:54.327: I/PackageHelper(3197): Created secure container smdl2tmp1 at /mnt/asec/smdl2tmp1
08-06 17:18:54.327: I/DefContainer(3197): Created container for smdl2tmp1 at path : /mnt/asec/smdl2tmp1
08-06 17:18:54.748: I/DefContainer(3197): Copied /data/local/tmp/edu.cmu.pocketsphinx.demo.PocketSphinxDemo.apk to /mnt/asec/smdl2tmp1/pkg.apk
08-06 17:18:54.768: I/DefContainer(3197): Finalized container smdl2tmp1
08-06 17:18:54.778: I/DefContainer(3197): Unmounting smdl2tmp1 at path /mnt/asec/smdl2tmp1
08-06 17:18:55.088: W/ActivityManager(185): No content provider found for: 
08-06 17:18:55.909: W/PackageManager(185): Mounting container edu.cmu.pocketsphinx.demo-2
08-06 17:18:56.390: I/PackageManager(185): Succesfully renamed smdl2tmp1 to edu.cmu.pocketsphinx.demo-2 at new path: /mnt/asec/edu.cmu.pocketsphinx.demo-2
08-06 17:18:56.390: I/PackageHelper(185): Forcibly destroying container edu.cmu.pocketsphinx.demo-2
08-06 17:18:56.810: I/AndroidRuntime(4060): NOTE: attach of thread 'Binder Thread #3' failed
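#13 zuoshu 2012-06-25
272426068 wrote:
If I have not installed the NDK and downloaded https://nodeload.github.com/cjac/cmusphinx/zipball/trunk
does that mean the demo cannot run?

Yes. You need to download the NDK and build the project to generate the apk.
#12 272426068 2012-06-04
If I have not installed the NDK and downloaded https://nodeload.github.com/cjac/cmusphinx/zipball/trunk
does that mean the demo cannot run?
#11 rtygbwwwerr 2012-05-31
I tried the author's method and dictionary file and it did not work. Not a single word was recognized.
#10 rtygbwwwerr 2012-05-31
In the file 1929.dic, the value for every Chinese word is: EH S EH S. Is something wrong there?
#9 rtygbwwwerr 2012-05-31
Have you actually used this dictionary file yourself?
#8 ganfei1983 2012-05-23
Hi zuoshu, here is my QQ: 704977332. I would like to discuss this with you, if that is all right. Thanks.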
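#7 cs3230524 2012-04-21
zuoshu wrote:
cs3230524 wrote:
The method you describe cannot be used to generate a Chinese acoustic model at all; you have to build it yourself with CMUCLMTK. I don't know how you got it to work.

Correct, it cannot generate an acoustic model. The acoustic model is still the default one. What changes is the language model and the dictionary, which narrows the search space and so raises the recognition rate.


A language model generated online with lmtool cannot be used for Chinese.

log file: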
WARN> cannot access hand-tuned dictionary file: ./8563.hdict // 8563.hdict
I think this is a non-word: 不倒翁
pronounce: verbosity is 1
I think this is a non-word: 东西
I think this is a non-word: 苹果
I think this is a non-word: 显示器
I think this is a non-word: 一年

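#6 zuoshu 2012-04-21
cs3230524 wrote:
The method you describe cannot be used to generate a Chinese acoustic model at all; you have to build it yourself with CMUCLMTK. I don't know how you got it to work.

Correct, it cannot generate an acoustic model. The acoustic model is still the default one. What changes is the language model and the dictionary, which narrows the search space and so raises the recognition rate.
#5 cs3230524 2012-04-17
The method you describe cannot be used to generate a Chinese acoustic model at all; you have to build it yourself with CMUCLMTK. I don't know how you got it to work.
#4 cs3230524 2012-04-17
The path in your project is /sdcard/Android/data/zuoshu/.. No wonder I kept getting an InputDispatcher exception.
#3 cs3230524 2012-04-17
What is inside tdt_sc_8k?
#2 zuoshu 2012-04-16
zhoudong123 wrote:
Why does it report an error?

Can you post the log? Running it requires getting PocketSphinxDemo working first; see step 1.
#1 zhoudong123 2012-04-16
Why does it report an error?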
