Offline speech recognition on Android with PocketSphinx: 99% accuracy for a small command vocabulary

zuoshu


Blog categories:

Android
Speech recognition

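Speech recognition is very popular lately, but almost everything out there uses online recognition. I spent some time on offline speech recognition, and for a small vocabulary the recognition rate turned out to be quite good, so I am writing it down here.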

First, this post rests on two premises: 1. offline speech recognition on the Android platform; 2. a small vocabulary.

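A small vocabulary here means a relatively fixed set of commands. The example in this post implements roughly 20 voice commands; anything outside that set cannot be recognized. The offline approach described here is therefore of limited scope, but it can be useful for fixed inputs, for example replacing simple fixed actions such as open, play, or restart with voice commands.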

Let's start with an example.

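1. Follow the method at http://leiwuluan.iteye.com/blog/1287305 to get the PocketSphinxDemo example running first. Once it runs, you will find the recognition rate is very low, under about 20%. Next, let's improve it.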

2. Write your own command set

Save it as command.txt

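On http://www.speech.cs.cmu.edu/tools/lmtool.html, click Browse and submit command.txt to generate the language model files online. Only the generated .lm file is needed here; name it test.lm. Download pocketsphinx-win32 from here and unzip it; under /model/lm/zh_cn there is a file named mandarin_notone.dic. Open it, search it for each word in command.txt, and use the matching entries to replace the corresponding content. After the replacement the content is as follows: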

Save it as test.dic
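The dictionary lookup described above can be scripted instead of done by hand. The Java sketch below is a hypothetical helper, not part of PocketSphinx or the demo project. It assumes that mandarin_notone.dic has one entry per line in the form `word phone phone ...`, that command.txt holds one word per line, and that both files are UTF-8. It copies the matching entries into test.dic and lists any command word that has no entry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: builds a small test.dic from a large master dictionary.
public class BuildCommandDict {

    // Returns "word pronunciation" lines for each command word found in the master
    // dictionary, in command order; words with no entry are appended to `missing`.
    static List<String> buildDict(List<String> masterLines, List<String> commands, List<String> missing) {
        // Assumed master format: first whitespace-separated token is the word,
        // the rest of the line is its phone sequence. Keep the first entry per word.
        Map<String, String> index = new HashMap<>();
        for (String line : masterLines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                index.putIfAbsent(parts[0], parts[1].trim());
            }
        }
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String cmd : commands) {
            String word = cmd.trim();
            // Skip blank lines and repeated commands.
            if (word.isEmpty() || !seen.add(word)) {
                continue;
            }
            String pron = index.get(word);
            if (pron == null) {
                missing.add(word);
            } else {
                out.add(word + " " + pron);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java BuildCommandDict <master.dic> <command.txt> <out.dic>");
            System.exit(1);
        }
        // Assumes both input files are UTF-8 encoded.
        List<String> master = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> commands = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        List<String> missing = new ArrayList<>();
        List<String> dict = buildDict(master, commands, missing);
        Files.write(Paths.get(args[2]), dict, StandardCharsets.UTF_8);
        for (String w : missing) {
            System.err.println("no entry in master dictionary: " + w);
        }
    }
}
```

Run it as `java BuildCommandDict mandarin_notone.dic command.txt test.dic`. A word reported as missing has no pronunciation available, so the decoder would not be able to use it as written.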

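3. Replace the language model files. Download data.zip from the attachments; after unzipping, the files are as follows (the data in the earlier attachment did not work and has been updated):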

Place them in the following locations:

/sdcard/Android/data/test/hmm/tdt_sc_8k
/sdcard/Android/data/test/lm/test.dic
/sdcard/Android/data/test/lm/test.lm

If you want to use different directories, change the following code in the RecognizerTask constructor, RecognizerTask.RecognizerTask():

  c.setString("-hmm", "/sdcard/Android/data/test/hmm/tdt_sc_8k");
  c.setString("-dict", "/sdcard/Android/data/test/lm/test.dic");
  c.setString("-lm", "/sdcard/Android/data/test/lm/test.lm");

The lm and dic files are the ones generated in step 2; tdt_sc_8k can also be downloaded from here.
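When the files are moved, the three paths have to stay consistent with each other and must actually exist on the SD card, otherwise the recognizer cannot load its models. The Java sketch below is a hypothetical helper (the class and method names are not from the demo project): it derives the `-hmm`, `-dict`, and `-lm` values from a single base directory using the layout above, and reports any path that is missing before the values are passed to `c.setString(...)`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps the three model paths in sync with one base directory.
public class ModelPaths {

    // Builds the option -> path map used by the c.setString(...) calls, same layout as above.
    static Map<String, String> options(String baseDir) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("-hmm", baseDir + "/hmm/tdt_sc_8k");
        opts.put("-dict", baseDir + "/lm/test.dic");
        opts.put("-lm", baseDir + "/lm/test.lm");
        return opts;
    }

    // Returns "option path" for every file or directory that does not exist.
    static List<String> missing(Map<String, String> opts) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : opts.entrySet()) {
            if (!new File(e.getValue()).exists()) {
                bad.add(e.getKey() + " " + e.getValue());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String base = args.length > 0 ? args[0] : "/sdcard/Android/data/test";
        Map<String, String> opts = options(base);
        List<String> bad = missing(opts);
        if (!bad.isEmpty()) {
            for (String b : bad) {
                System.err.println("missing: " + b);
            }
            return;
        }
        // Inside RecognizerTask each entry would become c.setString(key, value).
        for (Map.Entry<String, String> e : opts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Deriving all three values from one base directory means a relocation only changes one string, which avoids the case where the code points at one directory while the files were copied to another.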

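4. With the files in place, run the demo from step 1 again. Speak the commands from step 2 and the recognition rate is above 99%, but input outside the command set cannot be recognized.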

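5. The attachments contain the project files. Unzip data and place it on the SD card at the locations given in step 3. The words in the following dictionary can be recognized: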

data.zip (2.1 MB)

PocketSphinxAndroidDemo.rar (1.1 MB)


2012-03-26 11:17
Category: Mobile development

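Reply #18 纯洁的坏蛋 2012-11-04

Hi, could you help me? I got it running but it recognizes nothing. The log says the phone 'EH' cannot be recognized, so commands such as 左转 (turn left) and 右转 (turn right) were ignored.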

Reply #17 gcj55586 2012-10-25

Hi, what I see at the two URLs you gave differs from what you describe: command.txt is not submitted by clicking Browse, and pocketsphinx has no model/hmm/zh/tdt_sc_8k directory.

Reply #16 alen_feng 2012-08-16

As reply #10 said,

the content of test.dic is as follows:

<S>RIGHT</S> S R AY T S
<S>右转</S> EH S EH S
<S>向右转</S> EH S EH S
<S>向左转</S> EH S EH S
<S>左转</S> EH S EH S
<S>打开</S> EH S EH S
<S>搜索</S> EH S EH S
<S>播放</S> EH S EH S

Is this where it goes wrong?

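Reply #15 alen_feng 2012-08-16

Here is the pocketsphinx.log output after running it. Nothing at all was recognized; I said 向右转 (turn right), 123, and so on.

Could the OP please check whether the log shows an error?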

INFO: acmod.c(242): Parsed model-specific feature parameters from /sdcard/Android/data/test/hmm/tdt_sc_8k/feat.params
INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition: /sdcard/Android/data/test/hmm/tdt_sc_8k/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(330): Reading binary model definition: /sdcard/Android/data/test/hmm/tdt_sc_8k/mdef
INFO: bin_mdef.c(507): 70 CI-phone, 65021 CD-phone, 3 emitstate/phone, 210 CI-sen, 5210 Sen, 11271 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /sdcard/Android/data/test/hmm/tdt_sc_8k/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/test/hmm/tdt_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /sdcard/Android/data/test/hmm/tdt_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(908): Loading senones from dump file /sdcard/Android/data/test/hmm/tdt_sc_8k/sendump
INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: phone_loop_search.c(105): State beam -230231 Phone exit beam -115115 Insertion penalty 0
INFO: dict.c(306): Allocating 4111 * 20 bytes (80 KiB) for word entries
INFO: dict.c(321): Reading main dictionary: /sdcard/Android/data/test/lm/test.dic
ERROR: "dict.c", line 194: Line 1: Phone 'S' is mising in the acoustic model; word '<S>RIGHT</S>' ignored
ERROR: "dict.c", line 194: Line 2: Phone 'EH' is mising in the acoustic model; word '<S>右转</S>' ignored
ERROR: "dict.c", line 194: Line 3: Phone 'EH' is mising in the acoustic model; word '<S>向右转</S>' ignored
ERROR: "dict.c", line 194: Line 4: Phone 'EH' is mising in the acoustic model; word '<S>向左转</S>' ignored
ERROR: "dict.c", line 194: Line 5: Phone 'EH' is mising in the acoustic model; word '<S>左转</S>' ignored
ERROR: "dict.c", line 194: Line 6: Phone 'EH' is mising in the acoustic model; word '<S>打开</S>' ignored
ERROR: "dict.c", line 194: Line 7: Phone 'EH' is mising in the acoustic model; word '<S>搜索</S>' ignored
ERROR: "dict.c", line 194: Line 8: Phone 'EH' is mising in the acoustic model; word '<S>播放</S>' ignored
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(324): 0 words read
INFO: dict.c(330): Reading filler dictionary: /sdcard/Android/data/test/hmm/tdt_sc_8k/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(333): 7 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 70^3 * 2 bytes (669 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 59080 bytes (57 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 59080 bytes (57 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=10, 2=16, 3=8
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516):       10 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533):       16 = #bigrams created
INFO: ngram_model_arpa.c(534):        3 = #prob2 entries
INFO: ngram_model_arpa.c(542):        3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555):        8 = #trigrams created
INFO: ngram_model_arpa.c(556):        2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 0 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 8 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 8 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128
ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary
INFO: ngram_search_fwdtree.c(338): after: 0 root, 0 non-root channels, 7 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: pocketsphinx.c(299): zslog use ngs
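The log above shows the root cause: every pronunciation in test.dic uses phones (`S`, `EH`) that the tdt_sc_8k acoustic model does not contain, so all 8 words are dropped ("0 words read") and no language-model word has a pronunciation left. A quick offline check can catch this before the files are copied to the device. The Java sketch below is hypothetical; it assumes you supply the acoustic model's phone set yourself (the log reports 70 CI phones but does not list them), and it flags every dictionary line that uses a phone outside that set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical checker: finds dictionary entries whose phones the acoustic model lacks.
public class DictPhoneCheck {

    // Returns one message per dictionary line that uses an unknown phone
    // (only the first unknown phone of each word is reported).
    static List<String> check(List<String> dictLines, Set<String> phoneSet) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < dictLines.size(); i++) {
            String line = dictLines.get(i).trim();
            if (line.isEmpty()) {
                continue; // blank lines carry no entry
            }
            String[] tokens = line.split("\\s+");
            String word = tokens[0];
            if (tokens.length == 1) {
                problems.add("Line " + (i + 1) + ": word '" + word + "' has no pronunciation");
                continue;
            }
            for (int j = 1; j < tokens.length; j++) {
                if (!phoneSet.contains(tokens[j])) {
                    problems.add("Line " + (i + 1) + ": phone '" + tokens[j] + "' missing; word '" + word + "'");
                    break; // one message is enough: the word will be dropped either way
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Illustrative phone subset only; the real set must come from the acoustic model.
        Set<String> phones = new HashSet<>(Arrays.asList("d", "a", "k", "ai"));
        List<String> dict = Arrays.asList("<S>右转</S> EH S EH S", "打开 d a k ai");
        for (String p : check(dict, phones)) {
            System.out.println(p);
        }
    }
}
```

An empty result only means the phones exist; the words in test.dic must also match the words in test.lm, otherwise the decoder still reports that no language-model word has a pronunciation.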

Reply #14 porenasckx 2012-08-06

OP, the project won't run!

08-06 17:18:53.457: I/AndroidRuntime(4060): Heap size: -Xmx32m
08-06 17:18:54.167: W/ActivityManager(185): No content provider found for:
08-06 17:18:54.197: I/PackageHelper(3197): Size of container 3 MB 1244286 bytes
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): /system/bin/newfs_msdos: warning, /dev/block/dm-1 is not a character device
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): /system/bin/newfs_msdos: Skipping mount checks
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): Bogus heads from kernel - setting sane value
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): Bogus sectors from kernel - setting sane value
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): /dev/block/dm-1: 6248 sectors in 781 FAT32 clusters (4096 bytes/cluster)
08-06 17:18:54.287: I//system/bin/newfs_msdos(131): bps=512 spc=8 res=32 nft=2 sec=6300 mid=0xf0 spt=63 hds=64 hid=0 bspf=7 rdcl=2 infs=1 bkbs=2
08-06 17:18:54.317: I/Vold(131): Filesystem formatted OK
08-06 17:18:54.327: I/PackageHelper(3197): Created secure container smdl2tmp1 at /mnt/asec/smdl2tmp1
08-06 17:18:54.327: I/DefContainer(3197): Created container for smdl2tmp1 at path : /mnt/asec/smdl2tmp1
08-06 17:18:54.748: I/DefContainer(3197): Copied /data/local/tmp/edu.cmu.pocketsphinx.demo.PocketSphinxDemo.apk to /mnt/asec/smdl2tmp1/pkg.apk
08-06 17:18:54.768: I/DefContainer(3197): Finalized container smdl2tmp1
08-06 17:18:54.778: I/DefContainer(3197): Unmounting smdl2tmp1 at path /mnt/asec/smdl2tmp1
08-06 17:18:55.088: W/ActivityManager(185): No content provider found for:
08-06 17:18:55.909: W/PackageManager(185): Mounting container edu.cmu.pocketsphinx.demo-2
08-06 17:18:56.390: I/PackageManager(185): Succesfully renamed smdl2tmp1 to edu.cmu.pocketsphinx.demo-2 at new path: /mnt/asec/edu.cmu.pocketsphinx.demo-2
08-06 17:18:56.390: I/PackageHelper(185): Forcibly destroying container edu.cmu.pocketsphinx.demo-2
08-06 17:18:56.810: I/AndroidRuntime(4060): NOTE: attach of thread 'Binder Thread #3' failed

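Reply #13 zuoshu 2012-06-25

272426068 wrote:

Without installing the NDK and downloading https://nodeload.github.com/cjac/cmusphinx/zipball/trunk,
is it impossible to run the demo?

Right, you need to download the NDK and build the project to produce the APK.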

Reply #12 272426068 2012-06-04

Without installing the NDK and downloading https://nodeload.github.com/cjac/cmusphinx/zipball/trunk,
is it impossible to run the demo?

Reply #11 rtygbwwwerr 2012-05-31

I tried your method and dictionary file, but it didn't work; not a single word was recognized.

Reply #10 rtygbwwwerr 2012-05-31

In 1929.dic every Chinese word maps to the same value: EH S EH S. Isn't that wrong?

Reply #9 rtygbwwwerr 2012-05-31

OP, have you actually used this dictionary file yourself?

Reply #8 ganfei1983 2012-05-23

Hi zuoshu, here is my QQ: 704977332. Could we talk about this? Thanks a lot.

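Reply #7 cs3230524 2012-04-21

zuoshu wrote:

cs3230524 wrote:

OP, the method you describe simply cannot generate a Chinese acoustic model; you have to build it yourself with CMUCLMTK. I don't see how you got it to work.

Right, it cannot generate the acoustic files; the acoustic model is still the default one. What changes are the language model and the dictionary, which narrow the search space and thereby raise the recognition rate.

A language model generated online with lmtool cannot be used for Chinese.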

log file：
WARN> cannot access hand-tuned dictionary file: ./8563.hdict // 8563.hdict
I think this is a non-word: 不倒翁
pronounce: verbosity is 1
I think this is a non-word: 东西
I think this is a non-word: 苹果
I think this is a non-word: 显示器
I think this is a non-word: 一年

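Reply #6 zuoshu 2012-04-21

cs3230524 wrote:

OP, the method you describe simply cannot generate a Chinese acoustic model; you have to build it yourself with CMUCLMTK. I don't see how you got it to work.

Right, it cannot generate the acoustic files; the acoustic model is still the default one. What changes are the language model and the dictionary, which narrow the search space and thereby raise the recognition rate.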

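Reply #5 cs3230524 2012-04-17

OP, the method you describe simply cannot generate a Chinese acoustic model; you have to build it yourself with CMUCLMTK. I don't see how you got it to work.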

Reply #4 cs3230524 2012-04-17

The path in your project is /sdcard/Android/data/zuoshu/.., no wonder I kept getting an InputDispatcher exception.

Reply #3 cs3230524 2012-04-17

What is inside tdt_sc_8k?

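Reply #2 zuoshu 2012-04-16

zhoudong123 wrote:

Why am I getting an error?

Could you post the log? The prerequisite for running this is getting PocketSphinxDemo working first; see step 1.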

Reply #1 zhoudong123 2012-04-16

Why am I getting an error?
