Decoding CAPTCHAs
Search terms: extract captcha image, rails ocr, ruby ocr, break google captcha
OCR (Optical Character Recognition) is pretty accurate these days and can easily read printed text.
References:
http://stackoverflow.com/search?q=rails+ocr
http://www.wausita.com/captcha/
-----------------------------------------------------------
1. tesseract-x.xx.tar.gz contains all the source code.
2. tesseract-2.xx.<lang>.tar.gz contains the Tesseract 2 language data files for <lang>. You need at least one of these or Tesseract 2 will not work.
3. <lang>.traineddata.gz contains the Tesseract 3 language data file for <lang>. You need at least one of these or Tesseract 3 will not work.
4. Note that tesseract-2.04.tar.gz unpacks to the tesseract-2.04 directory.
tesseract-2.01.<lang>.tar.gz unpacks to the tessdata directory, which belongs inside your tesseract-2.04 directory. It is therefore best to download the language packs
into your tesseract-2.04 directory, so you can use "unpack here" or equivalent.
You can unpack as many of the language packs as you care to, as they all
contain different files. Note that if you are using make install you should
unpack your language data to your source tree before you run make install.
If you unpack them as root to the destination directory of make install,
then the user ids and access permissions might be messed up.
If they are not already installed, you need the following libraries (Ubuntu):
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
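sudo apt-get install zlibg-dev
E: Unable to locate package zlibg-dev
=> The package name is mistyped, so there is no need to download the source. The correct package is zlib1g-dev:
sudo apt-get install zlib1g-dev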
You also need to install Leptonica. There is an apt-get package (libleptonica-dev in current Ubuntu releases), or the sources are at http://www.leptonica.org/. To build from source, download http://www.leptonica.org/source/leptonlib-1.67.tar.gz and unpack it:
tar zxvf leptonlib-1.67.tar.gz
cd leptonlib-1.67
The instructions in the Leptonica README are clear, but basically it is the usual
./configure
make
sudo make install
sudo ldconfig
Now back to Tesseract. Download the source from svn:
svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only
or get the tesseract-3.00.tar.gz package from the download page:
http://code.google.com/p/tesseract-ocr/downloads/list
The same build process as usual applies:
./runautoconf
./configure
make
sudo make install
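To define environment variables such as TESSDATA_PREFIX, edit /etc/profile (system-wide) or ~/.bashrc (current user):
sudo vi /etc/profile
vi ~/.bashrc
To decompress a .gz file:
gunzip FileName.gz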
1. Download the language data file (e.g. 'wget http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz')
2. Decompress it ('gzip -d eng.traineddata.gz')
3. Move it to the installation's tessdata directory (e.g. 'mv eng.traineddata $TESSDATA_PREFIX' if TESSDATA_PREFIX is defined)
You may still get an error when trying to run tesseract:
$ tesseract foo.png bar
tesseract: error while loading shared libraries: libtesseract_api.so.3: cannot open shared object file: No such file or directory
You need to update the cache for the runtime linker. The following should get you up and running:
$ sudo ldconfig
--------------------------------------------------
Copy eng.traineddata to /usr/local/share/tessdata:
$ pwd
/usr/local/share/tessdata
$ ls
configs eng.traineddata tessconfigs
-------------------------------------------------
Search terms: tesseract digit only, improve tesseract digits accuracy
Use tesseract to get plain ASCII text out of the bitmap:
curl 'http://www.stc.gov.cn/search/image_code.asp?rnd=0.7641146600113322' > /home/simon/Desktop/weizh/ca.jpg
tesseract ca.bmp outputbase -l eng
more outputbase.txt
tesseract ca.bmp outputbase nobatch digits
more outputbase.txt
Only JPG works here:
curl 'http://www.stc.gov.cn/search/image_code.asp?rnd=0.7641146600111234' > ca.jpg
tesseract ca.jpg outputbase nobatch digits
cat outputbase.txt
Reloading /etc/profile
$ source ~/.profile
$ source /etc/profile
Settings in .profile override those in /etc/profile. You can also use .bash_profile in your home directory to customize your bash shell's profile.
Basically, if you need to load shell variables from any file, just run the .
(dot) command, followed by a space and the path to the file. An absolute path
is the safest choice: a bare file name without a slash is looked up in PATH
first. Be careful which file you load variables from, because you might
overwrite some important environment variables and your system could become
unstable.
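A minimal sketch of the dot command in action. The settings file and the value written to it are placeholders chosen for illustration, not settings taken from the setup described here.

```shell
# Write a throwaway settings file (hypothetical content).
settings_file=$(mktemp)
echo 'export TESSDATA_PREFIX=/usr/local/share/' > "$settings_file"

# Load it into the current shell with the dot command.
# mktemp returns an absolute path, so no PATH lookup is involved.
. "$settings_file"

echo "TESSDATA_PREFIX is now: $TESSDATA_PREFIX"
rm -f "$settings_file"
```

Because the file is read by the current shell rather than a child process, the variable is still set after the command returns.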
$ tesseract wenzhou.jpeg outputbase -l eng
Error openning data file /usr/local/sharetessdata/eng.traineddata
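=> Workaround: copy eng.traineddata to /usr/local/sharetessdata, the directory tesseract is actually looking in (note the missing slash between "share" and "tessdata").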
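The missing slash in /usr/local/sharetessdata suggests that TESSDATA_PREFIX was set to /usr/local/share without a trailing slash, and that this Tesseract version appends "tessdata/" to the prefix verbatim. That is an inference from the error message, not something confirmed here. Under that assumption, a small helper (hypothetical name tessdata_file) prints the path one would expect Tesseract to open once the prefix is normalized:

```shell
# Print the language data path for a given prefix and language.
# Assumption: Tesseract looks in "<prefix>/tessdata/<lang>.traineddata".
tessdata_file() {
    prefix=$1
    lang=$2
    # Strip any trailing slashes, then add exactly one back below.
    while [ "${prefix%/}" != "$prefix" ]; do
        prefix=${prefix%/}
    done
    printf '%s/tessdata/%s.traineddata\n' "$prefix" "$lang"
}

tessdata_file /usr/local/share eng
tessdata_file /usr/local/share/ chi_sim
```

If the inference holds, setting TESSDATA_PREFIX=/usr/local/share/ (with the trailing slash) should make the extra copy into /usr/local/sharetessdata unnecessary.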
cd /home/simon/Desktop/weizh
curl 'http://117.36.53.122:9081/wfcx/servlet/ValidateCodeServlet?t=1304472587796' > xian.png
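tesseract xian.png out /usr/local/share/tessdata/tessconfigs/nobatch /usr/local/share/tessdata/configs/digits
Submitting the recognized code in a separate request, without the session cookie, apparently returns an error page (验证码错误 means "verification code incorrect"):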
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<script>
alert("验证码错误!");
window.close();
</script>
</head>
</html>
The code is tied to the session, so save the cookie when fetching the image, run OCR, and send the same cookie back with the query:
curl --cookie-jar newcookies.txt 'http://117.36.53.122:9081/wfcx/servlet/ValidateCodeServlet?t=1304494360513' > xian.png
tesseract xian.png out /usr/local/share/tessdata/tessconfigs/nobatch /usr/local/share/tessdata/configs/digits
curl --cookie newcookies.txt 'http://117.36.53.122:9081/wfcx/query.do?actiontype=vioSurveil&vcode=2148&hpzl=02&hphm=AUL695&tj=CLSBDH&tj_val=LFV2A11GX93178557'
-----------------------------------
/usr/local/sharetessdata:
eng.traineddata
/usr/local/share/tessdata:
chi_sim.traineddata
configs
eng.traineddata
tessconfigs
-----------------------------------
$ sudo apt-get install imagemagick
$ dpkg -l |grep imagemagick
imagemagick
imagemagick-doc
$ convert
$ whereis convert
$ which convert
$ convert -compress none -depth 8 -alpha off zhejiang.gif zhejiang.tif
Enlarging the image can improve OCR accuracy.
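A minimal sketch of enlarging an image before OCR, reusing the convert options from the commands in these notes. The helper name prep_for_ocr, the DRY_RUN switch, and the default scale of 300% are assumptions for illustration; the best scale depends on the image.

```shell
# Upscale and convert an image to an uncompressed grayscale TIFF for OCR.
# Usage: prep_for_ocr INPUT OUTPUT [SCALE_PERCENT]
# Set DRY_RUN=1 to print the convert command instead of running it.
prep_for_ocr() {
    src=$1
    dst=$2
    scale=${3:-300}
    # Build the ImageMagick command as the function's argument list.
    set -- convert "$src" -resize "${scale}%" -colorspace Gray \
        -depth 8 -alpha off -compress none "$dst"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the command that would be run for the GIF used above.
DRY_RUN=1 prep_for_ocr zhejiang.gif zhejiang.tif
```

Without DRY_RUN the function runs convert directly, and the resulting TIFF can be passed to tesseract as in the earlier examples.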
I believe the real challenge in applying OCR to plate recognition is
that plate images are "too dirty" compared to paper documents.
There are frames, skews, uneven shadows, etc. You have to do your own
work to split the plate into separate characters and feed them to the
OCR engine. I don't think tesseract itself can handle this automatically
given the raw image. But I believe it will do pretty well once you have
the binarized separate characters. Basically, plate recognition is more
an image processing problem than an OCR problem.
You can use the grammar of valid results as a post-processing step to make corrections.
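For a numeric verification code, the "grammar" can be as simple as "exactly N digits". A minimal sketch of such a post-processing step follows; the helper name clean_code and the default length of 4 are assumptions for illustration (the vcode value in the query earlier happens to have 4 digits).

```shell
# Keep only digits from OCR output and check the expected length.
# Usage: clean_code [EXPECTED_LENGTH] < ocr_output.txt
# Prints the cleaned code and returns 0 on success, returns 1 otherwise.
clean_code() {
    want=${1:-4}
    # Drop everything that is not a digit (spaces, letters, newlines).
    code=$(tr -cd '0-9')
    if [ "${#code}" -eq "$want" ]; then
        printf '%s\n' "$code"
    else
        return 1
    fi
}

# Stray characters are removed from an otherwise good result.
printf '2 1a4 8\n' | clean_code 4
# A result of the wrong length is rejected rather than submitted.
printf '21\n' | clean_code 4 || echo "rejected: fetch a new image and retry"
```

With the file names used earlier, the call would be `clean_code 4 < outputbase.txt`.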
To convert the PDF I used the ImageMagick convert application. Below is the set of commands that I use (tesseract appends .txt to the output base name, so the result lands in out.txt):
convert -density 288 src.pdf -colorspace Gray -depth 8 -alpha off tmp.tif
tesseract tmp.tif out
how to eliminate noise