dengminhui

Cracking checkCode (CAPTCHA) images with Tesseract


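Tesseract is a powerful OCR (optical character recognition) tool. It has a good chance of extracting the characters in an image, and it works reasonably well on CAPTCHAs. By launching it as an external command-line process from our own program, we can recover the text of the CAPTCHA image we want.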

 

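import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Recognizes CAPTCHA text by converting the image to an uncompressed TIFF and
 * running the external Tesseract executable on it.
 * Dependencies: commons-io, commons-lang, commons-logging, and an ImageIO TIFF
 * writer (JAI Image I/O on Java 8 and earlier; built into the JDK since Java 9).
 */
public class OCRUtil {

	private static final Log logger = LogFactory.getLog(OCRUtil.class);
	private static final String LANG_OPTION = "-l";
	// Joins the lines of Tesseract's output (File.separator, used originally, is a path separator)
	private static final String EOL = System.getProperty("line.separator");
	private static final String IMAGE_FORMAT = "jpg";
	// Location of the Tesseract executable. The original called an unpublished helper,
	// SystemUtil.getUserDir(); as the author suggests, point this at your local install.
	// The "tesseract.path" system property is a hypothetical override added here.
	private static final String TESSERACT_PATH = System.getProperty("tesseract.path",
			System.getProperty("user.dir") + File.separator + "tesseract"
					+ File.separator + "tesseract.exe");

	/** Saves the CAPTCHA stream to a temp file and returns only its letters and digits. */
	public static String recognizeValidation(InputStream in) throws Exception {
		File tmpFile = File.createTempFile("img", "." + IMAGE_FORMAT);
		try {
			try (OutputStream out = new FileOutputStream(tmpFile)) {
				IOUtils.copy(in, out);
			}
			// Assumes the downloaded image really is a JPEG
			return format(recognizeText(tmpFile, IMAGE_FORMAT));
		} finally {
			tmpFile.delete();
		}
	}

	/** Keeps letters and digits only; OCR output often contains stray spaces and punctuation. */
	private static String format(String str) {
		if (StringUtils.isBlank(str)) {
			return null;
		}
		StringBuilder sb = new StringBuilder(str.length());
		for (int i = 0; i < str.length(); i++) {
			char c = str.charAt(i);
			if (Character.isLetterOrDigit(c)) {
				sb.append(c);
			}
		}
		return sb.toString();
	}

	/**
	 * Converts the image to TIFF, runs "tesseract <image>.tif output -l eng" in the
	 * image's directory and returns the text Tesseract writes to output.txt.
	 */
	public static String recognizeText(File imageFile, String imageFormat)
			throws Exception {
		File workDir = imageFile.getAbsoluteFile().getParentFile();
		File tempImage = createImage(imageFile, imageFormat);
		File outputFile = new File(workDir, "output");
		File textFile = new File(outputFile.getAbsolutePath() + ".txt");

		List<String> cmd = new ArrayList<String>();
		cmd.add(TESSERACT_PATH);
		cmd.add(tempImage.getName()); // input TIFF, relative to workDir
		cmd.add(outputFile.getName()); // output base name; Tesseract appends ".txt"
		cmd.add(LANG_OPTION);
		cmd.add("eng");

		ProcessBuilder pb = new ProcessBuilder(cmd);
		pb.directory(workDir);
		pb.redirectErrorStream(true);

		String processOutput;
		int exitCode;
		try {
			Process process = pb.start();
			// Drain the merged stdout/stderr before waiting, so a full pipe cannot block Tesseract
			processOutput = IOUtils.toString(process.getInputStream(), "UTF-8");
			exitCode = process.waitFor();
		} finally {
			// delete temp working file
			tempImage.delete();
		}

		if (exitCode != 0) {
			String msg;
			// Exit-code meanings as used in the original post; they may differ between Tesseract versions
			switch (exitCode) {
			case 1:
				msg = "Errors accessing files. There may be spaces in your image's filename.";
				break;
			case 29:
				msg = "Cannot recognize the image or its selected region.";
				break;
			case 31:
				msg = "Unsupported image format.";
				break;
			default:
				msg = "Errors occurred.";
			}
			throw new RuntimeException(msg + " (exit code " + exitCode + "): " + processOutput);
		}

		StringBuilder strB = new StringBuilder();
		try (BufferedReader in = new BufferedReader(new InputStreamReader(
				new FileInputStream(textFile), "UTF-8"))) {
			String line;
			while ((line = in.readLine()) != null) {
				strB.append(line).append(EOL);
			}
		} finally {
			textFile.delete();
		}
		logger.debug("OCR result: " + strB);
		return strB.toString();
	}

	/** Re-encodes the image as an uncompressed TIFF in the same directory as the original. */
	public static File createImage(File imageFile, String imageFormat)
			throws IOException {
		Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(imageFormat);
		if (!readers.hasNext()) {
			throw new IOException("No ImageIO reader for format: " + imageFormat);
		}
		Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
		if (!writers.hasNext()) {
			throw new IOException("No ImageIO TIFF writer; add JAI Image I/O (jai_imageio.jar) on Java 8 or earlier");
		}
		ImageReader reader = readers.next();
		ImageWriter writer = writers.next();

		// createImageInputStream returns null (instead of throwing) when the file cannot be
		// opened; passing that null to the reader leads to an NPE in getStreamMetadata()
		ImageInputStream iis = ImageIO.createImageInputStream(imageFile);
		if (iis == null) {
			throw new IOException("Cannot open image file: " + imageFile.getAbsolutePath());
		}
		File tempFile = tempImageFile(imageFile);
		try {
			reader.setInput(iis);
			BufferedImage bi = reader.read(0);
			// bi = new ImageFilter(bi).changeGrey();

			// Disable compression through the standard ImageWriteParam API, so the
			// JAI-specific TIFFImageWriteParam class is not needed
			ImageWriteParam param = writer.getDefaultWriteParam();
			param.setCompressionMode(ImageWriteParam.MODE_DISABLED);

			// The file-backed output stream does not truncate, so remove any leftover file first
			tempFile.delete();
			ImageOutputStream ios = ImageIO.createImageOutputStream(tempFile);
			if (ios == null) {
				throw new IOException("Cannot create " + tempFile.getAbsolutePath());
			}
			try {
				writer.setOutput(ios);
				// JPEG metadata means nothing to the TIFF writer, so none is passed on
				writer.write(null, new IIOImage(bi, null, null), param);
			} finally {
				ios.close();
			}
		} finally {
			iis.close();
			writer.dispose();
			reader.dispose();
		}
		return tempFile;
	}

	/** Temp TIFF name next to the input, e.g. img123.jpg -> img1230.tif. */
	private static File tempImageFile(File imageFile) {
		String name = imageFile.getName();
		int dot = name.lastIndexOf('.');
		String base = (dot >= 0) ? name.substring(0, dot) : name;
		return new File(imageFile.getAbsoluteFile().getParentFile(), base + "0.tif");
	}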

	public static void main(String[] args) throws Exception {
		// Download a CAPTCHA image and recognize it
		try (InputStream in = new URL("http://passport.360buy.com/ImageVerifier.axd?uid=c360a45f-02b2-4255-8f2e-61191bfc3866").openStream()) {
			System.out.println(recognizeValidation(in));
		}
		// Recognize a local image; the file must exist
		System.out.println(recognizeText(new File("c:/1.jpg"), "jpg"));
	}
}
 

The tool itself is provided in the attachment.

 

 

Comments
#9 xusong_zidingyi 2012-08-21
On Linux this cannot run at all, because Linux cannot run .exe files.
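One possible workaround is to pick the executable per operating system. The sketch below assumes that on Linux or macOS Tesseract was installed through the system package manager, so a plain "tesseract" binary is on the PATH. On Windows it assumes the bundled tesseract.exe from the post. The class TesseractLocator and its methods are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: chooses the Tesseract executable by OS.
 * Assumption: on non-Windows systems "tesseract" is installed and on the PATH.
 */
class TesseractLocator {

    /** Returns the executable to start for the given os.name value. */
    static String executable(String osName, String windowsInstallDir) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            // Bundled binary, as in the original post
            return windowsInstallDir + File.separator + "tesseract.exe";
        }
        // A bare command name is looked up on the PATH when the process starts
        return "tesseract";
    }

    /** Builds "tesseract <image> <outputBase> -l <lang>" as a ProcessBuilder argument list. */
    static List<String> command(String osName, String windowsInstallDir,
                                String image, String outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable(osName, windowsInstallDir));
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }
}
```

You would pass System.getProperty("os.name") as the first argument and hand the resulting list to new ProcessBuilder(...). This produces the same argument order the post's code builds.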
#8 javajava22 2011-09-10
Errors accessing files. There may be spaces in your image's filename

I keep getting this error.
#7 suncathay 2011-06-23
qljobs wrote:
cmd.add("E:/Workspaces/MyEclipse/web_ocr/tesseract/tesseract.exe"); I hard-coded the absolute path directly, but it still throws an error no matter what I try. Thanks! java.lang.RuntimeException: Errors accessing files. There may be spaces in your image's filename.

I get this error too.
#6 zuoxu128 2011-03-02
Which package is TIFFImageWriteParam in?
#5 qljobs 2010-12-02
cmd.add("E:/Workspaces/MyEclipse/web_ocr/tesseract/tesseract.exe"); I hard-coded the absolute path directly, but it still throws an error no matter what I try. Thanks! java.lang.RuntimeException: Errors accessing files. There may be spaces in your image's filename.
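The message "Errors accessing files" is only the post's own label for exit code 1, so it does not say which file is at fault. A pre-flight check can report something concrete before Tesseract is started. The sketch below uses a hypothetical class, OcrPreflight. Its checks are assumptions about likely causes: missing executable, missing or empty input, unwritable working directory, or whitespace in the path. It is not a documented list of what triggers that exit code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check run before starting Tesseract.
 * The checks are guesses at common causes of "Errors accessing files".
 */
class OcrPreflight {

    /** Returns a human-readable list of problems; an empty list means nothing obvious was found. */
    static List<String> check(File executable, File image, File workDir) {
        List<String> problems = new ArrayList<>();
        if (!executable.isFile()) {
            problems.add("Tesseract executable not found: " + executable.getAbsolutePath());
        }
        if (!image.isFile() || !image.canRead()) {
            problems.add("Input image missing or unreadable: " + image.getAbsolutePath());
        } else if (image.length() == 0) {
            problems.add("Input image is empty: " + image.getAbsolutePath());
        }
        if (!workDir.isDirectory() || !workDir.canWrite()) {
            // Tesseract writes output.txt into the working directory
            problems.add("Working directory not writable: " + workDir.getAbsolutePath());
        }
        if (image.getAbsolutePath().matches(".*\\s.*")) {
            problems.add("Image path contains whitespace: " + image.getAbsolutePath());
        }
        return problems;
    }
}
```

Logging the returned list (or throwing if it is non-empty) just before pb.start() gives a more specific error than the exit-code mapping alone.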
#4 herryhaixiao 2010-06-17
com.sun.media.imageio.plugins.tiff.TIFFImageWriteParam
Could you provide this package?
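TIFFImageWriteParam in com.sun.media.imageio.plugins.tiff belongs to the Java Advanced Imaging (JAI) Image I/O Tools (jai_imageio.jar), not to the JDK. Since Java 9 the JDK ships its own TIFF plugin. In either case, an uncompressed TIFF can be written through the generic ImageWriteParam API without naming the JAI class. The sketch below assumes Java 9+ or jai_imageio.jar on the classpath; TiffConverter is a hypothetical name.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

/** Hypothetical helper: writes an uncompressed TIFF using only javax.imageio types. */
class TiffConverter {

    static void writeUncompressedTiff(BufferedImage image, File target) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF ImageWriter available (Java 9+ or jai_imageio.jar needed)");
        }
        ImageWriter writer = writers.next();
        // The file-backed output stream does not truncate, so remove any existing file first
        target.delete();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(target)) {
            if (ios == null) {
                throw new IOException("Cannot open " + target.getAbsolutePath());
            }
            writer.setOutput(ios);
            // Generic API: TIFF writers allow compression to be switched off
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_DISABLED);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }
}
```

Using the writer's default parameter object keeps the code independent of which TIFF plugin is registered. The JAI writer and the JDK writer each return their own parameter subclass from getDefaultWriteParam().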
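#3 dengminhui 2010-01-28
Hi, this code depends on another class, SystemUtil. That was an oversight on my part.
cmd.add(SystemUtil.getUserDir() + "tesseract/tesseract.exe"); 
This line only needs to locate the executable, and SystemUtil.getUserDir() simply returns that directory. If I posted SystemUtil it would pull in yet more classes, so when you try this, just hard-code your local directory.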
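Since SystemUtil was never posted, here is a hypothetical stand-in. It assumes getUserDir() returns the JVM working directory (the user.dir system property) with a trailing separator. That trailing separator is needed because the post appends "tesseract/tesseract.exe" directly to the result. This is a guess at the helper's behavior, not the author's class.

```java
import java.io.File;

/** Hypothetical replacement for the author's unpublished SystemUtil helper. */
class SystemUtil {

    /** Working directory of the JVM, always ending with a file separator. */
    static String getUserDir() {
        String dir = System.getProperty("user.dir");
        return dir.endsWith(File.separator) ? dir : dir + File.separator;
    }
}
```

With this class on the classpath, the original line cmd.add(SystemUtil.getUserDir() + "tesseract/tesseract.exe") compiles unchanged. It expects the tesseract folder to sit in the directory the program is started from.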
#2 362217990 2010-01-27
The method SystemUtil.getUserDir() does not exist.
#1 362217990 2010-01-27
SEVERE: Exception:
java.lang.NullPointerException
at com.sun.imageio.plugins.jpeg.JPEGImageReader.checkTablesOnly(JPEGImageReader.java:309)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.getStreamMetadata(JPEGImageReader.java:886)
at OCRUtil.createImage(OCRUtil.java:133)
at OCRUtil.recognizeText(OCRUtil.java:62)
at OCRUtil.main(OCRUtil.java:172)
java.lang.NullPointerException
at OCRUtil.recognizeText(OCRUtil.java:78)
at OCRUtil.main(OCRUtil.java:172)

What is going on?
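With the JDK's file-based stream provider, ImageIO.createImageInputStream(File) returns null rather than throwing when the file cannot be opened, for example when c:/1.jpg does not exist. The original createImage passes that null straight to reader.setInput. The first NullPointerException, inside JPEGImageReader.getStreamMetadata, appears consistent with that. The second NPE follows because createImage logs the error and returns null, which recognizeText then dereferences. Below is a sketch of a loader that fails early with a readable message; SafeImageLoader is a hypothetical name.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: reads the first image of a file, failing with IOException instead of NPE. */
class SafeImageLoader {

    static BufferedImage read(File file, String formatName) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        if (!readers.hasNext()) {
            throw new IOException("No ImageReader for format " + formatName);
        }
        // Returns null (no exception) if the file cannot be opened
        ImageInputStream iis = ImageIO.createImageInputStream(file);
        if (iis == null) {
            throw new IOException("Cannot open image file " + file.getAbsolutePath());
        }
        ImageReader reader = readers.next();
        try {
            reader.setInput(iis);
            return reader.read(0);
        } finally {
            reader.dispose();
            iis.close();
        }
    }
}
```

Calling SafeImageLoader.read(new File("c:/1.jpg"), "jpg") on a missing file then reports "Cannot open image file ..." instead of the two NullPointerExceptions.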
