Topic: Automated voting: assorted notes
Posted: 2010-11-04
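In my spare time recently I built a few small things. The main tool was originally going to be PHP's curl extension, but the voting site suddenly added a CAPTCHA, and since I really don't know how to write the recognition part in PHP, I switched to Groovy's HTTPBuilder. It works just as well, and with multithreading turned on the throughput is decent; watching it in jconsole, it held up fine.
Below I describe the process and post some code:
Many polls restrict voting by IP address and by the interval between votes, so:
1. Get hold of some proxies (IP, port, scheme). I recommend looking on http://www.5uproxy.net. Fetch the HTML source from the URL and extract the entries with a regular expression.
2. Find the final voting URL and its form parameters: whether it needs a POST, whether it needs a token or extra fields, and get those sorted out.
3. If a CAPTCHA is required, for the simple kind I recommend reading http://fireinwind.iteye.com/blog/766260. I modified that code slightly; here it is: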
    import java.awt.Color;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import javax.imageio.ImageIO;

    public class ImagePreProcess {

        // Returns 1 if the pixel is light enough to count as background.
        public static int isWhite(int colorInt) {
            Color color = new Color(colorInt);
            if (color.getRed() + color.getGreen() + color.getBlue() > 320) {
                return 1;
            }
            return 0;
        }

        // Returns 1 if the pixel is very dark.
        public static int isBlack(int colorInt) {
            Color color = new Color(colorInt);
            if (color.getRed() + color.getGreen() + color.getBlue() <= 100) {
                return 1;
            }
            return 0;
        }

        // Binarizes the image: light pixels become white, everything else black.
        public static BufferedImage removeBackground(InputStream is) throws Exception {
            BufferedImage img = ImageIO.read(is);
            int width = img.getWidth();
            int height = img.getHeight();
            for (int x = 0; x < width; ++x) {
                for (int y = 0; y < height; ++y) {
                    if (isWhite(img.getRGB(x, y)) == 1) {
                        img.setRGB(x, y, Color.WHITE.getRGB());
                    } else {
                        img.setRGB(x, y, Color.BLACK.getRGB());
                    }
                }
            }
            return img;
        }

        // Cuts out the four 9x12 character cells. The offsets are specific to
        // the target CAPTCHA and have to be measured by hand.
        public static List<BufferedImage> splitImage(BufferedImage img) throws Exception {
            List<BufferedImage> subImgs = new ArrayList<BufferedImage>();
            subImgs.add(img.getSubimage(6, 4, 9, 12));
            subImgs.add(img.getSubimage(19, 4, 9, 12));
            subImgs.add(img.getSubimage(32, 4, 9, 12));
            subImgs.add(img.getSubimage(45, 4, 9, 12));
            return subImgs;
        }

        // Loads the template images from the "num" directory. The first
        // character of each file name is the label for that template.
        public static Map<BufferedImage, String> loadTrainData() throws Exception {
            Map<BufferedImage, String> map = new HashMap<BufferedImage, String>();
            File dir = new File("num");
            File[] files = dir.listFiles();
            if (files == null) {
                return map; // directory missing or unreadable
            }
            for (File file : files) {
                map.put(ImageIO.read(file), file.getName().substring(0, 1));
            }
            return map;
        }

        // Returns the label of the template that differs from img in the
        // fewest pixels. Templates must be at least as large as img.
        public static String getSingleCharOcr(BufferedImage img, Map<BufferedImage, String> map) {
            String result = "";
            int width = img.getWidth();
            int height = img.getHeight();
            int min = width * height;
            for (BufferedImage bi : map.keySet()) {
                int count = 0;
                Label1:
                for (int x = 0; x < width; ++x) {
                    for (int y = 0; y < height; ++y) {
                        if (isWhite(img.getRGB(x, y)) != isWhite(bi.getRGB(x, y))) {
                            count++;
                            if (count >= min) {
                                break Label1; // already worse than the best match
                            }
                        }
                    }
                }
                if (count < min) {
                    min = count;
                    result = map.get(bi);
                }
            }
            return result;
        }

        // Runs the whole pipeline; returns "" if anything goes wrong.
        public static String getAllOcr(InputStream is) {
            try {
                BufferedImage img = removeBackground(is);
                List<BufferedImage> listImg = splitImage(img);
                Map<BufferedImage, String> map = loadTrainData();
                String result = "";
                for (BufferedImage bi : listImg) {
                    result += getSingleCharOcr(bi, map);
                }
                return result;
            } catch (Exception ex) {
                ex.printStackTrace();
                return "";
            } finally {
                try {
                    is.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }

        /**
         * @param args
         * @throws Exception
         */
        public static void main(String[] args) throws Exception {
            String ff = "**.jpeg";
            String text = getAllOcr(new FileInputStream(ff));
            System.out.println(text);
        }
    }
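The pixel coordinates in the splitImage method must be measured beforehand in Photoshop or similar software. You also need to prepare template images (jpg): for a 0-9 numeric CAPTCHA that means ten images, one per digit, which are compared pixel by pixel against the BufferedImage cells that splitImage produces. The code loads them from the num directory and takes the first character of each file name as the label. The template with the fewest differing pixels wins, which is enough to handle simple numeric image CAPTCHAs.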
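The matching idea can be tried without any image files. The following is a minimal sketch, not the code from the post: it builds tiny 3x3 glyphs in memory (the `fromRows` helper and the glyph shapes are made up for illustration), binarizes with the same channel-sum threshold of 320, and picks the template with the fewest mismatching pixels.

```java
import java.awt.image.BufferedImage;
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateMatchDemo {
    static final int WHITE = 0xFFFFFF;
    static final int BLACK = 0x000000;

    // Same rule as isWhite in the post: a channel sum above 320 is background.
    static boolean isWhite(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r + g + b > 320;
    }

    // Hypothetical helper: builds an image from rows of '#' (ink) and '.' (background).
    static BufferedImage fromRows(String... rows) {
        BufferedImage img = new BufferedImage(
                rows[0].length(), rows.length, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < rows.length; y++) {
            for (int x = 0; x < rows[y].length(); x++) {
                img.setRGB(x, y, rows[y].charAt(x) == '#' ? BLACK : WHITE);
            }
        }
        return img;
    }

    // Counts pixels whose binarized value differs, over the overlapping area.
    static int mismatches(BufferedImage a, BufferedImage b) {
        int w = Math.min(a.getWidth(), b.getWidth());
        int h = Math.min(a.getHeight(), b.getHeight());
        int count = 0;
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                if (isWhite(a.getRGB(x, y)) != isWhite(b.getRGB(x, y))) {
                    count++;
                }
            }
        }
        return count;
    }

    // Returns the label of the template with the fewest mismatching pixels.
    static String bestMatch(BufferedImage glyph, Map<String, BufferedImage> templates) {
        String best = "";
        int min = Integer.MAX_VALUE;
        for (Map.Entry<String, BufferedImage> e : templates.entrySet()) {
            int d = mismatches(glyph, e.getValue());
            if (d < min) {
                min = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, BufferedImage> templates = new LinkedHashMap<>();
        templates.put("1", fromRows(".#.", ".#.", ".#."));
        templates.put("7", fromRows("###", "..#", "..#"));

        // A "7" with one extra ink pixel, standing in for image noise.
        BufferedImage noisy = fromRows("###", "..#", ".##");
        System.out.println(bestMatch(noisy, templates)); // prints 7
    }
}
```

The noisy glyph differs from the "7" template in one pixel and from the "1" template in five, so the closest template still wins despite the noise. This only works when characters sit at fixed positions with a fixed font, which is why the cell offsets have to be measured by hand.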
4. Next comes Groovy's HTTPBuilder, which simulates the HTTP requests. As for HTTPBuilder itself, look at the official examples; it is easy to pick up, you get the idea...
    @Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.5.0-RC2')
    import groovyx.net.http.*
    import static groovyx.net.http.ContentType.*
    import static groovyx.net.http.Method.*

    // ImagePreProcess (above) must be compiled and on the classpath.

    def postVote(String line) {
        if (!line) return

        File logOkFile = new File("./log/ok.resp.txt")
        File logErrorFile = new File("./log/error.resp.txt")
        logOkFile.parentFile.mkdirs() // append fails if ./log does not exist

        final String domain = '****'
        def http = new HTTPBuilder(domain)

        // Each line of the proxy list has the form ip:port
        String[] arr = line.split(':')
        http.setProxy(arr[0], Integer.parseInt(arr[1]), 'http')

        String vc = '' // CAPTCHA text
        try {
            http.request(GET) { req ->
                req.getParams().setParameter("http.socket.timeout", new Integer(10000))
                uri.path = 'get_verifycode_url.do'
                headers.'User-Agent' = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 http://bsalsa.com/ EmbeddedWB- 14.59 from: http://bsalsa.com/ ; Mozilla/4.0(Compatible Mozilla/4.0EmbeddedWB- 14.59 from: http://bsalsa.com/ ; IEShow Toolbar; IEShow stock01ToolBar)'

                response.success = { resp, reader ->
                    // Buffer the image bytes, then hand them to the OCR class
                    ByteArrayOutputStream bbos = new ByteArrayOutputStream()
                    bbos << reader
                    byte[] bb = bbos.toByteArray()
                    InputStream is = new ByteArrayInputStream(bb)
                    vc = ImagePreProcess.getAllOcr(is)
                }
            }

            if (vc) { // only vote if the CAPTCHA was recognized
                http.request(POST, HTML) {
                    uri.path = 'target_vote.do'
                    uri.query = [param1: 'val1'] // placeholder parameters
                    headers.'User-Agent' = 'Mozilla/5.0 Ubuntu/8.10 Firefox/3.0.4'

                    response.success = { resp, reader ->
                        // With the HTML content type, reader is the parsed
                        // document (a GPathResult), so text() is a method call
                        logOkFile.append new Date().toString() + ' - ' + reader.text() + '\n'
                    }
                    response.failure = { resp ->
                        logErrorFile.append "Unexpected error: ${resp.statusLine.statusCode} : ${resp.statusLine.reasonPhrase}\n"
                    }
                }
            }
        } catch (ConnectException ex) {
            logErrorFile.append 'Connect failed! ' + line + '\n'
        } catch (SocketTimeoutException ex) {
            logErrorFile.append 'Connect timeout! ' + line + '\n'
        }
    }

    int lineCount = 0
    new File("./ip_ll2.txt").eachLine { line -> // proxy IP list
        lineCount++
        if (lineCount >= 150 && lineCount < 200) { // limits the range used, i.e. the number of threads started
            final int n = lineCount // copy, because lineCount keeps changing while the threads run
            Thread.start('post_thread_' + n) {
                Random r = new Random()
                20.times { tt ->
                    postVote(line)
                    sleep(1000 * r.nextInt(20))
                }
                println 'End thread for ' + n
            }
        }
    }
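The script above controls the number of threads by picking a line range and starting one raw thread per line. A fixed-size pool is one alternative way to cap concurrency. The sketch below is an assumption about how that could look in plain Java, not something from the original post; `runAll` and the no-op task are hypothetical stand-ins for the per-line work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class BoundedWorkers {

    // Runs task once per line on at most poolSize threads and returns
    // how many invocations finished without throwing.
    static int runAll(List<String> lines, int poolSize, Consumer<String> task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger done = new AtomicInteger();
        for (String line : lines) {
            pool.submit(() -> {
                try {
                    task.accept(line);
                    done.incrementAndGet();
                } catch (RuntimeException e) {
                    // One failing line should not take the worker down silently.
                    System.err.println("failed for " + line + ": " + e);
                }
            });
        }
        pool.shutdown();                             // accept no new work
        pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for queued work
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = Arrays.asList("a", "b", "c");
        int completed = runAll(lines, 2, line -> { /* per-line work goes here */ });
        System.out.println(completed + " of " + lines.size() + " tasks completed");
    }
}
```

With a pool, the thread count is set by one number instead of by a line-number window, and a failure in one task is reported instead of killing its thread unnoticed.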
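Notice: copyright of ITeye articles belongs to the author and is protected by law. Reproduction without the author's written permission is prohibited.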