
Scraping the Renren university database with HttpClient (three-level cascade: province, university, department) -- Update 1

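Posted: 2010-12-01
Websites keep getting more complex and integrate more and more components, and some things are beyond what httpunit can do. HttpClient has limits too: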

HttpClient is NOT a browser.
HttpClient's purpose is to transmit and receive HTTP messages.
HttpClient will not attempt to cache content, execute javascript embedded in HTML pages, try to guess content type, or reformat request / redirect location URIs, or other functionality unrelated to the HTTP transport.

If I need to simulate browser behavior, which tool should I use?

JWebUnit?

PHP - HttpUnit?

Or something else?
Posted: 2010-12-01
yang02301 wrote:
satikey wrote:
yang02301 wrote:
A question for the OP:

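On RenRen, the login can be handled with the attached program,
but t.qq.com logs in with a GET request plus additional cookies. How did the OP handle that? Any advice is welcome.

Many thanks!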


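If it is a GET request, you can append the parameters to the URL, for example
t.qq.com?username=xxx&password=xxx
or something along those lines. Log in once through the web page and watch how the browser's address changes.
I'll look at Tencent Weibo when I have time.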


For the t.qq.com logon:

Step1):

sb = notify("http://ptlogin2.qq.com/check?uin=@hdrive20&appid=46000101&r=0.617148618189815");
System.out.println("Verify Code = '" + sb.substring(18, 22)+ "'");

This returns the verify code, for example:

ptui_checkVC('0','!BMF');

ptui_checkVC is defined in login_div.js.

The question: can httpcomponents-client-4.0.3 execute ptui_checkVC, and if so, how?
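HttpClient does not run JavaScript, so ptui_checkVC itself cannot be executed, but the response is plain text and its arguments can be read directly. The following is a minimal sketch, assuming the response always has the shape ptui_checkVC('<flag>','<code>'); shown above. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the two arguments out of a ptui_checkVC(...) response. */
final class CheckVcParser {

    // Matches ptui_checkVC('<flag>','<code>') and captures both quoted arguments.
    private static final Pattern CHECK_VC =
            Pattern.compile("ptui_checkVC\\('([^']*)','([^']*)'\\)");

    /** Returns {flag, code}, or null when the text does not look like a ptui_checkVC call. */
    static String[] parse(String response) {
        if (response == null) {
            return null;
        }
        Matcher m = CHECK_VC.matcher(response);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] parts = parse("ptui_checkVC('0','!BMF');");
        System.out.println("flag = " + parts[0] + ", verify code = " + parts[1]);
    }
}
```

Compared with sb.substring(18, 22), this does not depend on fixed character offsets, so a longer code (for example when a graphic verify code is requested) is captured whole instead of being cut off.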


Step2): After obtaining the verify code, enter the password and click "Log in"; the browser should then send a GET request such as:
url = "http://ptlogin2.qq.com/login?u=@hdrive20&p=67E5A3B52AE29D6FC6FAFB1587F8D8F3&verifycode=" + sb.substring(18, 22) + "&low_login_enable=1&low_login_hour=720&aid=46000101&u1=http%3A%2F%2Ft.qq.com&ptredirect=1&h=1&from_ui=1&dumy=&fp=loginerroralert";
sb = notify(url);
System.out.println(sb);

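This returns:
ptuiCB('3','0','','0','您输入的密码有误,请重试。');

That is fine (the message says the password entered is wrong, please retry). Note that the parameter 'p' is the user's password after being processed with the verify code. I do not yet know how to compute it, hence the error.

Any advice is welcome.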






This one is really interesting, heh. I'll dig into it when I have time; I've been busy with other things lately.
Posted: 2010-12-01
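Some users say that scraping Tencent Weibo data with HttpClient is hard. I'd like to give it a try. Who wants to sign up and work on it together?
Posted: 2010-12-01
satikey wrote:
Some users say that scraping Tencent Weibo data with HttpClient is hard. I'd like to give it a try. Who wants to sign up and work on it together?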


The hard part is the rather nasty MD5 code written in JavaScript (in login_div.js). The fragment in ajax_Submit() that generates the password value is:

        if(E[A].name=="p"){
            alert(E.verifycode.value);
            alert(E.p.value)
            var F="";
            F+=E.verifycode.value;
            F=F.toUpperCase();
            B+=md5(md5_3(E.p.value)+F)
        }

E.p.value is the actual password and E.verifycode.value is the returned verify code. MD5 is applied four times in total:
function md5_3(B){
    var A=new Array;

    A=core_md5(A,B.length*chrsz);
 
    A=core_md5(A,16*chrsz);
 
    A=core_md5(A,16*chrsz);
 
    return binl2hex(A);
}

Could someone post a Java translation of this MD5 code to share? Thanks!
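A minimal Java sketch of the same computation with java.security.MessageDigest follows. It rests on assumptions drawn only from the fragments above: that md5_3 is MD5 applied three times with the raw 16-byte digest fed into each next round, that the hex strings are upper case (as in the p value captured in the login URL earlier), and that the password is plain ASCII. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Illustrative sketch of p = md5(md5_3(password) + VERIFYCODE). */
final class QqPasswordHash {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Upper-case hex encoding of a byte array. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** MD5 applied three times; each round hashes the raw bytes of the previous digest. */
    static String md5Times3(String password) {
        // Assumption: one byte per character, which is only safe for ASCII passwords.
        byte[] digest = password.getBytes(StandardCharsets.ISO_8859_1);
        for (int round = 0; round < 3; round++) {
            digest = md5(digest);
        }
        return toHex(digest);
    }

    /** Fourth MD5: hash the hex string from md5Times3 concatenated with the upper-cased verify code. */
    static String loginHash(String password, String verifyCode) {
        String inner = md5Times3(password) + verifyCode.toUpperCase(Locale.ROOT);
        return toHex(md5(inner.getBytes(StandardCharsets.ISO_8859_1)));
    }

    public static void main(String[] args) {
        System.out.println(loginHash("your password", "!bmf"));
    }
}
```

Whether the server accepts the result depends on details of login_div.js that are not shown in this thread, so check the output against a value captured from a real browser login.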

Posted: 2010-12-02
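Very interesting. While working on getting into t.qq.com I found that after several consecutive logon attempts on the same account, the page starts asking for a graphic verify code. The display logic is in JavaScript: the image is requested from the server when you type the password, and modifying the home page's HTML should be able to suppress it. This keeps bots from signing in en masse. That said, I already have automatic logon to t.qq.com working in Java and will post it shortly.

OP, please keep helping with how to download the "college" data from t.qq.com automatically.
Posted: 2010-12-02   Last edited: 2010-12-02
yang02301 wrote:
Very interesting. While working on getting into t.qq.com I found that after several consecutive logon attempts on the same account, the page starts asking for a graphic verify code. The display logic is in JavaScript: the image is requested from the server when you type the password, and modifying the home page's HTML should be able to suppress it. This keeps bots from signing in en masse. That said, I already have automatic logon to t.qq.com working in Java and will post it shortly.

OP, please keep helping with how to download the "college" data from t.qq.com automatically.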


Attached is the Java code that logs on to t.qq.com automatically. You need to change username and password. More to come!


import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

public class QQNotify {

    private static HttpResponse response;
    private static DefaultHttpClient httpClient;

    public static void main(String[] args) {
        String username = "your username";
        String password = "your password";
        QQNotify notify = new QQNotify(username, password);

        if (true) {
            return;
        }

//        String code = new String(notify.notify("http://s.xnimg.cn/a13819/allunivlist.js"));
//        // Convert hexadecimal Unicode escapes (backslash-u form)
//        StringBuffer sb = new StringBuffer(code);
//        System.out.println(sb.toString());
//        int pos;
//        while ((pos = sb.indexOf("\\u")) > -1) {
//            String tmp = sb.substring(pos, pos + 6);
//            sb.replace(pos, pos + 6, Character.toString((char) Integer.parseInt(tmp.substring(2), 16)));
//        }
//        code = sb.toString();
//        System.out.println(code);

        /// To see the effect of the code below, comment out the block above, from String code through System.out.println(code);
        // Convert &#xxxxx; style Unicode entities
//		String code = new String(notify
//				.notify("http://www.renren.com/GetDep.do?id=13003"));
//		StringBuffer sb=new StringBuffer(code);
//		int pos;
//		while ((pos=sb.indexOf("&#"))>-1) {
//			String tmp=sb.substring(pos+2, pos+7);
//			sb.replace(pos, pos+8, Character.toString((char)Integer.parseInt(tmp,10)));
//		}
//		code=sb.toString();
//		System.out.println(code);
    }

    public QQNotify(String userName, String password) {
        int i;

        this.httpClient = new DefaultHttpClient();

        Header[] headers;
        String url, sb, verifyCode;

        // Step 1: get verify code
        url = "http://ptlogin2.qq.com/check?uin=@" + userName + "&appid=46000101&r=0.617148618189815";
        sb = notify(url);
        i = sb.indexOf("'", 19);
        verifyCode = sb.substring(18, i).toUpperCase();
        System.out.println(sb);
        System.out.println("Verify Code = '" + verifyCode + "'");
        // A code longer than 4 characters means the server wants a graphic verify code.
        if (verifyCode.length() > 4) {
            System.out.println("It seems you need to enter the graphic verify code manually.");
            System.out.println("Wait a few minutes and try again.");
            System.out.println("Program abort!");
            return;
        }
        
        // Step 2: logon
        //
        // '!UAK' -> '67E5A3B52AE29D6FC6FAFB1587F8D8F3'
        //
        //String str = MD5_3(password) + "!UAK";
        //System.out.println("str = " + str);
        //System.out.println("MD5 = " + MD5(str));
        String str = MD5_3(password) + verifyCode;

        url = "http://ptlogin2.qq.com/login?u=@" +
                userName + "&p=" +
                MD5(str) + "&verifycode=" +
                verifyCode + "&low_login_enable=1&low_login_hour=720&aid=46000101&u1=http%3A%2F%2Ft.qq.com&ptredirect=1&h=1&from_ui=1&dumy=&fp=loginerroralert";
        sb = notify(url);
        System.out.println(sb);
        // Debugging aid, disabled by default: fetch the login URL without a
        // response handler and dump the status line and headers.
        if (false) {
            response = getMethod(url);
            System.out.println(response.getStatusLine()); // returns 302
            headers = response.getAllHeaders();
            for (i = 0; i < headers.length; i++) {
                Header header = headers[i];
                System.out.println(header.getName() + ": " + header.getValue());
            }
            System.out.println("-----------------------------");
        }
        // The first ptuiCB argument is '0' on success; a wrong password
        // produced '3' earlier in this thread.
        if (sb != null && sb.trim().startsWith("ptuiCB('0'")) {
            System.out.println("Logged on as '" + userName + "' @ t.qq.com successfully.");
            System.out.println("Next, request http://t.qq.com/setting_edu.php and grab the college data.");
            System.out.println("Good luck!");
        } else {
            System.out.println("Logon failed, see the ptuiCB response printed above.");
        }

        // Read the redirect target
        // String redirectUrl = response.getFirstHeader("Location").getValue();
        // See what comes back after following the redirect.
        // response=getMethod(redirectUrl); // defined further down
        // System.out.println(response.getStatusLine()); // HTTP/1.1 200 OK

        // Read the home page once logged in
        // System.out.println(readHtml("http://www.renren.com/home"));
    }

    // Fetch the body of the given URL as a string
    public String notify(String url) {
        HttpGet get = new HttpGet(url);
        ResponseHandler<String> responseHandler = new BasicResponseHandler();
        String txt = null;
        try {
            txt = httpClient.execute(get, responseHandler);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            get.abort();
        }
        return txt;
    }

    // Send a POST request and return the response. A POST carries parameters, so the caller builds the HttpPost and passes it in.
    public HttpResponse postMethod(HttpPost post) {
        HttpResponse resp = null;
        try {
            resp = httpClient.execute(post);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            post.abort();
        }
        return resp;
    }

    // Send a GET request and return the response
    public HttpResponse getMethod(String url) {
        HttpGet get = new HttpGet(url);
        HttpResponse resp = null;
        try {
            resp = httpClient.execute(get);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            get.abort();
        }
        return resp;
    }

    private String MD5_3(String plainText) {

        StringBuffer buf = new StringBuffer("");

        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(plainText.getBytes());

            // first time
            byte b[] = md.digest();

            // Second Time
            b = md.digest(b);

            // Third Time
            b = md.digest(b);

            int i;
            for (int offset = 0; offset < b.length; offset++) {
                i = b[offset];
                if (i < 0) {
                    i += 256;
                }
                if (i < 16) {
                    buf.append("0");
                }
                buf.append(Integer.toHexString(i));
            }
            //System.out.println("32-char hex result: " + buf.toString()); // 32 hex characters
            //System.out.println("byte b[].size: " + b.length);
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buf.toString().toUpperCase();
    }

    private String MD5(String plainText) {

        StringBuffer buf = new StringBuffer("");

        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(plainText.getBytes());

            byte b[] = md.digest();

            int i;
            for (int offset = 0; offset < b.length; offset++) {
                i = b[offset];
                if (i < 0) {
                    i += 256;
                }
                if (i < 16) {
                    buf.append("0");
                }
                buf.append(Integer.toHexString(i));
            }
            //System.out.println("32-char hex result: " + buf.toString()); // 32 hex characters
            //System.out.println("byte b[].size: " + b.length);
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buf.toString().toUpperCase();
    }
}
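The commented-out blocks in main convert two kinds of escaped text from the Renren data: backslash-u escapes such as \u4e2d and numeric entities such as &#20013;. The entity loop there assumes every entity has exactly five digits. Below is a small sketch of the same conversion with regular expressions, which does not rely on fixed widths; the class and method names are hypothetical and the input strings are invented samples, not real Renren responses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that decodes backslash-u escapes and &#NNNNN; entities. */
final class UnicodeUnescape {

    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern JS_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
    // &# followed by decimal digits and a semicolon.
    private static final Pattern HTML_ENTITY = Pattern.compile("&#(\\d+);");

    /** Replaces every match with the character whose code is captured in group 1. */
    private static String decode(String text, Pattern pattern, int radix) {
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), radix);
            String decoded = new String(Character.toChars(codePoint));
            // quoteReplacement keeps '$' and '\' in the decoded text from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String decodeJsEscapes(String text) {
        return decode(text, JS_ESCAPE, 16);
    }

    static String decodeHtmlEntities(String text) {
        return decode(text, HTML_ENTITY, 10);
    }

    public static void main(String[] args) {
        System.out.println(decodeJsEscapes("name: \\u4e2d\\u56fd"));
        System.out.println(decodeHtmlEntities("name: &#20013;&#22269;"));
    }
}
```

The sketch does not guard against out-of-range numbers in an entity; Integer.parseInt or Character.toChars would throw in that case.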





Posted: 2010-12-02


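Done: it can now log on automatically, save the cookies, follow the redirect URL, and fetch the "college" list. Fetching the "department" list is not done yet but should be very easy. Thanks to the OP for broadening my thinking!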
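The source in this post pulls the college names out of the response with the regular expression title="(.*?)">. Here is a self-contained sketch of that extraction step. The HTML fragment is invented for illustration, since the real schoolist.php response is not shown in this thread, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects the values of title="..." attributes. */
final class TitleExtractor {

    // Same expression as in the posted source: a non-greedy capture up to the closing quote and '>'.
    private static final Pattern TITLE = Pattern.compile("title=\"(.*?)\">");

    static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<String>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Invented sample markup; the real response layout is an assumption.
        String html = "<a href=\"#\" title=\"Peking University\">PKU</a>"
                + "<a href=\"#\" title=\"Tsinghua University\">THU</a>";
        for (String title : extractTitles(html)) {
            System.out.println(title);
        }
    }
}
```

The expression only matches when title is the last attribute before '>', so markup with attributes in a different order would need a looser pattern or an HTML parser.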

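When sending the GET for the "college" list, remember to construct request headers so that the request looks like it comes from a browser.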
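As a dependency-free illustration of the idea, the sketch below prepares such a request with the JDK's HttpURLConnection instead of Apache HttpClient. The header values are taken from the browser capture quoted in the source of this post (the User-Agent is shortened here); the class and method names are hypothetical, and nothing is sent over the network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: prepares a GET that carries browser-like headers. */
final class BrowserLikeRequest {

    static HttpURLConnection prepare(String url) {
        try {
            // openConnection() only creates the object; no traffic happens
            // until connect() is called or the response is read.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "*/*");
            conn.setRequestProperty("Accept-Language", "en-us");
            conn.setRequestProperty("Referer", "http://t.qq.com/setting_edu.php");
            // Shortened form of the User-Agent string in the capture.
            conn.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0)");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare(
                "http://t.qq.com/asyn/schoolist.php?type=4&key=%E7%BE%8E%E5%9B%BD&letter=&");
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
        System.out.println("Referer: " + conn.getRequestProperty("Referer"));
    }
}
```

Host and Connection are left out because HttpURLConnection manages them itself. The session cookies obtained at logon would still have to be attached for the server to treat the request as logged in.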

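OP, could you also put together a simple example of logging on with graphic (CAPTCHA) verification, the kind QQ uses?

The source is attached below. The newly added parts have not been tidied up yet and are a bit messy, so please bear with me.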



import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

public class QQNotify {

    private static HttpResponse response;
    private static DefaultHttpClient httpClient;
    private static Map<String, String> cookies = new HashMap<String, String>();

    public static void main(String[] args) {
        String username = "your username";
        String password = "your password";
        QQNotify notify = new QQNotify(username, password);

        if (true) {
            return;
        }

//        String code = new String(notify.notify("http://s.xnimg.cn/a13819/allunivlist.js"));
//        // Convert hexadecimal Unicode escapes (backslash-u form)
//        StringBuffer sb = new StringBuffer(code);
//        System.out.println(sb.toString());
//        int pos;
//        while ((pos = sb.indexOf("\\u")) > -1) {
//            String tmp = sb.substring(pos, pos + 6);
//            sb.replace(pos, pos + 6, Character.toString((char) Integer.parseInt(tmp.substring(2), 16)));
//        }
//        code = sb.toString();
//        System.out.println(code);

        /// To see the effect of the code below, comment out the block above, from String code through System.out.println(code);
        // Convert &#xxxxx; style Unicode entities
//		String code = new String(notify
//				.notify("http://www.renren.com/GetDep.do?id=13003"));
//		StringBuffer sb=new StringBuffer(code);
//		int pos;
//		while ((pos=sb.indexOf("&#"))>-1) {
//			String tmp=sb.substring(pos+2, pos+7);
//			sb.replace(pos, pos+8, Character.toString((char)Integer.parseInt(tmp,10)));
//		}
//		code=sb.toString();
//		System.out.println(code);
    }

    public QQNotify(String userName, String password) {
        int i;

        this.httpClient = new DefaultHttpClient();

        Header[] headers;
        String url, sb, verifyCode;

        cookies.clear();

        // Step 1: get verify code
        url = "http://ptlogin2.qq.com/check?uin=@" + userName + "&appid=46000101&r=0.617148618189815";
        sb = notify(url);
        SaveCookies(httpClient.getCookieStore().getCookies());
        i = sb.indexOf("'", 19);
        verifyCode = sb.substring(18, i).toUpperCase();
        System.out.println(sb);
        System.out.println("Verify Code = '" + verifyCode + "'");
        // A code longer than 4 characters means the server wants a graphic verify code.
        if (verifyCode.length() > 4) {
            System.out.println("It seems you need to enter the graphic verify code manually.");
            System.out.println("Wait a few minutes and try again.");
            System.out.println("Program abort!");
            return;
        }

        // Step 2: logon
        //
        // '!UAK' -> '67E5A3B52AE29D6FC6FAFB1587F8D8F3'
        //
        //String str = MD5_3(password) + "!UAK";
        //System.out.println("str = " + str);
        //System.out.println("MD5 = " + MD5(str));
        String str = MD5_3(password) + verifyCode;

        url = "http://ptlogin2.qq.com/login?u=@"
                + userName + "&p="
                + MD5(str) + "&verifycode="
                + verifyCode + "&low_login_enable=1&low_login_hour=720&aid=46000101&u1=http%3A%2F%2Ft.qq.com&ptredirect=1&h=1&from_ui=1&dumy=&fp=loginerroralert";
        sb = notify(url);
        System.out.println(sb);

        SaveCookies(httpClient.getCookieStore().getCookies());
        PrintCookies();
        /*
ptuiCB('0','0','http://t.qq.com','1','登录成功!');
-------- Cookies begin ---------
Exception in thread "main" java.lang.NullPointerException
 0 : [ptvfsession] = 'a56b05373bffaf65643dbe875a1c9614226d1789c91ddd39134c5289878b087b3f8fd21670efcc430d111b63fa41274f'
 1 : [ptcz] = '06aa93cefb0fec33c298f13fecadb5792b7f7816adb11a5e9423e42cd4456115'
 2 : [skey] = '@na9wdcELd'
 3 : [pt2gguin] = 'o1093457233'
 4 : [lskey] = '00010000a1ac49b4a67ea43dde8d6985bb353584846c27cd4a57d63889c082d45da5540b1f78dc6c9f972099'
 5 : [luin] = 'o1093457233'
 6 : [uin] = 'o1093457233'
 7 : [ptuserinfo] = '6864726976653230'
 8 : [ptisp] = ''
-------- Cookies end ---------
-------- Cookies begin ---------
 0 : [ptvfsession] = 'cbebb4c13f69aaca9dabea361c77de60d0fb02bd9991902a7dc5abd486770613746651e4bbd99faebef2b77466e4649b'
 1 : [ptcz] = 'bf9bd2ac71844eae57221a750a7f5321f4e12bdcb0d7178d654160d175da7f3a'
 2 : [skey] = '@na9wdcELd'
 3 : [pt2gguin] = 'o1093457233'
 4 : [lskey] = '00010000035cf86f252e61d9e8f07aa2c39335e2890f01a2863caaffdb4d9e1aa64f2064ec518ccd9772d333'
 5 : [luin] = 'o1093457233'
 6 : [uin] = 'o1093457233'
 7 : [ptuserinfo] = '6864726976653230'
 8 : [ptisp] = ''
-------- Cookies end ---------
        */
        if (true) {
            // Now get country city list
            // sample get data
/*
GET /asyn/schoolist.php?type=4&key=%E4%B8%AD%E5%9B%BD_%E5%8C%97%E4%BA%AC&letter=& HTTP/1.1
Accept: *//*
Accept-Language: en-us
Referer: http://t.qq.com/setting_edu.php
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; QQPinyin 730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Host: t.qq.com
Connection: Keep-Alive
Cookie: ptui_loginuin2=hdrive20; mb_reg_from=8; pgv_pvid=9250536308; pgv_flv=10.0; pgv_r_cookie=10113040085471; pt2gguin=o1093457233; ptcz=3ac05d30f3e337e94f4fe93999002a23d31779fc45e7bb8f7d86b7a2e34e548d; o_cookie=1093457233; luin=o1093457233; lskey=00010000a90b1dff0aad30b3b96d163e6118418db84c6da83a219311bcb2ff34758e77cb1e3f57228e0d8522; pgv_info=ssid=s4051841450; verifysession=h0050e18f0403630ce623631bd8e1f0f51760865ce2107df6a4bbd1e10521919986ace31226351179618b6c20640a91a959; ptisp=; uin=o1093457233; skey=@na9wdcELd
 */

            //String redirectUrl = "http://t.qq.com/setting_edu.php";
            // key is the URL-encoded form of 中国_北京 (China_Beijing)
            String redirectUrl = "http://t.qq.com/asyn/schoolist.php?type=4&key=%E4%B8%AD%E5%9B%BD_%E5%8C%97%E4%BA%AC&letter=&";
            // key is the URL-encoded form of 美国 (USA)
            //String redirectUrl = "http://t.qq.com/asyn/schoolist.php?type=4&key=%E7%BE%8E%E5%9B%BD&letter=&";
            HttpGet get = new HttpGet(redirectUrl);
            // Imitate the browser request captured in the comment above.
            get.setHeader("Accept", "*/*");
            get.setHeader("Accept-Language", "en-us");
            get.setHeader("Referer", "http://t.qq.com/setting_edu.php");
            // Accept-Encoding: gzip, deflate is deliberately not sent: BasicResponseHandler
            // returns the body as-is, so a compressed reply would not be readable here.
            get.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; QQPinyin 730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)");
            get.setHeader("Host", "t.qq.com");
            get.setHeader("Connection", "Keep-Alive");
            try {
                sb = httpClient.execute(get, new BasicResponseHandler());
            } catch (IOException ex) {
                Logger.getLogger(QQNotify.class.getName()).log(Level.SEVERE, null, ex);
            }
            //sb = notify(redirectUrl);
            System.out.println(sb);

            // Print the value of every title="..." attribute in the response.
            String regex2 = "title=\"(.*?)\">";
            Pattern pattern2 = Pattern.compile(regex2);
            Matcher matcher2 = pattern2.matcher(sb);
            while (matcher2.find()) {
                System.out.println(matcher2.group(1));
            }
            System.out.println("Logged on as '" + userName + "' @ t.qq.com successfully.");
            System.out.println("Next, request http://t.qq.com/setting_edu.php and grab the college data.");
            System.out.println("Good luck!");
            return;
        }

        return;
        // Read the redirect target
        // String redirectUrl = response.getFirstHeader("Location").getValue();
        // See what comes back after following the redirect.
        // response=getMethod(redirectUrl); // defined further down
        // System.out.println(response.getStatusLine()); // HTTP/1.1 200 OK

        // Read the home page once logged in
        // System.out.println(readHtml("http://www.renren.com/home"));
    }

    // Fetch the body of the given URL as a string
    public String notify(String url) {
        HttpGet get = new HttpGet(url);
        //get.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; QQPinyin 730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)");
        ResponseHandler<String> responseHandler = new BasicResponseHandler();
        String txt = null;
        try {
            txt = httpClient.execute(get, responseHandler);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            get.abort();
        }
        return txt;
    }

    // Send a POST request and return the response. A POST carries parameters, so the caller builds the HttpPost and passes it in.
    public HttpResponse postMethod(HttpPost post) {
        HttpResponse resp = null;
        try {
            resp = httpClient.execute(post);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            post.abort();
        }
        return resp;
    }

    // Send a GET request and return the response
    public HttpResponse getMethod(String url) {
        HttpGet get = new HttpGet(url);
        HttpResponse resp = null;
        try {
            resp = httpClient.execute(get);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            get.abort();
        }
        return resp;
    }

    private String MD5_3(String plainText) {

        StringBuffer buf = new StringBuffer("");

        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(plainText.getBytes());

            // first time
            byte b[] = md.digest();

            // Second Time
            b = md.digest(b);

            // Third Time
            b = md.digest(b);

            int i;
            for (int offset = 0; offset < b.length; offset++) {
                i = b[offset];
                if (i < 0) {
                    i += 256;
                }
                if (i < 16) {
                    buf.append("0");
                }
                buf.append(Integer.toHexString(i));
            }
            //System.out.println("32-char hex result: " + buf.toString()); // 32 hex characters
            //System.out.println("byte b[].size: " + b.length);
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buf.toString().toUpperCase();
    }

    private String MD5(String plainText) {

        StringBuffer buf = new StringBuffer("");

        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(plainText.getBytes());

            byte b[] = md.digest();

            int i;
            for (int offset = 0; offset < b.length; offset++) {
                i = b[offset];
                if (i < 0) {
                    i += 256;
                }
                if (i < 16) {
                    buf.append("0");
                }
                buf.append(Integer.toHexString(i));
            }
            //System.out.println("32-char hex result: " + buf.toString()); // 32 hex characters
            //System.out.println("byte b[].size: " + b.length);
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buf.toString().toUpperCase();
    }

    private void SaveCookies(List<Cookie> cs) {
        if (cs.isEmpty()) {
            System.out.println("None");
        } else {
            for (int i = 0; i < cs.size(); i++) {
                cookies.put(cs.get(i).getName(), cs.get(i).getValue());
            }
        }
    }

    private void PrintCookies() {

        int i = 0;

        //Get Map in Set interface to get key and value
        Set s = cookies.entrySet();

        //Move next key and value of Map by iterator
        Iterator it = s.iterator();

        System.out.println("-------- Cookies begin ---------");
        while (it.hasNext()) {
            // key=value separator this by Map.Entry to get key and value
            Map.Entry m = (Map.Entry) it.next();
            System.out.println(" " + i++ + " : [" + m.getKey() + "] = '" + m.getValue() + "'");
        }
        System.out.println("-------- Cookies end ---------");
    }
}
//Get Canada School List
//http://t.qq.com/asyn/schoolist.php?type=4&key=%E5%8A%A0%E6%8B%BF%E5%A4%A7&letter=&
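The key parameter in these URLs is simply the region name percent-encoded as UTF-8, so URLs for other regions can be built rather than copied from a browser capture. A minimal sketch follows; the class and method names are hypothetical, and the remaining parameters (type=4 and the empty letter) are copied unchanged from the URLs above without knowing their meaning.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds the schoolist.php URL for a region name. */
final class SchoolListUrl {

    static String forRegion(String region) {
        try {
            return "http://t.qq.com/asyn/schoolist.php?type=4&key="
                    + URLEncoder.encode(region, "UTF-8") + "&letter=&";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The escapes spell the Chinese name for Canada used in the URL above.
        System.out.println(forRegion("\u52A0\u62FF\u5927"));
    }
}
```

URLEncoder leaves the underscore untouched, which matches the China_Beijing key used in the posted source.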



Posted: 2010-12-02
yang02301 wrote:


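Done: it can now log on automatically, save the cookies, follow the redirect URL, and fetch the "college" list. Fetching the "department" list is not done yet but should be very easy. Thanks to the OP for broadening my thinking!

When sending the GET for the "college" list, remember to construct request headers so that the request looks like it comes from a browser.

OP, could you also put together a simple example of logging on with graphic (CAPTCHA) verification, the kind QQ uses?

The source is attached below. The newly added parts have not been tidied up yet and are a bit messy, so please bear with me.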



import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.http.Header;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;
import java.security.*;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.http.cookie.Cookie;

public class QQNotify {

    private static HttpResponse response;
    private static DefaultHttpClient httpClient;
    private static Map<String, String> cookies = new HashMap<String, String>();

    public static void main(String[] args) {
        String username = "your username";
        String password = "your password";
        QQNotify notify = new QQNotify(username, password);

        if (true) {
            return;
        }

//        String code = new String(notify.notify("http://s.xnimg.cn/a13819/allunivlist.js"));
//        // Convert hexadecimal Unicode escapes (backslash-u form)
//        StringBuffer sb = new StringBuffer(code);
//        System.out.println(sb.toString());
//        int pos;
//        while ((pos = sb.indexOf("\\u")) > -1) {
//            String tmp = sb.substring(pos, pos + 6);
//            sb.replace(pos, pos + 6, Character.toString((char) Integer.parseInt(tmp.substring(2), 16)));
//        }
//        code = sb.toString();
//        System.out.println(code);

        /// To see the effect of the code below, comment out the block above, from String code through System.out.println(code);
        // Convert &#xxxxx; style Unicode entities
//		String code = new String(notify
//				.notify("http://www.renren.com/GetDep.do?id=13003"));
//		StringBuffer sb=new StringBuffer(code);
//		int pos;
//		while ((pos=sb.indexOf("&#"))>-1) {
//			String tmp=sb.substring(pos+2, pos+7);
//			sb.replace(pos, pos+8, Character.toString((char)Integer.parseInt(tmp,10)));
//		}
//		code=sb.toString();
//		System.out.println(code);
    }

    public QQNotify(String userName, String password) {
        int i;

        this.httpClient = new DefaultHttpClient();

        Header[] headers;
        String url, sb, verifyCode;

        cookies.clear();

        // Step 1: get verify code
        url = "http://ptlogin2.qq.com/check?uin=@" + userName + "&appid=46000101&r=0.617148618189815";
        sb = notify(url);
        SaveCookies(httpClient.getCookieStore().getCookies());
        i = sb.indexOf("'", 19);
        verifyCode = sb.substring(18, i).toUpperCase();
        System.out.println(sb);
        System.out.println("Verify Code = '" + verifyCode + "'");
        // A code longer than 4 characters means the server wants a graphic verify code.
        if (verifyCode.length() > 4) {
            System.out.println("It seems you need to enter the graphic verify code manually.");
            System.out.println("Wait a few minutes and try again.");
            System.out.println("Program abort!");
            return;
        }

        // Step 2: logon
        //
        // '!UAK' -> '67E5A3B52AE29D6FC6FAFB1587F8D8F3'
        //
        //String str = MD5_3(password) + "!UAK";
        //System.out.println("str = " + str);
        //System.out.println("MD5 = " + MD5(str));
        String str = MD5_3(password) + verifyCode;

        url = "http://ptlogin2.qq.com/login?u=@"
                + userName + "&p="
                + MD5(str) + "&verifycode="
                + verifyCode + "&low_login_enable=1&low_login_hour=720&aid=46000101&u1=http%3A%2F%2Ft.qq.com&ptredirect=1&h=1&from_ui=1&dumy=&fp=loginerroralert";
        sb = notify(url);
        System.out.println(sb);

        SaveCookies(httpClient.getCookieStore().getCookies());
        PrintCookies();
        /*
ptuiCB('0','0','http://t.qq.com','1','登录成功!');
-------- Cookies begin ---------
Exception in thread "main" java.lang.NullPointerException
 0 : [ptvfsession] = 'a56b05373bffaf65643dbe875a1c9614226d1789c91ddd39134c5289878b087b3f8fd21670efcc430d111b63fa41274f'
 1 : [ptcz] = '06aa93cefb0fec33c298f13fecadb5792b7f7816adb11a5e9423e42cd4456115'
 2 : [skey] = '@na9wdcELd'
 3 : [pt2gguin] = 'o1093457233'
 4 : [lskey] = '00010000a1ac49b4a67ea43dde8d6985bb353584846c27cd4a57d63889c082d45da5540b1f78dc6c9f972099'
 5 : [luin] = 'o1093457233'
 6 : [uin] = 'o1093457233'
 7 : [ptuserinfo] = '6864726976653230'
 8 : [ptisp] = ''
-------- Cookies end ---------
-------- Cookies begin ---------
 0 : [ptvfsession] = 'cbebb4c13f69aaca9dabea361c77de60d0fb02bd9991902a7dc5abd486770613746651e4bbd99faebef2b77466e4649b'
 1 : [ptcz] = 'bf9bd2ac71844eae57221a750a7f5321f4e12bdcb0d7178d654160d175da7f3a'
 2 : [skey] = '@na9wdcELd'
 3 : [pt2gguin] = 'o1093457233'
 4 : [lskey] = '00010000035cf86f252e61d9e8f07aa2c39335e2890f01a2863caaffdb4d9e1aa64f2064ec518ccd9772d333'
 5 : [luin] = 'o1093457233'
 6 : [uin] = 'o1093457233'
 7 : [ptuserinfo] = '6864726976653230'
 8 : [ptisp] = ''
-------- Cookies end ---------
        */
        if (true) {
            // Now get country city list
            // sample get data
/*
GET /asyn/schoolist.php?type=4&key=%E4%B8%AD%E5%9B%BD_%E5%8C%97%E4%BA%AC&letter=& HTTP/1.1
Accept: *//*
Accept-Language: en-us
Referer: http://t.qq.com/setting_edu.php
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; QQPinyin 730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Host: t.qq.com
Connection: Keep-Alive
Cookie: ptui_loginuin2=hdrive20; mb_reg_from=8; pgv_pvid=9250536308; pgv_flv=10.0; pgv_r_cookie=10113040085471; pt2gguin=o1093457233; ptcz=3ac05d30f3e337e94f4fe93999002a23d31779fc45e7bb8f7d86b7a2e34e548d; o_cookie=1093457233; luin=o1093457233; lskey=00010000a90b1dff0aad30b3b96d163e6118418db84c6da83a219311bcb2ff34758e77cb1e3f57228e0d8522; pgv_info=ssid=s4051841450; verifysession=h0050e18f0403630ce623631bd8e1f0f51760865ce2107df6a4bbd1e10521919986ace31226351179618b6c20640a91a959; ptisp=; uin=o1093457233; skey=@na9wdcELd
 */

            //String redirectUrl = "http://t.qq.com/setting_edu.php";
            //http://t.qq.com/asyn/schoolist.php?type=4&key=%E4%B8%AD%E5%9B%BD_%E5%8C%97%E4%BA%AC&letter=&
            // key = 中国_北京 (China_Beijing), URL-encoded as UTF-8
            String redirectUrl = "http://t.qq.com/asyn/schoolist.php?type=4&key=%E4%B8%AD%E5%9B%BD_%E5%8C%97%E4%BA%AC&letter=&";
            // key = 美国 (United States)
            //String redirectUrl = "http://t.qq.com/asyn/schoolist.php?type=4&key=%E7%BE%8E%E5%9B%BD&letter=&";
            HttpGet get = new HttpGet(redirectUrl);
            get.setHeader("Accept", "*/*");
            get.setHeader("Accept-Language", "en-us");
            get.setHeader("Referer", "http://t.qq.com/setting_edu.php");
            // Accept-Encoding is left out on purpose: as far as I know HttpClient 4.0.x does not
            // decompress gzip/deflate responses by itself, so BasicResponseHandler would return
            // the compressed body as-is.
            //get.setHeader("Accept-Encoding", "gzip, deflate");
            get.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; QQPinyin 730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)");
            get.setHeader("Host", "t.qq.com");
            get.setHeader("Connection", "Keep-Alive");
            try {
                sb = httpClient.execute(get, new BasicResponseHandler());
            } catch (IOException ex) {
                Logger.getLogger(QQNotify.class.getName()).log(Level.SEVERE, null, ex);
            }
            //sb = notify(redirectUrl);
            System.out.println(sb);

            // Print every title="..." attribute value found in the response
            // (presumably the school names)
            String regex2 = "title=\"(.*?)\">";
            Pattern pattern2 = Pattern.compile(regex2);
            Matcher matcher2 = pattern2.matcher(sb);
            while (matcher2.find()) {
                System.out.println(matcher2.group(1));
            }
            System.out.println("Already logged on to '" + userName + "' @ t.qq.com successfully.");
            System.out.println("Next, go to http://t.qq.com/setting_edu.php to grab the college data.");
            System.out.println("Good luck!");
            return;
        }

        return;
        // Read the redirect address
        // String redirectUrl = response.getFirstHeader("Location").getValue();
        // Check what comes back after following the redirect.
        // response=getMethod(redirectUrl); // function defined below
        // System.out.println(response.getStatusLine()); // HTTP/1.1 200 OK

        // Read the home page to confirm that we are logged in
        // System.out.println(readHtml("http://www.renren.com/home"));
    }

    // Fetch the source of the given page and return it as a string
    public String notify(String url) {
        HttpGet get = new HttpGet(url);
        //get.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; QQPinyin 730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)");
        ResponseHandler<String> responseHandler = new BasicResponseHandler();
        String txt = null;
        try {
            txt = httpClient.execute(get, responseHandler);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            get.abort();
        }
        return txt;
    }

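    // Send a POST request and return the response. The POST parameters must be
    // set on the HttpPost by the caller before it is passed in.
    // Note: the finally block aborts the request, so the response body may no longer
    // be readable afterwards; rely only on the status line and headers (e.g. Location).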
    public HttpResponse postMethod(HttpPost post) {
        HttpResponse resp = null;
        try {
            resp = httpClient.execute(post);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            post.abort();
        }
        return resp;
    }

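    // Send a GET request and return the response. The request is aborted in finally,
    // so only the status line and headers of the result should be relied on.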
    public HttpResponse getMethod(String url) {
        HttpGet get = new HttpGet(url);
        HttpResponse resp = null;
        try {
            resp = httpClient.execute(get);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            get.abort();
        }
        return resp;
    }

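    // Apply MD5 three times (each round hashes the raw bytes of the previous digest)
    // and return the result as an uppercase hex string
    private String MD5_3(String plainText) {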

        StringBuffer buf = new StringBuffer("");

        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(plainText.getBytes());

            // first time
            byte b[] = md.digest();

            // Second Time
            b = md.digest(b);

            // Third Time
            b = md.digest(b);

            int i;
            for (int offset = 0; offset < b.length; offset++) {
                i = b[offset];
                if (i < 0) {
                    i += 256;
                }
                if (i < 16) {
                    buf.append("0");
                }
                buf.append(Integer.toHexString(i));
            }
            //System.out.println("32-char hex result: " + buf.toString()); // 32 hex characters (128-bit digest)
            //System.out.println("byte b[].size: " + b.length);
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buf.toString().toUpperCase();
    }

    // Single-round MD5, returned as an uppercase hex string
    private String MD5(String plainText) {

        StringBuffer buf = new StringBuffer("");

        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(plainText.getBytes());

            byte b[] = md.digest();

            int i;
            for (int offset = 0; offset < b.length; offset++) {
                i = b[offset];
                if (i < 0) {
                    i += 256;
                }
                if (i < 16) {
                    buf.append("0");
                }
                buf.append(Integer.toHexString(i));
            }
            //System.out.println("32-char hex result: " + buf.toString()); // 32 hex characters (128-bit digest)
            //System.out.println("byte b[].size: " + b.length);
        } catch (NoSuchAlgorithmException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buf.toString().toUpperCase();
    }

    private void SaveCookies(List<Cookie> cs) {
        if (cs.isEmpty()) {
            System.out.println("None");
        } else {
            for (int i = 0; i < cs.size(); i++) {
                cookies.put(cs.get(i).getName(), cs.get(i).getValue());
            }
        }
    }

    private void PrintCookies() {

        int i = 0;

        // View the cookie map as a set of key/value entries
        Set s = cookies.entrySet();

        // Walk through the entries with an iterator
        Iterator it = s.iterator();

        System.out.println("-------- Cookies begin ---------");
        while (it.hasNext()) {
            // Each element is a Map.Entry holding one cookie name and its value
            Map.Entry m = (Map.Entry) it.next();
            System.out.println(" " + i++ + " : [" + m.getKey() + "] = '" + m.getValue() + "'");
        }
        System.out.println("-------- Cookies end ---------");
    }
}
//Get Canada School List
//http://t.qq.com/asyn/schoolist.php?type=4&key=%E5%8A%A0%E6%8B%BF%E5%A4%A7&letter=&
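
The extraction step in the listing above (the title="(.*?)"> regex) can be tried offline without logging in. The following is only a minimal sketch: the class name SchoolTitles, the method extractTitles and the sample HTML are made up for illustration, and the real markup returned by schoolist.php may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original QQNotify class.
final class SchoolTitles {

    // Same pattern as in the listing: lazily capture the text between title=" and ">
    private static final Pattern TITLE = Pattern.compile("title=\"(.*?)\">");

    // Returns every captured title value in document order; empty list for null input.
    static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<String>();
        if (html == null) {
            return titles;
        }
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Invented sample markup, NOT real output from t.qq.com
        String sample = "<a href=\"#\" title=\"Peking University\">PKU</a>"
                      + "<a href=\"#\" title=\"Tsinghua University\">THU</a>";
        for (String title : extractTitles(sample)) {
            System.out.println(title);
        }
    }
}
```

Returning a list instead of printing inside the loop makes it easier to store the names later. The null check matters because the response string is null whenever the request failed.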





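Impressive. With this in place, fetching the universities and departments is the easy part; it basically comes down to GET and POST requests.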
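   Posted: 2010-12-02   Last edited: 2010-12-02
Could someone experienced add a piece of code that loads the JSON source data from http://mat1.gtimg.com/www/mb/js/mi.City_100831.js into a Java variable?

http://mat1.gtimg.com/www/mb/js/mi.City_100831.js is encoded in UTF-8.

Thanks in advance!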
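   Posted: 2010-12-02
yang02301 wrote:

Could someone experienced add a piece of code that loads the JSON source data from http://mat1.gtimg.com/www/mb/js/mi.City_100831.js into a Java variable?

Thanks in advance!



I'm not familiar with loading JSON data into Java variables. I meant to look into it last time, but other things got in the way.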