Migrating a Blog with commons-httpclient and htmlparser

fangwei

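I moved the articles I had collected on Baidu Space over to JavaEye. The main libraries involved were commons-httpclient and htmlparser, and the key code snippets are recorded here.

JAR list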

commons-codec-1.3.jar
commons-httpclient-3.1.jar
commons-lang.jar
commons-logging-1.1.jar
htmlparser.jar
log4j-1.2.15.jar
slf4j-api-1.5.8.jar
slf4j-log4j12-1.5.8.jar

Extend org.apache.commons.httpclient.HttpClient and override its executeMethod method to handle cookies

package util;

import java.io.IOException;

import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpState;

public class HttpClientEx extends HttpClient {

    private HttpState httpState = new HttpState(); // HTTP state object; mainly holds cookies
    private String cookie = "";

    @Override
    public int executeMethod(HttpMethod httpMethod) throws IOException, HttpException {
        // Replay the cookie text captured from earlier responses, if there is any
        String cookie = this.getCookie();
        if (!cookie.equals("")) {
            String host = httpMethod.getURI().getHost();
            httpState.addCookie(new Cookie(host, "cookie", cookie, "/", null, false));
        }
        this.setState(httpState);

        int status = super.executeMethod(httpMethod);

        // Append the value of every Set-Cookie response header to the stored cookie text
        Header[] headerArray = httpMethod.getResponseHeaders();
        for (Header h : headerArray) {
            if (h.getName().trim().equalsIgnoreCase("Set-Cookie")) {
                if (!this.getCookie().equals("")) { // already holding a value: append
                    this.setCookie(this.getCookie() + ";" + h.getValue());
                } else {
                    this.setCookie(h.getValue());
                }
            }
        }
        return status;
    }

    public String getCookie() {
        return cookie;
    }

    public void setCookie(String cookie) {
        this.cookie = cookie;
    }

}
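
The class above keeps the raw Set-Cookie values, attributes such as Path and Expires included. As a rough illustration of the same idea, the sketch below keeps only the name=value part of each Set-Cookie value and renders a Cookie request header from it. It uses only the JDK; the class name CookieHeaderSketch and its methods are hypothetical and are not part of commons-httpclient.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, JDK only: not part of commons-httpclient.
class CookieHeaderSketch {

    /** Stores the name=value pair of one Set-Cookie header value in the jar. */
    static void store(Map<String, String> jar, String setCookieValue) {
        // The first ';'-separated segment is the cookie itself; the rest are attributes.
        String pair = setCookieValue.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return; // no usable name=value pair
        }
        // A later cookie with the same name replaces the earlier value.
        jar.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }

    /** Renders the jar as the value of a Cookie request header. */
    static String toCookieHeader(Map<String, String> jar) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : jar.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> jar = new LinkedHashMap<>();
        store(jar, "SID=abc123; Path=/; HttpOnly");
        store(jar, "lang=zh; Expires=Wed, 05 Aug 2009 13:10:00 GMT");
        store(jar, "SID=def456; Path=/");
        System.out.println(toCookieHeader(jar));
    }
}
```

Keying the jar by cookie name means a refreshed session id overwrites the old one instead of being appended next to it.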

GET a URL

String url = HTTP_HI_BAIDU_COM + USER_ID + "/blog";
HttpClient client = new HttpClientEx();
GetMethod getMethod = new GetMethod(url);
client.executeMethod(getMethod);
String body = new String(getMethod.getResponseBody(), getMethod.getResponseCharSet());
getMethod.releaseConnection(); 
logger.debug("Blog post list page\n{}", body);
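
The snippet above decodes the body with the charset reported by getResponseCharSet(). As far as I know, HttpClient 3.x takes that value from the Content-Type response header and falls back to ISO-8859-1 when the header names no charset. The JDK-only sketch below imitates that lookup to show why a page served without a charset can come out garbled. ResponseCharsetSketch and its methods are hypothetical names, not library API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, JDK only: imitates a Content-Type charset lookup.
class ResponseCharsetSketch {

    static final String DEFAULT_CHARSET = "ISO-8859-1"; // assumed fallback

    /** Returns the charset parameter of a Content-Type value, or the default. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return DEFAULT_CHARSET;
    }

    /** Decodes the raw body bytes with the charset named in the Content-Type value. */
    static String decode(byte[] body, String contentType) {
        return new String(body, Charset.forName(charsetOf(contentType)));
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes so that the
        // source file encoding does not matter.
        byte[] body = "\u535a\u5ba2".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=UTF-8")); // decoded correctly
        System.out.println(decode(body, "text/html"));                // garbled by the fallback
    }
}
```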

Parse the div elements in the HTML page

Parser parser = Parser.createParser(body, getMethod.getResponseCharSet());
NodeFilter filter = new TagNameFilter("div");
NodeList nodeList = parser.parse(filter);
for (int i = 0; i < nodeList.size(); i++) {
   Div div = (Div) nodeList.elementAt(i);
   if ("m_blog".equals(div.getAttribute("id"))) {
     logger.debug("Content of the div with id m_blog\n{}", div.toHtml());
   }
}

Find the set of nodes that contain specific text

NodeList searchFor = div.searchFor("类别");

Set the User-Agent and the character encoding of POST data

private static final String USER_AGENT = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; iCafeMedia; InfoPath.2)";
private static final String CHARSET = "UTF-8";

HostParams params = new HostParams();
params.setParameter(HttpMethodParams.USER_AGENT,USER_AGENT);
params.setParameter(HttpMethodParams.HTTP_CONTENT_CHARSET, CHARSET);
client.getHostConfiguration().setParams(params);
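
The content charset matters because form fields are percent-encoded from the bytes of that charset, and HttpClient 3.x defaults to ISO-8859-1 as far as I know. The JDK-only sketch below shows the effect on a Chinese field value using java.net.URLEncoder. It is an illustration of the encoding step, not the code path commons-httpclient itself runs. FormEncodingSketch and the "category" field are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, JDK only: shows how the charset changes an encoded form body.
class FormEncodingSketch {

    /** Builds an application/x-www-form-urlencoded body in the given charset. */
    static String encodeForm(Map<String, String> fields, String charset) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), charset))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), charset));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "fangwei");
        fields.put("category", "\u7c7b\u522b"); // two Chinese characters as Unicode escapes
        System.out.println(encodeForm(fields, "UTF-8"));      // characters survive as UTF-8 bytes
        System.out.println(encodeForm(fields, "ISO-8859-1")); // characters collapse to %3F ('?')
    }
}
```

With ISO-8859-1 the two characters cannot be represented and are replaced by question marks, so the server never sees the original text.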

POST to a URL

String url = HOST + "/login";
PostMethod postMethod = new PostMethod(url);
postMethod.setParameter("name", "fangwei");
postMethod.setParameter("password", "******");
client.executeMethod(postMethod);

2009-08-05 21:10