Topic: Heritrix multithreading problem
Posted: 2007-11-16
Also, I have already changed the corresponding places in both the Heritrix.properties file and AbstractFrontier. I would appreciate it if you could take a look. Thanks.

    /*******************************************************************************
     * File description:
     *
     * Project name: WebCrawler
     * File name: ELFHashAssignmentPolicy.java
     * Package name: com.hotct.heritrixExt.common.frontier
     *
     * Created by: zhangzhenxin
     * Creation time: 03:50:01 PM
     * Creation date: 2007-10-30
     ******************************************************************************/
    package com.hotct.heritrixExt.common.frontier;

    import java.util.logging.Level;
    import java.util.logging.Logger;

    import org.apache.commons.httpclient.URIException;
    import org.archive.crawler.datamodel.CandidateURI;
    import org.archive.crawler.framework.CrawlController;
    import org.archive.crawler.frontier.HostnameQueueAssignmentPolicy;
    import org.archive.crawler.frontier.QueueAssignmentPolicy;
    import org.archive.net.UURI;
    import org.archive.net.UURIFactory;

    /**
     * Queue assignment policy that spreads non-DNS URIs over 100 queues,
     * keyed by the ELF hash of the full URI string.
     *
     * @author zhangzhenxin
     * @date 2007-10-30
     */
    public class ELFHashAssignmentPolicy extends QueueAssignmentPolicy {
        private static final Logger logger = Logger
                .getLogger(ELFHashAssignmentPolicy.class.getName());

        private static final String DEFAULT_CLASS_KEY = "default...";

        private static final String DNS = "dns";

        /**
         * Returns the queue key for the given URI: a host-based key for DNS
         * URIs, otherwise ELFHash(uri) % 100 as a decimal string.
         */
        @Override
        public String getClassKey(CrawlController controller, CandidateURI cauri) {
            String uri = cauri.getUURI().toString();
            String scheme = cauri.getUURI().getScheme();
            String candidate = null;
            try {
                if (scheme.equals(DNS)) {
                    if (cauri.getVia() != null) {
                        // Special handling for DNS: treat as being
                        // of the same class as the triggering URI.
                        // When a URI includes a port, this ensures
                        // the DNS lookup goes atop the host:port
                        // queue that triggered it, rather than
                        // some other host queue
                        UURI viaUuri = UURIFactory.getInstance(cauri.flattenVia());
                        candidate = viaUuri.getAuthorityMinusUserinfo();
                        // adopt scheme of triggering URI
                        scheme = viaUuri.getScheme();
                    } else {
                        candidate = cauri.getUURI().getReferencedHost();
                    }
                } else {
                    // Every other URI is bucketed by the hash of its full string
                    long hash = ELFHash(uri);
                    candidate = Long.toString(hash % 100);
                }

                if (candidate == null || candidate.length() == 0) {
                    candidate = DEFAULT_CLASS_KEY;
                }
            } catch (URIException e) {
                logger.log(Level.INFO,
                        "unable to extract class key; using default", e);
                candidate = DEFAULT_CLASS_KEY;
            }
            return candidate.replace(':', '#');
        }

        /** ELF hash of a string; the result is masked to a non-negative 31-bit value. */
        public static long ELFHash(String str) {
            long hash = 0;
            long x = 0;
            for (int i = 0; i < str.length(); i++) {
                hash = (hash << 4) + str.charAt(i);
                if ((x = hash & 0xF0000000L) != 0) {
                    hash ^= (x >> 24);
                    hash &= ~x;
                }
            }
            return (hash & 0x7FFFFFFF);
        }
    }
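To check the bucketing without running a crawl, the hash step of the policy above can be tried on its own. The following is only a minimal sketch: it copies the `ELFHash(uri) % 100` logic into a standalone class with no Heritrix dependencies, and the class name `ElfHashQueueDemo`, the helper `queueKey`, and the sample URLs are all made up for illustration. It leaves out the DNS branch entirely.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone demo; not part of Heritrix or of the posted class.
class ElfHashQueueDemo {
    // Same bucket count as the "hash % 100" in the posted policy.
    static final int QUEUE_COUNT = 100;

    // Same arithmetic as ELFHash in the post.
    static long elfHash(String str) {
        long hash = 0;
        for (int i = 0; i < str.length(); i++) {
            hash = (hash << 4) + str.charAt(i);
            long x = hash & 0xF0000000L;
            if (x != 0) {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }
        // Mask to 31 bits so the value is never negative.
        return hash & 0x7FFFFFFF;
    }

    // Queue key that the non-DNS branch of getClassKey would produce.
    static String queueKey(String uri) {
        return Long.toString(elfHash(uri) % QUEUE_COUNT);
    }

    public static void main(String[] args) {
        // Made-up URLs that all share one host.
        String[] uris = {
            "http://www.example.com/",
            "http://www.example.com/news/1.html",
            "http://www.example.com/news/2.html",
            "http://www.example.com/about.html",
            "http://www.example.com/images/logo.png"
        };
        // Count how many of the sample URLs land in each queue key.
        Map<String, Integer> perQueue = new TreeMap<>();
        for (String uri : uris) {
            String key = queueKey(uri);
            perQueue.merge(key, 1, Integer::sum);
            System.out.println(key + " <- " + uri);
        }
        System.out.println("distinct queues used: " + perQueue.size());
    }
}
```

If the sample URLs print different keys, the hash step itself is spreading a single host over several queues. In that case the remaining thing to verify would be whether the crawler actually loads this policy class, which this sketch cannot show.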
Posted: 2008-04-06
I've run into the same problem. Did the OP ever manage to solve it?