Heritrix Study Notes 1: Heritrix defined codes

wangwei3


Blog categories: heritrix, thread

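This post is the blog author's own translation; please credit the source when reposting. If anything is mistranslated, please point it out so it can be corrected. Thanks.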

1 Successful DNS lookup

0 Fetch never tried (perhaps protocol unsupported or illegal URI)

-1 DNS lookup failed

-2 HTTP connect failed

-3 HTTP connect broken

-4 HTTP timeout (before any meaningful response received)

-5 Unexpected runtime exception; see runtime-errors.log

-6 Prerequisite domain-lookup failed, precluding fetch attempt (the DNS lookup for the host did not succeed)

-7 URI recognized as unsupported or illegal

-8 Multiple retries all failed, retry limit reached (the retry limit is configurable)

-50 Temporary status assigned to URIs awaiting preconditions (such as DNS); appearance in logs may be a bug

-60 Failure status assigned to URIs which could not be queued by the Frontier, i.e. the scheduler (and may in fact be unfetchable)

-61 Prerequisite robots.txt-fetch failed, precluding a fetch attempt

-62 Some other prerequisite failed, precluding a fetch attempt

-63 A prerequisite (of any type) could not be scheduled, precluding a fetch attempt

-3000 Severe Java 'Error' conditions (OutOfMemoryError, StackOverflowError, etc.) during URI processing.
-4000 'chaff' detection of traps/content of negligible value applied
-4001 Too many link hops away from seed
-4002 Too many embed/transitive hops away from last URI in scope
-5000 Out of scope upon reexamination (only happens if scope changes during crawl)
-5001 Blocked from fetch by user setting
-5002 Blocked by a custom processor
-5003 Blocked due to exceeding an established quota
-5004 Blocked due to exceeding an established runtime
-6000 Deleted from Frontier by user
-7000 Processing thread was killed by the operator (perhaps because of a hung condition)
-9998 Robots.txt rules precluded fetch
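
The list above can be turned into a small lookup table for annotating log output. The following is only a minimal sketch: the class name HeritrixCodes and the method describe are hypothetical helpers written for this note and are not part of Heritrix itself, and treating 100 to 599 as plain HTTP statuses is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Heritrix class): maps the codes listed above to their descriptions.
final class HeritrixCodes {
    private static final Map<Integer, String> DESCRIPTIONS = new HashMap<>();

    static {
        DESCRIPTIONS.put(1, "Successful DNS lookup");
        DESCRIPTIONS.put(0, "Fetch never tried");
        DESCRIPTIONS.put(-1, "DNS lookup failed");
        DESCRIPTIONS.put(-2, "HTTP connect failed");
        DESCRIPTIONS.put(-3, "HTTP connect broken");
        DESCRIPTIONS.put(-4, "HTTP timeout");
        DESCRIPTIONS.put(-5, "Unexpected runtime exception");
        DESCRIPTIONS.put(-6, "Prerequisite domain-lookup failed");
        DESCRIPTIONS.put(-7, "URI recognized as unsupported or illegal");
        DESCRIPTIONS.put(-8, "Multiple retries all failed, retry limit reached");
        DESCRIPTIONS.put(-50, "Temporary status, awaiting preconditions");
        DESCRIPTIONS.put(-60, "Could not be queued by the Frontier");
        DESCRIPTIONS.put(-61, "Prerequisite robots.txt-fetch failed");
        DESCRIPTIONS.put(-62, "Some other prerequisite failed");
        DESCRIPTIONS.put(-63, "A prerequisite could not be scheduled");
        DESCRIPTIONS.put(-3000, "Severe Java 'Error' condition during URI processing");
        DESCRIPTIONS.put(-4000, "'chaff' detection applied");
        DESCRIPTIONS.put(-4001, "Too many link hops away from seed");
        DESCRIPTIONS.put(-4002, "Too many embed/transitive hops away from last URI in scope");
        DESCRIPTIONS.put(-5000, "Out of scope upon reexamination");
        DESCRIPTIONS.put(-5001, "Blocked from fetch by user setting");
        DESCRIPTIONS.put(-5002, "Blocked by a custom processor");
        DESCRIPTIONS.put(-5003, "Blocked due to exceeding an established quota");
        DESCRIPTIONS.put(-5004, "Blocked due to exceeding an established runtime");
        DESCRIPTIONS.put(-6000, "Deleted from Frontier by user");
        DESCRIPTIONS.put(-7000, "Processing thread was killed by the operator");
        DESCRIPTIONS.put(-9998, "Robots.txt rules precluded fetch");
    }

    private HeritrixCodes() {}

    // Returns the listed description, or a fallback for codes that are not in the table.
    static String describe(int code) {
        if (code >= 100 && code <= 599) {
            // Assumption of this sketch: values in this range are ordinary HTTP statuses.
            return "HTTP status " + code;
        }
        return DESCRIPTIONS.getOrDefault(code, "Unlisted code " + code);
    }

    public static void main(String[] args) {
        for (int code : new int[] {1, -1, -61, -9998, 404, 42}) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

A map is used here rather than an enum only to keep the sketch short; the descriptions are abbreviated from the list above.
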
HTTP codes
1xx Informational
100 Continue
101 Switching Protocols
2xx Successful
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content
3xx Redirection
300 Multiple Choices
301 Moved Permanently
302 Found
303 See Other
304 Not Modified
305 Use Proxy
307 Temporary Redirect
4xx Client Error
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
416 Requested Range Not Satisfiable
417 Expectation Failed
5xx Server Error
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
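
The five HTTP groups above are determined by the first digit of the code, so a status can be classified with an integer division. This is a minimal sketch; the class name HttpStatusClass and the method classify are made up for illustration, and anything outside 1xx to 5xx (including the negative Heritrix codes) is simply reported as "Unknown".

```java
// Hypothetical helper: names the class (1xx to 5xx) that an HTTP status code belongs to.
final class HttpStatusClass {
    private HttpStatusClass() {}

    static String classify(int code) {
        // Integer division keeps only the hundreds digit, e.g. 404 / 100 == 4.
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Successful";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {100, 200, 301, 404, 503, -1}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```
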


2010-07-13 20:06