Nutch crawls successfully, but search returns no results: a fix

gstarwd

Blog category: Nutch

Tags: lucene, XML, Tomcat, programming, F#

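Nutch built the index successfully, and searching the index files directly with Lucene returns the expected results, but searching through Nutch returns nothing.

Solution: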

<property>
<name>searcher.dir</name>
<value>crawl</value>
</property>

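nutch-default.xml sets the default value of searcher.dir to crawl, so Nutch looks for the index under a directory named crawl. If your index files are not in that directory, the search returns no results.

To fix this, add the searcher.dir property to nutch-site.xml and set its value to the parent directory of the Nutch index directory.

For example:
My index is at F:/cygwin/home/nutch-1.0/crawled/index,
so I set the value to F:/cygwin/home/nutch-1.0/crawled
(Note: with a relative path I still got no results; after switching to an absolute path it worked, so I recommend trying an absolute path first.)
Then copy the property named searcher.dir from nutch-default.xml into
nutch-site.xml and change the content between <value></value>, as follows: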
<property>
<name>searcher.dir</name>
<value>F:/cygwin/home/nutch-1.0/crawled</value>
<description>
Path to root of crawl. This directory is searched (in
order) for either the file search-servers.txt, containing a list of
distributed search servers, or the directory "index" containing
merged indexes, or the directory "segments" containing segment
indexes.
</description>
</property>
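
The check described above (is searcher.dir set, is it absolute, and does it contain an index) can be sketched as a small Python script. This is only an illustration: the function names read_property and check_searcher_dir are made up for this example, and it assumes nutch-site.xml wraps its <property> elements in a <configuration> root element, as the stock Nutch config files do.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_property(config_file, name):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.parse(config_file).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def check_searcher_dir(config_file):
    """Return a list of problems with searcher.dir (an empty list means OK)."""
    value = read_property(config_file, "searcher.dir")
    if value is None:
        return ["searcher.dir is not set, so the default 'crawl' applies"]
    problems = []
    path = Path(value)
    # "Absolute" is judged by the platform running this script, so a
    # Windows-style path such as F:/... is only absolute on Windows.
    if not path.is_absolute():
        problems.append(f"'{value}' is relative; try an absolute path")
    # Entries named in the property's own <description>.
    markers = ("search-servers.txt", "index", "segments")
    if not any((path / marker).exists() for marker in markers):
        problems.append(f"'{value}' contains none of: {', '.join(markers)}")
    return problems
```

Calling check_searcher_dir on your nutch-site.xml returns an empty list when the configured directory looks usable, and a list of readable messages otherwise.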

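Now, to the point:
First, make sure the crawl itself completed correctly. A successful crawl creates five directories under the crawl directory: crawldb, index, indexes, linkdb, and segments.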
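
A quick way to confirm those five directories exist is a short Python check like the one below. It is a minimal sketch: the function name missing_crawl_dirs is invented for this example, and it only tests that the directories are present, not that their contents are valid.

```python
from pathlib import Path

# The five directories a completed crawl is expected to produce.
EXPECTED_DIRS = ("crawldb", "index", "indexes", "linkdb", "segments")


def missing_crawl_dirs(crawl_dir):
    """Return the expected subdirectories that are absent from crawl_dir."""
    root = Path(crawl_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

For the layout used in this post you would call missing_crawl_dirs("F:/cygwin/home/nutch-1.0/crawled"); an empty list means all five directories were found.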

Note: nutch-site.xml exists in two places. Since I did not know which one takes effect, I changed both: ~/nutch-1.0/conf/nutch-site.xml and ~tomcat/webapps/ROOT/WEB-INF/classes/nutch-site.xml

Excerpted from my answer on Sogou.
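
Editing both copies by hand is easy to get wrong, so here is a hedged Python sketch that writes the same searcher.dir value into every config file you pass it. The function names are invented for this example, and it assumes each file has a <configuration> root element. Rewriting with ElementTree drops XML comments and the stylesheet header, so the sketch keeps a .bak copy of each file first.

```python
import shutil
import xml.etree.ElementTree as ET


def set_searcher_dir(config_file, crawl_dir):
    """Set searcher.dir in one nutch-site.xml, adding the property if needed."""
    shutil.copy2(config_file, str(config_file) + ".bak")  # keep the original
    tree = ET.parse(config_file)
    root = tree.getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "searcher.dir":
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = crawl_dir
            break
    else:
        # No existing searcher.dir property: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "searcher.dir"
        ET.SubElement(prop, "value").text = crawl_dir
    tree.write(config_file, encoding="utf-8", xml_declaration=True)


def set_searcher_dir_everywhere(config_files, crawl_dir):
    """Apply the same searcher.dir value to every listed config file."""
    for config_file in config_files:
        set_searcher_dir(config_file, crawl_dir)
```

You would pass it the two nutch-site.xml locations on your own machine (the Nutch conf directory and the Tomcat webapp's WEB-INF/classes directory) together with the absolute crawl directory.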

2010-08-23 00:26
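Comment 1, matraxa, 2010-11-04

"Nutch built the index successfully, and searching the index files directly with Lucene returns the expected results."
How do you search the index files directly with Lucene? I want to do it programmatically, but I don't know the field names and so on, so my searches find nothing. Any pointers would be appreciated. Thanks.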
