nutch-1.2 combined with Hadoop for distributed search

Author: p_x1984 (Beijing)
Posted: 2011-07-13

nutch-1.2 combined with Hadoop for distributed search.

1. Several blogs already describe the configuration of Nutch distributed search in detail. If anything there is unclear, here is a link as well: <<Nutch distributed search configuration>>

2. What I mainly want to record here is a problem I ran into at work:

------0-------
------1-------
------2-------
------3-------
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
    at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:67)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1756)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:178)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:160)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:81)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:222)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:879)
    at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:574)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:658)
    at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:163)
    at org.apache.nutch.searcher.IndexSearcher.getDetails(IndexSearcher.java:110)
    at org.apache.nutch.searcher.LuceneSearchBean.getDetails(LuceneSearchBean.java:107)
    at org.apache.nutch.searcher.NutchBean.getDetails(NutchBean.java:359)
    at com.yichen.node.ThreadPoolTaskSearch.query(ThreadPoolTaskSearch.java:89)
    at com.yichen.node.ThreadPoolTaskSearch.query(ThreadPoolTaskSearch.java:59)
    at com.yichen.node.ThreadPoolTaskSearch.search(ThreadPoolTaskSearch.java:38)
    at com.yichen.node.ThreadPoolTaskSearch.run(ThreadPoolTaskSearch.java:130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
nutchBean closed。。。。
nutchBean closed。。。。
{indexNo=0, uniqueKey=35, su=null, post=IT工程师, company=卡斯柯信号有限公司北京分公司, salary=(0-0), type=job, updatetime=20110621}
no found result。。。。
{indexNo=0, uniqueKey=19, su=null, post=【知名合资IT企业】高级营销经理（安全）–CEN810, company=大连博科人才有限公司, salary=(0-0), type=job, updatetime=20110621}
{indexNo=0, uniqueKey=18, su=null, post=【知名合资IT企业】高级拓展经理（安全）–CEN811, company=大连博科人才有限公司, salary=(0-0), type=job, updatetime=20110621}
{indexNo=0, uniqueKey=20, su=null, post=【知名合资IT企业】高级规划经理（安全）–CEN809, company=大连博科人才有限公司, salary=(0-0), type=job, updatetime=20110621}
{indexNo=0, uniqueKey=21, su=null, post=理财产品销售专员（综合金融）, company=平安金融服务公司, salary=(4000-50000), type=job, updatetime=20110621}
{indexNo=0, uniqueKey=25, su=null, post=理财金融营销专员, company=平安金融服务公司, salary=(4000-50000), type=job, updatetime=20110620}
{indexNo=0, uniqueKey=28, su=null, post=金融产品理财专员, company=平安金融服务公司, salary=(5000-20000), type=job, updatetime=20110620}
{indexNo=0, uniqueKey=22, su=null, post=理财客户金融经理, company=平安金融服务公司, salary=(6001-8000), type=job, updatetime=20110620}
{indexNo=0, uniqueKey=24, su=null, post=理财金融专员, company=平安金融服务公司, salary=(3000-20000), type=job, updatetime=20110621}
{indexNo=0, uniqueKey=31, su=null, post=金融理财经理（综合金融）, company=平安金融服务公司, salary=(8001-10000), type=job, updatetime=20110620}

Cause: a single thread searching the distributed index works without problems; the error above appears only when several threads search at the same time. Because every search opens its own connections, too many are opened and they are not closed cleanly, which produces the error above.

Solution:

1. In the servlet's initialization, add:

    public void init(ServletConfig config) throws ServletException {
        try {
            // Fetch the configuration and the NutchBean once, from the servlet context.
            this.conf = NutchConfiguration.get(config.getServletContext());
            bean = NutchBean.get(config.getServletContext(), this.conf);
        } catch (IOException e) {
            throw new ServletException(e);
        }
        MAX_HITS_PER_PAGE = conf.getInt("searcher.max.hits.per.page", -1);
    }

2. Edit web.xml and add:

    <listener>
        <listener-class>org.apache.nutch.searcher.NutchBean$NutchBeanConstructor</listener-class>
    </listener>
    <servlet>
        <servlet-name>Cached</servlet-name>
        <servlet-class>org.apache.nutch.servlet.Cached</servlet-class>
    </servlet>

3. In your own servlet, pass in the NutchBean instance and the NutchConfiguration instance, so that the index is opened only once, at initialization.

Attachment: linux下如何配置分布式检索.pdf (40 KB)
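The idea behind step 3 (open once, share across worker threads, close only at shutdown) can be sketched without Nutch at all. The following is only an illustration under stated assumptions: `FakeSearchBean` and `SearchService` are hypothetical stand-ins invented for this example, not Nutch or Hadoop classes, and the "Filesystem closed" message is simulated. One plausible mechanism for the original error, not confirmed in the post, is that Hadoop's `FileSystem.get()` hands out a cached, shared instance, so closing it from one thread can surface as "Filesystem closed" in the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedBeanDemo {

    /** Hypothetical stand-in for NutchBean: counts opens and fails once closed. */
    static final class FakeSearchBean {
        static final AtomicInteger OPENS = new AtomicInteger();
        private volatile boolean closed = false;

        FakeSearchBean() {
            OPENS.incrementAndGet(); // simulates opening the index
        }

        String getDetails(int docId) {
            if (closed) {
                throw new IllegalStateException("Filesystem closed"); // simulated error
            }
            return "doc-" + docId;
        }

        void close() {
            closed = true;
        }
    }

    /** Owns one bean for its whole lifetime; workers borrow it and never close it. */
    static final class SearchService implements AutoCloseable {
        private final FakeSearchBean bean = new FakeSearchBean(); // opened exactly once

        /** Runs `tasks` lookups on `threads` workers and returns how many failed. */
        int runSearches(int threads, int tasks) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<Boolean>> results = new ArrayList<>();
                for (int i = 0; i < tasks; i++) {
                    final int docId = i;
                    results.add(pool.submit(() -> {
                        try {
                            return bean.getDetails(docId).equals("doc-" + docId);
                        } catch (IllegalStateException e) {
                            return false; // the shared bean was already closed
                        }
                    }));
                }
                int failures = 0;
                for (Future<Boolean> f : results) {
                    if (!f.get()) {
                        failures++;
                    }
                }
                return failures;
            } finally {
                pool.shutdown();
            }
        }

        /** Called once, at application shutdown, after all searches have finished. */
        @Override
        public void close() {
            bean.close();
        }
    }

    public static void main(String[] args) throws Exception {
        try (SearchService service = new SearchService()) {
            int failures = service.runSearches(8, 100);
            System.out.println("opens=" + FakeSearchBean.OPENS.get() + " failures=" + failures);
        }
    }
}
```

In a servlet container the role of `SearchService` would be played by whatever object holds the NutchBean obtained in `init`; the point of the sketch is only that per-request code never opens or closes the bean itself.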