How to install Heritrix 3

eimhee


Category: heritrix

Tags: SVN, Eclipse, Ubuntu, Maven, Spring

Use SVN to check out the project from sourceforge.net: https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/trunk/heritrix3

Especially if you're customizing Heritrix (as seems to be the case from
setting up a dev environment), you should be basing your work off of
Heritrix 3.0.0/heritrix3 trunk (aka 'H3').

H3 is the main focus of our development going forward, and its
Spring-based configuration offers easier opportunities for incremental
extension.

It's also best to work from an SVN checkout, as the working source tree
has Eclipse project-support files (.project, .classpath) as used by the
Heritrix core team.

So my suggestions would be:

- discard any prior projects

- make sure your Eclipse install includes SVN and Maven support

- create a new project, SVN->"Checkout projects from SVN", using URL

https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/trunk/heritrix3

- attempt one Maven2 install build from that checkout, to trigger
population of your local M2_REPO with all necessary 3rd-party libraries

- if Eclipse seems not to recognize paths it should, try one or all of:
  - 'refresh' menupick on project
  - restarting Eclipse
  - toggling the 'build automatically' or 'clean...' options
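
For anyone who prefers the command line to the Eclipse wizard, the checkout and first Maven build suggested above might look roughly like the sketch below. The URL is the one quoted in this post and may have moved since; the local directory name `heritrix3`, the `DRY_RUN` switch, and the `run` and `h3_bootstrap` helpers are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch of the checkout-and-build steps; not an official Heritrix script.
# DRY_RUN defaults to 1, so the commands are only printed. Set DRY_RUN=0 to run them.
H3_URL="https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/trunk/heritrix3"
H3_DIR="${H3_DIR:-heritrix3}"   # assumed local working-copy directory
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

h3_bootstrap() {
    # Check out the H3 trunk, including the Eclipse .project/.classpath files.
    run svn checkout "$H3_URL" "$H3_DIR" || return 1
    run cd "$H3_DIR" || return 1
    # One install build populates the local Maven repository with third-party libraries.
    run mvn install
}

h3_bootstrap
```

With the default settings this only prints the three commands, so they can be reviewed before running the script again with `DRY_RUN=0`.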

These Ubuntu-centric notes from my colleague Steve may be helpful,
though they are still explicitly only regarding H1/H2:

https://webarchive.jira.com/wiki/display/~siznax/Heritrix+in+Eclipse

If anyone can verify/update these prior guides to work with H3, bringing
a developer from ground state to a working Eclipse H3 dev project,
that'd be greatly appreciated.


2010-08-23 18:22

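#6 IT民工% 2012-02-07

eimhee wrote:

IT民工% wrote:

Hi, have you implemented incremental crawling with H3? Could you share your experience?

heritrix3 already has an incremental crawling implementation (HistoryProcesser), but I simply use MySQL to store the information from previous crawls.

I can't find the HistoryProcesser you mentioned anywhere in 3.0.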

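#5 IT民工% 2012-02-02

eimhee wrote:

IT民工% wrote:

Hi, have you implemented incremental crawling with H3? Could you share your experience?

heritrix3 already has an incremental crawling implementation (HistoryProcesser), but I simply use MySQL to store the information from previous crawls.

Could we connect on QQ? 136899184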

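#4 eimhee 2012-02-01

IT民工% wrote:

Hi, have you implemented incremental crawling with H3? Could you share your experience?

heritrix3 already has an incremental crawling implementation (HistoryProcesser), but I simply use MySQL to store the information from previous crawls.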

#3 IT民工% 2012-01-11

Hi, have you implemented incremental crawling with H3? Could you share your experience?

#2 maskainv 2011-03-15

Missing:
----------
1) com.anotherbigidea:javaswf:jar:CVS-SNAPSHOT-1

Try downloading the file manually from the project website.

Then, install it using the command:
      mvn install:install-file -DgroupId=com.anotherbigidea -DartifactId=javaswf -Dversion=CVS-SNAPSHOT-1 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:

      mvn deploy:deploy-file -DgroupId=com.anotherbigidea -DartifactId=javaswf -Dversion=CVS-SNAPSHOT-1 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
        1) org.archive.heritrix:heritrix-commons:jar:3.0.0
        2) com.anotherbigidea:javaswf:jar:CVS-SNAPSHOT-1

2) org.archive.overlays:archive-overlay-commons-httpclient:jar:3.1

Try downloading the file manually from the project website.

Then, install it using the command:
      mvn install:install-file -DgroupId=org.archive.overlays -DartifactId=archive-overlay-commons-httpclient -Dversion=3.1 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:

      mvn deploy:deploy-file -DgroupId=org.archive.overlays -DartifactId=archive-overlay-commons-httpclient -Dversion=3.1 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
        1) org.archive.heritrix:heritrix-commons:jar:3.0.0
        2) org.archive.overlays:archive-overlay-commons-httpclient:jar:3.1

3) org.dnsjava:dnsjava:jar:2.0.3

Try downloading the file manually from the project website.

Then, install it using the command:
      mvn install:install-file -DgroupId=org.dnsjava -DartifactId=dnsjava -Dversion=2.0.3 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:

      mvn deploy:deploy-file -DgroupId=org.dnsjava -DartifactId=dnsjava -Dversion=2.0.3 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
        1) org.archive.heritrix:heritrix-commons:jar:3.0.0
        2) org.dnsjava:dnsjava:jar:2.0.3

4) org.archive.overlays:archive-overlay-commons-pool:jar:1.3

Try downloading the file manually from the project website.

Then, install it using the command:
      mvn install:install-file -DgroupId=org.archive.overlays -DartifactId=archive-overlay-commons-pool -Dversion=1.3 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:

      mvn deploy:deploy-file -DgroupId=org.archive.overlays -DartifactId=archive-overlay-commons-pool -Dversion=1.3 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
        1) org.archive.heritrix:heritrix-commons:jar:3.0.0
        2) org.archive.overlays:archive-overlay-commons-pool:jar:1.3

5) it.unimi.dsi:mg4j:jar:1.0.1

Try downloading the file manually from the project website.

Then, install it using the command:
      mvn install:install-file -DgroupId=it.unimi.dsi -DartifactId=mg4j -Dversion=1.0.1 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:

      mvn deploy:deploy-file -DgroupId=it.unimi.dsi -DartifactId=mg4j -Dversion=1.0.1 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
        1) org.archive.heritrix:heritrix-commons:jar:3.0.0
        2) it.unimi.dsi:mg4j:jar:1.0.1

----------
5 required artifacts are missing.
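
The five `mvn install:install-file` commands suggested in the report above can be collected into one loop, as in the rough sketch below. It assumes the jars have already been downloaded by hand; the `JAR_DIR` location, the `<artifactId>-<version>.jar` file naming, the `DRY_RUN` switch, and the `install_missing` helper are all hypothetical and need adjusting to the actual files.

```shell
#!/bin/sh
# Sketch: install the artifacts Maven reported as missing into the local repository.
# JAR_DIR and the <artifactId>-<version>.jar naming are assumptions, not Maven requirements.
JAR_DIR="${JAR_DIR:-/path/to/jars}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# groupId:artifactId:version triples copied from the error report
MISSING="
com.anotherbigidea:javaswf:CVS-SNAPSHOT-1
org.archive.overlays:archive-overlay-commons-httpclient:3.1
org.dnsjava:dnsjava:2.0.3
org.archive.overlays:archive-overlay-commons-pool:1.3
it.unimi.dsi:mg4j:1.0.1
"

install_missing() {
    for coord in $MISSING; do
        # Split "group:artifact:version" with POSIX parameter expansion.
        group=${coord%%:*}
        rest=${coord#*:}
        artifact=${rest%%:*}
        version=${rest#*:}
        set -- mvn install:install-file \
            "-DgroupId=$group" "-DartifactId=$artifact" \
            "-Dversion=$version" -Dpackaging=jar \
            "-Dfile=$JAR_DIR/$artifact-$version.jar"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$*"
        else
            "$@" || return 1
        fi
    done
}

install_missing
```

By default the script only prints the five commands; running it with `DRY_RUN=0` and a real `JAR_DIR` performs the installs and stops at the first failure.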

#1 maskainv 2011-03-15

Did the install work normally for you? On my side some dependencies cannot be downloaded, for example com.noelios.restlet-1.1.10.jar and com.noelios.restlet.ext.jetty-1.1.10.jar, among others.
