
The IWorkloadStorable Interface


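One of the Spider's main jobs is to manage the list of sites it has already visited and the sites it still has to visit. This list is called the workload. An object that implements the IWorkloadStorable interface can store pages in the workload and hand them back out.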

The two most important methods:

public String assignWorkload(); // take a page out of the workload

public void addWorkload(String url); // put a page into the workload
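To make the roles of these two methods concrete, here is a minimal sketch of the loop a spider might run around them. SimpleWorkload, QueueWorkload and SpiderLoopSketch are hypothetical names invented for this example, not classes from the Bot package, and returning null when nothing is waiting is an assumption; the interface itself does not say what happens in that case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-method view of the workload, reduced to the calls discussed above.
interface SimpleWorkload {
    String assignWorkload();       // assumption: returns null when nothing is waiting
    void addWorkload(String url);
}

// Hypothetical FIFO store that ignores URLs it has already seen.
class QueueWorkload implements SimpleWorkload {
    private final Deque<String> waiting = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    @Override
    public String assignWorkload() {
        return waiting.poll(); // null if the queue is empty
    }

    @Override
    public void addWorkload(String url) {
        if (seen.add(url)) { // add() returns false for a URL that was already stored
            waiting.add(url);
        }
    }
}

class SpiderLoopSketch {
    // Takes URLs out of the workload until none are left and
    // returns them in the order they were handed out.
    static List<String> drain(SimpleWorkload workload) {
        List<String> visited = new ArrayList<>();
        String url;
        while ((url = workload.assignWorkload()) != null) {
            visited.add(url);
            // A real spider would download the page here and call
            // addWorkload() for every link it finds on that page.
        }
        return visited;
    }

    public static void main(String[] args) {
        SimpleWorkload workload = new QueueWorkload();
        workload.addWorkload("http://example.com/");
        workload.addWorkload("http://example.com/about");
        workload.addWorkload("http://example.com/"); // duplicate, ignored
        System.out.println(drain(workload));
    }
}
```

The point of the sketch is only the division of labour: addWorkload feeds newly discovered links in, and assignWorkload hands the next one out.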

Besides these, the interface declares a few other methods and a set of status constants:

package com.heaton.bot;

/**
 * This interface defines a class that can
 * be used to store a spider's workload.
 * The Bot package currently supports two
 * different workload stores:
 *
 * SpiderInternalWorkload - Stores the
 *   contents of the workload in memory.
 *
 * SpiderSQLWorkload - Stores the contents
 *   of the workload in an SQL database.
 */
public interface IWorkloadStorable {

  /**
   * A workload entry has a status of running
   * if the spider worker is opening or downloading
   * that page. This state usually goes to COMPLETE
   * or ERROR.
   */
  public static final char RUNNING = 'R';

  /**
   * Processing of this URL resulted in an
   * error.
   */
  public static final char ERROR = 'E';

  /**
   * This URL is waiting for a spider
   * worker to take it on.
   */
  public static final char WAITING = 'W';

  /**
   * This page is complete and should not
   * be redownloaded.
   */
  public static final char COMPLETE = 'C';

  /**
   * The status is unknown.
   */
  public static final char UNKNOWN = 'U';

  /**
   * Call this method to request a URL
   * to process. This method will return
   * a WAITING URL and mark it as RUNNING.
   * In effect, this is the "get workload" call.
   *
   * @return The URL that was assigned.
   */
  public String assignWorkload();

  /**
   * Add a new URL to the workload, and
   * assign it a status of WAITING.
   *
   * @param url The URL to be added.
   */
  public void addWorkload(String url);

  /**
   * Called to mark this URL as either
   * COMPLETE or ERROR.
   *
   * @param url The URL to complete.
   * @param error true - assign this workload a status of ERROR.
   * false - assign this workload a status of COMPLETE.
   */
  public void completeWorkload(String url, boolean error);

  /**
   * Get the status of a URL.
   *
   * @param url The URL to look up.
   * @return Returns either RUNNING, ERROR,
   * WAITING, or COMPLETE. If the URL
   * does not exist in the database,
   * the value of UNKNOWN is returned.
   */
  public char getURLStatus(String url);

  /**
   * Clear the contents of the workload store.
   */
  public void clear();
}
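As an illustration of what an implementation has to do, here is a minimal in-memory sketch. MapWorkload is a hypothetical class written for this post; it is not the package's SpiderInternalWorkload and makes no claim about how that class works. The interface is repeated in compact form only so that the sketch compiles on its own. Two behaviours are assumptions: assignWorkload returns null when no URL is waiting, and adding a URL that is already known leaves its status unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Interface repeated in compact form so this sketch compiles on its own.
interface IWorkloadStorable {
    char RUNNING = 'R';
    char ERROR = 'E';
    char WAITING = 'W';
    char COMPLETE = 'C';
    char UNKNOWN = 'U';

    String assignWorkload();
    void addWorkload(String url);
    void completeWorkload(String url, boolean error);
    char getURLStatus(String url);
    void clear();
}

// Hypothetical store: a single map from URL to status, kept in insertion order.
class MapWorkload implements IWorkloadStorable {
    private final Map<String, Character> status = new LinkedHashMap<>();

    @Override
    public synchronized String assignWorkload() {
        // Hand out the oldest WAITING URL and mark it RUNNING.
        for (Map.Entry<String, Character> entry : status.entrySet()) {
            if (entry.getValue() == WAITING) {
                entry.setValue(RUNNING);
                return entry.getKey();
            }
        }
        return null; // assumption: null signals that nothing is waiting
    }

    @Override
    public synchronized void addWorkload(String url) {
        // Assumption: a URL that is already known keeps its current status.
        status.putIfAbsent(url, WAITING);
    }

    @Override
    public synchronized void completeWorkload(String url, boolean error) {
        status.put(url, error ? ERROR : COMPLETE);
    }

    @Override
    public synchronized char getURLStatus(String url) {
        return status.getOrDefault(url, UNKNOWN);
    }

    @Override
    public synchronized void clear() {
        status.clear();
    }
}
```

The methods are synchronized because several spider workers may share one store. A linear scan in assignWorkload is fine for a sketch but would be slow for a large workload.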

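The other methods are not used as often, but the status constants are worth keeping in mind: they make it easier to reason about how a URL moves through the workload.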
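One way to use the status constants as a thinking aid is to write down the transitions that the interface's comments describe. The helper below is a hypothetical sketch, not part of the Bot package. Its constants copy the values from IWorkloadStorable, and treating COMPLETE and ERROR as final states is an assumption; the comments only say that a COMPLETE page should not be downloaded again.

```java
// Hypothetical helper; the constants mirror the values declared in IWorkloadStorable.
class StatusTransitions {
    static final char RUNNING = 'R';
    static final char ERROR = 'E';
    static final char WAITING = 'W';
    static final char COMPLETE = 'C';
    static final char UNKNOWN = 'U';

    // Returns true if the interface's comments describe a move from one status to the other.
    static boolean isDocumented(char from, char to) {
        switch (from) {
            case UNKNOWN:
                return to == WAITING;                  // addWorkload
            case WAITING:
                return to == RUNNING;                  // assignWorkload
            case RUNNING:
                return to == COMPLETE || to == ERROR;  // completeWorkload
            default:
                return false;                          // assumption: COMPLETE and ERROR are final
        }
    }
}
```

Read top to bottom, the switch is the life of a URL: unknown, then waiting, then running, then either complete or in error.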

That's OK!