Original English source: DissectingTheNutchCrawler
When reposting this article, please credit the source: http://blog.csdn.net/pwlazy
Aside: net.nutch.util.NutchConf
If you have been reading the code along with our discussion, you may have noticed several "private static final" variables at the start of the "command" class definitions. For example, net.nutch.db.WebDBInjector has these definitions for DEFAULT_INTERVAL and NEW_INJECTED_PAGE_SCORE:
private static final byte DEFAULT_INTERVAL =
(byte)NutchConf.getInt("db.default.fetch.interval", 30);
private static final float NEW_INJECTED_PAGE_SCORE =
NutchConf.getFloat("db.score.injected", 2.0f);
The values are loaded by calls to net.nutch.util.NutchConf, which is, intuitively enough, a class that loads configuration files. It has two static variables, "List resourceNames" and "Properties properties". The class has several static methods to manipulate these variables. Here's a summary of its operations:
- "resourceNames" is initialized with the strings "nutch-default.xml" and "nutch-site.xml"
- "properties" is initially null
- A call to one of the "getXXX" methods results in a call to getProps(). If (properties == null), loadResource() is called successively with each value from "resourceNames".
- loadResource() loads each file, parses the XML, and sets values in "properties" per the config
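The lazy-loading pattern summarized above can be sketched in a few lines of Java. This is a minimal illustration, not the actual NutchConf source: the class name ConfSketch, the in-memory FAKE_RESOURCES map (standing in for real files), the loadCount counter, and the assumed XML layout of "property" elements containing "name" and "value" children are all hypothetical. Only the names resourceNames, properties, getProps(), loadResource(), getInt() and getFloat() come from the text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfSketch {
    // Hypothetical stand-in for config files: resource name -> XML text.
    // The XML layout and the values are assumptions made for this sketch.
    static final Map<String, String> FAKE_RESOURCES = new HashMap<>();
    // The two static variables described in the text.
    private static final List<String> resourceNames = new ArrayList<>();
    private static Properties properties; // initially null
    static int loadCount = 0; // illustration only: counts loadResource() parses

    static {
        FAKE_RESOURCES.put("nutch-default.xml",
            "<nutch-conf>"
            + "<property><name>db.default.fetch.interval</name><value>30</value></property>"
            + "<property><name>db.score.injected</name><value>2.0</value></property>"
            + "</nutch-conf>");
        FAKE_RESOURCES.put("nutch-site.xml",
            "<nutch-conf>"
            + "<property><name>db.default.fetch.interval</name><value>7</value></property>"
            + "</nutch-conf>");
        resourceNames.add("nutch-default.xml");
        resourceNames.add("nutch-site.xml");
    }

    // Every getXXX call goes through here; resources are parsed only once.
    private static synchronized Properties getProps() {
        if (properties == null) {
            Properties loaded = new Properties();
            for (String name : resourceNames) {
                loadResource(loaded, name);
            }
            properties = loaded;
        }
        return properties;
    }

    // Parses one resource and copies each name/value pair into the target.
    // In this sketch, a later resource overwrites keys set by an earlier one.
    private static void loadResource(Properties target, String name) {
        String xml = FAKE_RESOURCES.get(name);
        if (xml == null) {
            return; // resource not present: nothing to load
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String key = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                target.setProperty(key, value);
            }
            loadCount++;
        } catch (Exception e) {
            throw new RuntimeException("Cannot load resource " + name, e);
        }
    }

    public static int getInt(String name, int defaultValue) {
        String v = getProps().getProperty(name);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    public static float getFloat(String name, float defaultValue) {
        String v = getProps().getProperty(name);
        return v == null ? defaultValue : Float.parseFloat(v);
    }

    public static void main(String[] args) {
        System.out.println(getInt("db.default.fetch.interval", 30));
        System.out.println(getFloat("db.score.injected", 2.0f));
    }
}
```

Because getProps() only does work while "properties" is still null, the first getXXX call pays the parsing cost and later calls read from the cached Properties object. The second argument of each getter is the fallback used when no resource defines the key, which is the role played by 30 and 2.0f in the WebDBInjector definitions.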