A Guide to Information Retrieval
Organized by Hongfei Yan
Last updated on July 27, 2007
http://sewm.pku.edu.cn/IR-Guide.txt
Contents
Books
+ Finding Out About: Search Engine Technology from a Cognitive
Perspective (Belew, R.K., 2000)
http://www-cse.ucsd.edu/~rik/foa/
+ Foundations of Statistical Natural Language Processing
(C. Manning and H. Schutze, 1999)
+ Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979)
(full text)
http://www.dcs.gla.ac.uk/Keith/Preface.html
+ Information Retrieval: A Survey (Ed Greengrass, 2000)
http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
+ Information Retrieval: Data Structures & Algorithms
(Frakes, W. and Baeza-Yates, R., 1992)
http://www.dcc.uchile.cl/~rbaeza/iradsbook/irbook.html
+ Information Retrieval Interaction (Ingwersen, P., Taylor Graham, 1992)
http://www.db.dk/pi/iri/
+ Introduction to Information Retrieval
(Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, 2007)
http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
+ Managing Gigabytes: Compressing and Indexing Documents and Images,
2nd edition (Ian H. Witten, Alistair Moffat, and Timothy Bell, 1999)
+ Mining the Web: Discovering Knowledge from Hypertext Data
(Soumen Chakrabarti, 2003)
+ Modeling the Internet and the Web:
Probabilistic Methods and Algorithms
(Pierre Baldi, Paolo Frasconi, and Padhraic Smyth, 2003)
+ Modern Information Retrieval
(Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2000)
+ Readings in Information Retrieval.
(Sparck-Jones, K. and Willett, P., 1997)
+ Search Engine: Principle, Technology and Systems
搜索引擎-原理、技术与系统
(Xiaoming Li et al., 2005) (full text)
http://sewm.pku.edu.cn/book/dlbook.html
+ The Geometry of Information Retrieval
(C.J. van Rijsbergen, 2004)
http://ir.dcs.gla.ac.uk/GeometryOfIR/
+ The Turn: Integration of Information Seeking and Retrieval in Context
(Ingwersen, P., and Jarvelin, K., 2005)
+ TREC: Experiment and Evaluation in Information Retrieval
(Voorhees, E.M., and Harman, D.K., 2005)
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10667
Conferences and Workshops
+ CIKM: Conference on Information and Knowledge Management
http://www.csee.umbc.edu/cikm/
+ SIGIR: Special Interest Group on Information Retrieval
http://www.sigir.org/
+ SIGKDD: Knowledge Discovery and Data Mining
http://www.kdd.org/
+ World Wide Web
http://www.iw3c2.org/
+ SEWM: Symposium of Search Engine and Web Mining
全国搜索引擎和网上信息挖掘学术研讨会
http://net.pku.edu.cn/~sewm/
Courses
+ CMU Information Retrieval
http://nyc.lti.cs.cmu.edu/classes/11-741/ (Spring 2006)
Instructors: Jamie Callan and Yiming Yang
+ Cornell University The Structure of Information Networks (Spring 2006)
http://www.cs.cornell.edu/courses/cs685/2006sp/
Instructor: Jon Kleinberg
+ Peking University Web Based Information Architectures (Fall 2006)
http://net.pku.edu.cn/~wbia/
Instructors: Xiaoming Li, Jimin Wang, and Bo Peng
+ Stanford Univ. Text Information Retrieval and Web Mining (Autumn 2005)
http://www.stanford.edu/class/cs276/
Instructors: Christopher Manning and Prabhakar Raghavan
+ UIUC Introduction to Text Information Systems (Spring 2007)
http://sifaka.cs.uiuc.edu/course/410s07/
Instructor: ChengXiang Zhai
+ UMass Univ. Information retrieval course (Spring 2005)
http://ciir.cs.umass.edu/cmpsci646/
Instructor: James Allan
+ Washington Univ. Search Engines course
http://courses.washington.edu/lis544/
Evaluation Resources
+ CLEF: Cross-Language Evaluation Forum
http://clef.iei.pi.cnr.it/
+ CWIRF: Chinese Web Information Retrieval Forum
http://www.cwirf.org/
+ DUC: Document Understanding Conferences
http://duc.nist.gov/
+ INEX: INitiative for the Evaluation of XML Retrieval
http://inex.is.informatik.uni-duisburg.de/
+ NTCIR: NII-NACSIS Test Collection for IR Systems
http://research.nii.ac.jp/ntcir/
+ TREC: Text REtrieval Conference
http://trec.nist.gov/
Journals
+ Briefings in Bioinformatics (full text)
http://bib.oxfordjournals.org/archive/
+ Computational Linguistics, The MIT Press
http://mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=10
+ Data & Knowledge Engineering (DKE), Elsevier
http://www.elsevier.com/wps/find/journaldescription.cws_home/505608/description?navopenmenu=-2
+ D-Lib Magazine
http://www.dlib.org/
+ Information Processing Letters, Elsevier
http://www.elsevier.com/locate/issn/00200190
+ Information Processing and Management (IP&M), Elsevier
http://www.elsevier.com/locate/infoproman
+ Information Retrieval, Springer
http://www.springer.com/sgw/cda/frontpage/0,11855,3-0-70-35744790-detailsPage%253Djournal%257Cdescription%257Cdescription,00.html
+ Information Research
http://informationr.net/ir
+ International Journal on Digital Libraries, Springer
http://link.springer.de/link/service/journals/00799/index.htm
+ International Journal of Cooperative Information Systems (IJCIS),
World Scientific
http://ejournals.wspc.com.sg/ijcis/ijcis.shtml
+ International Journal on Document Analysis and Recognition, Springer
http://link.springer.de/link/service/journals/10032/index.htm
+ International Journal of Intelligent Systems, Wiley
http://www3.interscience.wiley.com/cgi-bin/jhome/36062
+ International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
http://ejournals.wspc.com.sg/ijufks/ijufks.shtml
+ Journal of the American Society for Information Science and Technology (JASIST), Wiley
http://www3.interscience.wiley.com/cgi-bin/jhome/76501873
+ Journal of Documentation (JDoc), Emerald
http://www.emeraldinsight.com/0022-0418.htm
+ Journal of Intelligent Information Systems (JIIS), Springer
http://www.wkap.nl/journalhome.htm/0925-9902
+ Knowledge and Information Systems (KAIS), Springer
http://link.springer.de/link/service/journals/10115/index.htm
+ Natural Language Engineering, Cambridge University Press
http://www.cambridge.org/journals/journal_catalogue.asp?mnemonic=NLE
+ Transactions On Information Systems (TOIS), ACM
http://www.acm.org/tois/
+ Transactions on Knowledge and Data Engineering (TKDE), IEEE
http://www.computer.org/tkde/
List Archives
+ SIG-IRList, http://www.sigir.org/sigirlist/index.html
Organizations and Special Interest Groups
+ Cambridge NLIP, http://www.cl.cam.ac.uk/Research/NL/
+ CMU LTI, http://www.lti.cs.cmu.edu/
+ DEC laboratories in Palo Alto, Calif.
+ Glasgow Information Retrieval Group, http://www.dcs.gla.ac.uk/ir/
+ Google Labs, http://labs.google.com/
+ Massachusetts CIIR, http://ciir.cs.umass.edu/
+ MSR Asia, Web Search & Data Mining Group
http://research.microsoft.com/wsm/
+ Stanford InfoLab, http://infolab.stanford.edu/
+ UIUC Information Retrieval Group, http://sifaka.cs.uiuc.edu/ir/
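+ PKU Tianwang Group, http://sewm.pku.edu.cn/
+ Institute of Computational Linguistics, Peking University,
http://icl.pku.edu.cn/
+ Fudan University Information Retrieval and Natural Language
Processing Group,
http://www.cs.fudan.edu.cn/mcwil/irnlp/
+ HIT Information Retrieval Group, http://ir.hit.edu.cn/
+ State Key Laboratory of Intelligent Technology and Systems,
Tsinghua University
http://www.csai.tsinghua.edu.cn/
#+ Large-Scale Content Computing Group, Chinese Academy of Sciences,
http://159.226.40.18/ (site unreachable)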
Researchers
+ Andrew McCallum,
http://www.cs.umass.edu/~mccallum/
+ ChengXiang Zhai, developing Lemur
http://www-faculty.cs.uiuc.edu/~czhai/
+ Gerard Salton
http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html
+ Karen Sparck Jones, developing IDF
http://www.cl.cam.ac.uk/users/ksj/
+ Keith van Rijsbergen
http://www.dcs.gla.ac.uk/~keith/
+ Jamie Callan,
http://www.cs.cmu.edu/~callan/
+ Jon Kleinberg, developing HITS
http://www.cs.cornell.edu/home/kleinber/
+ Li Xiaoming, developing Tianwang & Infomall
+ Nick Craswell, developing Terabyte Track
http://research.microsoft.com/~nickcr
+ Susan Dumais, developing LSI
http://research.microsoft.com/~sdumais/
+ Yiming Yang, developing text categorization
http://www.cs.cmu.edu/~yiming/
+ Stephen Robertson,
http://research.microsoft.com/users/robertson/
+ Tefko Saracevic
http://www.scils.rutgers.edu/~tefko/
+ W. Bruce Croft
http://ciir.cs.umass.edu/personnel/croft.html
Research-related Resources
+ http://www-faculty.cs.uiuc.edu/~czhai/research.html
Software
+ Apache Lucene: a full-featured text search engine library
http://lucene.apache.org/java/docs/index.html
+ Gate: a general architecture for text engineering
http://gate.ac.uk/
+ Lemur: A full-text search engine
http://www.lemurproject.org/
+ MG: A full-text search engine
http://www.math.utah.edu/pub/mg/
+ Porter Stemmer: English stemming algorithm
http://www.tartarus.org/martin/PorterStemmer/
+ Nutch: an open source web search engine
http://sourceforge.net/projects/nutch/
+ TSE: A Tiny Search Engine
http://sewm.pku.edu.cn/src/TSE/
---------------------
References:
[1] Information Retrieval Resources, http://www.sigir.org/resources.html
[2] http://ir.dcs.gla.ac.uk/resources.html
[3] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[4] Diekemar, Information Retrieval Links, Jan. 28, 1999.
http://web.syr.edu/~diekemar/ir.html
[5] Chen Hongbiao, Studying Information Retrieval Online, Nov. 1999.
http://159.226.40.18/freshman/resources/网上研习信息检索.doc
[6] Data Mining Research Institute, http://www.dmresearch.net/
[7] Speech and Natural Language Online, http://www.snlpinfo.com/index.php
[8] PKU SEWM Group, http://sewm.pku.edu.cn/
[9] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[10] http://icl.pku.edu.cn/member/lisujian/maincontent.htm
[11] http://www.cs.fudan.edu.cn/mcwil/irnlp/link.htm
[12] Robert Krovetz, A Guide to the Literature of Information Retrieval,
http://159.226.40.18/freshman/resources/guide-to-ir-lit.ps
[13] ACM Digital Library,
http://portal.acm.org/portal.cfm
http://acm.lib.tsinghua.edu.cn/acm/
[14] http://www.sigir.org/proceedings/Proc-Browse.html
[15] SIGIR,
http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES278&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[16] WWW, International World Wide Web Conference
http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES968&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[17] China Digital Journal Community, http://wanfang.calis.edu.cn/wf/szhqk/index.html
---------------------
More details are listed as follows
====================
CIIR
(The Center for Intelligent Information Retrieval,
University of Massachusetts, USA)
http://ciir.cs.umass.edu/
The Center for Intelligent Information Retrieval, a National Science
Foundation-created S/IUCRC Center, is one of the leading information retrieval
research labs in the world. The CIIR develops tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.
Glasgow Information Retrieval Group
http://www.dcs.gla.ac.uk/ir/
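The information retrieval research group at the University of Glasgow, UK,
led by Keith van Rijsbergen. The group gives equal weight to theory and
practice and aims to build efficient, novel, and successful multimedia
information retrieval systems for end users.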
The Information Retrieval Group led by Professor Keith van Rijsbergen has a
vigorous programme of research, based on both theory and experiment, aimed at
giving end-users novel, effective, and efficient access to the world of
multi-media information. The group, part of the Department of Computing Science,
University of Glasgow, has a strong research history in a wide area of
information retrieval research from theoretical modelling of the retrieval
process to advanced system building and to the user-oriented evaluation of
information retrieval systems. The group's interests also include many areas
of Web information retrieval such as link analysis, summarisation and the
development of novel interaction techniques (e.g., ostension, implicit feedback
and graphical visualisation). Our research preserves a strong emphasis on
the evaluation of interactive IR systems, and the group maintains strong links
with researchers in Human-Computer Interaction and Psychology.
------
Keith van Rijsbergen, http://www.dcs.gla.ac.uk/~keith/
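University of Glasgow, UK. A leading figure of the logical-inference school of
probabilistic IR, and author of the classic IR textbook Information Retrieval,
which focuses on probabilistic approaches to information retrieval.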
=====================
Cambridge NLIP Group
(Natural Language and Information Processing Group)
http://www.cl.cam.ac.uk/Research/NL/
Karen Sparck Jones, http://www.cl.cam.ac.uk/users/ksj/
Karen Sparck Jones has been one of the most influential figures in computing
since the 1950s. Her work on Information Retrieval and Natural Language
Processing has never been so central as it is today, with its implications for
search engine technology, the semantic web and even bioinformatics.
In 1972, Karen Sparck Jones published in the Journal of Documentation the paper
which defined the term weighting scheme now known as inverse document frequency (IDF).
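Winner of the 1988 Salton Award and another founder of the modern probabilistic
IR model. She has made substantial contributions to NLP, IR, and related
fields, and has done a great deal of organizational work. She now works at the
Computer Laboratory of the University of Cambridge, UK.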
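As a rough illustration of the IDF weighting mentioned above, here is a minimal
Python sketch. It assumes the common textbook form idf(t) = log(N / df(t)),
where N is the number of documents and df(t) is the number of documents
containing term t; the exact formulation in the 1972 paper may differ. The
function name and the toy documents are hypothetical.

```python
import math
from collections import Counter

def inverse_document_frequency(documents):
    """Map each term to log(N / df), assuming the textbook IDF form.

    N is the number of documents; df is the number of documents
    that contain the term at least once.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Toy collection (hypothetical): three already-tokenized documents.
docs = [
    ["information", "retrieval", "system"],
    ["information", "theory"],
    ["information", "retrieval", "evaluation"],
]
idf = inverse_document_frequency(docs)
```

In this toy collection a term that occurs in every document ("information")
gets weight 0, while a term that occurs in only one document ("theory") gets
the largest weight, which is the intuition behind the scheme.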
====================
LTI
CMU (Carnegie Mellon University) Language Technologies Institute,
http://www.lti.cs.cmu.edu/
The Language Technologies Institute (LTI) of the School of Computer Science at
Carnegie Mellon University conducts research and provides graduate education
in all aspects of language technology and information management. The LTI was
established in 1996, as an expansion of the Center for Machine Translation
(CMT).
------
Lemur Toolkit
Lemur is a collection of search engine algorithms and information retrieval
applications used for IR research, development and education. Lemur provides a
rich query language that supports search against simple texts, structured
(XML) texts, and texts annotated with part-of-speech, named-entity, and other
annotations used in NLP and text-mining applications. Lemur's search engines
comfortably support collections ranging from a few gigabytes to a few
terabytes of text. The software is distributed under open-source license, and
is used widely in the IR research community.
====================
Stanford InfoLab
http://infolab.stanford.edu/
The Stanford WebBase Project
http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/
The Stanford WebBase project is investigating various issues in crawling,
storage, indexing, and querying of large collections of Web pages. The project
builds on the previous Google activity that was part of the DLI1 initiative.
The DLI2 WebBase project aims to build the necessary infrastructure to
facilitate the development and testing of new algorithms for clustering,
searching, mining, and classification of Web content.
====================
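PKU Tianwang Group, http://sewm.pku.edu.cn/
The Network Laboratory of Peking University has carried out research and
system development on search engines since 1997. It has built up deep
technical expertise, and its overall strength and academic influence have
consistently been among the leading ones in China. The "Tianwang" search
engine developed by the group is the most influential campus-born search
engine in China and has run continuously since October 1997. Tianwang has
notable strengths in incremental search, fast retrieval, and massive
information storage. Its ongoing development has trained successive cohorts of
students with hands-on experience in processing massive volumes of Web text,
and they are well received by Chinese and foreign IT companies.
Since 2001, building on its search engine technology, the group has collected
and archived the history of information on the Chinese Internet, forming the
"China Internet Information Museum". It has so far collected 2 billion Chinese
web pages that appeared at different times, making it the largest system in
China for archiving and replaying historical web pages. The group has also
explored interdisciplinary research built on this archive.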
====================
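Large-Scale Content Computing Group, Chinese Academy of Sciences
http://159.226.40.18/
The information retrieval team focuses on text retrieval. It has taken part in
TREC many times and achieved strong results. The Tianluo retrieval system
developed by the team is widely used in many important national information
agencies. Current research directions include Web information acquisition and
Web information retrieval.
The information analysis team focuses on the analysis and mining of
large-scale, multi-source, heterogeneous information, including text
classification and clustering, information filtering, personalized services,
natural language question answering, and shallow natural language processing.
The team has built a series of experimental platforms for text processing,
which can be tried through the "Demos" section of the home page. Notably, the
team runs an open-source program, and its high-performance word segmentation
system ICTCLAS has been widely recognized and used by researchers.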
====================
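Fudan University Information Retrieval and Natural Language Processing Group,
http://www.cs.fudan.edu.cn/mcwil/irnlp/
Large-scale text processing research covers techniques and methods for
processing natural language, Chinese in particular, in two parts. The first is
foundational work, mainly basic theory and algorithms, including automatic
word segmentation, unknown-word recognition, part-of-speech and concept
tagging, syntactic analysis, and semantic analysis, as well as corpus
collection and curation. The second is applied Chinese information processing,
including automatic indexing, text retrieval, text summarization, text
classification, and text filtering, especially the application of these
techniques in networked environments. This second part is the research focus
of the text direction.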
====================
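HIT-IRLab, http://ir.hit.edu.cn/
The Information Retrieval Laboratory of Harbin Institute of Technology
(HIT-IRLab) was founded in March 2001. Its research covers text retrieval,
question answering, automatic summarization, text mining, and language
analysis. The lab treats language analysis as its basic research and text
filtering as its applied research. Information extraction extends language
analysis from sentence understanding to discourse understanding, and sentence
retrieval serves as an intelligent, precise retrieval technique supported by
language analysis and discourse understanding.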
====================
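SIGIR (ACM Special Interest Group on Information Retrieval)
TREC (Text REtrieval Conference)
MUC (Message Understanding Conference)
TIPSTER (the IR testbed program of the U.S. Defense Advanced Research
Projects Agency)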
====================
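Institute of Computational Linguistics, Peking University
http://icl.pku.edu.cn/
The Institute of Computational Linguistics at Peking University was founded in
1986. It works in three areas: computational linguistics theory, basic
resources for language information processing, and applied technology.
Centered on computational linguistics and natural language processing, its
work has three main directions. The first is research on and construction of
basic resources: computational lexicography and machine dictionaries,
comprehensive language knowledge bases, corpus linguistics and corpus
processing techniques, and terminology, automatic term extraction, and term
standardization. The second is basic theory and NLP models and methods:
foundations of computational linguistics, core NLP techniques, modern Chinese
grammar, lexical, syntactic, and semantic analysis of Chinese, statistical NLP
models, and information-theoretic methods for language processing. The third
is applied technology: methods, techniques, and system implementation for
machine translation, information retrieval and extraction, evaluation methods
and techniques for natural language processing systems, controlled Chinese and
its computer-aided writing systems, and computer-aided study of classical
Chinese poetry.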
====================
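State Key Laboratory of Intelligent Technology and Systems, Tsinghua University
http://www.csai.tsinghua.edu.cn/
The State Key Laboratory of Intelligent Technology and Systems is hosted by
Tsinghua University and opened in February 1990. It mainly conducts basic and
applied basic research on the fundamental principles and methods of artificial
intelligence, including intelligent information processing, machine learning,
intelligent control, and neural network theory. It also studies applied
technology and system integration related to artificial intelligence, mainly
intelligent robots and the processing of sound, graphics, images, text, and
language.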
================
Susan Dumais,
http://research.microsoft.com/~sdumais/
I am interested in algorithms and interfaces for improved information
retrieval, as well as general issues in human-computer interaction. I
joined Microsoft Research in July 1997. I work on a wide variety of
information access and management issues, including: personal information
management, web search, question answering, information retrieval, text
categorization, collaborative filtering, interfaces for improved search and
navigation, and user/task modeling.
===============
UIUC Information Retrieval Group
http://sifaka.cs.uiuc.edu/ir/
The Information Retrieval (IR) group is part of the Database and Information
Systems (DAIS) Lab of the Computer Science Department at University of
Illinois at Urbana-Champaign. We work on a wide spectrum of problems in the
general area of text information management, including retrieval,
organization, filtering, and mining of textual information, aiming at
developing advanced text information management techniques and systems that
help people make better use of text information.
------
ChengXiang Zhai,
http://www-faculty.cs.uiuc.edu/~czhai/
Research Interests: Information Retrieval, Text Mining, Natural Language
Processing, Bioinformatics
===============
Stephen Robertson,
http://research.microsoft.com/users/robertson/
Stephen Robertson joined Microsoft Research Cambridge in April 1998.
In 1998, he was awarded the Tony Kent STRIX award by the Institute of
Information Scientists. In 2000, he was awarded the Salton Award by ACM SIGIR.
He is a Fellow of Girton College, Cambridge.
===================
Nick Craswell
http://research.microsoft.com/~nickcr
I am an associate researcher at Microsoft Research Cambridge, in the
Information Retrieval and Analysis Group.
Research Overview
===============
Web Search & Data Mining Group of MSR Asia
http://research.microsoft.com/wsm/
The goal of the Web Search & Data Mining Group of MSR Asia is to drive the
next generation of Web search by leveraging data mining, machine learning, and
knowledge discovery techniques for information analysis, organization,
retrieval, and visualization. In addition, in contrast with current Web search
methods, which essentially do document-level ranking and retrieval, the Web
Search & Data Mining Group has created search at the object level to bring
increased knowledge and intelligence to users.
A Glimpse at Several Core Innovations:
Large-scale Experimental Web Search Platform
The Web Search & Data Mining Group is creating a large scale search platform
to efficiently store, parse, index and search billions of Web pages and other
types of documents. The search platform is flexible enough to allow for
testing of various state-of-the-art search techniques that have been created
at the lab using new technologies.
------
Wei-Ying Ma
http://research.microsoft.com/users/wyma/
Senior Researcher, Research Manager, Microsoft Research Asia
====================
Google Labs
http://labs.google.com/
http://labs.google.com/papers/index.html
Passionate about these topics? You should work at Google.
http://www.google.com/press/podium.html
Google Press Center: The Google Podium
Here you'll find a selection of public presentations made by Google
executives. From time to time, we will continue to add transcripts, audio or
video clips and links to presentations hosted elsewhere.
====================
Jon Kleinberg
http://www.cs.cornell.edu/home/kleinber/
Professor of Computer Science, Cornell University
My research is concerned with algorithms that exploit the combinatorial
structure of networks and information. My recent work has included
* link analysis and modeling of the World Wide Web and related information networks;
* discrete optimization and network algorithms; and
* algorithmic approaches to clustering, indexing, and data mining.