
Solr: Introduction


Solr is Searching On Lucene w/Replication

Specifically, Solr is a scalable, ready-to-deploy enterprise search engine that’s optimized to search large volumes of text-centric data and return results sorted by relevance.

 

  • Scalable: Solr scales by distributing work (indexing and query processing) to multiple servers in a cluster.
  • Ready to deploy: Solr is open source, is easy to install and configure, and provides a preconfigured example to help you get started.
  • Optimized for search: Solr is fast and can execute complex queries in under a second, often in only tens of milliseconds.
  • Large volumes of documents: Solr is designed to deal with indexes containing many millions of documents.
  • Text-centric: Solr is optimized for searching natural-language text, like emails, web pages, resumes, PDF documents, and social messages such as tweets or blogs.
  • Results sorted by relevance: Solr returns documents in ranked order based on how relevant each document is to the user’s query.

Search engines like Solr are optimized to handle data exhibiting four main characteristics:

  • Text-centric
  • Read-dominant
  • Document-oriented
  • Flexible schema

You also want to consider which fields in your documents must be stored in Solr and which should be stored in another system, such as a database. A search engine isn’t the place to store data unless it’s useful for search or displaying results.

 

 

 

 Building a web-scale inverted index


It might surprise you that search engines like Google also use an inverted index for searching the web. In fact, the need to build a web-scale inverted index led to the invention of MapReduce.


MapReduce is a programming model that distributes large-scale data-processing operations across a cluster of commodity servers by formulating an algorithm into two phases: map and reduce. With its roots in functional programming, MapReduce was adapted by Google for building its massive inverted index to power web search.


With MapReduce, the map phase emits a (term, document ID) pair for each occurrence of a term in a document. Before the reduce phase runs, the pairs are sorted by term so that all term/docID pairs for a given unique term are sent to the same reducer process. The reducer then sums up the term frequencies for each term to generate the inverted index.
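The map, sort, and reduce steps described above can be sketched in a few lines of single-process Python. This is only an illustration of the idea, not how Google or Hadoop implement it: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`), the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per term occurrence (naive whitespace tokenizer)."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group doc IDs by term, standing in for the framework's sort/shuffle step."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Sum the occurrences per document to build the postings for one term."""
    freqs = defaultdict(int)
    for doc_id in doc_ids:
        freqs[doc_id] += 1
    return term, dict(sorted(freqs.items()))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce in sequence over a {doc_id: text} mapping."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in sorted(grouped.items()))


# Hypothetical sample documents, keyed by document ID.
docs = {1: "solr is fast", 2: "solr scales and solr searches"}
index = build_inverted_index(docs)
print(index["solr"])  # {1: 1, 2: 2}
```

Each entry in `index` maps a term to the documents containing it, along with the term frequency in each document. In a real cluster the map and reduce functions would run on different machines, and the grouping step would be handled by the framework rather than by an in-memory dictionary.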

 

 ------------------------------------------------------------------------------------------------------------------------------------

Diagram of the main components of Solr 4



 

 

 
