lxwt909
Learning Solr 5 with Yida: Bulk Indexing JSON Data


        Suppose you have a batch of JSON data like this:

 

[
  {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00},
  {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00},
  {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00},
  {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00},
  {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00},
  {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00},
  {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00},
  {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00},
  {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00},
  {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00},
  {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00},
  {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50},
  {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00},
  {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50},
  {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00},
  {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00},
  {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00}
]

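   Say you want to import this into Solr and index it. Solr's Web UI can do this directly: the left-hand menu has a Documents entry for importing documents (updating existing ones is supported too), and the plural "s" indicates that several documents can be imported in one batch, as shown in the screenshot: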

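 Document Type indicates the format of your document data: JSON, XML, CSV, and so on.

 Commit Within is the number of milliseconds within which Solr should commit the submitted documents (the commitWithin parameter). It sets a deadline for the automatic commit and is not a request timeout.

 Overwrite controls whether an incoming document replaces an existing document with the same unique key. With false, Solr skips that check and simply appends the document to the existing index.

 Boost sets the document's weight; the default is 1.0.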
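 As far as I understand the Solr 5 JSON update syntax, these three options can also be attached to an individual "add" command as commitWithin, overwrite and boost. The following is a minimal sketch under that assumption; the helper name build_add_command and the sample document are made up for illustration.

```python
import json


def build_add_command(doc, commit_within_ms=1000, overwrite=True, boost=1.0):
    """Wrap one document in a Solr JSON "add" command with per-command options."""
    return {
        "add": {
            "doc": doc,
            "commitWithin": commit_within_ms,  # ask Solr to commit within this many ms
            "overwrite": overwrite,            # replace a document with the same unique key
            "boost": boost,                    # index-time document weight (Solr 5 era option)
        }
    }


if __name__ == "__main__":
    # Hypothetical sample document, trimmed from the data set above.
    sample = {"id": "1", "name": "Red Lobster", "price": 33.00}
    print(json.dumps(build_add_command(sample), indent=2))
```

 The printed text is the kind of payload that would go into the Document(s) box when Document Type is set to Solr Command (raw XML or JSON).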

 If you only need to import a single JSON object, select JSON as the Document Type. Once you do, the Document(s) text box shows a sample document, as in the screenshot:

 You can also set Document Type to Solr Command (raw XML or JSON), but then the JSON must follow a specific format:

{
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
    "add": {
        "doc": {.......}
    },
   ............. and so on.
}

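    Here each {.......} is one of your Document objects and everything else is fixed syntax. This format makes up for the weakness of the JSON Document Type, which imports documents one at a time and is too slow for large data sets: a single command payload can carry many documents.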
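    Writing this wrapper by hand gets tedious for more than a few documents. Here is a rough sketch of generating it from an ordinary list of documents; the function name to_solr_add_commands is my own. Because a Python dict cannot hold the same key twice, the repeated "add" entries are assembled as text rather than built as a dict.

```python
import json


def to_solr_add_commands(docs):
    """Render docs as one JSON object with a repeated "add" key per document.

    A dict cannot contain duplicate keys, so the outer object is built by hand
    and only the individual documents are serialized with json.dumps.
    """
    entries = [
        '  "add": {"doc": %s}' % json.dumps(doc, ensure_ascii=False)
        for doc in docs
    ]
    return "{\n" + ",\n".join(entries) + "\n}"


if __name__ == "__main__":
    docs = [
        {"id": "1", "name": "Red Lobster", "tags": ["sea food", "sit down"]},
        {"id": "2", "name": "Red Lobster", "tags": ["sea food", "sit-down"]},
    ]
    print(to_solr_add_commands(docs))
```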

 

    If you need to import XML data, select XML as the Document Type, as in the screenshot:

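 Your XML data goes between the <doc></doc> tags. This option has the same drawback as the JSON Document Type: it imports only one document at a time. To import XML in bulk, you can again choose Solr Command (raw XML or JSON) as the Document Type, in which case the data should look something like this: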

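<add>
    <doc>
        <field name="id">xxxx</field>
        <field name="name">xxxxxxxx</field>
        <field name="age">xxxxxxxx</field>
    </doc>
    
    <doc>
        <field name="id">xxxx</field>
        <field name="name">xxxxxxxx</field>
        <field name="age">xxxxxxxx</field>
    </doc>

    <doc>
        <field name="id">xxxx</field>
        <field name="name">xxxxxxxx</field>
        <field name="age">xxxxxxxx</field>
    </doc>
    
    <!-- ............ and so on -->
</add>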
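
    For reference, Solr's XML update format carries each value in a <field name="..."> element inside <doc>, with a multi-valued field repeated once per value. A small sketch of producing such a message from Python dictionaries with the standard library follows; the function name to_solr_add_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(docs):
    """Build an <add> message with one <doc> per dict and one <field> per value."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list becomes several <field> elements that share the same name.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add_xml([{"id": "15", "name": "Starbucks", "tags": ["coffee", "breakfast"]}]))
```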

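    To update a document, send it again inside <add> with the same unique key and Solr replaces the stored copy; Solr's XML update format has no separate <update> element. Deletion works through <delete>. I covered this earlier when discussing post.jar; see "Learning Solr 5 with Yida: Mastering post.jar" for details. With that background in place, here is a walkthrough using JSON data. Suppose I have the following JSON data, as in the screenshot: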

     First, work out the fields from the JSON data and define them in the schema.xml configuration file, as in the screenshot:
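
     One way to take stock of the fields before editing schema.xml is to scan the sample documents. The sketch below is only an aid, not the schema used in this post: it reports each field's Python value type and whether any document stores it as a list, which would suggest multiValued="true". Choosing the actual Solr field types is left to you, and the name infer_fields is my own.

```python
def infer_fields(docs):
    """Map each field name to (value type name, multi_valued) across all docs."""
    fields = {}
    for doc in docs:
        for name, value in doc.items():
            is_list = isinstance(value, list)
            # For a non-empty list, look at its first element to get the value type.
            sample = value[0] if is_list and value else value
            seen_multi = fields[name][1] if name in fields else False
            fields[name] = (type(sample).__name__, is_list or seen_multi)
    return fields


if __name__ == "__main__":
    docs = [
        {"id": "1", "name": "Red Lobster", "tags": ["sea food", "sit down"], "price": 33.00},
        {"id": "15", "name": "Starbucks", "tags": ["coffee", "breakfast"], "price": 7.50},
    ]
    for name, (kind, multi) in infer_fields(docs).items():
        print(name, kind, "multiValued" if multi else "single")
```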

   Next, convert the plain JSON data into the format Solr understands, as shown:

{
	"add": {
		"doc": {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00}
	},
	"add": {
		"doc": {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00}
	},
	"add": {
		"doc": {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00}
	},
	"add": {
		"doc": {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00}
	},
	"add": {
		"doc": {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00}
	},
	"add": {
		"doc": {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00}
	},
	"add": {
		"doc": {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00}
	},
	"add": {
		"doc": {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00}
	},
	"add": {
		"doc": {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00}
	},
	"add": {
		"doc": {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00}
	},
	"add": {
		"doc": {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50}
	},
	"add": {
		"doc": {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00}
	},
	"add": {
		"doc": {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50}
	},
	"add": {
		"doc": {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00}
	},
	"add": {
		"doc": {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00}
	},
	"add": {
		"doc": {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00}
	}
}

    Then start Tomcat and proceed as in the screenshot:
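
    The same payload can also be submitted without the Web UI by posting it to the core's /update handler. The sketch below only shows the shape of such a request. The base URL (Tomcat's default port 8080 with a /solr context) and the core name core1 are assumptions; substitute the values from your own deployment.

```python
import urllib.request


def build_update_request(payload, base_url="http://localhost:8080/solr", core="core1"):
    """Prepare a POST to <base_url>/<core>/update that commits right away.

    base_url and core are placeholders, not values taken from this post.
    """
    url = "%s/%s/update?commit=true" % (base_url.rstrip("/"), core)
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request):
    """Send the prepared request and return (HTTP status, response body)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

    With Solr running, send_update(build_update_request(payload)) submits the converted JSON text held in payload.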

 

    After submitting, run a query, as in the screenshot:
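
    The check can be done over HTTP as well, through the core's /select handler. A small sketch that builds such a query URL follows; as before, the base URL and the core name core1 are assumptions rather than values from this post.

```python
from urllib.parse import urlencode


def build_select_url(query, base_url="http://localhost:8080/solr", core="core1", rows=10):
    """Build a /select URL that returns up to `rows` matching documents as JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/%s/select?%s" % (base_url.rstrip("/"), core, params)


if __name__ == "__main__":
    # Look up the Starbucks entries from the sample data by name.
    print(build_select_url("name:Starbucks"))
```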


   Pay attention to the Document Type option: if you select JSON here, you will get an exception like this one, as in the screenshot:

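    The configuration and test data for this example are in the attachments below. If you run into any problems along the way, feel free to contact me. Java developers of all levels are also welcome to join the group to discuss and learn together.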

   Yida's QQ:                7-3-6-0-3-1-3-0-5

 

   Yida's QQ group:      1-0-5-0-9-8-8-0-6