Elasticsearch Notes

zhangcaiyanbeyond


https://ruby-china.org/topics/32428

UI tool: http://localhost:9200/_plugin/head/

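In ES, to test whether a field is empty you must use the missing filter. For example, wrapped in must_not it matches documents where the field has a value:

"must_not":[{"missing":{"field":"gongqiu.tags"}}]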
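
As a minimal sketch of how such a clause might be assembled in Ruby, the helper below builds either form of the check. The method name field_presence_filter and its present: keyword are hypothetical, introduced only for illustration; the filter structure itself is the one shown above.

```ruby
require 'json'

# Hypothetical helper (not part of any library): wraps the `missing` filter
# in a bool filter.
#   present: true  -> must_not missing, i.e. the field has a value
#   present: false -> must missing, i.e. the field is empty
def field_presence_filter(field, present:)
  clause = present ? 'must_not' : 'must'
  { 'bool' => { clause => [{ 'missing' => { 'field' => field } }] } }
end

puts JSON.generate(field_presence_filter('gongqiu.tags', present: true))
```

Passing present: false yields the "field is empty" variant of the same filter.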

Configuration: default analyzer for both searching and indexing:   index.analysis.analyzer.default.type : "mmseg"


Number of shards per index:   index.number_of_shards: 5
Number of replicas per index:   index.number_of_replicas: 1

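type: the field type
as: "company.id" (Tire option: the expression that supplies the field's value)
analyzer: the analyzer to use
boost: query weight
char_filter => ["html_strip"]: strips HTML markup
source:    includes lists the fields to create automatically at index time; excludes leaves out the listed fields and creates all the others automatically
parent:    http://log.medcl.net/item/2011/10/diving-into-elasticsearch-8-parent-child-feature-uses/
ttl:    how long a document lives; once that time has passed the document is deleted. If no default is specified, it is never deleted
index: whether and how the field is analyzed. Possible values are not_analyzed, no, analyzed, analyzed_no_norms and not_analyzed_no_norms:
    not_analyzed: indexed without analysis;
    no: not indexed, so the field cannot be searched;
    analyzed: indexed through an analyzer such as mmseg;
    analyzed_no_norms: analyzed and indexed, without storing norms;
    not_analyzed_no_norms: indexed without analysis, without storing norms.
    Norms hold the information that boost needs. The default is the analyzer named in the configuration file; if none is specified, the field is not analyzed.
store: yes, no or compress. yes stores the field, no does not, and compress stores it compressed, which suits long text or binary data at a cost in performance
term_vector: http://blog.sina.com.cn/s/blog_6d01e21a0101iq4z.html
include_in_all: whether the field is included in the _all field; included by default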
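
To make the options above concrete, here is a rough sketch of a mapping body expressed as a Ruby hash and serialized to JSON. The type name job and the field names title, status and notes are assumptions made up for this example; only the option names (type, analyzer, boost, index, include_in_all) come from the list above.

```ruby
require 'json'

# Hypothetical mapping for an imaginary "job" type; every field name here
# is illustrative and not taken from the original notes.
def job_mapping
  {
    'job' => {
      'properties' => {
        # Analyzed with mmseg and weighted higher at query time.
        'title'  => { 'type' => 'string', 'analyzer' => 'mmseg', 'boost' => 2.0 },
        # Indexed as a single term, with no analysis.
        'status' => { 'type' => 'string', 'index' => 'not_analyzed' },
        # Not indexed at all and kept out of _all.
        'notes'  => { 'type' => 'string', 'index' => 'no', 'include_in_all' => false }
      }
    }
  }
end

puts JSON.pretty_generate(job_mapping)
```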

Analyze a piece of text with a specific analyzer:
http://218.240.21.106:9200/jobs/_analyze?analyzer=mmseg&text=%E4%B8%9C%E8%BD%AF
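
The text parameter in that URL is the percent-encoded form of 东软. A small sketch of building such a URL with Ruby's standard library follows; the method name analyze_url and the localhost base URL are assumptions for illustration.

```ruby
require 'uri'

# Hypothetical helper: builds an _analyze URL, percent-encoding the text
# so that non-ASCII input such as Chinese survives the trip.
def analyze_url(base, index, analyzer, text)
  query = URI.encode_www_form('analyzer' => analyzer, 'text' => text)
  "#{base}/#{index}/_analyze?#{query}"
end

puts analyze_url('http://localhost:9200', 'jobs', 'mmseg', "\u4e1c\u8f6f")
```

The escaped string "\u4e1c\u8f6f" is 东软, so the query string matches the one in the URL above.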

query

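Once you use search, you must supply at least a query parameter; all further retrieval builds on that query. The main kinds of query are:

    "match_all" : { } requests everything;
    "term"/"text"/"prefix"/"wildcard" : { "key" : "value" } searches by string (exact match / fragment / prefix / wildcard pattern);
    terms: exact match against several keywords, e.g.   curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"terms": {"name": ["zhang", "zhangcaiyan"]}}}'
    term is an exact-match query: if I search for 中国人民, it searches with 中国人民 exactly as given
    text is a fragment query: if I search for the same 中国人民, the input is analyzed first, so it may search with the resulting tokens (for example 中国 and 人民) instead of the whole string
    query_string: described further below.

    "range" : { "@timestamp" : { "from" : "now-1d", "to" : "now" } } searches by range. If the field type is a date, the built-in now stands for the current time and -1d/h/m/s steps back from it. Parameters: from, to, include_lower, include_upper, gt, gte, lt, lte.
    "range" : { "age" : { "from" : 10, "to" : 20 } }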
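
A minimal sketch of assembling such a range query body in Ruby, using the from/to/include_lower/include_upper parameters listed above. The method name range_query is hypothetical.

```ruby
require 'json'

# Hypothetical builder for a range query body. Bounds that are nil are
# left out, so an open-ended range only carries one side.
def range_query(field, from: nil, to: nil, include_lower: true, include_upper: true)
  bounds = {}
  unless from.nil?
    bounds['from'] = from
    bounds['include_lower'] = include_lower
  end
  unless to.nil?
    bounds['to'] = to
    bounds['include_upper'] = include_upper
  end
  { 'query' => { 'range' => { field => bounds } } }
end

# Numeric range, both ends inclusive.
puts JSON.generate(range_query('age', from: 10, to: 20))
# Date range covering the last day, using the built-in now.
puts JSON.generate(range_query('@timestamp', from: 'now-1d', to: 'now'))
```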


    match: matches the analyzed query text. Use operator to choose and or or (the default is or). In the first example below, only documents whose _all field contains both zcy and z are returned:
curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"match": {"_all": {"query": "zcy z", "operator": "and"}}}}'
curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"match": {"_all": "zcy z"}}}'
    match: appears to behave the same as text
    multi_match: use fields to match the content against several fields
    curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"multi_match": {"query": "zcy z", "fields": ["name", "desc"]}}}'

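bool: must, must_not and should. must lists conditions that have to hold, must_not lists conditions that must not hold, and should lists conditions that may hold. The three can be combined. They cover the common requirement of "this is required, that is optional": a document missing the optional part is still fine, but one that has it is matched as well.

curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"bool": {"must": {"term": {"name": "zcy"}}}}}'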
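
The combination of the three clause types might be sketched in Ruby as follows. The method name bool_query is hypothetical, and the field values are made up; empty clause lists are simply omitted from the body.

```ruby
require 'json'

# Hypothetical builder: combines must / must_not / should clauses into a
# bool query body, dropping any clause list that is empty.
def bool_query(must: [], must_not: [], should: [])
  clauses = { 'must' => must, 'must_not' => must_not, 'should' => should }
  { 'query' => { 'bool' => clauses.reject { |_name, list| list.empty? } } }
end

body = bool_query(
  must:     [{ 'term' => { 'name' => 'zcy' } }],
  must_not: [{ 'term' => { 'name' => 'zhang' } }],
  should:   [{ 'prefix' => { 'desc' => 'z' } }]
)
puts JSON.pretty_generate(body)
```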

ids: query by id, curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"ids": {"values": ["2", "3", "31", "50"]}}}'


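filter: filtering. It is fast because no scoring is performed. The query types mentioned above also exist as filters, e.g. curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"wildcard": {"name": "*z*"}}, "filter": {"range": {"id": {"from": 1, "to": 10}}}}'

In addition, the more important filter parameters are the combining operations:

    "or"/"and" : [{"range":{}}, {"prefix":""}] combines two filters, as a union or an intersection;
    "bool" : {"must":{},"must_not":{},"should":{}} and is faster, but according to these notes it only supports two filters; for more than two, use bool;
    "not"/"limit" : {} negation, and a cap on the number of documents processed. Note that this limit differs from MySQL's: it caps how many documents are processed on each shard. With 5 shards, the effective limit for the whole index is 5 times the configured value.

Another key point: filter results are not cached by default; if a filter is used often, set "_cache" : true.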

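    fuzzy: fuzzy query, used together with min_similarity. When the value searched for is a number, min_similarity is also a number, and the query matches values within that distance above or below it, for example:
    curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"fuzzy": {"age": {"value": 10, "min_similarity": 9}}}}'
    When the value is a date, min_similarity is a time span, and the query matches dates within that span before or after it, for example:
    curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"fuzzy": {"created_at": {"value": "2013-05-07", "min_similarity": "2d"}}}}'
    When the value is a string, I do not know what min_similarity means.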

query_string: a very flexible query that covers match/text and wildcard searches. Leading wildcards are controlled by the allow_leading_wildcard parameter, which defaults to true. For details see http://www.elasticsearch.cn/guide/reference/query-dsl/query-string-query.html. Parameters include fields, query, default_field and default_operator (default or), e.g. curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"query_string": {"fields": ["name"], "query": "z* zhang", "default_operator": "and"}}}'

filtered: a filtered query. Its parameters are query and filter. The filter can be and, or, bool, not and so on; and and or take a list of filters, with conditions given by filters such as range and prefix. http://www.elasticsearch.cn/guide/reference/query-dsl/filtered-query.html
e.g.
    and:              curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*"}}, "filter": {"and": [{"prefix": {"name": "zhangcaiyan"}}]}}}}'
    bool:             curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*zhang*"}}, "filter": {"bool": {"must": {"term": {"name": "zhang"}}}}}}}'
    not:              curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*zhang*"}}, "filter": {"not": {"prefix": {"name": "z"}}}}}}'
    numeric_range:    curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*zhang*"}}, "filter": {"numeric_range": {"age": {"from": 10, "to": 20}}}}}}'
    or:               curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*"}}, "filter": {"or": [{"prefix": {"name": "zhangcaiyan"}}]}}}}'
    prefix:           curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*zhang*"}}, "filter": {"prefix": {"name": "z"}}}}}'
    range:            curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"filtered": {"query": {"wildcard": {"_all": "*zhang*"}}, "filter": {"range": {"age": {"from": "10"}}}}}}'   (the range filter takes the same parameters as the range query: from, to, include_lower, include_upper, gt, gte, lt, lte, and now-1d style dates)
    term:             curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"constant_score": {"filter": {"term": {"name": "zhang"}}}}}'
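
Since all the filtered examples share one shape, a small Ruby sketch can build them from a query part and a filter part. The method name filtered_query is hypothetical; the body layout follows the curl examples above.

```ruby
require 'json'

# Hypothetical builder: wraps a query and a filter in a filtered query body.
def filtered_query(query, filter)
  { 'query' => { 'filtered' => { 'query' => query, 'filter' => filter } } }
end

# Wildcard query narrowed by an `and` of a prefix filter and a range filter.
body = filtered_query(
  { 'wildcard' => { '_all' => '*zhang*' } },
  { 'and' => [
    { 'prefix' => { 'name' => 'z' } },
    { 'range'  => { 'age' => { 'from' => 10, 'to' => 20 } } }
  ] }
)
puts JSON.generate(body)
```

The output string can be passed as the -d argument of the curl commands above.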

facets

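The facets API returns statistics based on the query. The most basic facets are terms and statistical. For log analysis, though, the most commonly used are:

"histogram" : { "key_field" : "", "value_field" : "", "interval" : "" } returns histogram-style statistics bucketed by time interval;
"terms_stats" : { "key_field" : "", "value_field" : "" } returns statistics on the value for each key, similar to group by.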

curl -XGET http://localhost:9200/jobs/_search?pretty=true -d'{"facets": {"cate": {"terms": {"field": "region_id"} } }}'

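This is the reason every field is given an explicit type in the mapping: in histogram, key_field must be in dateOptionalTime format and value_field must be a string; in terms_stats, key_field must be a string and value_field must be numeric.

HTTP status codes such as 200/304/400/503 look like numbers, but what we need is a count of each one, not their average. So ES must not be left to detect them dynamically as long; they have to be declared as string.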
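
The difference between counting status codes and averaging them can be illustrated in plain Ruby, without a running cluster. The sample codes, the method names count_by_value and terms_facet, and the field name status are all assumptions made up for this sketch.

```ruby
require 'json'

# Counts how often each value occurs, which is what a terms facet reports.
def count_by_value(values)
  values.each_with_object(Hash.new(0)) { |value, acc| acc[value] += 1 }
end

# Hypothetical builder for a terms facet request body.
def terms_facet(name, field)
  { 'facets' => { name => { 'terms' => { 'field' => field } } } }
end

# Made-up status codes, kept as strings as recommended above.
codes = %w[200 200 304 200 503 304]
puts count_by_value(codes).inspect

# Treating the same codes as numbers invites a meaningless average.
puts codes.map(&:to_i).reduce(0, :+) / codes.size.to_f

puts JSON.generate(terms_facet('status_codes', 'status'))
```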

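SpanFirstQuery matches only documents that contain the query term near the beginning. span_first takes match, the span query that has to be satisfied (span_term here), and end, an integer that decides how far the beginning extends, e.g. curl -XGET 'http://localhost:9200/infos/_search?pretty=true' -d '{"query": {"span_first": {"match": {"span_term": {"name": "yan"}}, "end": 3}}}'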

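Mapping, _source and _all
Mappings define how your documents are indexed and stored. You can, for example, define the type of each field: in your syslog, the message is certainly a string, while the severity can be an integer. For how to define a mapping, see the link.
By default, besides indexing each of your fields separately, elasticsearch also puts them all together into a new field called _all and indexes that. The advantage is that you can search _all for things when you do not care which field they are found in. The downside is extra CPU at indexing time and a larger index. So if you do not use this feature, turn it off. Even if you do use it, consider defining exactly which fields are included in _all. See the link for details.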

bin/elasticsearch
bin/elasticsearch -f    runs in the foreground

https://github.com/elasticsearch/elasticsearch-servicewrapper

bin/service/elasticsearch start      starts elasticsearch
bin/service/elasticsearch stop       stops elasticsearch
bin/service/elasticsearch console    runs elasticsearch in the foreground
bin/service/elasticsearch install    installs elasticsearch as a service
bin/service/elasticsearch remove     removes the elasticsearch service

Once it is installed as a service, you can use:

service elasticsearch start      to start elasticsearch
service elasticsearch restart    to restart elasticsearch
service elasticsearch stop       to stop elasticsearch

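The ES mapping part mainly covers mapping-related configuration and points to watch out for. Mapping may seem puzzling at first. Roughly, ES can be thought of as a data management platform: an index is then a database, a type is a table, and a mapping is the table's structure plus its related settings (although mapping also has a broader meaning). Mapping scopes likewise run from cluster to node to index to type.

Each index can have specific settings associated with it when it is created.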

curl -XPUT 'http://localhost:9200/twitter/' -d '
index :
    number_of_shards : 3
    number_of_replicas : 2
'

curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }
}'

curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 2
    }
}'

You do not need to explicitly specify the index section inside settings.
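
One way to picture why the nested and the flat JSON forms describe the same settings is to normalize both to one flat hash. This is only an illustration in Ruby, not how elasticsearch implements it; the method name flatten_index_settings is hypothetical.

```ruby
# Hypothetical normalizer: lifts anything under the optional "index" key
# up to the top level, so both request styles compare equal.
def flatten_index_settings(settings)
  nested = settings.fetch('index', {})
  settings.reject { |key, _value| key == 'index' }.merge(nested)
end

nested_form = { 'index' => { 'number_of_shards' => 3, 'number_of_replicas' => 2 } }
flat_form   = { 'number_of_shards' => 3, 'number_of_replicas' => 2 }

puts flatten_index_settings(nested_form) == flatten_index_settings(flat_form)
```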

Update index settings:

curl -XPUT 'localhost:9200/twitter/_settings' -d '{
    "index" : {
        "number_of_replicas" : 1
    }
}'

View index settings: curl -XGET 'localhost:9200/twitter/_settings'

Delete an index:
curl -XDELETE http://localhost:9200/blog

Close and open an index:

curl -XPOST 'http://localhost:9200/blog/_close'

curl -XPOST 'http://localhost:9200/blog/_open'

7 Get the mapping

curl -XGET 'http://localhost:9200/blog/_mapping?pretty=true'         # mapping defined on one index

curl -XGET 'http://localhost:9200/blog/topic/_mapping?pretty=true'   # mapping defined on one type

curl -XGET 'http://localhost:9200/_mapping?pretty=true'              # all mapping definitions

8 Get the status

curl -XGET 'http://localhost:9200/blog/_status?pretty=true'

9 Get statistics

curl -XGET 'http://localhost:9200/_stats?pretty=true'

boosting      http://www.elasticsearch.org/guide/reference/query-dsl/boosting-query/
field    http://www.elasticsearch.cn/guide/reference/query-dsl/field-query.html
fuzzy    when the value searched for is a string, what does min_similarity actually do?
has_child
more_like_this http://www.elasticsearch.cn/guide/reference/query-dsl/mlt-query.html
more_like_this_field
span_near
span_not
span_or
top_children
nested
custom_filters_score
script     http://www.elasticsearch.cn/guide/reference/query-dsl/script-filter.html

    @gongqius = Gongqiu.tire.search(page: (params[:page] || 1), per_page: params[:per_page] || 20) do |search|

      if @keywords.present?
        search.query do |q|
          q.string @keywords
        end
      end

      # s = Tire.search('articles') { query { string 'title:T*' } }
      # s.filter :terms, :tags => ['ruby']
      # p s.results
      # p s.to_curl

      # Totals under given conditions; combine them with and or or: and requires every condition, or requires any one
      # key = "xinxi_status_id"
      # temp_hash = {"facet_filter"=>{"and"=>[
      #   {"terms"=>{:category_id=>[8]}},
      #   {"terms"=>{:price_unit_id=>[1]}}
      # ]}}
      # search.facet(key, temp_hash) do |facet|
      #   facet.terms key, size: 20
      # end

      # search.filter :terms, {xinxi_status_id: [1,2,3], name: ["11"]}   # filtering

      # search.sort{by :name, 'desc'}    # sorting the results

      search.query do |q|
        q.string "name:#{@keywords}"
      end if nil

      search.query do |q|
        q.string @keywords
      end if nil

      search.query do |query|
        query.boolean do |b|
          if @keywords.present?
            b.must {|tm| tm.string @keywords }
            # b.must_not
            # b.should
          end
        end
      end if nil # bool

      if nil
        if @q.present? || @locations.present?
          search.query do |query|
            query.boolean do |b|
              if @q.present?
                b.must {|tm| tm.string @q }
              end

              if @locations.present?
                b.must {|tm| tm.string ("region:"+@locations) }
              end
            end
          end
        end

        search.highlight :description=>{"fragment_size" => 180, "number_of_fragments" => @filter_hash[:id].present? ? 0 : 1 },
          :name=>{"number_of_fragments" => 0},
          :company_name=>{"number_of_fragments" => 0},
          :region=>{"number_of_fragments" => 0},
          :industries_text=>{"number_of_fragments" => 0},
          :company_type=>{"number_of_fragments" => 0},
          :degree=>{"number_of_fragments" => 0},
          :job_class=>{"number_of_fragments" => 0},
          :options => { :tag => '<strong class="highlight">' }

        @facet_attributes.keys.each do |key|
          search.facet(key.to_s,temp_hash) do |facet|
            facet.terms key.to_s,:size=>20
          end
        end

        @filter_hash.each do |key, value|
          search.filter :terms, key => value
        end

        search.sort { by :published_at, 'desc' }

      end
    end

    @facets = @gongqius.facets

Inspect the tokens an analyzer produces: http://host:port/index/_analyze?text=something&analyzer=yourAnalyzer
http://localhost:9200/_plugin/rtf/

2015-04-27 22:12