elk

Log monitoring system

Collectd: gather statistics on performance, processes, and overall status of the system
Graphite: Store numeric time-series data & Render graphs of this data on demand
Grafana: visualization and dashboarding tool for time series data

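A log monitoring system has three parts: log collection, log storage, and log visualization.
These correspond to Collectd, Graphite (without Graphite's own web tool), and Grafana.
Alternatives for the time-series database include Elasticsearch and InfluxDB.
Logstash can be seen as the bridge between log collection and log storage, because its configurable inputs and outputs make it easy to move data between different systems.
Without Logstash as the bridge, getting the collected logs into storage becomes a problem of its own: you have to call the storage client API yourself.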

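So how do these systems communicate, and how are they organized?

  1. collectd gathers the data and, through its network plugin, sends it to a designated Logstash port
  2. Logstash's input listens on the port from step 1, and its output writes to a storage system such as ES or InfluxDB
  3. Grafana reads from the storage system through a configured data source and visualizes the data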
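
Software       Version  Installed on    Description
elasticSearch  2.3.3    192.168.6.52    Index database
logstash       2.3.2    local machine   Has Input/Output plugins, so it can connect all kinds of pipelines
collectd                192.168.6.52    Collects machine information, performance, processes, etc.
Grafana                 192.168.6.52    Visualization; can use different data sources such as ES and InfluxDB
influxdb       0.13     192.168.6.52    Time-series database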

ELK

LogStash

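Standard I/O

Command-line input and output. The pipeline is given inline with -e or in a config file, and the codec setting controls how events are decoded or encoded (here, how stdout prints them)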

$ cd logstash-2.3.2
$ bin/logstash -e "input {stdin{}} output {stdout{}}" <<< 'Hello World'
Settings: Default pipeline workers: 4
Pipeline main started
2016-05-25T04:00:29.531Z zqhmac Hello World
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

$ bin/logstash -e 'input{stdin{}}output{stdout{codec=>rubydebug}}' <<< 'Hello World'
{
       "message" => "Hello World",
      "@version" => "1",
    "@timestamp" => "2016-05-25T04:01:09.095Z",
          "host" => "zqhmac"
}

$ vi logstash-simple.conf 
input { stdin {} }
output {
   stdout { codec=> rubydebug }
}
$ bin/logstash agent -f logstash-simple.conf --verbose

File Input

$ bin/logstash agent -f logstash-file.conf  --verbose
input {
    file {
        path => "/usr/install/cassandra/logs/system.log"
        start_position => beginning
        type => "cassandra"
    }
}
output {
   stdout { codec=> rubydebug }
}

Log file

[qihuang.zheng@dp0652 logstash-1.5.0]$ head /usr/install/cassandra/logs/system.log
ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,502 GraphiteReporter.java:281 - Error sending to Graphite:
java.net.SocketException: 断开的管道
  at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_51]
  at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_51]
  at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[na:1.7.0_51]
  at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[na:1.7.0_51]
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[na:1.7.0_51]
  at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[na:1.7.0_51]
  at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129) ~[na:1.7.0_51]

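By default each line becomes one event. For a log file that contains exceptions, where one stack trace spans many lines, that is not usable without further processing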

Using version 0.1.x input plugin 'file'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Using version 0.1.x codec plugin 'plain'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Using version 0.1.x output plugin 'stdout'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Using version 0.1.x codec plugin 'rubydebug'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Registering file input {:path=>["/usr/install/cassandra/logs/system.log"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/home/qihuang.zheng/.sincedb_261f7476b9c9830f1fa5a51db2793e1e", :path=>["/usr/install/cassandra/logs/system.log"], :level=>:info}
Pipeline started {:level=>:info}
Logstash startup completed
{
       "message" => "ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,502 GraphiteReporter.java:281 - Error sending to Graphite:",
      "@version" => "1",
    "@timestamp" => "2016-06-20T08:42:35.543Z",
          "type" => "cassandra",
          "host" => "dp0652",
          "path" => "/usr/install/cassandra/logs/system.log"
}
{
       "message" => "java.net.SocketException: 断开的管道",
      "@version" => "1",
    "@timestamp" => "2016-06-20T08:42:35.544Z",
          "type" => "cassandra",
          "host" => "dp0652",
          "path" => "/usr/install/cassandra/logs/system.log"
}
{
       "message" => "\tat java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]",
      "@version" => "1",
    "@timestamp" => "2016-06-20T08:42:35.544Z",
          "type" => "cassandra",
          "host" => "dp0652",
          "path" => "/usr/install/cassandra/logs/system.log"
}

Adding multiline support

input {
    file {
        path => "/usr/install/cassandra/logs/system.log"
        start_position => beginning
        type => "cassandra"
        codec => multiline {
          pattern => "^\s"
          what => "previous"
        }
    }
}

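This merges every line that starts with whitespace into the previous line, but the result is still not ideal: the two records below should really be a single event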

{
    "@timestamp" => "2016-06-20T08:49:54.521Z",
       "message" => "ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,850 GraphiteReporter.java:281 - Error sending to Graphite:",
      "@version" => "1",
          "type" => "cassandra",
          "host" => "dp0652",
          "path" => "/usr/install/cassandra/logs/system.log"
}
{
    "@timestamp" => "2016-06-20T08:49:54.521Z",
       "message" => "java.net.SocketException: 断开的管道\n\tat java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]\n\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_51]\n\tat java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_51]\n\tat sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[na:1.7.0_51]\n\tat sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[na:1.7.0_51]\n\tat sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[na:1.7.0_51]\n\tat java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[na:1.7.0_51]\n\tat java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129) ~[na:1.7.0_51]\n\tat java.io.BufferedWriter.write(BufferedWriter.java:230) ~[na:1.7.0_51]\n\tat java.io.Writer.write(Writer.java:157) ~[na:1.7.0_51]\n\tat com.yammer.metrics.reporting.GraphiteReporter.sendToGraphite(GraphiteReporter.java:271) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.sendObjToGraphite(GraphiteReporter.java:265) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:304) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:26) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.core.Gauge.processWith(Gauge.java:28) [metrics-core-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.printRegularMetrics(GraphiteReporter.java:247) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.run(GraphiteReporter.java:213) [metrics-graphite-2.2.0.jar:na]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]\n\tat java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_51]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_51]\n\tat 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_51]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]\n\tat java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]",
      "@version" => "1",
          "tags" => [
        [0] "multiline"
    ],
          "type" => "cassandra",
          "host" => "dp0652",
          "path" => "/usr/install/cassandra/logs/system.log"
}
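
Why the two records stay separate can be reproduced with a small model of the multiline rule. The sketch below is not Logstash code; `merge_multiline` is a hypothetical helper that imitates `what => "previous"`, and the last sample line is made up for illustration. It compares the `^\s` rule used above with an alternative rule (an assumption, not something tested in this post) that treats every line not starting with a log level as a continuation, which is what the multiline codec's `negate => true` option expresses.

```python
import re

def merge_multiline(lines, pattern, negate=False):
    """Fold continuation lines into the previous event (what => "previous").

    A line is a continuation when it matches `pattern`, or, with
    negate=True, when it does NOT match.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        matched = regex.search(line) is not None
        is_continuation = matched != negate
        if is_continuation and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

# Shortened sample; the final INFO line is invented for the demo.
sample = [
    "ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,850 "
    "GraphiteReporter.java:281 - Error sending to Graphite:",
    "java.net.SocketException: Broken pipe",
    "\tat java.net.SocketOutputStream.socketWrite0(Native Method)",
    "\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)",
    "INFO [main] 2016-06-13 18:30:40,000 Example.java:1 - next event",
]

# Rule from the config above: lines starting with whitespace continue.
by_whitespace = merge_multiline(sample, r"^\s")
# Alternative rule: anything not starting with a log level continues.
by_level = merge_multiline(
    sample, r"^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s", negate=True
)

print(len(by_whitespace), len(by_level))
```

With the whitespace rule the exception class line (`java.net.SocketException: ...`) starts a new event because it has no leading whitespace. The level-based rule keeps it attached to the ERROR line.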

A Logstash configuration for Cassandra log files found online: https://github.com/rustyrazorblade/dotfiles/blob/master/logstash.conf

output {
    elasticsearch {
        hosts => ["192.168.6.52:9200"]
        index => "logstash-%{type}-%{+YYYY.MM.dd}"
        document_type => "%{type}"
        workers => 2
        flush_size => 1000
        idle_flush_time => 5
        template_overwrite => true
    }  
    stdout { 
    }
}

input {
    file {
        path => "/usr/install/cassandra/logs/system.log"       
        start_position => beginning
        type => cassandra_system
    }
}

filter {
    # must equal the type set in the file input above, otherwise grok never runs
    if [type] == "cassandra_system" {
        grok {
            match => {"message" => "%{LOGLEVEL:level}  \[%{WORD:class}:%{NUMBER:line}\] %{TIMESTAMP_ISO8601:timestamp} %{WORD:file}\.java:%{NUMBER:line2} - %{GREEDYDATA:msg}"}
        }
    }
}
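
The grok pattern above expects a `[Word:Number]` pair inside the brackets, while the sample line shown earlier carries a plain thread name (`[metrics-graphite-reporter-thread-1]`), so it may not match every line of this log. As a rough way to check which fields a line yields, here is a plain regular-expression sketch. It is not a translation of the grok pattern; `LINE_RE` and `parse_line` are names invented for this example and the field layout is inferred only from the sample line.

```python
import re

# Layout inferred from the sample line:
# LEVEL [thread] yyyy-mm-dd HH:MM:SS,mmm File.java:line - message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<file>\S+):(?P<line>\d+)\s+-\s+"
    r"(?P<msg>.*)$"
)

def parse_line(line):
    """Return the extracted fields as a dict, or None for continuation lines."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

first = parse_line(
    "ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,502 "
    "GraphiteReporter.java:281 - Error sending to Graphite:"
)
continuation = parse_line("java.net.SocketException: Broken pipe")

print(first)
print(continuation)
```

Lines that return None are the stack-trace continuations that the multiline codec is supposed to fold into the preceding event.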

Performance test

input { 
  generator { 
    count => 30000000 
  } 
} 
output { 
  stdout { 
    codec => dots 
  } 
  kafka { 
    broker_list => "localhost:9092" 
    topic_id => "test" 
    compression_codec => "snappy" 
  } 
}

bin/logstash agent -f out.conf | pv -Wbart > /dev/null

topic_id => "test" 
compression_codec => "snappy" 
request_required_acks => 1 
serializer_class => "kafka.serializer.StringEncoder" 
request_timeout_ms => 10000 
producer_type => 'async' 
message_send_max_retries => 5 
retry_backoff_ms => 100 
queue_buffering_max_ms => 5000 
queue_buffering_max_messages => 10000 
queue_enqueue_timeout_ms => -1 
batch_num_messages => 1000

Collectd

Install collectd on 192.168.6.52. The network plugin sends this machine's statistics to port 25826 on the remote server 10.57.2.26

$ sudo yum install collectd
$ sudo mv /etc/collectd.conf /etc/collectd_backup.conf
$ sudo vi /etc/collectd.conf
Hostname "dp0652"
FQDNLookup   true
LoadPlugin interface
LoadPlugin cpu
LoadPlugin memory
LoadPlugin network
LoadPlugin df
LoadPlugin disk
<Plugin interface>
    Interface "eth0"
    IgnoreSelected false
</Plugin>
<Plugin network>
    Server "10.57.2.26" "25826"
</Plugin>
Include "/etc/collectd.d"
$ sudo service collectd start
Starting collectd:                                         [  OK  ]
$ sudo service collectd status
collectd (pid  9295) is running...

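The development machine's address is 10.57.2.26; it listens on port 25826 and receives the data sent by collectd.

So the flow is: collectd gathers system information on 192.168.6.52 and sends it to Logstash on 10.57.2.26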

$ bin/plugin list | grep collect
logstash-codec-collectd

$ vi collectd.conf
input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}
output {
  stdout {
    codec => rubydebug 
  }
}
$ bin/logstash -f collectd.conf
Settings: Default pipeline workers: 4
Pipeline main started
{
               "host" => "dp0652",
         "@timestamp" => "2016-05-25T03:49:52.000Z",
             "plugin" => "cpu",
    "plugin_instance" => "21",
      "collectd_type" => "cpu",
      "type_instance" => "system",
              "value" => 1220568,
           "@version" => "1"
}....

ElasticSearch

Starting ES

$ cd elasticsearch-2.3.3
$ vi config/elasticsearch.yml
cluster.name: es52
network.host: 192.168.6.52
#discovery.zen.ping.multicast.enabled: false
#http.cors.allow-origin: "/.*/"
#http.cors.enabled: true

$ bin/elasticsearch -d
$ curl http://192.168.6.52:9200/

LogStash+CollectD+ElasticSearch

Above, the data gathered from collectd was printed to the console. Now change the output so that it is stored in Elasticsearch.

$ vi elastic.conf
input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}
output {
    elasticsearch {
        hosts => ["192.168.6.52:9200"]
        index => "logstash-%{type}-%{+YYYY.MM.dd}"
        document_type => "%{type}"
        workers => 2
        flush_size => 1000
        idle_flush_time => 5
        template_overwrite => true
    }
}

$ bin/logstash -f elastic.conf

$ curl http://192.168.6.52:9200/_search?pretty

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 613,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-%{type}-2016.05.25",
      "_type" : "%{type}",
      "_id" : "AVTmTRv8ihrmowv7fKbo",
      "_score" : 1.0,
      "_source" : {
        "host" : "dp0652",
        "@timestamp" : "2016-05-25T05:04:42.000Z",
        "plugin" : "cpu",
        "plugin_instance" : "22",
        "collectd_type" : "cpu",
        "type_instance" : "steal",
        "value" : 0,
        "@version" : "1"
      }
    }, ......]
  }
}
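
Note the index name in the result: `logstash-%{type}-2016.05.25`. The collectd events carry `collectd_type` but no `type` field, and a field reference that cannot be resolved is left in the string as-is. The sketch below is a simplified model of that substitution, not Logstash's implementation; `expand_index` is a hypothetical helper and it only understands the one date format used in this post.

```python
import re
from datetime import datetime

# %{name} field references; %{+...} date references are handled separately.
FIELD_REF = re.compile(r"%\{([^}+][^}]*)\}")

def expand_index(template, event, when):
    """Substitute %{field} from the event and %{+YYYY.MM.dd} from `when`.

    Unknown fields are left untouched, which is the behaviour visible
    in the search result above.
    """
    def field(match):
        name = match.group(1)
        return str(event[name]) if name in event else match.group(0)

    expanded = FIELD_REF.sub(field, template)
    return expanded.replace("%{+YYYY.MM.dd}", when.strftime("%Y.%m.%d"))

template = "logstash-%{type}-%{+YYYY.MM.dd}"
collectd_event = {"host": "dp0652", "plugin": "cpu", "collectd_type": "cpu"}
cassandra_event = {"host": "dp0652", "type": "cassandra_system"}

print(expand_index(template, collectd_event, datetime(2016, 5, 25)))
print(expand_index(template, cassandra_event, datetime(2016, 6, 20)))
```

One possible fix (an assumption, not tried in this post) is to set `type => "collectd"` on the udp input so that the reference resolves.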

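The final network topology looks like this:

In practice collectd runs on many different server nodes (for example, Nginx servers), while a single machine is enough for Logstash and ES:

Kibana

After downloading, if Elasticsearch is installed on the same machine, simply starting bin/kibana with the defaults is enough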

https://www.elastic.co/guide/en/kibana/current/getting-started.html

wget -c https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json
wget -O accounts.zip 'https://github.com/bly2k/files/blob/master/accounts.zip?raw=true'
wget -c https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
unzip accounts.zip
gunzip logs.jsonl.gz

shakespeare schema

{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String",
}

Create the mapping

curl -XPUT http://192.168.6.52:9200/shakespeare -d '
{
 "mappings" : {
  "_default_" : {
   "properties" : {
    "speaker" : {"type": "string", "index" : "not_analyzed" },
    "play_name" : {"type": "string", "index" : "not_analyzed" },
    "line_id" : { "type" : "integer" },
    "speech_number" : { "type" : "integer" }
   }
  }
 }
}
';

bank schema

{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}



# the same mapping is needed for logstash-2015.05.18, logstash-2015.05.19, and logstash-2015.05.20
curl -XPUT http://192.168.6.52:9200/logstash-2015.05.20 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';
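
Since the same geo_point mapping has to be applied to all three log indices, the requests can be generated instead of edited by hand. A minimal sketch: `mapping_requests` is a hypothetical helper that only builds the URL and JSON body for each index and prints the matching curl command. It sends nothing itself.

```python
import json

# Same mapping body as in the curl command above.
GEO_MAPPING = {
    "mappings": {
        "log": {
            "properties": {
                "geo": {
                    "properties": {
                        "coordinates": {"type": "geo_point"}
                    }
                }
            }
        }
    }
}

def mapping_requests(base_url, indices):
    """Return (url, json_body) pairs, one PUT request per index."""
    body = json.dumps(GEO_MAPPING)
    return [(f"{base_url}/{name}", body) for name in indices]

requests_to_send = mapping_requests(
    "http://192.168.6.52:9200",
    ["logstash-2015.05.18", "logstash-2015.05.19", "logstash-2015.05.20"],
)

for url, body in requests_to_send:
    print(f"curl -XPUT {url} -d '{body}'")
```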

curl -XPOST '192.168.6.52:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST '192.168.6.52:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST '192.168.6.52:9200/_bulk?pretty' --data-binary @logs.jsonl

curl '192.168.6.52:9200/_cat/indices?v'

[qihuang.zheng@dp0652 ~]$ curl '192.168.6.52:9200/_cat/indices?v'
health status index                                pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-%{type}-2016.05.25            5   1      12762            0      1.5mb          1.5mb
yellow open   logstash-cassandra_system-2016.06.20   5   1      17125            0      3.9mb          3.9mb
yellow open   logstash-cassandra_system-2016.06.21   5   1        262            0    233.7kb        233.7kb
yellow open   bank                                   5   1       1000            0    442.2kb        442.2kb
yellow open   .kibana                                1   1          5            0     23.3kb         23.3kb
yellow open   shakespeare                            5   1     111396            0     18.4mb         18.4mb
green  open   graylog_0                              1   0       8123            0      2.3mb          2.3mb
yellow open   logstash-2015.05.20                    5   1       4750            0     28.7mb         28.7mb
yellow open   logstash-2015.05.18                    5   1       4631            0     27.4mb         27.4mb
yellow open   logstash-cassandra_system-2016.06.22   5   1         42            0    146.5kb        146.5kb
yellow open   logstash-2015.05.19                    5   1       4624            0     27.8mb         27.8mb

InfluxDB

wget -c https://dl.influxdata.com/influxdb/releases/influxdb-0.13.0_linux_amd64.tar.gz
wget -c https://dl.influxdata.com/telegraf/releases/telegraf-0.13.1_linux_i386.tar.gz
wget -c https://dl.influxdata.com/chronograf/releases/chronograf-0.13.0-1.x86_64.rpm
wget -c https://dl.influxdata.com/kapacitor/releases/kapacitor-0.13.1_linux_amd64.tar.gz
sudo yum localinstall chronograf-0.13.0-1.x86_64.rpm

Generate a configuration file and pass it at startup

$ cd influxdb-0.13.0-1
$ usr/bin/influxd config > influxdb.generated.conf
$ nohup usr/bin/influxd -config influxdb.generated.conf &

http://192.168.6.52:8083/

Use the command-line client to create a database, insert data, and query it

$ usr/bin/influx
Connected to http://localhost:8086 version 0.13.x
InfluxDB shell 0.13.x
> CREATE DATABASE mydb
> USE mydb
> INSERT cpu,host=serverA,region=us_west value=0.64
> INSERT cpu,host=serverA,region=us_east value=0.45
> INSERT temperature,machine=unit42,type=assembly external=25,internal=37
> SELECT host, region, value FROM cpu
> SELECT * FROM temperature
> SELECT * FROM /.*/ LIMIT 1
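
The text after INSERT is InfluxDB line protocol: measurement name, comma-separated tags, a space, then comma-separated fields. A small sketch of how such a line is assembled; `to_line_protocol` is a hypothetical helper that ignores escaping, string fields, and timestamps, so it only covers the simple numeric cases used in this post.

```python
def to_line_protocol(measurement, tags, fields):
    """Build 'measurement,tag=v,... field=v,...' for simple numeric fields.

    Tags are sorted by key; no escaping of spaces or commas is attempted.
    """
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part}"

cpu_line = to_line_protocol(
    "cpu", {"host": "serverA", "region": "us_west"}, {"value": 0.64}
)
temperature_line = to_line_protocol(
    "temperature",
    {"machine": "unit42", "type": "assembly"},
    {"external": 25, "internal": 37},
)

print(cpu_line)
print(temperature_line)
```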

Data can also be added from the web page:

Grafana

grafana-1.9.1

Grafana 1.9.1 is purely static, so you only need to download and unpack it and then serve it from Nginx, or run it with Python's SimpleHTTPServer

$ wget http://grafanarel.s3.amazonaws.com/grafana-1.9.1.tar.gz
$ tar zxf grafana-1.9.1.tar.gz
$ cd grafana-1.9.1
$ python -m SimpleHTTPServer 8383

http://192.168.6.52:8383

With no data source configured, the page is blank:

After adding a data source, for example influxdb

$ cp config.sample.js config.js
$ vi config.js
datasources: {
  influxdb: {
    type: 'influxdb',
    url: "http://192.168.6.52:8086/db/cassandra-metrics",
    username: 'admin',
    password: 'admin',
  },
  grafana: {
    type: 'influxdb',
    url: "http://192.168.6.52:8086/db/grafana",
    username: 'admin',
    password: 'admin',
    grafanaDB: true
  },
},

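After restarting the Python process a bit more shows up (if no admin user was added to InfluxDB, the username and password above can be dropped),

but no configuration buttons are visible. Is it a permissions problem? This version also has no login page at all.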

grafana-2.x

https://grafanarel.s3.amazonaws.com/builds/grafana-3.0.3-1463994644.linux-x64.tar.gz

From version 2.5 onward, including 3.0, the directory layout has changed a lot compared with 1.9

[qihuang.zheng@dp0652 grafana-1.9.1]$ tree -L 1
.
├── app
├── build.txt
├── config.js
├── config.sample.js
├── css
├── font
├── img
├── index.html
├── LICENSE.md
├── NOTICE.md
├── plugins
├── README.md
├── test
└── vendor

[qihuang.zheng@dp0652 grafana-3.0.3-1463994644]$ tree -L 2
.
├── bin
│   ├── grafana-cli
│   ├── grafana-cli.md5
│   ├── grafana-server
│   └── grafana-server.md5
├── conf
│   ├── defaults.ini
│   └── sample.ini
├── LICENSE.md
├── NOTICE.md
├── public
│   ├── app
│   ├── css
│   ├── dashboards
│   ├── emails
│   ├── fonts
│   ├── img
│   ├── robots.txt
│   ├── sass
│   ├── test
│   ├── vendor
│   └── views
├── README.md
└── vendor
    └── phantomjs

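Serving this version with python -m SimpleHTTPServer 8383 as well shows nothing in the browser

Following the official installation guide gets it done in minutes, and it does not seem to depend on a separate web server.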

$ sudo yum install https://grafanarel.s3.amazonaws.com/builds/grafana-3.0.4-1464167696.x86_64.rpm
$ sudo service grafana-server start
Starting Grafana Server: .... FAILED
$ ps -ef|grep grafana
grafana  16078     1  0 13:19 ?        00:00:00 /usr/sbin/grafana-server 
--pidfile=/var/run/grafana-server.pid --config=/etc/grafana/grafana.ini 
cfg:default.paths.data=/var/lib/grafana cfg:default.paths.logs=/var/log/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins
$ vi /var/log/grafana/grafana.log
$ sudo service grafana-server status
grafana-server (pid  16078) is running...

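Although the start appears to have failed, the log file contains no errors. Opening http://192.168.6.52:3000/
shows the login page; log in with admin/admin

This time the Grafana icon is clickable and data sources are available. First add the InfluxDB data source

Add a dashboard ①, then add a panel ②, and choose influxdb as the data source

The default query: SELECT mean("value") FROM "measurement" WHERE $timeFilter GROUP BY time($interval) fill(null)

Change it to: SELECT mean("value") FROM "cpu" WHERE "host" = 'serverA' AND $timeFilter GROUP BY time($interval)

The graph then appears at the top; click close to leave edit mode. Be careful not to click the eye icon: once it turns grey, the query is no longer sent
Normally the q parameter carries the query: curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=mydb" --data-urlencode "q=SELECT value FROM cpu_load_short WHERE region='us-west'"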
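
For reference, this is roughly what the URL-encoded request looks like once the db and q parameters are appended to the /query endpoint. The sketch only builds the URL; `influx_query_url` is a hypothetical helper and no request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def influx_query_url(base, db, query, pretty=True):
    """Build a GET URL for the /query endpoint with db and q URL-encoded."""
    params = {"db": db, "q": query}
    if pretty:
        params["pretty"] = "true"
    return f"{base}/query?{urlencode(params)}"

query = "SELECT value FROM cpu_load_short WHERE region='us-west'"
url = influx_query_url("http://localhost:8086", "mydb", query)
print(url)

# Decoding the query string gives back the original statement.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```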

Insert a few data points into influxdb:

cpu,host=serverA,region=us_west value=0.36
cpu,host=serverA,region=us_west value=0.85

The time range can be chosen in the top-right corner. If the selected range contains no data, N/A is shown.

Cassandra+InfluxDB+Grafana

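Monitoring Cassandra with Grafana takes two steps:

  1. Collect Cassandra's metrics into InfluxDB
  2. Configure the InfluxDB data source in Grafana and display the Cassandra metrics

References:
http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2
https://www.pythian.com/blog/monitoring-cassandra-grafana-influx-db/

There are several options for step 1:

  1. Use Graphite to collect the data and send it to InfluxDB
  2. Use the input plugins of InfluxDB's telegraf to collect the data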

Graphite

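1) Download metrics-graphite.jar and put it in Cassandra's lib directory

2) Edit the InfluxDB configuration file. database must match a database name in InfluxDB, here cassandra-metrics (this database has to be created manually in InfluxDB)

In the reference document the configuration file is config.toml; in newer versions a configuration file is generated when starting influxdb, which is effectively the same thing.
The old input section is [input_plugins.graphite]; the new one is [[graphite]].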

[qihuang.zheng@dp0652 influxdb-0.13.0-1]$ vi influxdb.generated.conf
[[graphite]]
  enabled = true
  bind-address = ":2003"
  protocol = "tcp"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "1s"
  consistency-level = "one"
  separator = "."
  udp-read-buffer = 0
  # database = "graphite"
  database = "cassandra-metrics"
  udp_enabled = true

The configuration above means InfluxDB opens port 2003 and accepts metrics in Graphite format

3) Restart influxdb so the configuration takes effect

$ service influxdb reload  # when installed via RPM, a reload is enough and no restart is needed
$ influxdb -config influxdb.generated.conf reload

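4) Create the reporter configuration file in Cassandra's conf directory. host is the address of the InfluxDB server, and port matches port 2003 from influxdb.generated.conf above.
If InfluxDB is installed on a different node, host below must point to that node. prefix is usually set to the current machine's IP address.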

[qihuang.zheng@dp0652 conf]$ vi /usr/install/cassandra/conf/influx-reporting.yaml
graphite:
-
  period: 60
  timeunit: 'SECONDS'
  prefix: 'Node1'
  hosts:
  - host: '192.168.6.52'
    port: 2003
  predicate:
    color: "white"
    useQualifiedName: true
    patterns:
    - ".*"

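The configuration above means every metric of this Cassandra node is prefixed with Node1 and sent to port 2003 on 192.168.6.52, that is, Cassandra's metrics are sent to InfluxDB.
It also shows that the destination is just an address and a port, so the target can be any database that accepts Graphite-format data, not necessarily InfluxDB.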
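
Graphite-format data here means the plaintext protocol: one metric per line as `path value timestamp`, terminated by a newline and written to the TCP port. A minimal sketch for pushing a test metric by hand: `graphite_line` and `send_metric` are hypothetical helpers, the value 42 and the timestamp are made up, and the actual send is left commented out so that nothing is transmitted unless a listener really exists.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"

def send_metric(host, port, line, timeout=5.0):
    """Open a TCP connection and write a single plaintext metric line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))

# Made-up value and timestamp, using the Node1 prefix from the YAML above.
line = graphite_line("Node1.jvm.daemon_thread_count", 42, 1466400000)
print(line, end="")

# Uncomment to send to the listener configured in influxdb.generated.conf:
# send_metric("192.168.6.52", 2003, line)
```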

5) Install Grafana. There is no need to edit config.js; data sources can be added from the web page

6) Add the -D startup option to cassandra-env.sh

[qihuang.zheng@dp0652 conf]$ vi /usr/install/cassandra/conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=influx-reporting.yaml"

7) Restart Cassandra

INFO [main] YYYY-MM-DD HH:MM:SS,SSS CassandraDaemon.java:353 - Trying to load metrics-reporter-config from file: influx-reporting.yaml
INFO [main] YYYY-MM-DD HH:MM:SS,SSS GraphiteReporterConfig.java:68 - Enabling GraphiteReporter to 192.168.6.52:2003

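8) Verify that Cassandra's metrics are collected into InfluxDB through metrics-graphite.jar

Pick a measurement and check that data is being written: select * FROM "Node1.jvm.daemon_thread_count"

9) Configure the InfluxDB data source in Grafana, changing the database name to cassandra-metrics

Configure the monitoring graph and change the measurement

10) To summarize, the flow for collecting Cassandra metrics through Graphite+InfluxDB:

Regular expressions can aggregate a metric across multiple nodes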

select mean(value) from /.*org.apache.cassandra.metrics.ClientRequest.Read/
select mean(value) from "192.168.6.53.org.apache.cassandra.metrics.ClientRequest.Read.Latency.15MinuteRate"
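
To see which measurements a pattern like the first query would pick up, the same expression can be tried against a list of names. The sketch below uses Python's re module as a stand-in for InfluxDB's regex matching, and the measurement names other than the 192.168.6.53 one are assumptions for the demo. Note that the unescaped dots match any character, which is harmless here but looser than it looks.

```python
import re

# Only the 192.168.6.53 Read metric appears in this post; the others are
# invented to show what the pattern does and does not select.
measurements = [
    "192.168.6.52.org.apache.cassandra.metrics.ClientRequest.Read.Latency.15MinuteRate",
    "192.168.6.53.org.apache.cassandra.metrics.ClientRequest.Read.Latency.15MinuteRate",
    "192.168.6.53.org.apache.cassandra.metrics.ClientRequest.Write.Latency.15MinuteRate",
    "192.168.6.53.jvm.daemon_thread_count",
]

# Same expression as in the first query, searched unanchored.
pattern = re.compile(r".*org.apache.cassandra.metrics.ClientRequest.Read")
matched = [name for name in measurements if pattern.search(name)]

for name in matched:
    print(name)
```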

Telegraf

Hekad

http://hekad.readthedocs.io/en/v0.10.0/index.html

Atlas

https://github.com/Netflix/atlas

Graylog

http://www.cnblogs.com/wjoyxt/p/4961262.html
http://docs.graylog.org/en/2.0/pages/installation/manual_setup.html

Preparation: install and start MongoDB

mkdir -p /home/qihuang.zheng/data/mongodb
curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.2.7.tgz
tar zxf mongodb-linux-x86_64-3.2.7.tgz
nohup mongodb-linux-x86_64-3.2.7/bin/mongod --dbpath /home/qihuang.zheng/data/mongodb & 

sudo rpm -ivh pwgen-2.07-1.el6.x86_64.rpm && sudo yum install perl-Digest-SHA
[qihuang.zheng@dp0652 ~]$ pwgen -N 1 -s 96
XNamqyHbxtV46AHXNMlMAvVeV2dutp2pQaeY9IaOSf9XnwgyYGXqN97SSCQ2OLyR2HR41BtCpxwSMH4kFnr1VHmRNUQvYyic
[qihuang.zheng@dp0652 ~]$ echo -n XNamqyHbxtV46AHXNMlMAvVeV2dutp2pQaeY9IaOSf9XnwgyYGXqN97SSCQ2OLyR2HR41BtCpxwSMH4kFnr1VHmRNUQvYyic | shasum -a 256
cb535aa3ff35e81f69f9014005bcf1ad032048cc123dad735bbf87970eb2cacb  -
[qihuang.zheng@dp0652 ~]$ echo -n admin | shasum -a 256
8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918  -
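
If pwgen or shasum is not available, the same two values can be produced with the Python standard library. This is a sketch of an equivalent, not what Graylog's documentation prescribes: `generate_secret` and `sha256_hex` are hypothetical helper names, and the secret is random, so it will differ from the one shown above.

```python
import hashlib
import secrets
import string

def generate_secret(length=96):
    """Random alphanumeric string, comparable to `pwgen -N 1 -s 96`."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def sha256_hex(text):
    """Hex SHA-256 digest, comparable to `echo -n text | shasum -a 256`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

password_secret = generate_secret()
root_password_sha2 = sha256_hex("admin")

print(password_secret)
print(root_password_sha2)
```

The digest printed for "admin" should equal the shasum output above.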

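These are the default settings in Graylog's configuration file. On a single machine nothing has to change, although it is best to replace localhost with the machine's IP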

$ wget https://fossies.org/linux/misc/graylog-2.0.2.tgz
$ tar zxf graylog-VERSION.tgz && cd graylog
$ vi graylog.conf.example
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret =
root_password_sha2 =
rest_listen_uri = http://127.0.0.1:12900/
#elasticsearch_cluster_name = graylog
#elasticsearch_discovery_zen_ping_unicast_hosts = 127.0.0.1:9300, 127.0.0.2:9500
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
mongodb_uri = mongodb://localhost/graylog

The configuration file path /etc/graylog/server/server.conf is hard-coded in the bin/graylogctl script

$ sudo mkdir -p /etc/graylog/server/
$ sudo cp graylog.conf.example /etc/graylog/server/server.conf
$ cat /etc/graylog/server/server.conf |grep -v grep|grep -v ^#|grep -v ^$

password_secret=XNamqyHbxtV46AHXNMlMAvVeV2dutp2pQaeY9IaOSf9XnwgyYGXqN97SSCQ2OLyR2HR41BtCpxwSMH4kFnr1VHmRNUQvYyic
password=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
cluster=`cat ~/elasticsearch-2.3.3/config/elasticsearch.yml |grep cluster.name | awk '{print $2}'`
sed -i -e "s#password_secret =#password_secret = $password_secret#g" server.conf
sed -i -e "s#root_password_sha2 =#root_password_sha2 = $password#g" server.conf
sed -i -e "s#127.0.0.1#192.168.6.52#g" server.conf
sed -i -e "s#localhost#192.168.6.52#g" server.conf
sed -i -e "s#elasticsearch_shards = 4#elasticsearch_shards = 1#g" server.conf
sed -i -e "s#web_listen_uri = http://127.0.0.1:9000/#web_listen_uri = http://192.168.6.52:9999/#g" server.conf
######################################
elasticsearch_cluster_name = $cluster
elasticsearch_discovery_zen_ping_unicast_hosts = 192.168.6.52:9300

Graylog's start script is bin/graylogctl; the actual start command is java -jar graylog.jar

[qihuang.zheng@dp0652 graylog-2.0.2]$ ll
drwxr-xr-x 2 qihuang.zheng users     4096 620 13:37 bin
-rw-r--r-- 1 qihuang.zheng users    35147 526 23:31 COPYING
drwxr-xr-x 4 qihuang.zheng users     4096 620 12:58 data
-rw-r--r-- 1 qihuang.zheng users    23310 526 23:31 graylog.conf.example
-rw-r--r-- 1 qihuang.zheng users 80950701 526 23:34 graylog.jar    ⬅️
drwxr-xr-x 3 qihuang.zheng users     4096 620 11:55 lib
drwxr-xr-x 2 qihuang.zheng users     4096 620 13:02 log
drwxr-xr-x 2 qihuang.zheng users     4096 526 23:33 plugin

To customize the log configuration, edit the start section of bin/graylogctl and add the config file path before -jar

-Dlog4j.configurationFile=file:///home/qihuang.zheng/graylog-2.0.2/log4j2.xml -jar "${GRAYLOG_SERVER_JAR}" server

Starting Graylog also starts the web service. Its default port is 9000, which clashes with HDFS here, so web_listen_uri was changed to 9999 above

[qihuang.zheng@dp0652 ~]$ sudo graylog-2.0.2/bin/graylogctl start
Starting graylog-server ...
[qihuang.zheng@dp0652 ~]$ sudo graylog-2.0.2/bin/graylogctl status
graylog-server running with PID 8600
[qihuang.zheng@dp0652 ~]$ ll /etc/graylog/server/
-rw-r--r-- 1 root root    36 620 12:58 node-id
-rw-r--r-- 1 root root 23496 620 12:57 server.conf
[qihuang.zheng@dp0652 ~]$ cat /etc/graylog/server/node-id
eb943e44-3464-47be-9c07-2a554de71428
[qihuang.zheng@dp0652 graylog-2.0.2]$ ps -ef|grep graylog
root     33748     1 48 14:17 pts/0    00:01:20 /home/qihuang.zheng/jdk1.8.0_91/bin/java -Djava.library.path=bin/../lib/sigar -Xms1g -Xmx1g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar graylog.jar server -f /etc/graylog/server/server.conf -p /tmp/graylog.pid
506      38283 29183  0 14:20 pts/0    00:00:00 grep graylog
[qihuang.zheng@dp0652 graylog-2.0.2]$ cat /tmp/graylog.pid
33748

Log file

[qihuang.zheng@dp0652 graylog-2.0.2]$ cat log/graylog-server.log
2016-06-20 14:17:46,602 INFO : kafka.log.LogManager - Loading logs.
2016-06-20 14:17:46,662 INFO : kafka.log.LogManager - Logs loading complete.
2016-06-20 14:17:46,662 INFO : org.graylog2.shared.journal.KafkaJournal - Initialized Kafka based journal at data/journal
2016-06-20 14:17:46,676 INFO : org.graylog2.shared.buffers.InputBufferImpl - Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 2 parallel message handlers.
2016-06-20 14:17:46,706 INFO : org.mongodb.driver.cluster - Cluster created with settings {hosts=[192.168.6.52:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2016-06-20 14:17:46,731 INFO : org.mongodb.driver.cluster - No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, all=[ServerDescription{address=192.168.6.52:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2016-06-20 14:17:46,752 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:40}] to 192.168.6.52:27017
2016-06-20 14:17:46,753 INFO : org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=192.168.6.52:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 7]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=512946}
2016-06-20 14:17:46,758 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:41}] to 192.168.6.52:27017
2016-06-20 14:17:46,929 INFO : org.graylog2.plugin.system.NodeId - Node ID: eb943e44-3464-47be-9c07-2a554de71428
2016-06-20 14:17:46,978 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] version[2.3.2], pid[33748], build[b9e4a6a/2016-04-21T16:03:47Z]
2016-06-20 14:17:46,978 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] initializing ...
2016-06-20 14:17:46,984 INFO : org.elasticsearch.plugins - [graylog-eb943e44-3464-47be-9c07-2a554de71428] modules [], plugins [graylog-monitor], sites []
2016-06-20 14:17:48,243 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] initialized
2016-06-20 14:17:48,304 INFO : org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.2.4.Final
2016-06-20 14:17:48,419 INFO : org.graylog2.shared.buffers.ProcessBuffer - Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2016-06-20 14:17:49,918 INFO : org.graylog2.bindings.providers.RulesEngineProvider - No static rules file loaded.
2016-06-20 14:17:50,510 INFO : org.graylog2.bootstrap.ServerBootstrap - Graylog server 2.0.2 (4da1379) starting up
2016-06-20 14:17:50,514 WARN : org.graylog2.shared.events.DeadEventLoggingListener - Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2016-06-20 14:17:50,532 INFO : org.graylog2.shared.initializers.PeriodicalsService - Starting 24 periodicals ...
2016-06-20 14:17:50,538 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] starting ...
2016-06-20 14:17:50,544 INFO : org.graylog2.periodical.IndexRetentionThread - Elasticsearch cluster not available, skipping index retention checks.
2016-06-20 14:17:50,546 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:43}] to 192.168.6.52:27017
2016-06-20 14:17:50,546 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:42}] to 192.168.6.52:27017
2016-06-20 14:17:50,553 INFO : org.graylog2.periodical.IndexerClusterCheckerThread - Indexer not fully initialized yet. Skipping periodic cluster check.
2016-06-20 14:17:50,573 INFO : org.graylog2.shared.initializers.PeriodicalsService - Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2016-06-20 14:17:50,648 INFO : org.elasticsearch.transport - [graylog-eb943e44-3464-47be-9c07-2a554de71428] publish_address {127.0.0.1:9350}, bound_addresses {[::1]:9350}, {127.0.0.1:9350}
2016-06-20 14:17:50,654 INFO : org.elasticsearch.discovery - [graylog-eb943e44-3464-47be-9c07-2a554de71428] es52/rWpoduohQ1CXcyoAuwzNtg
2016-06-20 14:17:53,239 INFO : org.glassfish.grizzly.http.server.NetworkListener - Started listener bound to [192.168.6.52:9999]
2016-06-20 14:17:53,241 INFO : org.glassfish.grizzly.http.server.HttpServer - [HttpServer] Started.
2016-06-20 14:17:53,242 INFO : org.graylog2.initializers.WebInterfaceService - Started Web Interface at <http://192.168.6.52:9999/>
2016-06-20 14:17:53,657 WARN : org.elasticsearch.discovery - [graylog-eb943e44-3464-47be-9c07-2a554de71428] waited for 3s and no initial state was set by the discovery
2016-06-20 14:17:53,657 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] started
2016-06-20 14:17:53,730 INFO : org.elasticsearch.cluster.service - [graylog-eb943e44-3464-47be-9c07-2a554de71428] detected_master {Daisy Johnson}{ZVzCrnWoRsKVnRryYLE6BQ}{192.168.6.52}{192.168.6.52:9300}, added {{Daisy Johnson}{ZVzCrnWoRsKVnRryYLE6BQ}{192.168.6.52}{192.168.6.52:9300},}, reason: zen-disco-receive(from master [{Daisy Johnson}{ZVzCrnWoRsKVnRryYLE6BQ}{192.168.6.52}{192.168.6.52:9300}])
2016-06-20 14:17:56,443 INFO : org.glassfish.grizzly.http.server.NetworkListener - Started listener bound to [192.168.6.52:12900]
2016-06-20 14:17:56,443 INFO : org.glassfish.grizzly.http.server.HttpServer - [HttpServer-1] Started.
2016-06-20 14:17:56,444 INFO : org.graylog2.shared.initializers.RestApiService - Started REST API at <http://192.168.6.52:12900/>
2016-06-20 14:17:56,445 INFO : org.graylog2.shared.initializers.ServiceManagerListener - Services are healthy
2016-06-20 14:17:56,446 INFO : org.graylog2.shared.initializers.InputSetupService - Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2016-06-20 14:17:56,446 INFO : org.graylog2.bootstrap.ServerBootstrap - Services started, startup times in ms: {JournalReader [RUNNING]=1, BufferSynchronizerService [RUNNING]=1, OutputSetupService [RUNNING]=1, InputSetupService [RUNNING]=2, MetricsReporterService [RUNNING]=2, KafkaJournal [RUNNING]=4, PeriodicalsService [RUNNING]=51, WebInterfaceService [RUNNING]=2705, IndexerSetupService [RUNNING]=3211, RestApiService [RUNNING]=5912}
2016-06-20 14:17:56,451 INFO : org.graylog2.bootstrap.ServerBootstrap - Graylog server up and running.
2016-06-20 14:18:00,548 INFO : org.graylog2.indexer.Deflector - Did not find an deflector alias. Setting one up now.
2016-06-20 14:18:00,552 INFO : org.graylog2.indexer.Deflector - There is no index target to point to. Creating one now.
2016-06-20 14:18:00,554 INFO : org.graylog2.indexer.Deflector - Cycling deflector to next index now.
2016-06-20 14:18:00,555 INFO : org.graylog2.indexer.Deflector - Cycling from <none> to <graylog_0>
2016-06-20 14:18:00,555 INFO : org.graylog2.indexer.Deflector - Creating index target <graylog_0>...
2016-06-20 14:18:00,614 INFO : org.graylog2.indexer.indices.Indices - Created Graylog index template "graylog-internal" in Elasticsearch.
2016-06-20 14:18:00,698 INFO : org.graylog2.indexer.Deflector - Waiting for index allocation of <graylog_0>
2016-06-20 14:18:00,800 INFO : org.graylog2.indexer.Deflector - Done!
2016-06-20 14:18:00,800 INFO : org.graylog2.indexer.Deflector - Pointing deflector to new target index....
2016-06-20 14:18:00,845 INFO : org.graylog2.system.jobs.SystemJobManager - Submitted SystemJob <bd3c64c0-36ae-11e6-bcbc-02423384d6ab> [org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob]
2016-06-20 14:18:00,845 INFO : org.graylog2.indexer.Deflector - Done!
2016-06-20 14:18:00,845 INFO : org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob - Calculating ranges for index graylog_0.
2016-06-20 14:18:00,964 INFO : org.graylog2.indexer.ranges.MongoIndexRangeService - Calculated range of [graylog_0] in [117ms].
2016-06-20 14:18:00,973 INFO : org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob - Created ranges for index graylog_0.
2016-06-20 14:18:00,973 INFO : org.graylog2.system.jobs.SystemJobManager - SystemJob <bd3c64c0-36ae-11e6-bcbc-02423384d6ab> [org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob] finished in 128ms.

Check whether Elasticsearch created the index

[qihuang.zheng@dp0652 graylog-2.0.2]$ curl http://192.168.6.52:9200/_cat/indices
yellow open logstash-%{type}-2016.05.25 5 1 12762 0 1.5mb 1.5mb
green  open graylog_0                   1 0     0 0  159b  159b
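
The first index name in the listing still contains the literal %{type} placeholder. Logstash leaves a %{field} reference unexpanded when the event has no such field, so the events written there probably had no type set. A minimal Python sketch for spotting such names in `_cat/indices` output follows. The column order is assumed from the Elasticsearch 2.x output shown above, and the function names `parse_cat_indices` and `unresolved_indices` are hypothetical.

```python
from typing import Dict, List

# Column order assumed from the `_cat/indices` output shown above (Elasticsearch 2.x).
COLUMNS = ["health", "status", "index", "pri", "rep",
           "docs.count", "docs.deleted", "store.size", "pri.store.size"]


def parse_cat_indices(text: str) -> List[Dict[str, str]]:
    """Split each line on whitespace and label the fields by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not match the assumed layout.
        if len(fields) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def unresolved_indices(rows: List[Dict[str, str]]) -> List[str]:
    """Return index names that still contain a literal %{...} placeholder."""
    return [row["index"] for row in rows if "%{" in row["index"]]


sample = """\
yellow open logstash-%{type}-2016.05.25 5 1 12762 0 1.5mb 1.5mb
green  open graylog_0                   1 0     0 0  159b  159b
"""
rows = parse_cat_indices(sample)
print(unresolved_indices(rows))
```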

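Check whether MongoDB created the database. The collections listed below belong to the graylog database, so switch to it with use graylog before running show collections.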

[qihuang.zheng@dp0652 graylog-2.0.2]$ ~/mongodb-linux-x86_64-3.2.7/bin/mongo
MongoDB shell version: 3.2.7
connecting to: test
> show dbs;
graylog  0.001GB
local    0.000GB
test     0.000GB
> show collections
cluster_config
cluster_events
collectors
content_packs
grok_patterns
index_failures
index_ranges
nodes
notifications
pipeline_processor_pipelines
pipeline_processor_pipelines_streams
pipeline_processor_rules
roles
sessions
system_messages
users
> db.nodes.count()
1
> db.cluster_config.count()
11
> db.nodes.find()
{ "_id" : ObjectId("5767780ab01412219843147b"), "is_master" : true, "hostname" : "dp0652", "last_seen" : 1466404508, "transport_address" : "http://192.168.6.52:12900/", "type" : "SERVER", "node_id" : "eb943e44-3464-47be-9c07-2a554de71428" }
> db.cluster_config.distinct("type")
[
  "org.graylog2.bundles.ContentPackLoaderConfig",
  "org.graylog2.cluster.UserPermissionMigrationState",
  "org.graylog2.indexer.management.IndexManagementConfig",
  "org.graylog2.indexer.retention.strategies.ClosingRetentionStrategyConfig",
  "org.graylog2.indexer.retention.strategies.DeletionRetentionStrategyConfig",
  "org.graylog2.indexer.rotation.strategies.MessageCountRotationStrategyConfig",
  "org.graylog2.indexer.rotation.strategies.SizeBasedRotationStrategyConfig",
  "org.graylog2.indexer.rotation.strategies.TimeBasedRotationStrategyConfig",
  "org.graylog2.indexer.searches.SearchesClusterConfig",
  "org.graylog2.periodical.IndexRangesMigrationPeriodical.MongoIndexRangesMigrationComplete",
  "org.graylog2.plugin.cluster.ClusterId"
]

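Note: the elasticsearch_cluster_name setting in server.conf must match the cluster name of the Elasticsearch instance that is already installed. Otherwise Graylog reports the following error: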

2016-06-20 14:06:47,920 ERROR: org.graylog2.shared.rest.exceptionmappers.AnyExceptionClassMapper - Unhandled exception in REST resource
org.elasticsearch.discovery.MasterNotDiscoveredException
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:226) ~[graylog.jar:?]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236) ~[graylog.jar:?]
        at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804) ~[graylog.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
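
A mismatch like this can be caught before starting Graylog by comparing the two names. The Python sketch below is only an illustration: it assumes server.conf uses plain `key = value` lines and that the JSON returned by the Elasticsearch root endpoint carries a `cluster_name` field. The helper names are hypothetical, and the value `es52` is taken from the discovery log line earlier in this section on the assumption that it is the cluster name.

```python
import json
from typing import Optional


def conf_value(conf_text: str, key: str) -> Optional[str]:
    """Return the value of `key` from `key = value` lines, ignoring comments."""
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def cluster_names_match(conf_text: str, es_root_json: str) -> bool:
    """Compare server.conf with the cluster_name reported by Elasticsearch."""
    configured = conf_value(conf_text, "elasticsearch_cluster_name")
    actual = json.loads(es_root_json).get("cluster_name")
    return configured is not None and configured == actual


# Hypothetical inputs for illustration only.
conf = "# Graylog server.conf excerpt\nelasticsearch_cluster_name = es52\n"
es_root = '{"cluster_name": "es52"}'
print(cluster_names_match(conf, es_root))
```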

Log in at http://192.168.6.52:9999 as admin/admin. The password is the plaintext admin that was hashed with echo -n admin | shasum -a 256.
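
For machines without shasum, the same digest can be computed in Python. This is a minimal sketch; the function name `password_sha256` is hypothetical. Like `echo -n`, it hashes the password without a trailing newline. In a stock Graylog server.conf the resulting hex string normally goes into the root_password_sha2 setting, which is an assumption about the configuration, as the setting is not shown here.

```python
import hashlib


def password_sha256(password: str) -> str:
    """Hex SHA-256 digest of the password, with no trailing newline added."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


print(password_sha256("admin"))
```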

Sending data to Graylog

UDP

1. In the web interface, go to System/Inputs and add a Raw/Plaintext UDP input (port 5555 in this example)

2. Simulate sending data to that port

[qihuang.zheng@dp0652 graylog-2.0.2]$ echo "Hello Graylog, let's be friends." | nc -w 1 -u 127.0.0.1 5555

3. The message can now be found on the Search page
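
The nc command in step 2 can also be reproduced from application code. Below is a minimal Python sketch that sends one plaintext datagram; the function name `send_raw_udp` is hypothetical, and the host and port are the ones used in the nc example.

```python
import socket


def send_raw_udp(message: str, host: str, port: int) -> int:
    """Send one plaintext datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Host and port taken from the nc example above.
    send_raw_udp("Hello Graylog, let's be friends.\n", "127.0.0.1", 5555)
```

UDP is fire-and-forget, so a successful call only means the datagram left the socket. Whether Graylog received it still has to be checked on the Search page.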

GELF

1. Configure a GELF HTTP input (port 12201)

2. Send GELF data

curl -XPOST http://192.168.6.52:12201/gelf -p0 -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'
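
The same request can be built in Python with only the standard library. This is a sketch under a few assumptions: the helper names `build_gelf` and `post_gelf` are hypothetical, the payload declares GELF version 1.1, and custom fields are prefixed with an underscore as in the `_foo` field above. The `facility` field from the curl example is omitted for brevity.

```python
import json
import urllib.request


def build_gelf(short_message: str, host: str, **extra) -> bytes:
    """Build a GELF 1.1 JSON payload; extra fields get a leading underscore."""
    payload = {"version": "1.1", "host": host, "short_message": short_message}
    for key, value in extra.items():
        payload[key if key.startswith("_") else "_" + key] = value
    return json.dumps(payload).encode("utf-8")


def post_gelf(url: str, body: bytes) -> int:
    """POST the payload to a GELF HTTP input and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


body = build_gelf("Hello there", "example.org", foo="bar")
print(body.decode("utf-8"))
# To actually send it (URL taken from the curl example above):
# post_gelf("http://192.168.6.52:12201/gelf", body)
```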

Logstash

Configure the input as GELF UDP on port 12202, to keep it separate from the GELF HTTP input on port 12201

➜  logstash-2.3.2 bin/logstash -e 'input { stdin {} } output { gelf {host => "192.168.6.52" port => 12202 } }'
logstash message
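
For testing the GELF UDP input without Logstash, a small Python sender is enough. The sketch below assumes the message fits in a single datagram, so it does no GELF chunking, and it compresses the JSON with zlib, one of the encodings the GELF UDP transport accepts. The function name `send_gelf_udp` and the source name `zqhmac` (the host name from the Logstash output earlier in the article) are illustrative choices.

```python
import json
import socket
import zlib


def send_gelf_udp(short_message: str, source: str, host: str, port: int) -> bytes:
    """Send one zlib-compressed GELF 1.1 message over UDP; returns the datagram."""
    payload = json.dumps({
        "version": "1.1",
        "host": source,              # name of the machine that produced the message
        "short_message": short_message,
    }).encode("utf-8")
    datagram = zlib.compress(payload)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
    return datagram


if __name__ == "__main__":
    # Target host and port taken from the Logstash gelf output above.
    send_gelf_udp("logstash message", "zqhmac", "192.168.6.52", 12202)
```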

syslog

kafka

1. Create a topic and produce a test message

bin/kafka-topics.sh --create --zookeeper 192.168.6.55:2181,192.168.6.56:2181,192.168.6.57:2181 --replication-factor 1 --partitions 1 --topic graylog-test
bin/kafka-console-producer.sh --broker-list 192.168.6.55:9092,192.168.6.56:9092,192.168.6.57:9092 --topic graylog-test
hello msg from kafka

2. Configure a Kafka input

3. View the collected logs

Graylog-Cassandra

https://github.com/Graylog2/graylog-plugin-metrics-reporter

Ref
