
Flume NG architecture and configuration

 

Please refer to the Flume user guide first:

http://flume.apache.org/FlumeUserGuide.html

and the Cloudera Flume blog posts:

http://blog.cloudera.com/blog/category/flume/

 

How to define JAVA_HOME and Java options, and add our customized libs to flume-ng

All of these settings are defined in FLUME_CONF_DIR/flume-env.sh, for example:

JAVA_HOME=/opt/java 

JAVA_OPTS="-Xms200m -Xmx200m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=3669 -Dflume.called.from.service" 

FLUME_CLASSPATH=/opt/sponge/flume/lib/*

 

How to start flume-ng as an agent

Please note that the agent must be named hostname_agent; this name is referenced in flume-conf-agent.properties.

$/usr/lib/flume/bin/flume-ng agent --conf /opt/sponge/flume/config/   --conf-file /opt/sponge/flume/conf/flume-conf-agent.properties  --name hostname_agent &

 

How to start flume-ng as a collector

Please note that the collector must be named hostname_collector; this name is referenced in flume-conf-collector.properties.

$/usr/lib/flume/bin/flume-ng agent --conf /opt/sponge/flume/config/   --conf-file /opt/sponge/flume/conf/flume-conf-collector.properties  --name hostname_collector &

 

How to define the flume agent and flume collector property files

I’ve already committed two property files to https://svn.nam.nsroot.net:9050/svn/153299/elf/sponge-branches/2013-03-14-FlumeNG/sponge/myflumeng/config

Please refer to flume-conf-agent.properties and flume-conf-collector.properties.

The basic naming conventions are:

1) Each agent is named hostname_agent.

2) Each collector is named hostname_collector.

3) Sources are named source1, source2, source3, ...

4) Sinks are named avroSink1, avroSink2, avroSink3, ...

5) Each source's interceptor is named logIntercept1, logIntercept2, logIntercept3, ...

6) All agent sinks are Avro sinks.

7) The default collector source is an Avro source.

8) Agent sinks are load balanced round robin.

9) The file channel is the default channel for both agent and collector.

 

flume-conf-agent.properties

hostname_agent.sources = source1, source2

hostname_agent.channels = fileChannel

hostname_agent.sinks = avroSink1, avroSink2

 

# For each one of the sources, the type is defined

hostname_agent.sources.source1.type = exec

hostname_agent.sources.source1.command = tail -F /var/log/audit/audit.log

hostname_agent.sources.source1.channels = fileChannel

hostname_agent.sources.source1.batchSize=10

 

hostname_agent.sources.source2.type = exec

hostname_agent.sources.source2.command = tail -F /var/log/flume/flume.log

hostname_agent.sources.source2.channels = fileChannel

hostname_agent.sources.source2.batchSize=10
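Note that an exec source running tail -F is best-effort: events buffered in the tail process are lost if the agent restarts, and Flume cannot redeliver them. Where logs can be rotated into a dedicated directory, Flume's spooling-directory source is a more reliable alternative. A sketch (the spool directory here is hypothetical):

```
hostname_agent.sources.source1.type = spooldir
hostname_agent.sources.source1.spoolDir = /var/log/audit/spool
hostname_agent.sources.source1.channels = fileChannel
```

The spooling source only consumes files that are complete and no longer being written to, so it suits rotated logs rather than live ones.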

 

# For each one of the sources, the log interceptor is defined

hostname_agent.sources.source1.interceptors = logIntercept1

hostname_agent.sources.source1.interceptors.logIntercept1.type = com.citi.sponge.flume.sink.LogInterceptor$Builder

hostname_agent.sources.source1.interceptors.logIntercept1.preserveExisting = false

hostname_agent.sources.source1.interceptors.logIntercept1.hostName = hostname

hostname_agent.sources.source1.interceptors.logIntercept1.env = PROD

hostname_agent.sources.source1.interceptors.logIntercept1.logType = AUDIT_LOG

hostname_agent.sources.source1.interceptors.logIntercept1.appId = 111111

hostname_agent.sources.source1.interceptors.logIntercept1.logFilePath = /var/log/audit

hostname_agent.sources.source1.interceptors.logIntercept1.logFileName = audit.log

  

hostname_agent.sources.source2.interceptors = logIntercept2

hostname_agent.sources.source2.interceptors.logIntercept2.type = com.citi.sponge.flume.sink.LogInterceptor$Builder

hostname_agent.sources.source2.interceptors.logIntercept2.preserveExisting = false

hostname_agent.sources.source2.interceptors.logIntercept2.hostName = hostname

hostname_agent.sources.source2.interceptors.logIntercept2.env = PROD

hostname_agent.sources.source2.interceptors.logIntercept2.logType = FLUME

hostname_agent.sources.source2.interceptors.logIntercept2.appId = 111111

hostname_agent.sources.source2.interceptors.logIntercept2.logFilePath = /var/log/flume

hostname_agent.sources.source2.interceptors.logIntercept2.logFileName = flume.log
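The logIntercept type above points at a custom class with a nested Builder, which is how Flume instantiates interceptors. Below is a minimal sketch of what such a class looks like against the Flume 1.x interceptor API; this is not the actual com.citi.sponge implementation, the header names are assumptions, and it needs flume-ng-core on the classpath:

```java
package com.citi.sponge.flume.sink;

import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/** Stamps every event with the static headers configured in the properties file. */
public class LogInterceptor implements Interceptor {

    private final Context ctx;

    private LogInterceptor(Context ctx) {
        this.ctx = ctx;
    }

    @Override
    public void initialize() { /* nothing to set up */ }

    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        // Hypothetical header names, copied from the config keys above.
        headers.put("hostName", ctx.getString("hostName", "unknown"));
        headers.put("env", ctx.getString("env", "DEV"));
        headers.put("logType", ctx.getString("logType", ""));
        headers.put("appId", ctx.getString("appId", ""));
        headers.put("logFilePath", ctx.getString("logFilePath", ""));
        headers.put("logFileName", ctx.getString("logFileName", ""));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override
    public void close() { /* nothing to clean up */ }

    /** Referenced from the properties file as LogInterceptor$Builder. */
    public static class Builder implements Interceptor.Builder {
        private Context ctx;

        @Override
        public void configure(Context context) {
            this.ctx = context;   // keeps the per-interceptor properties
        }

        @Override
        public Interceptor build() {
            return new LogInterceptor(ctx);
        }
    }
}
```

Flume calls Builder.configure() with the interceptor's sub-context from the properties file and then build(), so the configured values are available to every event.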

 

 

# For each one of the sinks, the type is defined

hostname_agent.sinks.avroSink1.type = avro

hostname_agent.sinks.avroSink1.hostname=collector1

hostname_agent.sinks.avroSink1.port=1442

hostname_agent.sinks.avroSink1.batchSize=10 

hostname_agent.sinks.avroSink1.channel = fileChannel 

 

hostname_agent.sinks.avroSink2.type = avro

hostname_agent.sinks.avroSink2.hostname=collector2

hostname_agent.sinks.avroSink2.port=1442

hostname_agent.sinks.avroSink2.batchSize=10 

hostname_agent.sinks.avroSink2.channel = fileChannel 

 

 

#Specify the load balance configuration for the sinks

hostname_agent.sinkgroups = sinkGroup

hostname_agent.sinkgroups.sinkGroup.sinks = avroSink1 avroSink2

hostname_agent.sinkgroups.sinkGroup.processor.type = load_balance

hostname_agent.sinkgroups.sinkGroup.processor.backoff = true

hostname_agent.sinkgroups.sinkGroup.processor.selector = round_robin

hostname_agent.sinkgroups.sinkGroup.processor.selector.maxBackoffMillis=30000
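With load balancing, both collectors receive traffic. If instead one collector should act purely as a hot standby, Flume's failover sink processor is a drop-in alternative for the same sink group; a sketch with assumed priorities:

```
hostname_agent.sinkgroups.sinkGroup.processor.type = failover
hostname_agent.sinkgroups.sinkGroup.processor.priority.avroSink1 = 10
hostname_agent.sinkgroups.sinkGroup.processor.priority.avroSink2 = 5
hostname_agent.sinkgroups.sinkGroup.processor.maxpenalty = 10000
```

Events then flow only to the highest-priority live sink (avroSink1 here), failing over to avroSink2 when it goes down.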

 

 

# Each channel's type is defined.

hostname_agent.channels.fileChannel.type = file

hostname_agent.channels.fileChannel.checkpointDir = /opt/sponge/file-channel/checkpoint

hostname_agent.channels.fileChannel.dataDirs = /opt/sponge/file-channel/dataDirs

hostname_agent.channels.fileChannel.transactionCapacity = 1000

hostname_agent.channels.fileChannel.checkpointInterval = 30000

hostname_agent.channels.fileChannel.maxFileSize = 2146435071

hostname_agent.channels.fileChannel.minimumRequiredSpace = 524288000

hostname_agent.channels.fileChannel.keep-alive = 5

hostname_agent.channels.fileChannel.write-timeout = 5

hostname_agent.channels.fileChannel.checkpoint-timeout = 600

 

flume-conf-collector.properties

hostname_collector.sources = avroSource

hostname_collector.channels = fileChannel

hostname_collector.sinks = hbaseSink

 

 

# For each one of the sources, the type is defined

hostname_collector.sources.avroSource.channels = fileChannel

hostname_collector.sources.avroSource.type = avro

hostname_collector.sources.avroSource.bind = hostname

hostname_collector.sources.avroSource.port = 1442 

 

hostname_collector.sinks.hbaseSink.type=org.apache.flume.sink.hbase.HBaseSink

hostname_collector.sinks.hbaseSink.table=spong_flumeng_log2

hostname_collector.sinks.hbaseSink.columnFamily=content

hostname_collector.sinks.hbaseSink.serializer=com.citi.sponge.flume.sink.LogHbaseEventSerializer

hostname_collector.sinks.hbaseSink.timeout=120

hostname_collector.sinks.hbaseSink.column=log

hostname_collector.sinks.hbaseSink.batchSize=2

hostname_collector.sinks.hbaseSink.channel=fileChannel
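The HBase sink writes to an existing table; it does not create one. Before starting the collector, create the table and column family named in the properties above, e.g. from the HBase shell:

```
create 'spong_flumeng_log2', 'content'
```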

 

 

# Each channel's type is defined.

hostname_collector.channels.fileChannel.type = file

hostname_collector.channels.fileChannel.checkpointDir = /opt/sponge/file-channel/checkpoint

hostname_collector.channels.fileChannel.dataDirs = /opt/sponge/file-channel/dataDirs

hostname_collector.channels.fileChannel.transactionCapacity = 1000

hostname_collector.channels.fileChannel.checkpointInterval = 30000

hostname_collector.channels.fileChannel.maxFileSize = 2146435071

hostname_collector.channels.fileChannel.minimumRequiredSpace = 524288000

hostname_collector.channels.fileChannel.keep-alive = 5

hostname_collector.channels.fileChannel.write-timeout = 5

hostname_collector.channels.fileChannel.checkpoint-timeout = 600
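One caveat on the channel settings: the collector's checkpointDir/dataDirs above are identical to the agent's, and two file channels must never share directories. If an agent and a collector are ever co-located on one host, point one of them elsewhere; a sketch with a hypothetical alternate path:

```
hostname_collector.channels.fileChannel.checkpointDir = /opt/sponge/file-channel-collector/checkpoint
hostname_collector.channels.fileChannel.dataDirs = /opt/sponge/file-channel-collector/dataDirs
```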
