1. Download the source code
#git clone https://git-wip-us.apache.org/repos/asf/flume.git
2. Compile
#export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m"
#mvn clean install -DskipTests
The build fails with an error; see:
https://issues.apache.org/jira/browse/FLUME-2184
Solution:
#mvn clean install -Dhadoop.profile=2 -DskipTests
#mvn clean install -Dhadoop.profile=2 -DskipTests -Dmaven.test.skip=true
The first command still compiles the test classes (and in some situations the test classes themselves fail to compile); the second also skips compiling them.
If you want to change the Hadoop version, edit the pom.xml:
<hadoop2.version>2.6.0</hadoop2.version>
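For context, the version is defined as a Maven property in the root pom.xml; a sketch with the surrounding entries elided:

```xml
<!-- root pom.xml, inside <properties> (sketch; other entries elided) -->
<properties>
  <hadoop2.version>2.6.0</hadoop2.version>
</properties>
```

Because the `-Dhadoop.profile=2` builds read this property, bumping it here changes the Hadoop client libraries the whole build compiles against.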
3. Run
#cd flume-ng-dist/target/apache-flume-1.6.0-SNAPSHOT-bin
#cp conf/flume-conf.properties.template conf/flume.conf
#cp conf/flume-env.sh.template conf/flume-env.sh
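conf/flume-env.sh is where JVM options for the agent go. A minimal sketch (the heap sizes below are illustrative, not recommendations):

```shell
# conf/flume-env.sh -- JVM options picked up by bin/flume-ng when it
# launches the agent (heap sizes are illustrative; tune for your load)
export JAVA_OPTS="-Xms256m -Xmx512m"
```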
Copy and paste this into conf/flume.conf:
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414

# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
#bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1
#bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console
Error: the avro-client does not work (it neither reads the file nor sends data to the Avro source). When the Flume agent is shut down in the other console, the avro-client prints the following error:
2014-12-31 11:33:30,865 DEBUG [org.apache.avro.ipc.NettyTransceiver] - Remote peer dmining05/127.0.0.1:41414 closed connection.
2014-12-31 11:33:30,865 DEBUG [org.apache.avro.ipc.NettyTransceiver] - Disconnecting from dmining05/127.0.0.1:41414
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.specific.SpecificData.getClassLoader()Ljava/lang/ClassLoader;
        at org.apache.avro.ipc.specific.SpecificRequestor.getClient(SpecificRequestor.java:158)
        at org.apache.avro.ipc.specific.SpecificRequestor.getClient(SpecificRequestor.java:148)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:171)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:121)
        at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:638)
        at org.apache.flume.api.RpcClientFactory.getDefaultInstance(RpcClientFactory.java:170)
        at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:198)
        at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)
Solution: the cause is an Avro version mismatch. In the lib directory, replace avro-1.7.4.jar, avro-ipc-1.7.4.jar, and avro-mapred-1.7.4.jar with avro-1.7.7.jar, avro-ipc-1.7.7.jar, and avro-mapred-1.7.7.jar.
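The swap can be sketched as below. The directory name `demo-lib` and the empty stand-in files are only there so the steps can run anywhere; on a real install you would work in the dist's lib/ directory with jars downloaded from Maven Central.

```shell
# Illustrative sketch of the Avro jar swap. demo-lib stands in for the
# Flume dist's lib/ directory, and the touched files stand in for real
# jars, so the commands themselves are what matters here.
LIB=demo-lib
mkdir -p "$LIB"
for a in avro avro-ipc avro-mapred; do
  touch "$LIB/${a}-1.7.4.jar"   # stand-in for the bundled 1.7.4 jar
  touch "${a}-1.7.7.jar"        # stand-in for the downloaded 1.7.7 jar
done

for a in avro avro-ipc avro-mapred; do
  mv "$LIB/${a}-1.7.4.jar" "$LIB/${a}-1.7.4.jar.bak"  # keep a backup
  cp "${a}-1.7.7.jar" "$LIB/"                         # drop in 1.7.7
done
ls "$LIB"
```

Keeping the old jars as `.bak` files makes it easy to roll back if the new versions introduce a different incompatibility.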
-----------
Setting up Eclipse
#mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs
Once this command completes successfully, add $HOME/.m2/repository to the classpath in Eclipse's preferences; you can then import all the Flume modules as interdependent projects via File > Import > General > Existing Projects into Workspace.
References
https://cwiki.apache.org/confluence/display/FLUME/Getting+Started
https://cwiki.apache.org/confluence/display/FLUME/Development+Environment