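The article at http://bit1129.iteye.com/blog/2174791 covered installing a single Kafka server. In Kafka, each Kafka server is called a broker. This post briefly walks through a pseudo-distributed Kafka installation on a single machine, and then tests and verifies it.
1. Installation steps
The approach is exactly the same as for a pseudo-distributed ZooKeeper installation, only slightly simpler (there is no myid file to create). The main task is to give each Kafka server its own properties file: the three servers use server.properties, server.1.properties and server.2.properties respectively.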
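cp server.properties server.1.properties
cp server.properties server.2.properties
Edit server.1.properties and server.2.properties. Three properties need to change; the values below are for server.1.properties, and server.2.properties needs its own distinct values:
broker.id=1
port=9093
log.dirs=/tmp/kafka-logs-1
port is the port the Kafka server listens on.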
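The per-broker edits can also be generated rather than made by hand. The following is a minimal sketch, not part of the original setup: the helper names are hypothetical, and it assumes broker 0 keeps port 9092 and /tmp/kafka-logs and that broker 2 follows the same pattern as broker 1 (9094, /tmp/kafka-logs-2).

```python
def broker_overrides(index, base_port=9092, log_root="/tmp/kafka-logs"):
    """Return the three per-broker settings for broker number `index`.

    Assumption: broker 0 uses base_port and log_root unchanged, and every
    further broker adds its index to the port and appends "-<index>".
    """
    suffix = "" if index == 0 else f"-{index}"
    return {
        "broker.id": str(index),
        "port": str(base_port + index),
        "log.dirs": f"{log_root}{suffix}",
    }


def apply_overrides(properties_text, overrides):
    """Rewrite `key=value` lines whose key is overridden; keep all other lines."""
    remaining = dict(overrides)
    out = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any setting that was not present in the base file.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    base = "broker.id=0\nport=9092\nlog.dirs=/tmp/kafka-logs\n"
    print(apply_overrides(base, broker_overrides(1)))
```

Writing the returned text to server.1.properties and server.2.properties would replace the manual editing step; the three values must stay unique per broker because all brokers share one machine.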
Start the three Kafka brokers:
bin/kafka-server-start.sh server.properties
bin/kafka-server-start.sh server.1.properties
bin/kafka-server-start.sh server.2.properties
2. Commonly used options of the Kafka scripts
2.1 kafka-console-consumer.sh
--from-beginning If the consumer does not already have an established offset to consume from, start with the earliest message present in the log rather than the latest message.
--topic <topic> The topic id to consume on
--zookeeper <urls> REQUIRED: The connection string for the zookeeper connection in the form host:port. Multiple URLS can be given to allow fail-over.
--group <gid> The group id to consume on. (default: console-consumer-37803)
On the consumer side there is no need to specify a broker-list; the consumer uses ZooKeeper and the topic to find all brokers that hold messages for that topic.
2.2 kafka-console-producer.sh
--topic <topic> REQUIRED: The topic id to produce messages to.
--broker-list <broker-list> REQUIRED: The broker list string in the form HOST1:PORT1,HOST2:PORT2.
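As an illustration of the HOST1:PORT1,HOST2:PORT2 form that --broker-list expects, here is a small sketch that builds and parses such a string. The function names are hypothetical, and the ports 9092 to 9094 are assumed from the three-broker setup above rather than taken from the Kafka tooling.

```python
def format_broker_list(host, ports):
    """Build a 'HOST1:PORT1,HOST2:PORT2' string for brokers on one host."""
    return ",".join(f"{host}:{port}" for port in ports)


def parse_broker_list(broker_list):
    """Split 'HOST1:PORT1,HOST2:PORT2' into a list of (host, port) tuples."""
    brokers = []
    for entry in broker_list.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected HOST:PORT, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


if __name__ == "__main__":
    # Assumed ports for brokers 0, 1 and 2 on the local machine.
    value = format_broker_list("localhost", [9092, 9093, 9094])
    print(value)
    print(parse_broker_list(value))
```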
2.3 kafka-topics.sh
--create Create a new topic.
--describe List details for the given topics.
--list List all available topics.
--partitions <Integer: # of partitions> The number of partitions for the topic being created or altered (WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected)
--replication-factor <Integer: replication factor> The replication factor for each partition in the topic being created
--zookeeper <urls> REQUIRED: The connection string for the zookeeper connection in the form host:port. Multiple URLS can be given to allow fail-over.
--topic <topic> The topic to be create, alter or describe. Can also accept a regular expression except for --create option
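3. Testing the pseudo-cluster
Before testing, first list what is worth testing.
What comes to mind so far: a partition has the notion of a leader. What does a partition leader mean, and what is it for?
3.1 Creating a topic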
./kafka-topics.sh --create --topic topic_p10_r3 --partitions 10 --replication-factor 3 --zookeeper localhost:2181
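This creates a topic with 10 partitions and a replication factor of 3. With three brokers, every partition therefore has a replica on each of the other brokers, and every broker holds the data of all 10 partitions.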
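The arithmetic behind that statement can be checked with a short sketch. The function name is hypothetical; the only rule it relies on is that the replicas of one partition must live on different brokers, so the replication factor cannot exceed the broker count.

```python
def average_replicas_per_broker(partitions, replication_factor, brokers):
    """Average number of partition replicas that one broker stores."""
    if replication_factor > brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return partitions * replication_factor / brokers


if __name__ == "__main__":
    # 10 partitions x 3 replicas spread over 3 brokers.
    print(average_replicas_per_broker(10, 3, 3))
    # 10 partitions x 1 replica spread over 3 brokers.
    print(average_replicas_per_broker(10, 1, 3))
```

For the topic above, 10 partitions times 3 replicas gives 30 replicas over 3 brokers, so each broker stores 10, one per partition.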
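3.2 Directories created on each broker
After the topic is created, each broker creates the corresponding directories and files under its data directory (the log.dirs setting, for example /tmp/kafka-logs):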
topic_p10_r3-0
topic_p10_r3-1
----
topic_p10_r3-9
Each of these directories contains two files: 00000000000000000000.index and 00000000000000000000.log
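The naming pattern of these directories and files can be sketched as follows. The helper names are hypothetical. The sketch assumes the directory name is the topic name plus the partition number, and that the file name is the segment's starting offset padded to 20 digits, which matches the all-zero names of a freshly created, empty partition.

```python
def partition_dir_names(topic, partitions):
    """Directory names a broker creates for the partitions it hosts."""
    return [f"{topic}-{p}" for p in range(partitions)]


def segment_file_names(base_offset=0):
    """Index and log file names of the segment starting at `base_offset`."""
    stem = f"{base_offset:020d}"  # zero-padded to 20 digits
    return [f"{stem}.index", f"{stem}.log"]


if __name__ == "__main__":
    for name in partition_dir_names("topic_p10_r3", 10):
        print(name, segment_file_names())
```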
3.3 Topic details
./kafka-topics.sh --describe --topic topic_p10_r3 --zookeeper localhost:2181
The result is:
Topic:topic_p10_r3 PartitionCount:10 ReplicationFactor:3 Configs:
    Topic: topic_p10_r3 Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
    Topic: topic_p10_r3 Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
    Topic: topic_p10_r3 Partition: 3 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0
    Topic: topic_p10_r3 Partition: 4 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
    Topic: topic_p10_r3 Partition: 5 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
    Topic: topic_p10_r3 Partition: 6 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
    Topic: topic_p10_r3 Partition: 7 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 8 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
    Topic: topic_p10_r3 Partition: 9 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0
The fields mean the following:
Here is an explanation of output. The first line gives a summary of all the partitions, each additional line gives information about one partition
- "leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
- "replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
- "isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
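To work with these fields programmatically, the describe output can be parsed into a small data structure. This is a minimal sketch that assumes the textual layout shown in this post (Partition, Leader, Replicas and Isr in that order); the function name is hypothetical and other Kafka versions may print a different layout.

```python
import re

# One match per partition line; the header's "PartitionCount:" does not match.
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]*)\s+Isr:\s*([\d,]*)"
)


def parse_describe(text):
    """Return {partition: {"leader": int, "replicas": [...], "isr": [...]}}."""
    result = {}
    for match in PARTITION_LINE.finditer(text):
        partition, leader, replicas, isr = match.groups()
        result[int(partition)] = {
            "leader": int(leader),
            "replicas": [int(x) for x in replicas.split(",") if x],
            "isr": [int(x) for x in isr.split(",") if x],  # may be empty
        }
    return result


if __name__ == "__main__":
    sample = (
        "Topic:topic_p10_r3 PartitionCount:10 ReplicationFactor:3 Configs:\n"
        "    Topic: topic_p10_r3 Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1\n"
        "    Topic: topic_p10_r3 Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2\n"
    )
    print(parse_describe(sample))
```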
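3.4 Question: if the replication factor is 1, does that mean each partition exists only once in the cluster (that is, on exactly one broker), so that the leader is simply the broker holding that partition?
Create a topic with 10 partitions and a single replica: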
./kafka-topics.sh --create --topic topic_p10_r1 --partitions 10 --replication-factor 1 --zookeeper localhost:2181
Details:
[hadoop@hadoop bin]$ ./kafka-topics.sh --describe --topic topic_p10_r1 --zookeeper localhost:2181
Topic:topic_p10_r1 PartitionCount:10 ReplicationFactor:1 Configs:
    Topic: topic_p10_r1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 1 Leader: 2 Replicas: 2 Isr: 2
    Topic: topic_p10_r1 Partition: 2 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 3 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 4 Leader: 2 Replicas: 2 Isr: 2
    Topic: topic_p10_r1 Partition: 5 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 6 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 7 Leader: 2 Replicas: 2 Isr: 2
    Topic: topic_p10_r1 Partition: 8 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 9 Leader: 1 Replicas: 1 Isr: 1
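So the understanding is correct: each partition has its own leader, and the broker that is the leader is also the only broker listed under Replicas (same ID).
From this we can conclude:
1. Every partition's replica set has one leader.
2. The leader is the leader within a partition's replica set. It handles all reads and writes for the partition, and the replicas on the other brokers copy the data from it.
3. The partitions of a topic are spread fairly evenly across the brokers.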
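The third point can be checked by counting how many partitions each broker leads. The sketch below hard-codes the Leader column of the two describe outputs above, indexed by partition number; the variable and function names are hypothetical.

```python
from collections import Counter

# Leader column copied from the describe output, indexed by partition number.
leaders_r3 = [2, 0, 1, 2, 0, 1, 2, 0, 1, 2]  # topic_p10_r3
leaders_r1 = [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]  # topic_p10_r1


def leaders_per_broker(leaders):
    """Map broker id -> number of partitions that broker leads."""
    return dict(sorted(Counter(leaders).items()))


if __name__ == "__main__":
    print("topic_p10_r3:", leaders_per_broker(leaders_r3))
    print("topic_p10_r1:", leaders_per_broker(leaders_r1))
```

With 10 partitions and 3 brokers an exactly even split is impossible, so in both topics one broker leads 4 partitions and the other two lead 3 each.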
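3.5 Kafka fault tolerance when a broker goes down
Two topics have been created above: one with 10 partitions and 3 replicas, the other with 10 partitions and 1 replica. Suppose one broker now goes down. How well does each topic tolerate the failure?
The jps command shows three Kafka processes.
ps -ef|grep server.2.properties finds the Kafka process whose broker.id is 2; kill it with kill -9. Once it is killed, the console starts flooding with the same exception over and over: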
[2015-02-23 02:14:00,037] WARN Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2015-02-23 02:14:00,039] ERROR [ReplicaFetcherThread-0-2], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 4325; ClientId: ReplicaFetcherThread-0-2; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [topic_p10_r3,3] -> PartitionFetchInfo(0,1048576),[topic_p10_r3,9] -> PartitionFetchInfo(0,1048576),[topic_p10_r3,6] -> PartitionFetchInfo(0,1048576),[topic_p10_r3,0] -> PartitionFetchInfo(0,1048576) (kafka.server.ReplicaFetcherThread)
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:465)
    at sun.nio.ch.Net.connect(Net.java:457)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
    at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
    at kafka.consumer.SimpleConsumer.reconnect(SimpleConsumer.scala:57)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
[2015-02-23 02:14:00,040] WARN Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
Partitions 3, 9, 6 and 0 are exactly the partitions of topic_p10_r3 whose leader was broker 2, so Kafka has to hand their leadership over. Here is the state of topic_p10_r3 and topic_p10_r1 at this point.
topic_p10_r3 (leadership has moved to other brokers; Replicas still lists three brokers, but broker 2 is no longer in the Isr):
[hadoop@hadoop bin]$ ./kafka-topics.sh --describe --topic topic_p10_r3 --zookeeper localhost:2181
Topic:topic_p10_r3 PartitionCount:10 ReplicationFactor:3 Configs:
    Topic: topic_p10_r3 Partition: 0 Leader: 0 Replicas: 2,0,1 Isr: 0,1
    Topic: topic_p10_r3 Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,1
    Topic: topic_p10_r3 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,0
    Topic: topic_p10_r3 Partition: 3 Leader: 1 Replicas: 2,1,0 Isr: 1,0
    Topic: topic_p10_r3 Partition: 4 Leader: 0 Replicas: 0,2,1 Isr: 0,1
    Topic: topic_p10_r3 Partition: 5 Leader: 1 Replicas: 1,0,2 Isr: 1,0
    Topic: topic_p10_r3 Partition: 6 Leader: 0 Replicas: 2,0,1 Isr: 0,1
    Topic: topic_p10_r3 Partition: 7 Leader: 0 Replicas: 0,1,2 Isr: 0,1
    Topic: topic_p10_r3 Partition: 8 Leader: 1 Replicas: 1,2,0 Isr: 1,0
    Topic: topic_p10_r3 Partition: 9 Leader: 1 Replicas: 2,1,0 Isr: 1,0
topic_p10_r1: no failover is possible; the partitions whose only replica was on broker 2 now show Leader: -1 and an empty Isr:
[hadoop@hadoop bin]$ ./kafka-topics.sh --describe --topic topic_p10_r1 --zookeeper localhost:2181
Topic:topic_p10_r1 PartitionCount:10 ReplicationFactor:1 Configs:
    Topic: topic_p10_r1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 1 Leader: -1 Replicas: 2 Isr:
    Topic: topic_p10_r1 Partition: 2 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 3 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 4 Leader: -1 Replicas: 2 Isr:
    Topic: topic_p10_r1 Partition: 5 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 6 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 7 Leader: -1 Replicas: 2 Isr:
    Topic: topic_p10_r1 Partition: 8 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 9 Leader: 1 Replicas: 1 Isr: 1
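The behaviour observed above can be reproduced with a small model. This is a simplified sketch that happens to match the output in this post, not Kafka's actual controller logic: it removes the dead broker from the ISR, keeps the current leader if it is still alive, and otherwise picks the first assigned replica that is still in sync. The function name is hypothetical.

```python
def leader_after_failure(leader, replicas, isr, dead_broker):
    """Return (new_leader, new_isr) after `dead_broker` goes down.

    Simplified model: -1 means no in-sync replica is left to take over.
    """
    new_isr = [b for b in isr if b != dead_broker]
    if leader != dead_broker:
        return leader, new_isr  # leader unaffected, only the ISR shrinks
    for broker in replicas:
        if broker in new_isr:
            return broker, new_isr  # first surviving in-sync replica
    return -1, new_isr


if __name__ == "__main__":
    # topic_p10_r3, partition 0: leader 2, replicas 2,0,1
    print(leader_after_failure(2, [2, 0, 1], [2, 0, 1], dead_broker=2))
    # topic_p10_r1, partition 1: leader 2, single replica on broker 2
    print(leader_after_failure(2, [2], [2], dead_broker=2))
```

With a replication factor of 1 there is no other replica to promote, which is why those partitions stay unavailable until broker 2 comes back.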
After restarting broker 2 the result is as follows. For topic_p10_r3 the leaders do not change: every partition already has a leader, so the rejoining broker can only become a follower. For topic_p10_r1 a leader is elected again.
[hadoop@hadoop bin]$ ./kafka-topics.sh --describe --topic topic_p10_r3 --zookeeper localhost:2181
Topic:topic_p10_r3 PartitionCount:10 ReplicationFactor:3 Configs:
    Topic: topic_p10_r3 Partition: 0 Leader: 0 Replicas: 2,0,1 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,0,2
    Topic: topic_p10_r3 Partition: 3 Leader: 1 Replicas: 2,1,0 Isr: 1,0,2
    Topic: topic_p10_r3 Partition: 4 Leader: 0 Replicas: 0,2,1 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 5 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
    Topic: topic_p10_r3 Partition: 6 Leader: 0 Replicas: 2,0,1 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 7 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
    Topic: topic_p10_r3 Partition: 8 Leader: 1 Replicas: 1,2,0 Isr: 1,0,2
    Topic: topic_p10_r3 Partition: 9 Leader: 1 Replicas: 2,1,0 Isr: 1,0,2
[hadoop@hadoop bin]$ ./kafka-topics.sh --describe --topic topic_p10_r1 --zookeeper localhost:2181
Topic:topic_p10_r1 PartitionCount:10 ReplicationFactor:1 Configs:
    Topic: topic_p10_r1 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 1 Leader: 2 Replicas: 2 Isr: 2
    Topic: topic_p10_r1 Partition: 2 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 3 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 4 Leader: 2 Replicas: 2 Isr: 2
    Topic: topic_p10_r1 Partition: 5 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 6 Leader: 1 Replicas: 1 Isr: 1
    Topic: topic_p10_r1 Partition: 7 Leader: 2 Replicas: 2 Isr: 2
    Topic: topic_p10_r1 Partition: 8 Leader: 0 Replicas: 0 Isr: 0
    Topic: topic_p10_r1 Partition: 9 Leader: 1 Replicas: 1 Isr: 1
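One way to see that broker 2 rejoined only as a follower is to compare each partition's current leader with the first broker in its Replicas list, which Kafka calls the preferred replica. The sketch below hard-codes the Leader and Replicas columns of the topic_p10_r3 output after the restart; the variable and function names are hypothetical.

```python
# partition -> (leader, replicas) for topic_p10_r3 after broker 2 was restarted
after_restart = {
    0: (0, [2, 0, 1]),
    1: (0, [0, 1, 2]),
    2: (1, [1, 2, 0]),
    3: (1, [2, 1, 0]),
    4: (0, [0, 2, 1]),
    5: (1, [1, 0, 2]),
    6: (0, [2, 0, 1]),
    7: (0, [0, 1, 2]),
    8: (1, [1, 2, 0]),
    9: (1, [2, 1, 0]),
}


def not_on_preferred_replica(state):
    """Partitions whose leader is not the first broker in their Replicas list."""
    return sorted(
        partition
        for partition, (leader, replicas) in state.items()
        if leader != replicas[0]
    )


if __name__ == "__main__":
    print(not_on_preferred_replica(after_restart))
```

The partitions it reports are the four that broker 2 led before it was killed: broker 2 is back in their Isr, but leadership has stayed with brokers 0 and 1.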