Kafka Python clients: PyKafka vs kafka-python

陈修恒

Source: https://github.com/Parsely/pykafka/issues/334

@emmett9001 wrote:

@microamp Thanks, this is a great idea. There's currently no documentation on this, but to my knowledge the main differences are the specifics of the Python API and PyKafka's implementation of the BalancedConsumer. PyKafka strives to keep the API as pythonic as possible, which means using useful features of the language where appropriate for client code simplicity. This includes things like context managers for object cleanup and futures for asynchronous error handling. PyKafka's balanced consumer implements the Kafka project's notion of the "high level consumer", which uses ZooKeeper to balance consumption of partitions between multiple nodes in a consumer group. From what I understand, kafka-python is waiting until Kafka 0.9, when this functionality will be supported natively by the Kafka server itself, to implement self-balancing consumers.
Also, the last time we did a speed test (which was admittedly a while ago at this point), PyKafka's consumer outperformed kafka-python. I unfortunately no longer have the results from that test, so you may not want to bet too hard on PyKafka being significantly faster or slower - just figured I'd mention it.

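Notes:

1. PyKafka tries to keep its API Pythonic, including context managers for object cleanup and futures for asynchronous error handling.
2. PyKafka supports consumer load balancing through ZooKeeper. kafka-python is holding off on balanced consumers until Kafka 0.9, when balancing will be handled by the Kafka server itself.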
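
To make "balancing consumption of partitions between multiple nodes in a consumer group" concrete, here is a minimal sketch of a range-style assignment. It is an illustration under assumptions, not PyKafka's actual code: the function name `assign_partitions` and the consumer IDs are hypothetical, and the ZooKeeper coordination that decides when to rebalance is left out entirely.

```python
def assign_partitions(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    if not consumers:
        return {}
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Eight partitions shared by three hypothetical consumers.
before = assign_partitions(range(8), ["consumer-a", "consumer-b", "consumer-c"])
# After consumer-c leaves the group, its partitions are redistributed.
after = assign_partitions(range(8), ["consumer-a", "consumer-b"])
print(before)
print(after)
```

Because every member sorts the same inputs and applies the same rule, each one can compute its own share without talking to the others; the coordination service only has to tell them who is currently in the group.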

@emmett9001 wrote:

Some more research - there are differences in the versions of python supported by each library. PyKafka supports 2.7, 3.4, 3.5, and pypy, while kafka-python adds 2.6 and removes 3.5 support. kafka-python also requires a ZooKeeper connection for offset management, which PyKafka does not. kafka-python supports versions of Kafka from 0.8.0 to 0.8.2, where PyKafka only supports 0.8.2.

Notes:

1. Python versions: kafka-python supports 2.6 but not 3.5; PyKafka supports 2.7, 3.4, 3.5, and PyPy.

2. Kafka versions: kafka-python supports 0.8.0 through 0.8.2; PyKafka supports only 0.8.2.

@ottomata wrote:

A difference between kafka-python and pykafka is the producer interface. kafka-python does not require that you know the topic when instantiating the producer. This is convenient if you need to produce to topics dynamically based on input (which I do!) :)

Note:

kafka-python does not need to know the topic when a producer is instantiated. (The implication being that PyKafka does?)
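
The practical effect of that difference can be sketched with a toy model. None of the classes below belong to either library: `InMemoryBroker`, `TopicBoundProducer`, and `TopicPerCallProducer` are hypothetical stand-ins that only mimic the two API shapes, assuming PyKafka does tie a producer to one topic as the note suspects.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a Kafka cluster: topic name -> list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)


class TopicBoundProducer:
    """Topic is fixed at construction time (the shape the note suspects for PyKafka)."""

    def __init__(self, broker, topic):
        self._log = broker.topics[topic]

    def produce(self, message):
        self._log.append(message)


class TopicPerCallProducer:
    """Topic is chosen on every send (the shape described for kafka-python)."""

    def __init__(self, broker):
        self._broker = broker

    def send(self, topic, message):
        self._broker.topics[topic].append(message)


events = [("clicks", b"c1"), ("views", b"v1"), ("clicks", b"c2")]

# Per-call style: one producer routes to whatever topic each event names.
broker_a = InMemoryBroker()
producer = TopicPerCallProducer(broker_a)
for topic, payload in events:
    producer.send(topic, payload)

# Topic-bound style: keep a cache of producers, one per topic seen so far.
broker_b = InMemoryBroker()
producers = {}
for topic, payload in events:
    if topic not in producers:
        producers[topic] = TopicBoundProducer(broker_b, topic)
    producers[topic].produce(payload)

print(dict(broker_a.topics) == dict(broker_b.topics))
```

Both styles deliver the same messages; the topic-bound style just moves the routing into the caller, who has to keep a producer per topic when topics are only known at runtime.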

@cscheffler wrote:

@emmett9001 @ottomata Just got pointed at this thread and thought I'd make a late contribution.

We compared pykafka and kafka-python about 2 months ago while trying to decide which one to use. In the end, the deciding factor for us was that balanced consumers were much easier to manage in pykafka.

Also, we discovered later, a pykafka producer doesn't die on Kafka broker restart, while our kafka-python producers did.

Below are performance figures from a 3-node Kafka cluster running in EC2, using a single producer or consumer. The three numbers for each test are the quartiles measured for the test.

pykafka producer: 41400 – 46500 – 50200 messages per second
pykafka consumer: 12100 – 14400 – 23700 messages per second
kafka-python producer: 26500 – 27700 – 29500 messages per second
kafka-python consumer: 35000 – 37300 – 39100 messages per second
So, for clarification, the median performance of a pykafka producer was 46500 messages per second, with a quartile range of 41400 (25th percentile) to 50200 (75th percentile). Hope that makes sense.

Notes:

1. After a broker restart, kafka-python producers die, while PyKafka producers do not.

2. In this test, PyKafka was faster as a producer, and kafka-python was faster as a consumer.
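
To put the quoted quartiles side by side, the short script below compares the medians and the interquartile ranges. The numbers are copied from the figures above; the helper names `median` and `interquartile_range` are just for this illustration, and the ratios describe this one EC2 test only.

```python
# Quartiles (25th percentile, median, 75th percentile) in messages per second,
# copied from the figures quoted above.
results = {
    ("pykafka", "producer"): (41400, 46500, 50200),
    ("pykafka", "consumer"): (12100, 14400, 23700),
    ("kafka-python", "producer"): (26500, 27700, 29500),
    ("kafka-python", "consumer"): (35000, 37300, 39100),
}


def median(library, role):
    """Return the middle quartile for one library/role pair."""
    return results[(library, role)][1]


def interquartile_range(library, role):
    """Return the spread between the 25th and 75th percentiles."""
    q1, _, q3 = results[(library, role)]
    return q3 - q1


producer_ratio = median("pykafka", "producer") / median("kafka-python", "producer")
consumer_ratio = median("kafka-python", "consumer") / median("pykafka", "consumer")

print(f"producer: pykafka median is {producer_ratio:.2f}x kafka-python")
print(f"consumer: kafka-python median is {consumer_ratio:.2f}x pykafka")
for library, role in results:
    print(library, role, "IQR:", interquartile_range(library, role))
```

On these figures the PyKafka producer's median is about 1.7 times kafka-python's, while the kafka-python consumer's median is about 2.6 times PyKafka's. The PyKafka consumer also has the widest interquartile range, so its throughput varied the most between runs.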

2015-12-23 10:28