Quoted from: https://github.com/Parsely/pykafka/issues/334
@emmett9001 wrote:
@microamp Thanks, this is a great idea. There's currently no documentation on this, but to my knowledge the main differences are the specifics of the Python API and PyKafka's implementation of the BalancedConsumer. PyKafka strives to keep the API as pythonic as possible, which means using useful features of the language where appropriate for client code simplicity. This includes things like context managers for object cleanup and futures for asynchronous error handling. PyKafka's balanced consumer implements the Kafka project's notion of the "high level consumer", which uses ZooKeeper to balance consumption of partitions between multiple nodes in a consumer group. From what I understand, kafka-python is waiting until Kafka 0.9, when this functionality will be supported natively by the Kafka server itself, to implement self-balancing consumers.
Also, the last time we did a speed test (which was admittedly a while ago at this point), PyKafka's consumer outperformed kafka-python. I unfortunately no longer have the results from that test, so you may not want to bet too hard on PyKafka being significantly faster or slower - just figured I'd mention it.
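Notes:
1. PyKafka keeps its interface as Pythonic as possible, for example context managers for cleanup and futures for asynchronous error handling.
2. PyKafka supports balancing consumers through ZooKeeper. kafka-python only supports consumer balancing starting with Kafka 0.9, where the balancing is provided by the Kafka server itself.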
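Both points can be sketched with PyKafka's documented API, assuming the calls shown in the PyKafka README: topic.get_balanced_consumer with the consumer_group, zookeeper_connect and auto_commit_enable parameters, consumer.consume and consumer.stop, and topic.get_sync_producer used as a context manager. The helper names consume_balanced and produce_all are hypothetical, and the client is passed in rather than created inside, so the logic can be tried without a running cluster. Only the context-manager style is shown, not futures.

```python
from typing import Callable, Iterable


def consume_balanced(client, topic_name: str, group: bytes, zk_connect: str,
                     handle: Callable[[int, bytes], None], max_messages: int) -> int:
    """Read up to max_messages as one member of a ZooKeeper-balanced group.

    `client` is assumed to be a pykafka.KafkaClient, e.g.
    KafkaClient(hosts="127.0.0.1:9092"); it is injected so tests can use a fake.
    """
    topic = client.topics[topic_name]
    # All consumers sharing `group` split the topic's partitions among themselves;
    # membership and partition ownership are coordinated through ZooKeeper.
    consumer = topic.get_balanced_consumer(
        consumer_group=group,
        zookeeper_connect=zk_connect,
        auto_commit_enable=True,  # commit consumed offsets for the group periodically
    )
    count = 0
    try:
        while count < max_messages:
            message = consumer.consume()  # None if the consumer times out
            if message is None:
                break
            handle(message.offset, message.value)
            count += 1
    finally:
        consumer.stop()  # release this member's partitions
    return count


def produce_all(client, topic_name: str, values: Iterable[bytes]) -> None:
    """Send every value to one topic, using the producer as a context manager."""
    topic = client.topics[topic_name]
    # Leaving the `with` block stops the producer, even if an exception is raised.
    with topic.get_sync_producer() as producer:
        for value in values:
            producer.produce(value)
```

Against a real cluster a call might look like consume_balanced(KafkaClient(hosts="127.0.0.1:9092"), "my-topic", b"my-group", "127.0.0.1:2181", print, 10), where the addresses and names are placeholders.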
@emmett9001 wrote:
Some more research - there are differences in the versions of python supported by each library. PyKafka supports 2.7, 3.4, 3.5, and pypy, while kafka-python adds 2.6 and removes 3.5 support. kafka-python also requires a ZooKeeper connection for offset management, which PyKafka does not. kafka-python supports versions of Kafka from 0.8.0 to 0.8.2, where PyKafka only supports 0.8.2.
Notes:
1. Python versions: kafka-python supports 2.6 but not 3.5; PyKafka supports 2.7, 3.4 and 3.5.
2. Kafka versions: kafka-python supports 0.8.0 through 0.8.2; PyKafka supports only 0.8.2.
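The compatibility notes can be encoded as a small lookup table, which makes combinations easy to check. This is only a sketch of the matrix as reported in the thread (both libraries have changed considerably since); the kafka-python Python list is derived from "adds 2.6 and removes 3.5" relative to PyKafka, and SUPPORT and candidates are illustrative names.

```python
# Support matrix as reported in the thread; both libraries have changed a lot since.
SUPPORT = {
    "pykafka": {
        "python": {"2.7", "3.4", "3.5", "pypy"},
        "kafka": ((0, 8, 2), (0, 8, 2)),  # inclusive (min, max) broker version
    },
    "kafka-python": {
        # "adds 2.6 and removes 3.5" relative to PyKafka
        "python": {"2.6", "2.7", "3.4", "pypy"},
        "kafka": ((0, 8, 0), (0, 8, 2)),
    },
}


def candidates(python: str, broker: tuple) -> list:
    """Return the libraries whose reported support covers this Python and broker version."""
    version = tuple(broker[:3])  # compare on major.minor.patch only
    return sorted(
        name for name, spec in SUPPORT.items()
        if python in spec["python"] and spec["kafka"][0] <= version <= spec["kafka"][1]
    )


print(candidates("3.5", (0, 8, 2)))     # ['pykafka']
print(candidates("2.7", (0, 8, 1, 1)))  # ['kafka-python']
```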
@ottomata wrote:
A difference between kafka-python and pykafka is the producer interface. kafka-python does not require that you know the topic when instantiating the producer. This is convenient if you need to produce to topics dynamically based on input (which I do!) :)
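Note:
kafka-python does not need to know the topic when the producer is initialized. (By implication, PyKafka does: a PyKafka producer is obtained from a specific topic object, e.g. client.topics['my-topic'].get_producer().)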
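In current kafka-python a single KafkaProducer takes the topic on every call (producer.send(topic, value)), so dynamic routing needs no extra code. With PyKafka, one possible workaround, which is an assumption rather than something the thread prescribes, is to create producers lazily and cache one per topic. The TopicRouter class below is hypothetical; the producer factory is injected so the sketch runs without a broker.

```python
from typing import Callable, Dict


class TopicRouter:
    """Send each value to a topic chosen at runtime (hypothetical helper).

    `make_producer(topic)` must return an object with produce(value) and stop(),
    for PyKafka e.g. lambda t: client.topics[t].get_producer()
    """

    def __init__(self, make_producer: Callable[[str], object]):
        self._make_producer = make_producer
        self._producers: Dict[str, object] = {}

    def send(self, topic: str, value: bytes) -> None:
        producer = self._producers.get(topic)
        if producer is None:
            # A PyKafka producer is bound to one topic, so create one per new topic.
            producer = self._make_producer(topic)
            self._producers[topic] = producer
        producer.produce(value)

    def close(self) -> None:
        # Stop every cached producer and forget them.
        for producer in self._producers.values():
            producer.stop()
        self._producers.clear()
```

The trade-off is one producer (and its connections and buffers) per distinct topic, which is fine for a handful of topics but grows with the number of topics seen.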
@cscheffler wrote:
@emmett9001 @ottomata Just got pointed at this thread and thought I'd make a late contribution.
We compared pykafka and kafka-python about 2 months ago while trying to decide which one to use. In the end, the deciding factor for us was that balanced consumers were much easier to manage in pykafka.
Also, we discovered later, a pykafka producer doesn't die on Kafka broker restart, while our kafka-python producers did.
Below are performance figures from a 3-node Kafka cluster running in EC2, using a single producer or consumer. The three numbers for each test are the quartiles measured for the test.
pykafka producer: 41400 – 46500 – 50200 messages per second
pykafka consumer: 12100 – 14400 – 23700 messages per second
kafka-python producer: 26500 – 27700 – 29500 messages per second
kafka-python consumer: 35000 – 37300 – 39100 messages per second
So, for clarification, the median performance of a pykafka producer was 46500 messages per second, with a quartile range of 41400 (25th percentile) to 50200 (75th percentile). Hope that makes sense.
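The quartile reading can be checked with a little arithmetic on the figures above: the interquartile range (IQR) is the 75th minus the 25th percentile, and dividing medians shows how large each gap is. A minimal sketch; RESULTS, iqr and median_ratio are illustrative names, and the numbers are copied from the benchmark.

```python
# Quartiles (25th percentile, median, 75th percentile) in messages per second,
# copied from the EC2 benchmark above.
RESULTS = {
    "pykafka producer": (41400, 46500, 50200),
    "pykafka consumer": (12100, 14400, 23700),
    "kafka-python producer": (26500, 27700, 29500),
    "kafka-python consumer": (35000, 37300, 39100),
}


def iqr(name: str) -> int:
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = RESULTS[name]
    return q3 - q1


def median_ratio(a: str, b: str) -> float:
    """How many times a's median throughput is b's."""
    return RESULTS[a][1] / RESULTS[b][1]


print(iqr("pykafka producer"))  # 8800
print(round(median_ratio("pykafka producer", "kafka-python producer"), 2))  # 1.68
print(round(median_ratio("kafka-python consumer", "pykafka consumer"), 2))  # 2.59
```

By median, the PyKafka producer was roughly 1.7x faster than kafka-python's, and the kafka-python consumer roughly 2.6x faster than PyKafka's. The PyKafka consumer also has by far the widest spread (an IQR of 11600 against a median of 14400), which suggests its runs were the least consistent of the four.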
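Notes:
1. In cscheffler's experience, kafka-python producers died when a Kafka broker restarted, while PyKafka producers did not.
2. In this benchmark, PyKafka performed better as a producer and kafka-python performed better as a consumer.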