Kafka 0.8

简介:

0.8 is a huge step forward in functionality from 0.7.x

 

This release includes the following major features:

  • Partitions are now replicated. 支持partition的复本, 避免broker失败导致的数据丢失 
    Previously the topic would remain available in the case of server failure, but individual partitions within that topic could disappear when the server hosting them stopped. If a broker failed permanently any unconsumed data it hosted would be lost. 
    Starting with 0.8 all partitions have a replication factor and we get the prior behavior as the special case where replication factor = 1. 
    Replicas have a notion of committed messages and guarantee that committed messages won't be lost as long as at least one replica survives. Replica logs are byte-for-byte identical across replicas.
  • Producer and consumer are replication aware. 支持replica的Producer和Consumer 
    When running in sync mode, by default, the producer send() request blocks until the messages sent is committed to the active replicas. As a result the sender can depend on the guarantee that a message sent will not be lost. 
    Latency sensitive producers have the option to tune this to block only on the write to the leader broker or to run completely async if they are willing to forsake this guarantee. 
    The consumer will only see messages that have been committed. 
  • The consumer has been moved to a "long poll" model where fetch requests block until there is data available. 
    This enables low latency without frequent polling. In general end-to-end message latency from producer to broker to consumer of only a few milliseconds is now possible.
  • We now retain the key used in the producer for partitioning with each message, so the consumer knows the partitioning key. 
    会保存producer用于partitioning的key, 并让consumer知道这个key
  • We have moved from directly addressing messages with a byte offset to using a logical offset (i.e. 0, 1, 2, 3...). 使用逻辑offset代替之前的物理offset 
    The offset still works exactly the same - it is a monotonically increasing number that represents a point-in-time in the log - but now it is no longer tied to byte layout. 
    This has several advantages: 
    (1) it is aesthetically (美学观点上地) nice, 
    (2) it makes it trivial to calculate the next offset or to traverse messages in reverse order, 
    (3) it fixes a corner case (极端情况) interaction between consumer commit() and compressed message batches. Data is still transferred using the same efficient zero-copy mechanism as before. 
  • We have removed the zookeeper dependency from the producer and replaced it with a simple cluster metadata api.
  • We now support multiple data directories (i.e. a JBOD setup).
  • We now expose both the partition and the offset for each message in the high-level consumer. 
    在high-level consumer中expose具体的partition和offset信息
  • We have substantially improved our integration testing, adding a new integration test framework and over 100 distributed regression and performance test scenarios that we run on every checkin.

 

在我看来, 主要的改动

1. 增加broker的安全性, 原来的方案, broker的fail就会导致数据丢失, 确实有点太说不过去, 所以replica feature是必须的

2. 使用逻辑offset, 上面说了些优点, 但是之前使用物理offset时, 也说了一堆优点 
    其实就是效率和易用性的balance, 之前出于对效率的追求, 所以使用物理offset 
    而现在考虑到物理offset实在用的太麻烦, 做出妥协, 改为逻辑offset, 本质没有区别, 只是需要增加一个逻辑offset到物理offset的映射, 以使物理offset对用户透明

3. 对python更好的支持, kafka-python

Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.

Maintainer: David Arthur 
License: Apache v.2.0

https://github.com/mumrah/kafka-python

 

Kafka Replication High-level Design

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication

参考,Apache Kafka Replication Design – High level


本文章摘自博客园,原文发布日期:2013-05-08

目录
相关文章
|
1天前
|
数据采集 人工智能 安全
|
10天前
|
云安全 监控 安全
|
2天前
|
自然语言处理 API
万相 Wan2.6 全新升级发布!人人都能当导演的时代来了
通义万相2.6全新升级,支持文生图、图生视频、文生视频,打造电影级创作体验。智能分镜、角色扮演、音画同步,让创意一键成片,大众也能轻松制作高质量短视频。
879 150
|
15天前
|
机器学习/深度学习 人工智能 自然语言处理
Z-Image:冲击体验上限的下一代图像生成模型
通义实验室推出全新文生图模型Z-Image,以6B参数实现“快、稳、轻、准”突破。Turbo版本仅需8步亚秒级生成,支持16GB显存设备,中英双语理解与文字渲染尤为出色,真实感和美学表现媲美国际顶尖模型,被誉为“最值得关注的开源生图模型之一”。
1615 8
|
6天前
|
人工智能 前端开发 文件存储
星哥带你玩飞牛NAS-12:开源笔记的进化之路,效率玩家的新选择
星哥带你玩转飞牛NAS,部署开源笔记TriliumNext!支持树状知识库、多端同步、AI摘要与代码高亮,数据自主可控,打造个人“第二大脑”。高效玩家的新选择,轻松搭建专属知识管理体系。
361 152
|
7天前
|
人工智能 自然语言处理 API
一句话生成拓扑图!AI+Draw.io 封神开源组合,工具让你的效率爆炸
一句话生成拓扑图!next-ai-draw-io 结合 AI 与 Draw.io,通过自然语言秒出架构图,支持私有部署、免费大模型接口,彻底解放生产力,绘图效率直接爆炸。
585 152
|
9天前
|
人工智能 安全 前端开发
AgentScope Java v1.0 发布,让 Java 开发者轻松构建企业级 Agentic 应用
AgentScope 重磅发布 Java 版本,拥抱企业开发主流技术栈。
541 13
|
2天前
|
编解码 人工智能 机器人
通义万相2.6,模型使用指南
智能分镜 | 多镜头叙事 | 支持15秒视频生成 | 高品质声音生成 | 多人稳定对话