Kafka 0.8

Introduction:

0.8 is a huge step forward in functionality from 0.7.x.

 

This release includes the following major features:

  • Partitions are now replicated, which prevents data loss when a broker fails. 
    Previously the topic would remain available in the case of server failure, but individual partitions within that topic could disappear when the server hosting them stopped. If a broker failed permanently any unconsumed data it hosted would be lost. 
    Starting with 0.8 all partitions have a replication factor and we get the prior behavior as the special case where replication factor = 1. 
    Replicas have a notion of committed messages and guarantee that committed messages won't be lost as long as at least one replica survives. Replica logs are byte-for-byte identical across replicas.
  • Producer and consumer are replication aware. 
    When running in sync mode, by default, the producer send() request blocks until the messages sent are committed to the active replicas. As a result the sender can depend on the guarantee that a message sent will not be lost. 
    Latency sensitive producers have the option to tune this to block only on the write to the leader broker or to run completely async if they are willing to forsake this guarantee. 
    The consumer will only see messages that have been committed. 
  • The consumer has been moved to a "long poll" model where fetch requests block until there is data available. 
    This enables low latency without frequent polling. In general end-to-end message latency from producer to broker to consumer of only a few milliseconds is now possible.
  • We now retain the key used in the producer for partitioning with each message, so the consumer knows the partitioning key. 
    That is, the key the producer used for partitioning is stored with the message and made visible to the consumer.
  • We have moved from directly addressing messages with a byte (physical) offset to using a logical offset (i.e. 0, 1, 2, 3...). 
    The offset still works exactly the same - it is a monotonically increasing number that represents a point-in-time in the log - but now it is no longer tied to byte layout. 
    This has several advantages: 
    (1) it is aesthetically nice, 
    (2) it makes it trivial to calculate the next offset or to traverse messages in reverse order, 
    (3) it fixes a corner case interaction between consumer commit() and compressed message batches. 
    Data is still transferred using the same efficient zero-copy mechanism as before. 
  • We have removed the zookeeper dependency from the producer and replaced it with a simple cluster metadata api.
  • We now support multiple data directories (i.e. a JBOD setup).
  • We now expose both the partition and the offset for each message in the high-level consumer. 
    That is, the high-level consumer now exposes the specific partition and offset of each message.
  • We have substantially improved our integration testing, adding a new integration test framework and over 100 distributed regression and performance test scenarios that we run on every checkin.
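
The replication bullets above say that a message counts as committed once the active replicas have it, and that consumers only ever see committed messages. The following is a minimal in-memory sketch of that rule, not Kafka's implementation: the names Replica, Partition, replicate, committed_count and fetch are all hypothetical, and every replica is assumed to stay active.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a partition log; the list index plays the role of the logical offset."""
    log: list = field(default_factory=list)


class Partition:
    """Toy model of a replicated partition (hypothetical, for illustration only)."""

    def __init__(self, replication_factor):
        self.replicas = [Replica() for _ in range(replication_factor)]
        self.leader = self.replicas[0]  # assumption: the first replica is the leader

    def append(self, message):
        """Write to the leader only and return the logical offset assigned."""
        self.leader.log.append(message)
        return len(self.leader.log) - 1

    def replicate(self):
        """Each follower copies whatever it is still missing from the leader."""
        for follower in self.replicas[1:]:
            follower.log.extend(self.leader.log[len(follower.log):])

    def committed_count(self):
        """A message is committed once every (active) replica has it."""
        return min(len(r.log) for r in self.replicas)

    def fetch(self, offset):
        """Consumers are served committed messages only."""
        return self.leader.log[offset:self.committed_count()]


p = Partition(replication_factor=3)
p.append("a")
p.append("b")
print(p.fetch(0))  # [] because nothing has reached the followers yet
p.replicate()
print(p.fetch(0))  # ['a', 'b'] once all replicas hold both messages
```

With replication_factor=1 the leader is the only replica, so every append is committed immediately, which matches the "prior behavior as the special case" described above.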

 

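In my view, the main changes are:

1. Better broker reliability. In the original design a broker failure meant data loss, which was really hard to justify, so the replica feature is a must.

2. Logical offsets. The notes above list some advantages, but back when physical offsets were used, a pile of advantages was claimed for those too. 
    It really comes down to a balance between efficiency and ease of use: physical offsets were chosen earlier in pursuit of efficiency. 
    Now, since physical offsets turned out to be too cumbersome to use, a compromise was made and logical offsets are used instead. There is no essential difference; a mapping from logical offset to physical offset just has to be added so that the physical offset becomes transparent to the user.

3. Better Python support: kafka-python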

Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.

Maintainer: David Arthur 
License: Apache v.2.0

https://github.com/mumrah/kafka-python
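
Point 2 above notes that logical offsets need a mapping from logical offset to physical (byte) offset. One plausible way to do this, sketched here purely as an illustration, is a sparse index searched with binary search followed by a short forward scan. The class SegmentIndex, its interval parameter and its methods are hypothetical names, and the sizes list stands in for reading message lengths from the log file itself.

```python
import bisect


class SegmentIndex:
    """Toy sparse index from logical offset to byte position (hypothetical)."""

    def __init__(self, interval=2):
        self.interval = interval  # index every `interval`-th message
        self.offsets = []         # indexed logical offsets, ascending
        self.positions = []       # byte position of each indexed offset
        self.sizes = []           # message sizes; stand-in for reading the log
        self.next_offset = 0
        self.next_position = 0

    def append(self, payload: bytes):
        """Record a message and return the logical offset assigned to it."""
        offset = self.next_offset
        if offset % self.interval == 0:
            self.offsets.append(offset)
            self.positions.append(self.next_position)
        self.sizes.append(len(payload))
        self.next_offset += 1
        self.next_position += len(payload)
        return offset

    def position_of(self, offset):
        """Translate a logical offset into a byte position."""
        if not 0 <= offset < self.next_offset:
            raise IndexError(offset)
        # Find the last index entry at or before the requested offset.
        i = bisect.bisect_right(self.offsets, offset) - 1
        position, current = self.positions[i], self.offsets[i]
        # Scan forward message by message from that entry.
        while current < offset:
            position += self.sizes[current]
            current += 1
        return position


index = SegmentIndex(interval=2)
for payload in (b"aaa", b"bb", b"c", b"dddd"):
    index.append(payload)
print(index.position_of(3))  # 6 = 3 + 2 + 1 bytes precede the fourth message
```

The user only ever handles 0, 1, 2, 3..., so the next offset is simply offset + 1, while the byte position stays an internal detail.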

 

Kafka Replication High-level Design

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication

See also: Apache Kafka Replication Design – High level


This article is excerpted from cnblogs (博客园). Original publication date: 2013-05-08
