Kafka 0.8

Introduction:

0.8 is a huge step forward in functionality from 0.7.x

 

This release includes the following major features:

  • Partitions are now replicated. Replicas prevent the data loss that a broker failure used to cause. 
    Previously the topic would remain available in the case of server failure, but individual partitions within that topic could disappear when the server hosting them stopped. If a broker failed permanently any unconsumed data it hosted would be lost. 
    Starting with 0.8 all partitions have a replication factor and we get the prior behavior as the special case where replication factor = 1. 
    Replicas have a notion of committed messages and guarantee that committed messages won't be lost as long as at least one replica survives. Replica logs are byte-for-byte identical across replicas.
  • Producer and consumer are replication aware. 
    When running in sync mode, by default, the producer send() request blocks until the messages sent are committed to the active replicas. As a result the sender can depend on the guarantee that a message sent will not be lost. 
    Latency sensitive producers have the option to tune this to block only on the write to the leader broker or to run completely async if they are willing to forsake this guarantee. 
    The consumer will only see messages that have been committed. 
  • The consumer has been moved to a "long poll" model where fetch requests block until there is data available. 
    This enables low latency without frequent polling. In general end-to-end message latency from producer to broker to consumer of only a few milliseconds is now possible.
  • We now retain the key used in the producer for partitioning with each message, so the consumer knows the partitioning key. 
    That is, the key the producer used for partitioning is stored with the message and made visible to the consumer.
  • We have moved from directly addressing messages with a byte offset to using a logical offset (i.e. 0, 1, 2, 3...). In other words, logical offsets replace the earlier physical offsets. 
    The offset still works exactly the same - it is a monotonically increasing number that represents a point-in-time in the log - but now it is no longer tied to byte layout. 
    This has several advantages: 
    (1) it is aesthetically nice, 
    (2) it makes it trivial to calculate the next offset or to traverse messages in reverse order, 
    (3) it fixes a corner case interaction between consumer commit() and compressed message batches. 
    Data is still transferred using the same efficient zero-copy mechanism as before. 
  • We have removed the zookeeper dependency from the producer and replaced it with a simple cluster metadata api.
  • We now support multiple data directories (i.e. a JBOD setup).
  • We now expose both the partition and the offset for each message in the high-level consumer. 
    That is, the high-level consumer exposes the specific partition and offset of each message.
  • We have substantially improved our integration testing, adding a new integration test framework and over 100 distributed regression and performance test scenarios that we run on every checkin.
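
The commit guarantee described in the replication bullets can be illustrated with a toy model. This is a minimal sketch and not Kafka's implementation: the class ToyPartition, its method names, and the broker ids b1, b2, b3 are all hypothetical, and it assumes a message counts as committed once every active replica holds it.

```python
class ToyPartition:
    """Toy model of one replicated partition (illustrative, not Kafka code)."""

    def __init__(self, replica_ids):
        # One in-memory log per replica; by assumption the first id is the leader.
        self.logs = {rid: [] for rid in replica_ids}
        self.leader = replica_ids[0]
        # "in_sync" is this sketch's name for the active replicas.
        self.in_sync = set(replica_ids)

    def append_to_leader(self, message):
        """Write a message to the leader and return its logical offset."""
        log = self.logs[self.leader]
        log.append(message)
        return len(log) - 1

    def replicate(self, follower):
        """Copy every message the follower is missing from the leader."""
        leader_log = self.logs[self.leader]
        follower_log = self.logs[follower]
        follower_log.extend(leader_log[len(follower_log):])

    def committed_end(self):
        """Count of messages that every active replica already holds."""
        return min(len(self.logs[rid]) for rid in self.in_sync)

    def consumer_view(self):
        """Consumers only see the committed prefix of the leader's log."""
        return self.logs[self.leader][:self.committed_end()]


partition = ToyPartition(["b1", "b2", "b3"])
partition.append_to_leader("m0")
print(partition.consumer_view())  # [] : written to the leader only
partition.replicate("b2")
partition.replicate("b3")
print(partition.consumer_view())  # ['m0'] : now on every active replica
```

In this model a sync producer would block until committed_end() passes the offset it was given, while a latency-sensitive producer would return as soon as append_to_leader() completes.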

 

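In my view, the main changes are:

1. Better broker reliability. In the original design a broker failure meant data loss, which was hard to justify, so the replica feature was a must.

2. Logical offsets. The advantages are listed above, but a similar list of advantages was also given back when physical offsets were used. 
    It really comes down to a balance between efficiency and ease of use. Physical offsets were chosen earlier in pursuit of efficiency. 
    Now that physical offsets have proved too cumbersome to use, a compromise has been made and logical offsets adopted. In essence nothing changes: all that is needed is an extra mapping from logical offset to physical offset, which makes the physical offset transparent to the user.

3. Better Python support: kafka-python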

Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.

Maintainer: David Arthur 
License: Apache v.2.0

https://github.com/mumrah/kafka-python
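
Returning to point 2 above, the mapping from logical offset to physical offset can be sketched as follows. This is only an illustration under stated assumptions: ToyLog, its fields, and the 4-byte length prefix are hypothetical, and the source does not describe how Kafka actually stores this mapping. A dense in-memory list is used here purely for simplicity.

```python
import struct


class ToyLog:
    """Toy append-only log addressed by logical offset (illustrative only)."""

    def __init__(self):
        self.data = bytearray()  # each message: 4-byte big-endian length + payload
        self.positions = []      # positions[logical_offset] = byte position in data

    def append(self, payload: bytes) -> int:
        """Append a message and return its logical offset (0, 1, 2, ...)."""
        self.positions.append(len(self.data))
        self.data += struct.pack(">I", len(payload)) + payload
        return len(self.positions) - 1

    def read(self, offset: int) -> bytes:
        """Translate the logical offset to a byte position, then read the message."""
        pos = self.positions[offset]
        (size,) = struct.unpack_from(">I", self.data, pos)
        return bytes(self.data[pos + 4:pos + 4 + size])


log = ToyLog()
for payload in (b"a", b"bcd", b"ef"):
    log.append(payload)

print(log.read(1))  # b'bcd'
# The next offset is simply offset + 1, and reverse traversal is a countdown.
print([log.read(o) for o in range(2, -1, -1)])  # [b'ef', b'bcd', b'a']
```

The caller only ever handles 0, 1, 2, while the byte positions (0, 5 and 12 here) stay internal to the log, which is the transparency described in point 2.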

 

Kafka Replication High-level Design

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication

Reference: Apache Kafka Replication Design – High level


This article is excerpted from cnblogs (博客园); original publication date: 2013-05-08
