Storm Topology的并发度

简介:

Understanding the parallelism of a Storm topology

https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology

 

概念

一个Topology可以包含一个或多个worker(并行的跑在不同的machine上), 所以worker process就是执行一个topology的子集, 并且worker只能对应于一个topology

一个worker可用包含一个或多个executor, 每个component (spout或bolt)至少对应于一个executor, 所以可以说executor执行一个compenent的子集, 同时一个executor只能对应于一个component

Task就是具体的处理逻辑对象, 一个executor线程可以执行一个或多个tasks 
但一般默认每个executor只执行一个task, 所以我们往往认为task就是执行线程, 其实不然 
task代表最大并发度, 一个component的task数是不会改变的, 但是一个componet的executer数目是会发生变化的 
当task数大于executor数时, executor数代表实际并发数 

worker process executes a subset of a topology. 
A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. 
A running topology consists of many such processes running on many machines within a Storm cluster.

An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster. 
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.

image

 

Configuring the parallelism of a topology, 并发度的配置

The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them.

Storm currently has the following order of precedence for configuration settings:

defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration

 

对于并发度的配置, 在storm里面可以在多个地方进行配置, 优先级如上面所示... 
具体包含,

worker processes的数目, 可以通过配置文件和代码中配置, worker就是执行进程, 所以考虑并发的效果, 数目至少应该大于machines的数目

executor的数目, component的并发线程数,只能在代码中配置(通过setBolt和setSpout的参数), 例如, setBolt("green-bolt", new GreenBolt(), 2)

tasks的数目, 可以不配置, 默认和executor1:1, 也可以通过setNumTasks()配置

Number of worker processes

  • Description: How many worker processes to create for the topology across machines in the cluster.
  • Configuration option: TOPOLOGY_WORKERS
  • How to set in your code (examples):

Number of executors (threads)

  • Description: How many executors to spawn per component.
  • Configuration option: ?
  • How to set in your code (examples):

Number of tasks

Here is an example code snippet to show these settings in practice:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout);

In the above code we configured Storm to run the bolt GreenBolt with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.

 

Example of a running topology

The following illustration shows how a simple topology would look like in operation. 
The topology consists of three components: one spout called BlueSpout and two bolts called GreenBolt and YellowBolt
The components are linked such that BlueSpout sends its output to GreenBolt, which in turns sends its own output toYellowBolt.

image

 

Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) 
               .setNumTasks(4)                   //set tasks number to 4
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
        "mytopology",
        conf,
        topologyBuilder.createTopology()
    );

图和代码, 很清晰, 通过setBolt和setSpout一共定义2+2+6=10个executor threads 
并且同setNumWorkers设置2个workers, 所以storm会平均在每个worker上run 5个executors 
而对于green-bolt, 定义了4个tasks, 所以每个executor中有2个tasks

 

How to change the parallelism of a running topology, 动态的改变并发度

Storm支持在不restart topology的情况下, 动态的改变(增减)worker processes的数目和executors的数目, 称为rebalancing. 
通过Storm web UI, 或者通过storm rebalance命令, 见下面的例子

A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.

You have two options to rebalance a topology:

  1. Use the Storm web UI to rebalance the topology.
  2. Use the CLI tool storm rebalance as described below.

Here is an example of using the CLI tool:

# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.


$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10



本文章摘自博客园,原文发布日期:2013-05-04

目录
相关文章
|
10天前
|
人工智能 JavaScript Linux
【Claude Code 全攻略】终端AI编程助手从入门到进阶(2026最新版)
Claude Code是Anthropic推出的终端原生AI编程助手,支持40+语言、200k超长上下文,无需切换IDE即可实现代码生成、调试、项目导航与自动化任务。本文详解其安装配置、四大核心功能及进阶技巧,助你全面提升开发效率,搭配GitHub Copilot使用更佳。
|
4天前
|
JSON API 数据格式
OpenCode入门使用教程
本教程介绍如何通过安装OpenCode并配置Canopy Wave API来使用开源模型。首先全局安装OpenCode,然后设置API密钥并创建配置文件,最后在控制台中连接模型并开始交互。
1910 6
|
12天前
|
存储 人工智能 自然语言处理
OpenSpec技术规范+实例应用
OpenSpec 是面向 AI 智能体的轻量级规范驱动开发框架,通过“提案-审查-实施-归档”工作流,解决 AI 编程中的需求偏移与不可预测性问题。它以机器可读的规范为“单一真相源”,将模糊提示转化为可落地的工程实践,助力开发者高效构建稳定、可审计的生产级系统,实现从“凭感觉聊天”到“按规范开发”的跃迁。
1900 18
|
10天前
|
人工智能 JavaScript 前端开发
【2026最新最全】一篇文章带你学会Cursor编程工具
本文介绍了Cursor的下载安装、账号注册、汉化设置、核心模式(Agent、Plan、Debug、Ask)及高阶功能,如@引用、@Doc文档库、@Browser自动化和Rules规则配置,助力开发者高效使用AI编程工具。
1353 7
|
14天前
|
IDE 开发工具 C语言
【2026最新】VS2026下载安装使用保姆级教程(附安装包+图文步骤)
Visual Studio 2026是微软推出的最新Windows专属IDE,启动更快、内存占用更低,支持C++、Python等开发。推荐免费的Community版,安装简便,适合初学者与个人开发者使用。
1357 13
|
10天前
|
人工智能 JSON 自然语言处理
【2026最新最全】一篇文章带你学会Qoder编辑器
Qoder是一款面向程序员的AI编程助手,集智能补全、对话式编程、项目级理解、任务模式与规则驱动于一体,支持模型分级选择与CLI命令行操作,可自动生成文档、优化提示词,提升开发效率。
825 10
【2026最新最全】一篇文章带你学会Qoder编辑器
|
14天前
|
人工智能 测试技术 开发者
AI Coding后端开发实战:解锁AI辅助编程新范式
本文系统阐述了AI时代开发者如何高效协作AI Coding工具,强调破除认知误区、构建个人上下文管理体系,并精准判断AI输出质量。通过实战流程与案例,助力开发者实现从编码到架构思维的跃迁,成为人机协同的“超级开发者”。
1105 96
|
8天前
|
云安全 安全
免费+限量+领云小宝周边!「阿里云2026云上安全健康体检」火热进行中!
诚邀您进行年度自检,发现潜在风险,守护云上业务连续稳健运行
1182 2

热门文章

最新文章