Storm Topology的并发度-阿里云开发者社区

开发者社区> 寒凝雪> 正文

Storm Topology的并发度

简介:
+关注继续查看

Understanding the parallelism of a Storm topology

https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology

 

概念

一个Topology可以包含一个或多个worker(并行的跑在不同的machine上), 所以worker process就是执行一个topology的子集, 并且worker只能对应于一个topology

一个worker可用包含一个或多个executor, 每个component (spout或bolt)至少对应于一个executor, 所以可以说executor执行一个compenent的子集, 同时一个executor只能对应于一个component

Task就是具体的处理逻辑对象, 一个executor线程可以执行一个或多个tasks 
但一般默认每个executor只执行一个task, 所以我们往往认为task就是执行线程, 其实不然 
task代表最大并发度, 一个component的task数是不会改变的, 但是一个componet的executer数目是会发生变化的 
当task数大于executor数时, executor数代表实际并发数 

worker process executes a subset of a topology. 
A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. 
A running topology consists of many such processes running on many machines within a Storm cluster.

An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster. 
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.

image

 

Configuring the parallelism of a topology, 并发度的配置

The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them.

Storm currently has the following order of precedence for configuration settings:

defaults.yaml < storm.yaml < topology-specific configuration < internal component-specific configuration < external component-specific configuration

 

对于并发度的配置, 在storm里面可以在多个地方进行配置, 优先级如上面所示... 
具体包含,

worker processes的数目, 可以通过配置文件和代码中配置, worker就是执行进程, 所以考虑并发的效果, 数目至少应该大于machines的数目

executor的数目, component的并发线程数,只能在代码中配置(通过setBolt和setSpout的参数), 例如, setBolt("green-bolt", new GreenBolt(), 2)

tasks的数目, 可以不配置, 默认和executor1:1, 也可以通过setNumTasks()配置

Number of worker processes

  • Description: How many worker processes to create for the topology across machines in the cluster.
  • Configuration option: TOPOLOGY_WORKERS
  • How to set in your code (examples):

Number of executors (threads)

  • Description: How many executors to spawn per component.
  • Configuration option: ?
  • How to set in your code (examples):

Number of tasks

Here is an example code snippet to show these settings in practice:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout);

In the above code we configured Storm to run the bolt GreenBolt with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.

 

Example of a running topology

The following illustration shows how a simple topology would look like in operation. 
The topology consists of three components: one spout called BlueSpout and two bolts called GreenBolt and YellowBolt
The components are linked such that BlueSpout sends its output to GreenBolt, which in turns sends its own output toYellowBolt.

image

 

Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) 
               .setNumTasks(4)                   //set tasks number to 4
               .shuffleGrouping("blue-spout");

topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");

StormSubmitter.submitTopology(
        "mytopology",
        conf,
        topologyBuilder.createTopology()
    );

图和代码, 很清晰, 通过setBolt和setSpout一共定义2+2+6=10个executor threads 
并且同setNumWorkers设置2个workers, 所以storm会平均在每个worker上run 5个executors 
而对于green-bolt, 定义了4个tasks, 所以每个executor中有2个tasks

 

How to change the parallelism of a running topology, 动态的改变并发度

Storm支持在不restart topology的情况下, 动态的改变(增减)worker processes的数目和executors的数目, 称为rebalancing. 
通过Storm web UI, 或者通过storm rebalance命令, 见下面的例子

A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.

You have two options to rebalance a topology:

  1. Use the Storm web UI to rebalance the topology.
  2. Use the CLI tool storm rebalance as described below.

Here is an example of using the CLI tool:

# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.


$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10



本文章摘自博客园,原文发布日期:2013-05-04

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

相关文章
怎么设置阿里云服务器安全组?阿里云安全组规则详细解说
阿里云服务器安全组设置规则分享,阿里云服务器安全组如何放行端口设置教程
6362 0
阿里云服务器端口号设置
阿里云服务器初级使用者可能面临的问题之一. 使用tomcat或者其他服务器软件设置端口号后,比如 一些不是默认的, mysql的 3306, mssql的1433,有时候打不开网页, 原因是没有在ecs安全组去设置这个端口号. 解决: 点击ecs下网络和安全下的安全组 在弹出的安全组中,如果没有就新建安全组,然后点击配置规则 最后如上图点击添加...或快速创建.   have fun!  将编程看作是一门艺术,而不单单是个技术。
3956 0
使用OpenApi弹性释放和设置云服务器ECS释放
云服务器ECS的一个重要特性就是按需创建资源。您可以在业务高峰期按需弹性的自定义规则进行资源创建,在完成业务计算的时候释放资源。本篇将提供几个Tips帮助您更加容易和自动化的完成云服务器的释放和弹性设置。
7580 0
阿里云服务器远程登录用户名和密码的查询方法
阿里云服务器远程连接登录用户名和密码在哪查看?阿里云服务器默认密码是什么?云服务器系统不同默认用户名不同
80 0
腾讯云服务器 设置ngxin + fastdfs +tomcat 开机自启动
在tomcat中新建一个可以启动的 .sh 脚本文件 /usr/local/tomcat7/bin/ export JAVA_HOME=/usr/local/java/jdk7 export PATH=$JAVA_HOME/bin/:$PATH export CLASSPATH=.
2003 0
阿里云服务器如何登录?阿里云服务器的三种登录方法
购买阿里云ECS云服务器后如何登录?场景不同,云吞铺子总结大概有三种登录方式: 登录到ECS云服务器控制台 在ECS云服务器控制台用户可以更改密码、更换系统盘、创建快照、配置安全组等操作如何登录ECS云服务器控制台? 1、先登录到阿里云ECS服务器控制台 2、点击顶部的“控制台” 3、通过左侧栏,切换到“云服务器ECS”即可,如下图所示 通过ECS控制台的远程连接来登录到云服务器 阿里云ECS云服务器自带远程连接功能,使用该功能可以登录到云服务器,简单且方便,如下图:点击“远程连接”,第一次连接会自动生成6位数字密码,输入密码即可登录到云服务器上。
16274 0
阿里云ECS云服务器初始化设置教程方法
阿里云ECS云服务器初始化是指将云服务器系统恢复到最初状态的过程,阿里云的服务器初始化是通过更换系统盘来实现的,是免费的,阿里云百科网分享服务器初始化教程: 服务器初始化教程方法 本文的服务器初始化是指将ECS云服务器系统恢复到最初状态,服务器中的数据也会被清空,所以初始化之前一定要先备份好。
2901 0
+关注
5854
文章
223
问答
文章排行榜
最热
最新
相关电子书
更多
文娱运维技术
立即下载
《SaaS模式云原生数据仓库应用场景实践》
立即下载
《看见新力量:二》电子书
立即下载