https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:
- Worker processes
- Executors (threads)
- Tasks
Here is a simple illustration of their relationships:
A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.
An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).
A task performs the actual data processing — each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks
. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
Configuring the parallelism of a topology
Note that in Storm’s terminology "parallelism" is specifically used to describe the so-called parallelism hint, which means the initial number of executor (threads) of a component. In this document though we use the term "parallelism" in a more general sense to describe how you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when "parallelism" is used in the normal, narrow definition of Storm.
The following sections give an overview of the various configuration options and how to set them in your code. There is more than one way of setting these options though, and the table lists only some of them. Storm currently has the following order of precedence for configuration settings: defaults.yaml
< storm.yaml
< topology-specific configuration < internal component-specific configuration < external component-specific configuration.
Number of worker processes
- Description: How many worker processes to create for the topology across machines in the cluster.
- Configuration option: TOPOLOGY_WORKERS
- How to set in your code (examples):
Number of executors (threads)
- Description: How many executors to spawn per component.
- Configuration option: ?
- How to set in your code (examples):
- TopologyBuilder#setSpout()
- TopologyBuilder#setBolt()
- Note that as of Storm 0.8 the
parallelism_hint
parameter now specifies the initial number of executors (not tasks!) for that bolt.
Number of tasks
- Description: How many tasks to create per component.
- Configuration option: TOPOLOGY_TASKS
- How to set in your code (examples):
Here is an example code snippet to show these settings in practice:
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
.shuffleGrouping("blue-spout);
In the above code we configured Storm to run the bolt GreenBolt
with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.
Example of a running topology
The following illustration shows how a simple topology would look like in operation. The topology consists of three components: one spout called BlueSpout
and two bolts called GreenBolt
and YellowBolt
. The components are linked such that BlueSpout
sends its output to GreenBolt
, which in turns sends its own output to YellowBolt
.
The GreenBolt
was configured as per the code snippet above whereas BlueSpout
and YellowBolt
only set the parallelism hint (number of executors). Here is the relevant code:
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
.shuffleGrouping("blue-spout");
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
.shuffleGrouping("green-bolt");
StormSubmitter.submitTopology(
"mytopology",
conf,
topologyBuilder.createTopology()
);
And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:
- TOPOLOGY_MAX_TASK_PARALLELISM: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. Config#setMaxTaskParallelism().
How to change the parallelism of a running topology
A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.
You have two options to rebalance a topology:
- Use the Storm web UI to rebalance the topology.
- Use the CLI tool storm rebalance as described below.
Here is an example of using the CLI tool:
# Reconfigure the topology "mytopology" to use 5 worker processes,
# the spout "blue-spout" to use 3 executors and
# the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
相关推荐
Herbert Edelsbrunner & Hohn Harer: Computational Topology: An Introduction. 计算拓扑学导论。
电子书,经典的结构拓扑优化入门书籍! 电子书,经典的结构拓扑优化入门书籍!!
Algebraic topology: a first course
The Medieval Kingdom topology: Peer relations in kindergarten children Psychology in the Schools Volume 32, April I995 THE MEDIEVAL KINGDOM TOPOLOGY: PEER RELATIONS IN KINDERGARTEN CHILDREN ...
Algebra, Topology, Differential Calculus, and Optimization TheoryFor Computer Science and Machine LearningJean Gallier and Jocelyn Quaintance Department of Computer and Information ScienceUniversity ...
H consensus control of multi-agent systems with switching topology: a dynamic output feedback protocol
学习计算机科学总共需要多少数学基础?宾夕法尼亚大学计算机和信息科学系教授Jean Gallier用一本1960页书的容量解决了所有的问题。该本书涵盖了计算机科学所需的线性代数、微分和最优化理论等基础知识,包含十分详尽...
SUMS77 Topology, Calculus and Approximation, Vilmos Komornik (2017).zip
gem 'topology' 然后执行: $ bundle 或者自己安装: $ gem install Topology 用法 top = Topology . new Set [ Set [ 1 ] , Set [ 2 ] ] => #<Topology sos=#>, #, #, #<Set>}>> 贡献 分叉它( ) 创建您的...
拓扑结构
拓扑 合并到 ALSA git 之前的 ASoC 拓扑用户空间工具
该ppt根据论文内容制作,非该论文作者在INFOCOM上使用的原始ppt。内容仅供参考,具体细节,请参考原论文内容。
setting have provided new and interesting stimuli for topology. These prob- lems also have increased the interaction between topology and related areas of mathematics such as order theory and ...
nsq_topology 这是一组脚本,用于生成NSQ拓扑图,如下所示: 它包含两个组成部分:nsq_data 该脚本建立了支持拓扑图的数据。 通过与nsqlookupd进行通信,然后与集群中的所有nsqds进行通信。 您可能希望在托管...
拓扑框架 用于构建和测试网络拓扑的框架,并支持pytest。 文献资料 执照 ...Licensed under the Apache License, Version 2.0 (the ...Unless required by applicable law or agreed to in writing, software distribute
数据结构的组织形式在算法的程序实现中占有重要地位。探讨了网格数据处理中的数据结构组织问题, 提出了一种动态的、有较强适应性的通用流形网格数据组织结构,并以不同实例验证了所提出的数据结构在时间上的即时有效...
闪电网络八卦 ... 由于使用了基于Sphinx构造[sphinx2009]的洋葱路由,因此这是必需的,在该过程中,将要传输的数据(即付款)与关联的路由数据包一起发送,该路由数据包指定了应该通过其传输数据的路由。...
拓扑结构 该库开发了Coq中一般拓扑的一些基本概念和... Zorn的引理(此存储库的一部分集库) Coq命名空间: Topology 相关出版物:无建造和安装说明安装最新版拓扑的最简单方法是通过 : opam repo add coq-released ...