The high-level organization of Pregel programs is inspired by Valiant’s Bulk Synchronous Parallel model . Pregel
computations consist of a sequence of iterations, called supersteps. During a superstep the framework invokes a user-defined function for each vertex, conceptually in parallel.The function specifies behavior at a single vertex V and a single superstep S. It can read messages sent to V in superstep S − 1, send messages to other vertices that will be received at superstep S + 1, and modify the state of V and its outgoing edges. Messages are typically sent along outgoing edges, but a message may be sent to any vertex whose identifier is known.
The input to a Pregel computation is a directed graph in which each vertex is uniquely identified by a string vertex
identifier. Each vertex is associated with a modifiable, user defined value. The directed edges are associated with their source vertices, and each edge consists of a modifiable, user defined value and a target vertex identifier.
A typical Pregel computation consists of input, when the graph is initialized, followed by a sequence of supersteps separated by global synchronization points until the algorithm terminates, and finishing with output.
Within each superstep the vertices compute in parallel,each executing the same user-defined function that expresses the logic of a given algorithm. A vertex can modify its state or that of its outgoing edges, receive messages sent to it in the previous superstep, send messages to other vertices (to be received in the next superstep), or even mutate the topology of the graph. Edges are not first-class citizens in this model, having no associated computation.
Algorithm termination is based on every vertex voting to halt. In superstep 0, every vertex is in the active state; all active vertices participate in the computation of any given superstep. A vertex deactivates itself by voting to halt. This means that the vertex has no further work to do unless triggered externally, and the Pregel framework will not execute that vertex in subsequent supersteps unless it receives a message. If reactivated by a message, a vertex must explicitly deactivate itself again. The algorithm as a whole terminates when all vertices are simultaneously inactive and there are no messages in transit. This simple state machine is illustrated in Figure 1.
The output of a Pregel program is the set of values explicitly output by the vertices. It is often a directed graph
isomorphic to the input, but this is not a necessary property of the system because vertices and edges can be added and removed during computation. A clustering algorithm,for example, might generate a small set of disconnected vertices selected from a large graph. A graph mining algorithm might simply output aggregated statistics mined from the graph.
Figure 2 illustrates these concepts using a simple example:given a strongly connected graph where each vertex contains a value, it propagates the largest value to every vertex. In each superstep, any vertex that has learned a larger value from its messages sends it to all its neighbors. When no further vertices change in a superstep, the algorithm terminates.
We chose a pure message passing model, omitting remote reads and other ways of emulating shared memory, for two reasons. First, message passing is sufficiently expressive that there is no need for remote reads. We have not found any graph algorithms for which message passing is insufficient.Second, this choice is better for performance. In a cluster environment, reading a value from a remote machine incurs high latency that can’t easily be hidden. Our message passing model allows us to amortize latency by delivering messages asynchronously in batches.
Graph algorithms can be written as a series of chained MapReduce invocations [11, 30]. We chose a different model for reasons of usability and performance. Pregel keeps vertices and edges on the machine that performs computation,and uses network transfers only for messages. MapReduce,however, is essentially functional, so expressing a graph algorithm as a chained MapReduce requires passing the entire state of the graph from one stage to the next—in general requiring much more communication and associated serialization overhead. In addition, the need to coordinate the steps of a chained MapReduce adds programming complexity that is avoided by Pregel’s iteration over supersteps.
The orignal paper see:
Grzegorz Malewicz, Matthew H. Austern, 《Pregel: A System for Large-Scale Graph Processing》
相关推荐
Giraph 是 Google 于 2010 年发布的论文 Pregel: a system for large-scale graph processing 的开源实现。Giraph 是以 Hadoop 为基础开发的上层应用,其系统架构和计算模型与 Pregel 保持了一致。同时也在 Pregel ...
google的大规模图形处理引擎 Pregel的论文 In this paper we present a computational model suitable for this task....is a framework for processing large graphs that is expressive and easy to program.
6个pdf,Google官方发布的。 [1]Bigtable: A Distributed Storage System for Structured Data [2]MapReduce: Simplified Data Processing on Large Clusters ...[6]Pregel: A System for Large-Scale Graph Processing
Colossus Papers: spanner, Pregel, Dremel, Caffeine. A second generation of google file system and large-scale distributed computing patforms and database
使用Pregel和PageRank算法进行图分析已实施的操作基于图度的社交图中大多数连接的用户。 基于单用户分离度。 输入是用户的ID-输出是具有用户的元组列表以及它们之间的分隔度。 两个定义的用户之间的隔离度(作为单个...
程序构建:~/reef-pregel$ mvn 全新安装程序执行: /reef-pregel/bin$ ./run.sh -input file:/// /reef-pregel/bin/数据集(如果您在 hadoop 上运行此程序,请使用此选项:-local false) 如何验证输出:$ grep -r "R...
DMID 在信息系统亚琛工业大学(RWTH Aachen University)主席的学士论文“ Pregel:重叠社区检测算法的并行实现”中,实现了针对giraph的重叠社区检测算法DMID的实现。 ## SETUP在下面,我们将描述如何在Ubuntu 64位...
pregel - 图卷积网络的Tensorflow实现
Pregel 文档
Spark GraphX是一个新的Spark API ,它用于图和分布式图( graph-parallel )的计算GraphX综合了Pregel和GraphLab具有的优点,即接口相对简单,又保证性能,可以应对点分割的本专题会详细介绍GraphX的实现原理,转换...
针对这一问题,提出基于Pregel-like的社会网络隐私保护方法。该方法避免了传统MapReduce模型在多次迭代处理时的数据反复迁移和作业连续调度等问题,利用“节点为中心”的思想,通过节点间消息传递和程序的多次迭代...
图形 计划详情: (i)日期:2020年4月9... (ii)使用Spark GraphX上的Pregel完成第二个程序(Graph02.scala)。 代码执行:您可以在Apache Spark兼容服务器上运行此程序。 注意:我包括了示例输入和输出.txt文件。
预凝胶最短路径 Pregel 系统的最短路径算法。 使用 Apache Spark 和 GraphX API 实现。 Scala
本文档比较了两个知名的计算模型Pregel和MapReduce的特点及其应用场景。
因此,本文提出了一种基于Pregel-like系统的个性化社交网络隐私保护方法。 这种方法为不同的顶点提供了不同类型的隐私保护,并且在类似Pregel的系统中采用了“像顶点一样思考”的概念。 它通过顶点与程序多次迭代...
本文提出了一种在大型 RDF 图上回答基于 Pregel 的并行 Provenance-aware Regular Path Queries (P3RPQ) 的新方法。 我们的方法是使用 Pregel 框架开发的,该框架利用 Glushkov 自动机来并行跟踪 RPQ 的匹配过程。 ...
#资源达人分享计划#
#资源达人分享计划#
在ArangoDB中引入Pregel框架,通过Worker算法、合成算法、pregelRunner模块来执行不同的实现方式。来试试吧!ArangoDB团队研究出一种算法,能够在一个图中识别出已连接的子图,文中以国家为例;在ArangoDB中引入...
Frequent Subgraph Mining Based on Pregel