
Running the SparkPi example on a Spark cluster

 

    1. SparkPi.scala source code (the official example)

     

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    // Number of partitions; defaults to 2 when no argument is given
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices // total number of random samples
    val count = spark.parallelize(1 to n, slices).map { i =>
      // Draw a random point in the square [-1, 1] x [-1, 1]
      val x = random * 2 - 1
      val y = random * 2 - 1
      // Count the point if it lands inside the unit circle
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
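
Why this estimates pi: each sample is a uniform random point in the square [-1, 1] × [-1, 1], and the probability that it lands inside the unit circle is the circle's area divided by the square's, π/4 ≈ 0.785. So count/n ≈ π/4, which is why the code prints 4.0 * count / n, and the estimate converges on π as n grows.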

 

2. Running it from the IntelliJ IDEA integrated development environment fails with an error; the code must be modified to set the master explicitly:

      val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
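
A small refinement (my own sketch, not from the original post): hardcode the master only as a fallback, so the same source also runs unchanged under spark-submit, which supplies the master itself:

// Sketch: set the master only when none was supplied externally
val conf = new SparkConf().setAppName("Spark Pi")
if (!conf.contains("spark.master")) {
  conf.setMaster("spark://master:7077") // cluster URL from this step
}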

 

3. Use the IDE to package the code into a jar; only the source program itself is needed (strip out the other referenced libraries).
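
If the jar is built with sbt rather than the IDE's artifact tool, a minimal build.sbt in this spirit achieves the same thing (a sketch; the project name and versions are assumptions matched to the spark-1.2.0-bin-hadoop2.4 layout used here). Marking spark-core as "provided" compiles against it but keeps it out of the packaged jar:

// build.sbt (sketch)
name := "helloworld"
version := "1.0"
scalaVersion := "2.10.4" // the Scala version Spark 1.2.0 is built against
// "provided": compile against spark-core but exclude it from the jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"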

 

4. Then add the following line to the code in the IDE:

     spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")

     This distributes helloworld.jar to each worker.
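
Note that the path given to addJar must exist on the machine running the driver; Spark then serves the jar to the workers. As an alternative to hardcoding the path (a sketch, not from the original post), SparkContext.jarOfObject can look it up, though it returns None when the class is run straight from the IDE's output directory rather than from a jar:

// Sketch: register whatever jar this object was loaded from
SparkContext.jarOfObject(SparkPi).foreach(spark.addJar)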

 

5. Run it; the job output looks like this:

    14/12/31 15:28:57 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:21) finished in 4.500 s

    14/12/31 15:28:58 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:21, took 8.608873 s

    Pi is roughly 3.14468
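
    Working backwards from this run: with the default slices = 2, n = 100000 × 2 = 200000 samples, so the printed 3.14468 corresponds to count = 3.14468 × 200000 / 4 = 157234 points inside the circle.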

 

    The modified, runnable code is as follows:

import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by cec on 12/31/14.
 */
object SparkPi {

  def main(args: Array[String]) {
    // Point the application at the standalone cluster master explicitly
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("spark://master:7077")
    val spark = new SparkContext(conf)
    // Ship the packaged application jar to the workers
    spark.addJar("/home/cec/spark-1.2.0-bin-hadoop2.4/helloworld.jar")
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
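
One caveat with the example as written: n = 100000 * slices is computed in Int arithmetic, so a slices argument above roughly 21474 overflows. Later versions of the official SparkPi example guard against this; a one-line fix in the same spirit:

// Clamp at Int.MaxValue to avoid Int overflow for large slice counts
val n = math.min(100000L * slices, Int.MaxValue).toInt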

 
