ThreadLocalRandom in Java 7 (repost)

 

This morning I came across a post about using ThreadLocalRandom in Java 7, which is said to be twice as fast as Math.random(). I am reposting it here to study it:

 

When I first wrote this blog my intention was to introduce you to ThreadLocalRandom, a class that is new in Java 7 for generating random numbers. I analyzed the performance of ThreadLocalRandom in a series of micro-benchmarks to find out how it performs in a single-threaded environment. The results were rather surprising: although the code is very similar, ThreadLocalRandom is twice as fast as Math.random()! That drew my interest, and I decided to investigate a little further. I have documented my analysis process. It is an exemplary introduction to the analysis steps, technologies and some of the JVM diagnostic tools required to understand differences in the performance of small code segments. Some experience with the described toolset and technologies will enable you to write faster Java code for your specific HotSpot target environment.

OK, that's enough talk, let's get started!

Math.random() works on a static singleton instance of Random, whilst ThreadLocalRandom.current().nextDouble() works on a thread-local instance of ThreadLocalRandom, which is a subclass of Random. ThreadLocal introduces the overhead of a variable lookup on each call to the current() method. Considering that, it is really a little surprising that it is twice as fast as Math.random() in a single thread, isn't it? I didn't expect such a significant difference.
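To make the two call paths concrete, here is a minimal sketch of the two call sites being compared. The class and method names (RandomCallSites, viaMathRandom, viaThreadLocalRandom) are made up for illustration; only Math.random() and ThreadLocalRandom.current().nextDouble() come from the JDK.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration class, not part of the original benchmark.
final class RandomCallSites {

    // Draws from the single Random instance that Math.random() shares across all threads.
    static double viaMathRandom() {
        return Math.random();
    }

    // Looks up the calling thread's own generator first, then draws from it.
    static double viaThreadLocalRandom() {
        return ThreadLocalRandom.current().nextDouble();
    }

    public static void main(String[] args) {
        System.out.println("Math.random():        " + viaMathRandom());
        System.out.println("ThreadLocalRandom:    " + viaThreadLocalRandom());
    }
}
```

Both methods return a double in the range [0.0, 1.0); the difference lies only in which generator instance is used and how it is reached.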

Again, I am using a tiny micro-benchmarking framework presented in one of Heinz's blogs. The framework that Heinz developed takes care of several challenges in benchmarking Java programs on modern JVMs. These challenges include warm-up, garbage collection, the accuracy of Java's time API, verification of test accuracy and so forth.
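The framework itself is not reproduced in this post. As a rough idea of what such a harness has to do, the following is a naive sketch under my own assumptions: NaiveHarness, Task, countExecutions and meanExecutions are hypothetical names, and the code is not Heinz's implementation. It only illustrates warm-up rounds, a garbage collection hint between rounds, and counting executions within a fixed time window.

```java
// Hypothetical, simplified harness; not the framework used for the results in this post.
final class NaiveHarness {

    // Assumed stand-in for the BenchmarkRunnable interface used by the benchmark classes.
    interface Task {
        void run();
        Object getResult();
    }

    // Counts how many times task.run() completes within roughly windowMillis.
    // Calling System.nanoTime() on every iteration adds overhead of its own,
    // which a real framework would try to reduce.
    static long countExecutions(Task task, long windowMillis) {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        long count = 0;
        while (System.nanoTime() < deadline) {
            task.run();
            count++;
        }
        return count;
    }

    // Runs warm-up rounds whose counts are discarded (so the JIT can compile the hot path),
    // then returns the mean execution count over the measured rounds.
    static double meanExecutions(Task task, long windowMillis, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            countExecutions(task, windowMillis);
        }
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            System.gc(); // only a hint; the JVM may ignore it
            total += countExecutions(task, windowMillis);
        }
        return (double) total / rounds;
    }

    public static void main(String[] args) {
        Task task = new Task() {
            private double sum;
            @Override public void run() { sum += Math.random(); }
            @Override public Object getResult() { return sum; }
        };
        System.out.println("Mean execution count per 100 ms: " + meanExecutions(task, 100, 2, 3));
        // Printing the accumulated result keeps the work observable to the JIT.
        System.out.println("Result: " + task.getResult());
    }
}
```

A sketch like this omits the statistical checks a real framework performs, so its numbers should not be compared with the results reported here.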

Here are my runnable benchmark classes:

import java.util.concurrent.ThreadLocalRandom;

// Benchmark target: accumulates values drawn from the calling thread's ThreadLocalRandom.
public class ThreadLocalRandomGenerator implements BenchmarkRunnable {

    private double r;

    @Override
    public void run() {
        r = r + ThreadLocalRandom.current().nextDouble();
    }

    public double getR() {
        return r;
    }

    // Returning the accumulated value keeps the JIT from treating run() as dead code.
    @Override
    public Object getResult() {
        return r;
    }

}

// Benchmark target: accumulates values drawn from Math.random().
public class MathRandomGenerator implements BenchmarkRunnable {

    private double r;

    @Override
    public void run() {
        r = r + Math.random();
    }

    public double getR() {
        return r;
    }

    @Override
    public Object getResult() {
        return r;
    }
}

Let's run the benchmark using Heinz' framework:

import java.text.DecimalFormat;
import java.util.Arrays;
import java.util.List;

public class FirstBenchmark {

    private static List<BenchmarkRunnable> benchmarkTargets = Arrays.asList(new MathRandomGenerator(),
            new ThreadLocalRandomGenerator());

    public static void main(String[] args) {
        DecimalFormat df = new DecimalFormat("#.##");
        for (BenchmarkRunnable runnable : benchmarkTargets) {
            // Average, PerformanceHarness and PerformanceChecker come from Heinz's framework.
            Average average = new PerformanceHarness().calculatePerf(new PerformanceChecker(1000, runnable), 5);
            System.out.println("Benchmark target: " + runnable.getClass().getSimpleName());
            System.out.println("Mean execution count: " + df.format(average.mean()));
            System.out.println("Standard deviation: " + df.format(average.stddev()));
            System.out.println("To avoid dead code coptimization: " + runnable.getResult());
        }
    }
}

Notice: To make sure the JVM does not identify the code as "dead code", I return a field variable and print out the result of my benchmarking immediately. That's why my runnable classes implement an interface called BenchmarkRunnable. I am running this benchmark three times. The first run is in default mode, with inlining and JIT optimization enabled:

Benchmark target: MathRandomGenerator
Mean execution count: 14773594,4
Standard deviation: 180484,9
To avoid dead code coptimization: 6.4005410634212025E7
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 29861911,6
Standard deviation: 723934,46
To avoid dead code coptimization: 1.0155096190946539E8

 

Then again without JIT optimization (VM option -Xint):

Benchmark target: MathRandomGenerator
Mean execution count: 963226,2
Standard deviation: 5009,28
To avoid dead code coptimization: 3296912.509302683
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 1093147,4
Standard deviation: 491,15
To avoid dead code coptimization: 3811259.7334526842

The last test is with JIT optimization, but with -XX:MaxInlineSize=0 which (almost) disables inlining:

Benchmark target: MathRandomGenerator
Mean execution count: 13789245
Standard deviation: 200390,59
To avoid dead code coptimization: 4.802723374491231E7
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 24009159,8
Standard deviation: 149222,7
To avoid dead code coptimization: 8.378231170741305E7

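Let's interpret the results carefully: With full JVM JIT optimization, ThreadLocalRandom is twice as fast as Math.random(). Turning JIT optimization off shows that the two then perform almost equally well (or badly): ThreadLocalRandom is only about 13% ahead. With inlining (almost) disabled, the gap between the two shrinks from about 15.1 million to about 10.2 million executions, so method inlining seems to account for roughly 30% of the performance difference. The remaining difference may be due to other optimization techniques.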
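The arithmetic behind this interpretation can be checked with a few lines of code. The sketch below uses only the mean execution counts reported above; the class and method names (BenchmarkRatios, speedup, gapReduction) are hypothetical helpers introduced for illustration.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the ratios from the reported mean execution counts.
final class BenchmarkRatios {

    // How many times more executions ThreadLocalRandom achieved than Math.random().
    static double speedup(double mathRandomMean, double threadLocalMean) {
        return threadLocalMean / mathRandomMean;
    }

    // Fraction by which the absolute gap shrinks when inlining is (almost) disabled.
    static double gapReduction(double gapWithInlining, double gapWithoutInlining) {
        return (gapWithInlining - gapWithoutInlining) / gapWithInlining;
    }

    public static void main(String[] args) {
        double jit = speedup(14773594.4, 29861911.6);       // about 2.02
        double interpreted = speedup(963226.2, 1093147.4);  // about 1.13
        double noInlining = speedup(13789245.0, 24009159.8); // about 1.74
        double reduction = gapReduction(29861911.6 - 14773594.4,
                                        24009159.8 - 13789245.0); // about 0.32

        System.out.println(String.format(Locale.ROOT, "JIT enabled:      %.2fx", jit));
        System.out.println(String.format(Locale.ROOT, "-Xint:            %.2fx", interpreted));
        System.out.println(String.format(Locale.ROOT, "MaxInlineSize=0:  %.2fx", noInlining));
        System.out.println(String.format(Locale.ROOT, "Gap reduction:    %.0f%%", reduction * 100));
    }
}
```

The roughly 30% figure therefore refers to the reduction of the absolute gap in execution counts, not to the ratio between the two generators.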

One reason why the JIT compiler can tune ThreadLocalRandom more effectively is the improved implementation of ThreadLocalRandom.next(). 

public class Random implements java.io.Serializable {
    ...
    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed)); // retry if another thread updated the seed
        return (int)(nextseed >>> (48 - bits));
    }
    ...
}

public class ThreadLocalRandom extends Random {
    ...
    protected int next(int bits) {
        // No atomic update needed: rnd is only touched by the owning thread.
        rnd = (rnd * multiplier + addend) & mask;
        return (int) (rnd >>> (48-bits));
    }
    ...
}

The first snippet shows Random.next(), which is used intensively in the benchmark of Math.random(). Compared to ThreadLocalRandom.next(), the method requires significantly more instructions, although both methods do the same thing. In the Random class the seed variable holds state that is shared globally by all threads, and it changes with every call to the next() method. Therefore an AtomicLong is required to safely read and update the seed value in calls to nextDouble(). ThreadLocalRandom, on the other hand, is - well - thread local :-) Its next() method does not have to be thread safe and can use an ordinary long variable as the seed value.
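The claim that both methods do the same thing can be illustrated with two stripped-down generators. This is a sketch under my own assumptions, not JDK code: LcgSketch, SharedLcg and ConfinedLcg are hypothetical names, while the multiplier, addend and mask values are the constants documented for java.util.Random.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified generators mirroring the two next() implementations.
final class LcgSketch {

    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long ADDEND = 0xBL;
    static final long MASK = (1L << 48) - 1;

    // Shared-state variant: the seed may be updated by several threads, so it uses a CAS loop.
    static final class SharedLcg {
        private final AtomicLong seed;

        SharedLcg(long initial) {
            seed = new AtomicLong(initial & MASK);
        }

        int next(int bits) {
            long oldSeed, nextSeed;
            do {
                oldSeed = seed.get();
                nextSeed = (oldSeed * MULTIPLIER + ADDEND) & MASK;
            } while (!seed.compareAndSet(oldSeed, nextSeed));
            return (int) (nextSeed >>> (48 - bits));
        }
    }

    // Thread-confined variant: a plain long field is enough.
    static final class ConfinedLcg {
        private long rnd;

        ConfinedLcg(long initial) {
            rnd = initial & MASK;
        }

        int next(int bits) {
            rnd = (rnd * MULTIPLIER + ADDEND) & MASK;
            return (int) (rnd >>> (48 - bits));
        }
    }

    // Returns true if both variants produce identical values for the given number of draws.
    static boolean sameSequence(long initial, int draws) {
        SharedLcg shared = new SharedLcg(initial);
        ConfinedLcg confined = new ConfinedLcg(initial);
        for (int i = 0; i < draws; i++) {
            if (shared.next(32) != confined.next(32)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Same sequence: " + sameSequence(42L, 1000));
    }
}
```

When used from a single thread, the two variants compute the same sequence; the shared one simply pays for the atomic read and compare-and-set on every draw.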

About method inlining and ThreadLocalRandom

One very effective JIT optimization is method inlining. In frequently executed hot paths the HotSpot compiler decides to inline the code of called methods (child methods) into the caller's method (parent method). "Inlining has important benefits. It dramatically reduces the dynamic frequency of method invocations, which saves the time needed to perform those method invocations. But even more importantly, inlining produces much larger blocks of code for the optimizer to work on. This creates a situation that significantly increases the effectiveness of traditional compiler optimizations, overcoming a major obstacle to increased Java programming language performance."

Since Java 7 you can monitor method inlining by using diagnostic JVM options. Running the code with '-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining' will show the inlining efforts of the JIT compiler. Here are the relevant sections of the output for Math.random() benchmark:

13   java.util.Random::nextDouble (24 bytes)
3   java.util.Random::next (47 bytes)   callee is too large
13   java.util.Random::next (47 bytes)   callee is too large

The JIT compiler cannot inline the Random.next() method that is called in Random.nextDouble(). This is the inlining output for ThreadLocalRandom.next():

8   java.util.Random::nextDouble (24 bytes)
3   java.util.concurrent.ThreadLocalRandom::next (31 bytes)
13   java.util.concurrent.ThreadLocalRandom::next (31 bytes)

 

Due to the fact that the next()-method is shorter (31 bytes) it can be inlined. Because the next()-method is called intensively in both benchmarks this log suggests that method inlining may be one reason why ThreadLocalRandom performs significantly faster. 

To verify that and to find out more, we need to dive into the assembly code. With Java 7 JDKs it is possible to print assembly code to the console. See here on how to enable the -XX:+PrintAssembly VM option. The option prints the JIT-optimized code, which means you can see the code the JVM actually executes. I have copied the relevant assembly code into the links below.

Assembly code of ThreadLocalRandomGenerator.run() here.
Assembly code of MathRandomGenerator.run() here.
Assembly code of Random.next() called by Math.random() here.

Assembly code is machine-specific, low-level code, and it is more complicated to read than bytecode. Let's try to verify that method inlining has a relevant effect on performance in my benchmarks, and ask: are there other obvious differences in how the JIT compiler treats ThreadLocalRandom and Math.random()? In ThreadLocalRandomGenerator.run() there is no procedure call to any of the subroutines like Random.nextDouble() or ThreadLocalRandom.next(). There is only one virtual (hence expensive) method call to ThreadLocal.get() visible (see line 35 in the ThreadLocalRandomGenerator.run() assembly). All the other code is inlined into ThreadLocalRandomGenerator.run(). In the case of MathRandomGenerator.run() there are two virtual method calls to Random.next() (see block B4, line 204 ff. in the assembly code of MathRandomGenerator.run()). This confirms our suspicion that method inlining is one important root cause of the performance difference. Furthermore, due to the synchronization hassle, considerably more (and some expensive!) assembly instructions are required in Random.next(), which is also counterproductive in terms of execution speed.

Understanding the overhead of the invokevirtual instruction

So why is (virtual) method invocation expensive and method inlining so effective? The operand of an invokevirtual instruction is not the offset of a concrete method in a class instance. The compiler does not know the internal layout of a class instance. Instead, it generates symbolic references to the methods of an instance, which are stored in the runtime constant pool. Those runtime constant pool items are resolved at run time to determine the actual method location. This dynamic (run-time) binding requires verification, preparation and resolution, which can considerably affect performance. (See Invoking Methods and Linking in the JVM Spec for details.)
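The call from Random.nextDouble() to next() is such a virtual call: which next() actually runs depends on the runtime class of the receiver, which is exactly why ThreadLocalRandom can substitute its own version. The following sketch makes that visible with a hypothetical subclass (VirtualDispatchSketch and CountingRandom are illustrative names, not JDK classes) that overrides next() and counts how often it is reached through a reference of the base type.

```java
import java.util.Random;

// Hypothetical demonstration of virtual dispatch from Random.nextDouble() to an overridden next().
final class VirtualDispatchSketch {

    static final class CountingRandom extends Random {
        private static final long serialVersionUID = 1L;
        int nextCalls;

        CountingRandom(long seed) {
            super(seed);
        }

        // Overrides the protected hook that Random's public methods build on.
        @Override
        protected int next(int bits) {
            nextCalls++;
            return super.next(bits);
        }
    }

    // Calls nextDouble() through a Random-typed reference and reports
    // how many times the subclass's next() was invoked.
    static int nextCallsPerDouble(long seed) {
        CountingRandom counting = new CountingRandom(seed);
        Random asBaseType = counting;
        asBaseType.nextDouble();
        return counting.nextCalls;
    }

    public static void main(String[] args) {
        System.out.println("next() calls per nextDouble(): " + nextCallsPerDouble(42L));
    }
}
```

The inlining logs shown earlier list two next() call sites inside nextDouble(). When the JIT can prove or speculate on the receiver's class and the callee is small enough, it can replace such calls with the inlined method body.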

That's all for now. The disclaimer: Of course, the list of topics you need to understand to solve performance riddles is endless. There is a lot more to understand than micro-benchmarking, JIT optimization, method inlining, Java bytecode, assembly language and so forth. Also, there are a lot more root causes for performance differences than just virtual method calls or expensive thread synchronization instructions. However, I think the topics I have introduced are a good start for this kind of deep dive. Looking forward to critical and enjoyable comments!
