This morning I came across a post about the use of ThreadLocalRandom in Java 7, claiming it is about twice as fast as Math.random(). I'm reposting it here to learn from it:
When I first wrote this blog my intention was to introduce you to the class ThreadLocalRandom, which is new in Java 7, for generating random numbers. I have analyzed the performance of ThreadLocalRandom in a series of micro-benchmarks to find out how it performs in a single-threaded environment. The results were relatively surprising: although the code is very similar, ThreadLocalRandom is twice as fast as Math.random()! The results drew my interest and I decided to investigate this a little further. I have documented my analysis process. It is an exemplary introduction to the analysis steps, technologies and some of the JVM diagnostic tools required to understand differences in the performance of small code segments. Some experience with the described toolset and technologies will enable you to write faster Java code for your specific HotSpot target environment.
OK, that's enough talk, let's get started!
Math.random() works on a static singleton instance of Random, whilst ThreadLocalRandom.current().nextDouble() works on a thread-local instance of ThreadLocalRandom, which is a subclass of Random. ThreadLocal introduces the overhead of a variable lookup on each call to the current() method. Considering that, it is a little surprising that it's twice as fast as Math.random() in a single thread, isn't it? I didn't expect such a significant difference.
Again, I am using a tiny micro-benchmarking framework presented in one of Heinz's blogs. The framework that Heinz developed takes care of several challenges in benchmarking Java programs on modern JVMs: warm-up, garbage collection, the accuracy of Java's time API, verification of test accuracy, and so forth.
Here are my runnable benchmark classes:
import java.util.concurrent.ThreadLocalRandom;

public class ThreadLocalRandomGenerator implements BenchmarkRunnable {

    private double r;

    @Override
    public void run() {
        r = r + ThreadLocalRandom.current().nextDouble();
    }

    public double getR() {
        return r;
    }

    @Override
    public Object getResult() {
        return r;
    }
}

public class MathRandomGenerator implements BenchmarkRunnable {

    private double r;

    @Override
    public void run() {
        r = r + Math.random();
    }

    public double getR() {
        return r;
    }

    @Override
    public Object getResult() {
        return r;
    }
}
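Both generators implement BenchmarkRunnable, which belongs to the benchmark framework and is not shown in the original post. A minimal sketch of what it presumably looks like (the actual interface in Heinz's framework may be named or shaped differently):

public interface BenchmarkRunnable extends Runnable {
    // Returns a value derived from the benchmark's work so the JIT
    // cannot eliminate the loop body as dead code.
    Object getResult();
}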
Let's run the benchmark using Heinz's framework:
import java.text.DecimalFormat;
import java.util.Arrays;
import java.util.List;

public class FirstBenchmark {

    private static List<BenchmarkRunnable> benchmarkTargets = Arrays.asList(
            new MathRandomGenerator(),
            new ThreadLocalRandomGenerator());

    public static void main(String[] args) {
        DecimalFormat df = new DecimalFormat("#.##");
        for (BenchmarkRunnable runnable : benchmarkTargets) {
            Average average = new PerformanceHarness().calculatePerf(
                    new PerformanceChecker(1000, runnable), 5);
            System.out.println("Benchmark target: " + runnable.getClass().getSimpleName());
            System.out.println("Mean execution count: " + df.format(average.mean()));
            System.out.println("Standard deviation: " + df.format(average.stddev()));
            System.out.println("To avoid dead code optimization: " + runnable.getResult());
        }
    }
}
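PerformanceHarness, PerformanceChecker and Average also come from Heinz's framework and are not listed in the original post. Purely as a hypothetical sketch of what they do - run the target for a fixed time window, count how many run() calls complete, repeat that measurement and aggregate the counts - minimal stand-ins might look roughly like this (the real framework is more careful about warm-up, garbage collection and timer accuracy):

// Hypothetical stand-ins, not Heinz's actual framework classes.
import java.util.ArrayList;
import java.util.List;

class PerformanceChecker {
    private final long testTimeMillis;
    private final Runnable task;

    PerformanceChecker(long testTimeMillis, Runnable task) {
        this.testTimeMillis = testTimeMillis;
        this.task = task;
    }

    // Returns how many times task.run() completed within the test time window.
    long start() {
        long count = 0;
        long end = System.currentTimeMillis() + testTimeMillis;
        while (System.currentTimeMillis() < end) {
            task.run();
            count++;
        }
        return count;
    }
}

class PerformanceHarness {
    // Runs the checker 'runs' times after one discarded warm-up run.
    Average calculatePerf(PerformanceChecker checker, int runs) {
        Average average = new Average();
        checker.start(); // warm-up, result discarded
        for (int i = 0; i < runs; i++) {
            average.add(checker.start());
        }
        return average;
    }
}

class Average {
    private final List<Long> values = new ArrayList<Long>();

    void add(long value) {
        values.add(value);
    }

    double mean() {
        double sum = 0;
        for (long v : values) {
            sum += v;
        }
        return values.isEmpty() ? 0 : sum / values.size();
    }

    double stddev() {
        double m = mean();
        double sq = 0;
        for (long v : values) {
            sq += (v - m) * (v - m);
        }
        return values.isEmpty() ? 0 : Math.sqrt(sq / values.size());
    }
}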
Notice: To make sure the JVM does not identify the code as "dead code", I return a field variable from getResult() and print the result of my benchmarking immediately. That's why my runnable classes implement an interface called BenchmarkRunnable. I am running this benchmark three times. The first run is in default mode, with inlining and JIT optimization enabled:
Benchmark target: MathRandomGenerator
Mean execution count: 14773594,4
Standard deviation: 180484,9
To avoid dead code optimization: 6.4005410634212025E7
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 29861911,6
Standard deviation: 723934,46
To avoid dead code optimization: 1.0155096190946539E8
Then again without JIT optimization (VM option -Xint):
Benchmark target: MathRandomGenerator
Mean execution count: 963226,2
Standard deviation: 5009,28
To avoid dead code optimization: 3296912.509302683
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 1093147,4
Standard deviation: 491,15
To avoid dead code optimization: 3811259.7334526842
The last test is with JIT optimization, but with -XX:MaxInlineSize=0 which (almost) disables inlining:
Benchmark target: MathRandomGenerator
Mean execution count: 13789245
Standard deviation: 200390,59
To avoid dead code optimization: 4.802723374491231E7
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 24009159,8
Standard deviation: 149222,7
To avoid dead code optimization: 8.378231170741305E7
Let's interpret the results carefully: with full JVM JIT optimization, ThreadLocalRandom is twice as fast as Math.random(). Turning JIT optimization off shows that the two then perform equally well (or equally poorly). Method inlining seems to account for roughly 30% of the performance difference; the rest may be due to other optimization techniques.
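To make that estimate concrete (my own arithmetic, not the original author's): with full JIT optimization the ratio of mean execution counts is 29861911,6 / 14773594,4 ≈ 2.0 in favour of ThreadLocalRandom; in interpreted mode (-Xint) it drops to 1093147,4 / 963226,2 ≈ 1.1; with inlining (almost) disabled it is 24009159,8 / 13789245 ≈ 1.7. Switching off inlining alone therefore removes roughly 0.3 of the ~1.0 advantage over parity seen with full optimization, which is where the "about 30%" figure comes from.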
One reason why the JIT compiler can optimize ThreadLocalRandom more effectively is the simpler implementation of ThreadLocalRandom.next() compared to Random.next():
public class Random implements java.io.Serializable {
    ...
    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed));
        return (int)(nextseed >>> (48 - bits));
    }
    ...
}

public class ThreadLocalRandom extends Random {
    ...
    protected int next(int bits) {
        rnd = (rnd * multiplier + addend) & mask;
        return (int) (rnd >>> (48 - bits));
    }
    ...
}
The first snippet shows Random.next(), which is used intensively in the Math.random() benchmark. Compared to ThreadLocalRandom.next(), the method requires significantly more instructions, although both methods do the same thing. In the Random class the seed variable stores global state shared by all threads; it changes with every call to the next() method. Therefore an AtomicLong is required to safely read and update the seed value in calls to nextDouble(). ThreadLocalRandom, on the other hand, is - well - thread local :-) Its next() method does not have to be thread safe and can use an ordinary long variable as the seed.
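As a side note on usage (my own illustration, not part of the original benchmark): in multi-threaded code each thread obtains its own generator via ThreadLocalRandom.current(), so there is no shared seed and no compare-and-set contention, unlike threads sharing a single Random instance:

import java.util.concurrent.ThreadLocalRandom;

public class ThreadLocalRandomUsage {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            @Override
            public void run() {
                double sum = 0;
                for (int i = 0; i < 1000000; i++) {
                    // current() returns the generator bound to the calling thread;
                    // do not cache the returned instance and share it across threads.
                    sum += ThreadLocalRandom.current().nextDouble();
                }
                System.out.println(Thread.currentThread().getName() + ": " + sum);
            }
        };
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}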
About method inlining and ThreadLocalRandom
One very effective JIT optimization is method inlining. In hot paths that are executed frequently, the HotSpot compiler decides to inline the code of called methods (the callee) into the calling method (the caller). "Inlining has important benefits. It dramatically reduces the dynamic frequency of method invocations, which saves the time needed to perform those method invocations. But even more importantly, inlining produces much larger blocks of code for the optimizer to work on. This creates a situation that significantly increases the effectiveness of traditional compiler optimizations, overcoming a major obstacle to increased Java programming language performance."
Since Java 7 you can monitor method inlining by using diagnostic JVM options. Running the code with '-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining' will show the inlining efforts of the JIT compiler. Here are the relevant sections of the output for the Math.random() benchmark:
@ 13  java.util.Random::nextDouble (24 bytes)
@ 3   java.util.Random::next (47 bytes)   callee is too large
@ 13  java.util.Random::next (47 bytes)   callee is too large
The JIT compiler cannot inline the Random.next() method that is called in Random.nextDouble(). This is the inlining output for ThreadLocalRandom.next():
@ 8   java.util.Random::nextDouble (24 bytes)
@ 3   java.util.concurrent.ThreadLocalRandom::next (31 bytes)
@ 13  java.util.concurrent.ThreadLocalRandom::next (31 bytes)
Because the next() method is shorter (31 bytes), it can be inlined. Since the next() method is called intensively in both benchmarks, this log suggests that method inlining may be one reason why ThreadLocalRandom performs significantly faster.
To verify that, and to find out more, we need to dive into assembly code. With Java 7 JDKs it is possible to print the generated assembly code to the console. See here for how to enable the -XX:+PrintAssembly VM option. The option prints the JIT-optimized code, which means you can see the code the JVM actually executes. I have copied the relevant assembly code into the links below.
Assembly code of ThreadLocalRandomGenerator.run() here.
Assembly code of MathRandomGenerator.run() here.
Assembly code of Random.next() called by Math.random() here.
Assembly code is machine-specific, low-level code and is more complicated to read than bytecode. Let's try to verify that method inlining has a relevant effect on performance in my benchmarks, and see whether there are other obvious differences in how the JIT compiler treats ThreadLocalRandom and Math.random(). In ThreadLocalRandomGenerator.run() there is no procedure call to any of the subroutines like Random.nextDouble() or ThreadLocalRandom.next(). There is only one virtual (hence expensive) method call to ThreadLocal.get() visible (see line 35 in the ThreadLocalRandomGenerator.run() assembly). All the other code is inlined into ThreadLocalRandomGenerator.run(). In the case of MathRandomGenerator.run() there are two virtual method calls to Random.next() (see block B4, line 204 ff. in the assembly code of MathRandomGenerator.run()). This confirms our suspicion that method inlining is one important root cause of the performance difference. Furthermore, due to the synchronization overhead, considerably more (and some expensive!) assembly instructions are required in Random.next(), which is also counterproductive in terms of execution speed.
Understanding the overhead of the invokevirtual instruction
So why is (virtual) method invocation expensive, and method inlining so effective? The operand of an invokevirtual instruction is not the offset of a concrete method in a class instance. The compiler does not know the internal layout of a class instance. Instead, it generates symbolic references to the methods of an instance, which are stored in the runtime constant pool. Those runtime constant pool items are resolved at run time to determine the actual method location. This dynamic (run-time) binding requires verification, preparation and resolution, which can considerably affect performance. (See Invoking Methods and Linking in the JVM Spec for details.)
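A toy example of my own (not from the original post) to make this tangible: the call below compiles to an invokevirtual instruction whose target depends on the receiver's run-time class, so the JIT can only inline it after proving (or speculating) which implementation is actually called - which is exactly what happens with the short, monomorphic ThreadLocalRandom.next() call in the benchmark.

public class VirtualCallDemo {

    static class Base {
        int value() { return 1; }
    }

    static class Sub extends Base {
        @Override
        int value() { return 2; }
    }

    public static void main(String[] args) {
        Base[] receivers = { new Base(), new Sub() };
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            // 'receivers[i % 2].value()' is an invokevirtual call site: the
            // bytecode holds a symbolic reference to Base.value(), and the
            // concrete target is selected via the receiver's class at run time.
            // With two receiver types the call site stays polymorphic and is
            // harder for the JIT to inline than one that always hits one type.
            sum += receivers[i % 2].value();
        }
        System.out.println(sum);
    }
}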
That's all for now. The disclaimer: of course, the list of topics you need to understand to solve performance riddles is endless. There is a lot more to understand than micro-benchmarking, JIT optimization, method inlining, Java bytecode, assembly language and so forth. Also, there are many more root causes of performance differences than just virtual method calls or expensive thread-synchronization instructions. However, I think the topics I have introduced are a good starting point for this kind of deep dive. Looking forward to critical and enjoyable comments!