
Benchmark Analysis: Guice vs Spring


The original article can be found at: http://www.javalobby.org/articles/guice-vs-spring/

 

At the weekend I managed to get some free time away from working on our next release to look at a recent benchmark that compared the performance of Google Guice 1.0 with Spring 2.5. The benchmark, referred to in the article as a modified version of Crazy Bob’s “Semi Useless” Benchmark, is interesting not for its results but for the analysis of those results and the construction of the test. After reviewing the findings and the benchmark code, I thought it would be a good idea to instrument particular parts of the code with our Probes resource metering framework, both to better understand some of the peculiarities reported and to show how I use JXInsight Probes myself.

What follows are the typical steps I perform when investigating a probable performance issue in a code base that I am largely unfamiliar with. Along the way I highlight possible pitfalls in creating a benchmark, as well as in analyzing its results, when concurrency, GC and memory allocation are involved. Most importantly, the conclusions of this process differ from those posted on the benchmark blog.

In the benchmark there are 2 basic parameters with 4 possible combinations for each technology benchmarked. The first parameter indicates whether the test is to be executed concurrently (5 threads). The second parameter indicates whether a singleton factory is used. It is important to note that in the benchmark the number of calls made by one or more threads is dependent on the value of the singleton parameter (see first line in iterate(..) method). Below are the results from my own benchmarking on a Mac PowerBook G4 for each of the four possible combinations per technology.

(C) Concurrent
(CS) Concurrent + Singleton
(S) Singleton
( ) Non-Concurrent + Non-Singleton

Bar Chart

I was surprised by the large difference in the concurrent tests with a non-singleton factory (Guice is much faster than Spring). I was also intrigued by how close the singleton tests were across technologies, especially as the Guice concurrent test run with the singleton instance was much slower than its concurrent test run with the non-singleton factory, even after taking into account the tenfold (x10) increase in the number of validate(...) method calls. Time for some basic instrumentation to better explain the results.

Below I have highlighted the main changes I made to the benchmark code to capture the resource consumption across the two different executing thread workers using our open Probes API.

Java Code
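The begin/end metering pattern described above can be approximated with JDK facilities alone. The sketch below is only an illustration: SimpleProbe is a hypothetical stand-in and is not the JXInsight Probes API. It meters wall clock time with System.nanoTime() and per-thread CPU time with ThreadMXBean, and reports both in microseconds.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical stand-in for a probe; this is NOT the JXInsight Probes API. */
final class SimpleProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    private final long wallStart = System.nanoTime();
    private final long cpuStart = cpuNow();
    private long wallNanos;
    private long cpuNanos;

    /** CPU nanoseconds used by the calling thread, or 0 if the JVM cannot report it. */
    private static long cpuNow() {
        boolean available = THREADS.isCurrentThreadCpuTimeSupported()
                && THREADS.isThreadCpuTimeEnabled();
        return available ? THREADS.getCurrentThreadCpuTime() : 0L;
    }

    /** Starts metering on the calling thread. */
    static SimpleProbe begin() {
        return new SimpleProbe();
    }

    /** Stops metering; call it on the same thread that called begin(). */
    void end() {
        wallNanos = System.nanoTime() - wallStart;
        cpuNanos = cpuNow() - cpuStart;
    }

    long wallMicros() {
        return wallNanos / 1_000;
    }

    long cpuMicros() {
        return cpuNanos / 1_000;
    }

    public static void main(String[] args) {
        SimpleProbe probe = SimpleProbe.begin();
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sum += i % 7; // some CPU-bound work to meter
        }
        probe.end();
        System.out.println("sum=" + sum
                + " wall(us)=" + probe.wallMicros()
                + " cpu(us)=" + probe.cpuMicros());
    }
}
```

A real metering framework would also aggregate readings by probe name and thread; this sketch only shows where the two clock readings come from.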

I ran the Guice and Spring tests separately to ensure that neither affected the other's results. I then merged the probes snapshots that were automatically exported at the end of each test run. Here is the jxinsight.override.config properties file used for my first set of tests.

jxinsight.server.profiler.autostart=false
jxinsight.server.agent.events.gc.enabled=false
# new system property in 5.5.EA.16
jxinsight.server.probes.highprecision.enabled=true
jxinsight.server.probes.meter.clock.time.include=true
jxinsight.server.probes.meter.cpu.time.include=true
jxinsight.server.probes.meter.blocking.time.include=true
jxinsight.server.probes.meter.waiting.time.include=true
jxinsight.server.probes.meter.jmgmt.gc.time.include=true
jxinsight.server.probes.meter.jmgmt.gc.count.include=true

The high level metering results are shown below with times reported in microseconds and the data sorted by total. Note the general ordering of technologies changes within one particular meter.

Metering Table

The table below shows the wall clock times for each technology and for each parameter combination. What immediately struck me about the data was the change in ordering of singleton and non-singleton tests across Spring and Guice. With Guice the singleton tests were always slower than the non-singleton tests, which is the complete opposite of Spring and of what you would expect. I then noticed that this difference was much more prominent when the tests were executed concurrently. Spring (CS) was 13.696 seconds compared with Guice’s (CS) 19.544 seconds - approximately a 6 second difference.

When comparing across concurrent tests I used the Iterate probe group because the wall clock is a global meter and hence is multiplied by the number of concurrent (aggregating) threads firing probes.

Metering Table
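Why a global meter such as wall clock time is multiplied when it is aggregated across threads can be seen in a small experiment. This is a sketch with hypothetical names (WallClockAggregation, measure), not the benchmark's code: the outer reading plays the role of the Iterate group and the summed per-thread readings play the role of the Run group.

```java
import java.util.concurrent.atomic.AtomicLong;

final class WallClockAggregation {
    /** Returns {outerNanos, summedPerThreadNanos} for workers that each sleep sleepMillis. */
    static long[] measure(int threads, long sleepMillis) {
        AtomicLong summed = new AtomicLong();
        Thread[] workers = new Thread[threads];
        long outerStart = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long start = System.nanoTime();
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Each worker adds its own elapsed wall clock time to the aggregate.
                summed.addAndGet(System.nanoTime() - start);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        long outer = System.nanoTime() - outerStart;
        return new long[] {outer, summed.get()};
    }

    public static void main(String[] args) {
        long[] result = measure(5, 100);
        System.out.println("outer(ms)=" + result[0] / 1_000_000
                + " summed(ms)=" + result[1] / 1_000_000);
    }
}
```

With five workers the summed figure should come out at roughly five times the outer figure, which is why the outer group is the one to compare across concurrent tests.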

Analysis of the CPU metering data also revealed the same strange difference in test ordering across technologies when the table was sorted by the Total column. Because CPU is a thread specific meter I looked at the totals for the Run probe group. Spring (CS) was 11.028 seconds compared with Guice’s (CS) 13.112 seconds - a difference of approximately 2 seconds, not 6 seconds. Could this be a clue? Well, not necessarily, because the CPU times for Spring (S) and Guice (S) were reversed, though somewhat closer - 2.203 seconds compared to 2.075 seconds respectively. It would appear from this that Guice trades additional CPU processing for a reduction in thread monitor contention.

Metering Table

When I first looked at the following table I noticed that the difference in ordering between the non-singleton to singleton tests across technologies had disappeared - both Spring and Guice had non-singleton listed first under concurrent testing and the reversed order when executing tests with one thread.

Then I noticed the high blocking time of 9.248 seconds when executing the Guice (CS) tests, compared with 0.054 seconds for the same test using Spring. To have such high blocking times, the software needs to be executing a relatively expensive (in terms of this micro-benchmark) operation within a synchronized block that is called with a high degree of concurrency (multiple threads). But this is a singleton test, so what could that expensive operation be once the singleton has been created?

Metering Table
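The kind of contention that produces high blocking times can be reproduced with the JDK's ThreadMXBean contention monitoring. The following is a minimal sketch under stated assumptions: BlockingTimeDemo and its methods are hypothetical, and the sleep inside the synchronized block merely stands in for an expensive operation performed while holding a lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

final class BlockingTimeDemo {
    private static final Object LOCK = new Object();

    /** Total milliseconds workers spent blocked on LOCK, or -1 if the JVM cannot monitor contention. */
    static long totalBlockedMillis(int threads, long holdMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadContentionMonitoringSupported()) {
            return -1;
        }
        mx.setThreadContentionMonitoringEnabled(true);
        AtomicLong blocked = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                synchronized (LOCK) {
                    sleepQuietly(holdMillis); // sleeping does not release the monitor
                }
                // Blocked time has to be read while the thread is still alive.
                long id = Thread.currentThread().getId();
                blocked.addAndGet(mx.getThreadInfo(id).getBlockedTime());
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return blocked.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked(ms)=" + totalBlockedMillis(5, 50));
    }
}
```

Because the workers queue up behind one another, the total blocked time grows much faster than the time any single thread holds the lock, which is the pattern the blocking meter exposes.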

I was now extremely curious about what the final conclusion would be, so I skipped the waiting time meter as this would only report on the Thread.join() method calls used to determine the end of the concurrent test runs.

The garbage collection (GC) time meter always has a tale to tell about the software execution model and this time around it was to be no different. My first observation was that Guice placed before Spring within this particular meter. But on closer examination I could see that this was largely a result of the aggregation of GC times for singleton tests (are the alarm bells ringing?). Guice appeared to be more efficient, in terms of object allocation, when executing non-singleton tests than when executing singleton tests. Surely this is not correct? It is not correct (at least not completely) and there is a clue in the Spring metering data: object allocation (how else would we have GC?) always occurs, even when using a singleton factory.

Metering Table

The next meter I looked at is the number of times GC occurs during each test run. I used the value of the total for the Iterate probe group because GC, like wall clock time, is a global meter and thus is duplicated (correctly) in the Run probe group totals. After looking at the GC times in the table above you might expect the ordering of the technologies to be the same, but it is not! Does this tell me something? That GC is more efficient (shorter cycles) when cleaning up objects created by Spring?

Metering Table
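GC count and GC time are JVM-wide figures, which is why they behave as global meters. A rough way to observe them without any profiler is the standard GarbageCollectorMXBean interface; the sketch below (the class GcMeter and its workload are hypothetical) takes a reading before and after an allocation-heavy loop.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcMeter {
    /** Collections performed so far by all collectors; undefined (-1) values count as 0. */
    static long totalCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCount();
        long timeBefore = totalTimeMillis();
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[256]; // short-lived allocation
            checksum += garbage.length;
        }
        System.out.println("checksum=" + checksum
                + " gcCount=" + (totalCount() - countBefore)
                + " gcTime(ms)=" + (totalTimeMillis() - timeBefore));
    }
}
```

The two deltas need not move together: many short collections and a few long ones can produce the same total time, so a different ordering in the count and time tables is possible.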

At this stage I had enough information to formulate an educated guess for the cause of the performance degradation when executing singleton tests with Guice but there was still something not right about the data. I decided to re-run my tests but this time turning off all timing metrics and focusing on object allocation. Here are the system properties used in the jxinsight.override.config file.

jxinsight.server.profiler.autostart=false
jxinsight.server.agent.events.gc.enabled=false
jxinsight.server.agent.events.alloc.enabled=true
jxinsight.server.probes.highprecision.enabled=true
jxinsight.server.probes.meter.alloc.bytes.include=true
jxinsight.server.probes.meter.clock.time.include=false

Metering Table

The test data did confirm the first part of my guess that the underlying cause was related to an increased level of object allocation. (The second part was that this allocation occurred in a synchronized block, which would explain why the difference was more pronounced in the concurrent tests.) The Guice singleton tests had not only higher values than the same tests under Spring but, more significantly, they were higher than the Guice non-singleton tests. But why did the Spring numbers for singleton tests still look too high? I decided to take another look at the validate(...) method and its outbound calls. This time I was looking for a method that might inadvertently create objects. I found the culprit. The JUnit assertEquals(int,int) creates two Integers before calling Object.equals(Object).
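An assertion that avoids the boxing can compare the primitives directly. This is a minimal sketch of such a replacement, assuming nothing about the implementation actually used in the tests; PrimitiveAssert and assertIntEquals are hypothetical names.

```java
final class PrimitiveAssert {
    private PrimitiveAssert() {
    }

    /** Compares two ints directly; nothing is allocated unless the check fails. */
    static void assertIntEquals(int expected, int actual) {
        if (expected != actual) {
            // The message is only built on the failure path.
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        assertIntEquals(42, 42); // passes silently
        try {
            assertIntEquals(1, 2);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

On the success path, which is the only path a benchmark exercises, this performs a single int comparison and creates no objects.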

Here are the revised figures after replacing the Assert.assertEquals(int, int) method with an implementation that does not create any Integer objects. The Spring results now looked in line with what we would expect from a single factory - an object instance created for each additional concurrent thread.

Metering Table

The above figures for the Guice singleton tests had decreased by approximately 50% but there was still object allocation occurring. This called for more in-depth resource metering analysis of the Google Guice codebase.

XML

Here is the metering data collected after load-time weaving our Probes aspect library into the runtime. I have included the Total (Inherent) column in this screenshot to point out the actual object allocation cost centers and call delegation (Hint: 6,400,000 - 1,600,000 = 4,800,000 and 4,800,000 - 2,400,000 = 2,400,000).

Metering Table
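The arithmetic in the hint follows from the relationship between a total and an inherent value: a probe's total is its own (inherent) cost plus the totals of the calls it delegates to. The sketch below reproduces the hint's figures with a three-level call chain. The chain shape and the names outer, middle and inner are assumptions made for illustration, not the actual Guice methods.

```java
import java.util.ArrayList;
import java.util.List;

final class CostNode {
    final String name;
    final long inherent; // cost attributed to this node alone
    final List<CostNode> callees = new ArrayList<>();

    CostNode(String name, long inherent) {
        this.name = name;
        this.inherent = inherent;
    }

    /** Registers a delegated call and returns this node for chaining. */
    CostNode calls(CostNode callee) {
        callees.add(callee);
        return this;
    }

    /** Total = inherent cost plus the totals of everything this node calls. */
    long total() {
        long sum = inherent;
        for (CostNode callee : callees) {
            sum += callee.total();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical call chain built from the figures in the hint.
        CostNode inner = new CostNode("inner", 2_400_000);
        CostNode middle = new CostNode("middle", 2_400_000).calls(inner);
        CostNode outer = new CostNode("outer", 1_600_000).calls(middle);
        System.out.println(outer.name + " total=" + outer.total());   // 6400000
        System.out.println(middle.name + " total=" + middle.total()); // 4800000
        System.out.println(inner.name + " total=" + inner.total());   // 2400000
    }
}
```

Reading the table this way, a large total with a small inherent value marks a method that mostly delegates, while a large inherent value marks a place where allocation actually happens.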

Here is the related code snippet extracted from Google Code showing the actual object allocation calls.

Java Code
