We’re pretty obsessed with performance at Gilt Groupe. You can get a taste
for what we’re dealing with, and how we’re dealing with it, from our
recent presentation at RailsConf.
One of the techniques we’re using is to precompute what certain
high-volume pages will look like at a given time in the future, and
store the result as static HTML that we serve to the actual users at
that time. For ease of initial development, and because there’s still
a fair bit of business logic involved in determining which version
of a particular page to serve, this was done inside our
normal controller actions, which look for a static file to serve
before falling back to generating it dynamically.
We’re now running on Rails 2.3 and, of course, Rails Metal is the
new hotness in 2.3. I spent the last couple days looking into how
much improvement in static file serving we would see by moving it
into the Metal layer. Based on most of what I’ve read, I expected
we might shave off a couple milliseconds. This expectation turned
out to be dramatically wrong.
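For readers who haven’t used Metal, here is a rough sketch of what a static-page endpoint could look like. It is not our production code: the class name, the `/pages/` URL scheme, and the `root` setting are all assumptions made for illustration, and the business logic that picks which version of a page to serve is left out. The one Rails 2.3 behavior it relies on is that a Metal endpoint returning a 404 hands the request on to the normal Rails stack.

```ruby
# Hypothetical Metal-style endpoint. Rails 2.3 Metal calls `call(env)` on the
# class and expects a Rack triplet: [status, headers, body].
class StaticPageMetal
  class << self
    # Directory holding the precomputed HTML files (assumed; must be set).
    attr_accessor :root
  end

  # In Rails 2.3, a 404 from Metal means "not mine, pass it along to Rails".
  NOT_FOUND = [404, { "Content-Type" => "text/html" }, ["Not Found"]].freeze

  def self.call(env)
    # Accept only simple page names so the path cannot escape the root.
    match = %r{\A/pages/([a-z0-9_-]+)\z}.match(env["PATH_INFO"].to_s)
    return NOT_FOUND unless match

    path = File.join(root, "#{match[1]}.html")
    return NOT_FOUND unless File.file?(path)

    # The body is an Array holding one String: a single chunk to write.
    [200, { "Content-Type" => "text/html" }, [File.read(path)]]
  end
end
```

Returning 404 for anything the endpoint doesn’t recognize keeps the dynamic controller action in place as the fallback.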
Metal components operate outside the realm of the usual Rails
timing and logging components, so you don’t get any internal
measurements of page performance. Instead, I fired up ab to measure
the serving times externally. What I found for the page I was
benchmarking was that the Metal implementation took about 5ms. The
old controller action took 170ms. But, wait... the Rails logs were
only reporting 8ms for that action. Something was fishy.
I started inserting timers at various places in the Rails stack,
trying to figure out where the other 160ms was going. A little bit
was routing logic and other miscellaneous overhead, but even setting
a timer around the very entry points into the Rails request serving
path, I was only seeing 15ms being spent. This was getting really
puzzling, because at this point where a Rack response is returned to
the web server, I expected things to look identical between Metal and
ActionController. However, looking more closely at the response
objects, I discovered the critical difference. The Metal response
returns a [String] (an Array holding a single String), while the
controller returned an ActionController::Response.
I went into the Rails source and found the each method
for ActionController::Response. Here it is:
    def each(&callback)
      if @body.respond_to?(:call)
        @writer = lambda { |x| callback.call(x) }
        @body.call(self, self)
      elsif @body.is_a?(String)
        @body.each_line(&callback)
      else
        @body.each(&callback)
      end

      @writer = callback
      @block.call(self) if @block
    end
The critical line is the case where the body is a String. The code
iterates over each line in the response. Each line is written
individually to the network socket. In the case of the particular
page I was looking at, that was 1300 writes. Ouch.
To confirm this was the problem, I changed that line to:

    yield @body
With the whole body being sent in a single write, ab reported 15ms
per request, right in line with what I measured inside Rails.
1 line changed. 150ms gained. Not too bad.
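The effect is easy to reproduce without Rails. The sketch below is a simplified stand-in, not the Rails classes themselves: `LineByLineBody`, `WholeBody`, and `chunk_count` are made-up names, and the 1300-line string is synthetic. It counts how many times the server’s callback fires for each style of body, which corresponds to how many separate writes would be issued.

```ruby
# Stand-in for the String branch of the original each method.
class LineByLineBody
  def initialize(body)
    @body = body
  end

  def each(&callback)
    @body.each_line(&callback) # one callback per line of the body
  end
end

# Stand-in for the patched version.
class WholeBody
  def initialize(body)
    @body = body
  end

  def each
    yield @body # one callback for the entire body
  end
end

# Count the chunks a server would be handed by a Rack-style body.
def chunk_count(response)
  count = 0
  response.each { |_chunk| count += 1 }
  count
end

html = "<li>item</li>\n" * 1300 # synthetic 1300-line page

puts chunk_count(LineByLineBody.new(html)) # 1300
puts chunk_count(WholeBody.new(html))      # 1
puts chunk_count([html])                   # 1, the Metal-style Array body
```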
The sort of performance pessimization we uncovered here is particularly insidious
because it’s completely invisible to all the usual Rails
monitoring tools. It doesn’t show up in your logged response time;
you won’t see it in NewRelic or TuneUp. The only way you’re going
to find out about it is by running an external benchmarking tool.
Of course, this is always a good idea, but it’s easy to forget to do
it, because the tools that work inside the Rails ecosystem are so
nice. But the lesson here is, if you’re working on performance
optimizations, make sure to always get a second opinion.
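If ab isn’t handy, even a few lines of Ruby run from outside the app can serve as that second opinion. This is only a sketch: `mean_wall_clock_ms` is a hypothetical helper, the URL in the usage comment is a placeholder, and sequential requests from a single client are no substitute for a real load test.

```ruby
require "benchmark"
require "net/http"
require "uri"

# Run the given block `count` times and return the mean wall-clock
# time per run in milliseconds, as seen by the caller.
def mean_wall_clock_ms(count)
  total_seconds = Benchmark.realtime { count.times { yield } }
  total_seconds * 1000.0 / count
end

# Usage (placeholder URL; point it at the page you are investigating):
#   uri = URI.parse("http://localhost:3000/some/page")
#   puts mean_wall_clock_ms(100) { Net::HTTP.get_response(uri) }
```

Because the timer wraps the whole request from the client side, it includes the time spent writing the response to the socket, which is exactly the part the Rails logs missed.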