We’re pretty obsessed with performance at Gilt Groupe. You can get a
taste for what we’re dealing with, and how we’re dealing with it, from
our recent presentation at RailsConf.
One of the techniques we’re using is to precompute what certain
high-volume pages will look like at a given time in the future, and
store the result as static HTML that we serve to the actual users at
that time. For ease of initial development, and because there’s still
a fair bit of business logic involved in determining which version of
a particular page to serve, this was done inside our normal controller
actions, which look for a static file to serve before falling back to
generating it dynamically.
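
The "static file first, dynamic fallback" idea can be sketched in a few lines of plain Ruby. This is only an illustration under assumed names, not our actual controller code: `serve_precomputed`, the `page-version.html` file naming, and the cache directory are all hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper (not the real controller code): serve a precomputed
# snapshot if one exists on disk, otherwise run the dynamic fallback block.
def serve_precomputed(cache_dir, page, version)
  path = File.join(cache_dir, "#{page}-#{version}.html")
  return File.read(path) if File.file?(path)
  yield
end

Dir.mktmpdir do |dir|
  # Pretend a background job precomputed version 42 of the "sale" page.
  File.write(File.join(dir, "sale-42.html"), "<html>precomputed</html>")

  hit  = serve_precomputed(dir, "sale", 42) { "<html>dynamic</html>" }
  miss = serve_precomputed(dir, "sale", 43) { "<html>dynamic</html>" }
  puts hit   # served from the static snapshot
  puts miss  # no snapshot for this version, so the block ran
end
```

The business logic that picks the version is what kept this inside the controller in the first place; the sketch simply takes the version as an argument.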
We’re now running on Rails 2.3 and, of course, Rails Metal is the
new hotness in 2.3. I spent the last couple days looking into how
much improvement in static file serving we would see by moving it
into the Metal layer. Based on most of what I’ve read, I expected
we might shave off a couple milliseconds. This expectation turned
out to be dramatically wrong.
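
For anyone who hasn't looked at Metal yet: in Rails 2.3 a Metal endpoint is essentially a class with a class-level `call(env)` that returns a Rack triple, and a 404 status tells Rails to carry on into the normal stack. Here is a rough sketch of what a static-file Metal might look like. The class name, the `root` setting, and the path-to-file mapping are assumptions made for illustration, not our production code.

```ruby
require "tmpdir"

# Hypothetical Metal-style endpoint. It only relies on the Rack calling
# convention (call(env) returning [status, headers, body]), so it runs
# without Rails loaded.
class StaticPageMetal
  class << self
    attr_accessor :root # directory holding precomputed pages (assumed layout)
  end
  self.root = "public/precomputed"

  def self.call(env)
    name = env["PATH_INFO"].to_s.sub(%r{\A/}, "")
    path = File.join(root, "#{name}.html")
    # Only plain page names are accepted, which also rules out "../" tricks.
    if name =~ /\A[\w-]+\z/ && File.file?(path)
      # The body is an Array holding the whole page as a single String.
      [200, { "Content-Type" => "text/html" }, [File.read(path)]]
    else
      # In Rails 2.3 Metal, a 404 passes the request on to the regular stack.
      [404, { "Content-Type" => "text/html" }, ["Not Found"]]
    end
  end
end

Dir.mktmpdir do |dir|
  StaticPageMetal.root = dir
  File.write(File.join(dir, "sale.html"), "<html>precomputed</html>")
  status, _headers, body = StaticPageMetal.call("PATH_INFO" => "/sale")
  puts status
  puts body.first
end
```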
Metal components operate outside the realm of the usual Rails
timing and logging components, so you don’t get any internal
measurements of page performance. Instead, I fired up ab to measure
the serving times externally. What I found for the page I was
benchmarking was that the Metal implementation took about 5ms. The
old controller action took 170ms. But, wait... the Rails logs were
only reporting 8ms for that action. Something was fishy.
I started inserting timers at various places in the Rails stack,
trying to figure out where the other 160ms was going. A little bit
was routing logic and other miscellaneous overhead, but even setting
a timer around the very entry points into the Rails request serving
path, I was only seeing 15ms being spent. This was getting really
puzzling, because at the point where a Rack response is returned to
the web server, I expected things to look identical between Metal and
ActionController. However, looking more closely at the response
objects, I discovered the critical difference. The Metal response
body is a [String], that is, an Array holding a single String, while
the controller returned an ActionController::Response.
I went into the Rails source and found the each method for
ActionController::Response. Here it is:

    def each(&callback)
      if @body.respond_to?(:call)
        @writer = lambda { |x| callback.call(x) }
        @body.call(self, self)
      elsif @body.is_a?(String)
        @body.each_line(&callback)
      else
        @body.each(&callback)
      end
      @writer = callback
      @block.call(self) if @block
    end

The critical line is the case where the body is a String. The code
iterates over each line in the response. Each line is written
individually to the network socket. In the case of the particular
page I was looking at, that was 1300 writes. Ouch.
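
To make the difference concrete, here is a small stand-alone sketch that counts the writes each style of body produces. `CountingSocket` and the 1300-line test page are made up for the demonstration; a real web server writes to a network socket, where each of those calls carries real overhead.

```ruby
# Hypothetical stand-in for the server's socket: it only counts calls.
class CountingSocket
  attr_reader :writes, :bytes

  def initialize
    @writes = 0
    @bytes = 0
  end

  def write(chunk)
    @writes += 1
    @bytes += chunk.bytesize
  end
end

# A fake page with 1300 newline-terminated lines.
body = Array.new(1300) { |i| "<li>item #{i}</li>\n" }.join

# What the String branch of ActionController::Response#each does:
# one callback, and therefore one write, per line.
per_line = CountingSocket.new
body.each_line { |line| per_line.write(line) }

# What a Metal-style [String] body does: one callback for the whole page.
single = CountingSocket.new
[body].each { |chunk| single.write(chunk) }

puts "each_line: #{per_line.writes} writes, #{per_line.bytes} bytes"
puts "[String]:  #{single.writes} writes, #{single.bytes} bytes"
```

Both paths send exactly the same bytes; only the number of write calls differs.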
To confirm this was the problem, I changed the each_line call in the
String branch to:

    yield @body

With the whole body being sent in a single write, ab reported 15ms
per request, right in line with what I measured inside Rails.
1 line changed. 150ms gained. Not too bad.
The sort of performance pessimization we uncovered here is particularly insidious
because it’s completely invisible to all the usual Rails
monitoring tools. It doesn’t show up in your logged response time;
you won’t see it in NewRelic or TuneUp. The only way you’re going
to find out about it is by running an external benchmarking tool.
Of course, this is always a good idea, but it’s easy to forget to do
it, because the tools that work inside the Rails ecosystem are so
nice. But the lesson here is, if you’re working on performance
optimizations, make sure to always get a second opinion.
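
If you want that second opinion from Ruby itself rather than from ab, the idea is simply to time whole requests from outside the application process. A minimal sketch, assuming a hypothetical `mean_ms` helper and a placeholder URL; it is no substitute for a proper load-testing tool.

```ruby
# Hypothetical helper: run the block `requests` times and return the mean
# wall-clock time per call in milliseconds, measured with a monotonic clock.
def mean_ms(requests)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  requests.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  elapsed * 1000.0 / requests
end

# Against a real server the block would make an HTTP request, e.g.
# (placeholder URL, not run here):
#   require "net/http"
#   puts mean_ms(100) { Net::HTTP.get(URI("http://localhost:3000/sale")) }

# Stand-in workload so the sketch runs without a server.
puts mean_ms(5) { sleep 0.01 }
```

Because the timer wraps the entire round trip, it includes the socket writes that the in-process Rails timings never see.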