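Active Record provides find_each for processing large numbers of records in batches. However, once the data set reaches millions of rows or more, working through the records one at a time with find_each also becomes slow.
Handing the work to an asynchronous job library such as Resque is a good option:
# Resque serializes job arguments as JSON, so enqueue the id rather than the record
User.find_each { |user| Resque.enqueue(MyJob, user.id) }
But Resque can sometimes feel like using a sledgehammer to crack a nut. An alternative is forking: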
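# Only available on Ruby Enterprise Edition: make the GC copy-on-write friendly
if GC.respond_to?(:copy_on_write_friendly=)
  GC.copy_on_write_friendly = true
end

jobs_per_process = 100
process_count = 10

User.find_in_batches(:batch_size => jobs_per_process * process_count) do |group|
  # Pass false so that a short final batch is not padded with nil
  batches = group.in_groups(process_count, false)
  batches.each do |batch|
    Process.fork do
      # Each child process needs its own database connection
      ActiveRecord::Base.establish_connection
      # Do the actual work
      batch.each {|user| .. }
    end
  end
  # Wait for every child to finish before fetching the next batch
  Process.waitall
end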
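The code above fetches 1,000 records from the database at a time, then forks 10 processes, each of which handles 100 records in parallel. This is much faster than processing the 1,000 records serially.
If the extra memory consumed by the parallel processes is a concern, REE (Ruby Enterprise Edition) is a good choice: its copy-on-write friendly garbage collector lets forked processes share memory with the parent.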
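To try the fork-per-slice idea without a Rails application, here is a minimal plain-Ruby sketch. It is not the code from this post: `handle` and `process_in_forks` are hypothetical names, an array of integers stands in for the User records, and each child reports a partial result to the parent through a pipe, which the original code does not do. It assumes a platform where `Process.fork` is available (not Windows).

```ruby
# Hypothetical per-record work; a real job would do something useful here.
def handle(record)
  record * 2
end

# Splits `records` into at most `process_count` slices, forks one child per
# slice, and returns the sum of the partial results reported by the children.
def process_in_forks(records, process_count)
  return 0 if records.empty?

  slice_size = (records.size / process_count.to_f).ceil

  children = records.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      # The child works only on its own slice and writes one line back.
      writer.puts(slice.sum { |record| handle(record) })
      writer.close
      exit!(0) # leave immediately, skipping at_exit hooks inherited from the parent
    end
    writer.close # the parent keeps only the read end
    [pid, reader]
  end

  # Collect one partial result per child, then reap that child.
  children.sum do |pid, reader|
    value = reader.gets.to_i
    reader.close
    Process.wait(pid)
    value
  end
end

puts process_in_forks((1..1000).to_a, 10)
```

With 1,000 records and 10 processes, each child receives 100 records, mirroring the numbers used above. Since a forked child cannot hand Ruby objects back to its parent directly, the sketch passes results through a pipe; if the work only has side effects (such as database writes), the pipe can be dropped and the parent only needs to wait for the children.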