Large table update performance

How to update each row at large table faster

Posted on 2017-01-12 15:06:07

Once I faced a situation, when I had to update each of 400_000 rows in the table with its individual new value. The most trivial, straightforward way is to loop over the collection of records and update each of them, like:

User.all.each do |user|
  user.update(email: new_email)
end

If we benchmark this code with our database, we get the following results:

<Benchmark::Tms:0x007f9cceb7dd70 @cstime=0.0, @cutime=0.0, @label="", @real=1515.4289010000066, @stime=81.41, @total=494.72, @utime=413.31>

Quite long, yeah. Then I decided to wrap all updates in single transaction, to decrease overhead of running new transaction on each update:

ActiveRecord::Base.transaction do
  User.all.each do |user|
    user.update(email: new_email)
  end
end

After that I’ve received the following benchmark:

<Benchmark::Tms:0x007f821d3d1160 @cstime=0.0, @cutime=0.0, @label="", @real=1315.2501749999938, @stime=84.15, @total=449.27, @utime=365.11999999999995>

A bit faster, but still slow :(

Finally I decided to get rid of active model and run a plain SQL record:

ActiveRecord::Base.transaction do
  User.all.each do |user|
    sql = ActiveRecord::Base.send(:sanitize_sql_array, ['UPDATE users SET email=? WHERE email=?', new_email, user.email])
    ActiveRecord::Base.connection.execute(sql)
  end
end

And this approach brought us much better performance improve, check it out:

<Benchmark::Tms:0x007ff05d6a1878 @cstime=0.0, @cutime=0.0, @label="", @real=668.5155260000029, @stime=45.46, @total=215.62, @utime=170.16>

Summary: sometimes a couple of basic operations can improve your code performance a lot. Don’t afraid to benchmark your code and remember that any abstraction level, like ActiveModel, ActiveRecord does not bring you any performance improves. Try to get balance between application performance and your productivity.