3

Cache and reuse generated attribute methods by casperisfine · Pull Request #4209...

 2 years ago
source link: https://github.com/rails/rails/pull/42095
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Context

By experimenting with #42085, I was able to better profile our application, which made me realize that define_attribute_methods was responsible for about 140MB of total resident memory as well as for ~15% of boot time in our app. Of course you mileage may vary based on the size of the app.

The main bulk of the extra memory is from CodeGenerator:

retained memory by file
-----------------------------------
...
 125.79 MB  rails-5e55ac69d843/activemodel/lib/active_model/attribute_methods.rb

A quick digging session shows that we have 238k attribute methods, but only 50k (21%) of them are unique:

>> attr_methods = ActiveRecord::Base.descendants.flat_map { |m| m.send(:generated_attribute_methods).public_instance_methods(false) }; nil
=> nil
>> attr_methods.size
=> 238818
>> attr_methods.uniq.size
=> 50707

So if we were to be able to re-use them, we could theoretically cut that resident memory by ~80%, so cut our total memory usage by ~88MB (on the totally wrong assumption that sharing a method has no overhead, but it's just for napkin math needs).

Looking at the detail of the duplicated methods, it's pretty clear what going on:

>> attr_methods.group_by(&:itself).transform_values(&:size).sort_by(&:last).reverse.take(20)
=> [[:clear_id_change, 875],
 [:id_previously_was, 875],
 [:id_previous_change, 875],
 [:id_previously_changed?, 875],
 [:id_changed?, 875],
 [:id_change, 875],
 [:id_will_change!, 875],
 [:saved_change_to_id?, 875],
 [:id_change_to_be_saved, 875],
 [:will_save_change_to_id?, 875],
 [:id_before_last_save, 875],
 [:saved_change_to_id, 875],
 [:id_came_from_user?, 875],
 [:restore_id!, 875],
 [:created_at_was, 859],
 [:created_at_will_change!, 859],
 [:created_at, 859],
 [:created_at_change, 859],
 [:will_save_change_to_created_at?, 859],
 [:created_at_change_to_be_saved, 859]]

Just from that you can easily infer that we have 875 ActiveRecord::Base descendants, and that the vast majority of them have a created_at and updated_at column. Then we have a long tail of other popular column names across models, some fairly specific to our usage, such as shop_id (817), but many that shouldn't be too uncommon in most apps such as type (250), deleted_at (171), name (141), title (72), etc.

Overall I think it's fair to assume attribute names are not random, and that it's common for an attribute name to be shared across multiple models.

micro benchmarks

Based on the above analysis I wondered wether we could use define_method(name, UnboundMethod) to share the MRI InstructionSequence between these similar methods, and wether it's effective:

# frozen_string_literal: true
require 'heap-profiler'

class GeneratedAttributeMethods < Module
end

cache = GeneratedAttributeMethods.new

CACHE = {}
50_000.times do |i|
  meth_name = "foo_#{i}"
  cache.module_eval("def #{meth_name}; 42; end")
  CACHE[meth_name] = cache.instance_method(meth_name)
end

mods = 5.times.map { GeneratedAttributeMethods.new }

$names = 25.times.map { |i| "foo_#{i}" }

if ARGV.first == 'cache'
  def def_methods(mod)
    $names.each do |name|
      mod.define_method(name, CACHE[name])
    end
  end
else
  def def_methods(mod)
    mod.module_eval($names.map { |m| "def #{m}; 42; end" }.join(";"))
  end
end

HeapProfiler.report("/tmp/heap-prof-#{ARGV.first || 'orig'}") do
  mods.each { |m| def_methods(m) }
end
$ heap-profiler /tmp/heap-prof-orig/ | grep 'Total retained'
Total retained: 59.67 kB (391 objects)
$ heap-profiler /tmp/heap-prof-cache/ | grep 'Total retained'
Total retained: 6.44 kB (136 objects)

Answer, looks like yes.

This patch

So once the above is laid out, I believe the change is quite straightforward. Rather than directly generating the method code in each generated_attribute_methods modules, we first define them in a central "cache" module, and then copy them from the cache into the final module.

Actual effect

I tried this patch against our app, the retained memory reduction was about 80MB, so not far off from the napkin math.

retained memory by file
-----------------------------------
...
  46.35 MB  rails-88a98efb114b/activemodel/lib/active_model/attribute_methods.rb

Backward compatibility

Unfortunately at this stage there's one backward incompatibility issue that might break gems using the define_method_attribute* API, such as my own frozen_record: https://github.com/byroot/frozen_record/blob/42e41eb60d17df6a692e2e18e2ea7af6a28265b9/lib/frozen_record/compact.rb#L29-L31

This whole caching strategy assumes that two attribute methods with the same name will have the same source code. The ability for subclasses to define define_method_attribute* breaks this assumption.

So I need to find a way to handle this.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK