Calling Ruby Methods in C: Avoid Memory Leaks
January 24, 2023 — Ulysse
Memory leaks are a pain for gem users. They are hard to track and can lead to expensive infrastructure costs.
Memory leaks within a C extension are even worse. You’ll see a lot of tools and articles about finding leaks in Ruby. However, you don’t have the same access to internals in C.
A naive usage of rb_funcall can cause memory leaks: it’s much better to use rb_protect instead. So, if you are a C extension writer, please read on for the sake of developers who will use your gem.
Let’s get started!
The Issue with rb_funcall and C
rb_funcall can be a great tool when you need to interact between the Ruby and C parts of your library but only want to write a little C.
However, when you run rb_funcall, you are no longer in C where everything is straightforward. You can be left in muddy waters if the called method:
- Completely changes its definition during runtime
- Raises an exception
The first case is the easiest one to catch. You’ll likely end up with a segfault, and if your test suite is complete enough, you should catch that before publishing.
However, the second case can cause memory leaks and make your codebase way harder to read. Let’s take a look at that now.
Raise in Ruby Causing C Memory Leaks
Ruby’s raising mechanism jumps from the current scope to the first parent scope that catches the error. This is implemented in MRI using longjmp and setjmp.
If you are interested in how this is built, read the Evaluator chapter in the Ruby Hacking Guide. In a nutshell, when you use a begin..ensure block, you setjmp(), and when you raise within this block, you longjmp() to the saved position.
So if a method called with rb_funcall raises, the C code after the call never executes.
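The effect can be reproduced without Ruby at all. The following is a minimal sketch in plain C, assuming hypothetical stand-ins that are not part of the Ruby API: rescue_point plays the role of the position saved by a begin block, may_raise plays the role of rb_funcall, and a simple counter stands in for allocated resources.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins, not Ruby API: `rescue_point` plays the role of the
   position saved by a Ruby `begin` block, `may_raise` the role of rb_funcall. */
static jmp_buf rescue_point;
static int open_handles = 0;

static void acquire(void) { open_handles++; }

static void release(void) {
  assert(open_handles > 0); /* never release more than was acquired */
  open_handles--;
}

static void may_raise(int should_raise) {
  if (should_raise) longjmp(rescue_point, 1); /* like `raise` in Ruby */
}

static void leaky_process(int should_raise) {
  acquire();               /* Alloc */
  may_raise(should_raise); /* if this jumps, nothing below runs */
  release();               /* Free: skipped on the raising path */
}

/* Returns how many handles are still open once control is back here. */
int handles_left_open(int should_raise) {
  open_handles = 0;
  if (setjmp(rescue_point) == 0) {
    leaky_process(should_raise);
  }
  return open_handles;
}
```

Calling handles_left_open(0) goes through release() as expected, while handles_left_open(1) jumps straight back to the setjmp() site and leaves the handle open. This is the same shape as a leak behind a raising rb_funcall.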
The example below illustrates a potential leak. If json_parse raises, the function never reaches its cleanup code and everything allocated before the call leaks.
VALUE rb_create_geometry_hash(VALUE self, VALUE wkt) {
  // Alloc
  GEOSWKTReader* reader = GEOSWKTReader_create();
  GEOSGeoJSONWriter* writer = GEOSGeoJSONWriter_create();
  // C processing
  GEOSGeometry* geom = GEOSWKTReader_read(reader, StringValuePtr(wkt));
  char* geojson = GEOSGeoJSONWriter_writeGeometry(writer, geom, -1);
  // Ruby processing
  VALUE rb_geojson = rb_str_new_cstr(geojson);
  VALUE result = rb_funcall(self, rb_intern("json_parse"), 1, rb_geojson);
  // Free
  GEOSWKTReader_destroy(reader);
  GEOSGeom_destroy(geom);
  GEOSGeoJSONWriter_destroy(writer);
  GEOSFree(geojson);
  return result;
}
Of course, the example above is a bit silly - you could invert the freeing and Ruby processing parts. However, this is not always possible, and longer function bodies can become more intertwined.
Using begin..ensure in Ruby
If you’re using Ruby, you could instead write the above example using begin..ensure:
def create_geometry_hash(wkt)
  reader = GEOSWKTReader.new
  writer = GEOSGeoJSONWriter.new
  begin
    json_parse(writer.write(reader.read(wkt)))
  ensure
    reader.close
    writer.close
  end
end
This API is also available in C with rb_rescue and rb_ensure:
// Arguments for the protected call. rb_ensure passes a single VALUE to each
// callback, so we pass a pointer to a struct (the struct name is illustrative).
struct processing_args {
  VALUE self;
  char* geojson;
};

static VALUE try_ruby_processing(VALUE args) {
  struct processing_args* data = (struct processing_args*)args;
  // Ruby processing
  VALUE rb_geojson = rb_str_new_cstr(data->geojson);
  return rb_funcall(data->self, rb_intern("json_parse"), 1, rb_geojson);
}

struct to_free {
  GEOSWKTReader* reader;
  GEOSGeoJSONWriter* writer;
  GEOSGeometry* geom;
  char* geojson;
};

// Always runs, whether try_ruby_processing returned or raised.
static VALUE ensure_free(VALUE args) {
  struct to_free* data = (struct to_free*)args;
  GEOSWKTReader_destroy(data->reader);
  GEOSGeom_destroy(data->geom);
  GEOSGeoJSONWriter_destroy(data->writer);
  GEOSFree(data->geojson);
  return Qnil;
}

VALUE rb_create_geometry_hash(VALUE self, VALUE wkt) {
  // Alloc
  GEOSWKTReader* reader = GEOSWKTReader_create();
  GEOSGeoJSONWriter* writer = GEOSGeoJSONWriter_create();
  // C processing
  GEOSGeometry* geom = GEOSWKTReader_read(reader, StringValuePtr(wkt));
  char* geojson = GEOSGeoJSONWriter_writeGeometry(writer, geom, -1);
  // Both structs live on the stack for the whole duration of rb_ensure.
  struct processing_args call_args = { self, geojson };
  struct to_free free_args = { reader, writer, geom, geojson };
  return rb_ensure(
    try_ruby_processing, (VALUE)&call_args,
    ensure_free, (VALUE)&free_args
  );
}
However, this is a bit cumbersome, and if you want to add a rescue block to the party, it gets way less readable. I suggest reading Peter Zhu’s ‘A Rubyist’s Walk Along the C-side (Part 8): Exceptions & Error Handling’ if you want to use the begin..rescue..ensure..end API in C.
Using rb_protect for C
There is another option. First, let’s see how it could look in Ruby:
def create_geometry_hash(wkt)
  reader = GEOSWKTReader.new
  writer = GEOSGeoJSONWriter.new
  err = nil
  result = nil
  begin
    result = json_parse(writer.write(reader.read(wkt)))
  rescue => e
    err = e
  end
  reader.close
  writer.close
  raise err if err
  result
end
This looks strange in Ruby, but it is a workflow very well suited to C. MRI has an API for that, rb_protect, and the C function looks like this:
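// rb_protect passes a single VALUE to the callback, so the receiver and the
// argument travel together in a struct (the struct name is illustrative).
struct json_parse_args {
  VALUE self;
  VALUE rb_geojson;
};

static VALUE ruby_call(VALUE args) {
  struct json_parse_args* data = (struct json_parse_args*)args;
  return rb_funcall(data->self, rb_intern("json_parse"), 1, data->rb_geojson);
}

VALUE rb_create_geometry_hash(VALUE self, VALUE wkt) {
  int state = 0;
  // Alloc
  GEOSWKTReader* reader = GEOSWKTReader_create();
  GEOSGeoJSONWriter* writer = GEOSGeoJSONWriter_create();
  // C processing
  GEOSGeometry* geom = GEOSWKTReader_read(reader, StringValuePtr(wkt));
  char* geojson = GEOSGeoJSONWriter_writeGeometry(writer, geom, -1);
  // Ruby processing
  struct json_parse_args args = { self, rb_str_new_cstr(geojson) };
  VALUE result = rb_protect(ruby_call, (VALUE)&args, &state);
  // Free
  GEOSWKTReader_destroy(reader);
  GEOSGeom_destroy(geom);
  GEOSGeoJSONWriter_destroy(writer);
  GEOSFree(geojson);
  // Re-raise only once everything has been freed.
  if (state) rb_jump_tag(state);
  return result;
}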
The above method will re-raise a Ruby error after having freed everything.
Note that we could also choose to ignore the error, as an empty rescue block would in Ruby:
  ...
  if (state) rb_set_errinfo(Qnil);
  return result; // => nil
}
Warning: If you do not raise the error, the rb_set_errinfo(Qnil) step is important so you don’t keep information available about an error that users should not know about.
Or, you can conditionally re-raise the error based on its class, much as rescue My::Error filters by class in Ruby:
  ...
  if (state) {
    if (rb_obj_is_kind_of(rb_errinfo(), rb_define_class_under(rb_mMy, "Error", rb_eStandardError))) {
      rb_jump_tag(state);
    } else {
      rb_set_errinfo(Qnil);
    }
  }
  return result;
}
You can think of rb_errinfo() as the C equivalent of the $! global variable.
This is all great, but when it boils down to a single rb_funcall, we can simplify that API.
The overall idea behind using the rb_protect API whenever a function might raise is to enhance readability. You don’t need to check whether the function can raise or not: you assume it can, and use the state to work with that.
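To make the capture, clean up, then inspect the state flow concrete without a Ruby build, here is a toy model in plain C. It is only a sketch under stated assumptions: toy_protect, toy_raise, toy_parse, and the handle counter are hypothetical names invented for illustration, and the real rb_protect is considerably more involved.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy model only: none of these names are part of the Ruby C API. */
static jmp_buf* current_handler = NULL;
static int pending_tag = 0;
static int open_handles = 0;

static void toy_raise(int tag) {
  assert(current_handler != NULL); /* raising outside toy_protect is a bug here */
  pending_tag = tag;
  longjmp(*current_handler, 1);
}

/* Runs fn(arg). Stores 0 in *state on success, or the raised tag otherwise. */
static int toy_protect(int (*fn)(int), int arg, int* state) {
  jmp_buf handler;
  jmp_buf* previous = current_handler;
  int result = 0;
  current_handler = &handler;
  if (setjmp(handler) == 0) {
    result = fn(arg);
    *state = 0;
  } else {
    result = 0; /* like rb_protect returning Qnil on error */
    *state = pending_tag;
  }
  current_handler = previous;
  return result;
}

static int toy_parse(int input) {
  if (input < 0) toy_raise(7);
  return input * 2;
}

/* Acquire, call under protection, always release, then report the state. */
int protected_process(int input, int* state) {
  open_handles++; /* Alloc */
  int result = toy_protect(toy_parse, input, state);
  open_handles--; /* Free: runs on both the normal and the raising path */
  return result;  /* a real extension would call rb_jump_tag(state) if state != 0 */
}

int handles_open(void) { return open_handles; }
```

Whether toy_parse returns or raises, control comes back to protected_process right after toy_protect, so the cleanup line always runs and the caller only has to look at state.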
The rb_protect_funcall Proposal
Let’s isolate rb_funcall, as it’s the only dangerous method to use. Here’s an API that will do that:
VALUE rb_protect_funcall(VALUE recv, ID mid, int* state, int n, ...);
This API is the same as rb_funcall, with an extra state argument that works like the one in rb_protect. Hence the usage is pretty straightforward:
VALUE rb_create_geometry_hash(VALUE self, VALUE wkt) {
  int state = 0;
  // Alloc
  GEOSWKTReader* reader = GEOSWKTReader_create();
  GEOSGeoJSONWriter* writer = GEOSGeoJSONWriter_create();
  // C processing
  GEOSGeometry* geom = GEOSWKTReader_read(reader, StringValuePtr(wkt));
  char* geojson = GEOSGeoJSONWriter_writeGeometry(writer, geom, -1);
  // Ruby processing
  VALUE rb_geojson = rb_str_new_cstr(geojson);
  VALUE result = rb_protect_funcall(self, rb_intern("json_parse"), &state, 1, rb_geojson);
  // Free
  GEOSWKTReader_destroy(reader);
  GEOSGeom_destroy(geom);
  GEOSGeoJSONWriter_destroy(writer);
  GEOSFree(geojson);
  if (state) rb_jump_tag(state);
  return result;
}
This API is not yet available in Ruby, and may never be. You can take it from RGeo (MIT license).
A Real-World Example
If you want to see a real-world example, I encourage you to read the RGeo codebase as we recently switched to going full rb_protect. We even have some functions, such as rgeo_convert_to_geos_geometry, that propagate this state for simpler usage. This function is a good place to start digging around.
Feel free to open an issue on RGeo to discuss the choices we made further.
Wrapping Up
In this post, we warned against naive use of rb_funcall in C extensions, as it can cause memory leaks. We explored using rb_ensure (the C counterpart of begin..ensure) or rb_protect instead.
Happy coding!