Skip to content

Ruby: Only add RepeatedField instance to cache once fully initialized#13054

Closed
casperisfine wants to merge 1 commit intoprotocolbuffers:mainfrom
casperisfine:fix-uninitialized-repeated-fields-in-obj-cache
Closed

Ruby: Only add RepeatedField instance to cache once fully initialized#13054
casperisfine wants to merge 1 commit intoprotocolbuffers:mainfrom
casperisfine:fix-uninitialized-repeated-fields-in-obj-cache

Conversation

@casperisfine
Copy link
Copy Markdown
Contributor

Fix: #11968

Since calling a method on WeakMap allow the RubyVM to switch threads, adding a non-fully initialized object into the cache allow another thread from using it.

Note: with this fix, the other two thread may create their own wrapper for the same underlying Array. We don't know if it's a big deal or not.

Co-Authored-By: @peterzhu2118

cc @haberman

Fix: protocolbuffers#11968

Since calling a method on WeakMap allow the RubyVM to switch threads,
adding a non-fully initialized object into the cache allow another thread
from using it.

Note: with this fix, the other two thread may create their own wrapper
for the same underlying Array. We don't know if it's a big deal or not.

Co-Authored-By: Peter Zhu <[email protected]>
@casperisfine casperisfine requested a review from a team as a code owner June 14, 2023 14:09
@casperisfine casperisfine requested review from JasonLunn and removed request for a team June 14, 2023 14:09
@google-cla
Copy link
Copy Markdown

google-cla Bot commented Jun 14, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@casperisfine
Copy link
Copy Markdown
Contributor Author

Ah damn it. Crediting @peterzhu2118 as co-author is making the CLA bot nervous -_-.

Peter could you sign the CLA?

@peterzhu2118
Copy link
Copy Markdown
Contributor

I signed the CLA.

@fowles fowles requested review from fowles and removed request for JasonLunn June 14, 2023 15:17
@haberman
Copy link
Copy Markdown
Member

Thanks so much for debugging and fixing this.

This pattern also occurs in the following place, which will also need the same change:

ObjectCache_Add(map, val);
TypedData_Get_Struct(val, Map, &Map_type, self);
self->map = map;
self->arena = arena;
self->key_type = key_type;
self->value_type_info = value_type;
if (self->value_type_info.type == kUpb_CType_Message) {
const upb_MessageDef* val_m = self->value_type_info.def.msgdef;
self->value_type_class = Descriptor_DefToClass(val_m);
}

I wish there was a way to atomically perform a "get or insert" operation on the WeakMap, so we could avoid the duplicate wrapper situation. Duplicate wrappers are mostly benign, but they are observable by the user, eg. with BasicObject#equal?. If we could write our own WeakMap it would be possible to implement "get or insert", but it appears that it's not feasible to make a custom WeakMap unless we have rb_objspace_garbage_object_p.

Would it be possible to add a "get or insert" operation to WeakMap upstream, so we could at least avoid duplicate wrappers going forward?

@haberman haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Jun 14, 2023
@fowles
Copy link
Copy Markdown
Contributor

fowles commented Jun 14, 2023

After this fix in debug builds we may start to fail the assert here:

PBRUBY_ASSERT(ObjectCache_Get(key) == val);

I think the best solution is to have ObjectCache_Add return the VALUE that was inserted and then the caller can do a try insert and get back the official one that won the race.

@fowles
Copy link
Copy Markdown
Contributor

fowles commented Jun 14, 2023

After this fix in debug builds we may start to fail the assert here:

PBRUBY_ASSERT(ObjectCache_Get(key) == val);

I think the best solution is to have ObjectCache_Add return the VALUE that was inserted and then the caller can do a try insert and get back the official one that won the race.

Ok, that won't quite work since a racing thread can call item_set and clobber the existing value. An insert or return operation (like @haberman asked for is really the right answer here). I think we can survive the race condition of duplicate wrappers although it is a leaky abstraction, but we do need to remove the assert at the end

@casperisfine
Copy link
Copy Markdown
Contributor Author

Would it be possible to add a "get or insert" operation to WeakMap upstream, so we could at least avoid duplicate wrappers going forward?

You can implement that yourself in Ruby with a mutex. If I read the code correctly the "secondary map" codepath uses a mutex, so similar, but it's easier to do in ruby (untested pseudo-code):

class MyWeakMap
  def initialize
    @map = ObjectSpace::WeakMap.new
    @mutex = Mutex.new
  end

  def getset(key, new_value)
    return old_value if old_value = @map[key]
    @mutex.synchronize do
      @map[key] ||= new_value
    end
  end
end

@casperisfine
Copy link
Copy Markdown
Contributor Author

the other two thread may create their own wrapper for the same underlying Array. We don't know if it's a big deal or not.

Reading the code again, I think this case was already possible before since we were calling RepeatedField_alloc(cRepeatedField) before ObjectCache_Add().

Even with getset, both thread may need to alloc the VALUE and eventually one will notice it ran into a race condition and may abandon its own instance and use the other.

All this to say, I think this is a different concern, and shouldn't prevent the fix from being merged.

@fowles
Copy link
Copy Markdown
Contributor

fowles commented Jun 14, 2023

All this to say, I think this is a different concern, and shouldn't prevent the fix from being merged.

Agreed, but the asserts are a really issue that we should remove

@fowles fowles added the ruby label Jun 14, 2023
casperisfine pushed a commit to casperisfine/protobuf that referenced this pull request Jun 15, 2023
Superseeds: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

To entirely prevent that we'd need a lot of extra locking which
I don't think is really worth it compared to a few useless allocations.

Instead we can replace `ObjectCache_Add` by a `getset` type of operation,
the extra instance is still created, but the later threads will receive
the "canonical" instance and will be able to abandon their duplicated
instance.

Additionally, this PR moves the ObjectCache implementation in Ruby,
as it's much easier to debug there, and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to better allow to debug
potential memory issues.
casperisfine pushed a commit to casperisfine/protobuf that referenced this pull request Jun 15, 2023
Superseeds: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

To entirely prevent that we'd need a lot of extra locking which
I don't think is really worth it compared to a few useless allocations.

Instead we can replace `ObjectCache_Add` by a `getset` type of operation,
the extra instance is still created, but the later threads will receive
the "canonical" instance and will be able to abandon their duplicated
instance.

Additionally, this PR moves the ObjectCache implementation in Ruby,
as it's much easier to debug there, and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to better allow to debug
potential memory issues.
casperisfine pushed a commit to casperisfine/protobuf that referenced this pull request Jun 15, 2023
Superseeds: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

To entirely prevent that we'd need a lot of extra locking which
I don't think is really worth it compared to a few useless allocations.

Instead we can replace `ObjectCache_Add` by a `getset` type of operation,
the extra instance is still created, but the later threads will receive
the "canonical" instance and will be able to abandon their duplicated
instance.

Additionally, this PR moves the ObjectCache implementation in Ruby,
as it's much easier to debug there, and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to better allow to debug
potential memory issues.
casperisfine pushed a commit to casperisfine/protobuf that referenced this pull request Jun 16, 2023
Superseeds: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

To entirely prevent that we'd need a lot of extra locking which
I don't think is really worth it compared to a few useless allocations.

Instead we can replace `ObjectCache_Add` by a `getset` type of operation,
the extra instance is still created, but the later threads will receive
the "canonical" instance and will be able to abandon their duplicated
instance.

Additionally, this PR moves the ObjectCache implementation in Ruby,
as it's much easier to debug there, and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to better allow to debug
potential memory issues.
casperisfine pushed a commit to casperisfine/protobuf that referenced this pull request Jun 16, 2023
Superseeds: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

To entirely prevent that we'd need a lot of extra locking which
I don't think is really worth it compared to a few useless allocations.

Instead we can replace `ObjectCache_Add` by a `getset` type of operation,
the extra instance is still created, but the later threads will receive
the "canonical" instance and will be able to abandon their duplicated
instance.

Additionally, this PR moves the ObjectCache implementation in Ruby,
as it's much easier to debug there, and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to better allow to debug
potential memory issues.
@fowles
Copy link
Copy Markdown
Contributor

fowles commented Jun 16, 2023

Closing this in favor of #13075 13075

@fowles fowles closed this Jun 16, 2023
casperisfine pushed a commit to casperisfine/protobuf that referenced this pull request Jun 19, 2023
Superseeds: protocolbuffers#13054

The object cache is fundamentally subject to race conditions.
Objects must be created before they are registered into the cache,
so if two threads try to create the same object, we'll inevitably
end up with two instances mapping to the same underlying memory.

To entirely prevent that we'd need a lot of extra locking which
I don't think is really worth it compared to a few useless allocations.

Instead we can replace `ObjectCache_Add` by a `getset` type of operation,
the extra instance is still created, but the later threads will receive
the "canonical" instance and will be able to abandon their duplicated
instance.

Additionally, this PR moves the ObjectCache implementation in Ruby,
as it's much easier to debug there, and the performance difference
is negligible. The `ObjectCache` instance is also exposed as
`Google::Protobuf::OBJECT_CACHE` to better allow to debug
potential memory issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🅰️ safe for tests Mark a commit as safe to run presubmits over ruby

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ruby: segfault iterating over protobuf objects

5 participants