-
Notifications
You must be signed in to change notification settings - Fork 211
Fix usage of rb_struct_initialize() to pass an Array of members values and not a Hash and add TruffleRuby in CI #749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s and not a Hash
* rb_struct_initialize() does not accept a Hash, and it's very brittle
to pass `[{...}]` and to rely on that C function using rb_keyword_given_p().
It basically worked accidentally, by having **members in the caller of the caller.
Such logic when Struct#initialize is defined in Ruby (as in TruffleRuby) is basically impossible to implement,
because it's incorrectly treating positional arguments as keyword arguments.
* rb_struct_initialize() is used in CRuby to set members of Data instances in marshal.c (there is no rb_data_initialize() yet).
There, the code passes an Array of members values for Data (and for Struct which are not `keyword_init: true`):
https://github.com/ruby/ruby/blob/48c7f349f68846e10d60ae77ad299a38ee014479/marshal.c#L2150-L2176
So we should do the same in psych.
* Rename to init_data since it's only used for Data.
* See ruby#692 (comment).
|
I added TruffleRuby in CI in this PR since it just needs a tiny extra fix to be fully green. |
| end | ||
| data ||= allocate_anon_data(o, members) | ||
| init_struct(data, **members) | ||
| values = data.members.map { |m| members[m] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also use members.values but then we should check if the data_class.members == members.keys, and raise an exception if not. If so, which exception class should I use then, do we have one for "incompatible change in loaded classes"?
Otherwise if e.g. one changes the order of fields from Foo = Data.define(:a, :b) to Data.define(:b, :a) then it would load the instance incorrectly in a rather surprising manner.
If there are extra members in Data.define, the current approach sets them to nil, like before:
irb(main):002> Psych::VERSION
=> "5.2.6"
irb(main):011> Foo = Data.define(:a,:b)
=> Foo
irb(main):012> d=Psych.dump Foo.new(1,2)
=> "--- !ruby/data:Foo\na: 1\nb: 2\n"
irb(main):013> puts d
--- !ruby/data:Foo
a: 1
b: 2
=> nil
irb(main):014> Foo = Data.define(:a,:b, :c)
(irb):14: warning: already initialized constant Foo
(irb):11: warning: previous definition of Foo was here
=> Foo
irb(main):015> Psych.unsafe_load d
=> #<data Foo a=1, b=2, c=nil>This is rb_struct_initialize()-specific behavior BTW, Data#initialize doesn't allow that:
> Data.define(:a,:b).new(a: 1)
(irb):3:in 'Data#initialize': missing keyword: :b (ArgumentError)Checking members would raise, unless we get fancy and only compare the first N members where N is members.size.
Basically this seems a good approach and it's compatible with what was done before.
|
While I agree that So, as I understand it, this PR deviates from the current behavior of CRuby with psych 5.4.6. I think it should be updated to have the same behavior. Probably some tests need to be added for the following failure cases (I've included With CRuby 3.4.5 and psych 5.2.6: $ irb -ryaml
irb(main):001> D = Data.define(:foo, :bar)
=> D
irb(main):002> d = D["foo", "bar"]
=> #<data D foo="foo", bar="bar">
irb(main):003> d.to_yaml
=> "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
irb(main):004> Marshal.dump d
=> "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
irb(main):005>
$ irb -ryaml
irb(main):001> D = Data.define(:a, :b)
=> D
irb(main):002> Marshal.load "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
<internal:marshal>:34:in 'Marshal.load': struct D not compatible (:foo for :a) (TypeError)
from (irb):2:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/irb-1.15.3/exe/irb:9:in '<top (required)>'
from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in 'Kernel#load'
from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in '<main>'
irb(main):003> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
/home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:222:in 'Psych::Visitors::ToRuby#init_struct': unknown keywords: foo, bar (ArgumentError)
init_struct(data, **members)
^^^^^^^^^^^^^^^
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:222:in 'Psych::Visitors::ToRuby#visit_Psych_Nodes_Mapping'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in 'Psych::Visitors::Visitor#visit'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in 'Psych::Visitors::Visitor#accept'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in 'Psych::Visitors::ToRuby#accept'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:345:in 'Psych::Visitors::ToRuby#visit_Psych_Nodes_Document'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in 'Psych::Visitors::Visitor#visit'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in 'Psych::Visitors::Visitor#accept'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in 'Psych::Visitors::ToRuby#accept'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/nodes/node.rb:49:in 'Psych::Nodes::Node#to_ruby'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych.rb:275:in 'Psych.unsafe_load'
from (irb):3:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/irb-1.15.3/exe/irb:9:in '<top (required)>'
from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in 'Kernel#load'
from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in '<main>'
irb(main):004>And with TruffleRuby 33.0.0-dev-d6dd6557 and psych 5.2.6: $ rbenv shell truffleruby-dev
$ irb -ryaml
irb(main):001> D = Data.define(:foo, :bar)
=> D
irb(main):002> d = D["foo", "bar"]
=> #<data D foo="foo", bar="bar">
irb(main):003> d.to_yaml
=> "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
irb(main):004> Marshal.dump d
=> "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
irb(main):005>
$ irb -ryaml
irb(main):001> D = Data.define(:a, :b)
=> D
irb(main):002> Marshal.load "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
<internal:core> core/marshal.rb:1013:in `block in construct_struct': struct D is not compatible (:foo for :a) (TypeError)
from <internal:core> core/integer.rb:157:in `times'
from <internal:core> core/marshal.rb:1010:in `construct_struct'
from <internal:core> core/marshal.rb:719:in `construct'
from <internal:core> core/marshal.rb:1532:in `load'
from (irb):2:in `<main>'
from <internal:core> core/kernel.rb:417:in `loop'
from <internal:core> core/throw_catch.rb:36:in `catch'
from /home/nick/.local/share/rubies/truffleruby-dev/lib/gems/gems/irb-1.15.3/exe/irb:9:in `<top (required)>'
from <internal:core> core/kernel.rb:386:in `load'
from /home/nick/.local/share/rbenv/versions/truffleruby-dev/bin/irb:25:in `<main>'
irb(main):003> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
=> #<data D a={:foo=>"foo", :bar=>"bar"}, b=nil>
irb(main):004>
$ ruby --version
truffleruby 33.0.0-dev-d6dd6557, like ruby 3.3.7, Oracle GraalVM Native [x86_64-linux]As a user, if my serialized data doesn't match the current definition, I'd much rather see a crash than have it continue with no exception but corrupted values. @eregon You did bring up the scenario where the parameter order is changed. Currently, $ ruby --version
ruby 3.4.5 (2025-07-16 revision 20cda200d3) +PRISM [x86_64-linux]
$ irb -ryaml
irb(main):001> D = Data.define(:bar, :foo)
=> D
irb(main):002> Marshal.load "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
<internal:marshal>:34:in 'Marshal.load': struct D not compatible (:foo for :bar) (TypeError)
from (irb):2:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/irb-1.15.3/exe/irb:9:in '<top (required)>'
from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in 'Kernel#load'
from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in '<main>'
irb(main):003> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
=> #<data D bar="bar", foo="foo">
irb(main):004> Although Data.define does provide an explicit ordering, which can be used with |
|
And sorry to nitpick, without celebrating first! On the plus side, with this PR merged and psych 5.3.0 released, CRuby and TruffleRuby both behave the same now! $ irb -ryaml
irb(main):001> D = Data.define(:foo, :bar)
=> D
irb(main):002> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
=> #<data D foo="foo", bar="bar">
irb(main):003>
$ irb -ryaml
irb(main):001> D = Data.define(:a, :b)
=> D
irb(main):002> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
=> #<data D a=nil, b=nil>
irb(main):003> I believe that second scenario is a bug (I'll open an issue for it), but at least it's the same bug in both implementations! 🎉 😉 |
I agree this is safer, maybe you can make a PR for that? (EDIT: I made #761)
Have you seen #749 (comment)? I made that comment explicitly to discuss what should happen if members don't match, but it didn't receive any reply in over a month, so I assumed that everyone is OK with the current solution. I think I hadn't considered completely different members as in your example, I just assumed it would behave like in my comment with the extra member, but it seems it doesn't. That's a good point and I think we should validate members strictly, potentially allowing a different order.
Please file a ticket at https://bugs.ruby-lang.org/ for this, as I mentioned in #692 (comment).
I think that is orthogonal, regardless of how we pass the data we can/need to validate in Psych (relying on If there a C API to init a Data with a Hash or keyword arguments, that's fine, but there isn't one currently (though performance-wise a Hash is slower than Array). It's also relatively awkward to pass keyword arguments to a C function (needs to be wrapped in another C function, passed to |
Actually I'll make one: #761 |
|
This PR introduced a new method only for the C extension, breaking Psych on JRuby. The JRuby jobs appear to not be reporting the error properly in most cases, but did show up as a failure in the most recent PR merge. |
Nevermind, the new methods were originally added in #692 but never added for JRuby. That PR was merged in May 1 and the JRuby side has been broken ever since, but the CI jobs were not properly reflecting those failures. |
|
@eregon Earlier, I skipped past part of your earlier comment, but I wanted to quickly circle back and address a couple of points:
Sorry about that, sincerely. I do owe you an apology. When I first saw the ticket, it seemed good enough to me and I was busy with other unrelated things, so I was fine with just a thumbs up. Only after some time did I feel my objection strongly enough to comment. When I returned to it later, 1) I'd forgotten some of the details of your earlier comments, and 2) I was speaking with disproportionate urgency because I realized I'd waited too long and it "might" get merged as-is... Only while writing the comment did I discover it had already been merged and released! I wrote my follow-up comment to try and convey that, despite my objections, I do appreciate the work you've put into this! And honestly, releasing it as-is was better than delaying any longer for my objections: the 5.3.0 release was hugely useful for the very issue that brought me to this in the first place! But again, I'm sorry for the unfairness in both timing and tone of my objections. |
...
After #765, I think we should file a slightly different ticket:
And, of course, TruffleRuby could add identical code for these on both What do you think? Ideas for better method names? It'll be a few days (or weeks) before I get around to filing such a ticket. So if you feel the urgency more keenly, I don't mind if you file this proposal (or your own) instead. |
Thank you, I appreciate it. |
I think this a great idea, because then it calls the subclass-specific copy of
I suspect this would be less likely to be accepted, or rather that only one of them would be accepted.
It'd be nice to clean up the usage in Marshal for sure, there are probably subtle bugs lurking there as we have seen here. I wouldn't make it public API though, Marshal is in core so doesn't need it to be public. |
[{...}]and to rely on that C function using rb_keyword_given_p(). It basically worked accidentally, by having **members in the caller of the caller. Such logic when Struct#initialize is defined in Ruby (as in TruffleRuby) is basically impossible to implement, because it's incorrectly treating positional arguments as keyword arguments.keyword_init: true): https://github.com/ruby/ruby/blob/48c7f349f68846e10d60ae77ad299a38ee014479/marshal.c#L2150-L2176 So we should do the same in psych.