Python generated code is no longer compatible with Cython since 3.20 #10800

@vthib

Description

Version: 4.21.8
Language: Python

Before the 3.20 release, the generated code defined every message statically and could be compiled with Cython without issues.
Since 3.20, the messages are generated dynamically at import time, which leads to a Cython compilation error.

For example, given this proto file:

syntax = "proto3";

message Foo {
    bool a = 1;
}

The codegen gives this:

$ protoc --python_out=gen proto/a.proto

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: proto/a.proto
"""Generated protocol buffer code."""
from google.protobuf.internal import builder as _builder
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\rproto/a.proto\"\x10\n\x03\x46oo\x12\t\n\x01\x61\x18\x01 \x01(\x08\x62\x06proto3')

_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'proto.a_pb2', globals())
if _descriptor._USE_C_DESCRIPTORS == False:

  DESCRIPTOR._options = None
  _FOO._serialized_start=17
  _FOO._serialized_end=33
# @@protoc_insertion_point(module_scope)

This can no longer be compiled by cython:

$ cython gen/proto/a_pb2.py
Error compiling Cython file:
------------------------------------------------------------
...
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'proto.a_pb2', globals())
if _descriptor._USE_C_DESCRIPTORS == False:

  DESCRIPTOR._options = None
  _FOO._serialized_start=17
 ^
------------------------------------------------------------

gen/proto/a_pb2.py:23:2: undeclared name not builtin: _FOO

Cython resolves module-level names at compile time and does not account for names added by modifying globals(), so it considers _FOO undefined.
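To illustrate the difference, here is a minimal sketch in plain Python. It is only a simplified model of a static name check, not Cython's actual analysis, and the `inject` and `statically_bound_names` helpers are hypothetical names made up for this example. It shows that a name inserted through `globals()` is usable at runtime even though no binding for it appears in the source.

```python
import ast

# A tiny module that binds _FOO only by writing into its globals() dict,
# similar in spirit to what the generated code does through _builder.
SOURCE = '''
def inject(namespace):
    namespace["_FOO"] = object()

inject(globals())
_FOO
'''

def statically_bound_names(source):
    """Collect names bound by plain assignment, def, or class (simplified)."""
    bound = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
    return bound

names = statically_bound_names(SOURCE)

# At runtime the name exists: the module body runs without a NameError.
namespace = {}
exec(SOURCE, namespace)

print("_FOO" in names)      # visible to the static scan?
print("_FOO" in namespace)  # present at runtime?
```

The static scan sees `inject` but never `_FOO`, which is roughly why a compile-time check rejects the generated file while the interpreter accepts it.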

This isn't really a regression, since I suppose you do not officially support Cython. However, a small change would make the generated code work with both plain Python and Cython:

Instead of generating this:

_FOO._serialized_start=17

generating this would fix the issue:

globals()["_FOO"]._serialized_start=17
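As a rough sketch of why the dictionary lookup behaves the same at runtime, the snippet below uses a hypothetical `build_descriptors` function and a `SimpleNamespace` object as stand-ins for the real builder call and descriptor. Neither is part of protobuf; they only mimic a name being injected into the module namespace.

```python
from types import SimpleNamespace

def build_descriptors(namespace):
    # Hypothetical stand-in for the builder call: it adds _FOO to the
    # module namespace without any "_FOO = ..." statement in this file.
    namespace["_FOO"] = SimpleNamespace()

build_descriptors(globals())

# Looking the object up through globals() never mentions _FOO as a bare
# identifier, so there is no module-level name to resolve statically.
globals()["_FOO"]._serialized_start = 17
globals()["_FOO"]._serialized_end = 33

foo = globals()["_FOO"]
print(foo._serialized_start, foo._serialized_end)
```

Both forms end up setting attributes on the same object. The only difference is that the lookup is deferred to a dictionary access at runtime.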

This is not easy to fix by post-processing the generated files with regexes or sed, but it shouldn't be too hard to do in the code generator, I suppose.
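For reference, a rough post-processing stopgap might look like the sketch below. It rests on an assumption that may not hold for every generated file: that the affected statements are indented lines starting with an upper-case, underscore-prefixed descriptor name (such as `_FOO`) followed by an attribute access. The `patch_generated_source` helper is a hypothetical name, and the fragility of this pattern matching is exactly why a change in the code generator would be preferable.

```python
import re

# Assumption: affected lines are indented and begin with a name like _FOO
# or _FOO_BAR, immediately followed by ".". Anything else is left untouched.
_DESCRIPTOR_STMT = re.compile(
    r"^(?P<indent>[ \t]+)(?P<name>_[A-Z][A-Z0-9_]*)(?=\.)",
    re.MULTILINE,
)

def patch_generated_source(source: str) -> str:
    """Rewrite bare descriptor references into globals() lookups."""
    return _DESCRIPTOR_STMT.sub(
        lambda m: f'{m.group("indent")}globals()["{m.group("name")}"]',
        source,
    )

sample = (
    "if _descriptor._USE_C_DESCRIPTORS == False:\n"
    "\n"
    "  DESCRIPTOR._options = None\n"
    "  _FOO._serialized_start=17\n"
    "  _FOO._serialized_end=33\n"
)
patched = patch_generated_source(sample)
print(patched)
```

Lines that start with `DESCRIPTOR` or with lower-case names such as `_descriptor` are not matched by this pattern, so they pass through unchanged.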

Would you be OK with such a change? It would be really useful for Cython users, and I can try to make the change myself if needed.
