UUID encoded in CassandraToGCSOperator but not other operators #22846

@fuxiao224

Apache Airflow version

2.2.5 (latest released)

What happened

I noticed that CassandraToGCSOperator encodes UUID values with:

elif isinstance(value, UUID):
    return b64encode(value.bytes).decode('ascii')

Therefore, for example, the UUID 000e0000-5719-12a3-0000-000028327d4a is written as "AA4AAFcZEqMAAAAAKDJ9Sg==", the base64 encoding of its 16 raw bytes. However, this seems inconsistent with the other *ToGCSOperator classes. For example, a UUID in MySQL/Oracle is represented as a UTF-8 string of five groups of hexadecimal digits in the format "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", so the same UUID would still be written as "000e0000-5719-12a3-0000-000028327d4a". In other words, MySQLToGCSOperator/OracleToGCSOperator preserve the UUID in its textual form and do not encode it.
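To make the difference concrete, here is a minimal standalone sketch using only the Python standard library. The helper names (`encode_uuid_base64`, `encode_uuid_text`, `decode_uuid_base64`) are hypothetical and are not part of Airflow; the first one only mirrors the conversion quoted above, and the second shows the plain textual form I would expect from the other operators.

```python
from base64 import b64decode, b64encode
from uuid import UUID


def encode_uuid_base64(value: UUID) -> str:
    """Hypothetical helper mirroring the quoted branch: base64 of the 16 raw bytes."""
    return b64encode(value.bytes).decode("ascii")


def encode_uuid_text(value: UUID) -> str:
    """Hypothetical helper: the canonical hyphenated hexadecimal form."""
    return str(value)


def decode_uuid_base64(encoded: str) -> UUID:
    """Hypothetical helper: recover the UUID from the base64 form downstream."""
    return UUID(bytes=b64decode(encoded))


example = UUID("000e0000-5719-12a3-0000-000028327d4a")
encoded = encode_uuid_base64(example)   # "AA4AAFcZEqMAAAAAKDJ9Sg=="
text = encode_uuid_text(example)        # "000e0000-5719-12a3-0000-000028327d4a"
restored = decode_uuid_base64(encoded)  # equal to `example`
```

The base64 form is lossless, so consumers can still recover the original UUID, but it needs this extra decoding step, whereas the textual form can be read or joined against other exports directly.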

What is the main reason for encoding UUIDs in CassandraToGCSOperator? If possible, could we change it so that UUIDs are not encoded when loading a Cassandra table to GCS using Airflow? Please let me know your thoughts on this issue. Thanks!

What you think should happen instead

No response

How to reproduce

No response

Operating System

macOS

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
