GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix  #22675

@Yao-ATG

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

No response

Apache Airflow version

2.2.4 (latest released)

Operating System

MacOS 12.2.1

Deployment

Composer

Deployment details

No response

What happened

I have two files, "hourse.jpeg" and "hourse.jpeg.copy", and a folder, "hourse.jpeg.folder", in the source bucket.
I used the following code to try to copy only "hourse.jpeg" to another bucket:
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

gcs_to_gcs_op = GCSToGCSOperator(
    task_id="gcs_to_gcs",
    source_bucket=my_source_bucket,
    source_object="hourse.jpeg",
    destination_bucket=my_destination_bucket,
)

As a result, both files and the folder mentioned above are copied.
From the source code, it seems there is no way to do what I want.
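
To illustrate the behavior described above, here is a minimal sketch of prefix-based matching in plain Python. It is a simplified model, not the provider's actual code: `list_by_prefix` is a hypothetical helper, and the object name `inner.txt` inside the folder is made up for the example.

```python
# Simplified model of prefix-based listing; NOT the provider's actual code.
# "inner.txt" is a made-up object name standing in for the folder's contents.
bucket_objects = [
    "hourse.jpeg",
    "hourse.jpeg.copy",
    "hourse.jpeg.folder/inner.txt",
]


def list_by_prefix(names, prefix):
    """Return every object name that starts with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


# Treating source_object as a prefix selects all three entries,
# not just the single file that was requested.
matched = list_by_prefix(bucket_objects, "hourse.jpeg")
print(matched)
```

Because every name in the bucket starts with "hourse.jpeg", a prefix match cannot single out the one file.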

What you think should happen instead

Only the specified file should be copied; that is, source_object should be treated as an exact match rather than a prefix.
To get the current prefix behavior, the user can (and should) use a wildcard:
source_object="hourse.jpeg*"
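
The proposed semantics could look roughly like the sketch below, again in plain Python rather than the operator's real implementation. `select_objects` and `WILDCARD` are hypothetical names, and handling only a single `*` is an assumption made to keep the example short.

```python
# Hypothetical sketch of the proposed matching rule; not the operator's code.
WILDCARD = "*"


def select_objects(names, source_object):
    """Exact match by default; prefix/suffix match only when a wildcard is given."""
    if WILDCARD not in source_object:
        # No wildcard: copy only the object whose name matches exactly.
        return [name for name in names if name == source_object]
    # Assumption: a single wildcard splits the pattern into a prefix and a suffix.
    prefix, suffix = source_object.split(WILDCARD, 1)
    return [
        name
        for name in names
        if len(name) >= len(prefix) + len(suffix)
        and name.startswith(prefix)
        and name.endswith(suffix)
    ]


# "inner.txt" is a made-up object name standing in for the folder's contents.
names = ["hourse.jpeg", "hourse.jpeg.copy", "hourse.jpeg.folder/inner.txt"]
exact = select_objects(names, "hourse.jpeg")
by_wildcard = select_objects(names, "hourse.jpeg*")
print(exact)
print(by_wildcard)
```

With this rule, `source_object="hourse.jpeg"` selects only the one file, while `source_object="hourse.jpeg*"` keeps today's prefix behavior as an explicit opt-in.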

How to reproduce

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
