Update download_file method in s3_hook to accept files and directories#23654
Update download_file method in s3_hook to accept files and directories#23654kalyani-subbiah wants to merge 6 commits intoapache:mainfrom
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
There was a problem hiding this comment.
Filename is made optional here with None as default value. It seems filename cannot be None as per Boto S3 docs. What happens when filename is not passed?
This is also a change in the documented interface and might need changelog update : https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/s3/index.html?highlight=download_file#airflow.providers.amazon.aws.hooks.s3.S3Hook.download_file
There was a problem hiding this comment.
Thanks for the review.
Good catch on the filename - I can change its type from Optional[str] to str.
Not sure where to edit the docs for the s3_hook, the link at the bottom of the page: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/s3/index.html that reads: "Suggest a change on this page" gives me a 404 error: https://github.com/apache/airflow/edit/main/docs/apache-airflow-providers-amazon/_api/airflow/providers/amazon/aws/hooks/s3/index.rst
There was a problem hiding this comment.
Please also don't rename the parameter like that.
renaming it is a breaking change and we don't want to break user code if we can avoid it.
Users may use this function directly in thier code.
If you decide that it's best to rename (and can explain why this rename is needed) then you must support both local_path and filename.
for local_path you need to raise deprecation warning.
There was a problem hiding this comment.
Good point. In that case we can keep the argument local_path.
If the local_path argument is a path to a directory, then we run the old download_fileobj method from boto3. If its a path to a file, then we run download_file from boto3. I think this workflow would be non-breaking.
We would still have to update the docs to reflect this change though.
There was a problem hiding this comment.
don't get me wrong.. if you think it's best to use the new name we can do the deprecation..
it's not that complex it just add 3-4 code lines :)
See for example:
airflow/airflow/providers/trino/hooks/trino.py
Lines 165 to 173 in ccb5ce9
There was a problem hiding this comment.
Yeah. Deprecations should be when we have some external circumstances - such as changing naming conventions from libraries etc. But if we are in full control of the API and can choose, then it's better to keep it backwards compatible.
There was a problem hiding this comment.
Hi! I have updated this pull request based on the feedback above. It's now non-breaking. Thanks so much everyone.
I've added a conditional for using local_path as path to a directory or a file.
Also added an additional unit-test.
…oto3 instead of download_fileobj in order to keep track of the file name & update unit tests
|
Needsrebase/fixing static checks. |
| s3_hook.get_key.assert_called_once_with(key, bucket) | ||
| s3_obj.download_fileobj.assert_called_once_with(mock_temp_file) | ||
|
|
||
| def test_download_file2(self, s3_bucket): |
There was a problem hiding this comment.
Please use a more descriptive test/method name. What is this testing that test_download_file doesn't? These are test methods, so you can rename them to something like test_download_file_with_filename and test_download_file_with_directory or something like that, which is much more useful.
There was a problem hiding this comment.
Hi, thanks for pointing it out - will do once I get some time.
|
How about fixing static checks and doc failures @kalyani-subbiah - we won't even look at merging it before they are fixed really. |
|
Rebase and use pre-commit/breeze to build docs. |
|
Docs build failing. |
|
@kalyani-subbiah are you still working on this PR? |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
Checking in again here, @kalyani-subbiah are you still working on this PR? It looks like this one was close to the finish line but needed some rebasing and docs fixes. |
|
Followup: #26886 |
closes: #23514
Change the download_file method in s3_hook (aws provider) to use the download_file method from boto3 instead of download_fileobj. This allows a local filename to be passed as an argument, making it easier to track files in a bucket.