Skip to content

BUG: fix the method for checking local files#23728

Merged
seberg merged 2 commits intonumpy:mainfrom
LoveEatCandy:bugfix/check_local_file
May 9, 2023
Merged

BUG: fix the method for checking local files#23728
seberg merged 2 commits intonumpy:mainfrom
LoveEatCandy:bugfix/check_local_file

Conversation

@LoveEatCandy
Copy link
Copy Markdown

BufferedReader and BufferedWriter cannot be used to determine local files. For example, users can implement CustomFile to operate on OSS files, and then use BufferedReader(CustomFile) to achieve the buffered effect. But fileno method can do it.

@LoveEatCandy
Copy link
Copy Markdown
Author

For example:

from megfile import s3_open
import numpy as np

np.save(s3_open("s3://xxx/test.npy", "wb"), np.zeros(10))

Example not work, because s3_open return BufferedWriter(s3_fd), it isn't local file.

def isfileobj(f):
return isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter))
try:
return isinstance(f.fileno(), int)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess fileno() needs to be tried to see if it is defined, but I think this needs a test and narrowig the exception. Python defines the exception clearly here, AFAIK.

Also, I suspect this breaks e.g. if you pass in a bz2.BZ2File, so I would like to test that as well (and fix it if necessary).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. bz2.BZ2File raise error: io.UnsupportedOperation: Seeking is only supported on files open for reading. Looks like need seekable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mostly worried that the old code manages to reject file-like objects that do have a fileno() but that are also compressed. I.e. for those writing directly to the fileno() would be wrong because that would side-step compression. But even a compressed file-like might be seekable?

We have a problem with that in tofile() which fails to reject writing to such files, actually I am curious if the old isinstance check we had here is specific enough to get that right (at least in practice).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def isfileobj(f):
    if isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter)):
        try:
            return isinstance(f.fileno(), int)
        except (io.UnsupportedOperation, AttributeError, NotImplementedError):
            pass
    return False

This seems more stable, although it looks a bit strange.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I think it makes sense at least as a minimal fix (I do think its plausible that this should be improved more, but I am not exactly sure how).

Do we really need more than OSError there? Those classes have fileno() defined already, and as I read it are supposed to raise OSError when there is no valid fileno?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, OSError is enough.


"""
if isfileobj(filename):
if is_seekable_local_file(filename):

This comment was marked as outdated.

@LoveEatCandy

This comment was marked as outdated.

@seberg
Copy link
Copy Markdown
Member

seberg commented May 8, 2023

py3k is somewhat public, I think. If we need to modify the behavior in weirder ways, we should maybe just introduce a new private function.

@LoveEatCandy LoveEatCandy force-pushed the bugfix/check_local_file branch from 7cc0187 to 9775f9d Compare May 9, 2023 01:45

def isfileobj(f):
return isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter))
if isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter)):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove one level of indentation via not ...:?

We should also have a mock-up test for the type of object you wish to support.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

BufferedReader and BufferedWriter cannot be used to determine local
files. For example, users can implement CustomFile to operate on OSS
files, and then use BufferedReader(CustomFile) to achieve the buffered
effect. But fileno method can do it.
@LoveEatCandy LoveEatCandy force-pushed the bugfix/check_local_file branch from 9775f9d to a307e0f Compare May 9, 2023 07:24
@LoveEatCandy
Copy link
Copy Markdown
Author

I noticed that setup.py now requires Python >= 3.9. Is it possible to release a post version for the final versions of Python 3.6-3.8? There are still a few users who use Python 3.6 due to historical reasons.

@seberg
Copy link
Copy Markdown
Member

seberg commented May 9, 2023

Hmmm, there should be one more 1.24.x release, which is Python 3.8 compatible. We don't backport more than one release normally (or practically ever).

Maybe @charris has an opinion. For backporting to 1.24.x it may be best to add an _isfileobj() and use that internally, since its easy enough and avoids changing the somewhat public function. (Otherwise, we gamble a bit if there are any downstream libs that use the check in a weird way, although I couldn't even find a usage in the wild.)

@seberg
Copy link
Copy Markdown
Member

seberg commented May 9, 2023

Let me put this in, I think its good to go, thanks.

Pending Chuck's opinion. If you wish a 1.24.x backport, could you do a PR against maintenance/1.24.x this with an _? That we can do for sure.

Beyond that, I guess it will just be a no, sorry. Python 3.7 is barely in security support anyway and its just not something we normally do.

@seberg seberg merged commit 181c15b into numpy:main May 9, 2023
@LoveEatCandy LoveEatCandy deleted the bugfix/check_local_file branch May 10, 2023 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants