-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Added support for compressed files in astropy.io.ascii #425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
astropy/io/ascii/core.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the with statement here to ensure that the file gets closed.
|
Note that |
|
@taldcroft @mdboom - thanks for the feedback. I agree it would be best to factor this out into a function (that does the better checking you are suggesting) and then have that be used by io.ascii and io.vo. I'll work on it! |
|
@mdboom - so are you suggesting essentially extracting |
|
@mdboom - so reading over what you said again, I guess I should instead write a function/context manager that returns a file object (not the read function). I'll write it separately from the existing _convert_to_fd_or_read_function, and then we can see if _convert_to_fd_or_read_function can use this new function to simplify things? |
|
Agreed. |
|
@mdboom @taldcroft - I created a context manager that can understand filenames, URLs (or at least a subset), and file objects, and also understands compression. I think this addresses all your comments above, but the main caveat is that I'll still new to context managers, so I'm perfectly happy working more on this - just let me know :-) Also feel free to open a PR against my branch with suggested changes. One issue so far: bz2 doesn't support reading from file objects, so I have to create a temporary file. Any ideas? |
|
Might be worth noting that there are also some relevant routines in https://github.com/astropy/astropy/blob/master/astropy/io/fits/file.py and in https://github.com/astropy/astropy/blob/master/astropy/io/fits/util.py; I need to take a closer look at what you've done here though and see if there's anything worth adding that isn't already correct. Otherwise I need to see if I can refactor the io.fits end to make use of this code in any way so there isn't this duplication of effort... |
|
@iguananaut - yes, we should try and reduce duplication - in the end what we're trying to do at that level is pretty generic, so it would be great to have a single function/context manager than can be used across packages. Feel free to hack away with what I have right now. |
This integrates `astropy.io.vo` to use the new function. It fixes the following issues with the new function: - Python 3 compatibility - Adds a "binary" flag to determine whether the resulting file object should return bytes - Doesn't close file descriptors that were passed in
|
I've filed a follow-on pull request for this against @astrofrog's branch. It moves It also addresses a couple of issues:
There are a few lingering issues. This removes any universal newline handling, so it breaks when handed Windows files. We should add some to the test suite to catch this. While it seems like On Python 3, since the expectation is that the returned file handle provides Unicode strings, I wrap it in a With the above two problems, it seems like the best way out is to write our own IO wrappers, one for Unicode decoding and one for newline handling that could be separated out and won't be so strict about the kinds of objects passed in. It's totally doable, I just lost my patience ;) (end rant). Lastly, I think we should probably name this function |
astropy/utils/misc.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function concerns me, as it duplicates the functionality in the astropy.config.get_data_* family of functions, especially the URL handling. Might it make more sense to port this behavior there, and use that in io.ascii?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also apply to @mdboom's PR he submitted to @astrofrog - I didn't look...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The URL stuff in here was actually just added as a secondary feature, but the compression stuff seems to be different from what's in config.get_data_*. How about having this general function (without cache support) in utils, and then have config use as much of it as possible to avoid duplication, and add the cache stuff? It somehow feels more natural for io.votable and io.ascii to use a file-reading utility from utils than from config...
Follow on to astropy#425.
|
@astrofrog (referencing the now outdated diff discussion): I see your point, but I think this means the This might be easier to handle by merging this PR and later combining them and moving the |
|
@eteq - I agree with putting all the functionality in one place, so then we can create
I guess my main point is that there are some things that are specific to getting files internal to astropy, but if we want to create a generic I/O function, it might be better to keep the internal-specific stuff in a separate layer that can wrap around the generic I/O function (in other words, the 'core' function could be astropy-agnostic apart from the caching, and all the rest would be in wrapping functions. What do you think? How about having ? @mdboom, @taldcroft, @iguananaut - do you have any thoughts about this too? |
|
@eteq: I agree. I see this function in this PR as being the low-level one that handles compression and URLs (and eventually newline handling since it looks like we have to do that ourselves once compression is involved). The existing config/data functions that do caching and module-relative lookup could be updated to use this function internally. (I think, unless I'm missing a detail, which possibly I am). |
|
@astrofrog - answering your questions/comments in-order:
So I guess I'm mostly in agreement to your proposal, but I don't think we need both So to be completely clear, I'm suggesting that instead the current How does that sound? And @astrofrog, do you want to do this all, or should we split the effort somehow? |
|
@eteq - I agree with most of what you said - we should have two core functions Regarding the cache, I think it should be available for users who want it, but should be disabled by default. It's easier to explain to users that they can use the cache if they need to improve performance (which allows us to give them information about timeout, etc. in the docs), than to have users who don't understand why the table they read still has an old version when the online one has changed. I do agree the cache has to be available as an option at the low level. Note that if you want, the We can even control the default caching strategy via the config file, defaulting to @mdboom @iguananaut - what do you think about the cache stuff? @eteq - in the mean time, I'll see to working on this - the cache thing is easy to change whatever we decide. |
|
@astrofrog - ok, what you suggest sounds good. I would say yes that the data functions should use the cache by default (that was always the intent), but you're right it makes sense to have the user-level functions' default controlled by a configuration option. Are you also thinking at the same time you'll implement a timeout for the cache? If so, just let me know if you need any clarification regarding the way I was storing the cache metadata on-disk (using |
|
@eteq - do you think the cache specific code should stay in |
|
@eteq - I'm currently working on this, and I was thinking, |
|
(or I could use |
|
And just another question for @eteq - should EDIT: only a few tests use |
|
@astrofrog - I don't think it's that important where the I like Hmm... Now that you mention it, if you're feeling up to it, (Oh, btw, @astrofrog, be sure to run the tests with the |
|
@eteq - ok, I'm making progress on this, but it's not fully functional yet (just pushed what I have so far, but still needs a lot of work). I have another question: should Regarding the context manager thing, I think that once you write it as a context manager, you can't use it as a normal function, but I might be wrong about that. I will check. |
|
@astrofrog - I would agree we don't allow As for the context manager, you definitely can - in fact, now that I look at it, the current For URLs, it's a bit more complicated. In Py 2.x (but not 3.x), |
…o we have to use delete==False and then delete the file later on.
|
I think I've managed to fix all the tests in Windows. The issue was the usual one that Also, rather than strip the Windows By the way, to try and fix this I set up a virtual machine with Windows 2008 Server 64-bit on AWS, and to my surprise, could not reproduce these issues! So it seems the NamedTemporaryFile is a win32-specific thing. I've also rebased. Let's see what Travis says! |
|
That Windows fix seems reasonable. Probably that's the better thing to do in any case. Looks good to me. |
|
Looks good to me as well. I'll let you do the honors, @astrofrog :) |
|
Thanks! I made a last minute commit, and will merge tomorrow morning if Travis passes (it should). |
Added support for compressed files in astropy.io.ascii
|
Hi. I just updated my fork to reflect the latest master, which is a result of this merge, and it broke the conesearch code I inherited from @mdboom .
Error message:: Any advice on the fix? I am using Thanks, |
|
@pllim - you need to now use Can you confirm whether this works? |
|
@astrofrog , thanks for the help, it works indeed. Turns out I need Sample code:: |
|
This looks fine, but you don't need to use the Also, because you're using the |
|
Thanks, @mdboom ! That cleaned up the affected function considerably. Great work, everyone! |
|
I assume this means I don't have to do anything with this anymore? |
|
Oh yeah, and I think this needs a changelog entry. |
|
@iguananaut - indeed! the only thing you might want to do is see if you want to make use of the function in |
|
@iguananaut - good point about the changelog, I'll do it. |
This integrates `astropy.io.vo` to use the new function. It fixes the following issues with the new function: - Python 3 compatibility - Adds a "binary" flag to determine whether the resulting file object should return bytes - Doesn't close file descriptors that were passed in
Added support for compressed files in astropy.io.ascii
MNT: Replace master with main
It turns out it was easy to modify
astropy.io.asciito support gzip- and bzip2-compressed files.@taldcroft - what do you think?