Skip to content

Conversation

@bmerry
Copy link
Contributor

@bmerry bmerry commented Jan 8, 2021

Description

The lexer and parser generation is all collected together in one place
(a new astropy.utils.parsing module), with the side benefit that a lot
of code duplication is eliminated. This new shared code uses appropriate
locking to ensure that both parser creation and use is thread-safe.

Fixes #11225 (when combined with #11224).

The lexer and parser generation is all collected together in one place
(a new astropy.utils.parsing module), with the side benefit that a lot
of code duplication is eliminated. This new shared code uses appropriate
locking to ensure that both parser creation and use is thread-safe.

Fixes astropy#11225 (when combined with astropy#11224).
@bmerry
Copy link
Contributor Author

bmerry commented Jan 8, 2021

I haven't added a new module before, so I'm not sure if there are any associated steps that need to be done e.g. to hook it in to the documentation. I'm also not sure if it should be mentioned under "API Changes" in the changelog, given that it is new API albeit not really intended to external consumption.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmerry - this looks fantastic, thanks! The locking seems simple and good and I really like the cleanup! Only a few small comments inside.

On the larger one: for most stuff in utils, we keep the information to the minimum but we do expose it, by adding an .. automodapi:: ... to docs/utils/index.rst. There is a general warning in that file that utils is mostly for our own use.

For now, I'll put the milestone at 4.3 - am not sure what the thought would be about backporting to the LTS release (or whether that's needed since the files produced are shipped with astropy).

@mhvk mhvk added this to the v4.3 milestone Jan 8, 2021
@mhvk
Copy link
Contributor

mhvk commented Jan 8, 2021

The code coverage I think cannot easily be helped, since the parser code does not get re-generated in testing.

- Add utils.parsing package to documentation, and list it in CHANGES.rst
as new API.
- Make TAB_HEADER and ThreadSafeParser public.
- Some rewording of docstrings.
- Use double-backticks to refer to function arguments (otherwise Sphinx
  was printing warnings) and to refer to undocumented PLY functions.
@bmerry
Copy link
Contributor Author

bmerry commented Jan 11, 2021

The code coverage I think cannot easily be helped, since the parser code does not get re-generated in testing.

One could write a test for this module that deleted and regenerated a dummy lexer+parser. It would end up making modification in the installation tree, which feels a bit icky to me. Let me know if you'd like me to have a go at that.

@mhvk
Copy link
Contributor

mhvk commented Jan 11, 2021

Agreed that changing the installation during testing is not worth it. In any case, for this PR probably best to stick to your (very nice) reorganization.

@mhvk mhvk self-requested a review January 11, 2021 17:01
@mhvk
Copy link
Contributor

mhvk commented Jan 11, 2021

@pllim, @bsipocz, @eteq - could you have a quick look at this reorganization of the unit and angle parsers? I think this is really nice refactoring, which also solves thread-safety issues, but want to be sure you agree, in particular with having common code in the new utils.parsers module.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, and quite like that all those duplicates are gone and condensed into their separate module.
Some minor nitpicks, and maybe a test to cover the file generation, but none of them are big deal so I'm happy to see this merged without those.

@bsipocz
Copy link
Member

bsipocz commented Jan 11, 2021

As for backporting to LTS: I'm a bit worried to backport this much (not this PR, but generally all the threadsafe fixes) to LST. My intuitions may be wrong here, but I would rather not risk changing behaviour in for LTS.
We never promised to be threadsafe, so can always look at it that it will be a new feature for the next release rather than something that requires a fix in LTS.

@mhvk
Copy link
Contributor

mhvk commented Jan 11, 2021

@bsipocz - yes, I also worried this was getting too much, but wanted your thoughts. Let's stick with 4.3, as a nice new feature!

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! "Allowed failure" failures are expected and unrelated to this PR.

I also agree with @bsipocz that a test should be added to cover the missing coverage in parsing.py, specifically the _add_tab_header function. A simple test using tmpdir and a dummy template generated on-the-fly in tmpdir should do, I think? As for testing lex and yacc, I am not sure how easy it would be because it invokes __file__ and all that, so that one needs some thinking.

My review does not address the internals of the parsing logic, which I defer to @mhvk 's expertise.

CHANGES.rst Outdated

- Removed usage of deprecated ``ipython`` stream in ``utils.console``. [#10942]

- Add new ``utils.parsing`` module to with helper wrappers around ``ply``. [#11227]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better suited for "New Features" section, rather than "API change".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# You can then commit the changes to this file.
"""
import astropy.utils.parsing as parsing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: This is more consistent with the style of the existing imports.

Suggested change
import astropy.utils.parsing as parsing
from astropy.utils import parsing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (I merged it into the same line as another from astropy.utils import ...).

from astropy.units.utils import is_effectively_unity
from astropy.utils import classproperty
from astropy.utils.misc import did_you_mean
import astropy.utils.parsing as parsing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import astropy.utils.parsing as parsing
from astropy.utils import parsing

from .base import Base
from astropy.utils import classproperty
from astropy.utils.misc import did_you_mean
import astropy.utils.parsing as parsing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import astropy.utils.parsing as parsing
from astropy.utils import parsing


from . import core, generic, utils

import astropy.utils.parsing as parsing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import astropy.utils.parsing as parsing
from astropy.utils import parsing

Use from `astropy.utils import parsing` and not
`import astropy.utils.parsing as parsing`.
It covers the currently-public API. There is still discussion over
whether TAB_HEADER and/or _add_tab_header should be public.
Most of the coverage comes from the main test suite, but this covers the
case where the lextab/parsetab does not yet exist and has to be
generated.
@bmerry
Copy link
Contributor Author

bmerry commented Jan 12, 2021

As for testing lex and yacc, I am not sure how easy it would be because it invokes file and all that, so that one needs some thinking.

I added a test that writes Python modules with a trivial lexer and parser to a temporary directory, then imported them to generate table files. It turned out to be reasonably straightforward.

@bmerry
Copy link
Contributor Author

bmerry commented Jan 12, 2021

I think this is ready for another look; the one question left in my mind is whether TAB_HEADER and/or add_tab_header should be public or private.

@mhvk
Copy link
Contributor

mhvk commented Jan 12, 2021

The 32-bit error seems related...

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The only thing left is to remove the HEADER from __all__ again - you convinced me it was better.

import threading


__all__ = ['lex', 'TAB_HEADER', 'ThreadSafeParser', 'yacc']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @mhvk wants TAB_HEADER removed as per comment above?

Suggested change
__all__ = ['lex', 'TAB_HEADER', 'ThreadSafeParser', 'yacc']
__all__ = ['lex', 'ThreadSafeParser', 'yacc']

@bmerry
Copy link
Contributor Author

bmerry commented Jan 13, 2021

Looks great! The only thing left is to remove the HEADER from all again - you convinced me it was better.

Done (I also put the underscore back).

@bmerry
Copy link
Contributor Author

bmerry commented Jan 13, 2021

The 32-bit error seems related...

Indeed. My guess is that whatever parallelisation is being used doesn't play nicely with tmp_path. I'll take a look.

@bmerry
Copy link
Contributor Author

bmerry commented Jan 13, 2021

Indeed. My guess is that whatever parallelisation is being used doesn't play nicely with tmp_path. I'll take a look.

Actually it's not the parallel run, it's the one where it runs the tests twice in the same process. I'm guessing the modules that are dynamically loaded in the test persist in sys.modules. I'll have to find a workaround for that - possibly monkey-patching sys.modules, or else generating a unique name.

PLY has some restrictions about only defining one parser per module (no
doubt related to whatever deep magic it's using), so generate a unique
module name each time the test is run.
@bmerry
Copy link
Contributor Author

bmerry commented Jan 13, 2021

The 32-bit parallel build is passing now. readthedocs is failing, but I think that's because numpy's docs have vanished.

@mhvk
Copy link
Contributor

mhvk commented Jan 13, 2021

Great! Thanks for fixing that double-test problem. Merging...

@mhvk mhvk merged commit 7726650 into astropy:master Jan 13, 2021
@pllim
Copy link
Member

pllim commented Jan 13, 2021

Just to close the loop, RTD failure was transient, as it built fine after merge. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unit string parsing is not thread-safe

4 participants