-
Notifications
You must be signed in to change notification settings - Fork 28
add get_name patch/spool method #471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #471 +/- ##
=======================================
Coverage 99.85% 99.85%
=======================================
Files 115 115
Lines 9507 9548 +41
=======================================
+ Hits 9493 9534 +41
Misses 14 14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
@ahmadtourei, mind taking this for a spin to see if it behaves how you wanted ? |
dascore/utils/patch.py
Outdated
| Examples | ||
| -------- | ||
| >>> import dascore as dc | ||
| >>> from dascore.utils.patch import get_patch_names | ||
| >>> patch = dc.get_example_patch() | ||
| >>> name = get_patch_names(patch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name is:
0 DAS_______random__2017_09_18__2017_09_18T00_00_07
Name: network, dtype: objectWe get DAS_______random__2017_09_18__2017_09_18T00_00_07 using name[0] because the function returns a <class 'pandas.core.series.Series'>. Is there a specific reason that it returns a pandas series with Name: network, dtype: object (a little confusing) instead of the names in string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for the function to work with a spool it needs to return multiple names. I avoided returning multiple types (based on if the input is a Patch or not) from get_patch_names as that is a bit of an anti-pattern.
I understand the confusion though. I will add get_patch_name as a method of Patch that will do this. I think with the different name (get_patch_name vs get_patch_names) it will be clear.
| patch_data: pd.DataFrame | dc.Patch | dc.BaseSpool, | ||
| prefix="DAS", | ||
| attrs=("network", "station", "tag"), | ||
| coords=("time",), | ||
| sep="__", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need test(s) for optional inputs . For example, get_patch_names(patch, patch) does not raise an error.
tests/conftest.py
Outdated
| @pytest.fixture(scope="session") | ||
| def random_spool_directory(tmp_path_factory): | ||
| """A directory with a few patch files.""" | ||
| out = Path(tmp_path_factory.mktemp("one_file_file_spool")) | ||
| spool = ex.get_example_spool("random_das") | ||
| out_path = ex.spool_to_directory(spool, path=out) | ||
| return dc.spool(out_path).update() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have this as an example in dascore.examples as all of the example spools are memory spools.
dascore/utils/patch.py
Outdated
| def _get_filename(path_ser): | ||
| """Get the file name from a path series.""" | ||
| ser = path_ser.astype(str) | ||
| file_names = [x[-1].split(".")[0] for x in ser.str.split("/")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we introduce an optional argument (such as keep_extension) so the user can decide whether to get patch's name with or without file extension? I can work on this as I already implemented this in my version #461. What do you think?
ahmadtourei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have posted my comments below.
One general comment that I have is because this can be a very useful function, it would be great to create a shortcut for it (something like patch.name and spool.name so the user can quickly get name(s) in string format. Then we can include it in the Shortcuts section of dascore/docs/tutorial/patch.qmd. I assume users with limited DASCore experience cannot quickly locate it in dascore.utils.patch.
I agree. I think it should be a patch/spool method. Because there are optional parameters though, probably best not to make it an attribute (because then the users are always stuck with the defaults). |
|
Thanks @ahmadtourei for the great review. I have addressed all of your suggestions/concerns. Anything else you want to see in here? I also included a fix for the doc build so I am keen to merge this soon to see if the fix works. |
dascore/examples.py
Outdated
| return dc.spool(out) | ||
|
|
||
|
|
||
| @register_func(EXAMPLE_SPOOLS, key="radom_directory_das") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*random
| The "name" of the patch, which is the filename that would be used if the patch were saved to disk by DASCore, can be generated via the [`Spool.get_patch_names`](`dascore.BaseSpool.get_patch_names`) to get a series of the names of the managed patches, or [`Patch.get_patch_name`](`dascore.Patch.get_patch_name`) to get a string of the patch name. | ||
|
|
||
| ```python | ||
| import dascore as dc | ||
|
|
||
| patch = dc.get_example_patch() | ||
| spool = dc.get_example_spool() | ||
|
|
||
| patch_name = patch.get_patch_name() | ||
| patch_names = spool.get_patch_names() | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great!
Thanks, Derrick, for your work on this PR! It looks great. The only new comment I have is a typo I listed below. |
Description
This PR implements a
get_namemethod for the patch and the spool to return a series of names. The names are either generated by columns provided to the function (which has sensible defaults to maintain the old behavior) or by existing columns such asnameorpath.This PR supersede #461.
Checklist
I have (if applicable):