Conversation
…usage to notebook.
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
Hi @alimanfoo . Do we know how we want to allow any value in the
def prep_samples_for_cohort_grouping(
*, df_samples, area_by, period_by, taxon_by="taxon"
):
# Take a copy, as we will modify the dataframe.
df_samples = df_samples.copy()
# Fix "intermediate" or "unassigned" taxon values - we only want to build
# cohorts with clean taxon calls, so we set other values to None.
loc_intermediate_taxon = (
df_samples[taxon_by].str.startswith("intermediate").fillna(False)
)
df_samples.loc[loc_intermediate_taxon, taxon_by] = None
loc_unassigned_taxon = (
df_samples[taxon_by].str.startswith("unassigned").fillna(False)
)
df_samples.loc[loc_unassigned_taxon, taxon_by] = None
# Add period column.
if period_by == "year":
make_period = _make_sample_period_year
elif period_by == "quarter":
make_period = _make_sample_period_quarter
elif period_by == "month":
make_period = _make_sample_period_month
else: # pragma: no cover
raise ValueError(
f"Value for period_by parameter must be one of 'year', 'quarter', 'month'; found {period_by!r}."
)
sample_period = df_samples.apply(make_period, axis="columns")
df_samples["period"] = sample_period
# Add area column for consistent output.
df_samples["area"] = df_samples[area_by]
return df_sampleswith def _make_sample_period_month(row):
year = row.year
month = row.month
if year > 0 and month > 0:
return pd.Period(freq="M", year=year, month=month)
else:
return pd.NaT
def _make_sample_period_quarter(row):
year = row.year
month = row.month
if year > 0 and month > 0:
return pd.Period(freq="Q", year=year, month=month)
else:
return pd.NaT
def _make_sample_period_year(row):
year = row.year
if year > 0:
return pd.Period(freq="Y", year=year)
else:
return pd.NaT |
|
@cclarkson @ahernank @jonbrenas Any thoughts or knowledge regarding the above question for this issue? |
|
Your guess is probably as good as mine. Maybe allow to use a column that already exists as "period" or a dictionary of queries? It might be worth asking for details at the next tools meeting. |
|
Restricting this PR to just adding the "taxon_by" param. I've split the original issue into two sub-issues, in order to try to address the |
|
To do: add to other |
|
To do: investigate and resolve potential merge conflict issue around |
…frq_base.py build_cohorts_from_sample_grouping()
*_frequencies_advanced() functions
…_advanced(). Merge util.py prep_samples_for_cohort_grouping() into frq_base.py.
|
Now that I have some clarity on the sibling issue #723 I'm tempted to include it under this PR, and then update the PR title accordingly. |
…uping(). Add nb test. Remove old _make_sample_period_...() funcs from util.py.
*_frequencies_advanced() functions*_frequencies_advanced() functions and allow the "period_by" param to specify a column name
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #694 +/- ##
==========================================
- Coverage 96.13% 95.97% -0.16%
==========================================
Files 47 47
Lines 4683 4699 +16
==========================================
+ Hits 4502 4510 +8
- Misses 181 189 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
To do: increase test coverage, in light of codecov test failures. |
…eriod_by as random_year
… period_by as random_year
alimanfoo
left a comment
There was a problem hiding this comment.
Looking good, just a couple of minor suggestions re parameter checking.
jonbrenas
left a comment
There was a problem hiding this comment.
None of my comments are required changes and I think @leehart addressed all the concerns expressed by @alimanfoo (but I will let @alimanfoo confirm that it is so).
…with underscore when non-default taxon_by col
Re: parent issue #391
Re: issues #722 and #723