Add unrestricted_use_only and surveillance_use_only constructor params#724
Add unrestricted_use_only and surveillance_use_only constructor params#724
unrestricted_use_only and surveillance_use_only constructor params#724Conversation
unrestricted_use_only and surveillance_use_only constructor params
…_releases property.
|
Opening this for a WIP review, to potentially avoid going too far down the wrong path. @alimanfoo @cclarkson @ahernank @jonbrenas I'm trying to identify other functions that we need to filter results for, according to:
Those seem like the main ones, which some other functions use, but I want to make sure we plug all potential leaks. |
|
|
|
It looks like Ag3 currently has about 137 public methods... 🤔 |
|
It also looks like about 119 of Ag3's public methods cannot be called without specifying params (cannot rely on defaults), which makes the testing of these constructor params somewhat difficult to automate. This also smells a lot like a "god object" anti-pattern https://en.wikipedia.org/wiki/God_object We should probably consider re-organising all of those methods, despite the inconvenience, but that will probably have to wait. In the meantime, I hope to be able to figure out which functions are vulnerable to leaking unfiltered data, relating to either the surveillance-only or unrestricted-use-only flags. |
|
@ahernank @jonbrenas Checkmarks indicate whether the function gets its data from an upstream public function, or file, or param, or otherwise looks covered. Unchecked functions indicate some doubt and require further investigation, discussion or coding.
|
unrestricted_use_only and surveillance_use_only constructor paramsunrestricted_use_only and surveillance_use_only constructor params
|
Now investigating unexpected test failures after merging with master branch. Local pytest:
CI pytest:
|
|
Thanks @leehart. For IGV to work, it needed to have access to a 'public' version of the reference genomes, e.g., at |
|
Submitting this for review while we figure out |
# Apply the sample_query or sample_indices, if specified.
if prepared_sample_query is not None:
# Assume a pandas query string.
sample_query_options = sample_query_options or {}
df_samples = df_samples.query(prepared_sample_query, **sample_query_options)
df_samples = df_samples.reset_index(drop=True)
# Determine which samples match the sample query.
if sample_query != "":
loc_samples = df_samples.eval(sample_query, **sample_query_options)
if prepared_sample_query is not None:
# Resolve query to a list of integers for more cache hits - we
# do this because there are different ways to write the same pandas
# query, and so it's better to evaluate the query and use a list of
# integer indices instead.
df_samples = self.sample_metadata(sample_sets=prepared_sample_sets)
sample_query_options = sample_query_options or {}
loc_samples = df_samples.eval(
prepared_sample_query, **sample_query_options
).values
sample_indices = np.nonzero(loc_samples)[0].tolist()Basically, it seems to be around using Pandas |
|
Since the I'm tempted to force the use of the less efficient
|
|
Hi @jonbrenas @ahernank - I've tried to elucidate the problem with What is the
|
…le_indices. Add sample_query_options to biallelic_snps_to_plink.
|
Plan to amend the behaviour for the following functions, with regards to using the
|
|
Thanks @leehart. It is kind of wild (and also wrong) that we had three different potential behaviours for the interaction between As far as I am concerned, |
|
Thanks @jonbrenas. I think I understand some of your concerns, but I reckon I still need some clarification. I guess it might boil down to how the users are (or should be) determining the samples_df = ag3_default.sample_metadata(sample_sets=sample_set)Then it would seem natural to the user to expect the same behaviour for an object with samples_df = ag3_surveillance.sample_metadata(sample_sets=sample_set)In this context, the user is not explicitly passing any However, if the user is getting their As far as I can tell, these
Banning the use of I'm not sure I understand the solution regarding In any case, it would be good to have a clear understanding and position on this. Ideally, I think we want to preserve existing functionality and expectations, and minimise potential confusion and complexity, while keeping the user experience as unsurprising and intuitive as possible. I feel like I'm missing something or overlooking something important here! |
|
Thanks @leehart. Sorry, I had misinterpreted the way these parameters were going to be used and, as a result, my comment didn't make much sense. I thought that |
|
Ah, I see, thanks @jonbrenas . Yeah, these params are only going to be used (AFAIK) during object construction, e.g. |
|
OK, I think I've sorted out the
It seems good to at least have an internal method of getting an unfiltered list of samples though, e.g. via |
|
Investigating CI test failures, which didn't fail locally: [ It looks like that was one of those transient random failures, due to simulated test data. Resolved by re-running. ] |
|
@jonbrenas Just to note that this code hasn't been reviewed or merged in yet, so work on the advanced course might need to be provisional or require changes if something here needs to change. But I expect the general behaviour will be the same. |
|
Thanks @leehart. Yes, the work on the advanced course is going to be provisional, it is also going to be mostly textual (i.e., explaining why some sample sets are restricted, why some samples might be biased in a way that means they are not to be used for surveillance, ...). I don't remember if we had agreed that we would meet for you to show what has changed how and why or if you thought that I could figure it out my own. If the latter, I'll get to reviewing this PR asap. |
|
Hi @jonbrenas . Maybe take a look and take some notes, and then we can meet to discuss, if you like. There's quite a bit, so we might need to do a few rounds. Or you might think it all looks OK! 🚀 |
|
@jonbrenas Let's try to meet to go through this PR in September. |
jonbrenas
left a comment
There was a problem hiding this comment.
It looked good to me last time I checked and the additional tests I ran all worked fine.
|
Also approved by @ahernank . Merging in now. |
Re: issue #716