gene_cnv_frequencies - fixes pandas fragment warning#68
Conversation
|
Hi @cclarkson, suggestion to resolve this... Instead of this block of code: ...store new columns in a dict, then use pd.concat, e.g.: |
|
@alimanfoo - works with a dict now, should be good to go. |
alimanfoo
left a comment
There was a problem hiding this comment.
Hi @cclarkson, this looks good.
I added a couple of comments about possible ways to reduce the number of times dataframes are copied. Because the dataframe is likely small here then copies are not necessarily a problem, but it's probably not a bad practice to be thinking about when copies are made and if they are necessary or not - could stand us in good stead if/when we have to handle larger dataframes. Up to you how to address, merge when you feel ready.
malariagen_data/ag3.py
Outdated
| freq_cols[f"{coh}_del"] = del_freq_coh | ||
|
|
||
| # build a dataframe with the frequency columns | ||
| df_freqs = pandas.DataFrame.from_dict(freq_cols) |
There was a problem hiding this comment.
Nit, I don't think from_dict is necessary, can just use DataFrame constructor.
malariagen_data/ag3.py
Outdated
|
|
||
| # set gene ID as index for convenience | ||
| df.set_index("ID", inplace=True) | ||
| df = df.set_index("ID") |
There was a problem hiding this comment.
This creates a copy of the dataframe. Here the dataframe is likely to be small and so extra copies probably don't hurt, but it seems unnecessary. Using inplace=True avoids the dataframe copy.
malariagen_data/ag3.py
Outdated
| df = df.reset_index(drop=True) | ||
| df = pandas.concat([df, df_freqs], axis=1) |
There was a problem hiding this comment.
| df = df.reset_index(drop=True) | |
| df = pandas.concat([df, df_freqs], axis=1) | |
| df = pandas.concat([df, df_freqs], axis=1, ignore_index=True) |
Calling reset_index causes a dataframe copy. As with comment below, copying a relatively small dataframe probably doesn't hurt, but in this case could be avoided by using concat with ignore_index=True which will cause the columns to just get stacked up without trying to do any kind of join.
resolves #67
Making a
df.copy()of the dataframe each iteration consolidates the df and stops the warning.