Closed
Description
Hi, the code below is problematic for two reasons:
CoolBox/coolbox/core/track/gtf.py
Lines 109 to 123 in 36a86b2
name_attr = self.properties.get("name_attr", "auto")
if name_attr == "auto":
    gene_name = df['attribute'].str.extract(".*gene_name (.*?) ").iloc[:, 0].str.strip('\";')
    if gene_name.hasnans:
        gene_id = df['attribute'].str.extract(".*gene_id (.*?) ").iloc[:, 0].str.strip('\";')
        gene_name.fillna(gene_id, inplace=True)
        if gene_name.hasnans:
            pos_str = df['seqname'].astype(str) + ":" +\
                      df['start'].astype(str) + "-" +\
                      df['end'].astype(str)
            gene_name.fillna(pos_str, inplace=True)
    df['feature_name'] = gene_name
else:
    df['feature_name'] = df['attribute'].str.extract(f".*{name_attr} (.*?) ").iloc[:, 0].str.strip('\";')
return df
- It does not do a sanity check for NaN values when name_attr is not set to auto. This means that any NaN will be passed as a label to DnaFeaturesViewer, and the code will crash because it tries to split a float.
- The regex will not work if name_attr is the last attribute in the list, because the pattern requires a space after the value.
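To make the second point concrete, here is a minimal sketch that compares the current pattern with the adjusted pattern proposed in this issue. The two attribute strings are made-up examples, not taken from a real GTF file; only pandas is assumed.

```python
import pandas as pd

# Hypothetical attribute strings. In the first one gene_name is the last
# attribute, so its value is followed by ";" and no trailing space.
attrs = pd.Series([
    'gene_id "G1"; gene_name "ABC";',
    'gene_id "G2"; gene_name "DEF"; level 2;',
])

# Current pattern: requires a space after the captured value.
old = attrs.str.extract(".*gene_name (.*?) ").iloc[:, 0].str.strip('";')

# Adjusted pattern: the value may be terminated by a space or a semicolon.
new = attrs.str.extract(".*gene_name (.*?)(?:[ ;])").iloc[:, 0].str.strip('";')

print(old.isna().tolist())  # the first row is NaN with the current pattern
print(new.tolist())
```

The NaN in the first row is what later ends up as a label when no fallback is applied.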
The code can be fixed by doing the NaN sanity check outside of the if..else and adjusting the regular expression pattern, as follows:
name_attr = self.properties.get("name_attr", "auto")
if name_attr == "auto":
    gene_name = df['attribute'].str.extract(".*gene_name (.*?) ").iloc[:, 0].str.strip('\";')
    if gene_name.hasnans:
        gene_id = df['attribute'].str.extract(".*gene_id (.*?) ").iloc[:, 0].str.strip('\";')
        gene_name.fillna(gene_id, inplace=True)
else:
    gene_name = df['attribute'].str.extract(f".*{name_attr} (.*?)(?:[ ;])").iloc[:, 0].str.strip('\";')
if gene_name.hasnans:
    pos_str = df['seqname'].astype(str) + ":" +\
              df['start'].astype(str) + "-" +\
              df['end'].astype(str)
    gene_name.fillna(pos_str, inplace=True)
df['feature_name'] = gene_name
return df
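For anyone who wants to try the idea outside of CoolBox, below is a self-contained sketch of the same logic. The function names `extract_attr` and `add_feature_name` and the sample DataFrame are hypothetical and not part of CoolBox. The sketch also departs from the snippet above in three ways, all of which are my assumptions: it applies the adjusted pattern in the auto branch too, it escapes `name_attr` with `re.escape`, and it uses non-inplace `fillna`.

```python
import re
import pandas as pd


def extract_attr(attributes: pd.Series, key: str) -> pd.Series:
    """Extract the value of `key` from GTF attribute strings (NaN if absent)."""
    # The value may be terminated by a space or by a semicolon.
    pattern = rf".*{re.escape(key)} (.*?)(?:[ ;])"
    return attributes.str.extract(pattern).iloc[:, 0].str.strip('";')


def add_feature_name(df: pd.DataFrame, name_attr: str = "auto") -> pd.DataFrame:
    """Return a copy of `df` with a NaN-free `feature_name` column."""
    if name_attr == "auto":
        name = extract_attr(df["attribute"], "gene_name")
        if name.hasnans:
            name = name.fillna(extract_attr(df["attribute"], "gene_id"))
    else:
        name = extract_attr(df["attribute"], name_attr)
    # The positional fallback runs for both branches.
    if name.hasnans:
        pos_str = (df["seqname"].astype(str) + ":" +
                   df["start"].astype(str) + "-" +
                   df["end"].astype(str))
        name = name.fillna(pos_str)
    out = df.copy()
    out["feature_name"] = name
    return out


# Made-up rows: one with gene_name last, one with only gene_id, one with neither.
df = pd.DataFrame({
    "seqname": ["chr1", "chr1", "chr2"],
    "start": [100, 500, 900],
    "end": [200, 800, 1000],
    "attribute": ['gene_id "G1"; gene_name "ABC";', 'gene_id "G2";', 'note "x";'],
})
auto = add_feature_name(df)
custom = add_feature_name(df, name_attr="gene_name")
print(auto["feature_name"].tolist())
print(custom["feature_name"].tolist())
```

With the fallback outside the if..else, rows that lack the requested attribute get a position string such as chr2:900-1000 instead of NaN, so no float reaches the label code.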
Hope this helps.