Alternative source of protein domains #78

Merged: pbnjay merged 4 commits into master from repr-domains on Jan 27, 2024

@matthiasblum (Collaborator) commented Jan 27, 2024

lollipops fetches Pfam domains and additional motif features using the InterPro REST API. If there are no Pfam domains or motif features, the REST API returns HTTP status 204 (No Content). Currently, the code checks that every response has HTTP status 200 and exits if it does not. This pull request changes this behavior to accept 204 responses (no regions are drawn in that case).

Example for CCDC61 (no Pfam annotation)

./lollipops -legend -show-motifs -show-disordered -w=1400 -o CCDC61.png CCDC61 R21Q Q63T G102A

[Image: CCDC61.png]


This pull request introduces a new option, -D <source>, that selects where protein domains come from. It defaults to Pfam (matching the current behavior) but also accepts InterPro (-D interpro). With InterPro, lollipops retrieves and draws representative domains, selected from all domain and repeat databases in the InterPro consortium (including CDD, NCBIfam, Pfam, PROSITE, and SMART).
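A minimal Go sketch of how a `-D` option with a `pfam` default might be parsed and validated using the standard `flag` package. The `parseDomainSource` helper is hypothetical, and only `-D` is defined here (the real tool also has `-w`, `-o`, `-legend`, and others); accepting exactly the lowercase values `pfam` and `interpro` is an assumption based on the commands in this PR.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseDomainSource is a hypothetical sketch of the -D option: it defaults to
// "pfam" (the previous behavior) and also accepts "interpro". Positional
// arguments after the flags (gene symbol, variants) are returned unchanged.
func parseDomainSource(args []string) (string, []string, error) {
	fs := flag.NewFlagSet("lollipops", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the sketch's output
	src := fs.String("D", "pfam", "domain source: pfam or interpro")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	switch *src {
	case "pfam", "interpro":
		return *src, fs.Args(), nil
	default:
		return "", nil, fmt.Errorf("unsupported domain source %q (want pfam or interpro)", *src)
	}
}

func main() {
	src, rest, err := parseDomainSource(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("domain source:", src, "args:", rest)
}
```

Using a `FlagSet` with `ContinueOnError` instead of the global `flag.Parse` keeps the parser testable: an invalid value returns an error rather than exiting the process.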

Example for CCDC61 (no Pfam annotation, but CDD annotation)

./lollipops -legend -show-motifs -show-disordered -D interpro -w=1400 -o CCDC61.png CCDC61 R21Q Q63T G102A

[Image: CCDC61.png]

Sometimes two protein family and domain databases describe the same domain with slightly different boundaries. With -D interpro, only the "best" of these annotations, typically the longest one, is drawn.
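To illustrate the "typically the longest" idea, here is a Go sketch that keeps an annotation only if it does not overlap a longer one already kept. This is a simplified heuristic assumed for illustration: InterPro selects representative domains on its own side, and its actual rules may differ. The `Domain` type, `pickLongest`, and all coordinates below are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Domain is a hypothetical annotation with a 1-based, inclusive residue range.
type Domain struct {
	DB, Name   string
	Start, End int
}

// Len returns the number of residues covered by the annotation.
func (d Domain) Len() int { return d.End - d.Start + 1 }

// overlaps reports whether two inclusive ranges share at least one residue.
func overlaps(a, b Domain) bool { return a.Start <= b.End && b.Start <= a.End }

// pickLongest considers annotations from longest to shortest and keeps each
// one only if it overlaps none already kept (illustrative, not InterPro's
// algorithm). The result is ordered by start position.
func pickLongest(all []Domain) []Domain {
	cands := append([]Domain(nil), all...)
	sort.SliceStable(cands, func(i, j int) bool { return cands[i].Len() > cands[j].Len() })
	var kept []Domain
	for _, c := range cands {
		clash := false
		for _, k := range kept {
			if overlaps(c, k) {
				clash = true
				break
			}
		}
		if !clash {
			kept = append(kept, c)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Start < kept[j].Start })
	return kept
}

func main() {
	// Hypothetical coordinates: two databases describing the same domain
	// with slightly different boundaries, plus one unrelated domain.
	anns := []Domain{
		{"Pfam", "C2", 350, 480},
		{"CDD", "C2", 330, 485},
		{"SMART", "Other", 600, 700},
	}
	for _, d := range pickLongest(anns) {
		fmt.Printf("%s %s %d-%d\n", d.DB, d.Name, d.Start, d.End)
	}
}
```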

Example with PIK3CA:

./lollipops -labels -legend -D pfam -w=1400 -o PIK3CA-Pfam.png PIK3CA N345K

[Image: PIK3CA-Pfam.png]

The Pfam PI3K-type C2 domain does not include the N345K variant.

./lollipops -labels -legend -D interpro -w=1400 -o PIK3CA-InterPro.png PIK3CA N345K

[Image: PIK3CA-InterPro.png]

However, the same domain described by CDD does.
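Whether an annotation "includes" a variant comes down to an interval check on the residue position. A Go sketch, assuming 1-based inclusive coordinates and simple substitutions written like `N345K`; `variantPos`, `covers`, and the boundaries used below are hypothetical and are not the real Pfam or CDD coordinates for PIK3CA.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// variantRe matches simple substitutions such as "N345K":
// reference residue, 1-based position, alternate residue (or * for stop).
var variantRe = regexp.MustCompile(`^([A-Z])(\d+)([A-Z*])$`)

// variantPos is a hypothetical helper returning the position of a simple substitution.
func variantPos(v string) (int, error) {
	m := variantRe.FindStringSubmatch(v)
	if m == nil {
		return 0, fmt.Errorf("unrecognized variant %q", v)
	}
	return strconv.Atoi(m[2])
}

// covers reports whether the 1-based inclusive range [start, end] contains pos.
func covers(start, end, pos int) bool { return start <= pos && pos <= end }

func main() {
	pos, err := variantPos("N345K")
	if err != nil {
		panic(err)
	}
	// Hypothetical boundaries for two annotations of the same domain.
	fmt.Println("narrower annotation covers N345K:", covers(350, 480, pos))
	fmt.Println("wider annotation covers N345K:", covers(330, 485, pos))
}
```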


Finally, when the -o <filename> option is not provided, the graphic is saved as <gene-symbol>.svg. However, if a UniProtKB accession is given instead of a gene symbol (-U <accession>), the graphic is saved as .svg, with an empty base name. This pull request fixes this by using the UniProt accession as the output file name.
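A minimal Go sketch of the default file-name logic, assuming the fixed name takes the form `<accession>.svg` and that the gene symbol is empty when `-U` is used. `defaultOutputName` is a hypothetical helper, and `P12345` is only a placeholder accession.

```go
package main

import "fmt"

// defaultOutputName is a hypothetical sketch of the fix: when -o is not
// given, name the file after the gene symbol, or after the UniProtKB
// accession when the protein was selected with -U instead.
func defaultOutputName(geneSymbol, uniprotAcc string) string {
	base := geneSymbol
	if base == "" {
		// Without this fallback, base stays empty and the file is named ".svg".
		base = uniprotAcc
	}
	return base + ".svg"
}

func main() {
	fmt.Println(defaultOutputName("CCDC61", ""))
	fmt.Println(defaultOutputName("", "P12345")) // placeholder accession
}
```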

@matthiasblum matthiasblum marked this pull request as ready for review January 27, 2024 13:55
@matthiasblum matthiasblum requested a review from pbnjay January 27, 2024 13:55
@pbnjay (Member) commented Jan 27, 2024

Nicely done. Thank you!

@pbnjay pbnjay merged commit e1da885 into master Jan 27, 2024
@matthiasblum matthiasblum deleted the repr-domains branch August 23, 2024 17:08