Add SiteRM model #73

Merged
pascalnotin merged 1 commit into OATML-Markslab:main from sprillo:siterm
Mar 9, 2025
Conversation

sprillo commented Mar 7, 2025

Summary

This PR adds the SiteRM model from our NeurIPS 2024 paper "Ultrafast classical phylogenetic method beats large protein language models on variant effect prediction". SiteRM is an independent-sites model that relies only on MSA information (like EVMutation) and yet does remarkably well.

SiteRM scores both the DMS zero shot substitutions and the clinical zero shot substitutions datasets. The results are as follows:

DMS zero shot substitutions

image

Clinical zero shot substitutions

image

Reproducing results

To reproduce the results above, first make sure cherryml is installed (pip install cherryml). Then follow the standard pipeline, e.g. for DMS zero shot substitutions:

$ cd scripts/scoring_DMS_zero_shot/
$ time bash scoring_SiteRM_substitutions.sh
$ bash merge_all_scores.sh && bash performance_substitutions.sh
$ cd ../..
$ cat benchmarks/DMS_zero_shot/substitutions/AUC/Summary_performance_DMS_substitutions_AUC.csv

The model is trained on the spot when running scoring_SiteRM_substitutions.sh, which takes ~2 hours on my Mac. The number of cores used to parallelize the computation can be changed in the same script (the default is 8). The results reproduce to within ±0.001 for all metrics; the small run-to-run variation comes from random seed initialization.
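For anyone who wants to check a rerun against a saved copy of the summary CSV, here is a minimal sketch of a ±0.001 tolerance comparison. It is not part of this PR. The column names (`Model_name`, `Average_AUC`) and the numbers in the demo are made-up placeholders, so adjust the key column to whatever the actual summary file uses.

```python
import csv
import io


def load_metrics(csv_text, key_column):
    """Map each row's key to its numeric cells; non-numeric cells are skipped."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        metrics = {}
        for name, value in row.items():
            if name == key_column:
                continue
            try:
                metrics[name] = float(value)
            except (TypeError, ValueError):
                continue  # text column or empty cell
        table[row[key_column]] = metrics
    return table


def find_mismatches(reference, reproduced, tolerance=0.001):
    """Return (key, metric, ref, new) tuples that differ by more than tolerance."""
    mismatches = []
    for key, ref_metrics in reference.items():
        new_metrics = reproduced.get(key, {})
        for name, ref_value in ref_metrics.items():
            new_value = new_metrics.get(name)
            # Small slack so float rounding at the boundary is not flagged.
            if new_value is None or abs(ref_value - new_value) > tolerance + 1e-9:
                mismatches.append((key, name, ref_value, new_value))
    return mismatches


if __name__ == "__main__":
    # Placeholder data only; column names and values are hypothetical.
    saved = "Model_name,Average_AUC\nModelA,0.7500\n"
    rerun = "Model_name,Average_AUC\nModelA,0.7504\n"
    diffs = find_mismatches(load_metrics(saved, "Model_name"),
                            load_metrics(rerun, "Model_name"))
    print("mismatches:", diffs)
```

To use it on real files, read each CSV with `open(path).read()` and pass the text to `load_metrics`. A metric missing from the rerun is reported as a mismatch with `None` as the new value.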

Thanks for this great benchmarking resource!

Best,
Sebastian Prillo

@pascalnotin pascalnotin merged commit 445722b into OATML-Markslab:main Mar 9, 2025
@pascalnotin

Hi Sebastian -- congrats again on the paper! I was able to reproduce the results, and just merged into main. Thank you for the PR!
Pascal
