Add SiteRM model by sprillo · Pull Request #73 · OATML-Markslab/ProteinGym

sprillo · 2025-03-07T08:28:25Z

Summary

This PR adds the SiteRM model from our NeurIPS 2024 paper "Ultrafast classical phylogenetic method beats large protein language models on variant effect prediction". SiteRM is an independent-sites model that, like EVMutation, relies only on MSA information, yet it performs remarkably well.
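To make "independent-sites" concrete, here is a minimal Python sketch of the simplest model of that kind. It estimates a per-column amino-acid profile from the MSA and scores a variant as the sum, over mutated sites, of log p_i(mutant) - log p_i(wild type). This is not SiteRM. As its name suggests, SiteRM instead estimates a rate matrix per site using cherryml, whereas this sketch only counts residue frequencies. The sketch also makes several assumptions: the MSA is uppercase with columns aligned one-to-one to the target sequence, no sequence reweighting is applied, and mutant strings are ProteinGym-style, such as A12G or A12G:K15R with 1-based positions. The names site_log_probs and score_mutant are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_log_probs(msa, pseudocount=1.0):
    """Estimate log p_i(a) for each MSA column i, independently of all other columns.

    Gaps and non-standard characters are skipped; the pseudocount keeps amino
    acids never observed in a column at a finite log-probability."""
    profile = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return profile

def score_mutant(profile, mutant):
    """Score e.g. 'A12G' or 'A12G:K15R' (1-based positions) as the sum of
    log p_i(mutant) - log p_i(wild type) over the mutated sites."""
    score = 0.0
    for sub in mutant.split(":"):
        wt, pos, mt = sub[0], int(sub[1:-1]), sub[-1]
        column = profile[pos - 1]
        score += column[mt] - column[wt]
    return score
```

Because each site contributes its own term and no term couples two positions, a multi-mutant score is simply the sum of its single-mutant scores. That additivity is the defining property of any independent-sites model.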

SiteRM scores both the DMS zero shot substitutions and the clinical zero shot substitutions datasets. The results are as follows:

DMS zero shot substitutions

Clinical zero shot substitutions

Reproducing results

To reproduce the results above, first install cherryml (pip install cherryml). Then follow the standard pipeline, e.g. for DMS zero shot substitutions:

$ cd scripts/scoring_DMS_zero_shot/
$ time bash scoring_SiteRM_substitutions.sh
$ bash merge_all_scores.sh && bash performance_substitutions.sh
$ cd ../..
$ cat benchmarks/DMS_zero_shot/substitutions/AUC/Summary_performance_DMS_substitutions_AUC.csv

The model is trained on the fly when scoring_SiteRM_substitutions.sh runs; the full run takes about 2 hours on my Mac. The number of cores used to parallelize the computation can be changed in scoring_SiteRM_substitutions.sh (the default is 8). Because of random seed initialization, results can vary slightly between runs, but they reproduce to within ±0.001 for all metrics.
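Because reruns can differ by up to ±0.001, checking a rerun against a reference summary calls for a tolerant comparison rather than an exact diff. Below is a minimal Python sketch of such a check. It assumes that you saved a copy of the reference summary CSV before rerunning, since the pipeline may overwrite it. It also assumes that both files list rows and columns in the same order. The helpers rows_match and summaries_match are hypothetical and not part of ProteinGym.

```python
import csv
import math

def rows_match(rows_a, rows_b, tol=1e-3):
    """Compare two lists of CSV rows cell by cell: text cells must be identical,
    numeric cells must agree within tol (a tiny epsilon absorbs float rounding)."""
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            try:
                x, y = float(a), float(b)
            except ValueError:
                # At least one cell is non-numeric: require an exact text match.
                if a != b:
                    return False
                continue
            if math.isnan(x) or math.isnan(y):
                # NaN only matches NaN.
                if not (math.isnan(x) and math.isnan(y)):
                    return False
            elif abs(x - y) > tol + 1e-12:
                return False
    return True

def summaries_match(path_a, path_b, tol=1e-3):
    """Load two CSV files and compare them with rows_match."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        return rows_match(list(csv.reader(fa)), list(csv.reader(fb)), tol)
```

For example, calling summaries_match with a saved copy (say, a hypothetical reference_AUC.csv) and benchmarks/DMS_zero_shot/substitutions/AUC/Summary_performance_DMS_substitutions_AUC.csv returns True when every numeric cell agrees within 0.001 and every text cell matches exactly.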

Thanks for this great benchmarking resource!

Best,
Sebastian Prillo

pascalnotin · 2025-03-09T02:40:46Z

Hi Sebastian -- congrats again on the paper! I was able to reproduce the results, and just merged into main. Thank you for the PR!
Pascal

Add SiteRM model

f9610e7

pascalnotin merged commit 445722b into OATML-Markslab:main Mar 9, 2025

pascalnotin merged 1 commit into OATML-Markslab:main from sprillo:siterm