
Improve neighbour-joining tree performance#641

Merged
alimanfoo merged 12 commits into master from
njt-refactor-2024-10-15-alimanfoo
Oct 17, 2024



@alimanfoo alimanfoo commented Oct 15, 2024

  • Use anjl instead of biotite to build neighbour-joining trees. anjl is capable of building trees for 20,000+ samples in around 5 minutes. Resolves Neighbour-joining tree performance #627.
  • Cache the results of tree construction to avoid repeated computation.
  • Use anjl for tree plotting.
  • Also work towards Anopheles refactor #366.
  • Along the way, fix issues encountered when plotting data that includes missing values.

  • Create a new module malariagen_data.anoph.distance.
  • Add a class AnophelesDistanceAnalysis.
  • Move the public method biallelic_pairwise_distances() and associated private methods from the AnophelesDataResource class to the new AnophelesDistanceAnalysis class.
  • Add a public method njt() to the new class, which returns an array Z and saves results to the results cache.
  • Add a private method _njt(), called from within njt() when the result is not cached, to run the actual neighbour-joining. It calls biallelic_pairwise_distances() internally to obtain a distance matrix, then anjl.rapid_nj() to compute a tree.
  • Add a public method plot_njt() to the new class which calls njt() first to obtain a tree then calls anjl.plot() to create a plot.
  • Remove the plot_njt() method from the AnophelesDataResource class.
  • Add AnophelesDistanceAnalysis as a parent class of AnophelesDataResource.
  • Add tests to get coverage of the new AnophelesDistanceAnalysis class.
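
The njt() / _njt() split described above can be sketched roughly as follows. This is a minimal illustration of the cache-then-compute pattern, not the code from this PR. The class names DistanceAnalysisSketch and DataResourceSketch, the region and n_samples parameters, the in-memory dict cache and the toy_tree_builder function are all assumptions made up for the example. In the PR the tree is computed by anjl.rapid_nj(), and the results cache may well persist to disk.

```python
import hashlib
import json

import numpy as np


def toy_tree_builder(dist):
    """Hypothetical stand-in for anjl.rapid_nj(). This is NOT neighbour-joining.

    It returns one row per sample (nearest neighbour index and distance),
    only so that the sketch has an array to cache.
    """
    masked = dist + np.diag(np.full(dist.shape[0], np.inf))
    return np.column_stack([masked.argmin(axis=1), masked.min(axis=1)])


class DistanceAnalysisSketch:
    """Hypothetical, simplified stand-in for AnophelesDistanceAnalysis."""

    def __init__(self, tree_builder=toy_tree_builder):
        self._tree_builder = tree_builder
        self._results_cache = {}  # assumption: a plain in-memory dict
        self.n_tree_builds = 0  # counter added only to make caching visible

    def biallelic_pairwise_distances(self, region, n_samples=5):
        # Placeholder data: distances between random points, seeded from the
        # region name so that repeated calls give identical matrices.
        seed = int(hashlib.md5(region.encode()).hexdigest()[:8], 16)
        points = np.random.default_rng(seed).random((n_samples, 3))
        diff = points[:, None, :] - points[None, :, :]
        return np.sqrt((diff**2).sum(axis=-1))

    def njt(self, region, n_samples=5):
        # Public method: build a cache key from the parameters, and only
        # compute the tree if no result is stored under that key.
        params = dict(region=region, n_samples=n_samples)
        key = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()
        if key not in self._results_cache:
            self._results_cache[key] = self._njt(**params)
        return self._results_cache[key]

    def _njt(self, region, n_samples):
        # Private method: distance matrix first, then the tree.
        self.n_tree_builds += 1
        dist = self.biallelic_pairwise_distances(region, n_samples=n_samples)
        return self._tree_builder(dist)


class DataResourceSketch(DistanceAnalysisSketch):
    """Inherits njt(), as AnophelesDataResource does from its new parent class."""


resource = DataResourceSketch()
Z_first = resource.njt("3L")  # computed and stored
Z_again = resource.njt("3L")  # served from the cache, no second build
```

Keying the cache on the full parameter set means that a call with different parameters triggers a fresh computation, while repeated calls with the same parameters reuse the stored array.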


@alimanfoo alimanfoo mentioned this pull request Oct 15, 2024
@alimanfoo alimanfoo requested a review from leehart October 15, 2024 23:09

@leehart leehart left a comment


Awesome. Thanks @alimanfoo 👍

@alimanfoo alimanfoo merged commit 319df58 into master Oct 17, 2024
@alimanfoo alimanfoo deleted the njt-refactor-2024-10-15-alimanfoo branch October 17, 2024 22:10
@alimanfoo alimanfoo added the BMGF-068808 Work supported by BMGF grant INV-068808 (MalariaGEN 2024-2027). label Dec 4, 2024