Skip to content

Implement skani as fastani alternative#30

Merged
wwood merged 8 commits intowwood:mainfrom
AroneyS:implement-skani
Aug 5, 2023
Merged

Implement skani as fastani alternative#30
wwood merged 8 commits intowwood:mainfrom
AroneyS:implement-skani

Conversation

@AroneyS
Copy link
Copy Markdown
Collaborator

@AroneyS AroneyS commented Jul 14, 2023

Add skani as fastani alternative

  • Refactor fastani implementation to allow enumeration
    • Assert threshold > 1 (in new method?)
    • Move find_representatives and find_memberships back into clusterer.rs
    • Pass Clusterer through above functions so it needs only implement calculate_ani
    • Need get_threshold method?
  • Add skani as fastani alternative
    • Implement calculate_skani
  • Add to cli

@AroneyS AroneyS changed the title Implement skani Implement skani as fastani alternative Jul 19, 2023
Copy link
Copy Markdown
Owner

@wwood wwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh I think that's roughly the idea

num_kmers: 1000,
kmer_length: 21,
},
&crate::skani::SkaniClusterer { threshold: 99.0 },
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher than for fastani. Skani gives ANI of >98% for all pairs measured.

@AroneyS
Copy link
Copy Markdown
Collaborator Author

AroneyS commented Jul 25, 2023

Few steps closer. Still need to add to cli.
Also may be faster to let skani handle the fileio in bulk, like in https://github.com/bluenote-1577/skani-lib-example/blob/main/src/main.rs. But then we would need independent logic for skani/fastani, since it would calculate all pairs up-front. Could add it to the initialise method for skani and then reference a look-up table in calculate_ani...

@wwood
Copy link
Copy Markdown
Owner

wwood commented Jul 26, 2023

Are you thinking per-disconnected component after the preclusterer? Or just in total? If the latter then no point in preclustering, I think.

@AroneyS
Copy link
Copy Markdown
Collaborator Author

AroneyS commented Jul 26, 2023

Not sure. Both are possibilities, though skani doesn't recommend comparing genomes with <82% ANI, so we would have to deal with that if we skip preclustering, right? Though it says "If the resulting aligned fraction for the two genomes is < 15%, no output is given.", so maybe <82% just doesn't give an answer, rather than giving an unreliable answer.

options: fastani, skani
fastani_min_aligned_threshold,
fastani_fraglen,
),
Preclusterer::Dashing { min_ani, threads } => match self.clusterer {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting a bit unwieldy. I tried a let preclusterer = match, but it doesn't work since they are different types. Should the Preclusterer/Clusterer enum's be defined using the underlying structs instead of dummy ones?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that here we have to define the behaviour for every combination of clusterer and preclusterer, right?

I think the answer is yes, well enum or dyn, up to you

@AroneyS
Copy link
Copy Markdown
Collaborator Author

AroneyS commented Jul 27, 2023

Also, I get this warning on compile: warning: the following packages contain code that will be rejected by a future version of Rust: buf_redux v0.8.4, partitions v0.2.4

@AroneyS
Copy link
Copy Markdown
Collaborator Author

AroneyS commented Jul 27, 2023

--cluster-method isn't in the --full-help. Is there somewhere that I missed? Or do I have to rebuild docs?

@AroneyS AroneyS requested a review from wwood July 27, 2023 23:15
@wwood
Copy link
Copy Markdown
Owner

wwood commented Jul 28, 2023

I didn't go through every line, but seems about good. I think you need to add skani to the conda yml, and can you enable runs on PR using

on: [push, pull_request]

in the actions yml please?

@wwood
Copy link
Copy Markdown
Owner

wwood commented Aug 5, 2023

--cluster-method isn't in the --full-help. Is there somewhere that I missed? Or do I have to rebuild docs?

You added that argument, so won't show up until docs are redployed from main/release.

@wwood wwood marked this pull request as ready for review August 5, 2023 22:12
@wwood wwood merged commit 1c3d905 into wwood:main Aug 5, 2023
@AroneyS AroneyS deleted the implement-skani branch August 6, 2023 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants