Is it possible to collapse SVs within known groups? #228

@Han-Cao

Hi,

I would like to use Truvari to merge SVs deconstructed from a pangenome graph. The VCF usually has many records with a lot of similar alleles (e.g., differing by only 1 bp). An example from the HPRC VCF is given below:

chr1	591437	>26051>26006	TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAA	T,TAGAAGGAATAAGACCGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGCGCGGTGGCTCACGCCTGGAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCAGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCCGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACCGCACTCCAGCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCAGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGG
AGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAA	60	.	AC=36,4,1,1,1,1,25,9;AF=0.461538,0.0512821,0.0128205,0.0128205,0.0128205,0.0128205,0.320513,0.115385;AN=78;AT=>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011>26007>26006,>26051>26006,>26051>26050<26049>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<26009<26008>26007>26006,>26051>26050>26048>26047>26046>26044>26042>26041>26040>26038>26036>26035>26033>26032>26030>26029>26028>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011>26007>26006,>26051>26050>26048>26047>26045>26044>26043>26041>26039>26038>26036>26035>26034>26032>26031>26029>26027>26026>26024>26023>26022>26020>26019>26017>26016>26014>26013>26011>26010<26009<26008>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26037>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<26009<26008>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26025>26023>26021>26020>26018>26017>26015>26014>26012>26011>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<2
6009<26008>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<26008>26007>26006;NS=43;LV=0	GT	1	.|.	8|8	1|7	7|7	2|7	2|2	7|7	7|7	.|1	.|1	7|.	3|1	7|.	7|1	1|2	1|1	7|7	7|7	7|7	1|7	5|8	1|7	1|8	8|7	.|7	8|8	1|.	1|1	7|1	1|1	1|1	7|7	4|1	1|1	.|8	1|8	1|1	.|.	1|1	1|1	1|1	1|1	1|7	6|1

Because all the records in the VCF are non-overlapping, I expect that most of the redundant SVs are in the same multi-allelic record. Therefore, I would like to collapse alleles within a multi-allelic record.

I understand that Truvari doesn't process all alleles in a multi-allelic record. But if I assign a unique group ID to each multi-allelic record (e.g., INFO/GROUP) and split it into bi-allelic records, would it be possible for Truvari to compare SVs only within the same group and then collapse them as usual?
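To make the preprocessing concrete, here is a minimal sketch of the splitting step, assuming plain tab-separated VCF data lines with GT as the first FORMAT key. The function names `split_multiallelic` and `recode_gt` and the `GROUP` tag are hypothetical and not part of Truvari or any existing tool. Only AC and AF are treated as per-ALT fields; other INFO fields (such as AT) are copied unchanged, and genotypes carrying a different ALT allele are recoded to 0, which is a simplifying choice. The VCF header would also need a matching `##INFO=<ID=GROUP,...>` line.

```python
import re


def recode_gt(gt, allele_index):
    """Recode a GT string so allele_index becomes 1 and every other called allele becomes 0."""
    parts = re.split(r"([|/])", gt)
    recoded = []
    for part in parts:
        if part in ("|", "/", "."):
            # Keep phasing separators and missing calls as they are.
            recoded.append(part)
        else:
            recoded.append("1" if int(part) == allele_index else "0")
    return "".join(recoded)


def split_multiallelic(line, group_id):
    """Split one multi-allelic VCF data line into bi-allelic lines tagged with INFO/GROUP."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    fmt = fields[8:9]
    samples = fields[9:]
    alts = alt.split(",")
    info_items = [] if info == "." else info.split(";")

    records = []
    for i, allele in enumerate(alts, start=1):
        new_info = []
        for item in info_items:
            key, sep, value = item.partition("=")
            values = value.split(",")
            # Assumption: only AC and AF hold one value per ALT allele.
            if key in ("AC", "AF") and len(values) == len(alts):
                value = values[i - 1]
            new_info.append(key + sep + value)
        # Hypothetical tag shared by all alleles from the same original record.
        new_info.append(f"GROUP={group_id}")

        new_samples = []
        for sample in samples:
            # Assumption: GT is the first FORMAT key, so it is the first sample subfield.
            gt, sep, rest = sample.partition(":")
            new_samples.append(recode_gt(gt, i) + sep + rest)

        records.append("\t".join(
            [chrom, pos, vid, ref, allele, qual, flt, ";".join(new_info)]
            + fmt + new_samples
        ))
    return records


if __name__ == "__main__":
    # Small synthetic record, not taken from the HPRC VCF.
    example = "chr1\t100\tid1\tA\tT,G\t60\t.\tAC=2,1;AF=0.5,0.25;AN=4\tGT\t1|2\t.|1\t2"
    for record in split_multiallelic(example, "G1"):
        print(record)
```

After this step, every bi-allelic record from the same original site carries the same GROUP value, so the remaining question is only whether the comparison can be restricted to records that share it.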

Thanks a lot!
