Closed
Hi,
I would like to use Truvari to merge SVs deconstructed from a pangenome graph. The VCF usually has many records containing a lot of highly similar alleles (e.g., differing by only 1 bp). An example from the HPRC VCF is given below:
chr1 591437 >26051>26006 TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAA T,TAGAAGGAATAAGACCGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGCGCGGTGGCTCACGCCTGGAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCAGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCCGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACCGCACTCCAGCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCAGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGG
AGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAAA,TAGAAGGAATAAGACGGGCCGGGTGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAGGTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCATGGTGGTGGGCGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGAGCGACAGAGTGAGACTCTGTCTCAAAAAAAAAAAAAAAAAA 60 . AC=36,4,1,1,1,1,25,9;AF=0.461538,0.0512821,0.0128205,0.0128205,0.0128205,0.0128205,0.320513,0.115385;AN=78;AT=>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011>26007>26006,>26051>26006,>26051>26050<26049>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<26009<26008>26007>26006,>26051>26050>26048>26047>26046>26044>26042>26041>26040>26038>26036>26035>26033>26032>26030>26029>26028>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011>26007>26006,>26051>26050>26048>26047>26045>26044>26043>26041>26039>26038>26036>26035>26034>26032>26031>26029>26027>26026>26024>26023>26022>26020>26019>26017>26016>26014>26013>26011>26010<26009<26008>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26037>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<26009<26008>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26025>26023>26021>26020>26018>26017>26015>26014>26012>26011>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<2
6009<26008>26007>26006,>26051>26050>26048>26047>26045>26044>26042>26041>26039>26038>26036>26035>26033>26032>26030>26029>26027>26026>26024>26023>26021>26020>26018>26017>26015>26014>26012>26011<26008>26007>26006;NS=43;LV=0 GT 1 .|. 8|8 1|7 7|7 2|7 2|2 7|7 7|7 .|1 .|1 7|. 3|1 7|. 7|1 1|2 1|1 7|7 7|7 7|7 1|7 5|8 1|7 1|8 8|7 .|7 8|8 1|. 1|1 7|1 1|1 1|1 7|7 4|1 1|1 .|8 1|8 1|1 .|. 1|1 1|1 1|1 1|1 1|7 6|1
Because all the records in the VCF are non-overlapping, I expect that most of the redundant SVs are in the same multi-allelic record. Therefore, I would like to collapse alleles within a multi-allelic record.
I understand that Truvari doesn't process all alleles in a multi-allelic record. But if I assign a unique group ID to each multi-allelic record (e.g., INFO/GROUP) and split it into bi-allelic records, would it be possible for Truvari to compare only SVs within the same group and then collapse them as usual?
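The preprocessing described above (tag each multi-allelic record with a group ID, then split it into bi-allelic records) could be sketched roughly as follows. This is only an illustration using plain text handling from the Python standard library. The function name `split_multiallelic` and the `GROUP` tag are hypothetical. The sketch assumes GT is the first FORMAT field, that any INFO value with exactly one entry per ALT allele is a per-allele list (such as AC or AF), and that other ALT alleles may be recoded as 0 in each split record.

```python
import re


def recode_gt(gt, allele_index):
    """Recode a genotype for one ALT: that allele -> 1, missing stays '.', all others -> 0."""
    parts = re.split(r"([|/])", gt)  # keeps the phasing separators in the result
    return "".join(
        p if p in {"|", "/", "."} else ("1" if int(p) == allele_index else "0")
        for p in parts
    )


def split_multiallelic(line, group_id):
    """Split one tab-separated VCF data line into bi-allelic lines tagged GROUP=<group_id>."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    info_entries = [] if fields[7] == "." else fields[7].split(";")
    records = []
    for i, alt in enumerate(alts):
        info = []
        for entry in info_entries:
            key, sep, value = entry.partition("=")
            values = value.split(",")
            # Heuristic (assumption): one value per ALT means a per-allele field.
            if sep and len(values) == len(alts):
                value = values[i]
            info.append(key + sep + value)
        info.append(f"GROUP={group_id}")  # hypothetical tag name
        samples = []
        for sample in fields[9:]:
            gt, sep, rest = sample.partition(":")
            samples.append(recode_gt(gt, i + 1) + sep + rest)
        new_fields = fields[:4] + [alt] + fields[5:7] + [";".join(info)] + fields[8:9] + samples
        records.append("\t".join(new_fields))
    return records


if __name__ == "__main__":
    # Toy record, not taken from the HPRC VCF.
    toy = "chr1\t100\t.\tTA\tT,TAA,TAAA\t60\t.\tAC=2,1,1;AN=6;NS=3\tGT\t1|2\t.|3\t0|1"
    for record in split_multiallelic(toy, "G1"):
        print(record)
```

A real VCF would also need a matching `##INFO=<ID=GROUP,...>` header line, and the split alleles would not be trimmed or left-aligned by this sketch.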
Thanks a lot!