Allele dosages in VCFs count REF allele, _should_ count ALT to match header, marker-effects #3906

@wolfemd

I have noticed a possible bug with allele dosages on Cassavabase.

I am looking at data in the:
Genotyping Protocol: IITA DArT-GBS 08 Aug 2021

Allele dosages in downloaded data appear to count the REF allele.
According to the header of the VCF:
##FORMAT=<ID=DS,Number=A,Type=Float,Description="estimated ALT dose [P(RA) + P(AA)]">

However, the DS field in the VCF, which I downloaded using the Wizard, and the corresponding tab-separated dosage file both appear to count the REF allele. Can someone confirm this? I could be crazy :) Or it could be only this dataset.
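For anyone who wants to reproduce the check, here is a rough sketch of how DS could be compared against the genotype calls. It assumes biallelic, diploid sites and that each record carries both GT and DS in its FORMAT column; the function names `alt_count` and `ds_direction` are made up for illustration and are not part of Cassavabase.

```python
def alt_count(gt):
    """Count non-REF alleles in a diploid GT string such as '0|1' or '1/1'."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None  # missing genotype
    return sum(1 for a in alleles if a != "0")


def ds_direction(vcf_lines):
    """Tally whether DS sits closer to the ALT count or to the REF count (2 - ALT)."""
    tally = {"ALT": 0, "REF": 0, "tie": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")  # FORMAT column
        if "GT" not in keys or "DS" not in keys:
            continue
        gt_i, ds_i = keys.index("GT"), keys.index("DS")
        for sample in fields[9:]:
            parts = sample.split(":")
            if len(parts) <= max(gt_i, ds_i):
                continue
            n_alt = alt_count(parts[gt_i])
            if n_alt is None or parts[ds_i] == ".":
                continue
            ds = float(parts[ds_i])
            d_alt = abs(ds - n_alt)
            d_ref = abs(ds - (2 - n_alt))
            if d_alt < d_ref:
                tally["ALT"] += 1
            elif d_ref < d_alt:
                tally["REF"] += 1
            else:
                tally["tie"] += 1  # heterozygotes are uninformative
    return tally
```

If the "REF" tally dominates among homozygous calls, that would be consistent with DS counting the REF allele rather than ALT.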

I think the DS field is calculated by Cassavabase, is that correct? I guess so, because the source VCF for the genotyping protocol (which I provided) does not have the DS field. I very much appreciate that the DB computes dosages and adds them to the VCF! It just needs a tweak.

My use case / why this matters:
I am doing predictions that use the phased haplotypes extracted from the VCF file. I need the genomic-predicted marker-effects (computed using the dosages as predictors in an RR-BLUP model) to measure the effect of the "1" allele in the VCF (the ALT allele). To ensure this, in the past, I have used the haplotypes extracted from the VCF to manually compute my own dosages (summing the two haplotypes for each individual).
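The manual workaround described above (summing the two haplotypes per individual) could look roughly like this. It is only a sketch, assuming phased, biallelic, diploid genotypes; `haplotype_dosage` and `dosages_from_record` are hypothetical helper names, not anything from the database code.

```python
def haplotype_dosage(gt):
    """Sum the two phased haplotype alleles ('0|1' -> 1).

    Returns None for unphased, missing, or non-biallelic calls.
    """
    hap = gt.split("|")
    if len(hap) != 2:
        return None  # unphased ('0/1') or not diploid
    if any(a not in ("0", "1") for a in hap):
        return None  # missing ('.') or multi-allelic code
    return int(hap[0]) + int(hap[1])


def dosages_from_record(line):
    """Return per-sample ALT dosages for one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    gt_i = fields[8].split(":").index("GT")  # position of GT in FORMAT
    return [haplotype_dosage(s.split(":")[gt_i]) for s in fields[9:]]
```

Because the "1" allele is ALT, the resulting dosage counts ALT copies (0, 1, or 2), which is what the marker effects need to refer to.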

Related to: #3853

Hope this makes sense, happy to clarify! Thanks!

Labels

Type: Bug (issue describes a bug); Type: Implementation (issue proposes a non-feature change to implementation)
