Advanced Walkthrough

Overview

The purpose of this walkthrough will be to illustrate an execution gNALI to find high-confidence loss-of-function variants of genes using additional optional parameters.

Input Data

The input file should contain a list of genes as HGNC symbols, separated by a newline character. Genes in non-standard formats will not be analyzed.

The example input file located in the following location:

inputs/genes.txt

The example input file contains the following contents and can be found here:

> cat inputs/genes.txt
CCR5
ALCAM

Running gNALI

gNALI only requires an input file, and has other optional parameters. We will use an input file with the path inputs/genes.txt and query gnomADv2.1.1. Since gnomADv2.1.1 is the default database, we do not have to specify it with the -d/--database parameter. To see what predefined filters are available with this database, we can use the help command (more info here). Available additional filters can be found here.

Next, we will filter the variants keeping only those with a non-zero number of homozygous control samples (available as a predefined filter) and alternate allele count greater than 2 (by specifying an additional filter). We will also generate a VCF file for variants passing filtering, generate population frequencies, and write our results to a directory called output-advanced.

Here is what such a command would look like:

gnali
    --input inputs/genes.txt
    --predefined_filters homozygous-controls
    --additional_filters "AC>2"
    --vcf
    --pop_freqs
    --output output-advanced/

Output

By default, gNALI will have two output files in output-advanced/: a basic output file, and a detailed output file. When using the -v/--vcf flag, a third additional output file will be generated. More information on outputs can be found here.

Output files for this example can be found here.

Basic Output

The basic output file contains a subset of the input genes, the ones that have high-confidence loss-of-function variants that pass filtering. It also contains a list of genes that could not be found in the specified database, if any.

The file shown below can also be found here.

output-advanced/Nonessential_Host_Genes_\(Basic\).txt:
HGNC_Symbol Status
CCR5 HC LoF found
ALCAM HC LoF found, failed filtering

Detailed Output

The detailed output file contains the high-confidence loss-of-function variants that pass filtering with some annotations extracted. Since we are using the --pop_freqs flag, we will also have the population frequency data added.

The file shown below can also be found here.

output-advanced/Nonessential_Host_Genes_\(Detailed\).txt:
Chromosome Position_Start RSID Reference_Allele Alternate_Allele Score Quality LoF_Variant LoF_Annotation HGNC_Symbol Ensembl Code HGVSc african-AC african-AN african-AF ashkenazi-jewish-AC ashkenazi-jewish-AN ashkenazi-jewish-AF european-non-finnish-AC european-non-finnish-AN european-non-finnish-AF finnish-AC finnish-AN finnish-AF south-asian-AC south-asian-AN south-asian-AF latino-AC latino-AN latino-AF east-asian-AC east-asian-AN east-asian-AF other-AC other-AF other-AN male-AC male-AF male-AN female-AC female-AN female-AF
3 46415066 rs146972949 C T 120238.89 PASS T stop_gained CCR5 ENSG00000160791 ENST00000292303.4:c.673C>T 23 16252 1.4152100000e-03 0 10016 0.0000000000e+00 8 113418 7.0535500000e-05 0 21590 0.0000000000e+00 0 30566 0.0000000000e+00 3 34516 8.6916200000e-05 0 18382 0.0000000000e+00 0 0.0000000000e+00 6108 14 1.0326900000e-04 135568 20 115280 1.7349100000e-04
3 46415066 rs146972949 C T 120238.89 PASS T stop_gained CCR5 ENSG00000160791 ENST00000343801.4:c.673C>T 23 16252 1.4152100000e-03 0 10016 0.0000000000e+00 8 113418 7.0535500000e-05 0 21590 0.0000000000e+00 0 30566 0.0000000000e+00 3 34516 8.6916200000e-05 0 18382 0.0000000000e+00 0 0.0000000000e+00 6108 14 1.0326900000e-04 135568 20 115280 1.7349100000e-04
3 46415066 rs146972949 C T 120238.89 PASS T stop_gained CCR5 ENSG00000160791 ENST00000445772.1:c.673C>T 23 16252 1.4152100000e-03 0 10016 0.0000000000e+00 8 113418 7.0535500000e-05 0 21590 0.0000000000e+00 0 30566 0.0000000000e+00 3 34516 8.6916200000e-05 0 18382 0.0000000000e+00 0 0.0000000000e+00 6108 14 1.0326900000e-04 135568 20 115280 1.7349100000e-04
3 46414943 rs775750898 TACAGTCAGTATCAATTCTGGAAGAATTTCCAG T 1947603.90 PASS - frameshift_variant CCR5 ENSG00000160791 ENST00000292303.4:c.554_585delGTCAGTATCAATTCTGGAAGAATTTCCAGACA 168 8706 1.9297000000e-02 35 290 1.2069000000e-01 1621 15392 1.0531400000e-01 478 3468 1.3783200000e-01 - - - 23 848 2.7122600000e-02 0 1558 0.0000000000e+00 102 9.3922700000e-02 1086 1289 7.3893600000e-02 17444 1138 13904 8.1847000000e-02
3 46414943 rs775750898 TACAGTCAGTATCAATTCTGGAAGAATTTCCAG T 1947603.90 PASS - frameshift_variant CCR5 ENSG00000160791 ENST00000343801.4:c.554_585delGTCAGTATCAATTCTGGAAGAATTTCCAGACA 168 8706 1.9297000000e-02 35 290 1.2069000000e-01 1621 15392 1.0531400000e-01 478 3468 1.3783200000e-01 - - - 23 848 2.7122600000e-02 0 1558 0.0000000000e+00 102 9.3922700000e-02 1086 1289 7.3893600000e-02 17444 1138 13904 8.1847000000e-02
3 46414943 rs775750898 TACAGTCAGTATCAATTCTGGAAGAATTTCCAG T 1947603.90 PASS - frameshift_variant CCR5 ENSG00000160791 ENST00000445772.1:c.554_585delGTCAGTATCAATTCTGGAAGAATTTCCAGACA 168 8706 1.9297000000e-02 35 290 1.2069000000e-01 1621 15392 1.0531400000e-01 478 3468 1.3783200000e-01 - - - 23 848 2.7122600000e-02 0 1558 0.0000000000e+00 102 9.3922700000e-02 1086 1289 7.3893600000e-02 17444 1138 13904 8.1847000000e-02

VCF Output

When using the -v/--vcf flag, variants passing filtering as well as headers from the database will be written to a VCF file.

The file shown below can also be found here.

> cat output-advanced/Nonessential_Gene_Variants.vcf

##fileformat=VCFv4.2
##hailversion=0.2.7-c860755b5da3
...
#CHROM  POS         ID          REF                                 ALT QUAL        FILTER  INFO
3       46415066    rs146972949 C                                   T   120238.89   PASS    AC=34;AN=250848;AF=1.35540e-04;...
3       46414943    rs775750898 TACAGTCAGTATCAATTCTGGAAGAATTTCCAG   T   1947603.90  PASS    AC=2427;AN=31348;AF=7.74212e-02;...