GaPTools is the dbGaP data validation tool. It is distributed as a Docker image on Docker Hub. See GaPTools.md for more information about the tool.
You must have Docker installed and working to run GaPTools. Docker is available for many operating systems, including most modern Linux distributions such as CentOS, Debian, and Ubuntu. Refer to the Docker installation documentation for instructions specific to your operating system.
To check that Docker works under your user account, run the following command and check for a response similar to the one below (your version and build numbers may differ). The minimum Docker version supported by GaPTools is 17.04.0. Note that docker -v only reports the client version; docker info additionally contacts the Docker daemon and fails if your account lacks permission to use it.
docker -v
Docker version 19.03.6, build 369ce74a3c
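The version check can be scripted. The sketch below compares the installed client version against the 17.04.0 minimum. It assumes GNU coreutils (for sort -V) and the "Docker version X.Y.Z, build ..." output format shown above; the function names are hypothetical and not part of GaPTools.

```shell
#!/usr/bin/env bash
# Sketch: check the installed Docker client against GaPTools' minimum
# supported version. Assumes GNU coreutils (sort -V) and the
# "Docker version X.Y.Z, build ..." output format of `docker -v`.

MIN_DOCKER_VERSION="17.04.0"

# Extract "X.Y.Z" from a line such as "Docker version 19.03.6, build 369ce74a3c".
parse_docker_version() {
  printf '%s\n' "$1" | sed -E 's/^Docker version ([0-9]+\.[0-9]+\.[0-9]+).*/\1/'
}

# Succeed (exit status 0) if version $1 >= version $2.
# sort -C exits 0 only when its input is already in (version) order.
version_ge() {
  printf '%s\n' "$2" "$1" | sort -V -C
}

check_docker_version() {
  local raw version
  if ! raw="$(docker -v 2>/dev/null)"; then
    echo "Docker is not installed or not on PATH" >&2
    return 1
  fi
  version="$(parse_docker_version "$raw")"
  # Guard against unexpected output formats before comparing.
  if ! [[ "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "Could not parse Docker version from: $raw" >&2
    return 1
  fi
  if version_ge "$version" "$MIN_DOCKER_VERSION"; then
    echo "Docker $version meets the minimum ($MIN_DOCKER_VERSION)"
  else
    echo "Docker $version is older than the minimum ($MIN_DOCKER_VERSION)" >&2
    return 1
  fi
}
```

To use it, save the functions to a file (for example a hypothetical check_docker.sh), then run: source check_docker.sh && check_docker_version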
GaPTools uses docker-compose to run multiple containers as a single service. See the Docker Compose installation documentation for details on installing docker-compose.
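Before going further, it can help to confirm that both tools are on your PATH. A minimal sketch, assuming (not confirmed here) that dbgap-docker.bash invokes the standalone docker-compose binary; require_cmds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm that required commands are on PATH before running GaPTools.
# Assumption: dbgap-docker.bash calls the standalone `docker-compose` binary.

# Succeeds if every command named in the arguments is found on PATH;
# reports each missing one on stderr.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: require_cmds docker docker-compose && docker-compose --version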
The Docker host running GaPTools needs access to the data files to be validated. The files can be on a local file system, on a Network File System (NFS) share, or in a cloud storage bucket. Files on an NFS share or in a cloud storage bucket must be mounted as a file system on the Docker host. Several FUSE-based tools are commonly used to mount cloud storage buckets as file systems on Linux servers.
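After mounting, it is worth confirming that the directory you will pass to GaPTools is actually visible and readable on the Docker host; an empty directory can indicate that the mount did not succeed. A minimal sketch (check_input_dir is a hypothetical helper, not part of GaPTools):

```shell
#!/usr/bin/env bash
# Sketch: check that a (possibly mounted) input directory is usable from the
# Docker host. Pass the same path you will give dbgap-docker.bash with -i.

# Succeeds if $1 is a readable, searchable directory with at least one entry.
check_input_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Not a directory (is the share or bucket mounted?): $dir" >&2
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "Directory is not readable by $(id -un): $dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "Directory is empty (mount point with nothing mounted?): $dir" >&2
    return 1
  fi
  echo "OK: $dir"
}
```

For example, for the sample study: check_input_dir ./input_files/1000_Genomes_Study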
GaPTools requires port 8080 to be free on the Docker host. Run the following command to check whether anything is already using port 8080; if it produces no output, the port is available.
netstat -an | grep "8080"
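The grep above matches any line containing 8080, so it can also report unrelated sockets, such as a service on port 18080 or an outbound connection to a remote port 8080. The sketch below is a stricter, hypothetical alternative that parses netstat -an output and only looks for TCP sockets in the LISTEN state on the given local port. It assumes the usual column layout, with the local address in the fourth column (for example 0.0.0.0:8080 on Linux or *.8080 on macOS).

```shell
#!/usr/bin/env bash
# Sketch: a stricter version of `netstat -an | grep "8080"` that only reports
# TCP sockets LISTENing on the given local port.

# Reads `netstat -an` output on stdin; succeeds if a TCP socket is listening
# on local port $1.
listening_on_port() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && /LISTEN/ {
      # Column 4 is the local address: 0.0.0.0:8080 or :::8080 (Linux),
      # *.8080 (macOS). The port is the last ":" or "." separated field.
      n = split($4, parts, "[:.]")
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}
```

For example: netstat -an | listening_on_port 8080 && echo "port 8080 is in use"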
Once all prerequisites are met, follow the instructions below to set up GaPTools. You can validate the setup with a sample study included with GaPTools. The input files for the sample study are in the input_files/1000_Genomes_Study/ directory of the cloned GaPTools GitHub repository.
For the sample study, we will have GaPTools generate all output files inside the output_files/1000_Genomes_Study/ directory.
git clone https://github.com/ncbi/gaptools
cd gaptools
mkdir -p output_files/1000_Genomes_Study
# Change file permissions to allow GaPTools to write output files on docker host
chmod -R o+w output_files
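The chmod above is needed because GaPTools writes its output from inside a container, presumably as a user other than the owner of output_files. A hedged sketch to confirm that every path under the output directory is writable by others; check_other_writable is a hypothetical name, and find -perm -o+w tests only the permission bits, not ACLs or mount options.

```shell
#!/usr/bin/env bash
# Sketch: list every path under $1 that is NOT writable by "others" and fail
# if any exist. Checks permission bits only (not ACLs or read-only mounts).
check_other_writable() {
  local bad
  bad="$(find "$1" ! -perm -o+w -print)"
  if [ -n "$bad" ]; then
    printf 'Not writable by others (GaPTools may fail to write here):\n%s\n' "$bad" >&2
    return 1
  fi
}
```

For example, after the chmod: check_other_writable output_files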
Once GaPTools is set up, run the following script from the root of the cloned gaptools repository (the directory entered with cd gaptools) to process the included sample study.
./dbgap-docker.bash -i ./input_files/1000_Genomes_Study/ -o ./output_files/1000_Genomes_Study -m ./input_files/1000_Genomes_Study/metadata.json up
GaPTools uses Apache Airflow behind the scenes as the workflow orchestrator for all validation tasks. To monitor the workflow and view the validation results, browse to the following URL:
http://<your_docker_host_ip>:8080
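The web UI may not respond immediately after the service starts. A minimal sketch that polls the URL until it answers, assuming curl is installed on the Docker host; wait_for_http is a hypothetical helper and the default 300-second timeout is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it returns a non-error HTTP response.
# curl -f treats HTTP status >= 400 as failure; redirects count as success.

# Usage: wait_for_http URL [TIMEOUT_SECONDS]
wait_for_http() {
  local url="$1" timeout_s="${2:-300}"
  local deadline=$((SECONDS + timeout_s))
  until curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "Timed out after ${timeout_s}s waiting for $url" >&2
      return 1
    fi
    sleep 2
  done
  echo "$url is responding"
}
```

For example, run on the Docker host itself: wait_for_http "http://localhost:8080" 600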
At the end of the workflow, the output files will be created under the specified output directory.
To run GaPTools on your own study, modify the command above and pass the following parameters:
-i -- path to the input files for your study
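-o -- path where output files should be generated (create this directory and make it writable, as done for the sample study above)
-m -- path to the manifest file for your study (metadata.json in the sample study)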
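The three parameters can be wrapped in a small helper that checks the inputs and prepares the output directory before starting the service. This is a hypothetical convenience wrapper, not part of GaPTools: run_gaptools and the DBGAP_DOCKER override variable are names introduced here for illustration, and by default it expects to be run from the repository root so that ./dbgap-docker.bash resolves.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around dbgap-docker.bash (not shipped with GaPTools).
# DBGAP_DOCKER is an illustrative override; by default the script is expected
# in the current directory, i.e. the root of the cloned gaptools repository.
DBGAP_DOCKER="${DBGAP_DOCKER:-./dbgap-docker.bash}"

# Usage: run_gaptools INPUT_DIR OUTPUT_DIR MANIFEST_FILE
run_gaptools() {
  if [ "$#" -ne 3 ]; then
    echo "Usage: run_gaptools INPUT_DIR OUTPUT_DIR MANIFEST_FILE" >&2
    return 2
  fi
  local input_dir="$1" output_dir="$2" manifest="$3"

  # Fail early instead of starting the containers with a bad path.
  if [ ! -d "$input_dir" ]; then
    echo "Input directory not found: $input_dir" >&2
    return 1
  fi
  if [ ! -f "$manifest" ]; then
    echo "Manifest file not found: $manifest" >&2
    return 1
  fi

  # Same preparation as for the sample study: create the output directory
  # and let the container write into it.
  mkdir -p "$output_dir" && chmod -R o+w "$output_dir" || return 1

  "$DBGAP_DOCKER" -i "$input_dir" -o "$output_dir" -m "$manifest" up
}
```

For the sample study, this reproduces the earlier command: run_gaptools ./input_files/1000_Genomes_Study ./output_files/1000_Genomes_Study ./input_files/1000_Genomes_Study/metadata.json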
Once your study has been processed, run the following command to stop the GaPTools service.
./dbgap-docker.bash down
If you have questions or want to report an issue, please contact us at: [email protected]