EMU is software for performing principal component analysis (PCA) on genetic datasets in the presence of missingness. EMU handles both random and non-random missingness by modelling it directly through a truncated SVD approach. EMU takes binary PLINK files as input.
Please cite our paper in Bioinformatics
# Option 1: Build and install via PyPI
pip install emu-popgen
# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/emu.git
cd emu
pip install .
# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/emu.git
conda env create -f emu/environment.yml
conda activate emu

You can now run the program with the emu command.
If you run into issues with your installation on an HPC system, the cause may be a mismatch of CPU architectures between login and compute nodes (illegal instruction error). You can try removing every instance of the march=native compiler flag in the setup.py file; this flag optimizes emu for the specific hardware it is built on. Alternatively, you can use the uv package manager to run emu in a temporary, isolated environment by simply adding uvx in front of the emu command.
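For the first workaround, it can help to see where the flag occurs before editing anything. The sketch below only lists the matching lines; `find_march_native` is a hypothetical helper name, and since the exact form of the flag in setup.py is not shown here, the removal itself is left as a manual edit.

```shell
# Hypothetical helper: print every line (with its line number) that
# mentions march=native in the given file (defaults to setup.py).
find_march_native() {
    grep -n -- 'march=native' "${1:-setup.py}"
}

# Example: run from the root of the cloned repository
# find_march_native setup.py
```

After removing the listed occurrences, reinstall with pip install . so the extension is rebuilt without the flag.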
# uv tool run example
uvx emu --bfile test --eig 2 --threads 64 --out test.emu

Provide emu with the file prefix of the PLINK files.
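To illustrate what the file prefix refers to: a standard binary PLINK fileset consists of three files sharing one prefix (.bed, .bim and .fam), so the prefix test stands for test.bed, test.bim and test.fam. The sketch below checks that all three exist before a run; `check_plink_prefix` is a hypothetical helper, not part of emu, and it assumes emu expects the standard trio.

```shell
# Hypothetical helper: verify that the three binary PLINK files
# (.bed, .bim, .fam) exist for the given prefix.
check_plink_prefix() {
    prefix="$1"
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    return 0
}

# Example: only start emu if the fileset for the prefix "test" is complete
# check_plink_prefix test && emu --bfile test --eig 2 --threads 64 --out test.emu
```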
# Check help message of the program
emu -h
# Model and extract 2 eigenvectors using the EM-PCA algorithm
emu --bfile test --eig 2 --threads 64 --out test.emu
# Use 2 eigenvectors for modelling but extract 10 eigenvectors
emu --bfile test --eig 2 --eig-out 10 --threads 64 --out test.emu

Very memory-efficient variant of emu for large-scale datasets.
# Example run using '--mem' argument
emu --mem --bfile test --eig 2 --threads 64 --out test.emu.mem