Hello all,
I’m looking for the best way to install NeMo on my SLURM cluster. The cluster is currently CPU-only, with no GPUs, but I plan to add GPUs later. In the meantime, I’d like to start experimenting with NeMo using small models (under 1B parameters), since anything larger is unlikely to be practical to run or train on CPUs alone.
I’ve been trying to install the NeMo framework on the cluster but haven’t had much success; I can’t even get a simple “Hello, World” program working. I want to confirm whether my installation approach is correct.
I looked into using Docker, but my impression is that it’s better suited to a single-machine setup. Since I need NeMo available across multiple nodes, I suspect the pip installation method is more appropriate. However, when I tried to install nemo-curator via pip, I ran into an error.
For my setup, is there anything specific I need to do to make the pip installation work? Or should I explore Docker further despite using SLURM?
Thanks!