Transformers-Labs is a research-oriented repository for experimenting with, benchmarking, and fine-tuning transformer-based models. It provides a unified platform for training, evaluation, quantization, and deployment of state-of-the-art models, with robust support for multimodal and distributed workflows, and is designed for reproducible research, scalable experimentation, and rapid prototyping in both academic and industrial settings.
This repository exists to advance research in transformer architectures, quantization techniques, and large-scale model deployment. Key goals include:
- Benchmarking transformer models across hardware and cloud platforms
- Fine-tuning and evaluating models for NLP and multimodal tasks
- Experimenting with quantization (GPTQ, 4/8-bit) and efficient inference
- Integrating with cloud infrastructure (Azure, AWS SageMaker)
- Supporting multimodal research (video, text)
Capabilities include:
- Model training and evaluation pipelines
- Inference benchmarking (latency, throughput, accuracy)
- Quantization and deployment workflows
- Infrastructure-as-code for reproducible cloud setups
- Multimodal experimentation (video-llava)
- `model-train/` – Jupyter notebooks and scripts for model training and fine-tuning
- `model-eval/` – Evaluation pipelines, metrics, and analysis notebooks
- `inference-benchmark/` – Scripts for benchmarking inference performance
- `optimum-benchmark/` – Advanced benchmarking using HuggingFace Optimum
- `sagemaker-benchmark/`, `sagemaker-labs/` – AWS SageMaker integration for distributed training and benchmarking
- `terraform/azure-workstation/` – Terraform scripts for provisioning Azure GPU workstations
- `video-llava/` – Multimodal (video+text) model experimentation
- `AutoGPTQ/` – GPTQ quantization, CUDA builds, and extension modules
- `mistral-common/` – Utilities and shared code for Mistral models
- `requirements.txt`, `pyproject.toml` – Python dependencies and environment configuration
- `benchmarks/`, `model-info/`, `model/` – Model artifacts, configs, and benchmark results
- Transformer fine-tuning (BERT, T5, LLaMA, Mistral, etc.)
- GPTQ quantization (4/8-bit) via AutoGPTQ
- HuggingFace TRL integration for RLHF and advanced training
- SageMaker benchmarking and distributed training
- Azure infrastructure provisioning with Terraform
- CUDA-enabled PyTorch builds for efficient GPU utilization
- Multimodal research (video-llava)
- Inference benchmarking and reporting
- Environment management with Conda and .env files
- Code formatting and linting with Ruff
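As a concrete illustration of what the 4/8-bit quantization listed above does to model weights, here is a minimal pure-Python sketch of round-to-nearest affine quantization — the baseline scheme that GPTQ refines with error-compensating updates. This is illustrative only, not the AutoGPTQ API:

```python
def quantize_dequantize(weights, bits=4):
    """Round-to-nearest affine quantization of a list of floats to `bits`-bit
    unsigned integers, then back to floats. Returns (ints, reconstructed)."""
    qmax = (1 << bits) - 1           # 15 for 4-bit, 255 for 8-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0  # guard against all-equal weights
    ints = [round((w - lo) / scale) for w in weights]
    recon = [q * scale + lo for q in ints]
    return ints, recon

ints, recon = quantize_dequantize([-0.5, 0.0, 0.25, 0.5], bits=4)
# each reconstructed weight is within half a quantization step of the original
```

Halving `bits` from 8 to 4 halves storage but doubles the quantization step, which is why GPTQ's per-column error compensation matters at 4-bit.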
- CUDA Toolkit (>=11.x recommended)
- NVIDIA drivers (latest)
- GCC (>=9.x)
- pkg-config, libmysqlclient-dev (for some quantization/builds)
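A quick way to confirm the CUDA Toolkit prerequisite from Python, using only the standard library. The parsing assumes nvcc's usual `release X.Y` banner line and returns `None` when `nvcc` is not on `PATH`:

```python
import shutil
import subprocess

def cuda_toolkit_version():
    """Return the installed CUDA toolkit version (e.g. '12.1'),
    or None if nvcc is not on PATH or its output is unrecognized."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    # nvcc typically prints: "Cuda compilation tools, release 12.1, V12.1.105"
    for line in out.splitlines():
        if "release" in line:
            return line.split("release")[1].split(",")[0].strip()
    return None
```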
```bash
conda env create -f environment.yml
conda activate transformers-labs
```

Key packages:
- torch
- transformers
- trl
- optimum
- auto-gptq
- langchain
- evaluate
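To verify the environment is complete, the installed versions of the key packages can be checked with the standard library alone. The names below follow the list above and are PyPI distribution names (`auto-gptq` is assumed to be the distribution name of AutoGPTQ):

```python
from importlib.metadata import version, PackageNotFoundError

REQUIRED = ["torch", "transformers", "trl", "optimum", "auto-gptq", "langchain", "evaluate"]

def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report
```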
- Store the Hugging Face token and other secrets in `.env`
- Example:

```
HF_TOKEN=your_huggingface_token
AWS_ACCESS_KEY_ID=...
AZURE_SUBSCRIPTION_ID=...
```
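If you prefer not to depend on python-dotenv, a minimal loader for a file like the one above can be sketched in a few lines. It handles only simple `KEY=VALUE` lines; quoting and variable interpolation are out of scope:

```python
import os

def load_dotenv(path=".env"):
    """Minimal .env loader: export KEY=VALUE lines into os.environ.
    Existing environment variables are left untouched."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```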
- See the `model-train/` and `model-eval/` notebooks for training and evaluation workflows
- Example:

```bash
# Train
jupyter nbconvert --to notebook --execute model-train/train-gpt2.ipynb
# Evaluate
jupyter nbconvert --to notebook --execute model-eval/eval.ipynb
```
- Use the scripts in `inference-benchmark/` and `optimum-benchmark/`
- Example:

```bash
python inference-benchmark/benchmark.py
```
- See `sagemaker-benchmark/` and `sagemaker-labs/` for distributed training and benchmarking
- Example:

```bash
python sagemaker-benchmark/run_benchmark.py
```
- Use the Terraform scripts in `terraform/azure-workstation/`
- Example:

```bash
cd terraform/azure-workstation
terraform init
terraform apply -auto-approve
```
- See `video-llava/` for video+text model workflows
- Run benchmarking pipelines in `inference-benchmark/`, `optimum-benchmark/`, and `benchmarks/`
- Results are stored in CSV/JSON format for reproducibility
- Example:

```bash
python inference-benchmark/benchmark.py --model gpt2 --output results/gpt2_benchmark.csv
```
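The measurement loop behind a latency benchmark like the command above can be sketched as follows. `benchmark` and `write_results` are illustrative names, not the repo's actual functions, but the warmup-then-percentile pattern is standard practice for latency reporting:

```python
import csv
import statistics
import time

def benchmark(fn, warmup=3, iters=20):
    """Time `fn` over `iters` runs after discarding `warmup` runs;
    return mean and p95 latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup runs absorb caching and JIT effects
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return {
        "mean_ms": statistics.fmean(samples),
        "p95_ms": statistics.quantiles(samples, n=20)[-1],  # 19 cut points; last is p95
    }

def write_results(path, model, stats):
    """Append one benchmark row to a CSV file, mirroring the repo's CSV output."""
    with open(path, "a", newline="") as fh:
        csv.writer(fh).writerow([model, f"{stats['mean_ms']:.3f}", f"{stats['p95_ms']:.3f}"])
```

Reporting a percentile alongside the mean matters because inference latency distributions are typically long-tailed.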
- Interpret results using the provided analysis notebooks in `model-eval/`
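For quick inspection outside the notebooks, a per-model summary of a results CSV can be computed with the standard library alone. The `model` and `latency_ms` column names are assumptions about the result schema; adapt them to the repo's actual output:

```python
import csv
import statistics

def summarize(path, metric="latency_ms"):
    """Group a benchmark CSV by model and report mean/stdev of one metric."""
    by_model = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            by_model.setdefault(row["model"], []).append(float(row[metric]))
    return {
        model: {
            "mean": statistics.fmean(vals),
            "stdev": statistics.stdev(vals) if len(vals) > 1 else 0.0,
        }
        for model, vals in by_model.items()
    }
```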
- Code formatting and linting: use Ruff (`ruff format ...` to format, `ruff check ...` to lint)
- Jupyter/interactive workflow: use `%load_ext autoreload` and `%autoreload 2` for live code reload
- Debugging: common issues include CUDA setup, missing NVIDIA drivers, and unset environment variables
- Use `.env` for secrets and tokens
- Extend to new transformer architectures (e.g., Mixtral, Phi-3)
- Larger scale distributed experiments (multi-node, multi-GPU)
- Advanced quantization and pruning strategies
- Multimodal fusion and cross-modal benchmarks
- Integration with additional cloud providers (GCP, OCI)
- Automated hyperparameter tuning and experiment tracking
- Fork the repository and submit pull requests
- Add new models, benchmarks, or infrastructure scripts
- Cite this work in academic publications
- See `CONTRIBUTING.md` for guidelines
- Hugging Face Transformers
- PyTorch
- CUDA Toolkit
- AutoGPTQ
- TRL
- Optimum
- SageMaker
- Azure Machine Learning
This repository is licensed under the MIT License. See LICENSE for details.