This repository is a comprehensive resource for experimenting with DSPy—a framework for declarative language model pipelines. It includes code, Jupyter notebooks, runnable demos, tests, and documentation for evaluating, orchestrating, and optimizing LLM workflows. The materials support research and practical exploration of DSPy’s approach to efficient, scalable, and self-improving AI systems.
Traditional prompt engineering for LLMs is limited by manual trial-and-error, lack of reproducibility, and poor scalability. DSPy reframes LLM orchestration as a machine learning problem—using datasets, evaluation metrics, and hyperparameter optimization to automate and improve pipeline design.
Key Insights (YouTube walkthrough):
- Limitations of Prompt Engineering: Manual, brittle, hard to scale, and resource-intensive.
- DSPy’s ML Mindset: Treats LLM pipelines as trainable programs, enabling systematic evaluation, optimization, and reproducibility.
- Benefits:
- Higher efficiency and scalability.
- Lower resource requirements (fewer API calls, less manual tuning).
- Modular composition for building complex systems.
Explore the following system architecture diagrams for DSPy pipelines:
- Basic Company Analysis Pipeline (No Compilation):
This diagram shows a straightforward DSPy pipeline for company analysis using declarative modules and prompt engineering. The workflow processes inputs and generates outputs without automated optimization, illustrating the baseline approach before leveraging DSPy’s compilation features.
- Compiled & Optimized Company Analysis Pipeline:
Here, the pipeline incorporates DSPy’s compilation and optimization capabilities. Prompts and module parameters are automatically tuned using datasets and evaluation metrics, resulting in improved accuracy and efficiency for company analysis tasks.
- Comprehensive Multi-Stage Pipeline with Composable Compiled Programs:
This diagram presents an advanced DSPy workflow for company and market analysis. Multiple compiled and optimized modules are composed to handle diverse subtasks—such as fact retrieval, social media analysis, comparative evaluation, and scoring. The architecture demonstrates scalable composition, iterative improvement, and robust evaluation across the pipeline.
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines – Introduces DSPy’s core concepts and compilation approach.
- DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines – Explores evaluation and self-refinement in DSPy pipelines.
- notebooks/ – Jupyter notebooks with DSPy examples:
- Financial info retrieval
- Company valuation
- Market analysis
- RAG (Retrieval-Augmented Generation)
- Program composition and evaluation
- dspy/ – Core DSPy framework modules, primitives, adapters, and utilities.
- examples/ – Runnable Python demos for DSPy modules and pipelines.
- tests/ – Validation code for modules, signatures, and optimizers.
- Supporting configs:
- requirements.txt, pyproject.toml, setup.py – Dependency and environment management.
- docs/ – Documentation, diagrams, and API references.
- Python Environment
- Recommended: Python 3.8+
- Create a virtual environment:
python -m venv .venv
source .venv/bin/activate
- Install Dependencies
- With pip:
pip install -r requirements.txt
- Or with Poetry:
poetry install
- Jupyter/Conda Setup (Optional for notebooks):
- Install Jupyter:
pip install jupyter
- Or use Conda:
conda create -n dspy python=3.8
conda activate dspy
conda install jupyter
Run an Example Notebook:
jupyter notebook notebooks/company-valuation.ipynb
Basic DSPy Pipeline (Python):
import dspy

# Assumes an LM has been configured beforehand, e.g. dspy.configure(lm=...).
# Example: simple prediction module with a declarative signature.
predictor = dspy.Predict("question -> answer")
result = predictor(question="What is the capital of France?")
print(result.answer)
- Modules:
- Predict – Direct LLM calls for prediction tasks.
- ChainOfThought – Stepwise reasoning with intermediate outputs.
- ReAct – Combines reasoning and action for interactive tasks.
- Signatures & Dataset Orchestration:
- Declarative task definitions and automated dataset management.
- Evaluation Metrics Abstraction:
- Built-in support for accuracy, F1, and custom metrics.
- Optimizers & Compilation:
- Automated tuning of pipeline parameters for improved performance.
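The metric abstraction can be sketched in plain Python. The helper names below (accuracy, token_f1) are illustrative, not DSPy's actual API — in DSPy, a metric is simply a callable handed to evaluators and optimizers:

```python
def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold answers."""
    assert len(preds) == len(golds) and preds
    return sum(p == g for p, g in zip(preds, golds)) / len(preds)

def token_f1(pred, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    p_tokens = set(pred.lower().split())
    g_tokens = set(gold.lower().split())
    common = p_tokens & g_tokens
    if not common:
        return 0.0
    precision = len(common) / len(p_tokens)
    recall = len(common) / len(g_tokens)
    return 2 * precision * recall / (precision + recall)

print(accuracy(["Paris", "Berlin"], ["Paris", "Madrid"]))  # 0.5
print(token_f1("the capital is Paris", "Paris"))           # 0.4
```

Exact match is strict; token F1 gives partial credit when a verbose answer still contains the gold tokens.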
- Initial Program:
- Simple retrieval using Predict.
- Adding Evaluation Metrics:
- Integrate accuracy/F1 scoring for outputs.
- Introducing RAG:
- Use retrieval-augmented generation for better factuality.
- Compiling/Optimizing:
- Apply DSPy optimizers to tune pipeline and improve scores.
- Final Score Improvement:
- Demonstrate measurable gains in evaluation metrics.
- Manual Sample Validation:
- Review outputs for correctness and reliability.
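The compile/optimize step in this workflow can be viewed as a search over prompt candidates scored by a metric on a dev set. The sketch below is a toy stand-in, not DSPy's optimizer API; fake_lm and compile_prompt are hypothetical names:

```python
def fake_lm(prompt):
    """Stand-in LM: answers correctly only when asked to reason step by step."""
    return "42" if "step by step" in prompt else "unsure"

def compile_prompt(candidates, devset, metric):
    """Score each candidate template on the dev set; return the best scorer."""
    scored = []
    for template in candidates:
        preds = [fake_lm(template.format(question=q)) for q, _ in devset]
        score = sum(metric(p, g) for p, (_, g) in zip(preds, devset)) / len(devset)
        scored.append((score, template))
    return max(scored)  # (best_score, best_template)

candidates = [
    "Answer the question: {question}",
    "Think step by step, then answer: {question}",
]
devset = [("What is 6 * 7?", "42"), ("What is 20 + 22?", "42")]
best_score, best_template = compile_prompt(candidates, devset, lambda p, g: float(p == g))
print(best_score, best_template)  # 1.0 with the step-by-step template
```

Real DSPy optimizers search a richer space (instructions, few-shot demonstrations, module parameters), but the loop structure — propose, evaluate against a metric, keep the winner — is the same.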
DSPy enables composition of programs—building larger, more capable systems by integrating smaller modules (e.g., chaining Predict, ChainOfThought, and ReAct). This supports scalable, maintainable, and extensible LLM workflows.
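A minimal sketch of such composition, using toy functions rather than real DSPy modules (the retrieve and answer stages are hypothetical stand-ins for Predict-style components in a RAG-like chain):

```python
def compose(*stages):
    """Chain stages into a pipeline: each stage's output feeds the next."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

CORPUS = [
    "ACME Corp reported revenue of $10M in 2023.",
    "Globex focuses on logistics services.",
]

def retrieve(query):
    """Toy retrieval: pick the passage sharing the most words with the query."""
    q = set(query.lower().split())
    best = max(CORPUS, key=lambda p: len(q & set(p.lower().split())))
    return (query, best)

def answer(state):
    """Toy generation: fold the retrieved context into a response."""
    query, context = state
    return f"Q: {query} | Context: {context}"

rag = compose(retrieve, answer)
print(rag("What revenue did ACME Corp report?"))
```

Swapping a stage (say, replacing retrieve with a stronger retriever, or answer with a ChainOfThought module) leaves the rest of the pipeline untouched — the property that makes composed DSPy programs maintainable and extensible.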