A multi-provider RAG (Retrieval-Augmented Generation) system for automating teaching assistant tasks. Features hybrid search, cross-encoder reranking, RAGAS evaluation, and support for both cloud and local LLMs.
- OpenAI: GPT-4o, GPT-4o-mini, GPT-3.5-turbo
- Anthropic: Claude 3.5 Sonnet, Claude 3.5 Haiku
- Google: Gemini 1.5 Pro, Gemini 1.5 Flash
- Cohere: Command R+, Command R
- Ollama (Local): Llama 3.1, Mistral, Mixtral, Phi-3, Qwen, DeepSeek
- ChromaDB for persistent vector storage with automatic caching
- Hybrid Search: BM25 + semantic search with configurable weights
- Cross-Encoder Reranking using ms-marco-MiniLM for improved relevance
- Query Expansion for better retrieval coverage
- Source Citations with confidence scores
- OpenAI (text-embedding-3-large/small)
- Cohere (embed-english-v3.0, embed-multilingual-v3.0)
- Ollama (nomic-embed-text, mxbai-embed-large)
- HuggingFace (BGE, MPNet)
- FastEmbed (optimized local embeddings)
- Faithfulness scoring
- Answer relevancy metrics
- Context precision/recall/relevancy
- Automated evaluation reports
- Piazza integration for course Q&A management
- Token usage and cost tracking
- Modern Streamlit web interface
- Comprehensive test suite
- Python 3.10 or higher
- Poetry (Python package manager)
- At least one LLM provider API key (or Ollama for local models)
-
Clone the repository:
git clone https://github.com/gr8monk3ys/TAlker.git cd TAlker -
Set up environment variables:
cp .env.example .env
Edit
.envand add your API keys:# At least one of these is required OPENAI_API_KEY=sk-your-key ANTHROPIC_API_KEY=sk-ant-your-key GOOGLE_API_KEY=your-key COHERE_API_KEY=your-key # Optional: Piazza integration PIAZZA_EMAIL=your-email PIAZZA_PASSWORD=your-password PIAZZA_COURSE_ID=your-course-id
-
Install dependencies:
make setup
-
(Optional) For local models, install Ollama:
# macOS brew install ollama # Linux curl -fsSL https://ollama.ai/install.sh | sh # Pull recommended models ollama pull llama3.1:8b ollama pull nomic-embed-text
-
Start the application:
make run
-
Access the web interface at
http://localhost:8501 -
Navigate to:
- Upload: Add course materials (PDF, TXT, CSV, MD)
- Test: Chat with the RAG system
- Analysis: View course analytics
- Evaluation: Run RAGAS evaluation
- Settings: Configure providers and parameters
TAlker/
├── src/
│ ├── dashboard/ # Streamlit web interface
│ │ ├── Home.py # Main application
│ │ ├── llm.py # RAG pipeline implementation
│ │ ├── providers.py # Multi-provider LLM/embedding support
│ │ ├── evaluation.py # RAGAS evaluation framework
│ │ └── pages/
│ │ ├── 1_Upload.py # Document upload
│ │ ├── 2_Test.py # Chat interface
│ │ ├── 3_Analysis.py # Analytics dashboard
│ │ ├── 4_Evaluation.py # RAGAS evaluation UI
│ │ └── 5_Settings.py # Provider configuration
│ └── piazza_bot/ # Piazza API integration
│ ├── bot.py # Piazza bot logic
│ ├── profile.py # User profiles
│ └── responses.py # Response types
├── tests/ # Test suite
│ ├── test_llm.py # RAG pipeline tests
│ ├── test_evaluation.py # Evaluation tests
│ └── test_providers.py # Provider tests
├── data/ # Document storage
├── pyproject.toml # Dependencies
└── Makefile # Development commands
| Command | Description |
|---|---|
make setup |
Install Poetry and dependencies |
make run |
Start the Streamlit application |
make dev |
Start with hot reload |
make test |
Run test suite |
make test-cov |
Run tests with coverage |
make lint |
Run linter |
make format |
Format code with Black |
make check |
Run all code quality checks |
make evaluate |
Run RAGAS evaluation |
make rebuild-index |
Rebuild vector index |
make clean |
Clean cache files |
Configure in Settings page or via environment variables:
| Parameter | Default | Description |
|---|---|---|
LLM_MODEL |
gpt-4o | LLM model to use |
EMBEDDING_MODEL |
text-embedding-3-large | Embedding model |
CHUNK_SIZE |
1000 | Document chunk size |
CHUNK_OVERLAP |
200 | Overlap between chunks |
INITIAL_K |
20 | Documents to retrieve |
FINAL_K |
5 | Documents after reranking |
BM25_WEIGHT |
0.3 | Weight for keyword search |
For fully offline operation:
-
Install Ollama and pull models:
ollama pull llama3.1:8b ollama pull nomic-embed-text
-
In Settings, select:
- LLM:
llama3.1:8b(Ollama) - Embeddings:
nomic-embed-text(Ollama)
- LLM:
-
No API keys required!
# Install dev dependencies
make install-dev
# Run tests
make test
# Format and lint
make format && make lint
# Full check before commit
make all- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Run tests and linting (
make all) - Commit your changes (
git commit -m 'Add AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
This project is licensed under the GNU License - see the LICENSE file for details.