GitHub - SuccessSoham/Data-Analytics-Agent: Data Analytics and Visualization Agent

📊 Multi-Agent Data Analytics & RAG System 🤖 Welcome to the Multi-Agent Data Analytics & RAG System! This project provides an intelligent platform for analyzing both structured and unstructured data using a powerful multi-agent architecture, Retrieval-Augmented Generation (RAG), and Google's Gemini models.

✨ Features 📈 Data Analysis: Summarize datasets, extract key insights, and assess data quality.

📊 Dynamic Visualizations: Generate interactive charts (bar, line, scatter, pie, histogram, heatmap, etc.) based on natural language queries.

🧠 Intelligent Q&A: Leverage Google's Gemini 1.5 Flash LLM for comprehensive answers grounded in your data.

📚 Retrieval-Augmented Generation (RAG): Enhance LLM responses by retrieving relevant information from your uploaded documents using LangChain and ChromaDB.

📄 Multi-format File Support: Process CSV, XLSX, PDF, DOCX, and TXT files.

🗣️ Conversational Interface: Interact with the system using natural language via a Streamlit chat interface.

🚀 Technology Stack This project is built primarily with Python and leverages the following key technologies:

Framework:

Streamlit: For building the interactive web application interface (app.py).

Large Language Models (LLM) & Embeddings:

Google Gemini 1.5 Flash: The core LLM for generating responses, insights, and explanations (agents/gemini_agent.py).

GoogleGenerativeAIEmbeddings: For creating vector embeddings of text.

RAG & Orchestration:

LangChain: Framework for building LLM applications, used for RAG pipeline orchestration, text splitting, and managing LLM/embedding integrations (dual_rag_agent_system.py).

ChromaDB: A lightweight vector database for storing and querying text embeddings for efficient information retrieval (dual_rag_agent_system.py).

Data Handling & Analysis:

Pandas: For structured data manipulation and analysis.

NumPy: For numerical operations.

Plotly: For creating interactive data visualizations (utils/chart_generator.py).

File Processing:

PyMuPDF (fitz): For PDF file processing.

python-docx: For DOCX file processing.

Custom Agents:

AgentCoordinator: Orchestrates the flow between different agents and RAG components (dual_rag_agent_system.py).

GeminiAgent: Handles direct interactions with the Gemini LLM (agents/gemini_agent.py).

DataAnalyticsAgent: Focuses on data summarization, insights, and quality assessment (agents/data_analytics_agent.py).

VisualizationAgent: Manages chart generation and explanation (agents/visualization_agent.py).

FileProcessor: Utility for handling various file uploads (utils/file_processor.py).

DataHandler: Utility for data preprocessing and summarization (utils/data_handler.py).

ChartGenerator: Utility for creating Plotly charts (utils/chart_generator.py).

🛠️ Setup & Installation To get this project up and running locally, follow these steps:

Clone the Repository:

git clone cd

Create a Virtual Environment (Recommended):

python -m venv venv source venv/bin/activate # On Windows: .\venv\Scripts\activate

Install Dependencies: Create a pyproject.toml or requirements.txt file (as discussed in previous steps). If you have pyproject.toml, you can use uv or pip with pip install -e . or pip install -r requirements.txt.

Using pyproject.toml (recommended with uv): Ensure uv is installed (pip install uv).

uv pip install -e .

Or, if using requirements.txt:

pip install -r requirements.txt

Make sure all required libraries including google-generativeai, langchain-google-genai, langchain-community, chromadb, PyMuPDF, python-docx are listed.

Set up your Google Gemini API Key:

Obtain your API key from Google AI Studio.

Do NOT hardcode your API key in the code.

Create a .env file in the root directory of your project (same level as app.py) and add your API key:

GOOGLE_API_KEY="YOUR_ACTUAL_GOOGLE_API_KEY_HERE"

Alternatively, when deploying to Streamlit Cloud, you'll use Streamlit's built-in secrets management (as explained in previous steps).

In your app.py ensure you are loading the API key securely, e.g., using os.getenv("GOOGLE_API_KEY") or st.secrets.get("GOOGLE_API_KEY").

▶️ How to Run the Application Once setup is complete, you can run the Streamlit application:

streamlit run app.py

This will open the application in your default web browser (usually at http://localhost:8501).

🚀 How to Use Upload Your File: On the sidebar, use the file uploader to select your CSV, XLSX, PDF, DOCX, or TXT file.

Authenticate (if prompted): Enter your Google Gemini API key if you haven't set it up as an environment variable or secret.

Chat with the Agents: In the main chat interface, ask questions about your data.

For structured data (CSV/XLSX): You can ask for summaries, insights, specific data points, or request visualizations.

Example: "Summarize the dataset."

Example: "Show me the distribution of age."

Example: "Which strikers has overall rating more than 85? mention their name and overall rating."

Example: "Plot a bar chart of sales by region."

For unstructured data (PDF/DOCX/TXT): You can ask questions about the document's content. The RAG system will retrieve relevant information.

Example: "What is the main topic of this document?"

Example: "Summarize the key findings."

View Results: The system will provide textual responses and, for structured data, interactive charts.

🤝 Contributions Contributions are welcome! If you have suggestions or find issues, please open an issue or submit a pull request on the GitHub repository.

Enjoy exploring your data with the power of AI! ✨

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.devcontainer		.devcontainer
agents		agents
attached_assets		attached_assets
utils		utils
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
app.py		app.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
seed_database.py		seed_database.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages