Iconoscope is an enterprise-grade AI platform designed to analyze video streams at scale for complex pattern recognition. This project moves beyond simple scripting to demonstrate a complete, end-to-end AI strategy for high-volume unstructured data.
The thesis is to analyze a video corpus for a specific cinematic trope—the use of presidential portraits—and quantitatively prove its correlation with director/studio data. This serves as a high-value use case to build and demonstrate an enterprise-wide AI governance framework, a mature MLOps lifecycle, and the delivery of statistically-backed, Generative AI (GenAI)-driven insights.
This architecture is built on the Azure AI Platform (Azure ML, Azure DevOps, Azure Cognitive Services) to showcase a robust, scalable, and Responsible AI implementation.
This project implements a complete AI model lifecycle, emphasizing security, governance, and traceability from development to production.
- Platform Strategy: Utilizes a cloud-native Azure stack. Azure ML Studio serves as the central hub for experimentation, training, and model registry. Azure Cognitive Services (Computer Vision) is used for rapid baseline modeling.
- MLOps Lifecycle: Implements a complete MLOps design using Azure DevOps (CI/CD) and MLflow. This automates model versioning, validation, retraining pipelines, and governed deployment.
- Responsible & Ethical AI: The Azure ML Responsible AI Dashboard is leveraged to profile custom-trained models, explicitly focusing on bias mitigation and model transparency before deployment.
| Category | Technology | Purpose & Keywords |
|---|---|---|
| AI Strategy & Governance | AI Governance Frameworks, Responsible AI | Bias Mitigation, Transparency, Lifecycle Management |
| Cloud AI Platform | Azure ML, Azure Cognitive Services | Enterprise ML, Model Training, Registry, Deployment |
| Computer Vision | OpenCV, PyTorch, YOLO, Azure Computer Vision | Video Preprocessing, Custom Object Detection |
| Data & Statistical Analysis | Pandas, Scikit-learn, SQL | Quantitative Thesis Validation, Correlation Analysis |
| Generative AI & LLM | GenAI, RAG Architecture, LangChain | Statistical Report Synthesis, Insight Generation |
| MLOps & CI/CD | Azure DevOps, MLflow, GitHub Actions, Docker | Model Versioning, Retraining, Monitoring, CI/CD |
| Serverless & API | Azure Functions, FastAPI | Event-Driven Triggers, High-Performance Inference API |
| Code & Environment | Python, Poetry, SonarQube | Dependency Management, Code Quality, Security |
This project's workflow is designed to be fully auditable and statistically robust, separating detection, analysis, and reporting into distinct, governed stages.
- Goal: To establish a fully automated, auditable, and secure CI/CD pipeline for the AI model.
- Implementation: An Azure DevOps pipeline that triggers on a
git pushto:- Run SonarQube for static code analysis and vulnerability scanning.
- Execute unit tests and package the code.
- Trigger an Azure ML training job, versioning the model in the MLflow-backed registry.
- Deploy the new model to a governed endpoint.
- Goal: To design, train, and deploy a high-performance custom object detector for the "presidential portrait" trope.
- Implementation:
- Baseline: An initial model using Azure Computer Vision (Cognitive Services) to rapidly detect "person" and "painting" (a "Dataiku-like" rapid-value approach).
- Custom Model: A fine-tuned YOLO or PyTorch model trained in Azure ML Studio on a custom dataset.
- Governance: The model is evaluated using the Azure ML Responsible AI Dashboard to identify and mitigate potential biases (e.g., ensuring detection works across different film eras/lighting) before it is approved for deployment.
- Goal: To move beyond "hand-waving" and quantitatively prove the thesis with a dedicated statistical layer, before involving GenAI.
- Implementation:
- Data Collection: The CV model (Feature 2) detects a portrait and writes structured data (e.g.,
video_id,timestamp,detected_president) to an Azure SQL Database. - Data Enrichment (RAG): A LangChain agent enriches this record, pulling film metadata (director, studio, political leaning) from a search index (Azure Cognitive Search) and joining it in the database.
- Statistical Proof: A scheduled Azure Function runs a Pandas and Scikit-learn script to perform correlation analysis (e.g., Chi-squared test) on the enriched data.
- The Output: This layer outputs a verifiable JSON object (e.g.,
{ "correlation": 0.85, "p_value": 0.002, "significance": "High" }), which constitutes the proof.
- Data Collection: The CV model (Feature 2) detects a portrait and writes structured data (e.g.,
- Goal: To use GenAI for its correct purpose: synthesizing complex, pre-validated statistical findings into a human-readable executive summary.
- Implementation:
- The verifiable JSON proof from Feature 3 is passed to an Azure OpenAI (GenAI) endpoint.
- The LLM is given a strict prompt to only synthesize the provided data (the proof and 2-3 examples) into a one-paragraph summary.
- The Result: A clear, concise, and—most importantly—defensible insight is generated, suitable for a business-level dashboard.