PROJECT #1
Name: AI-Driven Portfolio Optimization & Risk Management Platform
One-sentence description
A real-time, reinforcement-learning–powered system that ingests market
ticks to construct, rebalance, and hedge portfolios with explainable risk
metrics—all on a fully open-source stack.
Use case & target user
Quant traders and asset managers at fintech startups seeking dynamic,
data-driven portfolio strategies and live risk monitoring without vendor lock-in.
Tech stack
• Data ingestion & streaming: Apache Kafka, Apache Spark Structured
Streaming
• Historical time-series store: TimescaleDB (PostgreSQL extension)
• Feature store: Feast (using Redis backend)
• RL modeling & backtest: Python, PyTorch, Stable Baselines3 (PPO/DDPG),
Zipline backtester (via the maintained zipline-reloaded fork)
• Hyperparameter sweeps & tracking: Optuna + MLflow Tracking +
TensorBoard
• Containerization & orchestration: Docker, upstream Kubernetes (k8s), Helm
• Real-time inference: KServe on k8s
• Object storage: MinIO (S3-compatible)
• Infrastructure as code: Terraform (open source)
• Monitoring & dashboards: Prometheus, Grafana, Streamlit with SHAP plots
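As a rough illustration of the kind of rebalancing signal this stack produces, here is a minimal mean-variance sketch in NumPy. It is a stand-in only: the trained RL policy (PPO/DDPG) would replace this closed-form rule, and the function name and parameters are hypothetical.

```python
import numpy as np

def rebalance_weights(returns: np.ndarray, risk_aversion: float = 1.0) -> np.ndarray:
    """Toy mean-variance rebalancing signal: w proportional to inv(Cov) @ mu,
    clipped long-only and normalized to sum to 1."""
    mu = returns.mean(axis=0)                # per-asset expected return
    cov = np.cov(returns, rowvar=False)      # asset covariance matrix
    cov += 1e-6 * np.eye(cov.shape[0])       # regularize for short windows
    raw = np.linalg.solve(risk_aversion * cov, mu)
    w = np.clip(raw, 0.0, None)              # long-only constraint
    total = w.sum()
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))

rng = np.random.default_rng(0)
rets = rng.normal(0.0005, 0.01, size=(250, 4))  # 250 daily returns, 4 assets
w = rebalance_weights(rets)
print(w.round(3))
```

In the real pipeline these weights would be served sub-second through the KServe inference endpoint rather than computed inline.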
Advanced features
• Fully automated RL training pipeline with Optuna hyperparameter tuning
• Sub-second inference API for rebalancing signals via KServe
• Walk-forward backtester with live paper-trading feed
• Explainable-AI dashboard (SHAP) for portfolio driver attribution and
Value-at-Risk (VaR) reporting
• Self-healing k8s streaming pipelines with Kafka Connect dead-letter queues
• Cost-aware autoscaling using Kubernetes HPA + spot-instance scheduling
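The walk-forward backtester listed above can be sketched as a sliding train/test window generator. This is a minimal stand-in assuming bar-indexed data; the parameter names are hypothetical.

```python
from typing import Iterator, Tuple

def walk_forward(n_obs: int, train: int, test: int, step: int) -> Iterator[Tuple[range, range]]:
    """Yield successive (train_window, test_window) index ranges.
    Each model is fit on `train` bars, evaluated on the next `test` bars,
    then the whole window slides forward by `step` bars."""
    start = 0
    while start + train + test <= n_obs:
        yield range(start, start + train), range(start + train, start + train + test)
        start += step

splits = list(walk_forward(n_obs=1000, train=500, test=100, step=100))
# The first fold trains on bars 0..499 and tests on bars 500..599.
print(len(splits))
```

In the full system each test window would be replayed against the live paper-trading feed before a model is promoted.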
Resume-pitch bullet
“Built a 100% open-source RL portfolio optimizer with PyTorch, Kafka
streaming, TimescaleDB & KServe—automating rebalancing, backtests, and
SHAP-driven risk explainability, achieving a simulated 12% Sharpe-ratio uplift.”
PROJECT #2
Name: Intelligent Document Understanding & Automation Platform
One-sentence description
An open-source, active-learning system that ingests semi-structured
documents, extracts entities, classifies content, and auto-routes them into
business workflows.
Use case & target user
Insurance claims processors and legal teams needing scalable, accurate
triage and data extraction from PDFs and scans—without per-page fees.
Tech stack
• OCR & preprocessing: Tesseract (with OCRmyPDF), OpenCV
• NLP & transformers: Hugging Face Transformers (BERT/Longformer)
• Serverless functions & API: OpenFaaS (Lambda-style), Kong API Gateway
• Workflow pipelines: Argo Workflows or Kubeflow Pipelines (OSS)
• Storage & metadata: MinIO for docs, PostgreSQL for metadata
• Embedding store & similarity: FAISS on disk
• Active-learning UI: Label Studio (OSS) + React
• Model/version tracking & CI: MLflow, GitHub Actions, Terraform
• Monitoring & logging: Prometheus, Grafana, ELK stack (Elasticsearch /
Logstash / Kibana)
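A minimal sketch of the classify-then-route triage step, assuming a classifier that emits a label and a confidence score. The queue names and threshold are illustrative; low-confidence documents fall through to the human-review queue that feeds the active-learning loop.

```python
# Illustrative routing table: predicted label -> downstream workflow queue.
ROUTES = {"claim": "claims-intake", "contract": "legal-review", "invoice": "ap-processing"}
CONFIDENCE_THRESHOLD = 0.85  # assumption; tuned per deployment

def route(doc_id: str, label: str, score: float) -> str:
    """Return the destination queue for one classified document."""
    if score >= CONFIDENCE_THRESHOLD and label in ROUTES:
        return ROUTES[label]
    # Low-confidence or unknown labels go to reviewers (Label Studio),
    # whose corrections become the next training batch.
    return "human-review"

print(route("doc-001", "claim", 0.93))   # routed straight to claims intake
```

In the deployed system this function would sit behind an OpenFaaS function invoked by the Argo/Kubeflow pipeline.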
Advanced features
• End-to-end Argo/Kubeflow retraining pipeline triggered by new labeled
batches
• Explainable AI via LIME integrated into a Streamlit audit dashboard
• Automated labeling suggestions from FAISS k-NN embedding search
• MinIO bucket notifications driving continuous ingestion and dynamic
schema inference
• Canary roll-outs of new models via OpenFaaS and Kong traffic splitting
• Role-based access and audit logs using Keycloak + ELK
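The automated labeling suggestions from k-NN embedding search can be sketched with a brute-force NumPy cosine search. FAISS (e.g., an inner-product index over normalized vectors) is the drop-in replacement at scale; all names here are illustrative.

```python
import numpy as np

def suggest_labels(query: np.ndarray, index_vecs: np.ndarray,
                   index_labels: list, k: int = 3) -> list:
    """Return the labels of the k nearest labeled embeddings by cosine
    similarity, to pre-fill suggestions in the annotation UI."""
    a = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = a @ q                        # cosine similarity to every labeled doc
    top = np.argsort(-sims)[:k]        # indices of the k most similar
    return [index_labels[i] for i in top]

vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])  # toy 2-D embeddings
labels = ["claim", "claim", "contract"]
print(suggest_labels(np.array([1.0, 0.05]), vecs, labels, k=2))  # -> ['claim', 'claim']
```

The suggested labels would appear as pre-annotations in Label Studio, so reviewers confirm rather than label from scratch.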
Resume-pitch bullet
“Developed a fully open-source document-AI pipeline with Tesseract, Hugging
Face, Argo Workflows & FAISS—reducing manual triage time by 70% and
achieving 95% extraction accuracy.”