Graph-Based Job Recommendation System
Prepared by: Nasrin Sabet
1. Project Overview
This project implements a personalized and fair job recommendation system using a
graph-based approach. It combines SBERT for semantic profiling, PyTorch Geometric for
graph learning, and Streamlit for deployment, and returns top-N job suggestions together
with explanations and fairness metrics.
2. System Components
Below is a breakdown of the system's components by module:
• feature_extraction.py:
- Uses SBERT (Sentence-BERT) to embed resumes and job descriptions as high-dimensional
semantic vectors.
- The resulting vectors initialize the user, job, and skill node features.
• build_graph.py:
- Loads CSVs ([Link], [Link], [Link]).
- Constructs a heterogeneous NetworkX graph with user, job, and skill nodes, connected by
User–Skill and Job–Skill edges.
- Saves the graph in .gpickle format for reuse.
• graph_gnn.py:
- Defines and trains a GNN model (GraphSAGE or LightGCN) using PyTorch Geometric.
- Learns node embeddings from the constructed graph.
- Stores embeddings in gnn_node_embeddings.pt.
• graph_recommender.py:
- Loads trained node embeddings.
- Computes cosine similarity between user and job embeddings.
- Returns ranked job recommendations per user (Top-N).
• explanation_builder.py:
- Generates explanation text for each recommendation.
- Example: 'User matches 82% of required job skills.'
• fairness_metrics.py:
- Audits fairness of recommendations using Fairlearn / AIF360.
- Outputs metrics: Demographic Parity (DP), Equalized Odds (EO), Disparate Impact (DI).
• ui_app.py:
- Implements a Streamlit interface where users can upload resumes and receive suggestions
interactively.
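The ranking step in graph_recommender.py and the skill-match text in explanation_builder.py can be sketched as follows. This is a minimal, dependency-free illustration and not the project's code: the function names (cosine, top_n_jobs, skill_match_text), the toy embeddings, and the skill sets are all hypothetical, and the real module presumably works on the embeddings stored in gnn_node_embeddings.pt.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_jobs(user_vec, job_vecs, n=3):
    """Return the n (job_id, score) pairs most similar to the user, best first."""
    scored = [(job_id, cosine(user_vec, vec)) for job_id, vec in job_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def skill_match_text(user_skills, job_skills):
    """Explain a recommendation by the share of required job skills the user has."""
    required = set(job_skills)
    if not required:
        return "Job lists no required skills."
    share = len(set(user_skills) & required) / len(required)
    return f"User matches {share:.0%} of required job skills."


# Toy data (hypothetical): 3-dimensional vectors instead of real GNN embeddings.
user = [1.0, 0.0, 1.0]
jobs = {
    "job_a": [1.0, 0.0, 1.0],  # same direction as the user
    "job_b": [0.0, 1.0, 0.0],  # orthogonal to the user
    "job_c": [1.0, 1.0, 0.0],  # partial overlap
}
ranking = top_n_jobs(user, jobs, n=2)
explanation = skill_match_text(
    {"python", "sql", "docker"},
    {"python", "sql", "spark", "airflow"},
)
```

With this toy data, job_a ranks first and job_c second, and the explanation reports a 50% skill match because the user holds two of the four required skills.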
3. Datasets and Preprocessing
- Real and synthetic resumes ([Link] to [Link])
- Job postings from Kaggle and [Link]
- Skills mapping in [Link]
- Users metadata in [Link]
Preprocessing includes:
- SBERT embedding generation for text fields
- Graph construction via node and edge creation
- Node ID mapping and normalization for training
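A minimal sketch of the node ID mapping step, assuming that mapping and normalizing IDs means converting raw user, job, and skill identifiers into contiguous zero-based integer indices per node type, which is what graph learning libraries typically expect. The identifiers, the helper names (build_index, encode_edges), and the edge lists are hypothetical stand-ins for rows read from the CSV files.

```python
def build_index(raw_ids):
    """Map each distinct raw ID to a contiguous integer, in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


def encode_edges(edges, src_index, dst_index):
    """Translate (raw_src, raw_dst) pairs into (int_src, int_dst) pairs.

    Edges that reference an unknown node are skipped rather than guessed.
    """
    return [
        (src_index[s], dst_index[d])
        for s, d in edges
        if s in src_index and d in dst_index
    ]


# Hypothetical raw records standing in for the User-Skill and Job-Skill tables.
user_skill = [("u17", "python"), ("u17", "sql"), ("u42", "sql")]
job_skill = [("j9", "sql"), ("j9", "spark"), ("j3", "python")]

user_index = build_index(u for u, _ in user_skill)
job_index = build_index(j for j, _ in job_skill)
# Skills share one index so that user and job edges point at the same skill nodes.
skill_index = build_index([s for _, s in user_skill] + [s for _, s in job_skill])

user_skill_edges = encode_edges(user_skill, user_index, skill_index)
job_skill_edges = encode_edges(job_skill, job_index, skill_index)
```

Keeping one shared skill index is the design choice that matters here: a user and a job only become connected through a skill if both edge lists refer to the same skill node.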
4. Results
- Final user and job embeddings stored as .npy/.pt
- Recommendations saved in gnn_recommendations.csv
- Fairness metrics (from fairness_report.txt): DP: 0.000 | EO: 0.000 | DI: 1.000, i.e. no
measured disparity with education as the sensitive attribute
- All users receive interpretable job match reports
- The Streamlit UI demonstrates the complete pipeline end to end
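To help read the DP and DI values reported above, the sketch below computes a demographic parity difference and a disparate impact ratio from per-group selection rates. It assumes the common definitions (largest minus smallest selection rate for DP, smallest divided by largest for DI). The report does not state which exact definitions fairness_metrics.py uses, and the function names, group labels, and outcomes are hypothetical.

```python
from collections import defaultdict


def selection_rates(outcomes, groups):
    """Share of users in each group with a positive outcome (1 = recommended)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(rates):
    """Largest minus smallest group selection rate; 0.0 means parity."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates):
    """Smallest divided by largest group selection rate; 1.0 means parity."""
    highest = max(rates.values())
    # Assumption: if no group receives any positive outcome, treat it as parity.
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical education groups with four users each.
groups = ["bachelor"] * 4 + ["master"] * 4

balanced = selection_rates([1, 0, 1, 1, 1, 1, 0, 1], groups)
dp_balanced = demographic_parity_difference(balanced)
di_balanced = disparate_impact_ratio(balanced)

skewed = selection_rates([1, 1, 1, 1, 1, 0, 0, 0], groups)
dp_skewed = demographic_parity_difference(skewed)
di_skewed = disparate_impact_ratio(skewed)
```

In the balanced case both groups have a selection rate of 0.75, giving DP 0.0 and DI 1.0. In the skewed case the rates are 1.0 and 0.25, giving DP 0.75 and DI 0.25.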
5. Future Extensions
- Upgrade GNN to Relational GCN or HGT for better heterogeneous learning
- Implement temporal split (LinkSAGE-style) to simulate chronological training/testing
- Integrate SHAP or interpretable attention into explanation_builder
- Handle cold-start more robustly using GNP or TIMBRE
- Add support for few-shot user/job generalization