This repository implements Multi-Context Principal Component Analysis (MCPCA), a method for analyzing multiple datasets (“contexts”) that share the same features. MCPCA decomposes context-dependent covariance structure into components that are shared across subsets of contexts.
From the repository root, install the package into your current Python environment (Python >= 3.10):
```bash
pip install .
```

This installs the package as `mcpca` so it can be imported directly.
For development (recommended):
```bash
pip install -e .
```
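To verify the install, the package should import cleanly; a quick check (`mcpca.MCPCA` is the entry point described below):

```python
import mcpca

print(mcpca.MCPCA)  # should print the estimator class without errors
```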
Given data from k contexts that share the same features, MCPCA produces two matrices, `A` and `B`.

The matrix `A` contains the multi-context principal components (MCPCs). Each column of `A` is a feature-space direction shared across all contexts.
Unlike standard PCA:
- components are learned jointly across contexts
- components are not required to be orthogonal
The matrix `B` encodes how strongly each shared component appears in each context.
Interpretation:
- rows correspond to contexts
- columns correspond to shared components
`B[i, j]` measures the strength of component `j` in context `i`.
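Putting these together, a natural reading (inferred here from the covariance-decomposition mode described later, so treat it as a sketch rather than the exact objective) is that each context's covariance is approximated as `Σ_i ≈ A · diag(B[i, :]) · Aᵀ`. In NumPy terms, with hypothetical stand-ins for the fitted matrices:

```python
import numpy as np

# Hypothetical shapes for illustration: p = 4 features, r = 2 components, k = 3 contexts.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))  # stand-in for model.components_
B = rng.random((3, 2))           # stand-in for model.loadings_

# Implied covariance of context i under the assumed low-rank model:
i = 0
Sigma_i = A @ np.diag(B[i]) @ A.T  # same as sum_j B[i, j] * np.outer(A[:, j], A[:, j])
```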
After fitting the model, the components are available as:
```python
A = model.components_  # shared components (A)
B = model.loadings_    # context loadings (B)
```

The main sklearn-style entry point is `mcpca.MCPCA`. You pass in a list of context datasets, each a NumPy array, fit the model, then optionally transform the data and visualize context loadings.
A synthetic-data illustration is available at `tests/test.py`.
The data in each context should be a 2D NumPy array of shape `(N_i, p_features)`, with samples in rows and features in columns:
```python
import numpy as np
from mcpca import MCPCA

# Example: three contexts sharing the same features.
# Illustrative random data; real contexts would carry shared structure.
rng = np.random.default_rng(0)
X1, X2, X3 = (rng.standard_normal((100, 10)) for _ in range(3))
X_list = [X1, X2, X3]

model = MCPCA(
    n_components=None,       # or an integer rank r
    rank_range=[1, 2, 3, 4],
    n_seed_pairs=5,
    cos_sim_threshold=0.9,
    random_state=0,
)
model.fit(X_list)
```
```python
# Shared components A (multi-context PCs)
A = model.components_  # shape (p_features, r)

# Context loadings B
B = model.loadings_    # shape (k_contexts, r)
```

Here:
- `A` contains the shared directions (MCPCs) across contexts.
- `B` tells you how strongly each context loads onto each component.
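For instance, to see which shared component is strongest in each context (an illustrative one-liner; absolute values guard against signed loadings):

```python
import numpy as np

dominant = np.argmax(np.abs(B), axis=1)  # index of the strongest component per context
```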
If `n_components=None`, the rank is chosen automatically from `rank_range` via `choose_rank` in `MCPCA_core`.
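The rank that was actually selected can then be read off the fitted matrices (a small sketch; `r` is the number of columns of `A`):

```python
r = model.components_.shape[1]  # number of shared components that were fitted
print(f"selected rank: {r}")
```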
Use `transform` to project each sample in each context into the MCPC space:
```python
Z_list = model.transform(X_list)
Z1, Z2, Z3 = Z_list  # each Z_i has shape (N_i, r)
```

Setting `return_list=False` instead concatenates all `Z_i` vertically and returns the concatenated array and context indices:
```python
Z_all, ctx = model.transform(X_list, return_list=False)
```
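Here `ctx` identifies the context of each row of `Z_all`. Assuming it is an integer array of per-row context indices (an assumption based on the name; check the docstring), the per-context blocks can be recovered like this:

```python
# Assumes ctx[i] is the integer context index (0, 1, ...) of row i of Z_all.
Z_blocks = [Z_all[ctx == i] for i in range(len(X_list))]
```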
You can also call `fit_transform` in one step:

```python
Z_list = model.fit_transform(X_list)
```

To visualize context loadings during fitting:
```python
model.fit(X_list, plot_B=True)
```

When `plot_B=True`, fitting produces a heatmap of the context loadings matrix `B`, which helps visualize how different contexts load onto the shared components.
If you prefer, you can also take `B = model.loadings_` and create custom plots using Matplotlib or Seaborn.
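For example, a minimal Matplotlib version of such a heatmap (a sketch; labels and colormap are a matter of taste):

```python
import matplotlib.pyplot as plt

B = model.loadings_
fig, ax = plt.subplots()
im = ax.imshow(B, aspect="auto")  # one row per context, one column per component
ax.set_xlabel("shared component")
ax.set_ylabel("context")
fig.colorbar(im, ax=ax, label="loading strength")
plt.show()
```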
Similarly, setting `plot_A=True` visualizes the shared component matrix `A`. For custom plots, access `A` via `model.components_`.
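Likewise, a custom view of `A` might plot the feature weights of a single shared component (illustrative sketch):

```python
import matplotlib.pyplot as plt
import numpy as np

A = model.components_
fig, ax = plt.subplots()
ax.bar(np.arange(A.shape[0]), A[:, 0])  # feature weights of component 0
ax.set_xlabel("feature index")
ax.set_ylabel("weight in component 0")
plt.show()
```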
MCPCA can also be applied directly to covariance matrices rather than raw data. This mode is useful when only second-order statistics are available or when raw data cannot be stored or shared.
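Purely to make the example below self-contained, per-context covariance matrices can be formed from raw data like this (a sketch; `rowvar=False` makes `np.cov` treat rows as samples):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative raw data: three contexts, 200 samples each, p = 10 features.
X_list = [rng.standard_normal((200, 10)) for _ in range(3)]

# One (p, p) covariance matrix per context.
M_list = [np.cov(X, rowvar=False) for X in X_list]
```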
With `M_list` such a list of covariance matrices, one per context, each of shape `(p, p)`, stack them into a tensor and decompose:
```python
import numpy as np
from mcpca import MCPCA_decompose

T = np.stack(M_list, axis=2)  # covariance tensor of shape (p, p, k)
r = 2                         # target rank, chosen by hand for this sketch
A, B = MCPCA_decompose(T, r)
```

Automatic rank selection from a specific rank range is also supported:
```python
from mcpca.MCPCA_core import choose_rank  # choose_rank lives in MCPCA_core (see note above)

T = np.stack(M_list, axis=2)
rank_range = [1, 2, 3, 4]
r = choose_rank(T, rank_range=rank_range)
A, B = MCPCA_decompose(T, r)
```
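To sanity-check a decomposition, one can compare each input covariance against its rank-`r` reconstruction, assuming the low-rank model sketched earlier (an illustrative check, not an API guarantee):

```python
import numpy as np

T_hat = np.stack([A @ np.diag(B[i]) @ A.T for i in range(B.shape[0])], axis=2)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(f"relative reconstruction error: {rel_err:.3f}")
```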
If you use MCPCA in your research, please cite the associated paper:

```bibtex
@article{mcpca2025,
  title   = {Multi-Context Principal Component Analysis},
  author  = {Kexin Wang and Salil Bhate and Jo\~ao M. Pereira and Joe Kileel and Matylda Figlerowicz and Anna Seigal},
  journal = {arXiv preprint arXiv:2601.15239},
  year    = {2025}
}
```