It would be useful to be able to inspect the PCA fields after preprocessing, to see what space the clusters are actually being fit in.
If the PCA fields were attached to the dataset, predict_proba() could check for them and skip a second preprocessing run on the same dataset.
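
To make the idea concrete, here is a rough sketch of what such a guard might look like. Everything in it is an assumption for illustration: the dataset is modelled as a plain dict of NumPy arrays, the `pca_` prefix and the `preprocess` function name are hypothetical, and the PCA is done with a bare SVD rather than whatever the project actually uses. A real implementation would also want to reuse the components fitted at training time instead of refitting on the data passed to predict_proba().

```python
import numpy as np

PCA_PREFIX = "pca_"  # hypothetical naming convention for the attached PCA fields


def preprocess(dataset, n_components=2):
    """Attach PCA fields to `dataset` (a dict of 1-D arrays) unless already present."""
    # Guard: if PCA fields are already attached, skip the second preprocessing run.
    if any(name.startswith(PCA_PREFIX) for name in dataset):
        return dataset

    # Stack the raw fields into an (n_samples, n_features) matrix and center it.
    X = np.column_stack([dataset[name] for name in sorted(dataset)])
    X = X - X.mean(axis=0)

    # PCA via SVD: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T

    # Attach the projected coordinates so they can be inspected or plotted later.
    for i in range(n_components):
        dataset[f"{PCA_PREFIX}{i}"] = scores[:, i]
    return dataset
```

With something like this, calling `preprocess(dataset)` before fitting and again inside predict_proba() would only do the work once, and the `pca_*` fields would stay on the dataset for anyone who wants to look at the space the clusters were fit in.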