MLPClassifier could have a transform method, which returns a n_samples x n_units, where n_units is the number of hidden units in the last layer (before the output layer). This would allow to re-use the features learned by the network in another estimator.