Multi-GPU operation and data, model, and hybrid parallelism are planned and in development for Caffe. The purpose of this thread is to keep the conversation in one place, since the question has been asked here, there, and everywhere. There are several ways to approach parallelization, so feel free to discuss your own work on it here.
Note that Caffe already works with multiple GPUs in a standalone fashion: you can, for example, train on one GPU while extracting features on another.
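As a rough illustration of the standalone approach, here is a minimal sketch that starts two independent processes, each restricted to one GPU through the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. The helper names `env_for_gpu` and `launch_on_gpu` are hypothetical, and the commands are placeholders that only print what they see; substitute whatever training and feature extraction commands you actually run. Nothing here is a Caffe API.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_id, base_env=None):
    """Return a copy of the environment in which only `gpu_id` is visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_on_gpu(command, gpu_id):
    """Start `command` (a list of arguments) as an independent process pinned to one GPU."""
    return subprocess.Popen(command, env=env_for_gpu(gpu_id))


if __name__ == "__main__":
    # Placeholder job (assumption): it only reports which GPU it was given.
    # Replace it with your real training / feature extraction commands.
    report = "import os; print('job sees GPU', os.environ['CUDA_VISIBLE_DEVICES'])"
    jobs = [
        ([sys.executable, "-c", report], 0),  # e.g. the training job
        ([sys.executable, "-c", report], 1),  # e.g. the feature extraction job
    ]
    procs = [launch_on_gpu(cmd, gpu) for cmd, gpu in jobs]
    for proc in procs:
        proc.wait()
```

With `CUDA_VISIBLE_DEVICES` set this way, each process sees only its own GPU and addresses it as device 0, so the two jobs do not need to coordinate device IDs with each other.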