d69669e broke the DataParallel tests. Now that device_ids are explicitly passed to parallel_apply, this assert https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py#L30 fires whenever the input batch is too small to be scattered across all devices. For example, a batch of 20 on 8 GPUs is chunked into 6×3 + 2 = 7 pieces, so the 8th GPU receives no input, the lengths no longer match, and the assert triggers.
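
A minimal sketch of the chunking arithmetic, using `torch.chunk` (scatter splits the batch dimension with the same ceil-based chunk size, so this reproduces the mismatch without needing 8 GPUs; the exact assert wording below is paraphrased):

```python
import torch

# A batch of 20 scattered over 8 devices: chunk size = ceil(20 / 8) = 3,
# which yields only 7 chunks (6 * 3 + 2 = 20), not 8.
batch = torch.randn(20, 4)
chunks = torch.chunk(batch, 8, dim=0)

print(len(chunks))                  # 7 -- the 8th device gets nothing
print([c.size(0) for c in chunks])  # [3, 3, 3, 3, 3, 3, 2]

# parallel_apply is then handed 8 modules / device_ids but only 7 inputs,
# so its length check (asserting modules, inputs, and device_ids all have
# the same length) fails.
```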