Apart from working out the API changes for freezing parts of the graph, another open problem is:
- optionally specifying per-layer learning rates.
This is a huge pain point in Torch, and it is a common use case for many users.
We should cover this use case properly and provide an example.
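
As a rough illustration of what "optional" per-layer learning rates could look like, here is a minimal pure-Python sketch. Everything in it is hypothetical and made up for this example (`GroupedSGD`, the `"params"`/`"lr"` group keys, the dict-based toy parameters); it is not an existing Torch API, just one possible shape for the interface. The idea is that the optimizer takes a list of parameter groups, and any group that does not specify its own `lr` falls back to the global default.

```python
class GroupedSGD:
    """Hypothetical plain-SGD optimizer with optional per-group learning rates."""

    def __init__(self, groups, lr):
        # Each group is a dict with a "params" list and an optional "lr" override.
        # Groups without an "lr" key inherit the global default.
        self.groups = [
            {"params": list(g["params"]), "lr": g.get("lr", lr)} for g in groups
        ]

    def step(self):
        # Vanilla SGD update, using each group's own learning rate.
        for group in self.groups:
            for p in group["params"]:
                p["value"] -= group["lr"] * p["grad"]


# Two toy "layers", each holding one scalar parameter with a gradient of 1.0.
features = [{"value": 1.0, "grad": 1.0}]
classifier = [{"value": 1.0, "grad": 1.0}]

opt = GroupedSGD(
    [
        {"params": features},               # no "lr": uses the default below
        {"params": classifier, "lr": 0.5},  # per-layer override
    ],
    lr=0.1,
)
opt.step()

print(features[0]["value"], classifier[0]["value"])
```

With this shape, the common case (one learning rate for everything) stays a one-liner, and per-layer rates are opt-in. Whether freezing should be expressed the same way (for example through a group-level option) is a separate design question and is not assumed here.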