MaxAndArgmax is redundant and serves a very questionable role.
First, its implementation(s) perform distinct max and arg-max steps, so there doesn't appear to be any sort of efficiency gain behind its combined-Op approach.
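To make the distinction concrete, here is a minimal sketch in plain Python (not the library's actual code) contrasting two independent passes with a fused single pass. The function names are hypothetical, and both versions assume a non-empty 1-D sequence.

```python
def max_and_argmax_two_steps(xs):
    """Two independent passes, analogous to separate max and arg-max steps."""
    m = max(xs)                                  # pass 1: the maximum value
    i = max(range(len(xs)), key=xs.__getitem__)  # pass 2: index of the first maximum
    return m, i


def max_and_argmax_fused(xs):
    """One pass that tracks the best index; the value falls out of it."""
    best_i = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best_i]:  # strict comparison keeps the first maximum on ties
            best_i = i
    return xs[best_i], best_i
```

Only something like the fused version would justify a combined Op on efficiency grounds; the two-step version gains nothing over running separate Max and ArgMax Ops.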
Second, there are already distinct Max and ArgMax Ops, but they're only used when an "uncanonicalization" rewrite replaces MaxAndArgmax with them. This is unnecessarily roundabout, especially for something of questionable implementation value. If anything, this situation should work the opposite way, i.e. replace Max and ArgMax nodes with a single MaxAndArgmax.
Anyway, it seems like all we need to do is copy over the MaxAndArgmax.[R_op, grad] implementations to their respective Ops.
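For reference, a rough plain-Python sketch of what a gradient for a 1-D max could compute. It assumes the convention that the upstream gradient flows to every position equal to the maximum; `max_grad` is a hypothetical name, and the real logic should be taken from the existing MaxAndArgmax code rather than from this sketch.

```python
def max_grad(xs, g_out):
    """Gradient of max(xs) w.r.t. xs, given the upstream gradient g_out.

    Assumption: every element tied for the maximum receives g_out;
    all other elements receive zero.
    """
    m = max(xs)
    return [g_out if x == m else 0.0 for x in xs]
```

The arg-max output is integer-valued, so the ArgMax side would presumably have no meaningful gradient with respect to its input.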