Some of the operations have scalar parameters, e.g. linear's and gemm's alpha and beta.
These are passed as float numbers. I think we should specify in the processing algorithm that they are cast to match the input operand's data type, so that if the input is float16, they also get downcast to float16.
Background: in some cases CoreML requires the scalar parameter types to match the operand type. So we either downcast the scalar params to float16, or upcast the input operands to float32. The latter is less ideal because on CoreML only float16 gets executed on the NPU.
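To make the proposed step concrete, here's a minimal sketch of what the cast could look like. The helper name and signature are hypothetical (not from any spec or backend API); it just shows the precision effect of downcasting a scalar like alpha=0.1 to float16 using numpy:

```python
import numpy as np

def cast_scalars_to_input_type(input_dtype, **scalars):
    # Hypothetical helper: cast each scalar parameter (e.g. gemm's
    # alpha/beta) to the input operand's data type before handing it
    # to the backend, as the proposed processing-algorithm step would.
    return {name: np.asarray(value, dtype=input_dtype).item()
            for name, value in scalars.items()}

# If the input operand is float16, the scalars are downcast and may
# lose precision: 0.1 is not exactly representable in float16.
params = cast_scalars_to_input_type(np.float16, alpha=0.1, beta=1.0)
```

With this in place, a float16 graph would see alpha as the nearest float16 value (roughly 0.09998) rather than the original float64 0.1, which matches what CoreML would compute anyway once the types are forced to agree.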
Any concerns with adding this step in the algorithm?