Added support for float16 ADD/SUB/MUL/DIV operations in the CUDA backend of ggml. Also fixed the CPU float16 implementation of these operations to work with repeating (broadcast) tensors, and added test cases. PR: https://github.com/ggml-org/ggml/pull/1121
Discussed making ggml-cpu.c into a C++ file, so that we can use function templates to de-duplicate a huge amount of code in that file.
Also worked on adding float16 support (in CUDA and CPU) for a number of unary operators, like SQRT, RELU, GELU, SIGMOID, LOG, COS, CLAMP, etc. It seems to be passing the tests, so I will propose this as a PR soon.