cmdr2's notes

Added support for float16 ADD/SUB/MUL/DIV operations in the CUDA backend of ggml. Also fixed the CPU implementation of these operations in float16 to work with repeating tensors, and added test cases. PR: https://github.com/ggml-org/ggml/pull/1121
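
A rough sketch of what "repeating tensors" means for an element-wise op, assuming the second operand is tiled along any dimension where it is smaller than the first. The function name `add_repeat` and the flat row-major layout are made up for illustration and are not ggml's API; plain float is used here to keep the indexing in focus.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not ggml's API): dst = a + b, where b is repeated
// along rows and columns whenever it is smaller than a.
// Assumes a_rows is a multiple of b_rows and a_cols is a multiple of b_cols.
std::vector<float> add_repeat(const std::vector<float>& a, std::size_t a_rows, std::size_t a_cols,
                              const std::vector<float>& b, std::size_t b_rows, std::size_t b_cols) {
    std::vector<float> dst(a_rows * a_cols);
    for (std::size_t r = 0; r < a_rows; ++r) {
        const std::size_t br = r % b_rows;      // wrap around the rows of b
        for (std::size_t c = 0; c < a_cols; ++c) {
            const std::size_t bc = c % b_cols;  // wrap around the columns of b
            dst[r * a_cols + c] = a[r * a_cols + c] + b[br * b_cols + bc];
        }
    }
    return dst;
}
```

For example, adding a 1x3 tensor to a 2x3 tensor reuses the single row of the smaller tensor for both rows of the larger one.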

Discussed making ggml-cpu.c into a C++ file, so that we can use function templates to de-duplicate a huge amount of code in that file.
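
A minimal sketch of the kind of de-duplication function templates would allow, assuming one loop body shared across operations and element types. The names `op_add`, `op_sub`, `op_mul`, `op_div` and `apply_binary` are hypothetical and not taken from ggml.

```cpp
#include <cstddef>

// Hypothetical op functors: each one only describes the arithmetic.
struct op_add { static float apply(float a, float b) { return a + b; } };
struct op_sub { static float apply(float a, float b) { return a - b; } };
struct op_mul { static float apply(float a, float b) { return a * b; } };
struct op_div { static float apply(float a, float b) { return a / b; } };

// One loop for every (operation, element type) pair, instead of a
// hand-written copy per combination. The math is done in float and the
// result is cast back to the storage type T.
template <typename Op, typename T>
void apply_binary(const T* a, const T* b, T* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<T>(Op::apply(static_cast<float>(a[i]), static_cast<float>(b[i])));
    }
}
```

With this shape, `apply_binary<op_add>(a, b, dst, n)` and `apply_binary<op_div>(a, b, dst, n)` share the same loop, and the element type is deduced from the pointers.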

Also worked on adding float16 support (in CUDA and CPU) for a number of unary operators, like SQRT, RELU, GELU, SIGMOID, LOG, COS, CLAMP etc. The changes seem to pass the tests, so I'll propose them as a PR soon.
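
One common way to run a unary operator on float16 data is to widen each value to float32, apply the operator, and narrow the result back. The sketch below assumes that approach; it is not necessarily how the actual change does it. The helpers `half_to_float`, `float_to_half` and `unary_f16` are hypothetical, written in portable C++ so the example stands alone, and are not ggml's conversion routines.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode IEEE 754 binary16 bits into a float.
float half_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 1;
    const int exp  = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    float v;
    if (exp == 0) {
        v = std::ldexp(static_cast<float>(mant), -24);             // zero or subnormal
    } else if (exp == 31) {
        v = mant ? std::numeric_limits<float>::quiet_NaN()
                 : std::numeric_limits<float>::infinity();
    } else {
        v = std::ldexp(static_cast<float>(mant + 1024), exp - 25); // normal number
    }
    return sign ? -v : v;
}

// Hypothetical helper: encode a float as binary16 bits, rounding to nearest
// (ties to even under the default floating-point rounding mode).
std::uint16_t float_to_half(float f) {
    const int sign = std::signbit(f) ? 0x8000 : 0;
    const float a = std::fabs(f);
    if (std::isnan(f)) return static_cast<std::uint16_t>(sign | 0x7E00);
    if (a >= 65520.0f) return static_cast<std::uint16_t>(sign | 0x7C00); // overflows to infinity
    if (a < std::ldexp(1.0f, -14)) {
        // Zero or subnormal: the value is a multiple of 2^-24.
        const int m = static_cast<int>(std::nearbyint(std::ldexp(a, 24)));
        return static_cast<std::uint16_t>(sign | m);
    }
    int e;
    const float frac = std::frexp(a, &e);                          // a = frac * 2^e, frac in [0.5, 1)
    int m = static_cast<int>(std::nearbyint(std::ldexp(frac, 11))); // 11-bit significand
    if (m == 2048) { m = 1024; ++e; }                              // rounding carried into the exponent
    return static_cast<std::uint16_t>(sign | ((e + 14) << 10) | (m - 1024));
}

// Apply any float -> float operator to a buffer of float16 values.
template <typename F>
void unary_f16(const std::uint16_t* src, std::uint16_t* dst, std::size_t n, F op) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = float_to_half(op(half_to_float(src[i])));
    }
}
```

SQRT then becomes `unary_f16(src, dst, n, [](float x) { return std::sqrt(x); })`, and RELU is the same call with `std::fmax(x, 0.0f)` in the lambda.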