cmdr2's notes

A good tutorial for understanding the basics of CUDA: https://www.pyspur.dev/blog/introduction_cuda_programming (it also links to NVIDIA's simple tutorial).

I implemented a simple float16 addition kernel in CUDA: https://github.com/cmdr2/study/blob/main/ml/cuda/half_add.cu

To compile it, run: nvcc -o half_add half_add.cu
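
For reference, here is a minimal sketch of what a float16 addition kernel can look like. It is not a copy of the linked file: the kernel name half_add, the array size, the input values, and the launch configuration are all assumptions made for illustration. It converts each half to float, adds, and converts back, which should work on any GPU that nvcc targets by default. On devices with compute capability 5.3 or higher, the __hadd intrinsic from cuda_fp16.h could do the addition natively in half precision instead.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Element-wise c[i] = a[i] + b[i] on float16 values.
// Kernel name and signature are illustrative, not taken from the linked file.
__global__ void half_add(const __half* a, const __half* b, __half* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Add in float and convert back to half.
        // Alternative on compute capability >= 5.3: c[i] = __hadd(a[i], b[i]);
        c[i] = __float2half(__half2float(a[i]) + __half2float(b[i]));
    }
}

int main() {
    const int n = 1024;                       // assumed size, for illustration
    const size_t bytes = n * sizeof(__half);

    // Host buffers filled with arbitrary test values.
    std::vector<__half> h_a(n), h_b(n), h_c(n);
    for (int i = 0; i < n; ++i) {
        h_a[i] = __float2half(1.0f);
        h_b[i] = __float2half(2.0f);
    }

    // Device buffers.
    __half *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    // One thread per element, rounding the block count up.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    half_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Report launch errors (e.g. no usable device).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy blocks until the kernel has finished.
    cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", __half2float(h_c[0]));

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

Saved under a hypothetical name such as half_add_sketch.cu, this should compile the same way: nvcc -o half_add_sketch half_add_sketch.cu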