Tricks for LLM diversity
How to make outputs more diverse and creative
Less criticizing, more construction.
Why advice often fails
How do we turn superhuman AI capabilities into human skills?
Startup as RL problem
Bottom-up vs Top-down
FlashMLA, DeepEP, DeepGEMM, DualPipe, EPLB, 3FS and Smallpond
Basics of GPU programming with CUDA.
Token selection, post-hoc compression, and architectural redesigns including MLA, GQA, and MQA