SAIL@Princeton

Getting Memory-bound Kernels to Speed-of-Light

Leverage the GPU memory hierarchy to implement efficient reduction kernels.

Wentao Guo, Ted Zadouri, Tri Dao · Last updated on Jul 12, 2025