Applied AI repo

For experiments and research on Applied AI.

Projects

Kernels

A collection of Triton and CUDA kernels for training and inference.

Inference kernels do not support the backward pass.

Triton Kernels

1 - Triton - MoE (Mixtral) GEMM for accelerating inference. Uses a column-major access pattern to increase locality.

moe_gemm_a100
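
The column-major idea can be illustrated in plain Python. This is a minimal sketch, assuming "column major" refers to how a linear program ID is mapped to an output tile of the GEMM; it is not taken from the moe_gemm_a100 kernel, and the function names (`column_major_tile`, `row_major_tile`, `schedule`) are hypothetical.

```python
# Illustrative only: maps a linear program ID to an output tile (pid_m, pid_n).
# All names here are hypothetical and not part of the repo's kernel.

def column_major_tile(pid, num_pid_m, num_pid_n):
    """Consecutive program IDs walk down one column of output tiles."""
    pid_m = pid % num_pid_m
    pid_n = pid // num_pid_m
    return pid_m, pid_n


def row_major_tile(pid, num_pid_m, num_pid_n):
    """Consecutive program IDs walk across one row of output tiles."""
    pid_m = pid // num_pid_n
    pid_n = pid % num_pid_n
    return pid_m, pid_n


def schedule(mapper, num_pid_m, num_pid_n):
    """List the tiles in the order the program IDs would visit them."""
    total = num_pid_m * num_pid_n
    return [mapper(pid, num_pid_m, num_pid_n) for pid in range(total)]


if __name__ == "__main__":
    # 3 tile rows, 2 tile columns
    print("column-major:", schedule(column_major_tile, 3, 2))
    print("row-major:   ", schedule(row_major_tile, 3, 2))
```

In the column-major order, neighbouring program IDs share the same `pid_n`, so they read the same column block of the right-hand operand, which can improve cache reuse when that operand holds the expert weights.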

2 - Triton - Fused Softmax for both training and inference.

softmax_fused
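
For reference, the computation a fused softmax kernel performs on each row can be sketched in plain Python. This is a CPU reference under the assumption of a standard, numerically stable row-wise softmax; it is not the Triton code from softmax_fused, and `softmax_row` and `softmax` are hypothetical names.

```python
import math

# Reference only: a fused kernel would do these steps for one row in a single
# pass while the row stays in fast on-chip memory. Names are hypothetical.

def softmax_row(row):
    """Numerically stable softmax of a single row."""
    row_max = max(row)                            # subtract the max to avoid overflow
    exps = [math.exp(x - row_max) for x in row]   # exponentiate shifted values
    denom = sum(exps)                             # normalization constant
    return [e / denom for e in exps]


def softmax(rows):
    """Apply softmax independently to each row of a 2D list."""
    return [softmax_row(r) for r in rows]


if __name__ == "__main__":
    print(softmax([[1.0, 2.0, 3.0], [1000.0, 1000.0]]))
```

A reference like this can serve as a correctness check when comparing a kernel's output against expected values.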

3 - Triton - Fused RMSNorm for both training and inference.

Fused RMSNorm Kernel
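
The math behind RMSNorm can be sketched in plain Python as follows. This is a CPU reference assuming the usual definition, y = x / sqrt(mean(x^2) + eps) * weight; it is not the repo's Triton kernel, and the name `rms_norm` and the default `eps` are assumptions made for illustration.

```python
import math

# Reference only: the function name and the eps default are hypothetical,
# not taken from the repo's fused RMSNorm kernel.

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by weight."""
    mean_sq = sum(v * v for v in x) / len(x)    # mean of squared elements
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)    # reciprocal RMS, eps for stability
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

With unit weights and `eps` set to zero, the output has a mean square of 1, which gives a quick sanity check for any implementation.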

Other projects from Applied AI

  1. CUDA Mode - Reading group for learning CUDA programming - (Discord, Lecture Materials, Lecture recordings)
  2. llama-recipes - Recipes for fine-tuning and inference with the Llama model series
  3. NeurIPS'23 LLM Efficiency Challenge - 1LLM + 1GPU + 1Day competition - (website, code, NeurIPS Workshop recordings)

Papers and Publications

  1. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation paper
  2. Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK Work Decomposition paper
  3. PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel paper
  4. Sustainable AI: Environmental Implications, Challenges and Opportunities paper

License

The applied-ai repo is released under the BSD 3-Clause license.