Pull requests: NVIDIA/TransformerEngine

- clean CP implementation for flash attention and cuDNN 9.6 (#1387, opened Dec 30, 2024 by xrennvidia)
- Update README.rst (#1385, opened Dec 23, 2024 by sbhavani)
- bug fix for using return_layernorm_output=True (#1382, opened Dec 20, 2024 by LiyuanLucasLiu)
- Don't touch nor send messages to the root logger. (#1380, opened Dec 19, 2024 by sagostinho-nvidia)
- [MoE][PyTorch] Add mask-based MoE permutation (#1373, opened Dec 13, 2024 by hxbai)
- Add paged attention support (#1355, opened Dec 4, 2024 by cyanguwa)
- [PyTorch] Adding TP overlap support for te.Linear with parallel_mode="column" (#1343, opened Nov 20, 2024 by denera) [labels: 1.14.0, enhancement]
- [PyTorch] Bugfix for wgrad bulk overlap conflict when dgrad overlap is reduce-scatter (#1341, opened Nov 18, 2024 by denera) [labels: bug]
- [C/JAX] Comm+GEMM Overlap API for TE/JAX (#1337, opened Nov 15, 2024 by denera, draft) [labels: enhancement, jax]
- [COMMON/JAX] Support sliding window on THD format (#1327, opened Nov 11, 2024 by zlsh80826)
- Build with uv instead of just pip (#1324, opened Nov 8, 2024 by jennifgcrl)
- TP communication overlap: enable the overlap between GEMM chunk at Ho… (#1311, opened Nov 4, 2024 by erhoo82)
- [PyTorch] Add heuristics for initializing FP8 params (#1300, opened Oct 30, 2024 by timmoon10) [labels: enhancement]
- Offloading example (#1299, opened Oct 29, 2024 by sanandaraj5597)
- [PyTorch] Fix autocast deprecation warnings (#1277, opened Oct 21, 2024 by yaox12)
- attention_mask fill with -inf for UnfusedDotProductAttention (#1268, opened Oct 18, 2024 by Agoniii)
- Draft: reduce cudagraph mem via preallocations (#1253, opened Oct 15, 2024 by JimmyZhang12)
- fused out correction in CP (#1248, opened Oct 14, 2024 by xiaoyao0115)
- Save CUDA Graph memory by reusing input and output tensors (#1234, opened Oct 9, 2024 by buptzyb)