[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang

VITA-Group/READ-ME


Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

Ruisi Cai¹, Yeonju Ro¹, Geon-Woo Kim¹, Peihao Wang¹, Babak Ehteshami Bejnordi², Aditya Akella¹, Zhangyang Wang¹

¹University of Texas at Austin, ²Qualcomm AI Research

Usage

The code is based on the Hugging Face Transformers repository. We modified src/transformers/model/modeling_llama.py to integrate the MoE-fication process.
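To make the idea concrete, here is a minimal, dependency-free sketch of what "MoE-fication" of a dense FFN can look like: the hidden neurons of a ReLU feed-forward layer are partitioned into expert shards, and a router (not shown) then activates only a subset per token. The function names and the contiguous-slice partition are illustrative assumptions, not the repository's actual API; the paper's method groups neurons more carefully than simple slicing.

```python
# Illustrative MoE-fication sketch (hypothetical names, not the repo's API).
# A dense ReLU FFN  y = relu(x @ w_in) @ w_out  is split along the hidden
# dimension into expert shards; activating every shard recovers the dense
# output exactly, because ReLU is applied elementwise.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(m):
    """Elementwise ReLU over a matrix."""
    return [[max(v, 0.0) for v in row] for row in m]

def split_ffn_into_experts(w_in, w_out, num_experts):
    """Slice the hidden dimension of an FFN into equal expert shards."""
    hidden = len(w_in[0])                 # w_in: d_model x hidden
    assert hidden % num_experts == 0
    shard = hidden // num_experts
    experts = []
    for e in range(num_experts):
        cols = range(e * shard, (e + 1) * shard)
        wi = [[row[c] for c in cols] for row in w_in]   # d_model x shard
        wo = [w_out[c] for c in cols]                   # shard  x d_model
        experts.append((wi, wo))
    return experts

def moe_forward(x, experts, active):
    """Sum the outputs of the activated experts (routing decided elsewhere)."""
    out = None
    for e, (wi, wo) in enumerate(experts):
        if e not in active:
            continue
        y = matmul(relu(matmul(x, wi)), wo)
        out = y if out is None else [[a + b for a, b in zip(r1, r2)]
                                     for r1, r2 in zip(out, y)]
    return out
```

With all experts in `active`, `moe_forward` reproduces the dense FFN exactly; sparsifying `active` is what trades quality for compute.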

The main scripts are located in the moefication directory. Start by running the preprocessing scripts, moefication/scripts/preprocess_1.sh and moefication/scripts/preprocess_2.sh, to generate experts. After preprocessing, train the model using moefication/scripts/train.sh.
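Assuming the scripts are run from the repository root and need no extra arguments (neither is documented here), the sequence above looks like:

```shell
# Hypothetical end-to-end run; script paths are from the text above, but
# any flags or environment variables they expect are not documented here.
bash moefication/scripts/preprocess_1.sh   # preprocessing step 1: generate experts
bash moefication/scripts/preprocess_2.sh   # preprocessing step 2
bash moefication/scripts/train.sh          # train the resulting model
```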

Citation

If you find this useful, please cite the following paper:

@inproceedings{
cai2024textitreadme,
title={$\textit{Read-{ME}}$: Refactorizing {LLM}s as Router-Decoupled Mixture of Experts with System Co-Design},
author={Ruisi Cai and Yeonju Ro and Geon-Woo Kim and Peihao Wang and Babak Ehteshami Bejnordi and Aditya Akella and Zhangyang Wang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=i8JaxY7tDI}
}
