Ruisi Cai1, Yeonju Ro1, Geon-Woo Kim1, Peihao Wang1, Babak Ehteshami Bejnordi2, Aditya Akella1, Zhangyang Wang1
1University of Texas at Austin, 2Qualcomm AI Research
The code is based on the Hugging Face Transformers repository; we modified src/transformers/model/modeling_llama.py to integrate the MoE-fication process.
The main scripts are located in the moefication directory. Start by running the preprocessing scripts, moefication/scripts/preprocess_1.sh and moefication/scripts/preprocess_2.sh, to generate the experts. After preprocessing, train the model using moefication/scripts/train.sh.
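A minimal sketch of that workflow is shown below. The script paths come from the repository layout described above; any model checkpoints, datasets, or hyperparameters are assumed to be configured inside the .sh files themselves rather than passed on the command line.

```bash
# Sketch of the Read-ME workflow described above (arguments assumed to be set
# inside the shell scripts; adjust paths/configs there before running).
bash moefication/scripts/preprocess_1.sh   # preprocessing, step 1: expert generation
bash moefication/scripts/preprocess_2.sh   # preprocessing, step 2: expert generation
bash moefication/scripts/train.sh          # train the model on the generated experts
```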
If you find this useful, please cite the following paper:
@inproceedings{cai2024textitreadme,
  title={$\textit{Read-{ME}}$: Refactorizing {LLM}s as Router-Decoupled Mixture of Experts with System Co-Design},
  author={Ruisi Cai and Yeonju Ro and Geon-Woo Kim and Peihao Wang and Babak Ehteshami Bejnordi and Aditya Akella and Zhangyang Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=i8JaxY7tDI}
}