model: Add support for PhiMoE arch #11003
Open
+205
−31
PhiMoE
Overview
Phi-3.5-MoE is a lightweight, open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data.
The model is multilingual and supports a 128K context length (in tokens).
The PhiMoE model was proposed in "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone" by Microsoft.
The architecture is similar to Mixtral, with the main difference being Phi3LongRoPEScaledRotaryEmbedding, which is used to extend the context of the rotary embeddings. The query, key and values are fused, and the MLP's up and gate projection layers are also fused.
The tokenizer is LlamaTokenizer, with additional tokens.
License
MIT
Implementation details
The convert script reuses the Phi3MiniModel class, as the parameter names and long rope scaling logic are the same.
The MoE branch is included in the phi3 model graph implementation with missing bias tensors.
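For readers unfamiliar with what an MoE branch computes, here is a minimal sketch in plain Python. It assumes Mixtral-style top-k routing (softmax over the selected router logits, k=2 by default); the function names `route_top_k` and `moe_forward` are hypothetical and are not taken from this PR or from llama.cpp.

```python
import math


def route_top_k(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Assumption: Mixtral-style routing, i.e. the weights are a softmax
    taken over the k selected logits only, so they sum to 1.
    """
    # Indices of the k largest router logits (ties keep original order).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax over the selected logits.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts for one token.

    `experts` is a list of callables mapping a vector (list of floats)
    to a vector of the same length; this stands in for the expert FFNs.
    """
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

Only the selected experts are evaluated for each token, which is why an MoE model can have many more parameters than it uses per token.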
It would be possible to merge phi3 and phimoe into a single arch, but I kept the MoE arch separate, in the same spirit as what was done recently for granite. Also, since Microsoft introduced a dedicated architecture, it can evolve independently in the future.
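The overview mentions that the query, key and value projections are fused, as are the MLP's gate and up projections. As an illustration only, the sketch below shows how such fused weights could be split back into their parts. It assumes rows are output features, stored in the order Q, K, V and gate, up; the helper names are hypothetical and this is not the code used by the convert script.

```python
def split_rows(fused, sizes):
    """Split a fused weight matrix (list of rows) into consecutive row blocks."""
    if sum(sizes) != len(fused):
        raise ValueError("block sizes do not add up to the number of rows")
    parts, start = [], 0
    for n in sizes:
        parts.append(fused[start:start + n])
        start += n
    return parts


def split_qkv(fused, n_head, n_head_kv, head_dim):
    """Split a fused QKV matrix; assumes row order Q, then K, then V."""
    q_rows = n_head * head_dim
    kv_rows = n_head_kv * head_dim
    return split_rows(fused, [q_rows, kv_rows, kv_rows])


def split_gate_up(fused):
    """Split a fused gate/up matrix; assumes gate rows come first."""
    if len(fused) % 2 != 0:
        raise ValueError("fused gate/up matrix must have an even number of rows")
    half = len(fused) // 2
    return split_rows(fused, [half, half])
```

For example, with 2 query heads, 1 key/value head and a head dimension of 2, an 8-row fused matrix splits into 4 query rows, 2 key rows and 2 value rows.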
Testing
full output
Check that phi3 is still working
full output
Links