ModernBERT for Apple Neural Engine

A ModernBERT model optimized for the Apple Neural Engine.

🏎️ 2.4 TFLOP/s (1024 token input; base model)

🔋 2.1 W of power (1024 token input; base model)

🤏 1 file for the model definition (a la nanoGPT)

Install

$ python -m venv env
$ . env/bin/activate
$ pip install -r requirements.txt

Convert to CoreML

$ python convert.py
$ python predict.py $path_to_model.mlpackage "The sky is [MASK]."

Compare to 🤗

Compare the accuracy of the models against the Hugging Face implementation.

# Compare the PyTorch model in model.py
$ python diff_torch.py
# Compare a converted CoreML model
$ python diff_coreml.py $path_to_model.mlpackage
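
One plausible way to quantify how closely two models agree is the KL divergence between their output token distributions, the fidelity measure mentioned in the precision note. The snippet below is a minimal sketch under that assumption, using random logits as stand-ins for real model outputs. It is not the code in diff_torch.py or diff_coreml.py, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max first so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def kl_divergence(ref_logits, test_logits):
    """Mean KL(ref || test) over token positions, in nats."""
    ref_logp = log_softmax(ref_logits.astype(np.float64))
    test_logp = log_softmax(test_logits.astype(np.float64))
    kl = (np.exp(ref_logp) * (ref_logp - test_logp)).sum(axis=-1)
    return float(kl.mean())

# Stand-in data: 8 token positions, 100-entry vocabulary (illustrative sizes).
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 100))
noisy = ref + rng.normal(scale=0.01, size=ref.shape)

same = kl_divergence(ref, ref)     # identical logits
close = kl_divergence(ref, noisy)  # slightly perturbed logits
print(f"identical: {same:.2e}  perturbed: {close:.2e}")
```

Identical logits give zero divergence, and the value grows as the two distributions drift apart, so a lower number means the converted model tracks the reference more closely.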

A Note on Precision

The Neural Engine requires float16 weights and activations. Some computations can be performed in float32, but outlier activations can still severely degrade output predictions.

ModernBERT, like modern decoder-only LLMs, exhibits outlier activations on the order of 20-30k. Without intervention, these are enough to visibly impact the CoreML model's predictions on the Neural Engine.
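
To see why values this large are a problem in float16, it helps to look at the format's resolution and range. The sketch below uses NumPy's float16 as a stand-in for Neural Engine arithmetic (an assumption; the hardware may differ in its details), and 24000 is just an illustrative value in the 20-30k range. The helper `float16_gap` is hypothetical.

```python
import numpy as np

def float16_gap(x):
    """Distance between adjacent float16 values near a normal-range x."""
    eps = float(np.finfo(np.float16).eps)  # 2**-10, the gap just above 1.0
    return eps * 2.0 ** np.floor(np.log2(abs(x)))

outlier = np.float16(24000.0)  # illustrative outlier activation
gap = float16_gap(24000.0)     # spacing of float16 values at this magnitude

# A small contribution added to the outlier is rounded away entirely.
lost = outlier + np.float16(4.0)

# float16 tops out at 65504, so modest scaling of an outlier overflows.
f16_max = float(np.finfo(np.float16).max)
with np.errstate(over="ignore"):
    overflowed = outlier * np.float16(3.0)

print(gap, float(lost), f16_max, float(overflowed))
```

At this magnitude adjacent float16 values are 16 apart, so any signal smaller than that riding on an outlier channel is lost, and a multiplication by 3 already exceeds the largest finite value.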

To mitigate this, the model conversion process in this repo uses QuaRot/SpinQuant-style orthogonal rotations. This greatly improves the model's fidelity (as measured by the KL divergence). However, token predictions will not exactly match a PyTorch model that does some or all of its computation in a higher precision (bfloat16, float32). Be sure to test for your use case.
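
The idea behind such rotations can be sketched in a few lines: for an orthogonal matrix Q, a linear layer satisfies W x = (W Qᵀ)(Q x), so Q can be folded into the weights while the activations that flow between layers are rotated. The example below is a toy illustration, not this repo's conversion code. The normalized Hadamard matrix, the single outlier channel, and the random weights are all assumptions chosen to make the effect easy to see.

```python
import numpy as np

def hadamard(n):
    """Normalized Sylvester Hadamard matrix; n must be a power of two."""
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

rng = np.random.default_rng(0)
dim = 64
x = rng.normal(size=dim)
x[3] = 24000.0  # one outlier channel (illustrative value)
w = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # hypothetical layer weights

q = hadamard(dim)  # orthogonal: q @ q.T is the identity
x_rot = q @ x      # rotated activations, outlier spread across channels
w_rot = w @ q.T    # rotation folded into the weights ahead of time

y_ref = w @ x          # original layer output
y_rot = w_rot @ x_rot  # same output computed from rotated tensors

max_before = np.abs(x).max()
max_after = np.abs(x_rot).max()
print(f"max |activation|: {max_before:.0f} -> {max_after:.0f}")
```

The layer output is unchanged up to floating-point rounding, while the largest activation shrinks by roughly a factor of sqrt(dim), which leaves float16 more headroom and resolution for the remaining channels.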

Credits

Borrows heavily from:

Areas for Improvement

- support longer sequence lengths (> 1024)
  - alternative attention implementations (split einsum, efficient attention for longer sequence lengths)
- generate/use SpinQuant matrices for improved outlier reduction
- investigate PrefixQuant for improved outlier reduction
- convert core model separately from heads to allow hot-swapping of different heads
- pack short sequences into a single prediction
- support for heads beyond masked LM


smpanaro/ModernBERT-AppleNeuralEngine
