Releases: smpanaro/more-ane-transformers
v0-2023-October-31
gpt2 model family, 2-4x faster than the prior release.
- gpt2 now uses KV caching for faster generation
- all models generate multiple tokens per second (to get the fastest speeds, see the instructions in SETUP.md)
- iOS 16+/macOS 13+ now required
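For anyone curious what KV caching buys: at each generation step, the new token's key/value projections are appended to a cache so attention reuses past work instead of recomputing it for the whole sequence. A minimal numpy sketch of the idea (single head, no masking, hypothetical names; not the repo's Core ML implementation):

```python
import numpy as np

def new_cache(d):
    # Empty cache for keys and values (illustrative, single attention head).
    return {"k": np.empty((0, d)), "v": np.empty((0, d))}

def attend_step(q, k_new, v_new, cache):
    # Append this step's key/value so past projections are never recomputed.
    cache["k"] = np.vstack([cache["k"], k_new[None]])
    cache["v"] = np.vstack([cache["v"], v_new[None]])
    scores = cache["k"] @ q / np.sqrt(q.shape[-1])  # attend over all cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["v"]
```

Each step is O(sequence length) instead of O(sequence length squared), which is where the generation speedup comes from.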
gpt2-xl is split into multiple files due to GitHub's file size limits. Download both parts and decompress them like so:
cat gpt2-xl.mlpackage.tar.gz.* | tar -xzvf -
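If you want to see the whole split-and-rejoin roundtrip end to end, here is a self-contained sketch with throwaway file names (not the actual model files):

```shell
#!/bin/sh
set -e
# Build a small tarball standing in for the model package.
mkdir -p demo-pkg
echo "model weights placeholder" > demo-pkg/weights.txt
tar -czf demo.tar.gz demo-pkg
# Split it into 1 KiB parts, like the release assets.
split -b 1k demo.tar.gz demo.tar.gz.
rm demo.tar.gz
rm -r demo-pkg
# Rejoin the parts and extract, same shape as the release command above.
cat demo.tar.gz.* | tar -xzf -
```

`cat` concatenates the parts in suffix order, so the pipe hands `tar` the original byte stream.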
v0-2023-May-29
pythia model family, up to the 2.8B variant. specifically:
- pythia-70m
- pythia-160m
- pythia-410m
- pythia-1b
- pythia-1.4b
- pythia-2.8b (requires an M2 to run on the Neural Engine)
The larger models are split into multiple files due to GitHub's file size limits. Download all the parts and decompress them like so:
cat pythia-1.4b.tar.gz.* | tar -xzvf -
cat pythia-2.8b.tar.gz.* | tar -xzvf -
v0-2023-April-02
gpt2 model family, but much faster.
- all models generate text ~2x as fast
- the xl model now runs on the Neural Engine (5.5s on CPU only → 450ms on the Neural Engine only)
- models are compiled and cached automatically on the first run of generate, giving an order-of-magnitude faster time to first token on subsequent runs (1.5 minutes → 2.5s for xl)
note: gpt2-xl is too big for GitHub, so you will need to download both files and then join them back into a single zip like so:
zip -F gpt2-xl-split.mlpackage.zip --out gpt2-xl.mlpackage.zip
note 2: I accidentally made xl require macOS 13. I think it might not actually need that; let me know if you want to try it without.
v0-2023-march-27
gpt2 model family (base, medium, large, xl) converted to Core ML
note: gpt2-xl is too big for GitHub, so you will need to download both files and then join them back into a single zip like so:
zip -F gpt2-xl-split.mlpackage.zip --out gpt2-xl.mlpackage.zip