# DualAR Transformer Laboratory

This repo is a personal laboratory for training fully autoregressive text-audio multimodal models built on the DualAR Transformer architecture, which is best known as the neural codec seq2seq backbone of models such as Fish Speech.
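To make the dual-autoregressive structure concrete, here is a minimal sketch of DualAR-style generation, assuming the common formulation: a "slow" transformer runs once per audio frame over the sequence history, and a "fast" transformer then autoregressively emits one token per residual codebook within that frame. All names (`slow_model`, `fast_model`, `NUM_CODEBOOKS`, `VOCAB`) and the dummy models are hypothetical stand-ins, not this repo's actual code.

```python
# Hedged sketch of DualAR-style generation; the real models would be
# transformers, replaced here by deterministic dummies for illustration.
import random

NUM_CODEBOOKS = 4   # hypothetical: residual codebooks per audio frame
VOCAB = 1024        # hypothetical: codec vocabulary size

def slow_model(prefix_frames):
    """Stand-in for the slow transformer: frame history -> hidden state."""
    random.seed(len(prefix_frames))  # deterministic dummy
    return [random.random() for _ in range(8)]

def fast_model(hidden, partial_codes):
    """Stand-in for the fast transformer: predicts the next codebook token."""
    random.seed(int(sum(hidden) * 1000) + len(partial_codes))
    return random.randrange(VOCAB)

def generate(num_frames):
    frames = []
    for _ in range(num_frames):
        hidden = slow_model(frames)        # one slow step per frame
        codes = []
        for _ in range(NUM_CODEBOOKS):     # fast inner loop over codebooks
            codes.append(fast_model(hidden, codes))
        frames.append(codes)
    return frames

frames = generate(3)
print(frames)  # 3 frames, each a list of 4 codebook tokens
```

The key property this illustrates is that the fast inner loop keeps the whole pipeline autoregressive down to individual codebook tokens, rather than predicting all codebooks of a frame in parallel.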

Models trained here will be compatible with my DualAR fish-speech.rs inference engine.

Please do not expect anything here to be usable yet. Full documentation will come once an early artifact is good enough to release.