This repo is a personal laboratory for training fully autoregressive text-audio multimodal models built on the DualAR (dual autoregressive) Transformer architecture. This architecture is best known as the neural-codec seq2seq backbone of:
- Fish Speech TTS
- Kyutai's Moshi model early in pretraining, before its adaptation to duplex audio.
Models trained here will be compatible with my DualAR fish-speech.rs inference engine.
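To make the "dual autoregressive" idea concrete, here is a minimal sketch of a DualAR forward pass: a slow Transformer runs causally over frames, and a fast Transformer runs causally over the codec codebooks within each frame, conditioned on the slow hidden state. All names, dimensions, and the exact conditioning scheme are assumptions for illustration, not this repo's actual implementation.

```python
# Illustrative DualAR sketch (assumptions throughout; not this repo's code).
import torch
import torch.nn as nn

class DualARSketch(nn.Module):
    def __init__(self, vocab=4096, codebooks=8, d_model=512, n_heads=8, n_slow=6, n_fast=2):
        super().__init__()
        self.codebooks = codebooks
        self.tok_emb = nn.Embedding(vocab, d_model)  # shared text/audio embedding (assumption)
        slow_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        fast_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.slow = nn.TransformerEncoder(slow_layer, n_slow)  # "slow" AR over frames
        self.fast = nn.TransformerEncoder(fast_layer, n_fast)  # "fast" AR over codebooks in a frame
        self.head = nn.Linear(d_model, vocab)

    @staticmethod
    def causal_mask(n, device):
        return torch.triu(torch.full((n, n), float("-inf"), device=device), diagonal=1)

    def forward(self, frame_tokens, codebook_tokens):
        # frame_tokens:    (B, T)     one text/semantic token per frame
        # codebook_tokens: (B, T, C)  residual codec tokens for each frame
        B, T = frame_tokens.shape
        dev = frame_tokens.device

        # Slow transformer: causal attention over the frame axis.
        h = self.slow(self.tok_emb(frame_tokens), mask=self.causal_mask(T, dev))  # (B, T, D)

        # Fast transformer: causal attention over the codebook axis, conditioned
        # on the slow hidden state by prepending it to each frame's codebook sequence.
        cb = self.tok_emb(codebook_tokens).reshape(B * T, self.codebooks, -1)
        seq = torch.cat([h.reshape(B * T, 1, -1), cb[:, :-1]], dim=1)  # teacher forcing
        out = self.fast(seq, mask=self.causal_mask(self.codebooks, dev))
        return self.head(out).reshape(B, T, self.codebooks, -1)  # per-codebook logits
```

At inference time the slow model advances one frame per step while the fast model fills in that frame's codebooks token by token, which is what keeps generation fully autoregressive across both axes.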
Please do not expect anything here to be usable currently. Full documentation will come once an early artifact is good enough to release.