This model is a decoder-only transformer, implemented in PyTorch and trained on a Shakespearean text dataset to generate random Shakespeare-style text. The project is for educational purposes: it is meant to give a close look at the inner workings of the transformer architecture used in GPT-3.5 and other LLMs.
Model Specifications:
- Parameters: 408,897
- Training dataset size: 1.06 MB text file
- Context length used for predictions in the self-attention block: 32
- Multi-head attention blocks: 16
- Layers: 8 (decoder blocks)
- Learning Rate: 0.02
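
As a sanity check on the numbers above, the parameter count can be reproduced with a little arithmetic. The sketch below is not taken from the project's code. It assumes an embedding width of 64 and a character-level vocabulary of 65 symbols, neither of which is stated here, and it reads "Multi-head attention blocks: 16" as 16 attention heads per layer. It also assumes a common small-GPT layout: bias-free key, query and value projections, a biased output projection, a 4x feed-forward expansion, two layer norms per block, a final layer norm, and an output head that is not tied to the embeddings. The names `ModelConfig` and `count_parameters` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Values taken from the specification list above.
    context_length: int = 32
    n_heads: int = 16        # assumes "attention blocks" means heads per layer
    n_layers: int = 8
    learning_rate: float = 0.02
    # Assumptions: these two values are NOT stated in this README.
    n_embd: int = 64
    vocab_size: int = 65


def count_parameters(cfg: ModelConfig) -> int:
    """Count trainable parameters for the assumed decoder-only layout."""
    d = cfg.n_embd
    head_size = d // cfg.n_heads

    # Key, query and value projections for every head, without bias.
    attention = cfg.n_heads * 3 * d * head_size
    # Output projection applied to the concatenated heads, with bias.
    projection = d * d + d
    # Feed-forward network d -> 4d -> d, both linear layers with bias.
    feed_forward = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d
    block = attention + projection + feed_forward + layer_norms

    # Token embedding table plus learned position embedding table.
    embeddings = cfg.vocab_size * d + cfg.context_length * d
    # Final layer norm before the output head.
    final_norm = 2 * d
    # Output head mapping d -> vocab_size, with bias.
    lm_head = d * cfg.vocab_size + cfg.vocab_size

    return cfg.n_layers * block + embeddings + final_norm + lm_head


if __name__ == "__main__":
    print(count_parameters(ModelConfig()))  # 408897
```

Under these assumptions the total comes to 408,897, which matches the listed parameter count. Almost all of it sits in the eight decoder blocks (49,792 parameters each); the embeddings and output head together contribute only about 10,000.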