Magic constants #7
This is a very clean implementation, thanks for sharing it! I have a few questions about some of the constants in the code. Why is the MLP init scaled like this?

```python
for name, param in self.mlp.named_parameters():
    if "weight" in name:
        init.normal_(param, mean=0, std=0.5 * (1 / config.n_embd) ** 0.5)
```

And for the embeddings, where does the 3.3 in this init come from?

```python
init.normal_(self.embed.weight, mean=0, std=alpha * 3.3)
```
Answered by cloneofsimo (Mar 6, 2024):
Because in the paper they swept this for a transformer and found that a multiplicative scaling factor on the embedding works well when it is something like 3~10x. Since we don't have a multiplicative factor here, the embedding init and learning rate are scaled by roughly 3~10x each instead. It's a fairly arbitrary hyperparameter, to be honest, and the sweep results in the muP paper don't show that much difference.
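To make that concrete, here is a minimal sketch (not the repository's actual code) of the two parameterizations the answer contrasts: an explicit multiplicative factor on the embedding output versus folding that factor into the init std and the embedding's per-parameter-group learning rate. The values `vocab_size`, `n_embd`, `alpha`, and `base_lr` are illustrative assumptions, not taken from the repo.

```python
import torch
import torch.nn as nn
from torch.nn import init

# Illustrative values only; not taken from the repository.
vocab_size, n_embd = 50257, 768
alpha, base_lr = 3.3, 3e-4   # hypothetical embedding multiplier and base learning rate

# Option A: standard init plus an explicit multiplicative factor on the
# embedding output, as swept in the muP paper.
embed_a = nn.Embedding(vocab_size, n_embd)
init.normal_(embed_a.weight, mean=0.0, std=1.0)

def forward_a(idx: torch.Tensor) -> torch.Tensor:
    return alpha * embed_a(idx)   # explicit multiplier in the forward pass

# Option B (what the answer describes): drop the multiplier and instead scale
# the init std and the embedding's learning rate by roughly the same factor.
embed_b = nn.Embedding(vocab_size, n_embd)
init.normal_(embed_b.weight, mean=0.0, std=alpha * 1.0)

optimizer = torch.optim.AdamW([
    {"params": embed_b.parameters(), "lr": base_lr * alpha},  # embeddings get a scaled LR
    # other parameter groups would stay at base_lr
])
```

One reading of why both knobs get scaled: with Adam-style optimizers the update size tracks the learning rate rather than the gradient magnitude, so scaling the init alone would not reproduce the effect of the multiplier over the course of training.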