The optimizer is not receiving the FSDP model parameters. #3209

Open
2 of 4 tasks
eljandoubi opened this issue Nov 1, 2024 · 8 comments · May be fixed by #3213

Comments

@eljandoubi
Contributor

eljandoubi commented Nov 1, 2024

System Info

- `Accelerate` version: 1.0.1
- Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
- `accelerate` bash location: /home/a/anaconda3/envs/trans/bin/accelerate
- Python version: 3.12.7
- Numpy version: 2.1.2
- PyTorch version (GPU?): 2.5.0+cu124 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 15.46 GB
- GPU type: NVIDIA GeForce RTX 3070 Laptop GPU
- `Accelerate` default config:
	Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

See the code in the attached PDF: fsdp_acc.pdf

Expected behavior

The optimizer receives the FSDP-wrapped model's parameters.
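
For readers without access to the PDF, here is a minimal, hypothetical sketch (not the original repro) of how one might check whether the optimizer ends up holding the FSDP-wrapped model's parameters after `accelerator.prepare`. The toy `nn.Linear` model, the hyperparameters, and the assumption that it is launched via `accelerate launch` with a multi-GPU FSDP config are all placeholders:

```python
# Hypothetical sketch, not the code from fsdp_acc.pdf: create the optimizer
# before prepare(), then check whether it still points at the model's
# (now FSDP-wrapped) parameters. Meant to be run via `accelerate launch`
# with a multi-GPU FSDP config.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(1024, 1024)                         # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# prepare() replaces the model's parameters with FSDP flat parameters;
# the question in this issue is whether the optimizer is updated to match.
model, optimizer = accelerator.prepare(model, optimizer)

model_param_ids = {id(p) for p in model.parameters()}
optim_param_ids = {id(p) for g in optimizer.param_groups for p in g["params"]}
accelerator.print("optimizer params are the model's params:",
                  len(optim_param_ids) > 0 and optim_param_ids <= model_param_ids)
```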

@weixiong-ur

Does it mean that the optimizer actually does not receive any parameters to update, so the model parameters won't be updated in the training loop?

@eljandoubi
Contributor Author

I guess.

@eljandoubi
Contributor Author

@weixiong-ur have you had the same result as me?

@muellerzr
Collaborator

A few things:

  • I'm confused by this repro. You only have a single GPU, no (per `accelerate env`)? We don't support non-multi-GPU FSDP.
  • And if FSDP truly were broken in such a way, it would be a much larger problem, which we know it isn't.

Can you give a non-Jupyter-notebook-based repro? If you do share a Jupyter notebook, please share it in a gist with outputs rather than as a PDF; I'm hesitant to open PDFs for security reasons, and it's difficult to copy/paste from them.

@BenjaminBossan
Member

Not sure if it's related, but users reported an error in PEFT that points in a similar direction. Note that the error is not caused by PEFT, as I could reproduce it without PEFT. From the error message, it seems like the params passed to the optimizer are not consistent with the model parameters. I could resolve the error by downgrading to:

  • trl==0.11.0
  • tokenizers>=0.19,<0.20
  • transformers==4.44.2
  • accelerate==0.33.0

@eljandoubi Maybe you could check if these versions resolve your issue.
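
If it helps, a quick, hypothetical way to confirm which versions are actually installed in the failing environment (package names taken from the list above):

```python
# Print the installed versions of the packages pinned above.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("trl", "tokenizers", "transformers", "accelerate"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
```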

@eljandoubi
Contributor Author

eljandoubi commented Nov 21, 2024

@muellerzr
You can use my branch of the transformers repository [here] to train a model like Donut with FSDP wrapping based on layer size. It will print the number of parameters before and after applying FSDP wrapping.
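
For context, a rough, hypothetical sketch of the kind of before/after count that branch prints; the linked branch is the authoritative repro, and the toy model and the size-based wrapping policy assumed here are placeholders:

```python
# Hypothetical sketch of the before/after parameter count described above;
# the linked transformers branch is the authoritative repro. Meant to be run
# via `accelerate launch` with an FSDP config that wraps layers by size.
import torch
from accelerate import Accelerator


def n_params(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


accelerator = Accelerator()
model = torch.nn.Sequential(*[torch.nn.Linear(512, 512) for _ in range(8)])  # placeholder model

accelerator.print("params before prepare:", n_params(model))
model = accelerator.prepare(model)
# With FSDP, parameters are flattened and sharded across ranks, so the count
# reported here per rank can differ from the unwrapped total.
accelerator.print("params after prepare:", n_params(model))
```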

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@BenjaminBossan
Member

Can you check if huggingface/transformers#35212 has solved the issue? If not, could you check whether additionally switching off flash attention helps?
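
For reference, a hypothetical sketch of one way to switch off flash attention for that test; the model id below is only a placeholder for the Donut checkpoint used in the original repro, and `attn_implementation` is the standard transformers `from_pretrained` argument:

```python
# Hypothetical sketch: load the model with eager attention instead of
# flash attention / SDPA. The model id is a placeholder for the Donut
# checkpoint used in the original repro.
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "naver-clova-ix/donut-base",      # placeholder model id
    attn_implementation="eager",      # avoid flash attention / SDPA kernels
)
```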
