questions about the upcoming fp8 support #13735
-
Hi!
-
1.5 models are already supported for CUDA (that's why there is an -xl variation). For CPU I haven't checked. Basically, the idea is: parameters in fp8, but calculation in another precision.
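To make the "params in fp8, calculation in another precision" idea concrete, here is a minimal pure-Python sketch (not the webui code): weights are rounded onto the e4m3 grid for storage, but the arithmetic itself runs in full precision, standing in for the fp16/fp32 compute on GPU. The `quantize_e4m3` helper is a toy illustration that ignores subnormals and NaN.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest e4m3 value (toy sketch; subnormals/NaN omitted)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e, with m in [0.5, 1)
    # 3 mantissa bits plus the implicit leading 1 -> round m to steps of 2**-4
    m = round(m * 2**4) / 2**4
    y = sign * m * 2**e
    return max(-448.0, min(448.0, y))  # saturate at the e4m3 max magnitude (448)

# Storage in fp8, compute in higher precision: the weights live on the
# e4m3 grid, but the dot product below is done in ordinary Python floats.
weights = [quantize_e4m3(w) for w in [0.1234, -2.71828, 300.5]]
x = [1.0, 2.0, 3.0]
out = sum(w * xi for w, xi in zip(weights, x))  # full-precision accumulation
```

The point of the pattern is memory savings: the parameters take one byte each at rest, while the matmuls still see regular-precision values after the upcast.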
-
Thank you for the answer, and thank you so much for your hard work!
-
@Amin456789 I pushed a commit for CPU. Need you guys to help check whether it works.
Just `model.to(torch.float8_e4m3fn)`.
(e4m3 is normally enough for all use cases, but if you hit problems where some parameters overflow, use e5m2.)
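The overflow advice above comes down to dynamic range: e4m3 spends bits on mantissa precision, e5m2 on exponent range. A quick sketch of the largest finite magnitude each format can hold (values per the standard fp8 definitions; the bit arithmetic here is just for illustration):

```python
# e4m3fn: sign + 4 exponent bits (bias 7) + 3 mantissa bits. The all-ones
# exponent still encodes normal values; only S.1111.111 is NaN (no infinities),
# so the top mantissa pattern at the top exponent is 1.110 = 1.75.
E4M3_MAX = (2 - 2 * 2**-3) * 2 ** (0b1111 - 7)   # 1.75 * 2**8

# e5m2: sign + 5 exponent bits (bias 15) + 2 mantissa bits, IEEE-style:
# the all-ones exponent is reserved for inf/NaN, so the top normal
# exponent field is 0b11110 and the top mantissa is 1.11 = 1.75.
E5M2_MAX = (2 - 2**-2) * 2 ** (0b11110 - 15)     # 1.75 * 2**15

print(E4M3_MAX, E5M2_MAX)  # 448.0 57344.0
```

So a parameter above 448 in magnitude saturates (or becomes NaN) in e4m3 but still fits comfortably in e5m2, at the cost of one fewer mantissa bit of precision.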