
DINOv2 model slow CPU evaluation #2682

Open

liamwhite opened this issue Dec 27, 2024 · 0 comments

liamwhite commented Dec 27, 2024

Candle is about 10x slower than the equivalent Python code at evaluating this model on the CPU. I have provided a demonstration repository with all the code needed to reproduce the issue.

Output of a typical run of python main.py:

Took 0.12951040267944336 seconds to evaluate

Output of a typical run of target/release/candle_issue_demo:

Took 1.016947847 seconds to evaluate Tensor[dims 1, 1536; f32]
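
For reference, the Rust timing boils down to something like the sketch below. VarBuilder::zeros and the stock vit_small constructor stand in for the demo repository's real checkpoint loading and its modified ViT-G module, so the snippet runs standalone, but shapes and timings will differ from the numbers above.

```rust
use std::time::Instant;

use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{Module, VarBuilder};
use candle_transformers::models::dinov2;

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Zero-initialized weights so the sketch runs without a checkpoint;
    // the demo loads real weights via VarBuilder::from_mmaped_safetensors.
    let vb = VarBuilder::zeros(DType::F32, &device);
    let model = dinov2::vit_small(vb)?;

    // Dummy 1x3x518x518 f32 input standing in for a preprocessed image.
    let input = Tensor::zeros((1, 3, 518, 518), DType::F32, &device)?;

    let start = Instant::now();
    let features = model.forward(&input)?;
    println!(
        "Took {} seconds to evaluate {features:?}",
        start.elapsed().as_secs_f64()
    );
    Ok(())
}
```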

This is unfortunate because loading the model from Rust is much faster than loading it from Python, and it would be nice to avoid the need for a server process when running feature extraction on demand.

I tried to keep the gist of the code the same between these, but the Rust version contains two necessary alterations:

  1. The imagenet code from the examples crate is pasted into a module (it should probably be available within the candle_transformers crate, but this is an incredibly minor issue)
  2. The dinov2 code is not designed for the Facebook safetensors model, which uses different parameter names; the most significant difference is that the fused qkv projection is split up into separate query, key, and value tensors. This was addressed by pasting in the dinov2 module from PR #2288 (DinoV2 & Depth Anything V2: Bigger Models, commit c9ed473); a sketch of the weight remapping is shown after this list.
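
Here is a minimal sketch of the remapping in question, assuming the checkpoint stores separate query/key/value projection tensors (the tensor names are illustrative, not the exact safetensors keys):

```rust
use candle_core::{Result, Tensor};

/// Fuse separate query/key/value projection weights into the single
/// (3 * dim, dim) qkv matrix that candle's original dinov2 module expects;
/// the same concatenation applies to the three bias vectors.
fn fuse_qkv(q: &Tensor, k: &Tensor, v: &Tensor) -> Result<Tensor> {
    Tensor::cat(&[q, k, v], 0)
}
```

In the demo, pasting in the module from PR #2288 avoids the need for this manual remapping.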

My system specs:
CPU: Ryzen 9 5950X
RAM: 64GB
