
DINOv2 model slow CPU evaluation #2682

Open

liamwhite opened this issue Dec 27, 2024 · 0 comments

liamwhite commented Dec 27, 2024

Candle is about 10x slower than the equivalent Python code at evaluating this model on the CPU. I have provided a demonstration repository with all the code needed to reproduce the issue.

Output of a typical run of python main.py:

Took 0.12951040267944336 seconds to evaluate

Output of a typical run of target/release/candle_issue_demo:

Took 1.016947847 seconds to evaluate Tensor[dims 1, 1536; f32]
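
For reference, the Rust timing boils down to something like the sketch below. VarBuilder::zeros and the stock vit_small constructor stand in for the demo repository's real checkpoint loading and its modified ViT-G module, so the snippet runs standalone, but shapes and timings will differ from the numbers above.

```rust
use std::time::Instant;

use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{Module, VarBuilder};
use candle_transformers::models::dinov2;

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Zero-initialized weights so the sketch runs without a checkpoint;
    // the demo loads real weights via VarBuilder::from_mmaped_safetensors.
    let vb = VarBuilder::zeros(DType::F32, &device);
    let model = dinov2::vit_small(vb)?;

    // Dummy 1x3x518x518 f32 input standing in for a preprocessed image.
    let input = Tensor::zeros((1, 3, 518, 518), DType::F32, &device)?;

    let start = Instant::now();
    let features = model.forward(&input)?;
    println!(
        "Took {} seconds to evaluate {features:?}",
        start.elapsed().as_secs_f64()
    );
    Ok(())
}
```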

This is unfortunate because loading the model from Rust is much faster than loading it from Python, and it would be nice to avoid the need for a server process when running feature extraction on demand.

I tried to keep the gist of the code the same between these, but the Rust version contains two necessary alterations:

  1. The imagenet code from the examples crate is pasted into a module (it should probably be available within the candle_transformers crate, but this is an incredibly minor issue)
  2. The dinov2 code is not designed for the Facebook safetensors model, which uses different parameter names; the most significant difference is that the fused qkv projection is split up into separate query, key, and value tensors. This was addressed by pasting in the dinov2 module from PR #2288 (DinoV2 & Depth Anything V2: Bigger Models, commit c9ed473); a sketch of the weight remapping is shown after this list.
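
Here is a minimal sketch of the remapping in question, assuming the checkpoint stores separate query/key/value projection tensors (the tensor names are illustrative, not the exact safetensors keys):

```rust
use candle_core::{Result, Tensor};

/// Fuse separate query/key/value projection weights into the single
/// (3 * dim, dim) qkv matrix that candle's original dinov2 module expects;
/// the same concatenation applies to the three bias vectors.
fn fuse_qkv(q: &Tensor, k: &Tensor, v: &Tensor) -> Result<Tensor> {
    Tensor::cat(&[q, k, v], 0)
}
```

In the demo, pasting in the module from PR #2288 avoids the need for this manual remapping.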

My system specs:
CPU: Ryzen 9 5950X
RAM: 64GB
