Are more cores slower? It's now using half the cores. Could this be tracked as a "bug" or a roadmap milestone in an issue? #1894
-
CPU inference is heavily dependent on memory bandwidth. Once memory bandwidth becomes the bottleneck, adding more cores or threads only adds overhead. Also, probably not relevant for you, but since the recent CUDA changes, using more than one thread is (usually?) a performance loss if you're offloading all layers to the GPU.
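To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes every model weight is read from RAM once per generated token and that each thread contributes a fixed rate until the bandwidth ceiling is reached. The function names and all numbers are hypothetical, and the model only shows the plateau, not the extra overhead that can make more threads actively slower.

```python
def bandwidth_ceiling_tps(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if all weights are streamed once per token (assumption)."""
    return bandwidth_bytes_per_s / model_bytes


def estimated_tps(threads: int, per_thread_tps: float,
                  model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Throughput grows with threads until it hits the memory bandwidth ceiling."""
    compute_bound = threads * per_thread_tps
    return min(compute_bound, bandwidth_ceiling_tps(model_bytes, bandwidth_bytes_per_s))


if __name__ == "__main__":
    # Illustrative numbers only: a 4 GB model on a machine with 40 GB/s of memory bandwidth.
    model_bytes = 4e9
    bandwidth = 40e9
    for threads in (1, 2, 3, 4, 6):
        tps = estimated_tps(threads, 3.0, model_bytes, bandwidth)
        print(f"{threads} threads: about {tps:.1f} tokens/s")
```

With these made-up numbers the estimate stops improving after 4 threads, which is the kind of plateau described above. Real machines will differ.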
-
This is very hardware-dependent: memory type, amount, and speed, CPU sockets, cores, threads, NUMA, big.LITTLE, etc. You need to test which numbers are optimal for you, but generally CPU-based generation won't be the fastest option. CUDA or OpenCL will be faster, though most consumer cards probably don't have enough memory to run the larger models.
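One way to do that testing is a small timing sweep over thread counts. This is a minimal sketch, not part of any project: `sweep_threads` and `best_thread_count` are hypothetical helpers, and `run` is whatever callable launches your workload with a given number of threads.

```python
import time
from typing import Callable, Dict, Iterable


def sweep_threads(run: Callable[[int], None],
                  thread_counts: Iterable[int],
                  repeats: int = 3) -> Dict[int, float]:
    """Time run(n) for each thread count and keep the best (lowest) wall-clock time."""
    results: Dict[int, float] = {}
    for n in thread_counts:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            timings.append(time.perf_counter() - start)
        results[n] = min(timings)
    return results


def best_thread_count(results: Dict[int, float]) -> int:
    """Return the thread count with the lowest measured time."""
    return min(results, key=results.get)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs on its own; replace it with a real launcher.
    def fake_run(n: int) -> None:
        time.sleep(0.01 if n == 3 else 0.03)

    timings = sweep_threads(fake_run, [1, 2, 3, 4, 6], repeats=2)
    for n, seconds in sorted(timings.items()):
        print(f"{n} threads: {seconds:.3f} s")
    print("fastest:", best_thread_count(timings))
```

In practice `run` could wrap `subprocess.run` around your inference command with a fixed prompt and token count, passing the thread count through the tool's thread option (assumed here to exist, such as a `-t` flag). Use the same prompt for every run so the timings are comparable.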
-
Are more cores slower? It's now using half the cores. Would it be possible to file this as a "bug" or a roadmap milestone in an issue so it can be tracked better?
I have a VPS with 6 CPU cores, and using 3 cores performs better than using all 6. I'd rather not guess why, but I've found others reporting the same thing.
Also, 12-16 cores seems to be the optimum on a 28-core machine (I read this somewhere).
I'm asking because I'm trying to work out how many cores are optimal for my next VPS purchase or laptop investment for this.
I'll be paying for a full year of VPS hosting because it's cheaper.
Would it be possible to highlight some "gotchas", caveats, or current limitations?