more cores slower right? now it's using half cores. possible to put as "bugs" or milestone on roadmap in issue to track better? #1894

kolinfluence · 2023-06-16T14:56:45Z

More cores is slower, right? Right now it's using half the cores. Would it be possible to track this as a "bug" or a roadmap milestone in an issue?

I have a VPS with 6 CPU cores, and using 3 cores is faster than using all 6. I don't want to guess why, but I found others mentioning the same thing.

Also, 12-16 cores seems to be optimal for a 28-core machine (I read that somewhere).
I'm asking because I was wondering how many cores are optimal for my next VPS purchase / laptop investment for this.
I'll be paying for 1 year of VPS hosting because it's cheaper.

Is it possible to highlight some "gotchas" / caveats / current limitations?

KerfuffleV2 · 2023-06-16T20:34:28Z · Collaborator

CPU inference is very memory bandwidth dependent. Once you hit the point where that's the bottleneck, adding more cores or threads just increases overhead.

Also, probably not relevant for you, but since the recent CUDA changes, using more than one thread is (usually?) a performance loss if you're offloading all layers to the GPU.

3 replies

kolinfluence Jun 16, 2023
Author

Is there a formula for calculating the optimum ratio of memory bandwidth to CPU clock speed? I'm about to buy a 1-year VPS plan and am also tempted to buy a good laptop for 65B-parameter models.

Any suggestions on how to calculate what's ideal / optimum, and on whether the issue will be fixed? There's no point getting good hardware and ending up using half of it.

KerfuffleV2 Jun 16, 2023
Collaborator

Sorry, can't help you there. You could possibly look at stuff like DDR4/DDR5 bandwidth and compare it to the model size to get a very, very rough estimate of how much data might need to flow through the memory. The entire model is needed per token.
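The very rough estimate described above (memory bandwidth compared to model size) can be sketched as a single division. This is only an illustrative upper bound, assuming the whole model is read from RAM once per token and ignoring caches, compute time, and overhead. The function name, the 40 GB model size, and the dual-channel DDR4-3200 figure are assumptions for the example, not measurements from this thread.

```python
def max_tokens_per_second(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every token requires reading the whole model once."""
    if model_size_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_gb_s / model_size_gb


# Assumed, illustrative numbers (not measurements):
# theoretical peak of dual-channel DDR4-3200 = 2 channels * 8 bytes * 3200 MT/s
DDR4_3200_DUAL_GB_S = 2 * 8 * 3200 / 1000  # 51.2 GB/s
MODEL_SIZE_GB = 40.0  # hypothetical size of a quantized model file

if __name__ == "__main__":
    bound = max_tokens_per_second(MODEL_SIZE_GB, DDR4_3200_DUAL_GB_S)
    print(f"upper bound: {bound:.2f} tokens/s")
```

Real throughput will be lower than this bound, and a VPS may share memory bandwidth with other tenants, so treat the result as a sanity check rather than a prediction.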

Also, if you really care about optimizing performance, you'd probably be better off buying GPU time than a GPU-less server, which is generally just not going to be all that efficient.

Also, bear in mind llama.cpp, ggml, etc. are evolving rapidly. What's true right now may not be true tomorrow (although generally speaking it's very hard to get around the issue of the entire model being needed for each token).

kolinfluence Jun 16, 2023
Author

@KerfuffleV2 yes, you are right about the momentum of R&D; now there's Orca. I'm waiting for it to be open sourced. Not sure if llama can be used with Orca or something. I'll wait a bit and see.

I can't find any GitHub stuff to download here:
https://www.marktechpost.com/2023/06/13/microsoft-ai-introduces-orca-a-13-billion-parameter-model-that-learns-to-imitate-the-reasoning-process-of-lfms-large-foundation-models/

If any of you know how to use Orca, do mention it.

SlyEcho · 2023-06-17T13:30:35Z · Collaborator, Sponsor

It is something that is very hardware-dependent.

Memory type, amount, speed, CPU sockets, cores, threads, NUMA, big.LITTLE, etc.

You need to test what numbers are optimal for you, but generally CPU-based generation won't be the fastest option. CUDA or OpenCL will be faster, but most consumer cards probably don't have enough memory to run the larger models.
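One way to do the testing suggested above is a small sweep that times the same workload at several thread counts and keeps the best run for each. This is a minimal sketch, not part of llama.cpp: `sweep_thread_counts`, `fastest`, and the `run` callback are hypothetical names, and `run` is assumed to launch your own fixed benchmark (same model, prompt, and token count) with the given number of threads.

```python
import time
from typing import Callable, Dict, Iterable


def sweep_thread_counts(
    run: Callable[[int], None],
    thread_counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best (lowest) wall-clock time in seconds for each thread count."""
    results: Dict[int, float] = {}
    for n in thread_counts:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)  # hypothetical callback: runs the workload with n threads
            timings.append(time.perf_counter() - start)
        results[n] = min(timings)  # min is less sensitive to background noise
    return results


def fastest(results: Dict[int, float]) -> int:
    """Thread count with the lowest measured time."""
    return min(results, key=results.get)
```

In practice `run` could wrap a `subprocess.run` call that starts the inference binary with its thread option (`-t` in llama.cpp's `main` at the time of this thread). On a 6-core VPS, sweeping `range(1, 7)` would show directly whether 3 threads really beats 6 on that machine.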
