Using apple accelerate/intel optimized BLAS with gensim and other numpy numeric computation #147
Replies: 6 comments 3 replies
-
Many factors are involved in speeding up a process, including GPU drivers, the language itself (Python can be slow), and the libraries and components used, which behave differently on each OS and hardware... it's not one recipe for all, but a different recipe for each...
-
I'm asking because I come from a numerical background, where it makes a significant difference whether or not I use an optimized library: sometimes as much as a 10x improvement with NumPy, without changing much of the code. For example, with Gensim, using an optimized BLAS instead of the default libraries installed via Conda can make Word2Vec faster... but if the bottleneck here is the TTS code, I don't know if it will help. Currently, the NumPy installed in the env by the script uses the default BLAS. If I have time, I will check whether installing some dependencies directly from conda-forge and forcing `libblas=*=*accelerate` helps with the runtime. It will probably still be much slower than GPU, but it might still cut the time here (depending on what the bottleneck is).
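A quick way to see which BLAS the current env's NumPy is linked against is `numpy.show_config()`. A minimal sketch (output format varies by NumPy version and install channel; on recent NumPy it reports backends like `openblas`, `mkl`, or `accelerate`):

```python
import numpy as np

# Print the BLAS/LAPACK configuration NumPy was built against.
# If the output mentions "accelerate" (macOS) or "mkl" (Intel),
# an optimized backend is already in use; a generic reference
# BLAS usually shows up as "blas"/"lapack" with no vendor name.
np.show_config()
```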
-
If you have an idea of how to implement it for Windows/Linux/Mac, please create a PR. Thanks!
-
btw did you hear about modular? https://www.modular.com/mojo |
-
Yes, but Mojo is rather a separate language, interoperable with a big subset of Python. Many people looking for performance now choose instead to rewrite some parts of the code in Rust and combine it with high-level Python (something I would like to learn in the future). My idea here is much more modest: just to see whether I can make the Python libraries in the environment link against platform-optimized BLAS libraries by defining it explicitly in the conda env. It would be really low-hanging fruit, and maybe a modest improvement for people running the code on their CPU. I will indeed play with this idea, and if it helps with performance, I'll provide a PR.
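No recompilation should be needed for this: on conda-forge, `libblas` is a metapackage whose build string selects the backend at install time, so the swap is a runtime relink rather than a rebuild. A hedged sketch (the env name `myenv` is just a placeholder):

```shell
# Pin the BLAS backend in an existing conda env.
# On macOS, select Apple Accelerate:
conda install -n myenv "libblas=*=*accelerate"

# On Intel x86, select MKL instead:
conda install -n myenv "libblas=*=*mkl"
```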
-
Don't you have to recompile everything?
-
I wonder how much of the workload here could be made faster by using a platform-optimized BLAS?
I am on macOS with an M4 Pro, and a simple ebook (in French) took me more than an hour to turn into an audiobook. I wonder if this could be made faster by using a platform-optimized conda BLAS install instead of the default one (which may not be platform-optimized, depending on the platform).
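One way to estimate whether BLAS matters here is a quick matmul micro-benchmark: if the installed BLAS already delivers high throughput but the audiobook run is still slow, the bottleneck is likely elsewhere in the TTS pipeline. A rough sketch (the matrix size `n` is arbitrary):

```python
import time
import numpy as np

# Time a dense matrix multiply, which is handled entirely by
# whatever BLAS NumPy is linked against.
n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# A dense n x n matmul costs about 2*n^3 floating-point operations.
gflops = 2 * n**3 / elapsed / 1e9
print(f"{elapsed:.3f} s, ~{gflops:.1f} GFLOP/s")
```

Running this before and after switching `libblas` would show directly how much the backend swap changes raw linear-algebra throughput.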