Using apple accelerate/intel optimized BLAS with gensim and other numpy numeric computation #147
Replies: 6 comments 3 replies
-
Many factors are involved in speeding up a process, including GPU drivers, the language itself (Python can be slow), and the libraries and components used, which behave differently on each OS and hardware... it's not one recipe for all, but a different recipe for each...
-
I'm asking because I come from a numerical background, where it makes a significant difference whether or not I use an optimized library: sometimes as much as a 10x improvement with NumPy, without changing much of the code. For example, with Gensim, using an optimized BLAS instead of the default libraries installed via Conda can make Word2Vec faster... but if the bottleneck here is the TTS code, I don't know if it will help. Currently, the NumPy installed in the env by the script uses the default BLAS. If I have time, I will check whether installing some dependencies directly from conda-forge and forcing `libblas=*=*accelerate` helps with the runtime. It will probably still be much slower than GPU, but it might still cut the time here (depending on what the bottleneck is).
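A quick way to see which BLAS the current env's NumPy is linked against is `numpy.show_config()`. A minimal sketch (output format varies by NumPy version and install channel; on recent NumPy it reports backends like `openblas`, `mkl`, or `accelerate`):

```python
import numpy as np

# Print the BLAS/LAPACK configuration NumPy was built against.
# If the output mentions "accelerate" (macOS) or "mkl" (Intel),
# an optimized backend is already in use; a generic reference
# BLAS usually shows up as "blas"/"lapack" with no vendor name.
np.show_config()
```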
-
If you have an idea of how to implement it for Windows/Linux/Mac, please create a PR. Thanks!
-
btw did you hear about modular? https://www.modular.com/mojo |
-
Yes, but Mojo is rather a separate language, interoperable with a big subset of Python. Many people looking for performance now choose instead to rewrite some parts of the code in Rust and combine it with high-level Python (something I would like to learn in the future). My idea here is much more modest: just to see whether I can make the Python libraries in the environment link against platform-optimized BLAS libraries by defining it explicitly in the conda env. It would be really low-hanging fruit, and maybe a modest improvement for people running the code on their CPU. I will indeed play with this idea, and if it helps with performance, I'll provide a PR.
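No recompilation should be needed for this: on conda-forge, `libblas` is a metapackage whose build string selects the backend at install time, so the swap is a runtime relink rather than a rebuild. A hedged sketch (the env name `myenv` is just a placeholder):

```shell
# Pin the BLAS backend in an existing conda env.
# On macOS, select Apple Accelerate:
conda install -n myenv "libblas=*=*accelerate"

# On Intel x86, select MKL instead:
conda install -n myenv "libblas=*=*mkl"
```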
-
Don't you have to recompile everything?
-
I wonder how much of the workload here could be made faster by using a platform-optimized BLAS?
I am on macOS with an M4 Pro, and a simple ebook (in French) took me more than an hour to turn into an audiobook. I wonder if this could be made faster by using a platform-optimized conda BLAS install instead of the default one (which may not be platform-optimized, depending on the platform).
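One way to estimate whether BLAS matters here is a quick matmul micro-benchmark: if the installed BLAS already delivers high throughput but the audiobook run is still slow, the bottleneck is likely elsewhere in the TTS pipeline. A rough sketch (the matrix size `n` is arbitrary):

```python
import time
import numpy as np

# Time a dense matrix multiply, which is handled entirely by
# whatever BLAS NumPy is linked against.
n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# A dense n x n matmul costs about 2*n^3 floating-point operations.
gflops = 2 * n**3 / elapsed / 1e9
print(f"{elapsed:.3f} s, ~{gflops:.1f} GFLOP/s")
```

Running this before and after switching `libblas` would show directly how much the backend swap changes raw linear-algebra throughput.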