1 year, 7 months ago
Many people already use the numpy library in their Python programs because it considerably speeds up working with data and performing mathematical operations. However, in many cases numpy runs many times slower than it could, because it uses only one processor even though it could use every one you have.

The reason is that numpy calls functions from a linear algebra library to perform many of its operations. That is usually where the problem lies. Fortunately, it is quite easy to fix.

So, three situations are possible:

  • you have no linear algebra libraries installed, so numpy uses its built-in library, which, it must be said, is very slow;
  • you already have classic libraries such as ATLAS and BLAS installed, and they can use only one processor;
  • you have a modern library such as OpenBLAS, MKL, or similar installed.

Let's run a simple test. Run this program:

import numpy as np
size = 10000
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))
n = np.dot(a,b)

Then, if you are on Linux, run top; if you are on Windows, open the "Performance" tab in Task Manager (opened with Ctrl+Shift+Esc). If top shows the process at about 100% (top counts 100% per core), or, conversely, the CPU usage indicator on the Performance tab shows a value well below 100%, then only one core is doing the calculations, and this article is for you. Those whose processors are all busy can rejoice: everything is fine, and there is no need to read further.
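As a rough programmatic alternative to watching top or Task Manager, the ratio of CPU time to wall-clock time around the np.dot call hints at how many cores were busy. This is only a sketch: the function name measure_dot and the smaller default matrix size are assumptions made for illustration, not part of the original test.

```python
import time

import numpy as np


def measure_dot(size=2000):
    """Multiply two random size x size matrices and time the call.

    Returns (wall_seconds, cpu_seconds). cpu_seconds sums CPU time over
    all threads of this process, so cpu/wall well above 1 suggests that
    several cores were used, while a value near 1 suggests a single core.
    """
    a = np.random.random_sample((size, size))
    b = np.random.random_sample((size, size))

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    np.dot(a, b)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return wall_seconds, cpu_seconds


if __name__ == "__main__":
    wall, cpu = measure_dot()
    print("wall: %.2f s, cpu: %.2f s, cpu/wall: %.1f" % (wall, cpu, cpu / wall))
```

The ratio is only a hint: very small matrices finish too quickly for the timers to be meaningful, so increase size if the numbers look noisy.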

Solution for Windows

In theory, of course, you could find the source code of the libraries, recompile them, and rebuild numpy. I have even heard that someone wrote that he saw people who said they had managed it … In practice, the easiest way is to install a scientific Python distribution such as Anaconda or Canopy. The distribution includes not only Python and numpy but also a whole heap of useful libraries for computation and visualization.

Then you can rerun the initial test to confirm that the speed has increased many times over.

Solution for Linux

In fact, here too you can install a ready-made distribution such as Anaconda, Canopy, or something else that comes with all the libraries at once. But if you prefer to build things by hand, read on: all the recipes are below.

Checking the libraries


As you remember, two options are possible:

  • you have "old-school" (or "outdated", whichever you prefer) libraries installed (for example, ATLAS);
  • you have no libraries installed, and numpy uses its built-in library (which is even slower).

If you have a recent numpy version (> 1.10), go to the directory where numpy is installed (usually /usr/local/lib/python2.7/dist-packages/numpy, but the path may differ depending on your Linux and Python versions) and run the following commands in the console:

cd core
ldd multiarray.so

Earlier numpy versions have no multiarray.so library, but they do have _dotblas.so:

ldd _dotblas.so
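If you are not sure where numpy is installed, Python itself can report the directory and list candidate files to pass to ldd. The sketch below assumes Python 3.5 or later (for recursive glob) and assumes that the relevant compiled modules have "multiarray" or "_dotblas" in their file names; the helper name find_core_extensions is made up for illustration.

```python
import glob
import os

import numpy as np


def find_core_extensions():
    """Return paths of compiled numpy modules that ldd can inspect."""
    numpy_dir = os.path.dirname(np.__file__)
    # Assumption: the interesting modules have "multiarray" or "_dotblas"
    # in their file name, as in the numpy versions discussed in the text.
    patterns = ("*multiarray*", "_dotblas*")
    found = []
    for pattern in patterns:
        found.extend(
            glob.glob(os.path.join(numpy_dir, "**", pattern), recursive=True)
        )
    # Keep only shared objects (.so on Linux, .pyd on Windows).
    return sorted(p for p in found if p.endswith((".so", ".pyd")))


if __name__ == "__main__":
    for path in find_core_extensions():
        print(path)
```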

The output of the ldd command shows whether numpy uses a third-party linear algebra library.

linux-vdso.so.1 =>  (0x00007fffe58a2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8adbff4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8adbdd6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8adba10000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8adc68c000)

If you do not see libblas.so in the listing, your numpy is using its internal library. If you do see it, you have ATLAS or BLAS installed.
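The same check can be scripted by filtering the ldd output for library names. This is a minimal sketch: the list of substrings in BLAS_HINTS is an assumption about how such libraries are usually named, and the helper names are hypothetical.

```python
import subprocess

# Assumption: these substrings are enough to spot a linear algebra library.
BLAS_HINTS = ("blas", "atlas", "lapack", "mkl")


def blas_lines(ldd_output):
    """Return the lines of ldd output that mention a BLAS-like library."""
    return [
        line.strip()
        for line in ldd_output.splitlines()
        if any(hint in line.lower() for hint in BLAS_HINTS)
    ]


def run_ldd(path):
    """Run ldd on a shared object and return its text output (Linux only)."""
    result = subprocess.run(
        ["ldd", path], check=True, capture_output=True, text=True
    )
    return result.stdout
```

An empty list from blas_lines(run_ldd("multiarray.so")) corresponds to the listing above, where numpy falls back to its internal library.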

Either way, you first need to install a proper linear algebra library.

OpenBLAS installation


OpenBLAS is a good library of linear algebra algorithms and functions, which underlie modern methods of data analysis and machine learning.

First of all, you will need a Fortran compiler, since OpenBLAS is not compatible with the standard g77 compiler.

sudo apt-get install gfortran

Download OpenBLAS from GitHub (after first changing to a directory suitable for the installation):

git clone https://github.com/xianyi/OpenBLAS.git

Now go into the directory and start the build:

cd OpenBLAS
make FC=gfortran

When the compilation and build finish successfully, install the library.

sudo make install

By default, the library is installed in /opt/OpenBLAS. If you want to install it somewhere else, run make install with the PREFIX option:

sudo make install PREFIX=/your/preferred/location
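To confirm that the installed library loads and to see how many threads it intends to use, it can be opened directly with ctypes. This sketch assumes the default /opt/OpenBLAS prefix and assumes that the build exports openblas_get_config and openblas_get_num_threads, which OpenBLAS declares in its cblas.h; the helper name openblas_info is hypothetical.

```python
import ctypes
import os


def openblas_info(lib_path="/opt/OpenBLAS/lib/libopenblas.so"):
    """Return (config_string, num_threads) for OpenBLAS, or None if absent."""
    if not os.path.exists(lib_path):
        return None
    lib = ctypes.CDLL(lib_path)
    # Declare return types so ctypes converts the results correctly.
    lib.openblas_get_config.restype = ctypes.c_char_p
    lib.openblas_get_num_threads.restype = ctypes.c_int
    config = lib.openblas_get_config().decode("ascii", "replace")
    return config, lib.openblas_get_num_threads()


if __name__ == "__main__":
    print(openblas_info())
```

If you installed with a different PREFIX, pass the matching path to libopenblas.so instead of the default.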

Reassigning the libraries


If you found out earlier that you already have some linear algebra library installed, it is enough to run the command that reassigns the libraries:

sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 \
	/opt/OpenBLAS/lib/libopenblas.so 50

After this, OpenBLAS becomes the default linear algebra library not only for numpy but for all your programs and libraries in general.

Run the test again to see that all processors are now used in the calculations.
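To double-check which file the libblas.so.3 name now resolves to, the chain of symbolic links can be followed from Python. This is a small sketch; the default path is the one used in the update-alternatives command, and the helper name resolve_blas is made up for illustration.

```python
import os


def resolve_blas(link="/usr/lib/libblas.so.3"):
    """Follow symlinks from link and return the final file, or None if missing."""
    if not os.path.lexists(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    # After the reassignment this is expected to end in libopenblas.so.
    print(resolve_blas())
```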

Building the correct numpy


If your numpy was using the built-in library, you need to rebuild it so that it picks up the newly installed OpenBLAS.

First, get rid of the defective library:

sudo pip uninstall numpy

Then create a .numpy-site.cfg file with the following contents in your home directory:

[default]
include_dirs = /opt/OpenBLAS/include
library_dirs = /opt/OpenBLAS/lib

[openblas]
openblas_libs = openblas
include_dirs = /opt/OpenBLAS/include
library_dirs = /opt/OpenBLAS/lib

[lapack]
lapack_libs = openblas

[atlas]
atlas_libs = openblas
libraries = openblas

If you chose a non-standard location for OpenBLAS earlier, change the paths in the file. Now install numpy again:

sudo pip install numpy
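If OpenBLAS was installed under a different prefix, the paths in the configuration file could be filled in by a small script before numpy is reinstalled, rather than edited by hand. The sketch below reproduces the file contents shown above; the function name render_site_cfg and the example prefix are assumptions for illustration.

```python
import posixpath

SITE_CFG_TEMPLATE = """\
[default]
include_dirs = {include}
library_dirs = {lib}

[openblas]
openblas_libs = openblas
include_dirs = {include}
library_dirs = {lib}

[lapack]
lapack_libs = openblas

[atlas]
atlas_libs = openblas
libraries = openblas
"""


def render_site_cfg(prefix="/opt/OpenBLAS"):
    """Return the .numpy-site.cfg text for an OpenBLAS install under prefix."""
    return SITE_CFG_TEMPLATE.format(
        include=posixpath.join(prefix, "include"),
        lib=posixpath.join(prefix, "lib"),
    )


if __name__ == "__main__":
    # Hypothetical prefix; replace it with the PREFIX you actually used.
    print(render_site_cfg("/your/preferred/location"))
```

Redirecting the printed text to ~/.numpy-site.cfg gives the same file as writing it by hand.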

When the compilation and installation finish, run the initial test to confirm that the processors are no longer idle. That's all.
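Besides the timing test, numpy can report which libraries it was built against through np.show_config(). The sketch below captures that printed report and searches it for the word "openblas"; the helper names are hypothetical, and the substring check is only a heuristic because the report format differs between numpy versions.

```python
import contextlib
import io

import numpy as np


def numpy_build_config():
    """Capture the text that np.show_config() prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


def mentions_openblas(config_text):
    """Heuristic: does the build configuration mention OpenBLAS at all?"""
    return "openblas" in config_text.lower()


if __name__ == "__main__":
    print("OpenBLAS mentioned:", mentions_openblas(numpy_build_config()))
```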

This article is a translation of the original post at habrahabr.ru/post/274331/