When we start a project with numpy, we usually don't think about performance; getting a working solution comes first. Only later, with real data, when we watch the computer begin to slow down, do we have to figure out how to avoid it. Then we remember the GPU inside the machine and wonder: can it help?
Yes, it can. PyTorch can run all computations on the GPU. Its tensors are primitives much like numpy arrays, but unlike numpy arrays, a tensor declared on the GPU lives there and all calculations on it are performed there. Thanks to the large number of parallel processing units in a GPU, computing performance rises dramatically.
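As a quick illustration (this snippet is not from the original listing and assumes a CUDA-capable GPU is present), a tensor can either be created directly on the GPU or moved there from the CPU:

import torch

x = torch.rand(3, 3, device="cuda")   # created on the GPU
y = torch.rand(3, 3).to("cuda")       # created on the CPU, then moved
print(x.device, y.device)             # cuda:0 cuda:0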
We will take matrix multiplication as our example.
First, let’s make sure the multiplication computations are performed correctly.
import numpy as np

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print(a @ b)
Output:
[[19 22]
 [43 50]]
Everything is as expected.
Now tensors come into play. All the necessary modules and drivers are assumed to be installed already:
import torch

print("Cuda is available"
      if torch.cuda.is_available()
      else "Cpu only available")
Output:
Cuda is available
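A common idiom (not used in the article's listing, but handy in practice) is to pick the device once, based on this check, and reuse it everywhere:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = torch.rand(2, 2, device=device)   # lands on the GPU when one is available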
Let's let PyTorch multiply the matrices:
ta = torch.from_numpy(a)
tb = torch.from_numpy(b)
print(torch.matmul(ta, tb))
Output:
tensor([[19, 22],
        [43, 50]])
The result is the same as with numpy.
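It is worth noting that torch.from_numpy does not copy the data: the tensor shares memory with the source numpy array, and .numpy() converts a CPU tensor back just as cheaply. A small sketch continuing the example above:

a[0, 0] = 100            # modify the numpy array in place
print(ta)                # the change is visible in the tensor as well
print(ta.numpy()[0, 0])  # converting back costs nothing extra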
Let's take larger matrices and see how long the calculation takes in each of three modes:
1) PyTorch in «cpu» (numpy-like, CPU-only) mode
2) PyTorch in «cuda» (GPU) mode
3) numpy (CPU) mode
import datetime as dt

dtype = torch.float
M = 100

def calc(device):
    # time a single MxM matrix multiplication on the given torch device
    a = torch.rand(M, M, device=device, dtype=dtype)
    b = torch.rand(M, M, device=device, dtype=dtype)
    n1 = dt.datetime.now()
    torch.matmul(a, b).size()
    n2 = dt.datetime.now()
    print(device, '\t', M, n2 - n1)

# the same multiplication in plain numpy, for comparison
a = np.random.rand(M, M).astype('f')
b = np.random.rand(M, M).astype('f')
n1 = dt.datetime.now()
np.matmul(a, b).size
n2 = dt.datetime.now()
print("numpy", '\t', M, n2 - n1)

calc(torch.device("cpu"))
calc(torch.device("cuda"))
Output for M=100
numpy    100    0:00:00.000190
cpu      100    0:00:00.006382
cuda     100    0:00:00.017093
Output for M=1000
numpy    1000   0:00:00.014212
cpu      1000   0:00:00.064719
cuda     1000   0:00:00.014513
Output for M=10000
numpy    10000  0:00:12.415283
cpu      10000  0:00:12.241426
cuda     10000  0:00:00.036787
For small matrices (M=100), numpy looks like the performance champion. Even torch in cpu mode loses to it, and the worst result of all is on the GPU.
For medium-sized matrices (M=1000) the results level out: numpy and torch in cuda mode are on par, and only torch in cpu mode lags behind.
So far the GPU has shown no real advantage. The breakthrough comes with large matrices (M=10000): here the gap in execution speed is huge, 37 milliseconds on the GPU versus roughly 12 seconds for numpy.
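One caveat when reading these numbers: CUDA kernels are launched asynchronously, so a stricter benchmark would synchronize the device before stopping the clock. A minimal sketch of such a variant (the helper name is illustrative, not from the article):

import datetime as dt
import torch

def timed_matmul(device, M=10000, dtype=torch.float):
    # illustrative helper: times one matmul with explicit GPU synchronization
    a = torch.rand(M, M, device=device, dtype=dtype)
    b = torch.rand(M, M, device=device, dtype=dtype)
    if device.type == "cuda":
        torch.cuda.synchronize()      # make sure setup work has finished before timing
    n1 = dt.datetime.now()
    c = torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()      # wait for the kernel to actually complete
    n2 = dt.datetime.now()
    print(device, M, n2 - n1)

timed_matmul(torch.device("cuda"))

The point is not to overturn the comparison above, only that unsynchronized wall-clock timings of GPU code should be read with some care.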
Parallel computing pays off.