QPyTorch Functionality Overview

Introduction

In this notebook, we provide an overview of the major features of QPyTorch.

[1]:
import torch
import qtorch

Quantization

QPyTorch supports three different number formats: fixed point, block floating point, and floating point.

QPyTorch provides quantization functions that quantize PyTorch tensors.

[2]:
from qtorch.quant import fixed_point_quantize, block_quantize, float_quantize
[3]:
full_precision_tensor = torch.rand(5)
print("Full Precision: {}".format(full_precision_tensor))
low_precision_tensor = float_quantize(full_precision_tensor, exp=5, man=2, rounding="nearest")
print("Low Precision: {}".format(low_precision_tensor))
Full Precision: tensor([0.1241, 0.3602, 0.7104, 0.8344, 0.0211])
Low Precision: tensor([0.1250, 0.3750, 0.7500, 0.8750, 0.0195])
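
Only float_quantize is shown above; the other two formats have analogous functions. A minimal sketch (not part of the original notebook, assuming the wl/fl arguments of the qtorch.quant API) quantizes the same tensor into fixed point and block floating point:

# Sketch: fixed point takes a word length (wl) and fractional length (fl);
# block floating point takes a word length (wl) and shares one exponent per block.
fixed_tensor = fixed_point_quantize(full_precision_tensor, wl=8, fl=6, rounding="nearest")
block_tensor = block_quantize(full_precision_tensor, wl=8, rounding="nearest")
print("Fixed Point: {}".format(fixed_tensor))
print("Block Floating Point: {}".format(block_tensor))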

QPyTorch supports both nearest rounding and stochastic rounding.

[4]:
nearest_rounded = float_quantize(full_precision_tensor, exp=5, man=2, rounding="nearest")
stochastic_rounded = float_quantize(full_precision_tensor, exp=5, man=2, rounding="stochastic")
print("Nearest: {}".format(nearest_rounded))
print("Stochastic: {}".format(stochastic_rounded))
Nearest: tensor([0.1250, 0.3750, 0.7500, 0.8750, 0.0195])
Stochastic: tensor([0.1250, 0.3750, 0.7500, 0.8750, 0.0195])
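
The two results happen to coincide here, but stochastic rounding is random: each value is rounded up or down with probability proportional to its distance from the two neighboring representable values, which makes it unbiased in expectation. A small sketch (not from the original notebook) illustrates this by averaging many stochastic quantizations:

# Averaging many stochastically rounded copies approaches the full-precision tensor,
# whereas nearest rounding always returns the same, possibly biased, result.
samples = torch.stack([float_quantize(full_precision_tensor, exp=5, man=2, rounding="stochastic")
                       for _ in range(1000)])
print("Mean of stochastic roundings: {}".format(samples.mean(dim=0)))
print("Full precision reference:     {}".format(full_precision_tensor))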

Autograd

QPyTorch offers a PyTorch nn.Module wrapper that integrates quantization into automatic differentiation. A Quantizer module can use different low-precision number formats for forward and backward propagation.

[5]:
# First define number formats used in forward and backward quantization
from qtorch import FixedPoint, FloatingPoint
forward_num = FixedPoint(wl=4, fl=2)
backward_num = FloatingPoint(exp=5, man=2)

# Create a quantizer
from qtorch.quant import Quantizer
Q = Quantizer(forward_number=forward_num, backward_number=backward_num,
              forward_rounding="nearest", backward_rounding="stochastic")
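
Before putting the Quantizer into a model, a quick standalone sketch (not in the original notebook) shows what it does: the forward pass quantizes its input with the forward format, and the backward pass quantizes the incoming gradient with the backward format.

# Forward: values land on the FixedPoint(wl=4, fl=2) grid (multiples of 0.25 here).
# Backward: the incoming gradient is quantized to FloatingPoint(exp=5, man=2)
# with stochastic rounding before it propagates further.
x = torch.rand(5, requires_grad=True)
y = Q(x)
y.backward(torch.rand(5))
print("Quantized forward output: {}".format(y))
print("Quantized gradient: {}".format(x.grad))
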
[6]:
# Use the QPyTorch Quantizer just like any other nn.Module
from torch.nn import Module, Linear
class LinearLP(Module):
    """
    a low precision Logistic Regression model
    """
    def __init__(self):
        super(LinearLP, self).__init__()
        self.W = Linear(5, 1)

    def forward(self, x):
        out = self.W(x)
        out = Q(out)
        return out

lp_model = LinearLP()
[7]:
# run the low-precision model forward to get a low-precision output
fake_input = torch.rand(5)
lp_output = lp_model(fake_input)
print("Low Precision Output: {}".format(lp_output))
Low Precision Output: tensor([-0.2500], grad_fn=<RoundingBackward>)
[8]:
# backward propagation is quantized automatically
from torch import sigmoid
from torch.nn import BCELoss
lp_model.zero_grad()
criterion = BCELoss()
label = torch.Tensor([0])
loss = criterion(sigmoid(lp_output), label)
loss.backward()

Low Precision Optimization

Weight and Gradient Quantization

In the previous example, the forward and backward signals are quantized into low precision. However, if we optimize the model with gradient descent, the weights and gradients are not necessarily kept in low precision. QPyTorch offers a low-precision wrapper for PyTorch optimizers that abstracts away the quantization of weights, gradients, and the momentum velocity vectors.

[9]:
from torch.optim import SGD
from qtorch.optim import OptimLP

optimizer = SGD(lp_model.parameters(), momentum=0.9, lr=0.1) # use your favorite optimizer
# define custom quantization functions for the weights, gradients, and momentum
weight_quant = lambda x : float_quantize(x, exp=5, man=2, rounding="nearest")
gradient_quant = lambda x : float_quantize(x, exp=5, man=2, rounding="nearest")
momentum_quant = lambda x : float_quantize(x, exp=6, man=9, rounding="nearest")
# turn your optimizer into a low precision optimizer
optimizer = OptimLP(optimizer,
                    weight_quant=weight_quant,
                    grad_quant=gradient_quant,
                    momentum_quant=momentum_quant)
[10]:
print("Weight before optimizer stepping: \n{}".format(lp_model.W.weight.data))
print("Gradient before optimizer stepping: \n{}\n".format(lp_model.W.weight.grad))
optimizer.step()
print("Weight after optimizer stepping: \n{}".format(lp_model.W.weight.data))
print("Gradient after optimizer stepping: \n{}".format(lp_model.W.weight.grad))
optimizer.zero_grad()
Weight before optimizer stepping:
tensor([[-0.1850,  0.1250, -0.1007, -0.0862,  0.3034]])
Gradient before optimizer stepping:
tensor([[0.1051, 0.2755, 0.0375, 0.1643, 0.1883]])

Weight after optimizer stepping:
tensor([[-0.1875,  0.0938, -0.1094, -0.1094,  0.3125]])
Gradient after optimizer stepping:
tensor([[0.1094, 0.2500, 0.0391, 0.1562, 0.1875]])
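
The printed numbers are consistent with the following rough picture (a sketch only, not OptimLP's actual code; momentum is ignored because on the first step the freshly initialized momentum buffer equals the gradient): the gradient is quantized with grad_quant, the SGD update is applied, and the result is re-quantized with weight_quant.

# Sketch: reproduce the first weight's update by hand.
lr = 0.1
w = torch.tensor([-0.1850])            # first weight before stepping
g = torch.tensor([0.1051])             # its gradient before stepping
g_q = gradient_quant(g)                # -> 0.1094, as printed above
w_new = weight_quant(w - lr * g_q)     # -> -0.1875, as printed above
print(g_q, w_new)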

Gradient Accumulator

One popular practice in low-precision training is to use a higher-precision gradient accumulator. The gradients, after being multiplied by the learning rate and modified by the momentum terms, are added onto the high-precision accumulator. At the next iteration of forward and backward propagation, the weights are re-quantized from the accumulator, so the expensive computations are still performed in low precision.

QPyTorch integrates this process into the low precision optimizer.

[11]:
# Let's quickly repeat the above example
lp_model = LinearLP()
fake_input = torch.rand(5)
lp_output = lp_model(fake_input)
print("Low Precision Output: {}".format(lp_output))
lp_model.zero_grad()
criterion = BCELoss()
label = torch.Tensor([0])
loss = criterion(sigmoid(lp_output), label)
loss.backward()
Low Precision Output: tensor([0.2500], grad_fn=<RoundingBackward>)
[12]:
# define a low precision optimizer with gradient accumulators
optimizer = SGD(lp_model.parameters(), momentum=0, lr=0.1)
weight_quant = lambda x : float_quantize(x, exp=5, man=2, rounding="nearest")
gradient_quant = lambda x : float_quantize(x, exp=5, man=2, rounding="nearest")
acc_quant = lambda x : float_quantize(x, exp=6, man=9, rounding="nearest") # use higher precision for accumulator
optimizer = OptimLP(optimizer,
                    weight_quant=weight_quant,
                    grad_quant=gradient_quant,
                    momentum_quant=momentum_quant,
                    acc_quant=acc_quant)
[13]:
print("Weight before optimizer stepping: \n{}\n".format(lp_model.W.weight.data))
optimizer.step()
print("after stepping, high precision accumulator : \n{}".format(optimizer.weight_acc[lp_model.W.weight]))
print("after stepping, low precision weight : \n{}".format(lp_model.W.weight.data))
optimizer.zero_grad()
Weight before optimizer stepping:
tensor([[ 0.2943, -0.1296, -0.4130, -0.2599, -0.4059]])

after stepping, high precision accumulator :
tensor([[ 0.2817, -0.1309, -0.4253, -0.2603, -0.4185]])
after stepping, low precision weight :
tensor([[ 0.3125, -0.1250, -0.4375, -0.2500, -0.4375]])
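
In other words, the low-precision weight is simply the high-precision accumulator re-quantized with weight_quant. A quick check (a sketch reusing the weight_acc dictionary shown above) makes the relationship explicit:

# The low-precision weight should equal the accumulator passed through weight_quant.
acc = optimizer.weight_acc[lp_model.W.weight]
print(weight_quant(acc))       # expected to match lp_model.W.weight.data above
print(lp_model.W.weight.data)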

High-level Helper

QPyTorch also provides a useful helper that automatically turns a predefined PyTorch model into a low-precision one.

[14]:
from qtorch.auto_low import sequential_lower
class LinearFP(Module):
    """
a full precision Logistic Regression model
    """
    def __init__(self):
        super(LinearFP, self).__init__()
        self.W = Linear(5, 1)

    def forward(self, x):
        out = self.W(x)
        return out

fp_model = LinearFP()

forward_num = FixedPoint(wl=4, fl=2)
backward_num = FloatingPoint(exp=5, man=2)
lp_model = sequential_lower(fp_model, layer_types=['linear'],
                            forward_number=forward_num, backward_number=backward_num)
[15]:
print("Full Precision Model: ")
print(fp_model)
Full Precision Model:
LinearFP(
  (W): Linear(in_features=5, out_features=1, bias=True)
)
[16]:
print("Low Precision Model: ")
lp_model
Low Precision Model:
[16]:
LinearFP(
  (W): Sequential(
    (0): Linear(in_features=5, out_features=1, bias=True)
    (1): Quantizer()
  )
)
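
The lowered model is a drop-in replacement for the original: each selected Linear layer is wrapped in a Sequential together with an inserted Quantizer. For example (a sketch, not part of the original notebook):

# Forward a random input through the automatically lowered model; the output is
# quantized by the inserted Quantizer (forward_num = FixedPoint(wl=4, fl=2)).
lp_out = lp_model(torch.rand(5))
print("Low Precision Output: {}".format(lp_out))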