Native Calibration Interface

There are two kinds of quantization points in the NVDLA data path: hardware quantization points and software-defined quantization points. Both affect the precision of the AI model's results. The Open Neural Network Compiler (ONNC) transforms model-level layers into hardware-level layers using sophisticated optimization algorithms. The transformation is rarely a one-to-one mapping, and the topology of the neural network may be changed to fit hardware constraints.

ONNC provides a native quantization register/parameter interface so that users can understand the transformation inside ONNC and set software-defined quantization points. Users can plug their own quantization scheme and calibration tool into ONNC through this interface.

Calibration Table Example in JSON Format

The interface is defined as a JSON schema. A JSON file that follows this schema is called a calibration table (CTable).

{
  "version":{
    "major": 0,
    "minor": 1,
    "sub_minor": 0
  },
  "qinfo":{
    "qerror": "default",
    "qstrategy": "sls",
    "qthreshold": "l2"
  },
  "conv1_w_0.weight":{
    "offset": 0.0017355285817757249,
    "scale": 0.0034710571635514498
  },
  "conv1_b_0.weight":{
    "offset": 0.0034710571635514498,
    "scale": 0.0069421143271028996
  },
  "conv1_1.conv":{
    "out_cvt.truncate": 0
  },
  "conv1_1.sdp":{
    "out_cvt.offset": 0,
    "out_cvt.scale": 17597,
    "out_cvt.truncate": 25,
    "x1_op.shift_value": 0,
    "x1_op.truncate": 0
  },
  "prob_1.emu":{
    "input_scale_factor": 0.11757779121398926,
    "output_scale_factor": 0.007874015718698502
  }
}

The example CTable lists the transformed hardware-level layers. Users can assign a value to each quantization register of each layer.
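The following Python sketch shows how the top-level keys correspond to hardware layers. It parses an excerpt of the example CTable and splits each layer key into its hardware-layer name and its block name. The helper `layer_entries` is hypothetical. It assumes that the block name is whatever follows the last dot in the key.

```python
import json

# An excerpt of the example CTable above.
CTABLE_TEXT = """
{
  "version": {"major": 0, "minor": 1, "sub_minor": 0},
  "qinfo": {"qerror": "default", "qstrategy": "sls", "qthreshold": "l2"},
  "conv1_1.conv": {"out_cvt.truncate": 0},
  "conv1_1.sdp": {"out_cvt.offset": 0, "out_cvt.scale": 17597,
                  "out_cvt.truncate": 25}
}
"""

# Top-level keywords that do not describe a hardware layer.
RESERVED_KEYS = {"version", "qinfo"}


def layer_entries(ctable):
    """Yield (hw_layer_name, op_block_name, registers) for each layer entry.

    Assumption: the block name is the text after the last dot in the key.
    """
    for key, registers in ctable.items():
        if key in RESERVED_KEYS:
            continue
        hw_layer, _, block = key.rpartition(".")
        yield hw_layer, block, registers


if __name__ == "__main__":
    ctable = json.loads(CTABLE_TEXT)
    for hw_layer, block, registers in layer_entries(ctable):
        print(hw_layer, block, registers)
```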

Generate a calibration table template for an ONNX model

The ONNC compiler provides an option to generate a CTable template. The template lists the transformed hardware-level layers, which helps users understand the transformation. Here is an example ONNC command:

onnc.nv_small -gen-ctable <json_file> <onnx_model_name>

-gen-ctable <json_file>: specifies the name of the output JSON file for the generated template.

Load software-defined quantization points in a JSON file for an ONNX model

Next, revise the generated template. If you have a calibration tool, use it to assign the correct value to each quantization register. If you are using the ONNC calibrator, you can load the CTable it generates directly. The -load-ctable <json_file> option specifies the CTable to load.

onnc.nv_small -load-ctable <json_file> <onnx_model_name>

When you run the command above, the values in the JSON file override the default values of the corresponding registers and parameters.
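If your calibration tool produces plain numbers, a small script can copy them into the generated template before you load it. The following Python sketch shows one way to do this. The function `fill_ctable` and its arguments are hypothetical. Only the `onnc.nv_small -load-ctable` command line comes from this page.

```python
import json


def fill_ctable(template_path, values, output_path, model_path):
    """Copy calibrated register values into a CTable template (hypothetical helper).

    values maps "<hw_layer_name>.<op_block_name>" to {register_name: value}.
    Only layers and registers already present in the template are accepted.
    Returns the onnc.nv_small command line that loads the revised table.
    """
    with open(template_path) as f:
        ctable = json.load(f)
    for layer_key, registers in values.items():
        if layer_key not in ctable:
            raise KeyError(f"layer {layer_key!r} is not in the template")
        entry = ctable[layer_key]
        for name, value in registers.items():
            if name not in entry:
                raise KeyError(f"{layer_key}: unknown register {name!r}")
            entry[name] = value
    with open(output_path, "w") as f:
        json.dump(ctable, f, indent=2)
    return ["onnc.nv_small", "-load-ctable", output_path, model_path]
```

The sketch rejects unknown layers and registers on purpose. A misspelled name then fails loudly instead of silently leaving the default value in place. You can pass the returned list to `subprocess.run` to invoke the compiler.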

JSON Schema for NVDLA Quantization Information

The schema defines three top-level properties, called keywords, which are expressed as JSON keys.

The version keyword records the schema version numbers.
The qinfo keyword defines the quantization strategy (qstrategy), the quantization error scheme (qerror), and the quantization threshold (qthreshold).

The third property is a collection of keywords. Each keyword names a hardware layer and the block it is mapped to, in the form <hw_layer_name>.<op_block_name>. The value of each such keyword is a nested JSON object that defines the quantization information for the corresponding mapped block.

{
  "version": { "major": 0, "minor": 1, "sub_minor": 0 },
  "qinfo": { "qstrategy": "sls", "qerror": "default", "qthreshold": "max" },
  "<hw_layer_name>.<op_block_name>": { "<reg_or_param_name>": <value>, ... },
  ...
}

<op_block_name> takes one of the following six values. The example CTable above writes them in lowercase, such as conv and sdp.

  • CONV
  • SDP
  • LUT
  • CDP
  • EMU
  • WEIGHT

The following sections list the quantization-related registers or parameters (<reg_or_param_name>) for each option.
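Before you load a hand-edited CTable, it can help to check its shape against the schema described above. The following Python sketch performs this check with a hypothetical function, `check_ctable`. It checks the version and qinfo keywords. It also verifies that every other key has the <hw_layer_name>.<op_block_name> form and uses one of the six block names. The sketch checks structure only, not register values.

```python
# Block names from the list above; the example CTable writes them in lowercase.
OP_BLOCKS = {"conv", "sdp", "lut", "cdp", "emu", "weight"}


def check_ctable(ctable):
    """Return a list of structural problems in a parsed CTable (empty if none)."""
    problems = []
    version = ctable.get("version", {})
    for field in ("major", "minor", "sub_minor"):
        if not isinstance(version.get(field), int):
            problems.append(f"version.{field} missing or not an integer")
    qinfo = ctable.get("qinfo", {})
    for field in ("qstrategy", "qerror", "qthreshold"):
        if not isinstance(qinfo.get(field), str):
            problems.append(f"qinfo.{field} missing or not a string")
    for key, value in ctable.items():
        if key in ("version", "qinfo"):
            continue
        # Assumption: the block name follows the last dot in the key.
        hw_layer, dot, block = key.rpartition(".")
        if not dot or not hw_layer:
            problems.append(f"{key}: expected <hw_layer_name>.<op_block_name>")
        elif block.lower() not in OP_BLOCKS:
            problems.append(f"{key}: unknown op block {block!r}")
        if not isinstance(value, dict):
            problems.append(f"{key}: value must be a JSON object")
    return problems
```

To use it, pass the result of `json.load` on your CTable file to `check_ctable`. An empty list means the sketch found no structural problems.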

CONV (Direct mode)

Loadable Parameter   Value Range   Input bitwidth   Output bitwidth
out_cvt.truncate     0~16          34               32
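The following Python sketch illustrates what out_cvt.truncate does to the 34-bit accumulator. It models the operation as a rounding right shift followed by saturation to 32 bits. This rounding and saturation behavior is an assumption made for illustration, not a statement of the exact NVDLA hardware behavior. Consult the NVDLA hardware documentation for the precise semantics. The name `conv_out_cvt` is hypothetical.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def conv_out_cvt(acc, truncate):
    """Model CONV out_cvt: narrow a 34-bit accumulator to a 32-bit value.

    Assumption: the hardware shifts right by `truncate` bits with round-half-up
    and then saturates to the signed 32-bit range.
    """
    if not -(1 << 33) <= acc < (1 << 33):
        raise ValueError("accumulator does not fit in 34 signed bits")
    if not 0 <= truncate <= 16:
        raise ValueError("out_cvt.truncate must be in 0~16")
    if truncate > 0:
        acc = (acc + (1 << (truncate - 1))) >> truncate
    return max(INT32_MIN, min(INT32_MAX, acc))
```

When truncate is 0, as in the conv1_1.conv entry of the example CTable, the accumulator passes through unchanged unless it exceeds the 32-bit range.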

CONV (Image mode)

Loadable Parameter   Value Range
in_cvt.enable        0: disable, 1: enable
in_cvt.scale         int16
in_cvt.truncate      0~63
in_cvt.offset        int16
mean_ry              uint16
mean_gu              uint16
mean_bv              uint16
mean_ax              uint16
mean_format          0: disable, 1: enable
out_cvt.truncate     0~16
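The hardware cannot hold values outside the ranges in this table, so a calibration tool may want to reject them up front. The following Python sketch encodes the CONV (Image mode) ranges in a dictionary and reports any register that falls outside them. The names `out_of_range` and `CONV_IMAGE_RANGES` are hypothetical, and the sketch treats unknown register names as errors. You can reuse the same table-driven approach for the other blocks by supplying their ranges.

```python
INT16 = range(-(1 << 15), 1 << 15)
UINT16 = range(0, 1 << 16)

# Value ranges copied from the CONV (Image mode) table above.
CONV_IMAGE_RANGES = {
    "in_cvt.enable": range(0, 2),
    "in_cvt.scale": INT16,
    "in_cvt.truncate": range(0, 64),
    "in_cvt.offset": INT16,
    "mean_ry": UINT16,
    "mean_gu": UINT16,
    "mean_bv": UINT16,
    "mean_ax": UINT16,
    "mean_format": range(0, 2),
    "out_cvt.truncate": range(0, 17),
}


def out_of_range(registers, ranges=CONV_IMAGE_RANGES):
    """Return names of registers that are unknown or not integers in their allowed range."""
    bad = []
    for name, value in registers.items():
        allowed = ranges.get(name)
        if allowed is None or not isinstance(value, int) or value not in allowed:
            bad.append(name)
    return bad
```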

SDP

Loadable Parameter          Value Range             Input bitwidth   Output bitwidth
x1_op.shift_value           0~63                    16               32
x1_op.truncate              0~63                    49               32
x2_op.shift_value           0~63                    16               32
x2_op.truncate              0~63                    49               32
y_op.cvt.alu_cvt.enable     0: disable, 1: enable   -                -
y_op.cvt.alu_cvt.offset     int16                   16               17
y_op.cvt.alu_cvt.scale      int16                   17               33
y_op.cvt.alu_cvt.truncate   0~63                    33               32
y_op.cvt.mul_cvt.enable     0: disable, 1: enable   -                -
y_op.cvt.mul_cvt.offset     int16                   16               17
y_op.cvt.mul_cvt.scale      int16                   17               33
y_op.cvt.mul_cvt.truncate   0~63                    33               32
out_cvt.offset              int32                   32               33
out_cvt.scale               int16                   33               49
out_cvt.truncate            0~63                    49               8
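A calibration tool usually computes a real-valued rescaling factor for a layer. It then has to express that factor with the integer registers in this table. The following Python sketch assumes that the SDP output converter computes (x - out_cvt.offset) * out_cvt.scale, rounds and shifts the result right by out_cvt.truncate, and saturates it to int8. Under this model, the effective multiplier is out_cvt.scale / 2^out_cvt.truncate. Both the converter model and the helper names `split_multiplier` and `sdp_out_cvt` are assumptions for illustration. Under this model, the conv1_1.sdp entry of the example CTable (scale 17597, truncate 25) corresponds to a multiplier of about 5.244e-4.

```python
INT16_MAX = (1 << 15) - 1


def split_multiplier(m, max_truncate=63):
    """Approximate a positive real multiplier m as scale / 2**truncate.

    Tries truncate values from max_truncate downward and returns the first
    (largest) one whose rounded scale fits in a signed 16-bit integer.
    """
    if m <= 0:
        raise ValueError("multiplier must be positive")
    for truncate in range(max_truncate, -1, -1):
        scale = round(m * (1 << truncate))
        if scale <= INT16_MAX:
            if scale == 0:
                raise ValueError("multiplier too small to represent")
            return scale, truncate
    raise ValueError("multiplier too large for an int16 scale")


def sdp_out_cvt(x, offset, scale, truncate):
    """Assumed SDP out_cvt: (x - offset) * scale, rounding right shift, int8 saturation."""
    y = (x - offset) * scale
    if truncate > 0:
        y = (y + (1 << (truncate - 1))) >> truncate
    return max(-128, min(127, y))
```

Choosing the largest truncate that keeps the scale within int16 preserves the most precision. For instance, `split_multiplier(17597 / 2**25)` recovers the pair (17597, 25) from the example.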

LUT

Parameter                                Value Range
lut_param.linear_exp_offset.exp_offset   -128 ~ +127
lut_param.linear_only_offset.frac_bits   -128 ~ +127
lut_param.linear_only_start              0 ~ 2^38 - 1
lut_param.linear_only_end                0 ~ 2^38 - 1

CDP

Loadable Parameter     Value Range   Input bitwidth   Output bitwidth
cdp:in_cvt.offset      int8          8                9
cdp:in_cvt.scale       int16         9                25
cdp:in_cvt.truncate    0~31          25               9
cdp:out_cvt.offset     int25         25               26
cdp:out_cvt.scale      int16         26               42
cdp:out_cvt.truncate   0~63          42               8
cdp.lut_index          0~255         -                -

EMU

Parameter             Value Range
input_scale_factor    float32
output_scale_factor   float32
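The following Python sketch shows how the two float32 factors of an EMU block might be used to convert between integer tensors and real values around a software-emulated layer. It rests on several assumptions that the table itself does not state. It assumes that EMU blocks run in software, that prob_1 is a softmax layer, that a real value equals the int8 value times the scale factor, and that results are rounded and clamped to int8. The function name `emulated_softmax` is hypothetical. The example output_scale_factor is approximately 1/127, which maps probabilities in [0, 1] onto the int8 values 0 to 127.

```python
import math

# Values from the "prob_1.emu" entry of the example CTable.
INPUT_SCALE = 0.11757779121398926
OUTPUT_SCALE = 0.007874015718698502  # approximately 1/127


def emulated_softmax(q_in, in_scale=INPUT_SCALE, out_scale=OUTPUT_SCALE):
    """Dequantize int8 inputs, run softmax in float, and requantize to int8."""
    x = [q * in_scale for q in q_in]        # assumption: real = int8 * scale
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return [max(-128, min(127, round(p / out_scale))) for p in probs]
```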

WEIGHT

Parameter   Value Range
offset      float32
scale       float32