ONNC on NVDLA AI
Exploits the full power of the underlying AI chips
ONNC x NVDLA Release 1.1.3
- Version: r1.3.0
- Release Date: 2019/7/15
ONNC on NVDLA AI starter kits provide a software development kit (SDK) and a board support package (BSP) for IC design houses that plan to develop their own deep learning accelerator (DLA) based on NVDLA.
It is a comprehensive set of FPGA-based prototyping tools, including AI-related hardware and software components:
- compiler (ONNC)
- drivers (KMD/UMD)
- Linux kernel
- virtual platform (GreenSocs)
- FPGA netlist
- NVDLA Verilog RTL code
AI Starter Kits optimize your design process and ensure that your software runs on and exploits the full power of the underlying AI chips. Skymizer aims to provide a solution tailored to your needs, helping you build a strong hardware/software co-design team and save time ahead of AI chip fabrication.
- nv_large configuration support
- Power of compiler optimization – ONNC provides 23 optimization options
- Details, details, details …
- Resolved Issues
nv_large hardware configuration – the highest-performance spec in the NVDLA family. It contains 2048 MACs, a 512KB CONV buffer, and broad bus bandwidth (256-bit AXI width), and targets emerging intelligent devices such as Advanced Driver Assistance Systems (ADAS) and smart factories. To support the nv_large configuration, we created a new compiler, onnc.nv_large, and upgraded the C-model of the GreenSocs virtual platform.
- new compiler – onnc-create provides the nv_large configuration
- kernel mode driver – provides probe and initialization functions for nv_large
- new GreenSocs virtual platform – upgraded C-model for nv_large
ONNC has 23 new optimization options for you to exploit performance from the underlying microarchitecture. Most optimizations are intuitive and effective. You can find their descriptions here. ONNC's help manual also provides a brief description of every optimization flag: pass the --help flag with a verbose level of three or higher.
onnc.nv_large --help -verbose=3
All optimizations keep the neural network mathematically equivalent. Some optimizations may split one layer into multiple layers because of hardware buffer-size limits; others may fuse multiple layers into one for higher MAC utilization. Although the topology of the network may change, its mathematical semantics stay the same.
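The claim above can be illustrated with a toy NumPy sketch (this is not ONNC code): a convolution over all input channels equals the sum of convolutions over channel sub-groups, so a compiler may split a layer that exceeds the CONV buffer and still compute the same result.

```python
# Toy demonstration that splitting a convolution along input channels
# is mathematically equivalent to the original layer. All shapes and
# names here are illustrative, not taken from ONNC.
import numpy as np

def conv2d(x, w):
    """Naive valid 2-D convolution: x is (C, H, W), w is (K, C, R, S)."""
    K, C, R, S = w.shape
    _, H, W = x.shape
    out = np.zeros((K, H - R + 1, W - S + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[:, i:i + R, j:j + S] * w[k])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))      # 8 input channels
w = rng.standard_normal((4, 8, 3, 3))   # 4 output channels

full = conv2d(x, w)
# Split the 8 input channels into two groups of 4 and sum the partial results,
# as a compiler would when one group at a time fits in the hardware buffer.
split = conv2d(x[:4], w[:, :4]) + conv2d(x[4:], w[:, 4:])
print(np.allclose(full, split))
```

Fusion goes in the opposite direction: several adjacent layers are merged into one hardware operation, again without changing the computed values.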
A group of optimizations in a compiler for a deep learning accelerator (DLA) enables layers that are not directly supported by the microarchitecture. Take the Concat layer for example: the NVDLA microarchitecture has no Concat processing unit, so ONNC must transform a Concat layer into multiple convolution layers to make sure the network can run on NVDLA. ONNC turns on these network-enabling optimizations by default.
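One way such a lowering can work, sketched here in NumPy (an illustration of the idea, not ONNC's actual pass): channel-wise concatenation equals the sum of 1x1 convolutions whose weights simply scatter each input into its slot of the output.

```python
# Toy demonstration that Concat along the channel axis can be expressed
# as 1x1 convolutions with identity-selection weights. Shapes and names
# are illustrative assumptions, not taken from ONNC.
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C, H, W), w is (K, C); returns (K, H, W)."""
    return np.einsum('kc,chw->khw', w, x)

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 4, 4))  # first input, 3 channels
b = rng.standard_normal((5, 4, 4))  # second input, 5 channels

# Weights that copy input channels into the right output channels:
# a goes to output channels 0..2, b to output channels 3..7.
wa = np.zeros((8, 3)); wa[:3] = np.eye(3)
wb = np.zeros((8, 5)); wb[3:] = np.eye(5)

lowered = conv1x1(a, wa) + conv1x1(b, wb)
print(np.allclose(lowered, np.concatenate([a, b], axis=0)))
```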
New optimization options are listed here.
Version r1.3.0 contains a tutorial program to help users get a better understanding of the user mode driver (UMD). The program demonstrates how to use the UMD to write a handwritten digit recognizer (MNIST handwritten digits).
- Fix ComputeGraph::topologicalSort() after parts of the operator disappear
- Fix calculation of address offset error under the image model
- Fix Conv weight and bias ordering
- Fix Relu fusion (if there is no bias, Conv will not execute)
- Fix crash when compiling a single-layer model
- Fix crash on onnx library exception output
- Fix invalid loadable output when only Softmax is in the model
- Fix LRN implementation set-value error
- Fix calculation of group Conv read/write offset error
- Fix calculation of Concat shared-memory offset error
- Fix crash when the model output Tensor is an input of another operator