




A bundle of C++ libraries and tools to speed up your development of compilers for deep learning accelerators (DLAs).

The ONNC compiler targets diverse SoC architectures, from a simple single-core system to a heterogeneous system with a multi-level memory hierarchy and bus connections.

Implementing a compiler from scratch is not easy. Our package contains off-the-shelf parsers, lowering procedures, and optimization algorithms that keep you from reinventing the wheel. You can focus on the core algorithms that truly distinguish your product and rely on the ONNC compiler for the rest. The ONNC compiler is designed modularly, with well-defined interfaces for ease of integration. In particular, it supports integration with TVM and LLVM.

A compiler flow is commonly divided into three stages, in order: frontend, middleend, and backend. For a DLA compiler, the frontend is expected to support a rich set of model formats and to make the parsed graph compact and explicit enough for further optimization.

The middleend is expected to partition a model graph into groups, each of which is lowered to a DLA, DSP, or CPU backend, so that the groups, working together, achieve high performance in a heterogeneous system.

The backend is expected to perform hardware-dependent optimization of scheduling and resource allocation to maximize performance and minimize memory footprint.


For the frontend

Parsing model files from popular AI frameworks such as ONNX, PyTorch, and TensorFlow.
Shape inference to resolve non-trivial tensor shapes in models.
Constant folding and redundancy removal to reduce model size.
Lowering into a so-called ONNC IR, which is a high-level IR composed of coarse-grained operators like convolution and pooling.
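
As a rough illustration of what shape inference involves for a coarse-grained operator such as convolution or pooling, the sketch below computes the output extent along one spatial axis. It assumes explicit padding and floor rounding, as in the ONNX Conv definition. The function name `inferOutputDim` and its parameters are hypothetical and are not taken from the ONNC API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not part of the ONNC API.
// Computes the output extent of a convolution or pooling window along one
// spatial axis, assuming explicit padding and floor rounding.
std::int64_t inferOutputDim(std::int64_t in, std::int64_t kernel,
                            std::int64_t stride, std::int64_t padBegin,
                            std::int64_t padEnd, std::int64_t dilation = 1) {
  assert(in > 0 && kernel > 0 && stride > 0 && dilation > 0);
  assert(padBegin >= 0 && padEnd >= 0);

  // A dilated kernel covers dilation * (kernel - 1) + 1 input elements.
  const std::int64_t effectiveKernel = dilation * (kernel - 1) + 1;
  const std::int64_t padded = in + padBegin + padEnd;
  assert(padded >= effectiveKernel);

  // Number of window positions that fit when stepping by `stride`.
  return (padded - effectiveKernel) / stride + 1;
}
```

For example, a 224-wide input with a 7-wide kernel, stride 2, and padding 3 on each side yields an output width of 112. A full shape-inference pass would apply rules like this operator by operator in topological order.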

For the middleend

Partitioning for heterogeneous computing
Off-the-shelf CPU fallback mechanism for popular CPUs like ARM and RISC-V.
Operator tiling to fit DLA design constraints.
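
The middleend items above can be sketched in a simplified form. The example assumes the graph is already linearized in topological order and that each operator carries a flag saying whether the DLA supports it. Unsupported operators fall back to the CPU, and consecutive operators with the same target are merged into one group. A second helper splits one axis into tiles no larger than an assumed hardware limit. All type and function names here (`Op`, `Group`, `partitionByTarget`, `tileAxis`) are hypothetical and do not describe ONNC's actual interfaces or algorithms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not part of the ONNC API.
enum class Target { DLA, CPU };

struct Op {
  std::string name;
  bool dlaSupported;  // assumed to be known from the DLA's operator list
};

struct Group {
  Target target;
  std::vector<std::string> ops;
};

// Greedy partitioning over a topologically ordered operator list:
// operators the DLA cannot run fall back to the CPU, and neighbors that
// share a target are merged into the same group.
std::vector<Group> partitionByTarget(const std::vector<Op>& ops) {
  std::vector<Group> groups;
  for (const Op& op : ops) {
    const Target t = op.dlaSupported ? Target::DLA : Target::CPU;
    if (groups.empty() || groups.back().target != t) {
      groups.push_back(Group{t, {}});
    }
    groups.back().ops.push_back(op.name);
  }
  return groups;
}

struct Tile {
  std::int64_t offset;
  std::int64_t size;
};

// Splits one axis of an operator into tiles of at most `maxTile` elements,
// for example to respect a DLA buffer or kernel size limit.
std::vector<Tile> tileAxis(std::int64_t extent, std::int64_t maxTile) {
  assert(extent > 0 && maxTile > 0);
  std::vector<Tile> tiles;
  for (std::int64_t off = 0; off < extent; off += maxTile) {
    tiles.push_back(Tile{off, std::min(maxTile, extent - off)});
  }
  return tiles;
}
```

A production partitioner would also weigh data-transfer cost between groups rather than only checking operator support; this sketch deliberately leaves that out.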

For the backend

We abstracted the hardware architecture into a set of pre-defined configurations, like memory size, bus address alignment, functions involved in the pipeline, and so on. We formalized those configurations into an easy-to-learn, so-called machine description language. Then we developed corresponding hardware-dependent algorithms. This way you can easily take advantage of the off-the-shelf hardware-dependent optimizations from ONNC compiler.
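
To make the idea of pre-defined hardware configurations more concrete, here is a minimal sketch of how two of the parameters mentioned above, memory size and bus address alignment, could drive a hardware-dependent check. The struct `HardwareConfig` and the helpers `alignUp` and `fitsInLocalMemory` are hypothetical. They do not show the syntax of the machine description language or any ONNC interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical configuration record for illustration only.
struct HardwareConfig {
  std::uint64_t localMemoryBytes;   // capacity of the on-chip memory
  std::uint64_t busAlignmentBytes;  // required address alignment (power of two)
};

constexpr bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Rounds `addr` up to the next address that satisfies the bus alignment.
std::uint64_t alignUp(std::uint64_t addr, const HardwareConfig& hw) {
  assert(isPowerOfTwo(hw.busAlignmentBytes));
  const std::uint64_t mask = hw.busAlignmentBytes - 1;
  return (addr + mask) & ~mask;
}

// Checks whether a buffer of `size` bytes, placed at the first aligned
// address at or after `offset`, still fits in the local memory.
bool fitsInLocalMemory(std::uint64_t offset, std::uint64_t size,
                       const HardwareConfig& hw) {
  return alignUp(offset, hw) + size <= hw.localMemoryBytes;
}
```

With a 1024-byte memory and 64-byte alignment, a buffer requested at offset 1 is placed at offset 64, so at most 960 bytes fit. Because the algorithm reads its limits from the configuration, the same code can serve a different target by changing only the configuration values.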


Key optimizations

Software pipelining
Scheduling for multiple resources
DMA operator insertion
Memory allocation supporting multi-level hierarchy
Bus allocation
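
As one illustration of the memory allocation item, the sketch below assigns offsets to buffers in a single memory level so that buffers whose lifetimes overlap never share addresses, while buffers with disjoint lifetimes may reuse the same space. It uses a simple first-fit strategy chosen for brevity; it is not a description of ONNC's allocator, and the names `Buffer` and `assignOffsets` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical buffer record for illustration only.
struct Buffer {
  std::uint64_t size;
  int firstUse;          // index of the first operator touching the buffer
  int lastUse;           // index of the last operator touching it (inclusive)
  std::uint64_t offset;  // filled in by assignOffsets
};

// First-fit allocation: each buffer takes the lowest offset that does not
// collide with any already placed buffer that is alive at the same time.
// Returns the peak memory footprint in bytes.
std::uint64_t assignOffsets(std::vector<Buffer>& buffers) {
  std::uint64_t peak = 0;
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    Buffer& cur = buffers[i];
    assert(cur.firstUse <= cur.lastUse);

    // Address ranges [begin, end) occupied by buffers whose lifetime
    // overlaps the lifetime of the current buffer.
    std::vector<std::pair<std::uint64_t, std::uint64_t>> busy;
    for (std::size_t j = 0; j < i; ++j) {
      const Buffer& other = buffers[j];
      if (other.firstUse <= cur.lastUse && cur.firstUse <= other.lastUse) {
        busy.emplace_back(other.offset, other.offset + other.size);
      }
    }
    std::sort(busy.begin(), busy.end());

    // Walk the occupied ranges in address order and stop at the first gap
    // large enough for the current buffer.
    std::uint64_t candidate = 0;
    for (const auto& range : busy) {
      if (candidate + cur.size <= range.first) break;
      candidate = std::max(candidate, range.second);
    }
    cur.offset = candidate;
    peak = std::max(peak, candidate + cur.size);
  }
  return peak;
}
```

For three buffers of 100, 50, and 100 bytes where the first and third are never alive together, the third reuses offset 0 and the peak footprint is 150 bytes instead of 250. Supporting a multi-level hierarchy would additionally require choosing a memory level per buffer and inserting DMA transfers between levels.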


Modular, for easy integration with your own compiler in the way you need
Retargetable, to meet the needs of many different hardware architectures
Optimized in terms of performance and resource utilization