A Recommendation System is a tool used to predict what a user might want to buy or watch. It is used in many industries, such as advertising and e-commerce. Recommendation Systems are valued because they relate directly to corporations' revenue: a small improvement in accuracy can bring in millions in income. DLRM (Deep Learning Recommendation Model) is a game changer in this area.
It greatly improves prediction accuracy and hence the quality of the user's experience. As a result, accelerator cards purpose-built for DLRM are emerging in the market to make inference more efficient. Such accelerator cards usually require INT8 precision to enhance performance. ONNC Calibrator helps such accelerator cards retain 99.99% of the original precision during FP32-to-INT8 quantization.
Recommendation Systems are usually used in real-time scenarios in the field, such as online shopping and content recommendation. That means the system needs to handle a large number of requests efficiently and respond quickly. While DLRM provides great benefits in terms of accuracy, it is also computationally heavy. This is where DLRM accelerator cards come in: they are designed to handle the computation load and improve the user experience.
Quantization techniques are widely used in these DLRM accelerator cards in order to meet performance requirements. However, quantization is a double-edged sword. It can improve computation efficiency, but it may also reduce accuracy. Because accuracy relates directly to revenue, it is crucial to maintain high accuracy when applying quantization techniques to production DLRM accelerator cards.
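The trade-off described above can be seen in a minimal sketch of FP32-to-INT8 quantization. This is a generic symmetric per-tensor scheme, not ONNC Calibrator's own algorithm; the function names and the sample weights are assumptions made for illustration only.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Generic textbook scheme, used here only to illustrate the trade-off.
    """
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return [x * scale for x in q]


# Hypothetical FP32 weights, chosen only for demonstration.
weights = [0.02, -0.5, 0.31, 1.27, -0.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("INT8 codes:", q)
print("scale:", scale, "max round-trip error:", max_error)
```

The integer codes need a quarter of the storage of FP32 values, but small weights such as -0.004 lose most of their information. That loss is the accuracy cost a calibrator has to keep under control.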
ONNC Calibrator is a Post-Training Quantization (PTQ) tool that helps AI chip design teams keep high precision during quantization. In the DLRM case, it retains 99.99% of the original model's precision in INT8 mode.
ONNC Calibrator is the only architecture-aware quantizer in the market. It learns the IC’s architecture and uses the best policy and algorithm for each layer in a deep learning model. ONNC Calibrator supports multiple policies, various precisions, and mixed precision, and it searches for the best parameters to quantize a model.
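As a rough illustration of what a per-layer, mixed-precision search can look like, the sketch below picks the narrowest bit width per layer that keeps the quantization error under a tolerance. It is not ONNC Calibrator's actual search: the candidate bit widths, the error metric, the tolerance, and the layer names and data are all hypothetical.

```python
def fake_quantize(values, bits):
    """Symmetric quantize-dequantize round trip at the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def relative_error(original, restored):
    """RMS of the quantization error divided by the RMS of the original values."""
    num = sum((a - b) ** 2 for a, b in zip(original, restored))
    den = sum(a * a for a in original)
    return (num / den) ** 0.5 if den > 0 else 0.0


def choose_bits_per_layer(layers, candidates=(8, 16), tolerance=0.01):
    """For each layer, pick the narrowest candidate width within tolerance.

    Falls back to the widest candidate when none meets the tolerance.
    """
    plan = {}
    for name, values in layers.items():
        chosen = candidates[-1]
        for bits in candidates:
            if relative_error(values, fake_quantize(values, bits)) <= tolerance:
                chosen = bits
                break
        plan[name] = chosen
    return plan


# Hypothetical layers: one evenly spread, one dominated by a single outlier.
layers = {
    "layer_a": [i / 100.0 for i in range(-100, 101)],
    "layer_b": [i / 1000.0 for i in range(-100, 101)] + [50.0],
}
plan = choose_bits_per_layer(layers)
print(plan)
```

In this toy setup the evenly spread layer fits in 8 bits, while the outlier stretches the scale of the second layer so far that its small values all round to zero, so the search assigns it 16 bits. An architecture-aware tool would additionally restrict the candidates to the precisions the target chip supports.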
Another benefit of ONNC Calibrator is its convenience for both data scientists and embedded engineers. Because it uses post-training quantization, data scientists do not need to change their training procedures, as they would with Quantization-Aware Training (QAT). The architecture-aware feature lets embedded engineers easily deploy deep learning models to their chosen AI chips while performance and accuracy are guaranteed. These benefits give data scientists and embedded engineers more reason to adopt ONNC-compatible ICs.
ONNC Calibrator, along with Compiler, Runtime, and Virtual Platform, helps AI chip design teams build high-performance products. ONNC is a full-fledged toolchain for designing AI system software.
ONNC Compiler optimizes and compiles deep learning models into C++ or machine code for the target hardware.
ONNC Runtime efficiently manages loading and running compiled models, with support for dynamic shapes and dynamic batching.
Virtual Platform is a hardware/software co-design tool that reduces development time and complexity in the early stages of AI chip design.