YOLO-NAS: SOTA in Object Detection

Jyothish
4 min read · May 21


YOLO Series

YOLO, which stands for “You Only Look Once,” is a popular real-time object detection algorithm in computer vision, introduced in 2016. YOLO revolutionized object detection by taking a different approach from traditional methods: it divides the image into a grid and predicts bounding boxes and class probabilities directly from the input image in a single pass.

YOLO operates at high speed because it avoids multiple passes through the image, in contrast to complex two-stage region-proposal detectors such as Fast R-CNN and Faster R-CNN. The single forward pass of the YOLO network allows it to achieve real-time object detection on both images and videos.
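
To make the single-pass idea concrete, here is a minimal sketch in plain NumPy, assuming the classic YOLOv1-style output layout of an S×S grid with B boxes per cell and C class scores. The grid size, threshold, and random input are illustrative, not YOLO-NAS's actual head:

```python
import numpy as np

# Assumed YOLOv1-style layout: each of the S*S grid cells predicts
# B boxes (x, y, w, h, confidence) plus C class probabilities.
S, B, C = 7, 2, 20

def decode_grid(pred, conf_thresh=0.25):
    """Decode one (S, S, B*5 + C) prediction tensor into boxes.

    Returns a list of (x, y, w, h, score, class_id) tuples with
    coordinates normalized to the whole image.
    """
    boxes = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5 : b * 5 + 5]
                cls_id = int(np.argmax(class_probs))
                score = conf * class_probs[cls_id]
                if score < conf_thresh:
                    continue
                # (x, y) are offsets within the cell; convert to image coords.
                boxes.append(((col + x) / S, (row + y) / S, w, h, score, cls_id))
    return boxes

# One forward pass of the network would produce `pred`; random data here.
pred = np.random.rand(S, S, B * 5 + C)
print(len(decode_grid(pred)), "candidate boxes before NMS")
```

Everything happens in one sweep over one network output, which is why there is no separate region-proposal stage to wait for.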

Over the years, there have been several versions of YOLO, including the state-of-the-art (SOTA) YOLOv6, YOLOv7, YOLOv8, PP-YOLO, YOLOR, and the latest, YOLO-NAS. Each version introduced improvements and architectural changes to enhance both speed and accuracy.

Source: deci.ai

Read more about how the different YOLO versions compare via link1 and link2.

YOLO-NAS

YOLO-NAS (YOLO Neural Architecture Search) is the latest state-of-the-art YOLO model, released by Deci in May 2023, and it outperforms its predecessors in the accuracy-latency trade-off.

Some of the key features of the YOLO-NAS algorithm are described below:

  • The architecture was discovered using Deci’s proprietary AutoNAC technology, a Neural Architecture Search (NAS) engine.
  • AutoNAC was used to determine the optimal size and structure of each stage, including the block type, the number of blocks, and the number of channels per stage.
  • During the NAS process, quantization-aware RepVGG blocks (the QSP and QCI blocks in the above diagram) were introduced into the architecture, ensuring compatibility with Post-Training Quantization (PTQ) with minimal accuracy loss.
  • It uses a hybrid quantization method that selectively quantizes certain parts of the model, reducing information loss and balancing latency against accuracy (see the first sketch after this list).
  • It was pre-trained on Objects365, a diverse object detection dataset consisting of 2 million images across 365 categories with 30 million bounding boxes.
  • It was also evaluated on the RoboFlow100 (RF100) dataset, a collection of 100 datasets from diverse domains, to demonstrate its ability to handle complex object detection tasks.
  • It also incorporates attention mechanisms, knowledge distillation, and Distribution Focal Loss to enhance its training process (see the second sketch after this list).
  • It is fully compatible with high-performance inference engines like NVIDIA TensorRT and supports INT8 quantization for unprecedented runtime performance. This allows YOLO-NAS to excel in real-world scenarios, such as autonomous vehicles, robotics, and video analytics applications, where low latency and efficient processing are essential.
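
To illustrate the idea behind hybrid (selective) quantization, here is a toy PyTorch sketch. This is not Deci’s proprietary algorithm: it simply fake-quantizes a layer’s weights to INT8 only when the round-trip error is small, keeping sensitive layers in full precision. The error threshold and the stand-in network are assumptions for illustration:

```python
import torch
import torch.nn as nn

def int8_fake_quant(w):
    """Symmetric per-tensor INT8 round-trip of a weight tensor."""
    scale = w.abs().max() / 127.0
    return (w / scale).round().clamp(-128, 127) * scale

def hybrid_quantize(model, max_rel_error=0.05):
    """Quantize a layer's weights only if the INT8 round-trip error is small.

    Layers that are too sensitive (relative error above the threshold)
    are kept in full precision: the essence of selective quantization.
    """
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight.data
            wq = int8_fake_quant(w)
            rel_error = ((w - wq).norm() / w.norm()).item()
            if rel_error <= max_rel_error:
                module.weight.data = wq  # "quantize" this layer
                print(f"int8  {name} (rel err {rel_error:.4f})")
            else:
                print(f"fp32  {name} (rel err {rel_error:.4f}, kept)")

# Toy stand-in for a detection backbone.
net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
hybrid_quantize(net)
```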
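
Distribution Focal Loss (from the Generalized Focal Loss paper) treats each box offset as a discrete distribution over bins and trains it with cross-entropy against the two bins bracketing the continuous target. A minimal PyTorch sketch of that standard formulation, with illustrative bin count and shapes:

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_logits, target):
    """DFL: learn a box offset as a distribution over discrete bins.

    pred_logits: (N, n_bins) raw scores; target: (N,) continuous values
    in [0, n_bins - 1]. The loss is cross-entropy against the two bins
    bracketing each target, weighted by proximity.
    """
    left = target.long().clamp(max=pred_logits.size(1) - 2)
    right = left + 1
    w_left = right.float() - target
    w_right = target - left.float()
    loss = (F.cross_entropy(pred_logits, left, reduction="none") * w_left
            + F.cross_entropy(pred_logits, right, reduction="none") * w_right)
    return loss.mean()

logits = torch.randn(8, 16)   # 8 box predictions, 16 bins each
target = torch.rand(8) * 15   # continuous regression targets in [0, 15]
print(distribution_focal_loss(logits, target))
```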

Below is a per-category breakdown of YOLO-NAS’s performance on the RF100 dataset, compared to the YOLOv5/v7/v8 models:

Example

Refer to this notebook for inference and custom training with the YOLO-NAS architecture.
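
For a quick taste before opening the notebook, here is a minimal inference sketch using Deci’s super-gradients library; the model variant, confidence threshold, and image path below are illustrative placeholders:

```python
# pip install super-gradients
from super_gradients.training import models

# Load the small YOLO-NAS variant with COCO-pretrained weights
# (yolo_nas_m and yolo_nas_l are the larger published variants).
model = models.get("yolo_nas_s", pretrained_weights="coco")

# Run single-image inference and visualize the detected boxes.
predictions = model.predict("path/to/image.jpg", conf=0.5)
predictions.show()
```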
