YOLOv9: Real-time object detection that quickly and accurately identifies and locates multiple objects in an image or video

Compared with previous YOLO series models, YOLOv9 is more lightweight without sacrificing performance, and it achieves higher accuracy and efficiency.

This allows it to run on a variety of devices and environments, such as mobile devices, embedded systems, and edge computing devices.

YOLOv9 improves the accuracy and efficiency of object detection through changes to the model architecture and the training method.

Main functions:

The core function of YOLOv9 is real-time object detection: quickly and accurately identifying and locating multiple objects in an image. The objects can belong to many categories, including people, vehicles, and animals. YOLOv9 is particularly suitable for applications that require high-performance real-time processing, such as video surveillance, autonomous vehicles, and robot vision systems.

1. Object detection: YOLOv9 is able to identify multiple objects in a single image and give their locations and classifications.

2. Real-time performance: The design takes into account the balance of speed and accuracy, making YOLOv9 suitable for real-time object detection tasks.

3. Suitable for models of various scales: The proposed techniques can be applied to deep learning models ranging from lightweight to large-scale.

Technological innovation:

Programmable Gradient Information (PGI): YOLOv9 introduces the concept of Programmable Gradient Information (PGI) to address the information that is lost as data passes through the successive layers of a deep neural network. With PGI, the model can propagate reliable gradient information during training while preserving the information in the input data, which improves learning efficiency and model performance.

Generalized Efficient Layer Aggregation Network (GELAN): YOLOv9 introduces a new lightweight network architecture, GELAN, which is based on gradient path planning and is designed to optimize the network’s parameter utilization and computational efficiency. With this improved structure, GELAN enables YOLOv9 to achieve higher accuracy and faster processing speeds while remaining lightweight.

Working principle:

Like previous YOLO series models, YOLOv9 predicts the location and category of objects by analyzing the entire image in a single pass. The main steps include:

1. Image preprocessing: The input image is first scaled and standardized to adapt to the input requirements of the network.

2. Feature extraction: The image is passed forward through the GELAN network, which extracts image features using multiple layers of convolution, pooling, and activation functions.

3. Gradient information transfer: During training, PGI ensures that key gradient information is retained and effectively propagated back through the feature extraction layers, which improves detection accuracy.

4. Object detection: The network output layer analyzes the extracted features and predicts the bounding box, category and confidence level of each object in the image.

5. Post-processing: Finally, the network output is processed with techniques such as non-maximum suppression (NMS), which removes overlapping bounding boxes to produce the final detection results.
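
Steps 1 and 5 take place outside the network, so they can be illustrated without any deep learning library. The following is a minimal sketch in plain Python, not the code used in the YOLOv9 repository. The names Detection, letterbox_params, iou and nms are hypothetical, and the 640-pixel input size and the 0.25 / 0.45 thresholds are assumed values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Box corners in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # confidence in [0, 1]
    label: str    # predicted category


def letterbox_params(width, height, target=640):
    """Step 1 (geometry only): scale factor and padding needed to fit an
    image into a target x target square while keeping its aspect ratio.
    The 640-pixel default is an assumption, not a value from the article."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2
    return scale, pad_x, pad_y


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_thr=0.25, iou_thr=0.45):
    """Step 5: greedy per-class non-maximum suppression.
    Both thresholds are illustrative assumptions."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    candidates = sorted(
        (d for d in detections if d.score >= score_thr),
        key=lambda d: d.score,
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap too much with an
        # already kept box of the same category.
        if all(k.label != det.label or iou(k, det) <= iou_thr for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    raw = [
        Detection(0, 0, 100, 100, 0.90, "person"),
        Detection(10, 10, 110, 110, 0.80, "person"),  # overlaps the first box
        Detection(200, 200, 300, 300, 0.70, "car"),
    ]
    for det in nms(raw):
        print(det.label, det.score)
```

In this sketch, suppression is applied per category, so a person box never removes a car box even when the two overlap. That is one common design choice; class-agnostic suppression is another.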

In general, through its innovative PGI technology and GELAN network architecture, YOLOv9 further improves the accuracy and efficiency of object detection while maintaining the high-speed detection performance of the YOLO series.

Compared to previous YOLO series models:

The design and development of YOLOv9 focuses mainly on improving the accuracy and processing efficiency of models in object detection tasks through technological innovation. In particular, YOLOv9 emphasizes lightweight models without sacrificing performance, making it particularly suitable for devices and environments with limited computing resources, such as mobile devices, embedded systems, and edge computing devices. The significance of this is:

1. Higher accuracy: YOLOv9 optimizes the model learning process and network structure by introducing innovative technologies such as Programmable Gradient Information (PGI) and Generalized Efficient Layer Aggregation Network (GELAN). Such optimizations can help models learn and recognize objects in images more effectively, resulting in higher accuracy on object detection tasks.

2. Higher efficiency: Through a well-designed lightweight network architecture, YOLOv9 can reduce computation and improve processing speed while maintaining high accuracy. This high efficiency allows YOLOv9 to perform well in real-time object detection applications, responding quickly even on devices with limited computing power.

3. Lightweight model: A lightweight model requires fewer computing resources and less storage space. This is particularly important for applications running on edge computing devices, which often have limited processing power and memory. YOLOv9 reduces the model size so that it can run on these devices while maintaining high performance.

4. A wide range of application scenarios: With its high efficiency and lightweight design, YOLOv9 is suitable for a variety of devices and environments, from high-performance servers in the cloud to edge computing devices. This includes but is not limited to areas such as intelligent surveillance, drones, autonomous driving assistance systems, and mobile device applications.
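
To make the storage point in item 3 concrete, the rough arithmetic can be sketched as follows. This is an illustrative estimate only: it counts weights and ignores file format overhead, the function name model_size_mb is hypothetical, and the parameter count is a made-up example rather than a figure for any YOLOv9 variant.

```python
def model_size_mb(num_params, bytes_per_param):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * bytes_per_param / 1e6


# Hypothetical parameter count, chosen only for illustration.
EXAMPLE_PARAMS = 7_000_000

if __name__ == "__main__":
    # Bytes per parameter depend on the numeric precision used to store weights.
    for precision, nbytes in (("float32", 4), ("float16", 2)):
        size = model_size_mb(EXAMPLE_PARAMS, nbytes)
        print(f"{precision}: about {size:.0f} MB")
```

Under these assumptions, halving either the parameter count or the bytes per parameter halves the storage needed, which is why a smaller model matters on memory-constrained devices.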

GitHub: https://github.com/WongKinYiu/yolov9
Paper: https://arxiv.org/abs/2402.13616
