Under the Hood: YOLOv8 Architecture Explained

Dec 20, 2023

YOLOv8 is a deep learning model for real-time object detection in computer vision applications. Its architecture and training methods balance speed and accuracy well enough to make object detection practical in real-world scenarios.

YOLOv8 has become important in robotics, autonomous driving, and video surveillance. Its architecture uses computer vision techniques and machine learning algorithms to identify and localize objects in images and videos.

Quick Take

YOLOv8 is a deep learning model that detects objects in computer vision systems.
Its updated architecture and training techniques support accurate object detection.
YOLOv8 is used in robotics, autonomous driving, and video surveillance industries.
The model uses computer vision techniques and machine learning algorithms to recognize objects in real-world scenarios.

What is object detection, and what is its importance in computer vision?

Object detection is how an AI model finds and classifies objects in an image or video, determining both their type and their location. Many other computer vision tasks build on it.

Self-driving cars rely on object detection to navigate and make critical decisions in road traffic. Robots use object detection to recognize objects and interact with them intelligently. Video surveillance systems use it to detect and track suspicious activity in real time.

Convolutional neural networks (CNNs) are used to achieve accuracy in object detection. They are inspired by biological visual systems and trained to extract important features from images and to predict the presence and location of objects. One such model is YOLOv8, which uses the capabilities of CNNs to achieve real-time object detection.

The Evolution of YOLO: From YOLOv1 to YOLOv8

YOLO, short for You Only Look Once, was introduced in 2015 in the research paper "You Only Look Once: Unified Real-Time Object Detection."

Since its inception, YOLO has evolved through several iterations, each building on the achievements of its predecessors. Let's examine the main advancements of selected versions.

YOLO Version	Year	Main Advancements
YOLOv1	2015	Introduction of real-time object detection using a grid-based approach
YOLOv2	2016	Incorporation of anchor boxes, batch normalization, and multi-scale training
YOLOv3	2018	Improvements in accuracy and speed with the introduction of Darknet-53 and multiple detection scales
YOLOv8	2023	State-of-the-art advancements in real-time object detection with improved accuracy and speed
YOLOv9	2024	Introduction of Programmable Gradient Information (PGI) and Generalized Efficient Layer Aggregation Network (GELAN)
YOLOv12	2025	Integration of attention mechanisms, Area Attention, R-ELAN, improving accuracy and speed

The evolution from YOLOv1 to YOLOv12 demonstrates the work researchers and practitioners have done to advance the field and ensure that real-time object detection systems operate efficiently and accurately.
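The grid-based approach listed for YOLOv1 in the table above can be illustrated with a small sketch. The image is divided into an S×S grid, and the cell that contains an object's center is responsible for predicting that object. The function name and the example image size are hypothetical; the default of 7 follows the grid size used in the original YOLO paper.

```python
def responsible_cell(x_center, y_center, img_w, img_h, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center."""
    # Scale the pixel coordinates into grid units, then truncate to a cell index.
    # min() keeps centers that lie exactly on the right or bottom edge inside the grid.
    col = min(int(x_center / img_w * grid_size), grid_size - 1)
    row = min(int(y_center / img_h * grid_size), grid_size - 1)
    return row, col


# An object centered in a 640x480 image falls into the middle cell of a 7x7 grid.
print(responsible_cell(320, 240, 640, 480))
```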

Key Features of YOLOv8 for Object Detection

1. Pre-trained models. YOLOv8 ships with models pre-trained on large datasets such as COCO. These models can identify and classify common objects out of the box, making them suitable for many object detection applications.

2. Custom models. YOLOv8 allows you to train models tailored to your specific object detection needs, starting from a dataset that contains the object types you want to detect.

3. Data preparation. This involves carefully curating and labeling the training dataset to provide the AI model with accurate examples of the desired object types.

4. Web application support. YOLOv8 models can be integrated into web applications, which lets you build robust and intuitive object detection interfaces that users open in a browser without installing additional software.
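To make the data preparation step more concrete, here is a minimal sketch of how one annotation could be written in the plain-text label format that YOLO training tools commonly expect: one line per object, with a class index followed by the box center, width, and height, all normalized to the image size. The function name and the sample numbers are hypothetical.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into one normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# One object of class 0 in a 640x480 image.
print(to_yolo_label(0, 100, 50, 300, 250, 640, 480))
```

Each image gets a text file with one such line per annotated object.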

Object Detection Methods in YOLOv8

YOLOv8 includes methods for object detection, classification, and image segmentation. These methods use different approaches to detect and localize objects in images.

Method	Approach	Use Case
Classification	Assigning class labels to an entire image	Identifying general content or context of an image
Object Detection	Identifying and locating multiple objects within an image	Autonomous driving, robotics, video surveillance
Image Segmentation	Identifying exact shapes and boundaries of objects	Detailed image analysis and understanding

Each method has strengths and applications, and the choice of method may vary depending on the specific task and requirements.
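As a rough illustration of choosing a method, the sketch below maps a task to the file name of a matching pre-trained YOLOv8 checkpoint. The helper and its task keys are hypothetical; the file names follow the naming pattern of the published YOLOv8 weights (for example `yolov8n.pt`, `yolov8n-seg.pt`, `yolov8n-cls.pt`), where the letter after `yolov8` is the model size.

```python
# Suffix appended to the base name for each task.
TASK_SUFFIX = {"detect": "", "segment": "-seg", "classify": "-cls"}
MODEL_SIZES = ("n", "s", "m", "l", "x")  # nano, small, medium, large, extra large


def weights_name(task, size="n"):
    """Return the checkpoint file name for a task and model size."""
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task: {task}")
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size}")
    return f"yolov8{size}{TASK_SUFFIX[task]}.pt"


print(weights_name("detect", "m"))
print(weights_name("segment"))
```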

Getting Started with YOLOv8

You will need a Python environment to work with YOLOv8. Jupyter Notebook is flexible and easy to use.

After setting up the Python environment, you must install the necessary packages. YOLOv8 is built on PyTorch, a deep learning framework.

You also need to install the ultralytics package to work with YOLOv8. It provides a convenient Python API for implementing and working with YOLOv8 models. To install it, run the following command in a notebook cell (drop the leading "!" when running it in a terminal):

!pip install ultralytics

Now you can create your YOLOv8 models. In your Python code, import the YOLO class from the ultralytics package, then create an instance of it to build a YOLOv8 model. Here is an example:

from ultralytics import YOLO

# Build a new, untrained YOLOv8 model from its configuration file
model = YOLO("yolov8n.yaml")

The ultralytics package also provides pre-trained AI models that you can use. To load a pre-trained YOLOv8 model, specify the name of the model file. For example:

# Load the pre-trained medium-sized model (the file is passed as the first argument)
model = YOLO("yolov8m.pt")

With the model loaded, you have a starting point for training or for running predictions.

Sample YOLOv8 Model Architecture

Layer	Output Shape	Number of Parameters
Conv2d	(64, 608, 608)	1,792
BatchNorm2d	(64, 608, 608)	128
LeakyReLU	(64, 608, 608)	0
MaxPool2d	(64, 304, 304)	0
Conv2d	(128, 304, 304)	73,856
BatchNorm2d	(128, 304, 304)	256
LeakyReLU	(128, 304, 304)	0
MaxPool2d	(128, 152, 152)	0
…	…	…
Conv2d	(1024, 76, 76)	2,359,296
BatchNorm2d	(1024, 76, 76)	2,048
LeakyReLU	(1024, 76, 76)	0
Conv2d	(255, 76, 76)	261,375
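The parameter counts in the table can be checked with simple arithmetic. The sketch below assumes 3×3 kernels for the first two convolutions, a 1×1 kernel for the final one, a bias term in each convolution, and 2×2 max pooling with stride 2. None of these details are stated in the table, so treat them as assumptions that happen to reproduce its numbers.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights (in_ch * out_ch * k * k) plus one optional bias per output channel."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def batchnorm2d_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


def pooled_size(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


print(conv2d_params(3, 64, 3))      # first Conv2d row
print(batchnorm2d_params(64))       # first BatchNorm2d row
print(conv2d_params(64, 128, 3))    # second Conv2d row
print(conv2d_params(1024, 255, 1))  # final Conv2d row
print(pooled_size(608), pooled_size(304))
```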

Training and Using the YOLOv8 Model for Object Detection

Fine-tuning adapts a YOLOv8 model to the objects you want to detect. It involves training the AI model on a specific dataset to improve its accuracy and performance in detecting particular classes of objects.

A YOLOv8 training dataset consists of images and their corresponding annotations or labels. It should cover a variety of instances of the objects you want to detect.
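A training dataset is usually described to the training code through a small YAML descriptor file that lists the dataset root, the image folders, and the class names. The sketch below builds such a descriptor as text. The helper name, the folder layout, and the class names are hypothetical; the keys `path`, `train`, `val`, and `names` follow the format the ultralytics package uses for detection datasets.

```python
def dataset_descriptor(root, class_names, train_dir="images/train", val_dir="images/val"):
    """Build the text of a dataset descriptor (YAML) for a detection dataset."""
    lines = [
        f"path: {root}",       # dataset root directory
        f"train: {train_dir}",  # training images, relative to the root
        f"val: {val_dir}",      # validation images, relative to the root
        "names:",
    ]
    # Class indices must match the indices used in the label files.
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


print(dataset_descriptor("datasets/parts", ["bolt", "nut"]))
```

Saving the returned text to a file such as `data.yaml` gives a descriptor that can be passed to training.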

Image prediction. After training, the YOLOv8 model can be used to run predictions on images. When you call the "predict" method with an input image, the AI model analyzes the image and generates predictions about the presence and location of objects.
Bounding boxes indicate the locations of detected objects in the images. For each detected object, YOLOv8 predicts a bounding box that describes its position and extent.
Classes are the categories or types of objects you want to detect. YOLOv8 can identify and classify various object classes, whether you work with a pre-trained model or one fine-tuned on your own dataset.
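Bounding boxes are commonly stored either as a center point with width and height or as two corner points, and the overlap between a predicted box and a reference box is often measured with intersection over union (IoU). IoU is not discussed in the article itself; the sketch below is a generic illustration of both ideas with hypothetical function names.

```python
def xywh_to_xyxy(x_center, y_center, width, height):
    """Convert a center-format box into corner format (x1, y1, x2, y2)."""
    return (
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )


def iou(box_a, box_b):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(xywh_to_xyxy(50, 50, 20, 10))
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```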

Training and Image Prediction Steps

Step	Training YOLOv8	Utilizing YOLOv8 for Image Prediction
1	Prepare the dataset	Load the trained model
2	Specify the path to the dataset descriptor file	Provide an input image for analysis
3	Train the YOLOv8 model using the dataset descriptor file	Call the "predict" method
4	Fine-tune the model for specific object classes	Retrieve predictions, including bounding boxes and class labels
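The steps in the table could be put together roughly as follows. This is a sketch, not a definitive recipe: the function names, the file paths in the usage note, and the epoch count are hypothetical, and the ultralytics import sits inside the function so the helper above it can be used without the package installed. The `train` and `predict` methods and the `boxes`, `xyxy`, `cls`, `conf`, and `names` attributes are the ones the ultralytics package exposes on its model and result objects.

```python
def summarize(names, boxes_xyxy, class_ids, confidences):
    """Pair each predicted box with its class name and confidence score."""
    return [
        {
            "label": names[int(class_id)],
            "confidence": round(float(confidence), 3),
            "box": [float(value) for value in box],
        }
        for box, class_id, confidence in zip(boxes_xyxy, class_ids, confidences)
    ]


def train_and_predict(data_yaml, image_path, weights="yolov8n.pt", epochs=50):
    """Fine-tune a pre-trained model, then run it on one image."""
    from ultralytics import YOLO  # requires `pip install ultralytics`

    model = YOLO(weights)                                  # load pre-trained weights
    model.train(data=data_yaml, epochs=epochs, imgsz=640)  # train on the described dataset
    detections = []
    for result in model.predict(image_path):               # one result per input image
        boxes = result.boxes
        detections += summarize(
            result.names, boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist()
        )
    return detections
```

A call such as `train_and_predict("data.yaml", "test.jpg")` would return a list of dictionaries, one per detected object.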

YOLOv8 Architectural Advantages

Better network architecture. Several modules and convolutions were replaced to optimize performance. Compared with YOLOv5, for example, the C3 module gave way to the C2f module, and the 6×6 convolution in the stem became a 3×3 convolution. These changes enable fast and accurate object detection while retaining the ability to process data in real time.

Anchor-free detection. Instead of predicting offsets from predefined anchor boxes, the model predicts the center of an object and the extent of its bounding box directly. This eliminates the need for predefined anchor boxes, making the AI model robust and adaptable to different sizes and shapes of objects.
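Predicting boxes without predefined anchors can be sketched as follows. Each cell of a feature map acts as a reference point at its own center, and the model predicts the distances from that point to the left, top, right, and bottom edges of the box. This is a simplified illustration with hypothetical names. It assumes distances expressed in grid units and omits the way YOLOv8 predicts each distance as a distribution.

```python
def decode_box(col, row, stride, left, top, right, bottom):
    """Turn per-cell edge distances (in grid units) into a pixel-space box."""
    # The reference point is the center of the grid cell, scaled to pixels.
    center_x = (col + 0.5) * stride
    center_y = (row + 0.5) * stride
    return (
        center_x - left * stride,
        center_y - top * stride,
        center_x + right * stride,
        center_y + bottom * stride,
    )


# Cell (row 2, col 4) on a feature map with stride 8.
print(decode_box(4, 2, 8, left=1.0, top=0.5, right=2.0, bottom=1.5))
```

Because the four distances are free to take any positive value, the same code handles wide, tall, small, and large objects without a catalog of box shapes.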

Training tricks for better accuracy. One of these techniques is closing mosaic augmentation early. Mosaic augmentation stitches several training images into a single composite image, and YOLOv8 switches it off for the final epochs so that the model finishes training on unmodified images. This improves the overall performance of the AI model.
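A minimal sketch of this training trick is shown below. The first function decides whether mosaic augmentation is still active in a given epoch, and the second stitches four equally sized images into one. The names are hypothetical, and the default of 10 final epochs mirrors the `close_mosaic` training setting in the ultralytics package. Real mosaic augmentation also rescales and crops the images around a random center, which is left out here.

```python
def mosaic_enabled(epoch, total_epochs, close_mosaic=10):
    """True while mosaic augmentation is still active (epochs counted from 0)."""
    return epoch < total_epochs - close_mosaic


def mosaic_2x2(top_left, top_right, bottom_left, bottom_right):
    """Stitch four equally sized images (lists of pixel rows) into one image."""
    top = [a + b for a, b in zip(top_left, top_right)]
    bottom = [a + b for a, b in zip(bottom_left, bottom_right)]
    return top + bottom


print(mosaic_enabled(89, 100), mosaic_enabled(90, 100))
print(mosaic_2x2([[1]], [[2]], [[3]], [[4]]))
```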

Decoupled head approach. YOLOv8 predicts box coordinates and class scores in separate branches and eliminates the objectness branch, which makes it more efficient and accurate in object detection. This design simplifies the AI model's architecture, reducing computational complexity.
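The effect of this head design on the output layout can be shown with simple arithmetic. In an anchor-based head with an objectness score, every anchor predicts four box values, one objectness value, and the class scores in a single tensor. In a head with separate branches and no objectness score, one branch outputs only box values and the other only class scores. The function names are hypothetical, and the value of 16 bins per box side is an assumption based on the default in the ultralytics implementation.

```python
def coupled_head_channels(num_classes, num_anchors=3):
    """Anchor-based head: each anchor predicts 4 box values, 1 objectness, and class scores."""
    return num_anchors * (4 + 1 + num_classes)


def decoupled_head_channels(num_classes, bins_per_side=16):
    """Separate branches: box distances (4 sides x bins) and class scores, no objectness."""
    box_branch = 4 * bins_per_side
    class_branch = num_classes
    return box_branch, class_branch


print(coupled_head_channels(80))    # matches the 255-channel final layer in the sample table
print(decoupled_head_channels(80))
```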

These advances in network architecture have kept YOLOv8 among the most widely used deep learning models for object detection.

Further improvements after YOLOv8

Since the release of YOLOv8 in 2023, the YOLO model series has continued to evolve.

YOLOv9 (2024). GELAN (Generalized Efficient Layer Aggregation Network) is an improved feature extraction framework, and PGI (Programmable Gradient Information) preserves informative gradients during backpropagation in deep networks. Together they give better accuracy with fewer parameters.
YOLOv10 (2024). The Non-Maximum Suppression (NMS) post-processing step was removed: the model is trained to output a single box per object directly. Large-kernel convolutions improve the AI model's ability to capture context and detect large objects.
YOLOv11 (2024). One model family covers detection, classification, and segmentation. It is easier to deploy in production environments and adapts better to complex datasets.
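For context on the NMS step mentioned for YOLOv10, here is a sketch of the classic greedy procedure that earlier detectors run after prediction. Boxes are visited from the highest score to the lowest, and a box is kept only if it does not overlap too much with a box that was already kept. The function names and the threshold value are illustrative, not taken from any particular implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive greedy suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for index in order:
        # Keep the box only if it overlaps little with every box kept so far.
        if all(iou(boxes[index], boxes[k]) <= iou_threshold for k in kept):
            kept.append(index)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

Removing this step means one less stage to tune and run at inference time.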

Advantages of the latest version YOLOv12

YOLOv12 is a modern object detection model that combines attention mechanisms with high processing speed.

Key aspects of this version:

Area Attention (A²). This mechanism divides the feature map into equal segments and applies attention within each one. With l segments and n positions, the number of query-key comparisons falls from n² to n²/l, while each segment still covers a large receptive field and memory usage stays moderate. This keeps attention-based feature extraction fast enough for real-time use.
R-ELAN (Residual Efficient Layer Aggregation Networks). This architecture uses residual connections to prevent gradient blocking. It reduces the depth of stacked blocks, improves optimization, and provides stable training for large-scale AI models.
FlashAttention. It optimizes I/O operations, reducing memory access time, and accelerates inference without additional computational overhead.
Supports various computer vision tasks:

Object detection.
Object segmentation.
Image classification.
Pose estimation.
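The saving from Area Attention can be made concrete by counting query-key comparisons. Full self-attention compares every position with every other position. When the feature map is split into equal areas and attention runs inside each area, only positions in the same area are compared. The function name, the 40×40 feature map, and the choice of four areas are assumptions for illustration.

```python
def attention_pairs(num_positions, num_areas=1):
    """Count query-key comparisons when attention runs separately inside each area."""
    if num_positions % num_areas != 0:
        raise ValueError("positions must divide evenly into areas")
    per_area = num_positions // num_areas
    return num_areas * per_area * per_area


positions = 40 * 40  # a 40x40 feature map
print(attention_pairs(positions))               # full attention
print(attention_pairs(positions, num_areas=4))  # attention within 4 areas
```

With four areas the count drops to one quarter of the full-attention figure.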

FAQ

What is YOLOv8?

YOLOv8 is a deep learning model that detects objects in computer vision applications.

What is object detection, and why is it important in computer vision?

Object detection is the identification and localization of objects in images or videos. It allows computer vision systems to understand and interact with the visual world.

How has YOLO evolved?

Since 2015, YOLO has gone through 12 major versions, each reflecting advances in the wider artificial intelligence field.

What are the main features of YOLOv8 for object detection?

YOLOv8 uses pre-trained AI models and custom models. It supports data preparation to train custom models and allows you to create web applications for real-time object detection in a web browser.

What object detection methods are used in YOLOv8?

YOLOv8 includes methods for classification, object detection, and image segmentation.

What are the advantages of the YOLOv8 architecture?

YOLOv8 includes optimized modules and convolutions, anchor-free detection, training techniques such as closing mosaic augmentation early, and a decoupled head approach to object detection.

What's new in the latest version of YOLOv12?

YOLOv12 uses Area Attention (A²), a new approach to spatial attention that reduces complexity and improves accuracy.

R-ELAN improves feature aggregation and makes training more stable. FlashAttention speeds up inference without losing accuracy. The model family supports oriented bounding boxes (OBB), segmentation, and classification within a single framework.
