Deep Dive into YOLOv5 Object Detection Code

This article analyzes YOLOv5’s training process and data augmentation mechanisms in depth, organizing and summarizing the internal implementation details of the YOLOv5 object detection model.


1. Analysis of train.py File

1.1 Import Section

import argparse
import math
import os
import random
import subprocess
import sys
import time
from copy import deepcopy
from datetime import datetime, timedelta
from pathlib import Path

try:
    import comet_ml  # must be imported before torch (if installed)
except ImportError:
    comet_ml = None

import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn
import yaml
from torch.optim import lr_scheduler
from tqdm import tqdm

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

import val as validate  # for end-of-epoch mAP
from models.experimental import attempt_load
from models.yolo import Model
from utils.autoanchor import check_anchors
from utils.autobatch import check_train_batch_size
from utils.callbacks import Callbacks
from utils.dataloaders import create_dataloader
from utils.downloads import attempt_download, is_url
from utils.general import (
    LOGGER,
    TQDM_BAR_FORMAT,
    check_amp,
    check_dataset,
    check_file,
    check_git_info,
    check_git_status,
    check_img_size,
    check_requirements,
    check_suffix,
    check_yaml,
    colorstr,
    get_latest_run,
    increment_path,
    init_seeds,
    intersect_dicts,
    labels_to_class_weights,
    labels_to_image_weights,
    methods,
    one_cycle,
    print_args,
    print_mutation,
    strip_optimizer,
    yaml_save,
)
from utils.loggers import LOGGERS, Loggers
from utils.loggers.comet.comet_utils import check_comet_resume
from utils.loss import ComputeLoss
from utils.metrics import fitness
from utils.plots import plot_evolve
from utils.torch_utils import (
    EarlyStopping,
    ModelEMA,
    de_parallel,
    select_device,
    smart_DDP,
    smart_optimizer,
    smart_resume,
    torch_distributed_zero_first,
)

LOCAL_RANK = int(os.getenv("LOCAL_RANK", -1))
RANK = int(os.getenv("RANK", -1))
WORLD_SIZE = int(os.getenv("WORLD_SIZE", 1))
GIT_INFO = check_git_info()

1.2 Detailed Explanation of the train() Function

The train() function is the core of YOLOv5 training and manages the entire training process:

def train(hyp, opt, device, callbacks):
    """
    Train a YOLOv5 model on a custom dataset using specified hyperparameters, options, and device, managing datasets,
    model architecture, loss computation, and optimizer steps.

    Args:
        hyp (str | dict): Path to the hyperparameters YAML file or a dictionary of hyperparameters.
        opt (argparse.Namespace): Parsed command-line arguments containing training options.
        device (torch.device): Device on which training occurs, e.g., 'cuda' or 'cpu'.
        callbacks (Callbacks): Callback functions for various training events.

    Returns:
        None

    Models and datasets download automatically from the latest YOLOv5 release.

    Example:
        Single-GPU training:
        ```bash
        $ python train.py --data coco128.yaml --weights yolov5s.pt --img 640  # from pretrained (recommended)
        $ python train.py --data coco128.yaml --weights '' --cfg yolov5s.yaml --img 640  # from scratch
        ```

        Multi-GPU DDP training:
        ```bash
        $ python -m torch.distributed.run --nproc_per_node 4 --master_port 1 train.py --data coco128.yaml --weights
        yolov5s.pt --img 640 --device 0,1,2,3
        ```

        For more usage details, refer to:
        - Models: https://github.com/ultralytics/yolov5/tree/master/models
        - Datasets: https://github.com/ultralytics/yolov5/tree/master/data
        - Tutorial: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
    """

The main steps executed by the function include:

  1. Parameter parsing and initialization: Process input parameters, set up save directory and training configuration
  2. Loading hyperparameters: Load from YAML file or use the passed hyperparameter dictionary
  3. Configuring logging system: Initialize logger and callback functions
  4. Loading dataset: Validate dataset format and get training and validation paths
  5. Model creation or loading: Create a new model or load pre-trained weights based on configuration
  6. Optimizer configuration: Set up optimizer, learning rate scheduler, and EMA (Exponential Moving Average)
  7. Data loader creation: Create training and validation data loaders
  8. Start training loop: Execute multiple epochs of training, each including:
    • Training phase (forward pass, loss calculation, backward pass, parameter update)
    • Optional validation phase (calculate metrics like mAP)
    • Model saving and early stopping check
  9. End of training: Save final model, perform final validation, release resources
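Step 6 mentions the EMA (Exponential Moving Average) of the weights. YOLOv5's ModelEMA (imported above from utils.torch_utils) keeps a shadow copy of the parameters whose decay ramps up from near 0 toward decay=0.9999 with a time constant tau=2000, so the average follows the model closely in early training and becomes smoother later. The sketch below restates that update rule on a plain dict of floats instead of a model state_dict; the class name EmaSketch and its dict-based interface are hypothetical.

```python
import math


class EmaSketch:
    """Exponential moving average with a ramped decay (hypothetical helper, not ModelEMA)."""

    def __init__(self, params, decay=0.9999, tau=2000):
        self.shadow = dict(params)  # EMA copy of the parameters
        self.decay = decay
        self.tau = tau
        self.updates = 0  # number of EMA updates performed so far

    def current_decay(self):
        # Starts near 0 and approaches `decay` as updates accumulate
        return self.decay * (1 - math.exp(-self.updates / self.tau))

    def update(self, params):
        self.updates += 1
        d = self.current_decay()
        for name, value in params.items():
            # shadow = d * shadow + (1 - d) * current value
            self.shadow[name] = d * self.shadow[name] + (1 - d) * value


ema = EmaSketch({"w": 0.0})
ema.update({"w": 1.0})
print(ema.shadow["w"])  # close to 1.0: early on the EMA tracks the model almost exactly
```

The ramp matters because the shadow weights start from the initial (often pretrained or random) values; a fixed decay of 0.9999 would make the EMA lag the trained model for thousands of steps.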

Particularly noteworthy is the training data loader creation section:

# Trainloader
train_loader, dataset = create_dataloader(
    train_path,
    imgsz,
    batch_size // WORLD_SIZE,
    gs,
    single_cls,
    hyp=hyp,
    augment=True,  # augmentation is always enabled for the training loader
    cache=None if opt.cache == "val" else opt.cache,
    rect=opt.rect,
    rank=LOCAL_RANK,
    workers=workers,
    image_weights=opt.image_weights,
    quad=opt.quad,
    prefix=colorstr("train: "),
    shuffle=True,
    seed=opt.seed,
)

In contrast, the validation data loader does not have data augmentation enabled:

# Process 0
# The augment argument is not passed here, so create_dataloader's default (augment=False) applies:
# validation should measure performance on the original, unaugmented images.
if RANK in {-1, 0}:
    val_loader = create_dataloader(
        val_path,
        imgsz,
        batch_size // WORLD_SIZE * 2,
        gs,
        single_cls,
        hyp=hyp,
        cache=None if noval else opt.cache,
        rect=True,
        rank=-1,
        workers=workers * 2,
        pad=0.5,
        prefix=colorstr("val: "),
    )[0]
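Both loaders depend on the DDP environment variables read at the top of train.py: each process receives batch_size // WORLD_SIZE training images per batch, the validation loader uses twice that, and only RANK -1 (single process) or 0 (the main DDP process) builds the validation loader. The helper below, loader_settings, is hypothetical and only restates that arithmetic, assuming batch_size is the total batch across all GPUs.

```python
import os


def loader_settings(total_batch_size, env=None):
    """Restate train.py's per-process batch arithmetic (hypothetical helper)."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", -1))  # -1 means no DDP
    world_size = int(env.get("WORLD_SIZE", 1))
    train_bs = total_batch_size // world_size  # images per process per batch
    return {
        "train_batch": train_bs,
        "val_batch": train_bs * 2,  # same as batch_size // WORLD_SIZE * 2
        "builds_val_loader": rank in {-1, 0},  # only the main process validates
    }


print(loader_settings(64, {"RANK": "1", "WORLD_SIZE": "4"}))
```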

1.3 Data Augmentation in Training

YOLOv5 training enables data augmentation by default. Based on code analysis, we can confirm the following points:

  1. Data augmentation settings in the training data loader: In the train.py file’s create_dataloader call, the augment parameter is explicitly set to True:

    train_loader, dataset = create_dataloader(
        train_path,
        imgsz,
        batch_size // WORLD_SIZE,
        gs,
        single_cls,
        hyp=hyp,
        augment=True,  # augmentation is always enabled for the training loader
        cache=None if opt.cache == "val" else opt.cache,
        rect=opt.rect,
        rank=LOCAL_RANK,
        workers=workers,
        image_weights=opt.image_weights,
        quad=opt.quad,
        prefix=colorstr("train: "),
        shuffle=True,
        seed=opt.seed,
    )
    
  2. No data augmentation in validation data loader: In contrast, data augmentation is not enabled in the validation data loader, which is reasonable since validation should be performed on the original, unaugmented data.

In the LoadImagesAndLabels class’s __getitem__ method, various data augmentation techniques are applied when augment=True:

  1. Mosaic augmentation: Combines 4 different images into one, enhancing multi-scale training and small object detection capabilities

    if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
        img, labels = self.load_mosaic(index)
    
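  2. MixUp augmentation: Blends the Mosaic image with a second Mosaic image at a random ratio and merges their labels, increasing the complexity of training data (it is only tried inside the Mosaic branch)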

    if random.random() < hyp["mixup"]:
        img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
    
  3. Random perspective transformation: Includes rotation, translation, scaling, shearing, and other geometric transformations

    img, labels = random_perspective(
        img,
        labels,
        degrees=hyp["degrees"],
        translate=hyp["translate"],
        scale=hyp["scale"],
        shear=hyp["shear"],
        perspective=hyp["perspective"],
    )
    
  4. Albumentations library augmentation: Extra augmentations from the third-party Albumentations library, applied only when the albumentations package is installed (otherwise this call returns its inputs unchanged)

    img, labels = self.albumentations(img, labels)
    
  5. HSV color space augmentation: Adjusts hue, saturation, and brightness

    augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
    
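  6. Random flipping: Vertical and horizontal flipping, with the label coordinates mirrored to match

    if random.random() < hyp["flipud"]:
        img = np.flipud(img)
        labels[:, 2] = 1 - labels[:, 2]  # labels are normalized xywh here: mirror y centers
    
    if random.random() < hyp["fliplr"]:
        img = np.fliplr(img)
        labels[:, 1] = 1 - labels[:, 1]  # mirror x centers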
    
  7. Cutout (present in the code but commented out, so not applied): Randomly masks regions of the image to make the model more robust to occlusion
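Flipping only stays consistent with the boxes if the label coordinates are mirrored too. At that point in __getitem__ the labels are normalized [class, x_center, y_center, width, height], so a horizontal flip maps x to 1 - x, a vertical flip maps y to 1 - y, and widths and heights are unchanged. The stdlib-only function below sketches that rule; flip_labels is a hypothetical name, not a YOLOv5 function.

```python
def flip_labels(labels, horizontal=False, vertical=False):
    """Mirror normalized [cls, x, y, w, h] labels to match a flipped image (sketch)."""
    flipped = []
    for cls, x, y, w, h in labels:
        if horizontal:
            x = 1 - x  # fliplr mirrors the x center
        if vertical:
            y = 1 - y  # flipud mirrors the y center
        flipped.append([cls, x, y, w, h])  # box size is unchanged
    return flipped


print(flip_labels([[0, 0.25, 0.4, 0.1, 0.2]], horizontal=True))  # → [[0, 0.75, 0.4, 0.1, 0.2]]
```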

1.4 Augmentation Parameter Control

Specific parameters for data augmentation are controlled through the hyperparameter YAML file passed with --hyp (by default data/hyps/hyp.scratch-low.yaml), including:

| Parameter | Description | Function |
| --- | --- | --- |
| mosaic | Probability of applying Mosaic augmentation | Controls whether to apply Mosaic augmentation |
| mixup | Probability of applying MixUp augmentation | Controls whether to apply MixUp augmentation |
| hsv_h | HSV hue adjustment intensity | Controls the range of hue variation |
| hsv_s | HSV saturation adjustment intensity | Controls the range of saturation variation |
| hsv_v | HSV value (brightness) adjustment intensity | Controls the range of brightness variation |
| degrees | Rotation angle range | Controls the maximum angle of random rotation |
| translate | Translation range | Controls the maximum ratio of random translation |
| scale | Scaling range | Controls the maximum ratio of random scaling |
| shear | Shearing range | Controls the maximum angle of random shearing |
| perspective | Perspective transformation intensity | Controls the intensity of perspective transformation |
| flipud | Vertical flip probability | Controls the probability of vertical flipping |
| fliplr | Horizontal flip probability | Controls the probability of horizontal flipping |
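The geometric entries in the table are ranges, not probabilities. Paraphrasing random_perspective in YOLOv5's utils/augmentations.py: the rotation angle is drawn from [-degrees, degrees], the scale factor from [1 - scale, 1 + scale], the shear angle (in degrees) from [-shear, shear], the translation from (0.5 ± translate) times the output size (the image is first shifted so its center sits at the origin), and each perspective coefficient from [-perspective, perspective]. The sketch below only draws those values; sample_geometry is a hypothetical helper, the hyp values are illustrative, and the real function goes on to build a transformation matrix from them.

```python
import random


def sample_geometry(hyp, width, height, rng=random):
    """Draw random_perspective-style parameters from hyp ranges (hypothetical helper)."""
    return {
        "angle": rng.uniform(-hyp["degrees"], hyp["degrees"]),  # degrees
        "scale": rng.uniform(1 - hyp["scale"], 1 + hyp["scale"]),  # zoom factor
        "shear_x": rng.uniform(-hyp["shear"], hyp["shear"]),  # degrees
        "tx": rng.uniform(0.5 - hyp["translate"], 0.5 + hyp["translate"]) * width,  # pixels
        "ty": rng.uniform(0.5 - hyp["translate"], 0.5 + hyp["translate"]) * height,  # pixels
        "persp_x": rng.uniform(-hyp["perspective"], hyp["perspective"]),
    }


hyp = {"degrees": 0.0, "translate": 0.1, "scale": 0.5, "shear": 0.0, "perspective": 0.0}
print(sample_geometry(hyp, 640, 640, random.Random(0)))
```

Setting a range to 0.0 pins that parameter (for example, degrees=0.0 always yields a rotation angle of 0), which is how individual geometric augmentations are switched off.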

1.5 Conclusion

YOLOv5 enables a rich set of data augmentation strategies by default during training, which is one of the key factors enabling its high detection performance. These augmentations include image fusion (Mosaic and MixUp), geometric transformations, color adjustments, and random flipping. These techniques work together to greatly increase the diversity of training data, helping the model learn more robust features and improving its detection capabilities for objects in different environments and conditions.

When using YOLOv5 to train your own dataset, you don’t need to manually enable data augmentation as it’s already enabled by default. If you need to adjust the intensity of augmentations, you can modify the relevant parameters in the hyperparameter file.


2. YOLOv5 Data Loading and Augmentation Workflow

The data loading and augmentation process spans several functions and classes. The workflow is explained step by step below:

2.1 Call Relationship

  1. First, the create_dataloader function is called in train.py:

    train_loader, dataset = create_dataloader(
        train_path,
        imgsz,
        batch_size // WORLD_SIZE,
        gs,
        single_cls,
        hyp=hyp,
        augment=True,  # augment=True is set here
        cache=None if opt.cache == "val" else opt.cache,
        rect=opt.rect,
        rank=LOCAL_RANK,
        workers=workers,
        image_weights=opt.image_weights,
        quad=opt.quad,
        prefix=colorstr("train: "),
        shuffle=True,
        seed=opt.seed,
    )
    
  2. Inside the create_dataloader function, an instance of the LoadImagesAndLabels dataset class is created:

    dataset = LoadImagesAndLabels(
        path,
        imgsz,
        batch_size,
        augment=augment,  # The augment parameter is passed to LoadImagesAndLabels here
        hyp=hyp,
        # Other parameters...
    )
    
  3. The create_dataloader function finally returns a PyTorch DataLoader and the dataset:

    return loader(
        dataset,
        batch_size=batch_size,
        # Other parameters...
    ), dataset
    

2.2 Specific Implementation of Data Augmentation

Data augmentation happens in the __getitem__ method of the LoadImagesAndLabels class, which the DataLoader calls once for every sample it puts into a batch:

  1. During initialization, LoadImagesAndLabels stores the augment flag and derives the Mosaic and Albumentations settings from it:

    self.augment = augment
    self.mosaic = self.augment and not self.rect  # Mosaic requires augment=True and rect=False
    self.albumentations = Albumentations(size=img_size) if augment else None
    
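  2. In the __getitem__ method, if self.augment is True, various augmentations are applied:

    if self.augment:
        # Random perspective transformation (in the real code this runs in the non-Mosaic
        # branch; load_mosaic applies it internally)
        img, labels = random_perspective(...)
        # (labels are then converted from pixel xyxy to normalized xywh)
    
        # Albumentations library augmentation (no-op if albumentations is not installed)
        img, labels = self.albumentations(img, labels)
    
        # HSV color space augmentation (modifies img in place)
        augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
    
        # Random flipping: mirror the normalized box centers together with the image
        if random.random() < hyp["flipud"]:
            img = np.flipud(img)
            labels[:, 2] = 1 - labels[:, 2]
    
        if random.random() < hyp["fliplr"]:
            img = np.fliplr(img)
            labels[:, 1] = 1 - labels[:, 1]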
    

2.3 Complete Call Chain

The actual call chain is as follows:

  1. train.py → Call create_dataloader(augment=True)
  2. create_dataloader → Create LoadImagesAndLabels(augment=True)
  3. create_dataloader → Use the above dataset to create and return DataLoader
  4. When the training loop executes, DataLoader → Call LoadImagesAndLabels.__getitem__
  5. LoadImagesAndLabels.__getitem__ → Apply various data augmentations based on augment=True
graph TD
    A[train.py] -->|Call| B[create_dataloader]
    B -->|Create| C[LoadImagesAndLabels]
    B -->|Return| D[DataLoader]
    D -->|Training loop requests data| E[__getitem__]
    E -->|Apply| F[Data Augmentation]

2.4 Conclusion

The augment=True parameter set in train.py is ultimately passed to the LoadImagesAndLabels class and triggers various data augmentation operations in that class’s __getitem__ method. This is a typical PyTorch data loading workflow: first define a dataset class (handling the loading and augmentation of individual samples), then wrap it with DataLoader (handling batching, shuffling, and parallel loading in worker processes).

YOLOv5 adopts this design to achieve:

  1. Clear code structure (separation of data loading and model training)
  2. Efficient data processing (parallel preloading in DataLoader worker processes)
  3. Flexible augmentation operations (can be enabled or disabled as needed)

This is why the system can automatically apply complex data augmentation strategies after setting augment=True in train.py.
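The division of labor described above can be illustrated without PyTorch. In the stdlib-only sketch below, ToyDataset plays the role of LoadImagesAndLabels (it stores the augment flag at construction and applies a random transform in __getitem__), and iterate_batches plays the role of DataLoader (it chooses the index order and groups samples into batches). Both names are hypothetical, and the "augmentation" is a placeholder that reverses a list.

```python
import random


class ToyDataset:
    """Stand-in for LoadImagesAndLabels: per-sample loading plus optional augmentation."""

    def __init__(self, samples, augment=False, flip_p=0.5):
        self.samples = samples
        self.augment = augment  # flag handed down by the loader factory
        self.flip_p = flip_p

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        row = list(self.samples[index])  # copy so the stored sample is never modified
        if self.augment and random.random() < self.flip_p:
            row.reverse()  # placeholder for a real image augmentation
        return row


def iterate_batches(dataset, batch_size, shuffle=False):
    """Stand-in for DataLoader: pick index order, call __getitem__, group into batches."""
    order = list(range(len(dataset)))
    if shuffle:
        random.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


train_set = ToyDataset([[1, 2, 3], [4, 5, 6], [7, 8, 9]], augment=True)
for batch in iterate_batches(train_set, batch_size=2, shuffle=True):
    print(batch)
```

Because the augmentation lives in __getitem__, the same dataset class serves training and validation; only the flag passed at construction differs.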


3. Detailed Explanation of YOLOv5’s Data Loading Mechanism

3.1 Data Loader Creation Process

In YOLOv5’s training process:

  1. The create_dataloader function creates a data loader:

    • This function first creates an instance of the LoadImagesAndLabels class as the dataset
    • Then wraps this dataset in PyTorch’s DataLoader or in YOLOv5’s InfiniteDataLoader (a DataLoader subclass that reuses its worker processes across epochs)
    • Finally returns this data loader and the dataset
  2. The LoadImagesAndLabels class acts as the dataset:

    • This class inherits from PyTorch’s Dataset class
    • It is responsible for managing data loading, preprocessing, and augmentation
    • It defines the logic for obtaining individual data samples

3.2 Main Parameters of the LoadImagesAndLabels Class

This class contains many important parameters, with the following being the main ones:

3.2.1 Basic Path and Image Settings

  • path: Dataset path (can be a directory or file list)
  • img_size: Image size (default 640 pixels)
  • batch_size: Batch size
  • augment: Whether to enable data augmentation
  • hyp: Hyperparameter dictionary, containing probabilities and intensities for various augmentations
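  • mosaic: Whether to use Mosaic augmentation (a derived attribute rather than a constructor argument; set to True when augment=True and rect=False)
  • albumentations: The Albumentations wrapper (also derived: created only when augment=True)
  • rect: Whether to use rectangular training (batches of images with similar aspect ratios, padded to a shared rectangular shape instead of a square)
  • stride: The model’s maximum downsampling rate, used to ensure image dimensions are multiples of the stride
  • pad: Extra padding, in units of stride, added when computing rectangular batch shapes (the validation loader uses pad=0.5)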
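The rect, stride, and pad settings meet when rectangular batch shapes are computed. Paraphrasing LoadImagesAndLabels.__init__: images are sorted by aspect ratio (height / width); for each batch the longer side stays at img_size while the shorter side shrinks just enough to fit the most extreme image in the batch; then both sides are scaled to pixels, pad (in stride units) is added, and the result is rounded up to a multiple of stride. The function below restates that rule for a single batch; rect_batch_shape is a hypothetical name.

```python
import math


def rect_batch_shape(aspect_ratios, img_size=640, stride=32, pad=0.0):
    """Return the [height, width] used for one rectangular batch (sketch).

    aspect_ratios holds height / width for every image in the batch.
    """
    lo, hi = min(aspect_ratios), max(aspect_ratios)
    if hi < 1:  # all landscape: full width, height fits the tallest image
        shape = [hi, 1]
    elif lo > 1:  # all portrait: full height, width fits the widest image
        shape = [1, 1 / lo]
    else:  # mixed orientations fall back to a square batch
        shape = [1, 1]
    # Scale to pixels, add pad (in stride units), round up to a multiple of stride
    return [math.ceil(s * img_size / stride + pad) * stride for s in shape]


print(rect_batch_shape([0.5, 0.75]))  # → [480, 640]
```

With two landscape images the batch is 480×640 instead of 640×640, so about a quarter of the padding (and the compute spent on it) disappears; pad=0.5 would round the same batch up to 512×672.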

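3.2.2 Cache and Performance Optimization Parameters

  • cache_images: Whether to cache images to speed up training (can be “ram” or “disk”)
  • workers: Number of data loading worker processes (an argument of create_dataloader, which passes it on to the DataLoader, rather than of LoadImagesAndLabels)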

3.2.3 Dataset Characteristic Parameters

  • single_cls: Whether to treat all categories as one category
  • image_weights: Whether to use image weights (based on class frequency)

3.3 Main Functions of the create_dataloader Function

This function completes several key tasks:

def create_dataloader(
    path,
    imgsz,
    batch_size,
    stride,
    single_cls=False,
    hyp=None,
    augment=False,
    cache=False,
    pad=0.0,
    rect=False,
    rank=-1,
    workers=8,
    image_weights=False,
    quad=False,
    prefix="",
    shuffle=False,
    seed=0,
):
    # Create dataset instance
    dataset = LoadImagesAndLabels(
        path,
        imgsz,
        batch_size,
        # Other parameters...
    )
  
    # Configure batch size and sampler
    batch_size = min(batch_size, len(dataset))
    sampler = None if rank == -1 else distributed.DistributedSampler(...)
  
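    # Select data loader type: plain DataLoader when image weights are used
    # (only it picks up per-epoch attribute updates), otherwise InfiniteDataLoader
    loader = DataLoader if image_weights else InfiniteDataLoader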
  
    # Create and return data loader
    return loader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle and sampler is None,
        # Other parameters...
    ), dataset
  1. Create dataset: Instantiate the LoadImagesAndLabels class
  2. Determine batch size: Ensure batch size does not exceed dataset size
  3. Set sampler: Choose appropriate sampler based on whether distributed training is being used
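  4. Select data loader type: Use the plain DataLoader when image weights are enabled (train.py rewrites dataset.indices every epoch, and only a fresh DataLoader iterator picks that up); otherwise use InfiniteDataLoader, which reuses its worker processes across epochs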
  5. Configure data loading parameters:
    • Batch size
    • Whether to shuffle
    • Number of worker processes
    • Sampler
    • Whether to drop the last incomplete batch
    • Memory pinning
    • Collate function (collate_fn)
    • Worker initialization function
    • Random number generator
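The point of InfiniteDataLoader is worker reuse. A standard DataLoader creates a fresh iterator, and with it fresh worker processes, each time it is iterated (unless persistent_workers=True). YOLOv5's InfiniteDataLoader instead wraps its batch sampler in a _RepeatSampler that never ends, creates one iterator at construction, and yields len(self) batches per epoch from it. The stdlib-only sketch below mimics that control flow with plain generators; RepeatSampler and InfiniteLoaderSketch are hypothetical stand-ins, not the YOLOv5 classes.

```python
class RepeatSampler:
    """Endlessly re-iterate a finite sampler (mirrors the idea of YOLOv5's _RepeatSampler)."""

    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        while True:
            yield from iter(self.sampler)


class InfiniteLoaderSketch:
    """Keep one long-lived iterator and hand out one epoch's worth of batches per pass."""

    def __init__(self, batches):
        self.batches = batches  # stand-in for a batch sampler
        self.iterator = iter(RepeatSampler(batches))  # created once, like the worker pool

    def __len__(self):
        return len(self.batches)

    def __iter__(self):
        for _ in range(len(self)):
            yield next(self.iterator)


loader = InfiniteLoaderSketch([[0, 1], [2, 3]])
for epoch in range(2):
    print(epoch, list(loader))
```

Because the iterator (and, in the real class, the worker processes holding their own copy of the dataset) lives across epochs, changes made to the dataset object between epochs would not reach the workers. That is consistent with the YOLOv5 source comment "only DataLoader allows for attribute updates" and with the plain DataLoader being chosen when image_weights=True.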

3.4 Where Data Augmentation Actually Happens

Data augmentation primarily occurs in the __getitem__ method of the LoadImagesAndLabels class. Below is a simplified method flow:

def __getitem__(self, index):
    # 1. Get index
    index = self.indices[index]
  
    # 2. Decide whether to use Mosaic augmentation
    if self.mosaic and random.random() < self.hyp["mosaic"]:
        # Load Mosaic-augmented image
        img, labels = self.load_mosaic(index)
        shapes = None  # no letterbox shape information in the Mosaic path
      
        # Potentially apply MixUp augmentation
        if random.random() < self.hyp["mixup"]:
            img, labels = mixup(...)
    else:
        # Regular image loading
        img, (h0, w0), (h, w) = self.load_image(index)
        # Apply Letterbox
        img, ratio, pad = letterbox(...)
        shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling
        # Process labels
        labels = self.labels[index].copy()
      
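    # 3. Apply more augmentation operations
    if self.augment:
        # Random perspective transformation (in the real code this runs in the non-Mosaic
        # branch; load_mosaic applies it internally)
        img, labels = random_perspective(...)
        # (labels are then converted from pixel xyxy to normalized xywh)
      
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)
      
        # HSV color space augmentation
        augment_hsv(...)
      
        # Random flipping: mirror the normalized box centers together with the image
        if random.random() < self.hyp["flipud"]:
            img = np.flipud(img)
            labels[:, 2] = 1 - labels[:, 2]
          
        if random.random() < self.hyp["fliplr"]:
            img = np.fliplr(img)
            labels[:, 1] = 1 - labels[:, 1]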
  
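    # 4. Final processing
    # Label format conversion: column 0 is left for the batch image index (filled by collate_fn)
    labels_out = torch.zeros((len(labels), 6))
    if len(labels):
        labels_out[:, 1:] = torch.from_numpy(labels)
    # Image format conversion: HWC to CHW, BGR to RGB; made contiguous because
    # torch.from_numpy rejects the negative strides produced by [::-1]
    img = np.ascontiguousarray(img.transpose((2, 0, 1))[::-1])
  
    return torch.from_numpy(img), labels_out, self.im_files[index], shapes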

For each sample in a requested batch, this method will:

  1. Choose whether to apply Mosaic augmentation (based on probability), optionally followed by MixUp; otherwise load a single image and letterbox it
  2. Apply random perspective transformation
  3. Apply Albumentations library augmentation
  4. Apply HSV color space augmentation
  5. Apply random flipping (vertical and horizontal)
  6. Skip Cutout augmentation, which is present in the code but commented out
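In the non-Mosaic branch, the image is resized with letterbox before any other augmentation. Paraphrasing YOLOv5's utils/augmentations.py: the scale ratio is the smaller of target/height and target/width (capped at 1.0 when scaleup=False), the resized image is centered, and the leftover space is split evenly into padding on both sides. With auto=False, as in __getitem__, the padding fills the full target shape; scaleup=self.augment there, so only training images can be enlarged. The sketch below computes just the ratio and padding; letterbox_geometry is a hypothetical name, and the real function also resizes the pixels and draws the gray border.

```python
def letterbox_geometry(h0, w0, new_shape=(640, 640), scaleup=True):
    """Return (ratio, (pad_w, pad_h)) for letterboxing an h0 x w0 image (auto=False case)."""
    new_h, new_w = new_shape
    r = min(new_h / h0, new_w / w0)  # fit the whole image inside the target
    if not scaleup:
        r = min(r, 1.0)  # only shrink, never enlarge
    unpad_w, unpad_h = int(round(w0 * r)), int(round(h0 * r))
    pad_w = (new_w - unpad_w) / 2  # split padding between left and right
    pad_h = (new_h - unpad_h) / 2  # split padding between top and bottom
    return r, (pad_w, pad_h)


print(letterbox_geometry(720, 1280))  # → (0.5, (0.0, 140.0))
```

The returned ratio and padding are exactly what the labels need: xywhn2xyxy in __getitem__ scales normalized coordinates by the resized size and adds padw/padh so the boxes land on the letterboxed image.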

3.5 Summary

The entire YOLOv5 data loading process is:

  1. Call the create_dataloader function in train.py, with the augment=True parameter
  2. The create_dataloader function creates an instance of the LoadImagesAndLabels class and passes augment=True to it
  3. create_dataloader wraps the dataset in PyTorch’s DataLoader
  4. The training loop uses this DataLoader to get data batches
  5. Each time a batch is requested, the __getitem__ method of the LoadImagesAndLabels class is called once per sample in the batch, applying various data augmentations

This design allows YOLOv5 to flexibly handle various data formats and apply complex data augmentation strategies while maintaining code modularity and extensibility.


4. Complete Analysis of Data Augmentation Workflow in YOLOv5

4.1 Complete Call Flow

4.1.1 Batch Retrieval in Training Loop

In the training loop of train.py, we can see code like this:

pbar = enumerate(train_loader)
...
for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
    callbacks.run("on_train_batch_start")
    ni = i + nb * epoch  # number integrated batches (since train start)
    imgs = imgs.to(device, non_blocking=True).float() / 255  # uint8 to float32, 0-255 to 0.0-1.0
    ...

Iterating over train_loader is what drives PyTorch’s data loading. Here is the complete call chain:

  1. The line for i, (imgs, targets, paths, _) in pbar triggers PyTorch’s data loading process
  2. PyTorch’s DataLoader uses worker processes (or the main process when num_workers=0) to get samples from the dataset
  3. DataLoader calls LoadImagesAndLabels.__getitem__(index) to get individual samples
  4. DataLoader uses the collate_fn function to combine multiple samples into a batch
  5. Returns the combined batch data (imgs, targets, paths, _) to the training loop
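Step 4 is where the 6-column labels_out tensor from __getitem__ pays off. In YOLOv5, LoadImagesAndLabels.collate_fn writes each sample's position in the batch into column 0 of its labels, then stacks the images with torch.stack and concatenates all label rows with torch.cat, so the loss can later tell which image each box belongs to. The stdlib-only sketch below reproduces that bookkeeping with nested lists; collate_sketch is a hypothetical name and the string "images" are placeholders.

```python
def collate_sketch(batch):
    """Merge (image, labels, path, shapes) samples into one batch, YOLOv5-style (sketch)."""
    images, labels, paths, shapes = zip(*batch)  # transpose the list of samples
    targets = []
    for i, rows in enumerate(labels):
        for row in rows:
            # Column 0 records which image in the batch this box belongs to
            targets.append([i] + list(row[1:]))
    return list(images), targets, paths, shapes


sample_a = ("img_a", [[0, 1, 0.5, 0.5, 0.2, 0.2]], "a.jpg", None)
sample_b = ("img_b", [[0, 3, 0.1, 0.9, 0.1, 0.1], [0, 7, 0.6, 0.4, 0.3, 0.3]], "b.jpg", None)
imgs, targets, paths, _ = collate_sketch([sample_a, sample_b])
for t in targets:
    print(t)  # [image_index, class, x, y, w, h]
```

Concatenating the labels instead of stacking them is necessary because images carry different numbers of boxes; the image index column keeps the flattened list unambiguous.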

4.1.2 Timing of __getitem__ Method Calls

When the training process needs to load a batch of data:

  • When iteration starts, DataLoader creates an iterator (a standard DataLoader does this every epoch; YOLOv5’s InfiniteDataLoader creates it once and reuses it)
  • The iterator determines which sample indices to load based on batch size and sampler
  • For each index, DataLoader calls dataset[index], which is LoadImagesAndLabels.__getitem__(index)
  • This method returns one processed sample: the image tensor, its labels, the file path, and shape information
  • DataLoader combines multiple samples into a batch and returns it to the training loop

4.1.3 Data Augmentation Implementation in __getitem__

Steps of data augmentation in the __getitem__ method:

def __getitem__(self, index):
    """Get a sample from the dataset, considering linear, random, or weighted sampling."""
    index = self.indices[index]  # linear, random, or weighted
  
    hyp = self.hyp
    if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
        # Load Mosaic augmentation
        img, labels = self.load_mosaic(index)
        shapes = None
      
        # MixUp augmentation
        if random.random() < hyp["mixup"]:
            img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
    else:
        # Regular image loading
        img, (h0, w0), (h, w) = self.load_image(index)
      
        # Letterbox
        shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size
        img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
        shapes = (h0, w0), ((h / h0, w / w0), pad)
      
        # Process labels
        labels = self.labels[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1])
      
        # Random perspective transformation
        if self.augment:
            img, labels = random_perspective(
                img,
                labels,
                degrees=hyp["degrees"],
                translate=hyp["translate"],
                scale=hyp["scale"],
                shear=hyp["shear"],
                perspective=hyp["perspective"],
            )
  
    nl = len(labels)  # number of labels
    if nl:
        labels[:, 1:5] = xyxy2xywhn(labels[:, 1:5], w=img.shape[1], h=img.shape[0], clip=True, eps=1e-3)
  
    # More data augmentation operations
    if self.augment:
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)
        nl = len(labels)  # update number of labels
      
        # HSV color space augmentation
        augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
      
        # Vertical flip
        if random.random() < hyp["flipud"]:
            img = np.flipud(img)
            if nl:
                labels[:, 2] = 1 - labels[:, 2]
      
        # Horizontal flip
        if random.random() < hyp["fliplr"]:
            img = np.fliplr(img)
            if nl:
                labels[:, 1] = 1 - labels[:, 1]
      
        # Cutout (commented out)
        # labels = cutout(img, labels, p=0.5)
  
    # Format conversion
    labels_out = torch.zeros((nl, 6))
    if nl:
        labels_out[:, 1:] = torch.from_numpy(labels)
  
    # Image format conversion: HWC to CHW, BGR to RGB
    img = img.transpose((2, 0, 1))[::-1]
    img = np.ascontiguousarray(img)
  
    return torch.from_numpy(img), labels_out, self.im_files[index], shapes

4.1.4 Detailed Explanation of Key Augmentation Operations

a. Mosaic Augmentation
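  • Combines the image at index with 3 randomly chosen images from the dataset
  • Randomly determines the mosaic center point on a canvas twice the image size
  • Places each of the four images in one quadrant around that center, cropping any part that overflows
  • Shifts the corresponding label coordinates by each image’s placement offset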
if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
    img, labels = self.load_mosaic(index)
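The placement step inside load_mosaic is plain index arithmetic. Paraphrasing the YOLOv5 source: the canvas is 2 * img_size on each side, each tile is pasted into one quadrant around the random center (xc, yc) and cropped where it would overflow, and the offset (padw, padh), the canvas position minus the source position, is then added to that tile's pixel label coordinates. The function below restates those coordinates for all four quadrants; mosaic_placement is a hypothetical name, and the real code also copies the pixels and transforms the labels.

```python
def mosaic_placement(quadrant, xc, yc, w, h, s):
    """Return canvas box, source box, and label offset for one mosaic tile (sketch).

    quadrant: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
    (xc, yc): mosaic center on a 2s x 2s canvas; (w, h): tile size. Boxes are (x1, y1, x2, y2).
    """
    if quadrant == 0:
        a = (max(xc - w, 0), max(yc - h, 0), xc, yc)
        b = (w - (a[2] - a[0]), h - (a[3] - a[1]), w, h)
    elif quadrant == 1:
        a = (xc, max(yc - h, 0), min(xc + w, 2 * s), yc)
        b = (0, h - (a[3] - a[1]), min(w, a[2] - a[0]), h)
    elif quadrant == 2:
        a = (max(xc - w, 0), yc, xc, min(2 * s, yc + h))
        b = (w - (a[2] - a[0]), 0, w, min(a[3] - a[1], h))
    else:
        a = (xc, yc, min(xc + w, 2 * s), min(2 * s, yc + h))
        b = (0, 0, min(w, a[2] - a[0]), min(a[3] - a[1], h))
    padw, padh = a[0] - b[0], a[1] - b[1]  # add to the tile's pixel label coordinates
    return a, b, (padw, padh)


for q in range(4):
    print(q, mosaic_placement(q, xc=500, yc=700, w=640, h=640, s=640))
```

For the top-left tile in this example, the source region starts at x=140 and lands at canvas x=0, so padw=-140: boxes in the cropped-off left strip end up at negative x and are clipped or dropped later.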
b. MixUp Augmentation
  • Only tried after Mosaic, because the check is nested inside the Mosaic branch
  • Mixes two Mosaic-augmented samples at a certain ratio
  • Merges labels from both samples
if random.random() < hyp["mixup"]:
    img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
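MixUp itself is short. In YOLOv5's utils/augmentations.py, mixup draws a ratio r from a Beta(32, 32) distribution, which concentrates tightly around 0.5, blends the images as im * r + im2 * (1 - r) cast back to uint8, and concatenates the two label arrays. The sketch below does the same on flat lists of pixel values with random.betavariate; mixup_sketch and its optional fixed r argument are hypothetical.

```python
import random


def mixup_sketch(pixels1, labels1, pixels2, labels2, r=None):
    """Blend two images and merge their labels (stdlib sketch of YOLOv5's mixup)."""
    if r is None:
        r = random.betavariate(32.0, 32.0)  # mixup ratio, concentrated near 0.5
    # int() truncates like a uint8 cast for in-range, non-negative values
    mixed = [int(p1 * r + p2 * (1 - r)) for p1, p2 in zip(pixels1, pixels2)]
    return mixed, labels1 + labels2  # boxes from both images are kept


img, labels = mixup_sketch([100, 200], [[0, 0.5, 0.5, 0.2, 0.2]], [0, 100], [[1, 0.3, 0.3, 0.1, 0.1]], r=0.5)
print(img, labels)  # → [50, 150] [[0, 0.5, 0.5, 0.2, 0.2], [1, 0.3, 0.3, 0.1, 0.1]]
```

Keeping every box from both images, rather than weighting labels by r as in classification MixUp, means the detector is asked to find objects that are only half visible.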
c. Random Perspective Transformation
  • Applies rotation, translation, scaling, shearing, and other geometric transformations
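  • Simultaneously adjusts label coordinates to match the transformed image (in the Mosaic branch, load_mosaic applies random_perspective internally, so both paths receive geometric augmentation)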
if self.augment:
    img, labels = random_perspective(
        img,
        labels,
        degrees=hyp["degrees"],
        translate=hyp["translate"],
        scale=hyp["scale"],
        shear=hyp["shear"],
        perspective=hyp["perspective"],
    )
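Adjusting the labels works by transforming box corners. random_perspective maps all four corners of every box through the same 3x3 matrix used for the image, takes the min/max of the results as the new axis-aligned box, and then clips to the image and drops boxes that became too small. The sketch below shows only the corner step, for a rotation plus scaling about a chosen center built with the math module; transform_box is a hypothetical helper, and shear, translation, and perspective are left out.

```python
import math


def transform_box(box, angle_deg, scale, center):
    """Rotate/scale an (x1, y1, x2, y2) box about `center` and return its new enclosing box.

    Sketch of the corner-transform idea only: no shear, translation, or perspective.
    """
    x1, y1, x2, y2 = box
    cx, cy = center
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a) * scale, math.sin(a) * scale
    xs, ys = [], []
    for x, y in ((x1, y1), (x2, y1), (x2, y2), (x1, y2)):  # all four corners
        dx, dy = x - cx, y - cy
        xs.append(cx + cos_a * dx - sin_a * dy)
        ys.append(cy + sin_a * dx + cos_a * dy)
    # The new label is the axis-aligned box enclosing the transformed corners
    return min(xs), min(ys), max(xs), max(ys)


print(transform_box((40, 45, 60, 55), angle_deg=90, scale=1.0, center=(50, 50)))
```

A 45-degree rotation shows the cost of this approach: the enclosing box of a rotated square is larger than the square, so rotated labels fit the object more loosely, which is one reason small degrees values are used.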
d. Albumentations Library Augmentation
  • Uses additional augmentations provided by the third-party Albumentations library
  • The wrapper exists whenever augment=True, but it only changes the image if the albumentations package is installed; otherwise it returns its inputs unchanged
if self.augment:
    img, labels = self.albumentations(img, labels)
e. HSV Color Space Augmentation
  • Adjusts hue, saturation, and brightness in HSV color space
  • Random variation intensity controlled by hyperparameters
if self.augment:
    augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
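augment_hsv draws one random gain per channel, r = uniform(-1, 1, 3) * [hsv_h, hsv_s, hsv_v] + 1, and applies it through 256-entry lookup tables: hue is multiplied and wrapped modulo 180 (OpenCV's 8-bit hue range), while saturation and value are multiplied and clipped to 0..255. The stdlib sketch below builds those tables without OpenCV or NumPy; hsv_luts is a hypothetical name and the gains in the example call are illustrative.

```python
import random


def hsv_luts(hgain, sgain, vgain, rng=random):
    """Build augment_hsv-style lookup tables for one image (stdlib sketch)."""
    # One random gain per channel, each in [1 - gain, 1 + gain]
    r = [rng.uniform(-1, 1) * g + 1 for g in (hgain, sgain, vgain)]
    lut_hue = [int((x * r[0]) % 180) for x in range(256)]  # hue wraps around (0..179)
    lut_sat = [int(min(max(x * r[1], 0), 255)) for x in range(256)]  # clip to 0..255
    lut_val = [int(min(max(x * r[2], 0), 255)) for x in range(256)]
    return r, lut_hue, lut_sat, lut_val


gains, lut_h, lut_s, lut_v = hsv_luts(0.015, 0.7, 0.4, random.Random(0))
print(gains, lut_v[128])  # value of a mid-gray pixel after the random brightness gain
```

A lookup table is cheap because every pixel with the same channel value maps to the same output, so the gain is computed 256 times per channel instead of once per pixel.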
f. Random Flipping
  • Performs random vertical and horizontal flipping
  • Adjusts label coordinates accordingly
if self.augment:
    if random.random() < hyp["flipud"]:
        img = np.flipud(img)
        if nl:
            labels[:, 2] = 1 - labels[:, 2]
  
    if random.random() < hyp["fliplr"]:
        img = np.fliplr(img)
        if nl:
            labels[:, 1] = 1 - labels[:, 1]

4.1.5 Augmentation Probability Control

The application probability of each augmentation operation is controlled through hyperparameters hyp:

  • hyp["mosaic"]: Mosaic augmentation probability
  • hyp["mixup"]: MixUp augmentation probability
  • hyp["flipud"]: Vertical flip probability
  • hyp["fliplr"]: Horizontal flip probability

Other augmentation intensities are also controlled by hyperparameters:

  • hyp["degrees"]: Rotation angle range
  • hyp["translate"]: Translation range
  • hyp["scale"]: Scaling range
  • hyp["shear"]: Shearing range
  • hyp["perspective"]: Perspective transformation intensity
  • hyp["hsv_h"], hyp["hsv_s"], hyp["hsv_v"]: HSV color space adjustment intensity
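Each probability is checked with a single random.random() < p draw, so 0.0 disables an augmentation and 1.0 applies it to every sample. Because the MixUp check sits inside the Mosaic branch of __getitem__, the effective MixUp rate is p_mosaic × p_mixup, and zero whenever Mosaic is off (for example with rect=True). The simulation below, built around a hypothetical helper gate_counts, makes that nesting visible.

```python
import random


def gate_counts(p_mosaic, p_mixup, n=100_000, seed=0):
    """Estimate how often Mosaic and MixUp fire under __getitem__'s nested checks (simulation)."""
    rng = random.Random(seed)
    mosaic_hits = mixup_hits = 0
    for _ in range(n):
        if rng.random() < p_mosaic:  # Mosaic branch
            mosaic_hits += 1
            if rng.random() < p_mixup:  # MixUp is only tried inside the Mosaic branch
                mixup_hits += 1
    return mosaic_hits / n, mixup_hits / n


print(gate_counts(0.5, 0.2))  # MixUp rate comes out near 0.5 * 0.2 = 0.1
```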

4.2 Summary: Complete Data Augmentation Flow

  1. Trigger timing: When the training loop iterates through DataLoader via for i, (imgs, targets, paths, _) in pbar
  2. Call process: PyTorch DataLoader → Worker processes → LoadImagesAndLabels.__getitem__(index) → collate_fn → Batch data
  3. Preprocessing: Load image, apply letterbox resizing, adjust label coordinates
  4. Main augmentations:
    • Mosaic augmentation (combining 4 images)
    • MixUp augmentation (mixing 2 samples)
    • Random perspective transformation (rotation, translation, scaling, shearing)
    • Albumentations library augmentation
    • HSV color space augmentation
    • Random flipping (vertical and horizontal)
  5. Format conversion: Convert image format, prepare tensor format required by PyTorch
  6. Return results: Processed image, labels, file path, and shape information
sequenceDiagram
    participant Train as Training Loop
    participant Loader as DataLoader
    participant Dataset as LoadImagesAndLabels
    participant Aug as Data Augmentation
  
    Train->>Loader: Iterate request batch data
    Loader->>Dataset: __getitem__(index)
    Dataset->>Dataset: Load image
    alt Mosaic augmentation
        Dataset->>Aug: load_mosaic()
        opt MixUp augmentation
            Dataset->>Aug: mixup()
        end
    else Regular loading
        Dataset->>Dataset: Load image + letterbox
        opt Random perspective transformation
            Dataset->>Aug: random_perspective()
        end
    end
    opt Other augmentations
        Dataset->>Aug: albumentations augmentation
        Dataset->>Aug: HSV color space augmentation
        Dataset->>Aug: Random flipping
    end
    Dataset->>Loader: Return processed sample
    Loader->>Train: Return batch data
