
Deep Dive into YOLOv5 Object Detection Code

This article offers an in-depth analysis of YOLOv5’s training process and data augmentation mechanisms, helping to organize and summarize the internal implementation details of the YOLOv5 object detection model.

1. Analysis of train.py File

1.1 Import Section

import argparse
import math
import os
import random
import subprocess
import sys
import time
from copy import deepcopy
from datetime import datetime, timedelta
from pathlib import Path

try:
    import comet_ml  # must be imported before torch (if installed)
except ImportError:
    comet_ml = None

import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn
import yaml
from torch.optim import lr_scheduler
from tqdm import tqdm

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

import val as validate  # for end-of-epoch mAP
from models.experimental import attempt_load
from models.yolo import Model
from utils.autoanchor import check_anchors
from utils.autobatch import check_train_batch_size
from utils.callbacks import Callbacks
from utils.dataloaders import create_dataloader
from utils.downloads import attempt_download, is_url
from utils.general import (
    LOGGER,
    TQDM_BAR_FORMAT,
    check_amp,
    check_dataset,
    check_file,
    check_git_info,
    check_git_status,
    check_img_size,
    check_requirements,
    check_suffix,
    check_yaml,
    colorstr,
    get_latest_run,
    increment_path,
    init_seeds,
    intersect_dicts,
    labels_to_class_weights,
    labels_to_image_weights,
    methods,
    one_cycle,
    print_args,
    print_mutation,
    strip_optimizer,
    yaml_save,
)
from utils.loggers import LOGGERS, Loggers
from utils.loggers.comet.comet_utils import check_comet_resume
from utils.loss import ComputeLoss
from utils.metrics import fitness
from utils.plots import plot_evolve
from utils.torch_utils import (
    EarlyStopping,
    ModelEMA,
    de_parallel,
    select_device,
    smart_DDP,
    smart_optimizer,
    smart_resume,
    torch_distributed_zero_first,
)

LOCAL_RANK = int(os.getenv("LOCAL_RANK", -1))
RANK = int(os.getenv("RANK", -1))
WORLD_SIZE = int(os.getenv("WORLD_SIZE", 1))
GIT_INFO = check_git_info()

1.2 Detailed Explanation of the train() Function

The train() function is the core of YOLOv5 training and manages the entire training process:

def train(hyp, opt, device, callbacks):
    """
    Train a YOLOv5 model on a custom dataset using specified hyperparameters, options, and device, managing datasets,
    model architecture, loss computation, and optimizer steps.

    Args:
        hyp (str | dict): Path to the hyperparameters YAML file or a dictionary of hyperparameters.
        opt (argparse.Namespace): Parsed command-line arguments containing training options.
        device (torch.device): Device on which training occurs, e.g., 'cuda' or 'cpu'.
        callbacks (Callbacks): Callback functions for various training events.

    Returns:
        None

    Models and datasets download automatically from the latest YOLOv5 release.

    Example:
        Single-GPU training:
        ```bash
        $ python train.py --data coco128.yaml --weights yolov5s.pt --img 640  # from pretrained (recommended)
        $ python train.py --data coco128.yaml --weights '' --cfg yolov5s.yaml --img 640  # from scratch
        ```

        Multi-GPU DDP training:
        ```bash
        $ python -m torch.distributed.run --nproc_per_node 4 --master_port 1 train.py --data coco128.yaml --weights
        yolov5s.pt --img 640 --device 0,1,2,3
        ```

        For more usage details, refer to:
        - Models: https://github.com/ultralytics/yolov5/tree/master/models
        - Datasets: https://github.com/ultralytics/yolov5/tree/master/data
        - Tutorial: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
    """

The main steps executed by the function include:

Parameter parsing and initialization: Process input parameters, set up save directory and training configuration
Loading hyperparameters: Load from YAML file or use the passed hyperparameter dictionary
Configuring logging system: Initialize logger and callback functions
Loading dataset: Validate dataset format and get training and validation paths
Model creation or loading: Create a new model or load pre-trained weights based on configuration
Optimizer configuration: Set up optimizer, learning rate scheduler, and EMA (Exponential Moving Average)
Data loader creation: Create training and validation data loaders
Start training loop: Execute multiple epochs of training, each including:
- Training phase (forward pass, loss calculation, backward pass, parameter update)
- Optional validation phase (calculate metrics like mAP)
- Model saving and early stopping check
End of training: Save final model, perform final validation, release resources
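Two of the steps above, the learning rate scheduler and the EMA, reduce to small formulas. The sketch below reproduces them in plain Python as I understand them from YOLOv5's train.py, utils/general.py (one_cycle) and utils/torch_utils.py (ModelEMA):

- a linear or cosine learning-rate multiplier that decays from 1.0 to hyp["lrf"];
- an EMA whose decay ramps from 0 toward 0.9999.

The helper names linear_lf, ema_decay and ema_update are hypothetical. Treat the constants as assumptions to verify against your checkout.

```python
import math


def one_cycle(y1=0.0, y2=1.0, steps=100):
    """Cosine ramp from y1 to y2 over `steps` (modeled on utils.general.one_cycle)."""
    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1


def linear_lf(epochs, lrf):
    """Hypothetical helper: linear multiplier from 1.0 at epoch 0 down to lrf at the last epoch."""
    return lambda x: (1 - x / epochs) * (1.0 - lrf) + lrf


def ema_decay(updates, decay=0.9999, tau=2000):
    """EMA decay that starts at 0 and ramps toward `decay`, so early weights are not over-smoothed."""
    return decay * (1 - math.exp(-updates / tau))


def ema_update(ema_value, model_value, updates):
    """One EMA step for a single scalar (a tensor version applies this element-wise)."""
    d = ema_decay(updates)
    return d * ema_value + (1 - d) * model_value


# The learning rate at epoch e is lr0 * lf(e); here lr0=0.01, lrf=0.01, 100 epochs
lf = one_cycle(1, 0.01, 100)
print([round(0.01 * lf(e), 6) for e in (0, 50, 100)])
```

Both schedules end at lrf × lr0. The cosine variant simply spends more epochs near the start and end values.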

Particularly noteworthy is the training data loader creation section:

# Trainloader
train_loader, dataset = create_dataloader(
    train_path,
    imgsz,
    batch_size // WORLD_SIZE,
    gs,
    single_cls,
    hyp=hyp,
    augment=True, # Data augmentation is enabled by default in training
    cache=None if opt.cache == "val" else opt.cache,
    rect=opt.rect,
    rank=LOCAL_RANK,
    workers=workers,
    image_weights=opt.image_weights,
    quad=opt.quad,
    prefix=colorstr("train: "),
    shuffle=True,
    seed=opt.seed,
)

In contrast, the validation data loader does not enable data augmentation. This is reasonable, because validation should be performed on the original, unaugmented data. The augment argument is simply omitted, so it takes its default value False in create_dataloader:

# Process 0
# augment is not passed here, so create_dataloader falls back to augment=False
if RANK in {-1, 0}:
    val_loader = create_dataloader(
        val_path,
        imgsz,
        batch_size // WORLD_SIZE * 2,
        gs,
        single_cls,
        hyp=hyp,
        cache=None if noval else opt.cache,
        rect=True,
        rank=-1,
        workers=workers * 2,
        pad=0.5,
        prefix=colorstr("val: "),
    )[0]

1.3 Data Augmentation in Training

YOLOv5 training enables data augmentation by default. Based on code analysis, we can confirm the following points:

Data augmentation settings in the training data loader: In the train.py file’s create_dataloader call, the augment parameter is explicitly set to True:

train_loader, dataset = create_dataloader(
    train_path,
    imgsz,
    batch_size // WORLD_SIZE,
    gs,
    single_cls,
    hyp=hyp,
    augment=True,  # Data augmentation is enabled by default in training
    cache=None if opt.cache == "val" else opt.cache,
    rect=opt.rect,
    rank=LOCAL_RANK,
    workers=workers,
    image_weights=opt.image_weights,
    quad=opt.quad,
    prefix=colorstr("train: "),
    shuffle=True,
    seed=opt.seed,
)

No data augmentation in validation data loader: In contrast, data augmentation is not enabled in the validation data loader, which is reasonable since validation should be performed on the original, unaugmented data.

In the LoadImagesAndLabels class’s __getitem__ method, various data augmentation techniques are applied when augment=True:

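Mosaic augmentation: Combines the current image with 3 randomly chosen images into a single 2×2 mosaic. This exposes the model to objects at varied scales and in varied contexts, and it is often credited with improving small object detection.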
```
if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
    img, labels = self.load_mosaic(index)
```

MixUp augmentation: Applied only on the Mosaic path. It blends the current mosaic with a second mosaic at a random ratio and keeps the labels of both.

if random.random() < hyp["mixup"]:
    img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))

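Random perspective transformation: Rotation, translation, scaling, shearing, and perspective warping. In __getitem__ this call only runs for samples that skipped Mosaic; Mosaic samples receive the same transformation inside load_mosaic.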

img, labels = random_perspective(
    img,
    labels,
    degrees=hyp["degrees"],
    translate=hyp["translate"],
    scale=hyp["scale"],
    shear=hyp["shear"],
    perspective=hyp["perspective"],
)

Albumentations library augmentation: Additional augmentations from the third-party Albumentations library. They only take effect if that package is installed.
```
img, labels = self.albumentations(img, labels)
```

HSV color space augmentation: Randomly adjusts hue, saturation, and value (brightness). augment_hsv modifies img in place, which is why its result is not assigned.

augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])

Random flipping: Vertical and horizontal flipping, each with its own probability. The full code also mirrors the label coordinates.

if random.random() < hyp["flipud"]:
    img = np.flipud(img)

if random.random() < hyp["fliplr"]:
    img = np.fliplr(img)

Cutout: Would randomly mask regions of the image to improve the model’s robustness. However, the call is commented out in the source, so it is not applied.

1.4 Augmentation Parameter Control

Specific parameters for data augmentation are controlled through the hyperparameter file (hyp.yaml), including:

| Parameter | Description | Function |
|---|---|---|
| `mosaic` | Probability of applying Mosaic augmentation | Controls whether to apply Mosaic augmentation |
| `mixup` | Probability of applying MixUp augmentation | Controls whether to apply MixUp augmentation |
| `hsv_h` | HSV hue adjustment intensity | Controls the range of hue variation |
| `hsv_s` | HSV saturation adjustment intensity | Controls the range of saturation variation |
| `hsv_v` | HSV brightness adjustment intensity | Controls the range of brightness variation |
| `degrees` | Rotation angle range | Controls the maximum angle of random rotation |
| `translate` | Translation range | Controls the maximum ratio of random translation |
| `scale` | Scaling range | Controls the maximum ratio of random scaling |
| `shear` | Shearing range | Controls the maximum angle of random shearing |
| `perspective` | Perspective transformation intensity | Controls the intensity of perspective transformation |
| `flipud` | Vertical flip probability | Controls the probability of vertical flipping |
| `fliplr` | Horizontal flip probability | Controls the probability of horizontal flipping |
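To see what these numbers mean in practice, the sketch below maps a hyp dictionary to the ranges the random values are drawn from. The mapping follows my reading of random_perspective and augment_hsv:

- rotation from uniform(-degrees, degrees);
- scale gain from uniform(1 - scale, 1 + scale);
- the image center placed at uniform(0.5 - translate, 0.5 + translate) × size;
- HSV gains from uniform(1 - g, 1 + g).

sampling_ranges is a hypothetical helper. The example values are illustrative and not necessarily the shipped defaults.

```python
def sampling_ranges(hyp, width, height):
    """Hypothetical helper: translate hyperparameters into the ranges random values are sampled from."""
    t = hyp["translate"]
    return {
        "rotation_deg": (-hyp["degrees"], hyp["degrees"]),
        "scale_gain": (1 - hyp["scale"], 1 + hyp["scale"]),
        "shear_deg": (-hyp["shear"], hyp["shear"]),
        # After centering, the image center is moved to a random point in this band
        "center_x_px": ((0.5 - t) * width, (0.5 + t) * width),
        "center_y_px": ((0.5 - t) * height, (0.5 + t) * height),
        # Multiplicative gains for hue, saturation, value
        "hsv_gains": [(1 - hyp[k], 1 + hyp[k]) for k in ("hsv_h", "hsv_s", "hsv_v")],
        # Probabilities are compared against random.random() once per sample
        "probabilities": {k: hyp[k] for k in ("mosaic", "mixup", "flipud", "fliplr")},
    }


example_hyp = {  # illustrative values only
    "degrees": 10.0, "scale": 0.5, "shear": 2.0, "translate": 0.1, "perspective": 0.0,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "mosaic": 1.0, "mixup": 0.1, "flipud": 0.0, "fliplr": 0.5,
}
print(sampling_ranges(example_hyp, 640, 640)["scale_gain"])
```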

1.5 Conclusion

YOLOv5 enables a rich set of data augmentation strategies by default during training, and they are widely regarded as an important contributor to its detection performance. These augmentations include image fusion (Mosaic and MixUp), geometric transformations, color adjustments, and random flipping. Together they substantially increase the diversity of the training data, helping the model learn more robust features and detect objects across different environments and conditions.

When training YOLOv5 on your own dataset, you do not need to enable data augmentation manually, because train.py always passes augment=True to the training loader. Whether each operation actually fires, and how strongly, is set by the hyperparameter file. For example, mixup and flipud are 0.0 in the default data/hyps/hyp.scratch-low.yaml, and --rect disables Mosaic altogether. To change the intensity of augmentations, edit those parameters in a copy of the hyperparameter file and pass it with --hyp.

2. YOLOv5 Data Loading and Augmentation Workflow

The entire data loading and augmentation process involves call relationships among multiple functions and classes. Below is a detailed explanation of this workflow:

2.1 Call Relationship

First, the create_dataloader function is called in train.py:

train_loader, dataset = create_dataloader(
    train_path,
    imgsz,
    batch_size // WORLD_SIZE,
    gs,
    single_cls,
    hyp=hyp,
    augment=True,  # augment=True is set here
    cache=None if opt.cache == "val" else opt.cache,
    rect=opt.rect,
    rank=LOCAL_RANK,
    workers=workers,
    image_weights=opt.image_weights,
    quad=opt.quad,
    prefix=colorstr("train: "),
    shuffle=True,
    seed=opt.seed,
)

Inside the create_dataloader function, an instance of the LoadImagesAndLabels dataset class is created:

dataset = LoadImagesAndLabels(
    path,
    imgsz,
    batch_size,
    augment=augment,  # The augment parameter is passed to LoadImagesAndLabels here
    hyp=hyp,
    # Other parameters...
)

The create_dataloader function finally returns the data loader (a PyTorch DataLoader or YOLOv5’s InfiniteDataLoader subclass) together with the dataset:

return loader(
    dataset,
    batch_size=batch_size,
    # Other parameters...
), dataset

2.2 Specific Implementation of Data Augmentation

Data augmentation happens in the __getitem__ method of the LoadImagesAndLabels class. The DataLoader calls this method once for each sample it needs to assemble a batch:

During initialization, LoadImagesAndLabels derives the following attributes from augment:

self.augment = augment
self.mosaic = self.augment and not self.rect  # Mosaic requires augment=True and rect=False
self.albumentations = Albumentations(size=img_size) if augment else None

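In the __getitem__ method, the following augmentations are applied when self.augment is True. Random perspective is called here only for samples that did not go through Mosaic, because load_mosaic already applies it to the assembled mosaic:

# mosaic: True if this sample was built by load_mosaic
if self.augment and not mosaic:
    # Random perspective transformation
    img, labels = random_perspective(...)

if self.augment:
    # Albumentations library augmentation
    img, labels = self.albumentations(img, labels)

    # HSV color space augmentation (modifies img in place)
    augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])

    # Random flipping
    if random.random() < hyp["flipud"]:
        img = np.flipud(img)

    if random.random() < hyp["fliplr"]:
        img = np.fliplr(img)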

2.3 Complete Call Chain

The actual call chain is as follows:

train.py → Call create_dataloader(augment=True)
create_dataloader → Create LoadImagesAndLabels(augment=True)
create_dataloader → Use the above dataset to create and return DataLoader
When the training loop executes, DataLoader → Call LoadImagesAndLabels.__getitem__
LoadImagesAndLabels.__getitem__ → Apply various data augmentations based on augment=True

graph TD
    A[train.py] -->|Call| B[create_dataloader]
    B -->|Create| C[LoadImagesAndLabels]
    B -->|Return| D[DataLoader]
    D -->|Training loop requests data| E[__getitem__]
    E -->|Apply| F[Data Augmentation]

2.4 Conclusion

The augment=True parameter set in train.py is ultimately passed to the LoadImagesAndLabels class, where it triggers various data augmentation operations in that class’s __getitem__ method. This is a typical PyTorch data loading workflow. First, define a dataset class that handles the loading and augmentation of individual samples. Then wrap it with DataLoader, which handles batching, parallel loading in worker processes, and so on.

YOLOv5 adopts this design to achieve:

Clear code structure (separation of data loading and model training)
Efficient data processing (parallel preloading in DataLoader worker processes)
Flexible augmentation operations (can be enabled or disabled as needed)

This is why the system can automatically apply complex data augmentation strategies after setting augment=True in train.py.

3. Detailed Explanation of YOLOv5’s Data Loading Mechanism

3.1 Data Loader Creation Process

In YOLOv5’s training process:

The create_dataloader function creates a data loader:
- This function first creates an instance of the LoadImagesAndLabels class as the dataset
- Then wraps this dataset in PyTorch’s DataLoader or in YOLOv5’s InfiniteDataLoader, a DataLoader subclass that keeps its worker processes alive across epochs
- Finally returns this data loader and the dataset
The LoadImagesAndLabels class acts as the dataset:
- This class inherits from PyTorch’s Dataset class
- It is responsible for managing data loading, preprocessing, and augmentation
- It defines the logic for obtaining individual data samples

3.2 Main Parameters of the LoadImagesAndLabels Class

This class takes many parameters. The main ones, plus two attributes derived from them, are:

3.2.1 Basic Path and Image Settings

path: Dataset path (can be a directory or file list)
img_size: Image size (default 640 pixels)
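batch_size: Batch size

3.2.2 Data Augmentation Parameters

augment: Whether to enable data augmentation
hyp: Hyperparameter dictionary, containing probabilities and intensities for various augmentations
mosaic: Whether to use Mosaic augmentation. This is an attribute rather than a constructor argument, automatically enabled when augment=True and rect=False.
albumentations: Albumentations wrapper object. This is an attribute created when augment=True (None otherwise), and it only transforms images if the albumentations package is installed.

3.2.3 Rectangular Training and Padding Parameters

rect: Whether to use rectangular training. Images with similar aspect ratios are grouped into a batch and letterboxed to a shared, minimally padded shape.
stride: The model’s maximum downsampling rate, used to ensure image dimensions are multiples of the stride
pad: Extra padding, in units of stride, added when computing rectangular batch shapes (the validation loader uses pad=0.5)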
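To make rect and pad concrete, here is a sketch of how the per-batch shapes for rectangular training can be computed, reconstructed from my reading of LoadImagesAndLabels.__init__. The real code also re-orders the image files, labels and shapes by aspect ratio, which this helper skips. The function name rect_batch_shapes is hypothetical.

```python
import numpy as np


def rect_batch_shapes(wh, batch_size, img_size=640, stride=32, pad=0.0):
    """Hypothetical helper: per-batch [height, width] shapes for rectangular training.

    wh: (n, 2) array of original image sizes as (width, height).
    """
    n = len(wh)
    bi = np.floor(np.arange(n) / batch_size).astype(int)  # batch index of each sorted image
    nb = bi[-1] + 1  # number of batches
    ar = wh[:, 1] / wh[:, 0]  # aspect ratio = height / width
    ar = ar[ar.argsort()]  # sort so each batch holds similar aspect ratios

    shapes = [[1, 1]] * nb  # relative [h, w] per batch, square by default
    for i in range(nb):
        ari = ar[bi == i]
        mini, maxi = ari.min(), ari.max()
        if maxi < 1:  # every image in the batch is wider than tall
            shapes[i] = [maxi, 1]
        elif mini > 1:  # every image in the batch is taller than wide
            shapes[i] = [1, 1 / mini]
    # Scale to img_size, add pad (in stride units), round up to a multiple of stride
    return np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride


wh = np.array([[640, 480], [640, 360], [320, 640], [480, 960]])
print(rect_batch_shapes(wh, batch_size=2))           # training-style, pad=0.0
print(rect_batch_shapes(wh, batch_size=2, pad=0.5))  # validation-style, pad=0.5
```

Batches of wide images become wide (here 480×640) and batches of tall images become tall (640×320), so much less gray padding is wasted than with square 640×640 inputs.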

3.2.4 Cache and Performance Optimization Parameters

cache_images: Whether to cache images to speed up training (can be “ram” or “disk”)
workers: Number of DataLoader worker processes (strictly a create_dataloader argument rather than a LoadImagesAndLabels one)

3.2.5 Dataset Characteristic Parameters

single_cls: Whether to treat all categories as one category
image_weights: Whether to use image weights (based on class frequency)
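image_weights changes which images are drawn, not how they are augmented. The sketch below is modeled on YOLOv5's labels_to_class_weights and labels_to_image_weights helpers (both imported in train.py above):

- class weights are inverse label frequencies;
- an image's weight sums the weights of the classes it contains;
- the epoch's sample order is drawn with random.choices.

In train.py the class weights are additionally scaled by a factor based on each class's current mAP before dataset.indices is re-sampled. That step is omitted here.

```python
import random

import numpy as np


def labels_to_class_weights(labels, nc):
    """Inverse-frequency class weights, normalized to sum to 1."""
    classes = np.concatenate(labels, 0)[:, 0].astype(int)
    counts = np.bincount(classes, minlength=nc).astype(float)
    counts[counts == 0] = 1  # avoid division by zero for classes with no labels
    weights = 1 / counts  # rarer classes get larger weights
    return weights / weights.sum()


def labels_to_image_weights(labels, nc, class_weights):
    """Per-image weight = sum over its labels of each label's class weight."""
    class_counts = np.array([np.bincount(x[:, 0].astype(int), minlength=nc) for x in labels])
    return (class_weights.reshape(1, nc) * class_counts).sum(1)


# Three images (columns: cls, x, y, w, h): two class-0 boxes, one class-0 box, one class-1 box
labels = [
    np.array([[0, 0.5, 0.5, 0.1, 0.1], [0, 0.2, 0.2, 0.1, 0.1]]),
    np.array([[0, 0.5, 0.5, 0.2, 0.2]]),
    np.array([[1, 0.5, 0.5, 0.3, 0.3]]),
]
cw = labels_to_class_weights(labels, nc=2)
iw = labels_to_image_weights(labels, nc=2, class_weights=cw)
indices = random.choices(range(len(labels)), weights=iw, k=len(labels))  # resampled epoch order
```

The image containing the rare class 1 ends up with the largest weight, so it is drawn more often.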

3.3 Main Functions of the create_dataloader Function

This function completes several key tasks:

def create_dataloader(
    path,
    imgsz,
    batch_size,
    stride,
    single_cls=False,
    hyp=None,
    augment=False,
    cache=False,
    pad=0.0,
    rect=False,
    rank=-1,
    workers=8,
    image_weights=False,
    quad=False,
    prefix="",
    shuffle=False,
    seed=0,
):
    # Create dataset instance
    dataset = LoadImagesAndLabels(
        path,
        imgsz,
        batch_size,
        # Other parameters...
    )
  
    # Configure batch size and sampler
    batch_size = min(batch_size, len(dataset))
    sampler = None if rank == -1 else distributed.DistributedSampler(...)
  
    # Select data loader type
    loader = DataLoader if image_weights else InfiniteDataLoader  # only DataLoader allows for attribute updates
  
    # Create and return data loader
    return loader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle and sampler is None,
        # Other parameters...
    ), dataset

Create dataset: Instantiate the LoadImagesAndLabels class
Determine batch size: Ensure batch size does not exceed dataset size
Set sampler: Choose appropriate sampler based on whether distributed training is being used
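Select data loader type: Use the standard DataLoader when image weights are enabled, otherwise use InfiniteDataLoader. With image weights, train.py re-samples dataset.indices every epoch, and per the source comment only DataLoader allows such attribute updates.
Configure data loading parameters:
- Batch size
- Whether to shuffle
- Number of worker processes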
- Sampler
- Whether to drop the last incomplete batch
- Memory pinning
- Collate function (collate_fn)
- Worker initialization function
- Random number generator
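The collate function is what turns per-sample 6-column label arrays into a single batch target. The sketch below follows the logic of LoadImagesAndLabels.collate_fn as I understand it, using NumPy in place of torch.stack and torch.cat so it runs without PyTorch. Column 0 of each sample's labels is overwritten with the sample's position in the batch, so that downstream code can tell which image each box belongs to.

```python
import numpy as np


def collate_fn(batch):
    """Merge (image, labels, path, shapes) samples into one batch (NumPy stand-in for the torch version)."""
    ims, labels, paths, shapes = zip(*batch)  # transpose list of samples into tuples of fields
    for i, lb in enumerate(labels):
        lb[:, 0] = i  # column 0 = index of the image within this batch (modifies labels in place)
    return np.stack(ims, 0), np.concatenate(labels, 0), paths, shapes


# Two fake samples: CHW images and (n, 6) labels [batch_idx, cls, x, y, w, h]
s1 = (np.zeros((3, 4, 4)), np.array([[0, 2, 0.5, 0.5, 0.1, 0.1], [0, 7, 0.3, 0.3, 0.2, 0.2]]), "a.jpg", None)
s2 = (np.ones((3, 4, 4)), np.array([[0, 1, 0.6, 0.6, 0.1, 0.1]]), "b.jpg", None)
imgs, targets, paths, _ = collate_fn([s1, s2])
```

Because images can carry different numbers of boxes, labels are concatenated rather than stacked. The batch-index column is what keeps them separable.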

3.4 Where Data Augmentation Actually Happens

Data augmentation primarily occurs in the __getitem__ method of the LoadImagesAndLabels class. Below is a simplified method flow:

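def __getitem__(self, index):
    # 1. Map the sampler index to a dataset index (linear, shuffled, or image-weighted)
    index = self.indices[index]

    # 2. Decide whether to use Mosaic augmentation
    if self.mosaic and random.random() < self.hyp["mosaic"]:
        # Load Mosaic-augmented image (load_mosaic also applies random_perspective internally)
        img, labels = self.load_mosaic(index)
        shapes = None

        # Potentially apply MixUp augmentation
        if random.random() < self.hyp["mixup"]:
            img, labels = mixup(...)
    else:
        # Regular image loading
        img, (h0, w0), (h, w) = self.load_image(index)
        # Apply Letterbox
        img, ratio, pad = letterbox(...)
        shapes = (h0, w0), ((h / h0, w / w0), pad)
        # Process labels (normalized xywh -> pixel xyxy)
        labels = self.labels[index].copy()
        # Random perspective transformation (non-Mosaic samples only)
        if self.augment:
            img, labels = random_perspective(...)
    # (labels are converted back to normalized xywh here)

    # 3. Apply more augmentation operations
    if self.augment:
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)

        # HSV color space augmentation (modifies img in place)
        augment_hsv(...)

        # Random flipping (the full code also mirrors the label coordinates)
        if random.random() < self.hyp["flipud"]:
            img = np.flipud(img)

        if random.random() < self.hyp["fliplr"]:
            img = np.fliplr(img)

    # 4. Final processing
    # Label format conversion: column 0 is left free for the image's index within the batch
    labels_out = torch.zeros((len(labels), 6))
    if len(labels):
        labels_out[:, 1:] = torch.from_numpy(labels)
    # Image format conversion: HWC to CHW, BGR to RGB, then make the array contiguous
    img = np.ascontiguousarray(img.transpose((2, 0, 1))[::-1])

    return torch.from_numpy(img), labels_out, self.im_files[index], shapes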

When the training loop requests a batch of data, this method is called once per sample and will:

Choose whether to apply Mosaic augmentation (based on probability) and, on the Mosaic path, optionally apply MixUp
Apply random perspective transformation (inside load_mosaic for Mosaic samples, in __getitem__ otherwise)
Apply Albumentations library augmentation (if the package is installed)
Apply HSV color space augmentation
Apply random flipping (vertical and horizontal)
Skip Cutout augmentation, whose call is present but commented out

3.5 Summary

The entire YOLOv5 data loading process is:

Call the create_dataloader function in train.py, with the augment=True parameter
The create_dataloader function creates an instance of the LoadImagesAndLabels class and passes augment=True to it
create_dataloader wraps the dataset in PyTorch’s DataLoader
The training loop uses this DataLoader to get data batches
Each time a batch is requested, the __getitem__ method of the LoadImagesAndLabels class is called once per sample and applies the data augmentations. collate_fn then merges the samples into a batch.

This design allows YOLOv5 to flexibly handle various data formats and apply complex data augmentation strategies while maintaining code modularity and extensibility.

4. Complete Analysis of Data Augmentation Workflow in YOLOv5

4.1 Complete Call Flow

4.1.1 Batch Retrieval in Training Loop

In the training loop of train.py, we can see code like this:

pbar = enumerate(train_loader)
...
for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
    callbacks.run("on_train_batch_start")
    ni = i + nb * epoch  # number integrated batches (since train start)
    imgs = imgs.to(device, non_blocking=True).float() / 255  # uint8 to float32, 0-255 to 0.0-1.0
    ...

When iterating through train_loader, PyTorch’s data loading process is actually being called. Here’s the complete call chain:

The line for i, (imgs, targets, paths, _) in pbar triggers PyTorch’s data loading process
PyTorch’s DataLoader uses worker processes (when workers > 0) to get samples from the dataset
DataLoader calls LoadImagesAndLabels.__getitem__(index) to get individual samples
DataLoader uses the collate_fn function to combine multiple samples into a batch
Returns the combined batch data (imgs, targets, paths, _) to the training loop

4.1.2 Timing of __getitem__ Method Calls

When the training process needs to load a batch of data:

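When the for loop starts, it obtains an iterator from the DataLoader. YOLOv5’s InfiniteDataLoader reuses one persistent iterator, so its worker processes survive between epochs.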
The iterator determines which sample indices to load based on batch size and sampler
For each index, DataLoader calls dataset[index], which is LoadImagesAndLabels.__getitem__(index)
This method returns a processed single sample (image and labels)
DataLoader combines multiple samples into a batch and returns it to the training loop
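The steps above can be imitated in a few lines of plain Python. ToyDataset and simple_loader are hypothetical stand-ins. A real DataLoader adds worker processes, a sampler, pinned memory and a configurable collate_fn, but the order of events is the same:

1. pick the indices for a batch;
2. call __getitem__ once per index;
3. collate the results.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for LoadImagesAndLabels that records every __getitem__ call."""

    def __init__(self, n):
        self.n = n
        self.calls = []

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        self.calls.append(index)
        return f"img{index}", [index]  # placeholder (image, labels) pair


def simple_loader(dataset, batch_size, shuffle=False, seed=0):
    """Single-process imitation of DataLoader batching (no workers, no pinned memory)."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)  # plays the role of the sampler
    for start in range(0, len(order), batch_size):
        samples = [dataset[i] for i in order[start:start + batch_size]]  # one __getitem__ per index
        yield tuple(zip(*samples))  # trivial collate: transpose into (images, labels)


ds = ToyDataset(5)
for imgs, targets in simple_loader(ds, batch_size=2):
    print(imgs, targets)
```

Because augmentation lives inside __getitem__, every call draws fresh random numbers. The same image therefore comes out differently each epoch.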

4.1.3 Data Augmentation Implementation in __getitem__

Steps of data augmentation in the __getitem__ method:

def __getitem__(self, index):
    """Get a sample from the dataset, considering linear, random, or weighted sampling."""
    index = self.indices[index]  # linear, random, or weighted
  
    hyp = self.hyp
    if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
        # Load Mosaic augmentation
        img, labels = self.load_mosaic(index)
        shapes = None
      
        # MixUp augmentation
        if random.random() < hyp["mixup"]:
            img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
    else:
        # Regular image loading
        img, (h0, w0), (h, w) = self.load_image(index)
      
        # Letterbox
        shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size
        img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
        shapes = (h0, w0), ((h / h0, w / w0), pad)
      
        # Process labels
        labels = self.labels[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1])
      
        # Random perspective transformation
        if self.augment:
            img, labels = random_perspective(
                img,
                labels,
                degrees=hyp["degrees"],
                translate=hyp["translate"],
                scale=hyp["scale"],
                shear=hyp["shear"],
                perspective=hyp["perspective"],
            )
  
    nl = len(labels)  # number of labels
    if nl:
        labels[:, 1:5] = xyxy2xywhn(labels[:, 1:5], w=img.shape[1], h=img.shape[0], clip=True, eps=1e-3)
  
    # More data augmentation operations
    if self.augment:
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)
        nl = len(labels)  # update number of labels
      
        # HSV color space augmentation
        augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
      
        # Vertical flip
        if random.random() < hyp["flipud"]:
            img = np.flipud(img)
            if nl:
                labels[:, 2] = 1 - labels[:, 2]
      
        # Horizontal flip
        if random.random() < hyp["fliplr"]:
            img = np.fliplr(img)
            if nl:
                labels[:, 1] = 1 - labels[:, 1]
      
        # Cutout (commented out)
        # labels = cutout(img, labels, p=0.5)
  
    # Format conversion
    labels_out = torch.zeros((nl, 6))
    if nl:
        labels_out[:, 1:] = torch.from_numpy(labels)
  
    # Image format conversion: HWC to CHW, BGR to RGB
    img = img.transpose((2, 0, 1))[::-1]
    img = np.ascontiguousarray(img)
  
    return torch.from_numpy(img), labels_out, self.im_files[index], shapes
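The letterbox and label-conversion lines in the non-Mosaic branch are easier to follow with concrete numbers. The sketch below reproduces two pieces, both re-implemented in NumPy from my understanding of the YOLOv5 originals:

- the geometry of letterbox with auto=False (scale ratio and padding only, with no actual resizing or border drawing);
- the normalized-xywh to pixel-xyxy conversion used right after it, following the YOLOv5 helper of the same name.

letterbox_geometry is a hypothetical name.

```python
import numpy as np


def letterbox_geometry(h0, w0, new_shape=640, scaleup=True):
    """Hypothetical helper: ratio, resized size and padding that letterbox(auto=False) would use."""
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)  # (height, width)
    r = min(new_shape[0] / h0, new_shape[1] / w0)  # scale so the whole image fits
    if not scaleup:
        r = min(r, 1.0)  # only shrink, never enlarge
    new_unpad = int(round(w0 * r)), int(round(h0 * r))  # (width, height) after resizing
    dw = (new_shape[1] - new_unpad[0]) / 2  # padding on each side, x
    dh = (new_shape[0] - new_unpad[1]) / 2  # padding on each side, y
    return (r, r), new_unpad, (dw, dh)


def xywhn2xyxy(x, w=640, h=640, padw=0.0, padh=0.0):
    """Normalized [x_center, y_center, width, height] -> pixel [x1, y1, x2, y2], shifted by padding."""
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh
    return y


# A 480x640 (h x w) image letterboxed into a 640x640 square
ratio, new_unpad, pad = letterbox_geometry(480, 640, 640)
boxes = xywhn2xyxy(np.array([[0.5, 0.5, 0.5, 0.5]]), 640 * ratio[0], 480 * ratio[1], padw=pad[0], padh=pad[1])
```

In this example no scaling is needed, but 80 pixels of padding are added above and below. This is why the padding offsets must be added to every label coordinate.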

4.1.4 Detailed Explanation of Key Augmentation Operations

a. Mosaic Augmentation

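Combines the indexed image with 3 randomly selected images on a gray canvas twice the target size in each dimension
Randomly determines the mosaic center point position
Places the four images around that center, cropping any part that falls outside the canvas
Shifts the corresponding label coordinates, then applies random_perspective, whose border setting crops the canvas back to the target size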

if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
    img, labels = self.load_mosaic(index)
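The placement logic behind load_mosaic can be isolated from file loading. The following sketch is modeled on YOLOv5's load_mosaic as I read it:

- a gray (114) canvas of 2s × 2s;
- a random center in [s/2, 3s/2];
- four images placed so that they meet at that center, each cropped where it would spill off the canvas.

simple_mosaic is a hypothetical function. It assumes the four images are already resized (the real code calls load_image), returns per-image pixel offsets instead of shifting labels itself, and skips the subsequent random_perspective crop.

```python
import random

import numpy as np


def simple_mosaic(images, s, rng=random):
    """Hypothetical simplified 2x2 mosaic of four HxWx3 uint8 images around a random center."""
    canvas = np.full((2 * s, 2 * s, 3), 114, dtype=np.uint8)  # gray background
    xc = int(rng.uniform(s // 2, 2 * s - s // 2))  # mosaic center x
    yc = int(rng.uniform(s // 2, 2 * s - s // 2))  # mosaic center y
    offsets = []
    for i, img in enumerate(images):
        h, w = img.shape[:2]
        if i == 0:  # top-left: bottom-right corner of the image touches the center
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
        elif i == 1:  # top-right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, 2 * s), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom-left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(2 * s, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        else:  # bottom-right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, 2 * s), min(2 * s, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
        canvas[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # paste the visible part
        offsets.append((x1a - x1b, y1a - y1b))  # add these to image i's pixel label coordinates
    return canvas, offsets


s = 64
tiles = [np.full((s, s, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
canvas, offsets = simple_mosaic(tiles, s, rng=random.Random(1))
```

Adding each offset to a box's pixel coordinates gives its position on the canvas. The real code applies this shift through its xywhn2xyxy call.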

b. MixUp Augmentation

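Only considered on the Mosaic path, after load_mosaic
Blends the current mosaic with a second mosaic at a ratio drawn from a Beta(32, 32) distribution, so both images contribute roughly equally
Concatenates the labels from both samples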

if random.random() < hyp["mixup"]:
    img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
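MixUp itself is only a few lines. The sketch below mirrors YOLOv5's mixup helper as I understand it: a Beta(32, 32) blend ratio and a simple concatenation of both label arrays. Passing a NumPy Generator as rng is an addition of this sketch for reproducibility.

```python
import numpy as np


def mixup(im, labels, im2, labels2, rng=None):
    """Blend two same-sized uint8 images and keep the labels of both."""
    if rng is None:
        rng = np.random.default_rng()
    r = rng.beta(32.0, 32.0)  # blend ratio, concentrated around 0.5
    im = (im * r + im2 * (1 - r)).astype(np.uint8)
    labels = np.concatenate((labels, labels2), 0)  # same image size, so both sets of boxes stay valid
    return im, labels


a = np.full((4, 4, 3), 100, dtype=np.uint8)
b = np.full((4, 4, 3), 200, dtype=np.uint8)
mixed, lbl = mixup(a, np.zeros((2, 5)), b, np.ones((1, 5)), rng=np.random.default_rng(0))
```

No label coordinates change, because both mosaics share the same canvas size. The boxes of both images are simply kept.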

c. Random Perspective Transformation

Applies rotation, translation, scaling, shearing, and other geometric transformations
Simultaneously adjusts label coordinates to match the transformed image

if self.augment:
    img, labels = random_perspective(
        img,
        labels,
        degrees=hyp["degrees"],
        translate=hyp["translate"],
        scale=hyp["scale"],
        shear=hyp["shear"],
        perspective=hyp["perspective"],
    )
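The geometric part of random_perspective amounts to composing a few 3×3 matrices and pushing each box's four corners through the result. The sketch below is a simplified, affine-only version modeled on YOLOv5's function. It leaves out:

- the perspective term;
- the actual cv2.warpAffine call on the pixels;
- the clipping and box_candidates filtering that discard boxes that became too small.

random_affine_matrix and transform_boxes are hypothetical names.

```python
import math
import random

import numpy as np


def random_affine_matrix(width, height, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, rng=random):
    """Compose center -> rotate/scale -> shear -> translate into one 3x3 matrix (no perspective)."""
    C = np.eye(3)
    C[0, 2], C[1, 2] = -width / 2, -height / 2  # move image center to the origin

    R = np.eye(3)
    a = math.radians(rng.uniform(-degrees, degrees))
    s = rng.uniform(1 - scale, 1 + scale)
    R[:2, :2] = [[s * math.cos(a), s * math.sin(a)], [-s * math.sin(a), s * math.cos(a)]]

    S = np.eye(3)
    S[0, 1] = math.tan(math.radians(rng.uniform(-shear, shear)))  # x shear
    S[1, 0] = math.tan(math.radians(rng.uniform(-shear, shear)))  # y shear

    T = np.eye(3)
    T[0, 2] = rng.uniform(0.5 - translate, 0.5 + translate) * width  # where the center lands
    T[1, 2] = rng.uniform(0.5 - translate, 0.5 + translate) * height
    return T @ S @ R @ C  # applied right to left: C first, T last


def transform_boxes(boxes, M):
    """Map pixel xyxy boxes through M; return axis-aligned boxes enclosing the 4 warped corners."""
    n = len(boxes)
    xy = np.ones((n * 4, 3))
    xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
    xy = (xy @ M.T)[:, :2].reshape(n, 8)
    x, y = xy[:, [0, 2, 4, 6]], xy[:, [1, 3, 5, 7]]
    return np.stack([x.min(1), y.min(1), x.max(1), y.max(1)], axis=1)


M = random_affine_matrix(640, 480, degrees=10, translate=0.1, scale=0.5, shear=2)
print(transform_boxes(np.array([[100.0, 120.0, 200.0, 220.0]]), M))
```

Because the new box is the envelope of the rotated corners, rotation tends to enlarge boxes. This is one reason the default rotation range is kept small.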

d. Albumentations Library Augmentation

Uses additional augmentations provided by the third-party Albumentations library
The wrapper object is created whenever augment=True, but it only transforms the image if the albumentations package is installed; otherwise it returns the image and labels unchanged

if self.augment:
    img, labels = self.albumentations(img, labels)
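The underlying pattern is an optional dependency behind a probability gate. The sketch below shows just that pattern. OptionalTransform is a hypothetical class, and its transform is injected as a plain callable. YOLOv5's Albumentations class instead builds an albumentations.Compose pipeline inside a try/except ImportError, leaving its transform as None when the package is missing.

```python
import random

import numpy as np


class OptionalTransform:
    """Apply `transform` with probability p, or do nothing if no transform is available."""

    def __init__(self, transform=None):
        self.transform = transform  # None when the optional library could not be imported

    def __call__(self, im, labels, p=1.0):
        if self.transform and random.random() < p:
            im, labels = self.transform(im, labels)
        return im, labels


def invert(im, labels):
    """Toy stand-in for a real pixel-level augmentation: boxes are unaffected."""
    return 255 - im, labels


img = np.zeros((2, 2, 3), dtype=np.uint8)
labels = np.array([[0, 0.5, 0.5, 0.2, 0.2]])
noop_img, _ = OptionalTransform(None)(img, labels)  # library missing: unchanged
aug_img, _ = OptionalTransform(invert)(img, labels)  # library present: transformed
```

This design means train.py never has to check whether Albumentations is installed. The call site stays the same either way.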

e. HSV Color Space Augmentation

Adjusts hue, saturation, and brightness in HSV color space
Random variation intensity controlled by hyperparameters

if self.augment:
    augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
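augment_hsv works on lookup tables rather than per-pixel arithmetic. It draws three random gains around 1.0, builds a 256-entry table per channel, and remaps every pixel through it. The sketch below keeps that idea but skips the OpenCV BGR↔HSV conversion (the real function uses cv2.cvtColor and cv2.LUT). It therefore assumes its input is already an HSV uint8 image in OpenCV's convention, with hue in [0, 180). Like the original, it modifies the array in place. hsv_lut_jitter is a hypothetical name.

```python
import numpy as np


def hsv_lut_jitter(hsv, hgain, sgain, vgain, rng=None):
    """Randomly rescale H, S, V of an OpenCV-style HSV uint8 image in place via lookup tables."""
    if rng is None:
        rng = np.random.default_rng()
    r = rng.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1  # gains in [1 - g, 1 + g]
    x = np.arange(0, 256, dtype=r.dtype)
    lut_hue = ((x * r[0]) % 180).astype(np.uint8)  # hue wraps around (OpenCV hue range is 0-179)
    lut_sat = np.clip(x * r[1], 0, 255).astype(np.uint8)  # saturation is clipped, not wrapped
    lut_val = np.clip(x * r[2], 0, 255).astype(np.uint8)  # value (brightness) is clipped too
    hsv[..., 0] = lut_hue[hsv[..., 0]]
    hsv[..., 1] = lut_sat[hsv[..., 1]]
    hsv[..., 2] = lut_val[hsv[..., 2]]


hsv_img = np.random.default_rng(0).integers(0, 180, size=(8, 8, 3), dtype=np.uint8)
hsv_lut_jitter(hsv_img, hgain=0.015, sgain=0.7, vgain=0.4)
```

A table lookup costs the same for any gain, which keeps this augmentation cheap even on large images.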

f. Random Flipping

Performs random vertical and horizontal flipping
Adjusts label coordinates accordingly

if self.augment:
    if random.random() < hyp["flipud"]:
        img = np.flipud(img)
        if nl:
            labels[:, 2] = 1 - labels[:, 2]
  
    if random.random() < hyp["fliplr"]:
        img = np.fliplr(img)
        if nl:
            labels[:, 1] = 1 - labels[:, 1]
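The label update in the flip code is simple because, at that point, labels are normalized [class, x_center, y_center, width, height]. Mirroring the image only mirrors the box center; widths and heights are unchanged. A minimal standalone version follows. flip_lr and flip_ud are hypothetical names, and unlike the original they copy the labels instead of editing them in place.

```python
import numpy as np


def flip_lr(img, labels):
    """Horizontal flip: mirror columns and the normalized x center."""
    labels = labels.copy()
    labels[:, 1] = 1 - labels[:, 1]
    return np.fliplr(img), labels


def flip_ud(img, labels):
    """Vertical flip: mirror rows and the normalized y center."""
    labels = labels.copy()
    labels[:, 2] = 1 - labels[:, 2]
    return np.flipud(img), labels


img = np.arange(6).reshape(2, 3)
lbl = np.array([[0, 0.2, 0.3, 0.1, 0.4]])
img_lr, lbl_lr = flip_lr(img, lbl)
img_ud, lbl_ud = flip_ud(img, lbl)
```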

4.1.5 Augmentation Probability Control

The application probability of each augmentation operation is controlled through hyperparameters hyp:

hyp["mosaic"]: Mosaic augmentation probability
hyp["mixup"]: MixUp augmentation probability
hyp["flipud"]: Vertical flip probability
hyp["fliplr"]: Horizontal flip probability

Other augmentation intensities are also controlled by hyperparameters:

hyp["degrees"]: Rotation angle range
hyp["translate"]: Translation range
hyp["scale"]: Scaling range
hyp["shear"]: Shearing range
hyp["perspective"]: Perspective transformation intensity
hyp["hsv_h"], hyp["hsv_s"], hyp["hsv_v"]: HSV color space adjustment intensity

4.2 Summary: Complete Data Augmentation Flow

Trigger timing: When the training loop iterates through DataLoader via for i, (imgs, targets, paths, _) in pbar
Call process: PyTorch DataLoader → Worker processes → LoadImagesAndLabels.__getitem__(index) → collate_fn → Batch data
Preprocessing: Load image, apply letterbox resizing, adjust label coordinates
Main augmentations:
- Mosaic augmentation (combining 4 images)
- MixUp augmentation (mixing 2 samples)
- Random perspective transformation (rotation, translation, scaling, shearing)
- Albumentations library augmentation
- HSV color space augmentation
- Random flipping (vertical and horizontal)
Format conversion: Convert image format, prepare tensor format required by PyTorch
Return results: Processed image, labels, file path, and shape information

sequenceDiagram
    participant Train as Training Loop
    participant Loader as DataLoader
    participant Dataset as LoadImagesAndLabels
    participant Aug as Data Augmentation
  
    Train->>Loader: Iterate request batch data
    Loader->>Dataset: __getitem__(index)
    Dataset->>Dataset: Load image
    alt Mosaic augmentation
        Dataset->>Aug: load_mosaic()
        opt MixUp augmentation
            Dataset->>Aug: mixup()
        end
    else Regular loading
        Dataset->>Dataset: Load image + letterbox
        opt Random perspective transformation
            Dataset->>Aug: random_perspective()
        end
    end
    opt Other augmentations
        Dataset->>Aug: albumentations augmentation
        Dataset->>Aug: HSV color space augmentation
        Dataset->>Aug: Random flipping
    end
    Dataset->>Loader: Return processed sample
    Loader->>Train: Return batch data
