Deep Dive into YOLOv5 Object Detection Code
This article offers an in-depth analysis of YOLOv5’s training process and data augmentation mechanisms, helping to organize and summarize the internal implementation details of the YOLOv5 object detection model.
1. Analysis of train.py File
1.1 Import Section
import argparse
import math
import os
import random
import subprocess
import sys
import time
from copy import deepcopy
from datetime import datetime, timedelta
from pathlib import Path
try:
    import comet_ml  # must be imported before torch (if installed)
except ImportError:
    comet_ml = None
import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn
import yaml
from torch.optim import lr_scheduler
from tqdm import tqdm
FILE = Path(__file__).resolve()
ROOT = FILE.parents[0] # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd())) # relative
import val as validate # for end-of-epoch mAP
from models.experimental import attempt_load
from models.yolo import Model
from utils.autoanchor import check_anchors
from utils.autobatch import check_train_batch_size
from utils.callbacks import Callbacks
from utils.dataloaders import create_dataloader
from utils.downloads import attempt_download, is_url
from utils.general import (
    LOGGER,
    TQDM_BAR_FORMAT,
    check_amp,
    check_dataset,
    check_file,
    check_git_info,
    check_git_status,
    check_img_size,
    check_requirements,
    check_suffix,
    check_yaml,
    colorstr,
    get_latest_run,
    increment_path,
    init_seeds,
    intersect_dicts,
    labels_to_class_weights,
    labels_to_image_weights,
    methods,
    one_cycle,
    print_args,
    print_mutation,
    strip_optimizer,
    yaml_save,
)
from utils.loggers import LOGGERS, Loggers
from utils.loggers.comet.comet_utils import check_comet_resume
from utils.loss import ComputeLoss
from utils.metrics import fitness
from utils.plots import plot_evolve
from utils.torch_utils import (
    EarlyStopping,
    ModelEMA,
    de_parallel,
    select_device,
    smart_DDP,
    smart_optimizer,
    smart_resume,
    torch_distributed_zero_first,
)
LOCAL_RANK = int(os.getenv("LOCAL_RANK", -1))
RANK = int(os.getenv("RANK", -1))
WORLD_SIZE = int(os.getenv("WORLD_SIZE", 1))
GIT_INFO = check_git_info()
1.2 Detailed Explanation of the train() Function
The train() function is the core of YOLOv5 training and manages the entire training process:
def train(hyp, opt, device, callbacks):
    """
    Train a YOLOv5 model on a custom dataset using specified hyperparameters, options, and device, managing datasets,
    model architecture, loss computation, and optimizer steps.

    Args:
        hyp (str | dict): Path to the hyperparameters YAML file or a dictionary of hyperparameters.
        opt (argparse.Namespace): Parsed command-line arguments containing training options.
        device (torch.device): Device on which training occurs, e.g., 'cuda' or 'cpu'.
        callbacks (Callbacks): Callback functions for various training events.

    Returns:
        None

    Models and datasets download automatically from the latest YOLOv5 release.

    Example:
        Single-GPU training:
        ```bash
        $ python train.py --data coco128.yaml --weights yolov5s.pt --img 640  # from pretrained (recommended)
        $ python train.py --data coco128.yaml --weights '' --cfg yolov5s.yaml --img 640  # from scratch
        ```

        Multi-GPU DDP training:
        ```bash
        $ python -m torch.distributed.run --nproc_per_node 4 --master_port 1 train.py --data coco128.yaml --weights
        yolov5s.pt --img 640 --device 0,1,2,3
        ```

        For more usage details, refer to:
        - Models: https://github.com/ultralytics/yolov5/tree/master/models
        - Datasets: https://github.com/ultralytics/yolov5/tree/master/data
        - Tutorial: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
    """
The main steps executed by the function include:
- Parameter parsing and initialization: Process input parameters, set up save directory and training configuration
- Loading hyperparameters: Load from YAML file or use the passed hyperparameter dictionary
- Configuring logging system: Initialize logger and callback functions
- Loading dataset: Validate dataset format and get training and validation paths
- Model creation or loading: Create a new model or load pre-trained weights based on configuration
- Optimizer configuration: Set up optimizer, learning rate scheduler, and EMA (Exponential Moving Average)
- Data loader creation: Create training and validation data loaders
- Start training loop: Execute multiple epochs of training, each including:
  - Training phase (forward pass, loss calculation, backward pass, parameter update)
  - Optional validation phase (calculate metrics like mAP)
  - Model saving and early stopping check
- End of training: Save final model, perform final validation, release resources
Particularly noteworthy is the training data loader creation section:
# Trainloader
train_loader, dataset = create_dataloader(
    train_path,
    imgsz,
    batch_size // WORLD_SIZE,
    gs,
    single_cls,
    hyp=hyp,
    augment=True,  # Data augmentation is enabled by default in training
    cache=None if opt.cache == "val" else opt.cache,
    rect=opt.rect,
    rank=LOCAL_RANK,
    workers=workers,
    image_weights=opt.image_weights,
    quad=opt.quad,
    prefix=colorstr("train: "),
    shuffle=True,
    seed=opt.seed,
)
In contrast, the validation data loader does not enable data augmentation. The augment argument is not passed, so create_dataloader falls back to its default value of False. This is reasonable, since validation should be performed on the original, unaugmented data:
# Process 0
if RANK in {-1, 0}:
    val_loader = create_dataloader(
        val_path,
        imgsz,
        batch_size // WORLD_SIZE * 2,
        gs,
        single_cls,
        hyp=hyp,
        cache=None if noval else opt.cache,
        rect=True,
        rank=-1,
        workers=workers * 2,
        pad=0.5,
        prefix=colorstr("val: "),
    )[0]
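The batch-size arithmetic in the two calls can be checked with a few lines of plain Python. This is only a sketch: the helper name per_process_batch_sizes is hypothetical and not part of YOLOv5. It simply mirrors the expressions batch_size // WORLD_SIZE and batch_size // WORLD_SIZE * 2 used above.

```python
def per_process_batch_sizes(total_batch_size: int, world_size: int) -> tuple[int, int]:
    """Return (train, val) batch sizes for one process.

    Hypothetical helper that mirrors the expressions used in train.py.
    """
    train_bs = total_batch_size // world_size  # each process gets an equal share
    val_bs = total_batch_size // world_size * 2  # doubled, as in the val_loader call
    return train_bs, val_bs


print(per_process_batch_sizes(64, 1))  # single process
print(per_process_batch_sizes(64, 4))  # four DDP processes
```

With WORLD_SIZE left at its default of 1, the training loader receives the full batch size unchanged.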
1.3 Data Augmentation in Training
YOLOv5 training enables data augmentation by default. The code confirms the following points:
- Data augmentation in the training data loader: In the create_dataloader call in train.py (shown in Section 1.2), the augment parameter is explicitly set to True.
- No data augmentation in the validation data loader: In contrast, augment is left at its default of False for validation, which is reasonable since validation should be performed on the original, unaugmented data.
In the __getitem__ method of the LoadImagesAndLabels class, the following data augmentation techniques are applied when augment=True:
- Mosaic augmentation: Combines 4 different images into one, which varies object scale and context and helps small object detection
    if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
        img, labels = self.load_mosaic(index)
- MixUp augmentation: Mixes two images at a certain ratio, increasing the complexity of training data
    if random.random() < hyp["mixup"]:
        img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
- Random perspective transformation: Includes rotation, translation, scaling, shearing, and other geometric transformations
    img, labels = random_perspective(
        img,
        labels,
        degrees=hyp["degrees"],
        translate=hyp["translate"],
        scale=hyp["scale"],
        shear=hyp["shear"],
        perspective=hyp["perspective"],
    )
- Albumentations library augmentation: Additional augmentations provided by the Albumentations image augmentation library
    img, labels = self.albumentations(img, labels)
- HSV color space augmentation: Adjusts hue, saturation, and brightness
    augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
- Random flipping: Vertical and horizontal flipping
    if random.random() < hyp["flipud"]:
        img = np.flipud(img)
    if random.random() < hyp["fliplr"]:
        img = np.fliplr(img)
- Cutout (commented out in the source): Randomly masks certain areas in the image to make the model more robust to occlusion
1.4 Augmentation Parameter Control
Specific parameters for data augmentation are controlled through the hyperparameter file (hyp.yaml), including:
Parameter | Description | Function |
---|---|---|
mosaic | Probability of applying Mosaic augmentation | Controls whether to apply Mosaic augmentation |
mixup | Probability of applying MixUp augmentation | Controls whether to apply MixUp augmentation |
hsv_h | HSV hue adjustment intensity | Controls the range of hue variation |
hsv_s | HSV saturation adjustment intensity | Controls the range of saturation variation |
hsv_v | HSV brightness adjustment intensity | Controls the range of brightness variation |
degrees | Rotation angle range | Controls the maximum angle of random rotation |
translate | Translation range | Controls the maximum ratio of random translation |
scale | Scaling range | Controls the maximum ratio of random scaling |
shear | Shearing range | Controls the maximum angle of random shearing |
perspective | Perspective transformation intensity | Controls the intensity of perspective transformation |
flipud | Vertical flip probability | Controls the probability of vertical flipping |
fliplr | Horizontal flip probability | Controls the probability of horizontal flipping |
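To make the difference between probability parameters and intensity parameters concrete, here is a minimal sketch in plain Python. The values in the dictionary are illustrative assumptions, not a copy of any particular hyperparameter file, and the helper applies is hypothetical; it only reproduces the random.random() < hyp[...] pattern that the dataset code uses.

```python
import random

# Illustrative values only; check the hyperparameter YAML shipped with your YOLOv5 version.
hyp = {
    "mosaic": 1.0,  # probability
    "mixup": 0.0,  # probability
    "flipud": 0.0,  # probability
    "fliplr": 0.5,  # probability
    "hsv_h": 0.015,  # intensity (fraction)
    "degrees": 0.0,  # intensity (+/- degrees)
}


def applies(probability: float, rng: random.Random) -> bool:
    """Return True with the given probability, like `random.random() < hyp[...]`."""
    return rng.random() < probability


rng = random.Random(0)
trials = 10_000
flip_rate = sum(applies(hyp["fliplr"], rng) for _ in range(trials)) / trials
print(f"fliplr applied in {flip_rate:.1%} of samples")
print("mixup ever applied:", any(applies(hyp["mixup"], rng) for _ in range(trials)))
```

Because random.random() returns values in [0, 1), a probability of 0.0 disables an augmentation entirely and 1.0 applies it to every sample.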
1.5 Conclusion
YOLOv5 enables a rich set of data augmentation strategies by default during training, which is one of the key factors enabling its high detection performance. These augmentations include image fusion (Mosaic and MixUp), geometric transformations, color adjustments, and random flipping. These techniques work together to greatly increase the diversity of training data, helping the model learn more robust features and improving its detection capabilities for objects in different environments and conditions.
When using YOLOv5 to train your own dataset, you don’t need to manually enable data augmentation as it’s already enabled by default. If you need to adjust the intensity of augmentations, you can modify the relevant parameters in the hyperparameter file.
2. YOLOv5 Data Loading and Augmentation Workflow
The entire data loading and augmentation process involves call relationships among multiple functions and classes. Below is a detailed explanation of this workflow:
2.1 Call Relationship
- First, the create_dataloader function is called in train.py:
    train_loader, dataset = create_dataloader(
        train_path,
        imgsz,
        batch_size // WORLD_SIZE,
        gs,
        single_cls,
        hyp=hyp,
        augment=True,  # augment=True is set here
        # Other parameters...
    )
- Inside the create_dataloader function, an instance of the LoadImagesAndLabels dataset class is created:
    dataset = LoadImagesAndLabels(
        path,
        imgsz,
        batch_size,
        augment=augment,  # The augment parameter is passed to LoadImagesAndLabels here
        hyp=hyp,
        # Other parameters...
    )
- The create_dataloader function finally returns a PyTorch DataLoader and the dataset:
    return loader(
        dataset,
        batch_size=batch_size,
        # Other parameters...
    ), dataset
2.2 Specific Implementation of Data Augmentation
Data augmentation occurs in the __getitem__ method of the LoadImagesAndLabels class, which runs for every sample the DataLoader fetches while assembling a batch:
- When augment=True, the LoadImagesAndLabels class sets the following during initialization:
    self.augment = augment
    self.mosaic = self.augment and not self.rect  # mosaic requires augment=True and rect=False
    self.albumentations = Albumentations(size=img_size) if augment else None
- In the __getitem__ method, if self.augment is True, various augmentations are applied:
    if self.augment:
        # Random perspective transformation
        img, labels = random_perspective(...)
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)
        # HSV color space augmentation
        augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
        # Random flipping
        if random.random() < hyp["flipud"]:
            img = np.flipud(img)
        if random.random() < hyp["fliplr"]:
            img = np.fliplr(img)
2.3 Complete Call Chain
The actual call chain is as follows:
- train.py → Call create_dataloader(augment=True)
- create_dataloader → Create LoadImagesAndLabels(augment=True)
- create_dataloader → Use the above dataset to create and return DataLoader
- When the training loop executes, DataLoader → Call LoadImagesAndLabels.__getitem__
- LoadImagesAndLabels.__getitem__ → Apply various data augmentations based on augment=True
graph TD
A[train.py] -->|Call| B[create_dataloader]
B -->|Create| C[LoadImagesAndLabels]
B -->|Return| D[DataLoader]
D -->|Training loop requests data| E[__getitem__]
E -->|Apply| F[Data Augmentation]
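The same call chain can be imitated without PyTorch. The sketch below is a toy model under stated assumptions: ToyDataset and toy_create_dataloader are hypothetical names that stand in for LoadImagesAndLabels and create_dataloader, and the "augmentation" is just a small random jitter on a number. It shows only how the augment flag travels from the factory function into __getitem__.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for LoadImagesAndLabels: augmentation lives in __getitem__."""

    def __init__(self, samples, augment=False, seed=0):
        self.samples = samples
        self.augment = augment
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        value = self.samples[index]
        if self.augment:  # only a dataset built with augment=True takes this branch
            value = value + self.rng.uniform(-0.1, 0.1)  # toy "augmentation": small jitter
        return value


def toy_create_dataloader(samples, batch_size, augment=False):
    """Hypothetical analogue of create_dataloader: build the dataset, then wrap it."""
    dataset = ToyDataset(samples, augment=augment)

    def loader():
        for start in range(0, len(dataset), batch_size):
            stop = min(start + batch_size, len(dataset))
            yield [dataset[i] for i in range(start, stop)]  # one __getitem__ call per sample

    return loader, dataset


train_loader, train_set = toy_create_dataloader([1.0, 2.0, 3.0, 4.0], batch_size=2, augment=True)
val_loader, val_set = toy_create_dataloader([1.0, 2.0, 3.0, 4.0], batch_size=2)
print(list(val_loader()))  # unaugmented batches
print(list(train_loader()))  # jittered batches
```

The validation loader omits the augment argument, so it falls back to the default of False, just as in the val_loader call in train.py.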
2.4 Conclusion
The augment=True parameter set in train.py is ultimately passed to the LoadImagesAndLabels class and triggers various data augmentation operations in that class’s __getitem__ method. This is a typical PyTorch data loading workflow: first define a dataset class (handling the loading and augmentation of individual samples), then wrap it with DataLoader (handling batching, parallel loading in worker processes, etc.).
YOLOv5 adopts this design to achieve:
- Clear code structure (separation of data loading and model training)
- Efficient data processing (parallel preloading in worker processes)
- Flexible augmentation operations (can be enabled or disabled as needed)
This is why the system can automatically apply complex data augmentation strategies after setting augment=True in train.py.
3. Detailed Explanation of YOLOv5’s Data Loading Mechanism
3.1 Data Loader Creation Process
In YOLOv5’s training process:
- The create_dataloader function creates a data loader:
  - This function first creates an instance of the LoadImagesAndLabels class as the dataset
  - Then wraps this dataset in PyTorch’s DataLoader or InfiniteDataLoader
  - Finally returns this data loader and the dataset
- The LoadImagesAndLabels class acts as the dataset:
  - This class inherits from PyTorch’s Dataset class
  - It is responsible for managing data loading, preprocessing, and augmentation
  - It defines the logic for obtaining individual data samples
3.2 Main Parameters of the LoadImagesAndLabels Class
This class contains many important parameters, with the following being the main ones:
3.2.1 Basic Path and Image Settings
- path: Dataset path (can be a directory or file list)
- img_size: Image size (default 640 pixels)
- batch_size: Batch size
3.2.2 Augmentation-Related Parameters
- augment: Whether to enable data augmentation
- hyp: Hyperparameter dictionary, containing probabilities and intensities for various augmentations
- mosaic: Whether to use Mosaic augmentation (an attribute derived during initialization, enabled when augment=True and rect=False)
- albumentations: The Albumentations wrapper used for extra augmentation (an attribute created only when augment=True)
3.2.3 Batch and Processing-Related Parameters
- rect: Whether to use rectangular training (using images with similar aspect ratios in a batch)
- stride: The model’s maximum downsampling rate, used to ensure image dimensions are multiples of the stride
- pad: Boundary padding size
3.2.4 Cache and Performance Optimization Parameters
- cache_images: Whether to cache images to speed up training (can be “ram” or “disk”)
- workers: Number of data loading worker processes (an argument of create_dataloader rather than of the dataset class)
3.2.5 Dataset Characteristic Parameters
- single_cls: Whether to treat all categories as one category
- image_weights: Whether to use image weights (based on class frequency)
3.3 Main Functions of the create_dataloader Function
This function completes several key tasks:
def create_dataloader(
    path,
    imgsz,
    batch_size,
    stride,
    single_cls=False,
    hyp=None,
    augment=False,
    cache=False,
    pad=0.0,
    rect=False,
    rank=-1,
    workers=8,
    image_weights=False,
    quad=False,
    prefix="",
    shuffle=False,
    seed=0,
):
    # Create dataset instance
    dataset = LoadImagesAndLabels(
        path,
        imgsz,
        batch_size,
        # Other parameters...
    )
    # Configure batch size and sampler
    batch_size = min(batch_size, len(dataset))
    sampler = None if rank == -1 else distributed.DistributedSampler(...)
    # Select data loader type (only the plain DataLoader allows attribute updates,
    # which image-weighted sampling relies on)
    loader = DataLoader if image_weights else InfiniteDataLoader
    # Create and return data loader
    return loader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle and sampler is None,
        # Other parameters...
    ), dataset
- Create dataset: Instantiate the LoadImagesAndLabels class
- Determine batch size: Ensure batch size does not exceed dataset size
- Set sampler: Choose appropriate sampler based on whether distributed training is being used
- Select data loader type: Use the plain DataLoader when image weights are enabled, otherwise InfiniteDataLoader
- Configure data loading parameters:
  - Batch size
  - Whether to shuffle
  - Number of worker processes
  - Sampler
  - Whether to drop the last incomplete batch
  - Memory pinning
  - Collate function (collate_fn)
  - Worker initialization function
  - Random number generator
3.4 Where Data Augmentation Actually Happens
Data augmentation primarily occurs in the __getitem__ method of the LoadImagesAndLabels class. Below is a simplified method flow:
def __getitem__(self, index):
    # 1. Get index
    index = self.indices[index]
    # 2. Decide whether to use Mosaic augmentation
    if self.mosaic and random.random() < self.hyp["mosaic"]:
        # Load Mosaic-augmented image
        img, labels = self.load_mosaic(index)
        # Potentially apply MixUp augmentation
        if random.random() < self.hyp["mixup"]:
            img, labels = mixup(...)
    else:
        # Regular image loading
        img, (h0, w0), (h, w) = self.load_image(index)
        # Apply Letterbox
        img, ratio, pad = letterbox(...)
        # Process labels
        labels = self.labels[index].copy()
    # 3. Apply more augmentation operations
    if self.augment:
        # Random perspective transformation
        img, labels = random_perspective(...)
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)
        # HSV color space augmentation
        augment_hsv(...)
        # Random flipping
        if random.random() < self.hyp["flipud"]:
            img = np.flipud(img)
        if random.random() < self.hyp["fliplr"]:
            img = np.fliplr(img)
    # 4. Final processing
    # Label format conversion
    labels_out = torch.zeros((len(labels), 6))
    # Image format conversion
    img = img.transpose((2, 0, 1))[::-1]
    return torch.from_numpy(img), labels_out, self.im_files[index], shapes
When the training loop requests a batch of data, this method is called once for each sample in the batch and will:
- Choose whether to apply Mosaic augmentation (based on probability)
- Apply random perspective transformation
- Apply Albumentations library augmentation
- Apply HSV color space augmentation
- Apply random flipping (vertical and horizontal)
- Skip Cutout augmentation, which is commented out in the source analyzed here
3.5 Summary
The entire YOLOv5 data loading process is:
- Call the create_dataloader function in train.py, with the augment=True parameter
- The create_dataloader function creates an instance of the LoadImagesAndLabels class and passes augment=True to it
- create_dataloader wraps the dataset in PyTorch’s DataLoader
- The training loop uses this DataLoader to get data batches
- Each time a batch is requested, the __getitem__ method of the LoadImagesAndLabels class is called once per sample, applying various data augmentations
This design allows YOLOv5 to flexibly handle various data formats and apply complex data augmentation strategies while maintaining code modularity and extensibility.
4. Complete Analysis of Data Augmentation Workflow in YOLOv5
4.1 Complete Call Flow
4.1.1 Batch Retrieval in Training Loop
In the training loop of train.py, we can see code like this:
pbar = enumerate(train_loader)
...
for i, (imgs, targets, paths, _) in pbar:  # batch
    callbacks.run("on_train_batch_start")
    ni = i + nb * epoch  # number integrated batches (since train start)
    imgs = imgs.to(device, non_blocking=True).float() / 255  # uint8 to float32, 0-255 to 0.0-1.0
    ...
Iterating through train_loader drives PyTorch’s data loading process. Here’s the complete call chain:
- The line for i, (imgs, targets, paths, _) in pbar triggers PyTorch’s data loading process
- PyTorch’s DataLoader starts worker processes to get samples from the dataset
- DataLoader calls LoadImagesAndLabels.__getitem__(index) to get individual samples
- DataLoader uses the collate_fn function to combine multiple samples into a batch
- Returns the combined batch data (imgs, targets, paths, _) to the training loop
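The collate step explains why each label row produced by __getitem__ has 6 columns instead of 5: the first column is a placeholder that is filled with the sample’s position in the batch, so that all labels can be stored in one flat array. The sketch below illustrates this idea with plain lists. The function toy_collate is hypothetical and simplified; the real collate_fn works on tensors.

```python
def toy_collate(samples):
    """Combine per-sample (image, labels, path) tuples into one batch.

    Each label row is [image_index, class, x, y, w, h]; column 0 is a
    placeholder (0.0) until the batch is assembled.
    """
    images, all_labels, paths = [], [], []
    for batch_index, (image, labels, path) in enumerate(samples):
        images.append(image)
        paths.append(path)
        for row in labels:
            # write the sample's position in the batch into column 0
            all_labels.append([float(batch_index)] + list(row[1:]))
    return images, all_labels, paths


samples = [
    ("img_a", [[0.0, 2.0, 0.5, 0.5, 0.2, 0.2]], "a.jpg"),
    ("img_b", [[0.0, 1.0, 0.3, 0.3, 0.1, 0.1], [0.0, 0.0, 0.7, 0.7, 0.4, 0.4]], "b.jpg"),
]
images, targets, paths = toy_collate(samples)
for row in targets:
    print(row)
```

Images with different numbers of objects can then share one target list, and the loss code can recover which image each box belongs to from column 0.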
4.1.2 Timing of __getitem__ Method Calls
When the training process needs to load a batch of data:
- If it’s the first iteration, DataLoader creates an iterator
- The iterator determines which sample indices to load based on batch size and sampler
- For each index, DataLoader calls dataset[index], which is LoadImagesAndLabels.__getitem__(index)
- This method returns a processed single sample (image and labels)
- DataLoader combines multiple samples into a batch and returns it to the training loop
4.1.3 Data Augmentation Implementation in __getitem__
Steps of data augmentation in the __getitem__ method:
def __getitem__(self, index):
    """Get a sample from the dataset, considering linear, random, or weighted sampling."""
    index = self.indices[index]  # linear, random, or weighted
    hyp = self.hyp
    if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
        # Load Mosaic augmentation
        img, labels = self.load_mosaic(index)
        shapes = None
        # MixUp augmentation
        if random.random() < hyp["mixup"]:
            img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
    else:
        # Regular image loading
        img, (h0, w0), (h, w) = self.load_image(index)
        # Letterbox
        shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size
        img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
        shapes = (h0, w0), ((h / h0, w / w0), pad)
        # Process labels
        labels = self.labels[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1])
        # Random perspective transformation
        if self.augment:
            img, labels = random_perspective(
                img,
                labels,
                degrees=hyp["degrees"],
                translate=hyp["translate"],
                scale=hyp["scale"],
                shear=hyp["shear"],
                perspective=hyp["perspective"],
            )
    nl = len(labels)  # number of labels
    if nl:
        labels[:, 1:5] = xyxy2xywhn(labels[:, 1:5], w=img.shape[1], h=img.shape[0], clip=True, eps=1e-3)
    # More data augmentation operations
    if self.augment:
        # Albumentations library augmentation
        img, labels = self.albumentations(img, labels)
        nl = len(labels)  # update number of labels
        # HSV color space augmentation
        augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
        # Vertical flip
        if random.random() < hyp["flipud"]:
            img = np.flipud(img)
            if nl:
                labels[:, 2] = 1 - labels[:, 2]
        # Horizontal flip
        if random.random() < hyp["fliplr"]:
            img = np.fliplr(img)
            if nl:
                labels[:, 1] = 1 - labels[:, 1]
        # Cutout (commented out)
        # labels = cutout(img, labels, p=0.5)
    # Format conversion
    labels_out = torch.zeros((nl, 6))
    if nl:
        labels_out[:, 1:] = torch.from_numpy(labels)
    # Image format conversion: HWC to CHW, BGR to RGB
    img = img.transpose((2, 0, 1))[::-1]
    img = np.ascontiguousarray(img)
    return torch.from_numpy(img), labels_out, self.im_files[index], shapes
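The two label conversions in this method, xywhn2xyxy before the geometric augmentations and xyxy2xywhn afterwards, are easier to follow with concrete numbers. The sketch below re-implements the arithmetic for a single box in plain Python. The function names xywhn_to_xyxy and xyxy_to_xywhn are hypothetical single-box versions written for illustration; the YOLOv5 helpers operate on whole arrays and also support clipping.

```python
def xywhn_to_xyxy(box, w, h, padw=0.0, padh=0.0):
    """Normalized (x_center, y_center, width, height) -> pixel (x1, y1, x2, y2)."""
    xc, yc, bw, bh = box
    return (
        w * (xc - bw / 2) + padw,  # left edge
        h * (yc - bh / 2) + padh,  # top edge
        w * (xc + bw / 2) + padw,  # right edge
        h * (yc + bh / 2) + padh,  # bottom edge
    )


def xyxy_to_xywhn(box, w, h):
    """Pixel (x1, y1, x2, y2) -> normalized (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 / w, (y1 + y2) / 2 / h, (x2 - x1) / w, (y2 - y1) / h)


# A 640x480 image letterboxed into a 640x640 canvas: 80 px of padding above and below.
pixel_box = xywhn_to_xyxy((0.5, 0.5, 0.25, 0.5), w=640, h=480, padw=0, padh=80)
print(pixel_box)  # (240.0, 200.0, 400.0, 440.0)

# After augmentation, boxes are normalized again relative to the final 640x640 image.
print(xyxy_to_xywhn(pixel_box, w=640, h=640))  # (0.5, 0.5, 0.25, 0.375)
```

The box keeps its center but its normalized height shrinks from 0.5 to 0.375, because the padded canvas is taller than the original image.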
4.1.4 Detailed Explanation of Key Augmentation Operations
a. Mosaic Augmentation
- Randomly selects 4 images and combines them into one large image
- Randomly determines the mosaic center point position
- Adjusts the size and position of the four images
- Adjusts corresponding label coordinates
if mosaic := self.mosaic and random.random() < hyp["mosaic"]:
    img, labels = self.load_mosaic(index)
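The geometry behind these steps can be sketched without any image data. The code below is an illustration under stated assumptions, not the load_mosaic implementation: it assumes a canvas of twice the target size, a random center kept away from the border, and four tiles that each touch the center and are cropped at the canvas edge. The function name mosaic_regions is hypothetical.

```python
import random


def mosaic_regions(s, tile_sizes, rng):
    """Return the mosaic center and one canvas region (x1, y1, x2, y2) per tile.

    s is the target image size; the canvas is 2s x 2s. tile_sizes holds four
    (width, height) pairs in the order top-left, top-right, bottom-left, bottom-right.
    """
    canvas = 2 * s
    # random center, kept at least s/2 away from the canvas border
    xc = int(rng.uniform(s / 2, canvas - s / 2))
    yc = int(rng.uniform(s / 2, canvas - s / 2))
    regions = []
    for i, (w, h) in enumerate(tile_sizes):
        if i == 0:  # top-left tile ends at the center
            box = (max(xc - w, 0), max(yc - h, 0), xc, yc)
        elif i == 1:  # top-right tile starts at the center's x
            box = (xc, max(yc - h, 0), min(xc + w, canvas), yc)
        elif i == 2:  # bottom-left tile starts at the center's y
            box = (max(xc - w, 0), yc, xc, min(yc + h, canvas))
        else:  # bottom-right tile starts at the center
            box = (xc, yc, min(xc + w, canvas), min(yc + h, canvas))
        regions.append(box)
    return (xc, yc), regions


center, regions = mosaic_regions(640, [(640, 480)] * 4, random.Random(0))
print("center:", center)
for name, box in zip(("top-left", "top-right", "bottom-left", "bottom-right"), regions):
    print(name, box)
```

Each image’s labels then have to be shifted by the offset of its tile on the canvas, which is the "adjusts corresponding label coordinates" step listed above.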
b. MixUp Augmentation
- May be applied after Mosaic
- Mixes two Mosaic-augmented samples at a certain ratio
- Merges labels from both samples
if random.random() < hyp["mixup"]:
    img, labels = mixup(img, labels, *self.load_mosaic(random.choice(self.indices)))
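The blending itself is a weighted average of pixel values plus a concatenation of the two label lists. The sketch below shows this on flat lists of pixel values. It is a simplified illustration: toy_mixup is a hypothetical name, and drawing the ratio from a Beta(32, 32) distribution is an assumption made here so that the ratio stays close to 0.5.

```python
import random


def toy_mixup(pixels1, labels1, pixels2, labels2, rng):
    """Blend two equally sized images (flat pixel lists) and merge their labels."""
    r = rng.betavariate(32.0, 32.0)  # mixing ratio, concentrated around 0.5 (assumption)
    mixed = [int(p1 * r + p2 * (1 - r)) for p1, p2 in zip(pixels1, pixels2)]
    return mixed, labels1 + labels2  # every box from both images is kept


rng = random.Random(0)
dark, bright = [0, 0, 0, 0], [255, 255, 255, 255]
mixed, labels = toy_mixup(dark, [["cat"]], bright, [["dog"], ["car"]], rng)
print(mixed)  # mid-gray values
print(labels)  # labels of both images
```

Note that the labels are not weighted by the ratio: both sets of boxes are kept in full, so the model has to detect objects from either image in the blend.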
c. Random Perspective Transformation
- Applies rotation, translation, scaling, shearing, and other geometric transformations
- Simultaneously adjusts label coordinates to match the transformed image
if self.augment:
    img, labels = random_perspective(
        img,
        labels,
        degrees=hyp["degrees"],
        translate=hyp["translate"],
        scale=hyp["scale"],
        shear=hyp["shear"],
        perspective=hyp["perspective"],
    )
d. Albumentations Library Augmentation
- Uses additional augmentations provided by the third-party Albumentations library
- This is a conditional operation, only applied if an albumentations object was created during initialization
if self.augment:
    img, labels = self.albumentations(img, labels)
e. HSV Color Space Augmentation
- Adjusts hue, saturation, and brightness in HSV color space
- Random variation intensity controlled by hyperparameters
if self.augment:
    augment_hsv(img, hgain=hyp["hsv_h"], sgain=hyp["hsv_s"], vgain=hyp["hsv_v"])
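As a rough illustration of what the three gains do, the sketch below jitters a single RGB pixel with Python’s standard colorsys module. It is an assumption-laden simplification: toy_augment_hsv is a hypothetical function, it draws one random multiplier per channel in the range 1 ± gain, and it returns a new pixel, whereas the call above receives the whole image and its return value is not used.

```python
import colorsys
import random


def toy_augment_hsv(rgb, hgain, sgain, vgain, rng):
    """Jitter one RGB pixel (floats in [0, 1]) in HSV space."""
    # one random multiplier per channel, each in [1 - gain, 1 + gain]
    rh, rs, rv = (1 + rng.uniform(-1, 1) * g for g in (hgain, sgain, vgain))
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h * rh) % 1.0  # hue is circular, so it wraps around
    s = min(max(s * rs, 0.0), 1.0)  # saturation is clipped to the valid range
    v = min(max(v * rv, 0.0), 1.0)  # so is brightness
    return colorsys.hsv_to_rgb(h, s, v)


rng = random.Random(0)
pixel = (0.2, 0.4, 0.6)
print(toy_augment_hsv(pixel, hgain=0.015, sgain=0.7, vgain=0.4, rng=rng))
print(toy_augment_hsv(pixel, hgain=0.0, sgain=0.0, vgain=0.0, rng=rng))  # gains of 0 leave the pixel unchanged
```

A small hue gain combined with larger saturation and brightness gains changes lighting conditions noticeably while keeping object colors recognizable.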
f. Random Flipping
- Performs random vertical and horizontal flipping
- Adjusts label coordinates accordingly
if self.augment:
    if random.random() < hyp["flipud"]:
        img = np.flipud(img)
        if nl:
            labels[:, 2] = 1 - labels[:, 2]
    if random.random() < hyp["fliplr"]:
        img = np.fliplr(img)
        if nl:
            labels[:, 1] = 1 - labels[:, 1]
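Because the labels are in normalized center format at this point, a flip only needs to mirror one center coordinate; width and height are unchanged. A minimal sketch of that label update, using a hypothetical helper flip_labels on plain lists:

```python
def flip_labels(labels, horizontal=False, vertical=False):
    """labels: rows of [class, x_center, y_center, width, height], normalized to [0, 1]."""
    flipped = []
    for cls, x, y, w, h in labels:
        if horizontal:
            x = 1 - x  # mirror across the vertical axis
        if vertical:
            y = 1 - y  # mirror across the horizontal axis
        flipped.append([cls, x, y, w, h])
    return flipped


labels = [[0, 0.25, 0.75, 0.1, 0.2]]
print(flip_labels(labels, horizontal=True))  # [[0, 0.75, 0.75, 0.1, 0.2]]
print(flip_labels(labels, vertical=True))  # [[0, 0.25, 0.25, 0.1, 0.2]]
```

This corresponds to labels[:, 1] = 1 - labels[:, 1] for horizontal flips and labels[:, 2] = 1 - labels[:, 2] for vertical flips in the code above.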
4.1.5 Augmentation Probability Control
The application probability of each augmentation operation is controlled through the hyperparameters in hyp:
- hyp["mosaic"]: Mosaic augmentation probability
- hyp["mixup"]: MixUp augmentation probability
- hyp["flipud"]: Vertical flip probability
- hyp["fliplr"]: Horizontal flip probability
Other augmentation intensities are also controlled by hyperparameters:
- hyp["degrees"]: Rotation angle range
- hyp["translate"]: Translation range
- hyp["scale"]: Scaling range
- hyp["shear"]: Shearing range
- hyp["perspective"]: Perspective transformation intensity
- hyp["hsv_h"], hyp["hsv_s"], hyp["hsv_v"]: HSV color space adjustment intensity
4.2 Summary: Complete Data Augmentation Flow
- Trigger timing: When the training loop iterates through DataLoader via for i, (imgs, targets, paths, _) in pbar
- Call process: PyTorch DataLoader → Worker processes → LoadImagesAndLabels.__getitem__(index) → collate_fn → Batch data
- Preprocessing: Load image, apply letterbox resizing, adjust label coordinates
- Main augmentations:
  - Mosaic augmentation (combining 4 images)
  - MixUp augmentation (mixing 2 samples)
  - Random perspective transformation (rotation, translation, scaling, shearing)
  - Albumentations library augmentation
  - HSV color space augmentation
  - Random flipping (vertical and horizontal)
- Format conversion: Convert image format, prepare tensor format required by PyTorch
- Return results: Processed image, labels, file path, and shape information
sequenceDiagram
participant Train as Training Loop
participant Loader as DataLoader
participant Dataset as LoadImagesAndLabels
participant Aug as Data Augmentation
Train->>Loader: Iterate request batch data
Loader->>Dataset: __getitem__(index)
Dataset->>Dataset: Load image
alt Mosaic augmentation
Dataset->>Aug: load_mosaic()
opt MixUp augmentation
Dataset->>Aug: mixup()
end
else Regular loading
Dataset->>Dataset: Load image + letterbox
opt Random perspective transformation
Dataset->>Aug: random_perspective()
end
end
opt Other augmentations
Dataset->>Aug: albumentations augmentation
Dataset->>Aug: HSV color space augmentation
Dataset->>Aug: Random flipping
end
Dataset->>Loader: Return processed sample
Loader->>Train: Return batch data