Posts on Jam Sylph's little universe https://jamsylph.top/posts/ Recent content in Posts on Jam Sylph's little universe Hugo -- gohugo.io zh-cn Sun, 16 Oct 2022 00:00:00 +0000 YOLOv5 Object Detection Code: A Close Reading https://jamsylph.top/posts/yolov5%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ Sun, 16 Oct 2022 00:00:00 +0000 <h1 id="yolov5目标检测代码精读">YOLOv5 Object Detection Code: A Close Reading</h1> <blockquote> <p>This post takes a close look at YOLOv5's training pipeline and data augmentation mechanisms, as a way for me to organize and summarize how this object detection model is implemented internally.</p></blockquote> <hr> <h2 id="1-trainpy-文件解析">1. Walking Through train.py</h2> <h3 id="11-import-部分">1.1 Imports</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">argparse</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">math</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">os</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">random</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">subprocess</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">sys</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">time</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span> </span></span><span class="line"><span class="cl"><span
class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="k">try</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="kn">import</span> <span class="nn">comet_ml</span> <span class="c1"># must be imported before torch (if installed)</span> </span></span><span class="line"><span class="cl"><span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">comet_ml</span> <span class="o">=</span> <span class="kc">None</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">torch</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">torch.distributed</span> <span class="k">as</span> <span class="nn">dist</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="nn">nn</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">yaml</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">torch.optim</span> <span class="kn">import</span> <span class="n">lr_scheduler</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">FILE</span> <span 
class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span><span class="o">.</span><span class="n">resolve</span><span class="p">()</span> </span></span><span class="line"><span class="cl"><span class="n">ROOT</span> <span class="o">=</span> <span class="n">FILE</span><span class="o">.</span><span class="n">parents</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># YOLOv5 root directory</span> </span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="nb">str</span><span class="p">(</span><span class="n">ROOT</span><span class="p">)</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">ROOT</span><span class="p">))</span> <span class="c1"># add ROOT to PATH</span> </span></span><span class="line"><span class="cl"><span class="n">ROOT</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">relpath</span><span class="p">(</span><span class="n">ROOT</span><span class="p">,</span> <span class="n">Path</span><span class="o">.</span><span class="n">cwd</span><span class="p">()))</span> <span class="c1"># relative</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">val</span> <span class="k">as</span> <span class="nn">validate</span> <span class="c1"># for end-of-epoch mAP</span> </span></span><span 
class="line"><span class="cl"><span class="kn">from</span> <span class="nn">models.experimental</span> <span class="kn">import</span> <span class="n">attempt_load</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">models.yolo</span> <span class="kn">import</span> <span class="n">Model</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.autoanchor</span> <span class="kn">import</span> <span class="n">check_anchors</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.autobatch</span> <span class="kn">import</span> <span class="n">check_train_batch_size</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.callbacks</span> <span class="kn">import</span> <span class="n">Callbacks</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.dataloaders</span> <span class="kn">import</span> <span class="n">create_dataloader</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.downloads</span> <span class="kn">import</span> <span class="n">attempt_download</span><span class="p">,</span> <span class="n">is_url</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.general</span> <span class="kn">import</span> <span class="p">(</span> </span></span><span class="line"><span class="cl"> <span class="n">LOGGER</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">TQDM_BAR_FORMAT</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_amp</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_dataset</span><span class="p">,</span> </span></span><span class="line"><span 
class="cl"> <span class="n">check_file</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_git_info</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_git_status</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_img_size</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_requirements</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_suffix</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_yaml</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">colorstr</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">get_latest_run</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">increment_path</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">init_seeds</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">intersect_dicts</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">labels_to_class_weights</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">labels_to_image_weights</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">methods</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">one_cycle</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">print_args</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">print_mutation</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"> <span class="n">strip_optimizer</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">yaml_save</span><span class="p">,</span> </span></span><span class="line"><span class="cl"><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.loggers</span> <span class="kn">import</span> <span class="n">LOGGERS</span><span class="p">,</span> <span class="n">Loggers</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.loggers.comet.comet_utils</span> <span class="kn">import</span> <span class="n">check_comet_resume</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.loss</span> <span class="kn">import</span> <span class="n">ComputeLoss</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.metrics</span> <span class="kn">import</span> <span class="n">fitness</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.plots</span> <span class="kn">import</span> <span class="n">plot_evolve</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.torch_utils</span> <span class="kn">import</span> <span class="p">(</span> </span></span><span class="line"><span class="cl"> <span class="n">EarlyStopping</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">ModelEMA</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">de_parallel</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">select_device</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">smart_DDP</span><span 
class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">smart_optimizer</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">smart_resume</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">torch_distributed_zero_first</span><span class="p">,</span> </span></span><span class="line"><span class="cl"><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">LOCAL_RANK</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;LOCAL_RANK&#34;</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">RANK</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;RANK&#34;</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">WORLD_SIZE</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;WORLD_SIZE&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">GIT_INFO</span> <span class="o">=</span> <span class="n">check_git_info</span><span class="p">()</span> </span></span></code></pre></div><h3 id="12-train-函数详解">1.2 The <code>train()</code> Function in Detail</h3> <p><code>train()</code> is the core function of YOLOv5 training; it manages the entire training process:</p> A History of CNNs
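https://jamsylph.top/posts/cnn%E5%8E%86%E7%A8%8B/ Sun, 01 Mar 2020 00:00:00 +0000 <blockquote> <h2 id="-更新记录">📝 Changelog</h2> <ul> <li>2024-05-26: Added the latest 2023-2024 progress in vision models and a new analysis of stage-six architectures</li> <li>2023-11-15: Added a summary of the patterns observed across existing CNNs</li> <li>2023-09-15: Expanded the introduction to stage-five architectures (2020 to present), with a new analysis of ConvNeXt</li> <li>2020-03-01: First published</li> </ul></blockquote>
<h1 id="cnn历程">A History of CNNs</h1> <h2 id="第一阶段奠基时代-1998-2011">Stage 1: The Foundational Era (1998-2011)</h2> <h3 id="lenet-5-1998">LeNet-5 (1998)</h3> <p><strong>Creator</strong>: Yann LeCun</p> <p><strong>Architecture</strong>:</p> <ul> <li>7 layers: 3 convolutional layers, 2 pooling (subsampling) layers, and 2 fully connected layers</li> <li>5×5 convolution kernels</li> <li>sigmoid/tanh activation functions</li> </ul> <p><strong>Breakthroughs</strong>:</p> <ul> <li>One of the first CNNs successfully applied to a real-world problem (handwritten digit recognition)</li> <li>Established the basic &quot;convolution, pooling, fully connected&quot; paradigm</li> <li>Used weight sharing to reduce the number of parameters</li> </ul> <p><strong>Limitations</strong>:</p> <ul> <li>Limited compute kept the network shallow</li> <li>It predates modern training techniques such as batch normalization and ReLU activations</li> </ul> <p>LeNet-5 marked the formal birth of CNNs. Over the following decade, however, CNNs progressed slowly: computing power was limited and other traditional machine learning methods performed well. The turning point came only with more powerful GPUs and the arrival of large-scale training data.</p>
<h2 id="第二阶段深度学习爆发期-2012-2014">Stage 2: The Deep Learning Explosion (2012-2014)</h2> <h3 id="alexnet-2012---深度学习革命的火种">AlexNet (2012) - The Spark of the Deep Learning Revolution</h3> <p><strong>Creators</strong>: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton</p> <p><strong>Architecture</strong>:</p> <ul> <li>8 layers: 5 convolutional layers and 3 fully connected layers</li> <li>Popularized ReLU activations by using them throughout the network</li> <li><strong>Overlapping max pooling</strong>: the pooling window is larger than the stride (3×3 windows with stride 2), so adjacent output units have overlapping receptive fields; this slightly reduces overfitting, smooths transitions between features, and enlarges the receptive field</li> <li>Dropout to prevent overfitting</li> </ul> <p><strong>Breakthroughs</strong>:</p> <ul> <li>Won the 2012 ImageNet challenge with a top-5 error rate of 15.3%, versus 26.2% for the runner-up</li> <li>A <strong>landmark event</strong> of the deep learning revolution</li> <li>Demonstrated how important GPUs are for training deep networks</li> <li>Extensive data augmentation: random 224×224 crops taken from 256×256 images, horizontal flipping, and PCA-based color perturbation</li> </ul>
<h3 id="zfnet-2013---打开cnn黑盒">ZFNet (2013) - Opening the CNN Black Box</h3> <p><strong>Creators</strong>: Matthew Zeiler and Rob Fergus</p> <p><strong>Architecture</strong>:</p> <ul> <li>An improved version of AlexNet</li> <li>A smaller first-layer kernel (7×7 instead of 11×11)</li> <li>A smaller first-layer stride (2 instead of 4)</li> </ul> <p><strong>Breakthroughs</strong>:</p> <ul> <li>Won the 2013 ImageNet challenge</li> <li>Among the first to explain the inner workings of a CNN through visualization techniques</li> <li>Introduced the &quot;deconvolution&quot; (deconvnet) operation for visualization; despite the name, it is essentially a transposed convolution, not a true deconvolution (the mathematical inverse of convolution)</li> </ul> <p><strong>Contributions</strong>:</p>
Exploration of the Inception Architecture https://jamsylph.top/posts/exploration-of-the-inception-architecture/ Mon, 01 Jan 0001 00:00:00 +0000 <blockquote> <h2 id="-更新记录">📝 Changelog</h2> <ul> <li> <p>2024-05-18:</p> <ul> <li>Expanded the &quot;adaptive feature processing&quot; section, detailing three levels of adaptive mechanisms</li> <li>Added a new &quot;bottleneck layer design pattern&quot; topic, analyzing the reduce-process-expand design philosophy in depth</li> </ul> </li> <li> <p>2023-10-20:</p> <ul> <li>Fully revised the article structure and strengthened the discussion of how broadly the Inception idea applies</li> <li>Added the long-term influence of Inception on modern network design</li> </ul> </li> <li> <p>2023-05-12:</p> <ul> <li>Expanded coverage of Inception's influence on other networks</li> <li>Added applications of the Inception idea in segmentation networks</li> </ul> </li> <li> <p>2022-08-15:</p> <ul> <li>Updated the YOLOv7 section</li> <li>Added how the C2f and SPPF modules apply the Inception idea</li> </ul> </li> <li> <p>2021-09-03:</p> <ul> <li>Added YOLOv5-related content</li> <li>Refined the analysis of how the Inception idea evolved across the YOLO series</li> </ul> </li> </ul></blockquote>
<h1 id="exploration-of-the-inception-architecture">Exploration of the Inception Architecture</h1> <h2 id="引言从模块创新到设计范式">Introduction: From a Module Innovation to a Design Paradigm</h2> <p>The Inception module was one of the key innovations of GoogLeNet (2014). At its core it runs several branches in parallel to extract diverse features at different scales, an idea that has influenced many later network designs.</p> <h2 id="inception设计原则的精髓">The Essence of Inception's Design Principles</h2> <p>The Inception module broke with the linear stacking paradigm of conventional CNNs and introduced four key design principles:</p> <ol> <li><strong>Multi-scale parallel feature extraction</strong>: convolution kernels with different receptive fields run side by side to capture image features at different scales</li> <li><strong>Computational efficiency</strong>: 1×1 convolutions reduce dimensionality, forming a &quot;bottleneck layer&quot; that cuts computation</li> <li><strong>Balancing width and depth</strong>: the network's representational capacity grows without an explosion in the number of parameters</li> <li><strong>Feature fusion</strong>: channel concatenation combines the complementary features extracted by the parallel paths</li> </ol>
<h2 id="yolo系列中的inception思想">The Inception Idea in the YOLO Series</h2> <h3 id="yolov3初步融合多尺度思想">YOLOv3: First Steps Toward Multi-Scale Fusion</h3> <ul> <li><strong>Feature pyramid structure</strong>: fuses features of different scales through <em>upsampling and skip connections</em>, yielding a multi-scale feature representation</li> <li><strong>SPP (Spatial Pyramid Pooling) module</strong> (in the YOLOv3-SPP variant): uses <em>parallel pooling</em> to aggregate feature information from different receptive fields</li> </ul> <h3 id="yolov4inception思想的系统性应用">YOLOv4: Systematic Application of the Inception Idea</h3> <p>Alexey Bochkovskiy et al.</p> <ul> <li><strong>CSPDarknet53 backbone</strong>: CSP (Cross Stage Partial) connections create multiple parallel information paths and strengthen feature reuse</li> <li><strong>PANet (Path Aggregation Network) Neck</strong>: a bidirectional (top-down and bottom-up) feature propagation mechanism that effectively fuses features from different levels</li> <li><strong>SPPCSP module</strong>: keeps the multi-scale processing of <em>Spatial Pyramid Pooling (SPP)</em> while CSP connections further improve computational efficiency</li> </ul>
<h3 id="yolov5inception思想的精细化实现">YOLOv5: A Refined Implementation of the Inception Idea</h3> <p>Ultralytics</p> <ol> <li><strong>Focus module</strong>: <ul> <li>Rearranges pixels before the first convolution instead of downsampling with an ordinary strided convolution</li> <li>Distributes the pixels of each 2×2 region across 4 channel groups (C channels become 4C), a kind of &quot;parallel sampling&quot;</li> <li>This spatial-to-channel reorganization increases information density and reduces computation</li> <li>Reflects Inception's idea of &quot;processing the input in different ways in parallel&quot;</li> </ul> </li> <li><strong>C3 module</strong>: <ul> <li>An improved CSP structure that splits the input into two paths</li> <li>One path passes through just a 1×1 convolution, while the other passes through a stack of residual bottleneck blocks; the two are concatenated and fused by another 1×1 convolution</li> <li>This &quot;dual-path&quot; design resembles Inception's parallel branches</li> <li>It also improves both feature extraction and computational efficiency</li> </ul> </li> <li><strong>SPPF module</strong>: <ul> <li>Reorganizes the SPP module into a <em>sequential form</em> to reduce computational overhead</li> <li>Chains three 5×5 max-pooling layers and concatenates the input with all three pooled outputs; the chained pools give the same results as SPP's parallel 5×5, 9×9, and 13×13 pools</li> <li>Lighter and more efficient than YOLOv4's SPPCSP</li> </ul> </li> </ol> <h3 id="yolov7">YOLOv7</h3> <p>WongKinYiu's team</p>
introduction-to-yolo https://jamsylph.top/posts/introduction-to-yolo/ Mon, 01 Jan 0001 00:00:00 +0000 <h1 id="yolo的历程">The Evolution of YOLO</h1> <p>YOLO (You Only Look Once) is a popular real-time object detection algorithm, known for its efficiency and good accuracy. Unlike traditional detection methods, YOLO treats object detection as a regression problem, predicting bounding boxes and class probabilities directly from the full image.</p> <h2 id="yolo的基本原理">How YOLO Works</h2> <p>The core idea of YOLO is to divide the image into an S×S grid, where each grid cell is responsible for detecting the objects whose center falls inside it. Specifically, each cell predicts:</p> <ol> <li>B bounding boxes and their confidence scores</li> <li>C conditional class probabilities</li> </ol> <p>In YOLOv1 this gives an S×S×(B·5+C) output tensor; with S=7, B=2, and C=20 on PASCAL VOC, that is 7×7×30. Because everything comes out of a single forward pass, detection is very fast.</p>
<h2 id="yolo的发展历程">How YOLO Developed</h2> <h3 id="yolov1">YOLOv1</h3> <p>In 2016, Joseph Redmon et al. proposed the first version of YOLO. YOLOv1 was fast but less accurate, and it performed especially poorly on small objects.</p> <h3 id="yolov2yolo9000">YOLOv2/YOLO9000</h3> <p>YOLOv2 introduced improvements such as batch normalization and anchor boxes, and the accompanying YOLO9000 can detect more than 9000 object categories.</p> <h3 id="yolov3">YOLOv3</h3> <p>YOLOv3 used a stronger backbone, Darknet-53, and adopted multi-scale prediction, significantly improving small-object detection.</p> <h3 id="yolov4">YOLOv4</h3> <p>YOLOv4 brought in a range of advanced techniques, such as the CSPDarknet53 backbone and the PANet path aggregation network, further improving performance.</p> <h3 id="yolov5">YOLOv5</h3> <p>YOLOv5, developed by Ultralytics, offers models in several sizes (S, M, L, X), so you can pick the trade-off between speed and accuracy that fits your needs.</p> <h3 id="yolov6yolov7及更新版本">YOLOv6, YOLOv7, and Later Versions</h3> <p>As research continues, YOLO keeps evolving, with newer versions aiming to be more efficient and more accurate.</p> <h2 id="yolo的应用场景">Applications of YOLO</h2> <p>Thanks to its real-time speed and good accuracy, YOLO is widely used in many fields:</p>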
<ul> <li>自动驾驶:检测道路上的车辆、行人和交通标志</li> <li>安防监控:识别异常行为和可疑物体</li> <li>工业检测:检测产品缺陷</li> <li>医学影像:辅助医生诊断疾病</li> <li>零售分析:跟踪商店中的客户行为</li> </ul> <h2 id="实现yolo的工具和框架">实现YOLO的工具和框架</h2> <p>目前有多种工具和框架可以帮助开发者实现YOLO算法:</p> <ul> <li>Darknet:YOLO的原始实现</li> <li>PyTorch:提供了多种YOLO的实现版本</li> <li>TensorFlow:也有YOLO的移植版本</li> <li>ONNX:支持将YOLO模型转换为通用格式</li> <li>OpenCV:提供了使用预训练YOLO模型的接口</li> </ul> <h2 id="结论">结论</h2> <p>YOLO算法凭借其出色的速度和准确率平衡,已成为计算机视觉领域最受欢迎的目标检测算法之一。随着算法的不断改进和硬件的发展,YOLO的应用前景将更加广阔。</p> <p>在未来的文章中,我将深入探讨YOLO的具体实现、训练技巧以及如何针对特定应用进行优化。敬请期待!</p> YOLOv5的dataloaders.py代码精读 https://jamsylph.top/posts/yolov5%E7%9A%84dataloaders.py%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/yolov5%E7%9A%84dataloaders.py%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ <blockquote> <p>在 yolov 5 目标检测任务中,我跑 <a href="yolo5%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB.md">train. Py 代码</a> ,在train_loader 中,那我就调用了create_dataloader 的函数,在该函数内部创建了LoadImagesAndLabels 类的实例作为 dataset,在这个 create_dataloader 函数最后返回一个  DataLoader和数据集:</p></blockquote> <h1 id="-dataloaders-py">-dataloaders. 
Py</h1> <h1 id="pytorch-数据集中的-__getitem__-方法工作原理">PyTorch 数据集中的 <code>__getitem__</code> 方法工作原理</h1> <p><code>__getitem__</code> 是 Python 中的一个特殊方法(魔术方法),在 YOLOv 5 的 <code>LoadImagesAndLabels</code> 类中用于访问数据集中的单个样本。当您使用数据加载器或直接通过索引访问数据集时,这个方法会被调用。</p> <h2 id="访问流程">访问流程</h2> <p>当执行以下操作时,<code>__getitem__</code> 方法被调用:</p> <ol> <li>直接从数据集访问:<code>image, label = dataset[5]</code></li> <li>通过 DataLoader 迭代:<code>for images, labels in dataloader: ...</code></li> </ol> <p>在 DataLoader 中,<code>__getitem__</code> 会被多次并行调用(由 <code>num_workers</code> 参数决定),然后结果通过 <code>collate_fn</code> 方法合并为批次。</p> <h2 id="yolov-5-中-__getitem__-的工作流程">YOLOv 5 中 <code>__getitem__</code> 的工作流程</h2> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span> </span></span><span class="line"><span class="cl"> <span class="c1"># 1. 将传入的索引转换为实际使用的索引(处理线性、打乱或加权采样)</span> </span></span><span class="line"><span class="cl"> <span class="n">index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">indices</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 2. 
检查是否应用mosaic增强(基于配置和随机概率)</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="n">mosaic</span> <span class="o">:=</span> <span class="bp">self</span><span class="o">.</span><span class="n">mosaic</span> <span class="ow">and</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;mosaic&#34;</span><span class="p">]:</span> </span></span><span class="line"><span class="cl"> <span class="c1"># 加载mosaic增强的图像和标签</span> </span></span><span class="line"><span class="cl"> <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">load_mosaic</span><span class="p">(</span><span class="n">index</span><span class="p">)</span> <span class="c1"># 这里是您想修改为load_mosaic9的地方</span> </span></span><span class="line"><span class="cl"> <span class="n">shapes</span> <span class="o">=</span> <span class="kc">None</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 检查是否进一步应用mixup增强</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;mixup&#34;</span><span class="p">]:</span> </span></span><span class="line"><span class="cl"> <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">mixup</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span 
class="n">load_mosaic</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">indices</span><span class="p">)))</span> </span></span><span class="line"><span class="cl"> <span class="k">else</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="c1"># 3. 不使用mosaic时的常规图像加载和处理流程</span> </span></span><span class="line"><span class="cl"> <span class="n">img</span><span class="p">,</span> <span class="p">(</span><span class="n">h0</span><span class="p">,</span> <span class="n">w0</span><span class="p">),</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">load_image</span><span class="p">(</span><span class="n">index</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># Letterbox处理</span> </span></span><span class="line"><span class="cl"> <span class="n">shape</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">batch_shapes</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">batch</span><span class="p">[</span><span class="n">index</span><span class="p">]]</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">rect</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">img_size</span> </span></span><span class="line"><span class="cl"> <span class="n">img</span><span class="p">,</span> <span class="n">ratio</span><span class="p">,</span> <span class="n">pad</span> <span class="o">=</span> <span class="n">letterbox</span><span 
class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">shape</span><span class="p">,</span> <span class="n">auto</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">scaleup</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">augment</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">shapes</span> <span class="o">=</span> <span class="p">(</span><span class="n">h0</span><span class="p">,</span> <span class="n">w0</span><span class="p">),</span> <span class="p">((</span><span class="n">h</span> <span class="o">/</span> <span class="n">h0</span><span class="p">,</span> <span class="n">w</span> <span class="o">/</span> <span class="n">w0</span><span class="p">),</span> <span class="n">pad</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 处理标签</span> </span></span><span class="line"><span class="cl"> <span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="n">labels</span><span class="o">.</span><span class="n">size</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:]</span> <span class="o">=</span> <span class="n">xywhn2xyxy</span><span class="p">(</span><span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:],</span> <span class="n">ratio</span><span class="p">[</span><span class="mi">0</span><span 
class="p">]</span> <span class="o">*</span> <span class="n">w</span><span class="p">,</span> <span class="n">ratio</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">h</span><span class="p">,</span> <span class="n">padw</span><span class="o">=</span><span class="n">pad</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">padh</span><span class="o">=</span><span class="n">pad</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 应用随机透视变换等增强</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">augment</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">random_perspective</span><span class="p">(</span><span class="o">...</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 4. 
标签格式转换</span> </span></span><span class="line"><span class="cl"> <span class="n">nl</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="n">nl</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="n">xyxy2xywhn</span><span class="p">(</span><span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">5</span><span class="p">],</span> <span class="n">w</span><span class="o">=</span><span class="n">img</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">h</span><span class="o">=</span><span class="n">img</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">clip</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 5. 
应用更多的增强技术(如果启用)</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">augment</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="c1"># Albumentations增强</span> </span></span><span class="line"><span class="cl"> <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">albumentations</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># HSV颜色空间增强</span> </span></span><span class="line"><span class="cl"> <span class="n">augment_hsv</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">hgain</span><span class="o">=</span><span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;hsv_h&#34;</span><span class="p">],</span> <span class="n">sgain</span><span class="o">=</span><span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;hsv_s&#34;</span><span class="p">],</span> <span class="n">vgain</span><span class="o">=</span><span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;hsv_v&#34;</span><span class="p">])</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"> <span class="c1"># 上下翻转</span> </span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;flipud&#34;</span><span class="p">]:</span> </span></span><span class="line"><span class="cl"> <span 
class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">flipud</span><span class="p">(</span><span class="n">img</span><span class="p">)</span> </span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">nl</span><span class="p">:</span> </span></span><span class="line"><span class="cl">                    <span class="n">labels</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">labels</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">]</span>  <span class="c1"># mirror the normalized y center</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">            <span class="c1"># Horizontal (left-right) flip</span> </span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;fliplr&#34;</span><span class="p">]:</span> </span></span><span class="line"><span class="cl">                <span class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">fliplr</span><span class="p">(</span><span class="n">img</span><span class="p">)</span> </span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">nl</span><span class="p">:</span> </span></span><span class="line"><span class="cl">                    <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>  <span class="c1"># mirror the normalized x center</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">        <span class="c1"># 6. 
Prepare the output labels, shape (nl, 6): [image index in batch, class, x, y, w, h]; column 0 is filled in later by collate_fn</span> </span></span><span class="line"><span class="cl">        <span class="n">labels_out</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">nl</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span> </span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">nl</span><span class="p">:</span> </span></span><span class="line"><span class="cl">            <span class="n">labels_out</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">        <span class="c1"># 7. Image format conversion: HWC-&gt;CHW, BGR-&gt;RGB</span> </span></span><span class="line"><span class="cl">        <span class="n">img</span> <span class="o">=</span> <span class="n">img</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> </span></span><span class="line"><span class="cl">        <span class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ascontiguousarray</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>  <span class="c1"># [::-1] creates negative strides, which torch.from_numpy rejects</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">        <span class="c1"># 8. 
Return the fully processed sample</span> </span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">img</span><span class="p">),</span> <span class="n">labels_out</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">im_files</span><span class="p">[</span><span class="n">index</span><span class="p">],</span> <span class="n">shapes</span> </span></span></code></pre></div><h2 id="dataloader-如何使用-__getitem__">How DataLoader uses <code>__getitem__</code></h2> <ol> <li>When <code>num_workers &gt; 0</code>, PyTorch's DataLoader starts several worker processes.</li> <li>The sampled indices are grouped into batches, and each worker is assigned whole batches: for every index in its batch it calls the dataset's <code>__getitem__</code> method.</li> <li>Once all samples of a batch are loaded, the worker merges them into one batch with <code>collate_fn</code>. YOLOv5's <code>collate_fn</code> also writes each image's position in the batch into column 0 of its <code>labels_out</code>.</li> <li>The finished batch is sent back to the main process and passed to the model for training.</li> </ol> <h2 id="修改-mosaic-增强">Modifying the mosaic augmentation</h2> <p>To switch from a 4-image mosaic to a 9-image mosaic, change <code>self.load_mosaic(index)</code> on line 478 (in the code version analyzed here) to <code>self.load_mosaic9(index)</code>. With mosaic augmentation enabled, each mosaic is then assembled from 9 images instead of 4.</p> YOLOv8 Decoding Pipeline Fully Explained https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/ <h1 id="yolov8解码流程完全解析">YOLOv8 Decoding Pipeline Fully Explained</h1> <blockquote> <p>This post analyzes in detail how YOLOv8 decodes and post-processes its predictions, covering key steps such as DFL (Distribution Focal Loss) decoding and non-maximum suppression (NMS).</p></blockquote> <h2 id="目录">Contents</h2> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#1-%e9%a2%84%e6%b5%8b%e8%a7%a3%e7%a0%81%e6%b5%81%e7%a8%8b-decode_predictions">1. 
Prediction decoding (decode_predictions)</a> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#11-%e7%bd%91%e6%a0%bc%e7%82%b9%e7%94%9f%e6%88%90">1.1 Grid point generation</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#12-%e7%89%b9%e5%be%81%e5%9b%be%e5%a4%84%e7%90%86">1.2 Feature map processing</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#13-dfl%e8%a7%a3%e7%a0%81%e5%ae%9e%e7%8e%b0">1.3 DFL decoding implementation</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#14-%e5%9d%90%e6%a0%87%e8%bd%ac%e6%8d%a2">1.4 Coordinate conversion</a></li> </ul> </li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#2-%e5%90%8e%e5%a4%84%e7%90%86%e6%b5%81%e7%a8%8b-post_process">2. Post-processing (post_process)</a> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#21-%e7%bd%ae%e4%bf%a1%e5%ba%a6%e8%bf%87%e6%bb%a4">2.1 Confidence filtering</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#22-%e5%9d%90%e6%a0%87%e6%a0%bc%e5%bc%8f%e8%bd%ac%e6%8d%a2">2.2 Coordinate format conversion</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#23-%e9%9d%9e%e6%9e%81%e5%a4%a7%e5%80%bc%e6%8a%91%e5%88%b6">2.3 Non-maximum suppression</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#24-%e5%9d%90%e6%a0%87%e7%bc%a9%e6%94%be%e8%bf%98%e5%8e%9f">2.4 Rescaling coordinates to the original image</a></li> </ul> </li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#3-%e6%a0%b8%e5%bf%83%e6%96%b9%e6%b3%95%e8%a7%a3%e6%9e%90">3. 
Core methods</a> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#31-dist2bbox%e6%96%b9%e6%b3%95">3.1 The dist2bbox method</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#32-scale_boxes%e6%96%b9%e6%b3%95">3.2 The scale_boxes method</a></li> </ul> </li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#4-%e5%ae%8c%e6%95%b4%e4%bb%a3%e7%a0%81%e5%8f%82%e8%80%83">4. Full code reference</a></li> </ul> <h2 id="1-预测解码流程-decode_predictions">1. Prediction decoding (decode_predictions)</h2> <p>YOLOv8 uses an anchor-free design: prediction decoding converts the raw network outputs into standard bounding boxes. The process breaks down into the following key steps:</p> <h3 id="11-网格点生成">1.1 Grid point generation</h3> <p>Generate the reference-point (anchor-point) coordinates for every feature map, together with the corresponding stride values:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Generate anchor points and their strides</span> </span></span><span class="line"><span class="cl"><span class="n">anchors</span><span class="p">,</span> <span class="n">strides</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> </span></span><span class="line"><span class="cl">    <span class="bp">self</span><span class="o">.</span><span class="n">make_anchors</span><span class="p">(</span><span class="n">predictions</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">stride</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="c1"># anchors: grid-point coordinates of all feature maps (cell centers, given the 0.5 offset)</span> </span></span><span
class="line"><span class="cl"><span class="c1"># strides: the stride of each point (8/16/32)</span> </span></span></code></pre></div><p>This step:</p> <ul> <li>generates grid points for the three feature maps (P3/P4/P5);</li> <li>generates the corresponding stride values (P3: 8, P4: 16, P5: 32).</li> </ul> <h3 id="12-特征图处理">1.2 Feature map processing</h3> <p>Merge the predictions from the three scales and split them into bounding-box and class predictions:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Concatenate the predictions of the three feature maps along the last dimension</span> </span></span><span class="line"><span class="cl"><span class="n">x_cat</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">([</span><span class="n">xi</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">nc</span> <span class="o">+</span> <span class="mi">16</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">xi</span> <span class="ow">in</span> <span class="n">predictions</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span> <span class="mi">2</span><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="c1"># P3: (1, nc+64, 6400)  # 80*80 = 6400 (640x640 input)</span> </span></span><span class="line"><span class="cl"><span class="c1"># P4: (1, nc+64, 1600)  # 40*40 = 1600</span> </span></span><span class="line"><span class="cl"><span class="c1"># P5: (1, nc+64, 400)   # 20*20 = 400</span> </span></span><span class="line"><span class="cl"><span class="c1"># x_cat: (1, nc+64, 8400)  # 6400 + 1600 + 400</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Split box predictions from class predictions</span> </span></span><span class="line"><span class="cl"><span class="n">box</span><span class="p">,</span> <span class="bp">cls</span> <span class="o">=</span> <span class="n">x_cat</span><span
class="o">.</span><span class="n">split</span><span class="p">((</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">nc</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="c1"># box: (1, 64, 8400)  # 64 = 4*16: each of the 4 side distances (left, top, right, bottom) is a 16-bin distribution</span> </span></span><span class="line"><span class="cl"><span class="c1"># cls: (1, nc, 8400)  # nc = number of classes</span> </span></span></code></pre></div><p>Next, the dimensions are unpacked to prepare for DFL decoding:</p> Understanding Mamba through MambaStock https://jamsylph.top/posts/mambastock%E8%AE%BA%E6%96%87%E5%A4%8D%E7%8E%B0%E5%8F%8A%E6%80%9D%E8%80%83/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/mambastock%E8%AE%BA%E6%96%87%E5%A4%8D%E7%8E%B0%E5%8F%8A%E6%80%9D%E8%80%83/ <blockquote> <h2 id="-更新记录">📝 Changelog</h2> <ul> <li>2024-06-16: <ul> <li>Summarized the experimental results and directions for future improvement</li> <li>Added a comprehensive comparison with other models</li> <li>Refined the technical insights and final conclusions</li> </ul> </li> <li>2024-06-15: <ul> <li>Created the MambaStock paper-reproduction document</li> <li>Analyzed the Mamba model architecture in detail</li> <li>Implemented the improved MambaStock design</li> </ul> </li> </ul></blockquote> <h1 id="从mambastock看mamba">Understanding Mamba through MambaStock</h1> <h2 id="1-mamba模型架构回顾">1. 
Review of the Mamba model architecture</h2> <p>According to the paper, MambaStock is a modification of the Mamba model, so let us first look at Mamba's main architecture (the mamba.py file).</p> <h3 id="11-模型整体架构设计">1.1 Overall model architecture</h3> <p>Mamba uses a hierarchical design with three core levels, from the outside in: Mamba → ResidualBlock → MambaBlock</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Mamba</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span> </span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="n">MambaConfig</span><span class="p">):</span> </span></span><span class="line"><span class="cl">        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span> </span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">ResidualBlock</span><span class="p">(</span><span class="n">config</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">n_layers</span><span class="p">)])</span> </span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">norm_f</span> <span class="o">=</span> <span class="n">RMSNorm</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">d_model</span><span class="p">)</span> </span></span></code></pre></div><div
class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">ResidualBlock</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span> </span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="n">MambaConfig</span><span class="p">):</span> </span></span><span class="line"><span class="cl">        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span> </span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">mixer</span> <span class="o">=</span> <span class="n">MambaBlock</span><span class="p">(</span><span class="n">config</span><span class="p">)</span> </span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">RMSNorm</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">d_model</span><span class="p">)</span> </span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">MambaBlock</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span> </span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">config</span><span
class="p">:</span> <span class="n">MambaConfig</span><span class="p">):</span> </span></span><span class="line"><span class="cl">        <span class="c1"># definitions of the various layers...</span> </span></span></code></pre></div><ul> <li>Outermost level: the Mamba class, which builds a list (<code>nn.ModuleList</code>) of <code>n_layers</code> ResidualBlocks.</li> <li>Middle level: each ResidualBlock wraps one MambaBlock (its <code>mixer</code>) as its core computation unit, together with an RMSNorm.</li> <li>Innermost level: MambaBlock contains all of the actual computation: projection layers + convolution layer + SSM computation.</li> </ul> <p>This three-level design reflects key ideas of modern deep-learning architectures: <strong>modularity</strong>, <strong>residual connections</strong>, and <strong>normalization layers</strong> (here RMSNorm).</p> <h3 id="12-mambaconfig参数化配置的精髓">1.2 MambaConfig: the essence of parameterized configuration</h3> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@dataclass</span> </span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">MambaConfig</span><span class="p">:</span> </span></span><span class="line"><span class="cl">    <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span>  <span class="c1"># model dimension D</span> </span></span><span class="line"><span class="cl">    <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span>  <span class="c1"># number of layers</span> </span></span><span class="line"><span class="cl">    <span class="n">dt_rank</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;auto&#39;</span>  <span class="c1"># rank of the Δ projection</span> </span></span><span class="line"><span class="cl">    <span class="n">d_state</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">16</span>  <span class="c1"># state-space dimension N</span> </span></span><span class="line"><span class="cl">    <span class="n">expand_factor</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">2</span>  <span class="c1"># expansion factor E</span> 
</span></span><span class="line"><span class="cl">    <span class="n">d_conv</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">4</span>  <span class="c1"># convolution kernel size</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">    <span class="c1"># Δ (time-step) initialization settings</span> </span></span><span class="line"><span class="cl">    <span class="n">dt_min</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.001</span> </span></span><span class="line"><span class="cl">    <span class="n">dt_max</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span> </span></span><span class="line"><span class="cl">    <span class="n">dt_init</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&#34;random&#34;</span>  <span class="c1"># &#34;random&#34; or &#34;constant&#34;</span> </span></span><span class="line"><span class="cl">    <span class="n">dt_scale</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">1.0</span> </span></span><span class="line"><span class="cl">    <span class="n">dt_init_floor</span> <span class="o">=</span> <span class="mf">1e-4</span> </span></span></code></pre></div><p>This fine-grained parameterization makes the model highly tunable. Compared with a Transformer's configuration, Mamba introduces several parameters of its own:</p>
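<p>The following NumPy sketch gives a rough idea of how these fields are typically consumed. It derives the inner width and the automatic <code>dt_rank</code>, then builds an initial Δ bias. It does not quote mamba.py. Instead it assumes the conventions of the reference Mamba implementation:</p> <ul> <li><code>d_inner = expand_factor * d_model</code>;</li> <li><code>dt_rank="auto"</code> resolves to <code>ceil(d_model / 16)</code>;</li> <li>Δ is sampled log-uniformly in <code>[dt_min, dt_max]</code>, floored at <code>dt_init_floor</code>, and stored as its inverse softplus so that <code>softplus(bias)</code> gives Δ back.</li> </ul> <p><code>MambaConfigSketch</code> and <code>init_dt_bias</code> are hypothetical names used only here. <code>dt_init</code> and <code>dt_scale</code> are carried along but unused, since in the reference implementation they govern the initialization of the Δ projection weights rather than the bias.</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import math
from dataclasses import dataclass
from typing import Union

import numpy as np


@dataclass
class MambaConfigSketch:
    # Hypothetical stand-in mirroring the MambaConfig fields above
    d_model: int
    n_layers: int
    dt_rank: Union[int, str] = "auto"
    d_state: int = 16
    expand_factor: int = 2
    d_conv: int = 4
    dt_min: float = 0.001
    dt_max: float = 0.1
    dt_init: str = "random"  # unused in this sketch
    dt_scale: float = 1.0  # unused in this sketch
    dt_init_floor: float = 1e-4

    def __post_init__(self):
        # Expanded inner width E * D (assumed convention)
        self.d_inner = self.expand_factor * self.d_model
        # Assumed rule for dt_rank="auto": ceil(D / 16)
        if self.dt_rank == "auto":
            self.dt_rank = math.ceil(self.d_model / 16)


def init_dt_bias(cfg, seed=0):
    """Hypothetical helper: one initial delta per inner channel, returned
    as its inverse softplus so that softplus(bias) gives delta back."""
    rng = np.random.default_rng(seed)
    u = rng.random(cfg.d_inner)  # uniform samples in [0, 1)
    log_lo, log_hi = math.log(cfg.dt_min), math.log(cfg.dt_max)
    dt = np.exp(u * (log_hi - log_lo) + log_lo)  # log-uniform in [dt_min, dt_max)
    dt = np.maximum(dt, cfg.dt_init_floor)  # apply the floor
    # inverse softplus: log(exp(dt) - 1) == dt + log(1 - exp(-dt))
    return dt + np.log(-np.expm1(-dt))


cfg = MambaConfigSketch(d_model=64, n_layers=2)
print(cfg.d_inner, cfg.dt_rank)  # 128 4
bias = init_dt_bias(cfg)
print(bias.shape)  # (128,)
</code></pre></div><p>With the defaults, sampling in log space spreads the initial time steps evenly across the two orders of magnitude between 0.001 and 0.1. A plain uniform draw would instead put most of them in the upper decade (0.01 to 0.1).</p>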