Jam Sylph's little universe https://jamsylph.top/ Recent content on Jam Sylph's little universe Hugo -- gohugo.io zh-cn Mon, 13 Jan 2025 00:00:00 +0000 About https://jamsylph.top/about/ Mon, 13 Jan 2025 00:00:00 +0000 https://jamsylph.top/about/ <link rel="stylesheet" href="https://jamsylph.top/css/timeline.css"> <link rel="stylesheet" href="https://jamsylph.top/css/icons.css"> <h1 id="blabla">BLABLA</h1> <ul> <li>Hi, I'm Jam Sylph😉, welcome to my personal blog. I'm currently a humble algorithm engineer~~</li> <li>I love exploring new things and always stay full of curiosity🤔</li> <li>I've just moved my little home over to Hugo and am migrating things bit by bit, hehe🙂</li> </ul> <div class="about-section"> <h2>My Journey</h2> <div class="timeline"> <div class="timeline-container left"> <div class="timeline-content"> <div class="timeline-date">2017-2018</div> <h3>First Steps in Programming</h3> <div class="timeline-details"> <p>Starting my programming journey:</p> <ul> <li>Learned Android Studio development</li> <li>Explored programming fundamentals</li> <li>Had my first exposure to front-end development</li> </ul> </div> </div> </div> <div class="timeline-container right"> <div class="timeline-content"> <div class="timeline-date">2018-2019</div> <h3>Cross-Disciplinary Learning</h3> <div class="timeline-details"> <p>Broadening my knowledge across fields:</p> <ul> <li>Explored self-media platforms such as WeChat Official Accounts</li> <li>Studied industrial economics</li> <li>Learned product prototyping</li> </ul> </div> </div> </div> <div class="timeline-container left"> <div class="timeline-content"> <div class="timeline-date">2019-2020</div> <h3>The Road to Data Analysis</h3> <div class="timeline-details"> <p>Focusing on data analysis and statistics:</p> <ul> <li>Learned the fundamentals of statistics</li> <li>Learned econometric analysis with Stata</li> <li>Started programming in Python</li> <li>Took part in several data analysis projects</li> </ul> </div> </div> </div> <div class="timeline-container right"> <div class="timeline-content"> <div class="timeline-date">2020-2021</div> <h3>Exploring Machine Learning</h3> <div class="timeline-details"> <p>Diving into machine learning and its practical applications:</p> <ul> <li>Completed a forecasting analysis of manufacturing revenue in the context of the digital economy</li> <li>Took part in an unmanned vending machine startup project</li> <li>Applied predictive analytics and computer vision algorithms</li> </ul> </div> </div> </div> <div class="timeline-container left"> <div class="timeline-content"> <div class="timeline-date">2021-2025</div> <h3>Career Development</h3> <div class="timeline-details"> <p>Professional growth as a computer vision engineer:</p> <ul> <li>Led several industrial vision projects: <ul> <li>Defect detection systems</li> <li>Object detection applications</li> <li>Anomaly analysis solutions</li> </ul> </li> 
<li>Exploring large language models (2023 to present): <ul> <li>Studying the GPT architecture and its applications</li> <li>Following the development of Llama 3</li> <li>Studying open-source models such as Qwen</li> </ul> </li> </ul> <p class="timeline-highlight">4 years of full-time work experience • 16+ projects completed</p> </div> </div> </div> </div> </div> <div class="about-section"> <h2>My Interests</h2> <div class="interests-container"> <div class="interest-item"> <span class="interest-icon">🎬</span> <div class="interest-text">Horror movies</div> </div> <div class="interest-item"> <span class="interest-icon">🎵</span> <div class="interest-text">K-pop</div> </div> <div class="interest-item"> <span class="interest-icon">⛰️</span> <div class="interest-text">Mountain climbing</div> </div> <div class="interest-item"> <span class="interest-icon">🥾</span> <div class="interest-text">Hiking</div> </div> <div class="interest-item"> <span class="interest-icon">🚶</span> <div class="interest-text">Walking</div> </div> <div class="interest-item"> <span class="interest-icon">🏃‍♂️</span> <div class="interest-text">Marathons</div> </div> <div class="interest-item"> <span class="interest-icon">🏃‍♀️</span> <div class="interest-text">Running</div> </div> <div class="interest-item"> <span class="interest-icon">🥑</span> <div class="interest-text">Keto lifestyle</div> </div> <div class="interest-item"> <span class="interest-icon">📺</span> <div class="interest-text">American TV series</div> </div> <div class="interest-item"> <span class="interest-icon">🇯🇵</span> <div class="interest-text">Japanese dramas</div> </div> <div class="interest-item"> <span class="interest-icon">🎨</span> <div class="interest-text">Anime</div> </div> <div class="interest-item"> <span class="interest-icon">🎤</span> <div class="interest-text">Live concerts</div> </div> <div class="interest-item"> <span class="interest-icon">🍳</span> <div class="interest-text">Cooking</div> </div> <div class="interest-item"> <span class="interest-icon">🌳</span> <div class="interest-text">Strolling in parks</div> </div> <div class="interest-item"> <span class="interest-icon">💻</span> <div class="interest-text">Programming</div> </div> <div class="interest-item"> <span class="interest-icon">🔍</span> <div 
class="interest-text">Open-source projects</div> </div> </div> </div> <div class="about-section"> <h2>Contact</h2> <div class="social-links"> <a href="https://github.com/jamsylph" target="_blank" class="social-link"> <svg class="social-icon" viewBox="0 0 24 24" fill="currentColor"> <path d="M12 .297c-6.63 0-12 5.373-12 12 0 5.303 3.438 9.8 8.205 11.385.6.113.82-.258.82-.577 0-.285-.01-1.04-.015-2.04-3.338.724-4.042-1.61-4.042-1.61C4.422 18.07 3.633 17.7 3.633 17.7c-1.087-.744.084-.729.084-.729 1.205.084 1.838 1.236 1.838 1.236 1.07 1.835 2.809 1.305 3.495.998.108-.776.417-1.305.76-1.605-2.665-.3-5.466-1.332-5.466-5.93 0-1.31.465-2.38 1.235-3.22-.135-.303-.54-1.523.105-3.176 0 0 1.005-.322 3.3 1.23.96-.267 1.98-.399 3-.405 1.02.006 2.04.138 3 .405 2.28-1.552 3.285-1.23 3.285-1.23.645 1.653.24 2.873.12 3.176.765.84 1.23 1.91 1.23 3.22 0 4.61-2.805 5.625-5.475 5.92.42.36.81 1.096.81 2.22 0 1.606-.015 2.896-.015 3.286 0 .315.21.69.825.57C20.565 22.092 24 17.592 24 12.297c0-6.627-5.373-12-12-12" /> </svg> <span class="social-text">GitHub</span> </a> <a href="https://linkedin.com/in/jamsylph" target="_blank" class="social-link"> <svg class="social-icon" viewBox="0 0 24 24" fill="currentColor"> <path d="M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z" /> </svg> <span class="social-text">LinkedIn</span> </a> <a href="mailto:jamsylph2@gmail.com" class="social-link"> <svg class="social-icon" viewBox="0 0 24 24" fill="currentColor"> <path d="M22 6c0-1.1-.9-2-2-2H4c-1.1 0-2 .9-2 2v12c0 1.1.9 2 2 2h16c1.1 0 2-.9 2-2V6zm-2 0l-8 5-8-5h16zm0 12H4V8l8 5 
8-5v10z" /> </svg> <span class="social-text">Email</span> </a> <a href="https://discord.com/users/jamsylph" target="_blank" class="social-link"> <svg class="social-icon" viewBox="0 0 24 24" fill="currentColor"> <path d="M20.317 4.3698a19.7913 19.7913 0 00-4.8851-1.5152.0741.0741 0 00-.0785.0371c-.211.3753-.4447.8648-.6083 1.2495-1.8447-.2762-3.68-.2762-5.4868 0-.1636-.3933-.4058-.8742-.6177-1.2495a.077.077 0 00-.0785-.037 19.7363 19.7363 0 00-4.8852 1.515.0699.0699 0 00-.0321.0277C.5334 9.0458-.319 13.5799.0992 18.0578a.0824.0824 0 00.0312.0561c2.0528 1.5076 4.0413 2.4228 5.9929 3.0294a.0777.0777 0 00.0842-.0276c.4616-.6304.8731-1.2952 1.226-1.9942a.076.076 0 00-.0416-.1057c-.6528-.2476-1.2743-.5495-1.8722-.8923a.077.077 0 01-.0076-.1277c.1258-.0943.2517-.1923.3718-.2914a.0743.0743 0 01.0776-.0105c3.9278 1.7933 8.18 1.7933 12.0614 0a.0739.0739 0 01.0785.0095c.1202.099.246.1981.3728.2924a.077.077 0 01-.0066.1276 12.2986 12.2986 0 01-1.873.8914.0766.0766 0 00-.0407.1067c.3604.698.7719 1.3628 1.225 1.9932a.076.076 0 00.0842.0286c1.961-.6067 3.9495-1.5219 6.0023-3.0294a.077.077 0 00.0313-.0552c.5004-5.177-.8382-9.6739-3.5485-13.6604a.061.061 0 00-.0312-.0286zM8.02 15.3312c-1.1825 0-2.1569-1.0857-2.1569-2.419 0-1.3332.9555-2.4189 2.157-2.4189 1.2108 0 2.1757 1.0952 2.1568 2.419 0 1.3332-.9555 2.4189-2.1569 2.4189zm7.9748 0c-1.1825 0-2.1569-1.0857-2.1569-2.419 0-1.3332.9554-2.4189 2.1569-2.4189 1.2108 0 2.1757 1.0952 2.1568 2.419 0 1.3332-.946 2.4189-2.1568 2.4189Z" /> </svg> <span class="social-text">Discord</span> </a> <a href="https://instagram.com/jamsylph" target="_blank" class="social-link"> <svg class="social-icon" viewBox="0 0 24 24" fill="currentColor"> <path d="M12 0C8.74 0 8.333.015 7.053.072 5.775.132 4.905.333 4.14.63c-.789.306-1.459.717-2.126 1.384S.935 3.35.63 4.14C.333 4.905.131 5.775.072 7.053.012 8.333 0 8.74 0 12s.015 3.667.072 4.947c.06 1.277.261 2.148.558 2.913.306.788.717 1.459 1.384 2.126.667.666 1.336 1.079 2.126 1.384.766.296 1.636.499 
2.913.558C8.333 23.988 8.74 24 12 24s3.667-.015 4.947-.072c1.277-.06 2.148-.262 2.913-.558.788-.306 1.459-.718 2.126-1.384.666-.667 1.079-1.335 1.384-2.126.296-.765.499-1.636.558-2.913.06-1.28.072-1.687.072-4.947s-.015-3.667-.072-4.947c-.06-1.277-.262-2.149-.558-2.913-.306-.789-.718-1.459-1.384-2.126C21.319 1.347 20.651.935 19.86.63c-.765-.297-1.636-.499-2.913-.558C15.667.012 15.26 0 12 0zm0 2.16c3.203 0 3.585.016 4.85.071 1.17.055 1.805.249 2.227.415.562.217.96.477 1.382.896.419.42.679.819.896 1.381.164.422.36 1.057.413 2.227.057 1.266.07 1.646.07 4.85s-.015 3.585-.074 4.85c-.061 1.17-.256 1.805-.421 2.227-.224.562-.479.96-.899 1.382-.419.419-.824.679-1.38.896-.42.164-1.065.36-2.235.413-1.274.057-1.649.07-4.859.07-3.211 0-3.586-.015-4.859-.074-1.171-.061-1.816-.256-2.236-.421-.569-.224-.96-.479-1.379-.899-.421-.419-.69-.824-.9-1.38-.165-.42-.359-1.065-.42-2.235-.045-1.26-.061-1.649-.061-4.844 0-3.196.016-3.586.061-4.861.061-1.17.255-1.814.42-2.234.21-.57.479-.96.9-1.381.419-.419.81-.689 1.379-.898.42-.166 1.051-.361 2.221-.421 1.275-.045 1.65-.06 4.859-.06l.045.03zm0 3.678c-3.405 0-6.162 2.76-6.162 6.162 0 3.405 2.76 6.162 6.162 6.162 3.405 0 6.162-2.76 6.162-6.162 0-3.405-2.76-6.162-6.162-6.162zM12 16c-2.21 0-4-1.79-4-4s1.79-4 4-4 4 1.79 4 4-1.79 4-4 4zm7.846-10.405c0 .795-.646 1.44-1.44 1.44-.795 0-1.44-.646-1.44-1.44 0-.794.646-1.439 1.44-1.439.793-.001 1.44.645 1.44 1.439z"/> </svg> <span class="social-text">Instagram</span> </a> </div> </div> <style> .about-section { margin: 3rem 0; padding-bottom: 1.5rem; border-bottom: 1px solid rgba(255, 255, 255, 0.1); } .about-section:last-child { border-bottom: none; } .interests-container { display: flex; flex-wrap: wrap; gap: 1.5rem; margin-top: 1.5rem; } .interest-item { display: flex; flex-direction: column; align-items: center; background-color: rgba(80, 80, 80, 0.2); border-radius: 12px; padding: 1.25rem; min-width: 120px; transition: all 0.3s ease; flex: 1 1 calc(16.666% - 1.5rem); } .interest-item:hover { 
background-color: rgba(0, 123, 255, 0.2); transform: translateY(-5px); } .interest-icon { font-size: 2.5rem; margin-bottom: 0.75rem; } .interest-text { font-weight: 500; } .timeline-highlight { font-weight: 600; color: #007bff; margin-top: 0.5rem; } /* Contact section styles */ .social-links { display: flex; flex-wrap: wrap; gap: 1rem; margin-top: 1.5rem; justify-content: center; } .social-link { display: flex; flex-direction: column; align-items: center; background-color: rgba(80, 80, 80, 0.2); border-radius: 12px; padding: 1.25rem; min-width: 120px; transition: all 0.3s ease; flex: 1 1 calc(16.666% - 1.5rem); } .social-link:hover { background-color: rgba(0, 123, 255, 0.2); transform: translateY(-5px); } .social-icon { width: 24px; height: 24px; margin-bottom: 0.75rem; } .social-text { font-weight: 500; } @media (max-width: 1200px) { .interest-item { flex: 1 1 calc(20% - 1.5rem); } } @media (max-width: 992px) { .interest-item { flex: 1 1 calc(25% - 1.5rem); } .social-link { padding: 1rem; } } @media (max-width: 768px) { .interests-container { gap: 1rem; } .interest-item { min-width: calc(33.333% - 1rem); padding: 1rem; flex: 1 1 calc(33.333% - 1rem); } .social-link { padding: 1rem; } } @media (max-width: 576px) { .interest-item { min-width: calc(50% - 1rem); flex: 1 1 calc(50% - 1rem); } .social-link { padding: 1rem; } } </style> <script> document.addEventListener('DOMContentLoaded', function() { /* Add a reveal animation to the timeline items */ const timelineItems = document.querySelectorAll('.timeline-container'); const observer = new IntersectionObserver((entries) => { entries.forEach(entry => { if (entry.isIntersecting) { entry.target.style.opacity = 1; entry.target.style.transform = 'translateY(0)'; } }); }, { threshold: 0.1 }); timelineItems.forEach(item => { item.style.opacity = 0; item.style.transform = 'translateY(20px)'; item.style.transition = 'opacity 0.5s ease, transform 0.5s ease'; observer.observe(item); }); }); </script> A Close Reading of the YOLOv5 Object Detection Code 
https://jamsylph.top/posts/yolov5%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ Sun, 16 Oct 2022 00:00:00 +0000 https://jamsylph.top/posts/yolov5%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ <h1 id="yolov5目标检测代码精读">A Close Reading of the YOLOv5 Object Detection Code</h1> <blockquote> <p>This article analyzes the YOLOv5 training pipeline and its data augmentation mechanisms in depth, as a way for me to sort out and summarize the internal implementation details of the YOLOv5 object detection model.</p></blockquote> <hr> <h2 id="1-trainpy-文件解析">1. Walking Through train.py</h2> <h3 id="11-import-部分">1.1 Imports</h3> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">argparse</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">math</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">os</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">random</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">subprocess</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">sys</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">time</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">copy</span> <span class="kn">import</span> <span class="n">deepcopy</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span> </span></span><span class="line"><span class="cl"> </span></span><span 
class="line"><span class="cl"><span class="k">try</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="kn">import</span> <span class="nn">comet_ml</span> <span class="c1"># must be imported before torch (if installed)</span> </span></span><span class="line"><span class="cl"><span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">comet_ml</span> <span class="o">=</span> <span class="kc">None</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">torch</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">torch.distributed</span> <span class="k">as</span> <span class="nn">dist</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="nn">nn</span> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">yaml</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">torch.optim</span> <span class="kn">import</span> <span class="n">lr_scheduler</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">tqdm</span> <span class="kn">import</span> <span class="n">tqdm</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">FILE</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span><span class="o">.</span><span class="n">resolve</span><span 
class="p">()</span> </span></span><span class="line"><span class="cl"><span class="n">ROOT</span> <span class="o">=</span> <span class="n">FILE</span><span class="o">.</span><span class="n">parents</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># YOLOv5 root directory</span> </span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="nb">str</span><span class="p">(</span><span class="n">ROOT</span><span class="p">)</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="p">:</span> </span></span><span class="line"><span class="cl"> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">ROOT</span><span class="p">))</span> <span class="c1"># add ROOT to PATH</span> </span></span><span class="line"><span class="cl"><span class="n">ROOT</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">relpath</span><span class="p">(</span><span class="n">ROOT</span><span class="p">,</span> <span class="n">Path</span><span class="o">.</span><span class="n">cwd</span><span class="p">()))</span> <span class="c1"># relative</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">val</span> <span class="k">as</span> <span class="nn">validate</span> <span class="c1"># for end-of-epoch mAP</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">models.experimental</span> <span class="kn">import</span> <span class="n">attempt_load</span> </span></span><span 
class="line"><span class="cl"><span class="kn">from</span> <span class="nn">models.yolo</span> <span class="kn">import</span> <span class="n">Model</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.autoanchor</span> <span class="kn">import</span> <span class="n">check_anchors</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.autobatch</span> <span class="kn">import</span> <span class="n">check_train_batch_size</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.callbacks</span> <span class="kn">import</span> <span class="n">Callbacks</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.dataloaders</span> <span class="kn">import</span> <span class="n">create_dataloader</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.downloads</span> <span class="kn">import</span> <span class="n">attempt_download</span><span class="p">,</span> <span class="n">is_url</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.general</span> <span class="kn">import</span> <span class="p">(</span> </span></span><span class="line"><span class="cl"> <span class="n">LOGGER</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">TQDM_BAR_FORMAT</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_amp</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_dataset</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_file</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_git_info</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl"> <span class="n">check_git_status</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_img_size</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_requirements</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_suffix</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">check_yaml</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">colorstr</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">get_latest_run</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">increment_path</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">init_seeds</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">intersect_dicts</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">labels_to_class_weights</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">labels_to_image_weights</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">methods</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">one_cycle</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">print_args</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">print_mutation</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">strip_optimizer</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span 
class="n">yaml_save</span><span class="p">,</span> </span></span><span class="line"><span class="cl"><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.loggers</span> <span class="kn">import</span> <span class="n">LOGGERS</span><span class="p">,</span> <span class="n">Loggers</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.loggers.comet.comet_utils</span> <span class="kn">import</span> <span class="n">check_comet_resume</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.loss</span> <span class="kn">import</span> <span class="n">ComputeLoss</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.metrics</span> <span class="kn">import</span> <span class="n">fitness</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.plots</span> <span class="kn">import</span> <span class="n">plot_evolve</span> </span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">utils.torch_utils</span> <span class="kn">import</span> <span class="p">(</span> </span></span><span class="line"><span class="cl"> <span class="n">EarlyStopping</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">ModelEMA</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">de_parallel</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">select_device</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">smart_DDP</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">smart_optimizer</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> 
<span class="n">smart_resume</span><span class="p">,</span> </span></span><span class="line"><span class="cl"> <span class="n">torch_distributed_zero_first</span><span class="p">,</span> </span></span><span class="line"><span class="cl"><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">LOCAL_RANK</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;LOCAL_RANK&#34;</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">RANK</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;RANK&#34;</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">WORLD_SIZE</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">&#34;WORLD_SIZE&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">GIT_INFO</span> <span class="o">=</span> <span class="n">check_git_info</span><span class="p">()</span> </span></span></code></pre></div><h3 id="12-train-函数详解">1.2 The train() Function in Detail</h3> <p>The train() function is the core of YOLOv5 training and manages the entire training pipeline:</p> The History of CNNs https://jamsylph.top/posts/cnn%E5%8E%86%E7%A8%8B/ Sun, 01 Mar 2020 00:00:00 +0000 https://jamsylph.top/posts/cnn%E5%8E%86%E7%A8%8B/ <blockquote> <h2 id="-更新记录">📝 Changelog</h2> <ul> 
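<li>2024-05-26: Added the latest 2023-2024 progress in vision models and a new analysis of sixth-stage architectures</li> <li>2023-11-15: Added a summary of the recurring patterns in existing CNNs</li> <li>2023-09-15: Expanded the introduction to fifth-stage (2020 to present) architectures and added an analysis of ConvNeXt</li> <li>2020-03-01: First published</li> </ul></blockquote> <h1 id="cnn历程">The History of CNNs</h1> <h2 id="第一阶段奠基时代-1998-2011">Stage 1: The Foundational Era (1998-2011)</h2> <h3 id="lenet-5-1998">LeNet-5 (1998)</h3> <p><strong>Creator</strong>: Yann LeCun</p> <p><strong>Main architecture</strong>:</p> <ul> <li>7 layers: 3 convolutional layers, 2 pooling layers, and 2 fully connected layers</li> <li>Uses 5×5 convolution kernels</li> <li>Uses sigmoid/tanh activation functions</li> </ul> <p><strong>Breakthroughs</strong>:</p> <ul> <li>The first CNN applied successfully to a real-world problem (handwritten digit recognition)</li> <li>Established the basic &quot;convolution - pooling - fully connected&quot; paradigm</li> <li>Introduced weight sharing to reduce the parameter count</li> </ul> <p><strong>Limitations</strong>:</p> <ul> <li>The network was shallow because of limited computing resources</li> <li>Modern training techniques such as batch normalization and ReLU activations were not yet available</li> </ul> <p>LeNet-5 marked the formal birth of the CNN. Over the following decade, however, CNNs developed slowly, because computing power was limited and other traditional machine learning methods performed very well. The turning point came only with more powerful GPU computing and the arrival of large-scale training data.</p> <h2 id="第二阶段深度学习爆发期-2012-2014">Stage 2: The Deep Learning Explosion (2012-2014)</h2> <h3 id="alexnet-2012---深度学习革命的火种">AlexNet (2012) - The Spark of the Deep Learning Revolution</h3> <p><strong>Creators</strong>: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton</p> <p><strong>Main architecture</strong>:</p> <ul> <li>8 layers: 5 convolutional layers and 3 fully connected layers</li> <li>The first to use the ReLU activation function at scale</li> <li>Uses <strong>overlapping max pooling</strong>: the pooling window is larger than the stride, so adjacent output units have overlapping receptive fields, which eases overfitting, smooths the transitions between features, and enlarges the receptive field</li> <li>Uses Dropout to reduce overfitting</li> </ul> <p><strong>Breakthroughs</strong>:</p> <ul> <li>Won the 2012 ImageNet challenge, cutting the top-5 error rate from 26% to 15.3%</li> <li>The <strong>landmark event</strong> of the deep learning revolution</li> <li>Demonstrated how important GPUs are for training deep networks</li> <li>The first large-scale use of data augmentation: random cropping, horizontal flipping, and PCA color perturbation</li> </ul> <h3 id="zfnet-2013---打开cnn黑盒">ZFNet (2013) - Opening the CNN Black Box</h3> <p><strong>Creators</strong>: Matthew Zeiler and Rob Fergus</p> <p><strong>Main architecture</strong>:</p> <ul> <li>An improved version of AlexNet</li> <li>A smaller first-layer convolution kernel (7×7 instead of 11×11)</li> <li>A smaller first-layer stride (2 instead of 4)</li> </ul> <p><strong>Breakthroughs</strong>:</p> <ul> <li>Winner of the 2013 ImageNet challenge</li> <li>The first to explain the inner workings of a CNN through visualization techniques</li> <li>Introduced &quot;deconvolution&quot; for visualization; despite the name, the operation is in essence a transposed convolution rather than a true inverse of convolution</li> </ul> <p><strong>Contributions</strong>:</p> Exploration of the Inception Architecture https://jamsylph.top/posts/exploration-of-the-inception-architecture/ Mon, 01 Jan 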
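0001 00:00:00 +0000 https://jamsylph.top/posts/exploration-of-the-inception-architecture/ <blockquote> <h2 id="-更新记录">📝 Changelog</h2> <ul> <li> <p>2024-05-18:</p> <ul> <li>Strengthened the &quot;adaptive feature processing&quot; section and refined the three levels of adaptive mechanisms</li> <li>Added a &quot;bottleneck design pattern&quot; topic that analyzes the reduce-process-expand design idea in depth</li> </ul> </li> <li> <p>2023-10-20:</p> <ul> <li>Revised the article structure throughout and strengthened the argument that the Inception idea is broadly applicable</li> <li>Added the long-term influence of Inception on modern network design</li> </ul> </li> <li> <p>2023-05-12:</p> <ul> <li>Expanded the coverage of Inception's influence on other networks</li> <li>Added applications of the Inception idea in segmentation networks</li> </ul> </li> <li> <p>2022-08-15:</p> <ul> <li>Updated parts of the YOLOv7 section</li> <li>Added how the C2f and SPPF modules apply the Inception idea</li> </ul> </li> <li> <p>2021-09-03:</p> <ul> <li>Added YOLOv5 content</li> <li>Improved the analysis of how the YOLO series evolved the Inception idea</li> </ul> </li> </ul></blockquote> <h1 id="exploration-of-the-inception-architecture">Exploration of the Inception Architecture</h1> <h2 id="引言从模块创新到设计范式">Introduction: From Module Innovation to Design Paradigm</h2> <p>The Inception module is one of the innovations of GoogLeNet (2014). In essence it is a set of parallel branches that extract diverse features at different scales, an idea that went on to influence many later network designs.</p> <h2 id="inception设计原则的精髓">The Essence of the Inception Design Principles</h2> <p>The Inception module broke with the linear-stacking paradigm of traditional CNNs and introduced four key design principles:</p> <ol> <li><strong>Multi-scale parallel feature extraction</strong>: convolution kernels with different receptive fields are applied at the same time to capture image features at different scales</li> <li><strong>Computational efficiency</strong>: 1×1 convolutions reduce the channel dimension, forming a &quot;bottleneck layer&quot; that cuts the amount of computation</li> <li><strong>Balance between network width and depth</strong>: the network gains expressive power without an explosion in the number of parameters</li> <li><strong>Feature fusion</strong>: channel concatenation merges the complementary features extracted along the different paths</li> </ol> <h2 id="yolo系列中的inception思想">The Inception Idea in the YOLO Series</h2> <h3 id="yolov3初步融合多尺度思想">YOLOv3: A First Integration of Multi-Scale Ideas</h3> <ul> <li><strong>Feature pyramid structure</strong>: fuses features of different scales through <em>upsampling and skip connections</em>, giving a multi-scale feature representation</li> <li><strong>SPP (Spatial Pyramid Pooling) module</strong> (used in the YOLOv3-SPP variant): applies <em>parallel pooling</em> operations to aggregate feature information from different receptive fields</li> </ul> <h3 id="yolov4inception思想的系统性应用">YOLOv4: A Systematic Application of the Inception Idea</h3> <p>Alexey Bochkovskiy's team</p> <ul> <li><strong>CSPDarknet53 backbone</strong>: uses CSP (Cross Stage Partial) connections to create multi-path information flow and strengthen feature reuse, which amounts to running branches in parallel</li> <li><strong>PANet (Path Aggregation Network) Neck</strong>: a bidirectional feature-passing mechanism lets features from different levels fuse effectively, so that the different kinds of information are integrated better</li> <li><strong>SPPCSP module</strong>: keeps the multi-scale processing ability of <em>spatial pyramid pooling (SPP)</em> while the CSP connection further improves computational efficiency</li> </ul> <h3 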
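id="yolov5inception思想的精细化实现">YOLOv5: A Refined Implementation of the Inception Idea</h3> <p>Ultralytics</p> <ol> <li><strong>Focus module</strong>: <ul> <li>Rearranges the image pixels instead of applying an ordinary convolution</li> <li>Splits the pixels of each 2×2 region into 4 separate channel groups, similar to &quot;parallel sampling&quot;</li> <li>This spatial regrouping of features raises information density and reduces computation</li> <li>Reflects Inception's idea of &quot;processing the input in different ways in parallel&quot;</li> </ul> </li> <li><strong>C3 module</strong>: <ul> <li>An improved CSP structure that splits the input into two paths</li> <li>One path is connected directly, and the other passes through several residual blocks</li> <li>This &quot;dual-path&quot; design resembles Inception's parallel branches</li> <li>It also improves both feature extraction ability and computational efficiency</li> </ul> </li> <li><strong>SPPF module</strong>: <ul> <li>Reworks the SPP module into a <em>sequential form</em> to reduce computational overhead</li> <li>Extracts multi-scale features through consecutive max pooling followed by feature fusion</li> <li>Lighter and more efficient than YOLOv4's SPPCSP</li> </ul> </li> </ol> <h3 id="yolov7">YOLOv7</h3> <p>WongKinYiu's team</p> introduction-to-yolo https://jamsylph.top/posts/introduction-to-yolo/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/introduction-to-yolo/ <h1 id="yolo的历程">The History of YOLO</h1> <p>YOLO (You Only Look Once) is a popular real-time object detection algorithm, known for its efficiency and its relatively high accuracy. Unlike traditional object detection methods, YOLO treats detection as a regression problem and predicts bounding boxes and class probabilities directly from the full image.</p> <h2 id="yolo的基本原理">How YOLO Works</h2> <p>The core idea of YOLO is to divide the whole image into an S×S grid, where each cell is responsible for predicting the objects whose centers fall inside it. Specifically, each cell predicts:</p> <ol> <li>B bounding boxes and their confidence scores</li> <li>Conditional probabilities for C classes</li> </ol> <p>This approach lets YOLO complete detection in a single forward pass, which greatly increases processing speed.</p> <h2 id="yolo的发展历程">The Evolution of YOLO</h2> <h3 id="yolov1">YOLOv1</h3> <p>In 2016, Joseph Redmon and colleagues proposed the first version of YOLO. YOLOv1 was fast but less accurate, and it performed especially poorly on small objects.</p> <h3 id="yolov2yolo9000">YOLOv2/YOLO9000</h3> <p>YOLOv2 introduced improvements such as batch normalization and anchor boxes, and the accompanying YOLO9000 could detect more than 9000 different object categories.</p> <h3 id="yolov3">YOLOv3</h3> <p>YOLOv3 used the more complex Darknet-53 backbone and adopted multi-scale prediction, which markedly improved the detection of small objects.</p> <h3 id="yolov4">YOLOv4</h3> <p>YOLOv4 introduced a range of advanced techniques, such as the CSPDarknet53 backbone and the PANet path aggregation network, to improve performance further.</p> <h3 id="yolov5">YOLOv5</h3> <p>YOLOv5, developed by Ultralytics, offers models in several sizes (S, M, L, X), so users can choose the balance of speed and accuracy that fits their needs.</p> <h3 id="yolov6yolov7及更新版本">YOLOv6, YOLOv7, and Later Versions</h3> <p>As research has progressed, the YOLO family has kept evolving, with more efficient and more accurate versions being released.</p> <h2 id="yolo的应用场景">Applications of YOLO</h2> <p>Thanks to its real-time speed and relatively high accuracy, YOLO is widely used in many fields:</p> <ul> <li>Autonomous driving: detecting vehicles, pedestrians, and traffic signs on the road</li> <li>Security surveillance: identifying abnormal behavior and suspicious objects</li> <li>Industrial inspection: detecting product defects</li> <li>Medical imaging: helping doctors diagnose disease</li> <li>Retail analytics: tracking customer behavior in stores</li> </ul> <h2 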
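id="实现yolo的工具和框架">Tools and Frameworks for Implementing YOLO</h2> <p>A number of tools and frameworks can help developers implement YOLO:</p> <ul> <li>Darknet: the original implementation of YOLO</li> <li>PyTorch: offers several YOLO implementations</li> <li>TensorFlow: also has ported versions of YOLO</li> <li>ONNX: supports converting YOLO models to a common format</li> <li>OpenCV: provides an interface for running pretrained YOLO models</li> </ul> <h2 id="结论">Conclusion</h2> <p>With its excellent balance of speed and accuracy, YOLO has become one of the most popular object detection algorithms in computer vision. As the algorithm keeps improving and hardware keeps advancing, its range of applications will grow even wider.</p> <p>In future articles I will dig into how YOLO is implemented, training tips, and how to optimize it for specific applications. Stay tuned!</p> A Close Reading of YOLOv5's dataloaders.py https://jamsylph.top/posts/yolov5%E7%9A%84dataloaders.py%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/yolov5%E7%9A%84dataloaders.py%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB/ <blockquote> <p>For my YOLOv5 object detection task I run the <a href="yolo5%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%BB%A3%E7%A0%81%E7%B2%BE%E8%AF%BB.md">train.py code</a>. To build train_loader, it calls the create_dataloader function, which internally creates an instance of the LoadImagesAndLabels class as the dataset and finally returns a DataLoader together with that dataset:</p></blockquote> <h1 id="-dataloaders-py">dataloaders.py</h1> <h1 id="pytorch-数据集中的-__getitem__-方法工作原理">How the <code>__getitem__</code> Method Works in a PyTorch Dataset</h1> <p><code>__getitem__</code> is a special (magic) method in Python. In YOLOv5's <code>LoadImagesAndLabels</code> class it is used to access a single sample of the dataset, and it is called whenever you go through a data loader or index the dataset directly.</p> <h2 id="访问流程">Access Flow</h2> <p><code>__getitem__</code> is called when either of the following happens (shown in generic PyTorch form; YOLOv5's dataset also returns the image path and shape information with each sample):</p> <ol> <li>Direct access on the dataset: <code>image, label = dataset[5]</code></li> <li>Iteration through a DataLoader: <code>for images, labels in dataloader: ...</code></li> </ol> <p>Inside a DataLoader, <code>__getitem__</code> is called many times, in parallel across the worker processes set by the <code>num_workers</code> parameter, and the results are then merged into a batch by the <code>collate_fn</code> method.</p> <h2 id="yolov-5-中-__getitem__-的工作流程">The <code>__getitem__</code> Workflow in YOLOv5</h2> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">index</span><span class="p">):</span> 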
</span></span><span class="line"><span class="cl">    <span class="c1"># 1. Map the incoming index to the index actually used (linear, shuffled, or weighted sampling)</span>
</span></span><span class="line"><span class="cl">    <span class="n">index</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">indices</span><span class="p">[</span><span class="n">index</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">hyp</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyp</span>  <span class="c1"># hyperparameter dict used below</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 2. Decide whether to apply mosaic augmentation (based on config and a random draw)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">mosaic</span> <span class="o">:=</span> <span class="bp">self</span><span class="o">.</span><span class="n">mosaic</span> <span class="ow">and</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;mosaic&#34;</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Load the mosaic-augmented image and labels</span>
</span></span><span class="line"><span class="cl">        <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">load_mosaic</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>  <span class="c1"># this is the call to change to load_mosaic9</span>
</span></span><span class="line"><span class="cl">        <span class="n">shapes</span> <span class="o">=</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Decide whether to additionally apply mixup augmentation</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span 
class="n">hyp</span><span class="p">[</span><span class="s2">&#34;mixup&#34;</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">            <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">mixup</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">load_mosaic</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">indices</span><span class="p">)))</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># 3. Regular image loading and processing when mosaic is not used</span>
</span></span><span class="line"><span class="cl">        <span class="n">img</span><span class="p">,</span> <span class="p">(</span><span class="n">h0</span><span class="p">,</span> <span class="n">w0</span><span class="p">),</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">load_image</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Letterbox resize and padding</span>
</span></span><span class="line"><span class="cl">        <span class="n">shape</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">batch_shapes</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">batch</span><span class="p">[</span><span 
class="n">index</span><span class="p">]]</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">rect</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">img_size</span>
</span></span><span class="line"><span class="cl">        <span class="n">img</span><span class="p">,</span> <span class="n">ratio</span><span class="p">,</span> <span class="n">pad</span> <span class="o">=</span> <span class="n">letterbox</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">shape</span><span class="p">,</span> <span class="n">auto</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">scaleup</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">augment</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">shapes</span> <span class="o">=</span> <span class="p">(</span><span class="n">h0</span><span class="p">,</span> <span class="n">w0</span><span class="p">),</span> <span class="p">((</span><span class="n">h</span> <span class="o">/</span> <span class="n">h0</span><span class="p">,</span> <span class="n">w</span> <span class="o">/</span> <span class="n">w0</span><span class="p">),</span> <span class="n">pad</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Process the labels</span>
</span></span><span class="line"><span class="cl">        <span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">labels</span><span 
class="o">.</span><span class="n">size</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:]</span> <span class="o">=</span> <span class="n">xywhn2xyxy</span><span class="p">(</span><span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:],</span> <span class="n">ratio</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">w</span><span class="p">,</span> <span class="n">ratio</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">h</span><span class="p">,</span> <span class="n">padw</span><span class="o">=</span><span class="n">pad</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">padh</span><span class="o">=</span><span class="n">pad</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Apply random perspective and related augmentations</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">augment</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">random_perspective</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 4. 
Convert the label format (pixel xyxy to normalized xywh)</span>
</span></span><span class="line"><span class="cl">    <span class="n">nl</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">nl</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="n">xyxy2xywhn</span><span class="p">(</span><span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="mi">5</span><span class="p">],</span> <span class="n">w</span><span class="o">=</span><span class="n">img</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">h</span><span class="o">=</span><span class="n">img</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">clip</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">eps</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 5. 
Apply further augmentations (if enabled)</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">augment</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># Albumentations augmentation</span>
</span></span><span class="line"><span class="cl">        <span class="n">img</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">albumentations</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># HSV color-space augmentation</span>
</span></span><span class="line"><span class="cl">        <span class="n">augment_hsv</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">hgain</span><span class="o">=</span><span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;hsv_h&#34;</span><span class="p">],</span> <span class="n">sgain</span><span class="o">=</span><span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;hsv_s&#34;</span><span class="p">],</span> <span class="n">vgain</span><span class="o">=</span><span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;hsv_v&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Flip up-down</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;flipud&#34;</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">            <span 
class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">flipud</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">nl</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">labels</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">labels</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="c1"># Flip left-right</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">random</span><span class="o">.</span><span class="n">random</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">hyp</span><span class="p">[</span><span class="s2">&#34;fliplr&#34;</span><span class="p">]:</span>
</span></span><span class="line"><span class="cl">            <span class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">fliplr</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">nl</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">labels</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 6. 
Prepare the output format (column 0 is reserved for the image index within the batch)</span>
</span></span><span class="line"><span class="cl">    <span class="n">labels_out</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">nl</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">nl</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">labels_out</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 7. Convert the image layout: HWC-&gt;CHW, BGR-&gt;RGB</span>
</span></span><span class="line"><span class="cl">    <span class="n">img</span> <span class="o">=</span> <span class="n">img</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ascontiguousarray</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># 8. 
Return the fully processed sample</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">torch</span><span class="o">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">img</span><span class="p">),</span> <span class="n">labels_out</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">im_files</span><span class="p">[</span><span class="n">index</span><span class="p">],</span> <span class="n">shapes</span>
</span></span></code></pre></div><h2 id="dataloader-如何使用-__getitem__">How DataLoader Uses <code>__getitem__</code></h2> <ol> <li>PyTorch's DataLoader starts several worker processes when <code>num_workers</code> is greater than 0</li> <li>Each worker receives the indices of one batch and calls the dataset's <code>__getitem__</code> method once per index</li> <li>Once all samples of that batch are collected, <code>collate_fn</code> merges them into a single batch</li> <li>The finished batch is passed to the model for training</li> </ol> <h2 id="修改-mosaic-增强">Modifying Mosaic Augmentation</h2> <p>To switch from 4-image mosaic to 9-image mosaic, you only need to change <code>self.load_mosaic(index)</code> on line 478 (the exact line number depends on the YOLOv5 version) to <code>self.load_mosaic9(index)</code>. With mosaic augmentation enabled, the loader then builds each mosaic from 9 images instead of 4.</p> A Complete Walkthrough of the YOLOv8 Decoding Pipeline https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/ <h1 id="yolov8解码流程完全解析">A Complete Walkthrough of the YOLOv8 Decoding Pipeline</h1> <blockquote> <p>This article analyzes in detail the prediction decoding and post-processing in the YOLOv8 object detection algorithm, including key steps such as DFL (Distribution Focal Loss) decoding and non-maximum suppression (NMS).</p></blockquote> <h2 id="目录">Contents</h2> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#1-%e9%a2%84%e6%b5%8b%e8%a7%a3%e7%a0%81%e6%b5%81%e7%a8%8b-decode_predictions">1. 
Prediction Decoding (decode_predictions)</a> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#11-%e7%bd%91%e6%a0%bc%e7%82%b9%e7%94%9f%e6%88%90">1.1 Grid Point Generation</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#12-%e7%89%b9%e5%be%81%e5%9b%be%e5%a4%84%e7%90%86">1.2 Feature Map Processing</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#13-dfl%e8%a7%a3%e7%a0%81%e5%ae%9e%e7%8e%b0">1.3 DFL Decoding Implementation</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#14-%e5%9d%90%e6%a0%87%e8%bd%ac%e6%8d%a2">1.4 Coordinate Conversion</a></li> </ul> </li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#2-%e5%90%8e%e5%a4%84%e7%90%86%e6%b5%81%e7%a8%8b-post_process">2. Post-processing (post_process)</a> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#21-%e7%bd%ae%e4%bf%a1%e5%ba%a6%e8%bf%87%e6%bb%a4">2.1 Confidence Filtering</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#22-%e5%9d%90%e6%a0%87%e6%a0%bc%e5%bc%8f%e8%bd%ac%e6%8d%a2">2.2 Box Format Conversion</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#23-%e9%9d%9e%e6%9e%81%e5%a4%a7%e5%80%bc%e6%8a%91%e5%88%b6">2.3 Non-Maximum Suppression</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#24-%e5%9d%90%e6%a0%87%e7%bc%a9%e6%94%be%e8%bf%98%e5%8e%9f">2.4 Rescaling Boxes to the Original Image</a></li> </ul> </li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#3-%e6%a0%b8%e5%bf%83%e6%96%b9%e6%b3%95%e8%a7%a3%e6%9e%90">3. 
Core Methods Explained</a> <ul> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#31-dist2bbox%e6%96%b9%e6%b3%95">3.1 The dist2bbox Method</a></li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#32-scale_boxes%e6%96%b9%e6%b3%95">3.2 The scale_boxes Method</a></li> </ul> </li> <li><a href="https://jamsylph.top/posts/yolov8%E8%A7%A3%E7%A0%81%E6%B5%81%E7%A8%8B%E5%AE%8C%E5%85%A8%E8%A7%A3%E6%9E%90/#4-%e5%ae%8c%e6%95%b4%e4%bb%a3%e7%a0%81%e5%8f%82%e8%80%83">4. Full Code Reference</a></li> </ul> <h2 id="1-预测解码流程-decode_predictions">1. Prediction Decoding (decode_predictions)</h2> <p>YOLOv8 uses an anchor-free design, and prediction decoding turns the raw network output into standard bounding boxes. The process breaks down into the following key steps:</p> <h3 id="11-网格点生成">1.1 Grid Point Generation</h3> <p>Generate the reference point coordinates and the matching stride value for every feature map:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Generate the anchor points and their strides</span>
</span></span><span class="line"><span class="cl"><span class="n">anchors</span><span class="p">,</span> <span class="n">strides</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span>
</span></span><span class="line"><span class="cl">                    <span class="bp">self</span><span class="o">.</span><span class="n">make_anchors</span><span class="p">(</span><span class="n">predictions</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">stride</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="c1"># anchors: grid point coordinates of all feature maps</span>
</span></span><span 
class="line"><span class="cl"><span class="c1"># strides: the matching stride values (8/16/32)</span>
</span></span></code></pre></div><p>This step accomplishes two things:</p> <ul> <li>Generates grid points for the three feature maps (P3/P4/P5)</li> <li>Generates the matching stride values (P3: 8, P4: 16, P5: 32)</li> </ul> <h3 id="12-特征图处理">1.2 Feature Map Processing</h3> <p>Process the predictions from the three feature-map scales together and separate the bounding-box predictions from the class predictions (the shapes in the comments assume a 640×640 input):</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Concatenate the predictions of the three feature maps</span>
</span></span><span class="line"><span class="cl"><span class="n">x_cat</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">([</span><span class="n">xi</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">nc</span> <span class="o">+</span> <span class="mi">16</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">xi</span> <span class="ow">in</span> <span class="n">predictions</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span> <span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># P3: (1, nc+64, 6400) # 80*80=6400</span>
</span></span><span class="line"><span class="cl"><span class="c1"># P4: (1, nc+64, 1600) # 40*40=1600</span>
</span></span><span class="line"><span class="cl"><span class="c1"># P5: (1, nc+64, 400) # 20*20=400</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Separate the bounding-box predictions from the class predictions</span>
</span></span><span class="line"><span class="cl"><span class="n">box</span><span class="p">,</span> <span class="bp">cls</span> <span class="o">=</span> <span class="n">x_cat</span><span 
class="o">.</span><span class="n">split</span><span class="p">((</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">nc</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># box: (1, 64, 8400) # 64=16*4, each box coordinate is encoded with 16 values</span>
</span></span><span class="line"><span class="cl"><span class="c1"># cls: (1, nc, 8400) # nc is the number of classes</span>
</span></span></code></pre></div><p>The dimensions are then unpacked in preparation for DFL decoding:</p> Understanding Mamba through MambaStock https://jamsylph.top/posts/mambastock%E8%AE%BA%E6%96%87%E5%A4%8D%E7%8E%B0%E5%8F%8A%E6%80%9D%E8%80%83/ Mon, 01 Jan 0001 00:00:00 +0000 https://jamsylph.top/posts/mambastock%E8%AE%BA%E6%96%87%E5%A4%8D%E7%8E%B0%E5%8F%8A%E6%80%9D%E8%80%83/ <blockquote> <h2 id="-更新记录">📝 Changelog</h2> <ul> <li>2024-06-16: <ul> <li>Summarized the experimental results and directions for future improvement</li> <li>Added a comprehensive comparison with other models</li> <li>Refined the technical insights and final conclusions</li> </ul> </li> <li>2024-06-15: <ul> <li>Created the MambaStock paper reproduction notes</li> <li>Analyzed the Mamba model architecture in detail</li> <li>Implemented the MambaStock design changes</li> </ul> </li> </ul></blockquote> <h1 id="从mambastock看mamba">Understanding Mamba through MambaStock</h1> <h2 id="1-mamba模型架构回顾">1. 
Review of the Mamba Architecture</h2> <p>According to the paper, MambaStock is an adaptation of the Mamba model, so let's first look at the main architecture of Mamba (the mamba.py file).</p> <h3 id="11-模型整体架构设计">1.1 Overall Architecture</h3> <p>Mamba uses a hierarchical design. From the outside in, it has three core levels: Mamba → ResidualBlock → MambaBlock</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">Mamba</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="n">MambaConfig</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">ResidualBlock</span><span class="p">(</span><span class="n">config</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">n_layers</span><span class="p">)])</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">norm_f</span> <span class="o">=</span> <span class="n">RMSNorm</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">d_model</span><span class="p">)</span>
</span></span></code></pre></div><div 
class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">ResidualBlock</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">config</span><span class="p">:</span> <span class="n">MambaConfig</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">mixer</span> <span class="o">=</span> <span class="n">MambaBlock</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">RMSNorm</span><span class="p">(</span><span class="n">config</span><span class="o">.</span><span class="n">d_model</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">MambaBlock</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">config</span><span 
class="p">:</span> <span class="n">MambaConfig</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># definitions of the various layers...</span>
</span></span></code></pre></div><ul> <li>Outermost level: the Mamba class, which builds a list (an nn.ModuleList) of n_layers ResidualBlock modules</li> <li>Middle level: each ResidualBlock wraps one MambaBlock as its core computation unit</li> <li>Innermost level: MambaBlock holds all the actual computation: projection layers, a convolution layer, and the SSM computation</li> </ul> <p>This three-level design reflects key ideas of modern deep learning architectures: <strong>modularity</strong>, <strong>residual connections</strong>, and <strong>layer normalization</strong></p> <h3 id="12-mambaconfig参数化配置的精髓">1.2 MambaConfig: The Essence of Parameterized Configuration</h3> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@dataclass</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">MambaConfig</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span>  <span class="c1"># model dimension D</span>
</span></span><span class="line"><span class="cl">    <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span>  <span class="c1"># number of layers</span>
</span></span><span class="line"><span class="cl">    <span class="n">dt_rank</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;auto&#39;</span>  <span class="c1"># rank of the Δ projection</span>
</span></span><span class="line"><span class="cl">    <span class="n">d_state</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">16</span>  <span class="c1"># state-space dimension N</span>
</span></span><span class="line"><span class="cl">    <span class="n">expand_factor</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">2</span>  <span class="c1"># expansion factor E</span> 
</span></span><span class="line"><span class="cl">    <span class="n">d_conv</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">4</span>  <span class="c1"># convolution kernel size</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># settings for initializing the Δ parameter</span>
</span></span><span class="line"><span class="cl">    <span class="n">dt_min</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.001</span>
</span></span><span class="line"><span class="cl">    <span class="n">dt_max</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span>
</span></span><span class="line"><span class="cl">    <span class="n">dt_init</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&#34;random&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">dt_scale</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">1.0</span>
</span></span><span class="line"><span class="cl">    <span class="n">dt_init_floor</span> <span class="o">=</span> <span class="mf">1e-4</span>
</span></span></code></pre></div><p>This fine-grained parameterization makes the model highly adjustable. Compared with a Transformer configuration, Mamba introduces several parameters of its own:</p>
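<p>As a rough illustration of how these settings interact, the sketch below derives two quantities from a configuration. It is a minimal sketch under stated assumptions, not the code from mamba.py: the class name <code>SketchMambaConfig</code>, the derived attribute <code>d_inner</code>, and the rule that <code>dt_rank='auto'</code> resolves to <code>ceil(d_model / 16)</code> are assumptions modeled on common Mamba reference implementations and do not appear in the excerpt above.</p>

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class SketchMambaConfig:
    """Hypothetical stand-in for MambaConfig, reduced to the fields used here."""

    d_model: int                          # model dimension D
    n_layers: int                         # number of layers
    dt_rank: Union[int, str] = "auto"     # rank of the delta projection
    d_state: int = 16                     # state-space dimension N
    expand_factor: int = 2                # expansion factor E
    d_conv: int = 4                       # convolution kernel size

    def __post_init__(self):
        # Assumption: the inner width of a block is E * D.
        self.d_inner = self.expand_factor * self.d_model
        # Assumption: 'auto' resolves to ceil(D / 16); an explicit int is kept.
        if self.dt_rank == "auto":
            self.dt_rank = math.ceil(self.d_model / 16)


cfg = SketchMambaConfig(d_model=64, n_layers=2)
print(cfg.d_inner, cfg.dt_rank)  # prints "128 4"
```

<p>Under these assumptions only <code>d_model</code> and <code>n_layers</code> have to be supplied; the remaining sizes follow from the defaults, which is one way to read the "highly adjustable" claim above.</p>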