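1. Requirements
Hands-on Project 4 is implemented in three parts. This second part is a tutorial on training a cat-vs-dog image classifier in PyTorch with visualized training. It implements a complete training workflow, uses a pretrained ResNet50 model for transfer learning, and tracks the experiment with SwanLab.
Result preview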
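2. Implementation
Overall approach
- Imports and initial configuration: set the training hyperparameters (learning rate, batch size, number of epochs, and so on).
- Dataset loading: read the custom dataset and set up the data loader.
- Model construction: load the pretrained ResNet50 model and replace its fully connected layer to fit the two-class task.
- Training configuration: define the cross-entropy loss function and set up the Adam optimizer.
- Model training: loop over the epochs; within each epoch, iterate over every batch, printing the training progress and logging the loss to SwanLab as training runs.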
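2.1 Imports and initial configuration
import swanlab

num_epochs = 20
lr = 1e-4
batch_size = 8
num_classes = 2
device = "cuda"

swanlab.init(
    experiment_name="模型训练实验",  # "model training experiment"
    description="猫狗分类",  # "cat-dog classification"
    mode="local",
    config={
        "model": "resnet50",
        "optim": "Adam",
        "lr": lr,
        "batch_size": batch_size,
        "num_epochs": num_epochs,
        "num_class": num_classes,
        "device": device,
    },
)
- import swanlab: imports the SwanLab library, used for experiment tracking and visualization.
- num_epochs = 20: sets the number of training epochs to 20.
- lr = 1e-4: sets the learning rate to 0.0001.
- batch_size = 8: sets the batch size to 8.
- num_classes = 2: sets the number of classes to 2 (cat and dog).
- device = "cuda": trains on the GPU.
- swanlab.init(): initializes the SwanLab experiment and records the experiment's configuration parameters.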
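With device hard-coded to "cuda", the later model.to(device) call fails on a machine that has no CUDA-capable GPU. A minimal sketch of a fallback, assuming you would rather train slowly on the CPU than stop; pick_device is a hypothetical helper name that is not part of the original script, and the import is guarded only so that the snippet can be run on its own:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # guarded so this snippet also runs where PyTorch is absent
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string can be passed to swanlab.init(config=...) and to model.to(device) unchanged, so the recorded configuration matches the device that was actually used.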
2.2 Loading the dataset
import readDataset
from torch.utils.data import DataLoader

train_dataset = readDataset.DatasetLoader(readDataset.ds_train)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
- import readDataset: imports the custom dataset-reading module.
- from torch.utils.data import DataLoader: imports PyTorch's data loader.
- train_dataset = readDataset.DatasetLoader(readDataset.ds_train): creates the training dataset object.
- train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True): creates the data loader, setting the batch size and enabling random shuffling.
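The batch size determines how many iterations one epoch takes, which is the value len(train_loader) reports in the training log. A small sketch of that arithmetic, assuming a hypothetical dataset of 250 images (the real size depends on the custom readDataset module) and DataLoader's default drop_last=False; batches_per_epoch is an illustrative helper, not a PyTorch function:

```python
import math


def batches_per_epoch(dataset_size: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        # The incomplete final batch is discarded.
        return dataset_size // batch_size
    # Default behaviour: the incomplete final batch is kept.
    return math.ceil(dataset_size / batch_size)


dataset_size = 250  # hypothetical number of training images
batch_size = 8      # same value as in section 2.1

iterations = batches_per_epoch(dataset_size, batch_size)
last_batch_size = dataset_size - (iterations - 1) * batch_size
print(iterations, last_batch_size)
```

Under these assumptions the final batch is smaller than the others, which matters when averaging per-batch losses.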
2.3 Building the model
import torch
import torchvision
from torchvision.models import ResNet50_Weights

model = torchvision.models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
in_features = model.fc.in_features
model.fc = torch.nn.Linear(in_features, num_classes)
model.to(device)
- import torch: imports the PyTorch deep learning framework.
- import torchvision: imports the computer vision library.
- from torchvision.models import ResNet50_Weights: imports the ResNet50 pretrained weights.
- model = torchvision.models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2): loads the pretrained ResNet50 model.
- in_features = model.fc.in_features: gets the number of input features of the fully connected layer.
- model.fc = torch.nn.Linear(in_features, num_classes): replaces the final fully connected layer so that it outputs 2 classes.
- model.to(device): moves the model to the GPU device.
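To get a feel for how small the replaced layer is, the parameter count of the new head can be worked out by hand. A quick sketch, relying on the fact that torchvision's ResNet50 has model.fc.in_features equal to 2048 and that torch.nn.Linear adds one bias per output by default; linear_param_count is an illustrative helper name:

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters in a fully connected layer: one weight per input/output pair, plus biases."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases


in_features = 2048  # model.fc.in_features for torchvision's ResNet50
num_classes = 2     # cat and dog

new_head = linear_param_count(in_features, num_classes)
old_head = linear_param_count(in_features, 1000)  # original 1000-class ImageNet head
print(new_head, old_head)
```

Only the new head starts from random initialization. Every other layer keeps its pretrained weights, although the tutorial's optimizer still updates all of them because it is given model.parameters().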
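2.4 Training configuration
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
- criterion = torch.nn.CrossEntropyLoss(): defines the cross-entropy loss function, which is suited to multi-class classification problems.
- optimizer = torch.optim.Adam(model.parameters(), lr=lr): defines the Adam optimizer and sets its learning rate.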
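To make the loss values in the training log easier to read, here is a plain-Python sketch of what cross-entropy computes for a single sample: the negative log of the softmax probability assigned to the true class. torch.nn.CrossEntropyLoss takes raw logits and, by default, averages this quantity over the batch. The function names and the example logits are illustrative only:

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def batch_cross_entropy(batch_logits, labels):
    """Mean over the batch, matching the default reduction='mean'."""
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


undecided = cross_entropy([0.0, 0.0], 0)        # equal logits: loss is ln(2)
confident_right = cross_entropy([2.0, 0.0], 0)  # true class has the larger logit
confident_wrong = cross_entropy([2.0, 0.0], 1)  # true class has the smaller logit
print(undecided, confident_right, confident_wrong)
```

For two classes, a model that cannot yet tell cats from dogs sits near ln(2), roughly 0.693, so a loss curve that drops well below that value suggests the model is learning.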
2.5 Training the model
for epoch in range(num_epochs):
    model.train()
    for iter, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        print('Epoch[{}/{}],Iteration[{}/{}],Loss:{:.4f}'.format(
            epoch + 1, num_epochs, iter + 1, len(train_loader), loss.item()))
        swanlab.log({"train_loss": loss.item()})
- for epoch in range(num_epochs): the outer loop, which iterates over the training epochs.
- model.train(): puts the model in training mode.
- for iter, (inputs, labels) in enumerate(train_loader): the inner loop, which iterates over the batches of data.
- inputs, labels = inputs.to(device), labels.to(device): moves the input data and labels to the GPU.
- optimizer.zero_grad(): clears the gradients.
- outputs = model(inputs): forward pass, producing the model's predictions.
- loss = criterion(outputs, labels): computes the loss.
- loss.backward(): backward pass, computing the gradients.
- optimizer.step(): updates the model parameters.
- print(...): prints the training progress and the loss value.
- swanlab.log({"train_loss": loss.item()}): logs the loss value to the SwanLab experiment tracker.
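Per-batch losses are noisy, so it can help to also track one averaged value per epoch. The following is a minimal sketch of a running average weighted by batch size, so that a smaller final batch does not count as much as a full one. AverageMeter is a hypothetical helper, not part of PyTorch or SwanLab, and the loss values below are made up for illustration:

```python
class AverageMeter:
    """Keeps a sample-weighted running average of a scalar such as the loss."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        # value is the mean over n samples, so it contributes value * n to the sum.
        self.total += value * n
        self.count += n

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


meter = AverageMeter()
# (batch mean loss, batch size) pairs standing in for one short epoch
for loss_value, n in [(0.7, 8), (0.5, 8), (0.3, 2)]:
    meter.update(loss_value, n)
print(round(meter.average, 4))
```

In the training loop this would amount to creating a meter at the start of each epoch, calling meter.update(loss.item(), inputs.size(0)) after each batch, and logging the result once the inner loop ends, for example with swanlab.log({"train_epoch_loss": meter.average}), where the key name is an arbitrary choice.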
3. Complete code
import swanlab
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision.models import ResNet50_Weights

import readDataset

# Hyperparameters
num_epochs = 20
lr = 1e-4
batch_size = 8
num_classes = 2
device = "cuda"

# Initialize the SwanLab experiment and record the configuration
swanlab.init(
    experiment_name="模型训练实验",  # "model training experiment"
    description="猫狗分类",  # "cat-dog classification"
    mode="local",
    config={
        "model": "resnet50",
        "optim": "Adam",
        "lr": lr,
        "batch_size": batch_size,
        "num_epochs": num_epochs,
        "num_class": num_classes,
        "device": device,
    },
)

# Dataset and data loader
train_dataset = readDataset.DatasetLoader(readDataset.ds_train)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Pretrained ResNet50 with a new two-class head
model = torchvision.models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
in_features = model.fc.in_features
model.fc = torch.nn.Linear(in_features, num_classes)
model.to(device)

# Loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Training loop
for epoch in range(num_epochs):
    model.train()
    for iter, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        print('Epoch[{}/{}],Iteration[{}/{}],Loss:{:.4f}'.format(
            epoch + 1, num_epochs, iter + 1, len(train_loader), loss.item()))
        swanlab.log({"train_loss": loss.item()})
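4. Results
- PyCharm run log
- PyCharm terminal log
- SwanLab workspace
- Overview of the simulated training experiment
- Charts of the simulated training experiment
- Logs of the simulated training experiment
- Environment of the simulated training experiment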
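5. Problems and solutions
Problem 1: ModuleNotFoundError: No module named 'XXX'
Solution 1: install the missing package with pip install XXX. For SwanLab with its dashboard extra:
pip install 'swanlab[dashboard]'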
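Rather than discovering missing packages one traceback at a time, the imports can be checked up front. A small sketch using only the standard library; missing_modules is a hypothetical helper, and the list of names assumes the three third-party packages the script imports:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["torch", "torchvision", "swanlab"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

Note that readDataset is the tutorial's own module rather than a pip package, so if it is reported missing the fix is to place readDataset.py next to the training script, not to install it.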