Chapter 6: Advanced Topics
[Figure: overfitting vs. underfitting — the relationship between model complexity and generalization]
In the previous chapters we covered the basics of neural networks, common architectures, and the standard training loop. In real deep learning projects, however, that foundation alone is not enough: there are several advanced topics that often decide whether a model succeeds in practice.
This chapter explores three important advanced topics:
- Overfitting and regularization: how to keep the model from simply memorizing the training data
- Different optimizers: how to choose a suitable optimization algorithm
- Hyperparameter tuning: how to search systematically for the best hyperparameter combination
These topics may sound technical, but they are challenges every real project has to face. Mastering them will make your deep learning projects more robust and efficient.
6.1 Overfitting and Regularization: Stopping the Model from "Memorizing"
What Is Overfitting?

[Figure: an overfitting example — the model fits the training data perfectly but performs poorly on new data]
Overfitting is one of the most common problems in machine learning. Simply put, it is the phenomenon where a model performs very well on the training data but poorly on new data.
Imagine a student who memorizes the answers to every practice problem without really understanding the underlying concepts: faced with a new problem, they are lost. That is exactly what overfitting looks like.
Identifying Overfitting
Let's observe overfitting through a simple example:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.data import DataLoader, TensorDataset

# Generate some simple data
np.random.seed(42)
x = np.linspace(0, 10, 100)
y_true = 2 * x + 1 + np.random.normal(0, 0.5, 100)  # a true linear relationship plus noise

# Convert to PyTorch tensors
x_tensor = torch.FloatTensor(x).reshape(-1, 1)
y_tensor = torch.FloatTensor(y_true).reshape(-1, 1)

# Create the dataset
dataset = TensorDataset(x_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define a complex model that is likely to overfit
class OverfittingModel(nn.Module):
    def __init__(self, hidden_size=100, num_layers=5):
        super(OverfittingModel, self).__init__()
        layers = []
        input_size = 1
        for i in range(num_layers):
            layers.append(nn.Linear(input_size, hidden_size))
            layers.append(nn.ReLU())
            input_size = hidden_size
        layers.append(nn.Linear(hidden_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Train the model
model = OverfittingModel()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Record the training loss
train_losses = []

# Training loop
epochs = 1000
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg_loss = total_loss / len(dataloader)
    train_losses.append(avg_loss)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {avg_loss:.4f}')

# Visualize the results
plt.figure(figsize=(12, 4))

# Training loss curve
plt.subplot(1, 2, 1)
plt.plot(train_losses)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')

# Model predictions
plt.subplot(1, 2, 2)
model.eval()
with torch.no_grad():
    predictions = model(x_tensor)
plt.scatter(x, y_true, alpha=0.5, label='True data')
plt.plot(x, predictions.numpy(), 'r-', linewidth=2, label='Model prediction')
plt.title('Model Fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()
From this example we can see that:
- The training loss keeps decreasing and eventually approaches zero
- The predicted curve is very wiggly, as the model tries to fit every single data point
- A fit this complex is likely to perform poorly on new data; the sketch below shows one way to check this with held-out data
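One simple way to make that last point concrete is to evaluate the trained model on data it has never seen. The sketch below assumes the model, criterion, and training data defined above are still in scope; the names x_new and y_new are just illustrative.

# Draw fresh samples from the same underlying process (y = 2x + 1 plus noise)
x_new = np.linspace(0, 10, 50)
y_new = 2 * x_new + 1 + np.random.normal(0, 0.5, 50)
x_new_tensor = torch.FloatTensor(x_new).reshape(-1, 1)
y_new_tensor = torch.FloatTensor(y_new).reshape(-1, 1)

model.eval()
with torch.no_grad():
    train_mse = criterion(model(x_tensor), y_tensor).item()        # loss on the training data
    new_mse = criterion(model(x_new_tensor), y_new_tensor).item()  # loss on unseen data

print(f'Training MSE: {train_mse:.4f}, New-data MSE: {new_mse:.4f}')
# A much larger loss on the new data is the classic signature of overfitting.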
Regularization Techniques
To prevent overfitting, we use regularization techniques. Here are some of the most commonly used methods:
1. Dropout

[Figure: the Dropout mechanism — randomly "switching off" some neurons during training to prevent overfitting]
Dropout is one of the simplest and most effective regularization methods. The idea is to randomly "switch off" a fraction of the neurons during training, which prevents the network from relying too heavily on any particular neuron.
class RegularizedModel(nn.Module):
    def __init__(self, hidden_size=100, num_layers=5, dropout_rate=0.5):
        super(RegularizedModel, self).__init__()
        layers = []
        input_size = 1
        for i in range(num_layers):
            layers.append(nn.Linear(input_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))  # add Dropout
            input_size = hidden_size
        layers.append(nn.Linear(hidden_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Use the regularized model
regularized_model = RegularizedModel(dropout_rate=0.3)
criterion = nn.MSELoss()
optimizer = optim.Adam(regularized_model.parameters(), lr=0.01)

# Train the regularized model
reg_train_losses = []
epochs = 1000
for epoch in range(epochs):
    regularized_model.train()
    total_loss = 0
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = regularized_model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg_loss = total_loss / len(dataloader)
    reg_train_losses.append(avg_loss)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {avg_loss:.4f}')

# Compare the two models
plt.figure(figsize=(15, 5))

# Training loss comparison
plt.subplot(1, 3, 1)
plt.plot(train_losses, label='No regularization')
plt.plot(reg_train_losses, label='With Dropout')
plt.title('Training Loss Comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Predictions of the unregularized model
plt.subplot(1, 3, 2)
model.eval()
with torch.no_grad():
    predictions = model(x_tensor)
plt.scatter(x, y_true, alpha=0.5, label='True data')
plt.plot(x, predictions.numpy(), 'r-', linewidth=2, label='No regularization')
plt.title('Unregularized Model')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

# Predictions of the regularized model
plt.subplot(1, 3, 3)
regularized_model.eval()
with torch.no_grad():
    reg_predictions = regularized_model(x_tensor)
plt.scatter(x, y_true, alpha=0.5, label='True data')
plt.plot(x, reg_predictions.numpy(), 'g-', linewidth=2, label='With Dropout')
plt.title('Regularized Model')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()
2. L1 and L2 Regularization
L1 and L2 regularization prevent overfitting by adding a weight-penalty term to the loss function (the formulas are sketched below):
- L2 regularization (weight decay): adds the sum of squared weights to the loss
- L1 regularization: adds the sum of absolute weight values to the loss
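Written out, with L the original loss, w_i the weights, and λ a strength hyperparameter, the two penalized objectives are roughly:

\mathcal{L}_{\text{L2}} = \mathcal{L} + \lambda \sum_i w_i^2
\qquad\qquad
\mathcal{L}_{\text{L1}} = \mathcal{L} + \lambda \sum_i |w_i|

L2 shrinks all weights smoothly toward zero, while L1 tends to push some weights exactly to zero, producing sparser models.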
# L2 regularization (weight decay)
optimizer_l2 = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)

# L1 regularization has to be implemented manually
def l1_regularization(model, lambda_l1=0.01):
    l1_loss = 0
    for param in model.parameters():
        l1_loss += torch.sum(torch.abs(param))
    return lambda_l1 * l1_loss

# Using L1 regularization inside the training loop
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        # Add the L1 penalty
        l1_loss = l1_regularization(model, lambda_l1=0.01)
        total_loss_with_l1 = loss + l1_loss
        total_loss_with_l1.backward()
        optimizer.step()
        total_loss += loss.item()  # record only the original loss for reporting
3. Early Stopping
Early stopping is a simple but effective regularization method: stop training as soon as performance on the validation set starts to degrade:
def train_with_early_stopping(model, train_loader, val_loader, patience=10):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    best_val_loss = float('inf')
    patience_counter = 0
    train_losses = []
    val_losses = []

    for epoch in range(1000):
        # Training phase
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()

        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)
        train_losses.append(avg_train_loss)
        val_losses.append(avg_val_loss)

        # Early-stopping check
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            patience_counter = 0
            # Save the best model
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f'Early stopping at epoch {epoch}')
                break

        if epoch % 50 == 0:
            print(f'Epoch {epoch}: Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}')

    return train_losses, val_losses
4. Data Augmentation
For image data, data augmentation is a highly effective regularization method:
from torchvision import transforms

# Image data augmentation
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # random horizontal flip
    transforms.RandomRotation(10),                              # random rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # random translation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),       # color jitter
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# At test time, apply only the basic transforms
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
6.2 Different Optimizers: Choosing the Right "Path Down the Mountain"

[Figure: gradient descent — walking down the "slope" of the loss function in search of the minimum]
During gradient descent, the optimizer decides how we walk down the "slope" of the loss function. Different optimizers "walk" in different ways and suit different scenarios.
Stochastic Gradient Descent (SGD)

[Figure: SGD with momentum — accumulating momentum along the gradient direction to reduce oscillation]
SGD is the most basic optimizer: it updates the parameters directly with the gradient:
# Basic SGD
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01)

# SGD variant: SGD with momentum
optimizer_sgd_momentum = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# SGD variant: SGD with momentum and weight decay
optimizer_sgd_full = optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001
)
Characteristics of SGD:
- Simple and direct, easy to understand
- Can converge slowly in some situations
- Requires careful tuning of the learning rate
- SGD with momentum usually performs better than plain SGD (the update rule is sketched below)
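For reference, PyTorch's SGD-with-momentum update can be written roughly as follows, with learning rate η, momentum coefficient μ, and gradient g_t:

v_{t+1} = \mu\, v_t + g_t
\qquad\qquad
w_{t+1} = w_t - \eta\, v_{t+1}

With μ = 0 this reduces to plain SGD; a typical value such as μ = 0.9 lets past gradients accumulate and damps the oscillations.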
The Adam Optimizer
[Figure: the Adam optimizer — combining the advantages of momentum and adaptive learning rates]
Adam is currently one of the most popular optimizers; it combines the advantages of momentum and adaptive learning rates:
# Basic Adam
optimizer_adam = optim.Adam(model.parameters(), lr=0.001)

# Adam with weight decay
optimizer_adam_wd = optim.Adam(
    model.parameters(), lr=0.001, weight_decay=0.0001
)

# AdamW (a variant of Adam with better-behaved weight decay)
optimizer_adamw = optim.AdamW(
    model.parameters(), lr=0.001, weight_decay=0.01
)
Characteristics of Adam:
- Adaptive learning rates, so usually little manual tuning is required
- Fast convergence
- Relatively insensitive to its hyperparameters
- Performs well in most situations (the update equations are sketched below)
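In its standard formulation, Adam keeps exponential moving averages of the gradient and of its square, corrects their initialization bias, and scales the step by the square root of the second moment. A rough sketch of the update for a weight w with gradient g_t:

m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t
\qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2

\hat{m}_t = \frac{m_t}{1-\beta_1^t}
\qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}
\qquad
w_{t+1} = w_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

The PyTorch defaults β1 = 0.9, β2 = 0.999, and eps = 1e-8 correspond to the betas and eps arguments of optim.Adam.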
RMSprop
RMSprop is another adaptive-learning-rate optimizer:
optimizer_rmsprop = optim.RMSprop(
    model.parameters(),
    lr=0.001,
    alpha=0.99,  # decay rate of the moving average
    eps=1e-08    # numerical-stability constant
)
An Optimizer Comparison Experiment
Let's compare the performance of different optimizers in a small experiment:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, TensorDataset
import numpy as np

# Generate some test data
np.random.seed(42)
x = np.random.randn(1000, 10)
y = np.sum(x * np.random.randn(10), axis=1) + np.random.normal(0, 0.1, 1000)

x_tensor = torch.FloatTensor(x)
y_tensor = torch.FloatTensor(y).reshape(-1, 1)
dataset = TensorDataset(x_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define a simple model; hidden_size and dropout_rate are exposed here so the
# hyperparameter searches in Section 6.3 can reuse this class
class SimpleModel(nn.Module):
    def __init__(self, hidden_size=50, dropout_rate=0.0):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(10, hidden_size)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Define the optimizers to compare; each entry is a factory that builds the
# optimizer with a typical learning rate for it
optimizers = {
    'SGD': lambda params: optim.SGD(params, lr=0.01),
    'SGD with Momentum': lambda params: optim.SGD(params, lr=0.01, momentum=0.9),
    'Adam': lambda params: optim.Adam(params, lr=0.001),
    'RMSprop': lambda params: optim.RMSprop(params, lr=0.001),
    'AdamW': lambda params: optim.AdamW(params, lr=0.001),
}

# Training function
def train_model(optimizer_factory, model, dataloader, epochs=100):
    optimizer = optimizer_factory(model.parameters())
    criterion = nn.MSELoss()
    losses = []
    for epoch in range(epochs):
        model.train()
        total_loss = 0
        for batch_x, batch_y in dataloader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        avg_loss = total_loss / len(dataloader)
        losses.append(avg_loss)
    return losses

# Compare the different optimizers
results = {}
for name, optimizer_factory in optimizers.items():
    print(f"Training {name}...")
    model = SimpleModel()
    losses = train_model(optimizer_factory, model, dataloader)
    results[name] = losses

# Visualize the results
plt.figure(figsize=(12, 8))
for name, losses in results.items():
    plt.plot(losses, label=name, linewidth=2)
plt.title('Convergence Speed of Different Optimizers')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.yscale('log')  # a log scale shows the differences more clearly
plt.show()

# Print the final losses
print("\nFinal loss values:")
for name, losses in results.items():
    print(f"{name}: {losses[-1]:.6f}")
How to Choose an Optimizer?
- Adam: the default choice in most cases, especially for deep learning tasks
- SGD + Momentum: when you need finer control, or on tasks where it is known to perform better
- AdamW: usually preferable to Adam when you use weight decay
- RMSprop: can perform better in some specific scenarios
In practice it also helps not to hard-code the choice; a small helper like the one sketched below lets you switch optimizers from a config.
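A minimal sketch of such a helper, using only optimizers from torch.optim; the function name build_optimizer and its arguments are illustrative rather than a standard API:

import torch.optim as optim

def build_optimizer(name, params, lr, weight_decay=0.0):
    """Build an optimizer from a config string so experiments can switch easily."""
    if name == 'sgd':
        return optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    if name == 'adam':
        return optim.Adam(params, lr=lr, weight_decay=weight_decay)
    if name == 'adamw':
        return optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    if name == 'rmsprop':
        return optim.RMSprop(params, lr=lr, weight_decay=weight_decay)
    raise ValueError(f'Unknown optimizer: {name}')

# Usage: optimizer = build_optimizer('adamw', model.parameters(), lr=1e-3, weight_decay=0.01)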
6.3 Hyperparameter Tuning: Finding the Best Configuration
[Figure: the hyperparameter search space — searching for the optimal configuration across multiple dimensions]
Hyperparameters are the settings that cannot be learned automatically during training, such as the learning rate, the number of layers, and the number of neurons per layer. Choosing good hyperparameters is critical to model performance.
Common Hyperparameters
- Learning rate: one of the most important hyperparameters
- Batch size: affects training stability and memory usage
- Network architecture: number of layers, neurons per layer
- Regularization parameters: dropout rate, weight-decay coefficient
- Optimizer parameters: momentum, β1, β2, and so on
Grid Search
Grid search is the simplest method, but also the most computationally expensive:
import itertools

def grid_search_hyperparameters():
    # Define the hyperparameter grid
    param_grid = {
        'learning_rate': [0.001, 0.01, 0.1],
        'hidden_size': [50, 100, 200],
        'dropout_rate': [0.1, 0.3, 0.5],
        'batch_size': [16, 32, 64]
    }

    # Generate all combinations
    param_combinations = [
        dict(zip(param_grid.keys(), v))
        for v in itertools.product(*param_grid.values())
    ]

    best_score = float('inf')
    best_params = None
    results = []

    for params in param_combinations:
        print(f"Testing parameters: {params}")

        # Build the model and data loader
        model = SimpleModel(hidden_size=params['hidden_size'],
                            dropout_rate=params['dropout_rate'])
        train_loader = DataLoader(dataset, batch_size=params['batch_size'], shuffle=True)
        optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
        criterion = nn.MSELoss()

        # A short training run (only a few epochs, for demonstration)
        for epoch in range(10):
            model.train()
            total_loss = 0
            for batch_x, batch_y in train_loader:
                optimizer.zero_grad()
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()

        final_loss = total_loss / len(train_loader)
        results.append((params, final_loss))

        if final_loss < best_score:
            best_score = final_loss
            best_params = params

    return best_params, best_score, results

# Run the grid search
best_params, best_score, all_results = grid_search_hyperparameters()
print(f"\nBest parameters: {best_params}")
print(f"Best score: {best_score:.6f}")
Random Search
Random search is usually more efficient than grid search:
import random

def random_search_hyperparameters(n_trials=20):
    best_score = float('inf')
    best_params = None
    results = []

    for trial in range(n_trials):
        # Randomly sample a hyperparameter configuration
        params = {
            'learning_rate': random.choice([0.001, 0.01, 0.1]),
            'hidden_size': random.choice([50, 100, 200, 300]),
            'dropout_rate': random.uniform(0.1, 0.6),
            'batch_size': random.choice([16, 32, 64, 128])
        }
        print(f"Trial {trial + 1}: {params}")

        # Train and evaluate (simplified)
        model = SimpleModel(hidden_size=int(params['hidden_size']),
                            dropout_rate=params['dropout_rate'])
        train_loader = DataLoader(dataset, batch_size=int(params['batch_size']), shuffle=True)
        optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
        criterion = nn.MSELoss()

        # Training
        for epoch in range(10):
            model.train()
            total_loss = 0
            for batch_x, batch_y in train_loader:
                optimizer.zero_grad()
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()

        final_loss = total_loss / len(train_loader)
        results.append((params, final_loss))

        if final_loss < best_score:
            best_score = final_loss
            best_params = params

    return best_params, best_score, results
Learning Rate Scheduling
Learning-rate scheduling is another important tuning technique:
# Learning rate schedulers
from torch.optim.lr_scheduler import StepLR, ExponentialLR, ReduceLROnPlateau

# StepLR: reduce the learning rate every fixed number of steps
scheduler_step = StepLR(optimizer, step_size=30, gamma=0.1)

# ExponentialLR: decay the learning rate exponentially
scheduler_exp = ExponentialLR(optimizer, gamma=0.95)

# ReduceLROnPlateau: reduce the learning rate when the validation loss stops improving
scheduler_plateau = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)

# Inside the training loop
for epoch in range(epochs):
    # ... training code ...

    # Update the learning rate
    scheduler_step.step()  # or scheduler_exp.step()
    # For ReduceLROnPlateau, pass in the validation loss instead:
    # scheduler_plateau.step(val_loss)
Best Practices for Hyperparameter Tuning
- Start small: search a narrow range first, then widen it once you know roughly where to look
- Use a validation set: make sure the chosen hyperparameters also work on data the model has not seen
- Log every experiment: record the parameters and results of each run so you never repeat work (a minimal sketch follows this list)
- Mind the compute budget: with limited resources, tune the most important hyperparameters first
- Use cross-validation: for small datasets, cross-validation gives a more reliable evaluation
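Here is a minimal sketch of that logging habit, using only the Python standard library; the file name experiments.jsonl and the helper name log_experiment are just illustrative:

import json
import time

def log_experiment(params, metrics, path='experiments.jsonl'):
    """Append one experiment record (parameters plus results) as a JSON line."""
    record = {'timestamp': time.time(), 'params': params, 'metrics': metrics}
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')

# Example usage inside a search loop:
# log_experiment({'lr': 0.001, 'hidden_size': 100}, {'val_loss': 0.0123})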
6.4 In Practice: A Complete Model-Tuning Workflow
Let's put these advanced techniques together in a complete example:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
import matplotlib.pyplot as plt
import numpy as np

# Generate more complex data
np.random.seed(42)
x = np.random.randn(2000, 20)
y = np.sum(x * np.random.randn(20), axis=1) + np.random.normal(0, 0.1, 2000)

x_tensor = torch.FloatTensor(x)
y_tensor = torch.FloatTensor(y).reshape(-1, 1)

# Split the dataset
dataset = TensorDataset(x_tensor, y_tensor)
train_size = int(0.7 * len(dataset))
val_size = int(0.15 * len(dataset))
test_size = len(dataset) - train_size - val_size
train_dataset, val_dataset, test_dataset = random_split(
    dataset, [train_size, val_size, test_size]
)

# Define the full model class
class AdvancedModel(nn.Module):
    def __init__(self, input_size=20, hidden_sizes=[100, 50], dropout_rate=0.3):
        super(AdvancedModel, self).__init__()
        layers = []
        prev_size = input_size
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            prev_size = hidden_size
        layers.append(nn.Linear(prev_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Training function
def train_model_with_regularization(model, train_loader, val_loader, optimizer, scheduler, epochs=100):
    criterion = nn.MSELoss()
    train_losses = []
    val_losses = []
    best_val_loss = float('inf')
    patience = 15
    patience_counter = 0

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()

        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)
        train_losses.append(avg_train_loss)
        val_losses.append(avg_val_loss)

        # Learning-rate scheduling
        if isinstance(scheduler, optim.lr_scheduler.ReduceLROnPlateau):
            scheduler.step(avg_val_loss)
        else:
            scheduler.step()

        # Early stopping
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            patience_counter = 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f'Early stopping at epoch {epoch}')
                break

        if epoch % 20 == 0:
            print(f'Epoch {epoch}: Train Loss: {avg_train_loss:.6f}, '
                  f'Val Loss: {avg_val_loss:.6f}, LR: {optimizer.param_groups[0]["lr"]:.6f}')

    return train_losses, val_losses

# Hyperparameter optimization
def optimize_hyperparameters():
    best_score = float('inf')
    best_params = None

    # Define the search space
    param_combinations = [
        {'lr': 0.001, 'hidden_sizes': [100, 50], 'dropout_rate': 0.3, 'batch_size': 32},
        {'lr': 0.01, 'hidden_sizes': [200, 100], 'dropout_rate': 0.2, 'batch_size': 64},
        {'lr': 0.001, 'hidden_sizes': [150, 75], 'dropout_rate': 0.4, 'batch_size': 32},
        {'lr': 0.005, 'hidden_sizes': [100, 100, 50], 'dropout_rate': 0.3, 'batch_size': 32},
    ]

    for params in param_combinations:
        print(f"\nTesting parameters: {params}")

        # Create the data loaders
        train_loader = DataLoader(train_dataset, batch_size=params['batch_size'], shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=params['batch_size'])

        # Create the model
        model = AdvancedModel(
            hidden_sizes=params['hidden_sizes'],
            dropout_rate=params['dropout_rate']
        )

        # Create the optimizer and scheduler
        optimizer = optim.Adam(model.parameters(), lr=params['lr'])
        scheduler = optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, mode='min', factor=0.5, patience=10
        )

        # Train the model
        train_losses, val_losses = train_model_with_regularization(
            model, train_loader, val_loader, optimizer, scheduler, epochs=50
        )

        # Evaluate
        final_val_loss = val_losses[-1]
        if final_val_loss < best_score:
            best_score = final_val_loss
            best_params = params

    return best_params, best_score

# Run the optimization
print("Starting hyperparameter optimization...")
best_params, best_score = optimize_hyperparameters()
print(f"\nBest parameters: {best_params}")
print(f"Best validation loss: {best_score:.6f}")

# Train the final model with the best parameters
print("\nTraining the final model with the best parameters...")
train_loader = DataLoader(train_dataset, batch_size=best_params['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=best_params['batch_size'])
test_loader = DataLoader(test_dataset, batch_size=best_params['batch_size'])

final_model = AdvancedModel(
    hidden_sizes=best_params['hidden_sizes'],
    dropout_rate=best_params['dropout_rate']
)
optimizer = optim.Adam(final_model.parameters(), lr=best_params['lr'])
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=10
)

train_losses, val_losses = train_model_with_regularization(
    final_model, train_loader, val_loader, optimizer, scheduler, epochs=100
)

# Final evaluation
final_model.eval()
test_loss = 0
criterion = nn.MSELoss()

with torch.no_grad():
    for batch_x, batch_y in test_loader:
        outputs = final_model(batch_x)
        loss = criterion(outputs, batch_y)
        test_loss += loss.item()

test_loss /= len(test_loader)
print(f"\nFinal test loss: {test_loss:.6f}")

# Visualize the training process
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(train_losses, label='Training loss', alpha=0.7)
plt.plot(val_losses, label='Validation loss', alpha=0.7)
plt.title('Training and Validation Loss (log scale)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.yscale('log')
plt.tight_layout()
plt.show()
Summary
In this chapter we took a closer look at three important advanced topics in deep learning:
- Overfitting and regularization:
  - What overfitting is and why it hurts generalization
  - Several regularization techniques: Dropout, L1/L2 regularization, early stopping, and data augmentation
  - How to recognize and prevent overfitting
- Different optimizers:
  - A comparison of the main optimizers: SGD, Adam, RMSprop, and others
  - The characteristics of each optimizer and when to use it
  - How to choose a suitable optimizer
- Hyperparameter tuning:
  - Search methods such as grid search and random search
  - Learning-rate scheduling techniques
  - Best practices for hyperparameter tuning
These advanced topics are indispensable in real deep learning projects. Mastering them will make your models more robust and efficient, and help them perform better in practice.