Pytorch构建神经网络

发表于 2020-01-27 更新于 2022-02-13 分类于 pytorch 阅读次数： Disqus：

神经网络的训练过程

定义一个包含可训练参数的神经网络
迭代整个输入
通过神经网络处理输入
计算损失(loss)
反向传播梯度到神经网络的参数
更新网络的参数，典型的用一个简单的更新方法：weight = weight - learning_rate *gradient

定义神经网络

用下面代码可以构建一个典型的神经网络

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Conv2d

方法原型 torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros') 参数说明 - in_channels:输入图片的通道数 - out_channels:卷积操作后输出图片的通道数 - kernel_size:卷积核的尺寸 - stride:卷积的步长，默认为1 - padding:卷积的padding，默认为0 kernel_size, stride, padding取值可以为a single int或者a tuple of two ints，a single int表示宽和高相同。a tuple of two ints表示第1个为宽的大小，第2个为高的大小。计算输出图片的尺寸的方法：（dilation的值默认为1）

m = nn.Conv2d(16, 33, 3, stride=2)
input = torch.randn(20, 16, 50, 100)
print(m(input).size())
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
print(m(input).size())

Linear

方法原型 torch.nn.Linear(in_features, out_features, bias=True) $y = xW^T + b$ 对传入的数据进行线性变换。 参数说明 in_features: 每个输入样例的尺寸 out_features:每个输出样例的尺寸 bias:如果设置为False，则不会学习额外的bias参数

m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())

max_pool2d

最大池化 方法原型 torch.nn.functional.max_pool3d(*args, ** kwargs) 细节参考MaxPool2d torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False) 参数说明 略

m = nn.MaxPool2d(3, stride=2)
input = torch.randn(20, 16, 50, 32)
output = m(input)
print(output.size())

relu

方法原型 torch.nn.functional.relu(input, inplace=False) → Tensor relu激活函数细节参照ReLU $ReLU(x)=max(0,x)$

net.parameters()

一个模型可训练的参数可以通过调用 net.parameters() 返回

1
2
3

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

计算损失值

一个损失函数需要一对输入：模型输出和目标，nn包中提供了一些常见的损失函数。其中nn.MSELoss(均方差)是最简单的损失函数。

# 计算损失值
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

反向传播

# 反向传播前后conv1的偏置项的变化
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

更新神经网络参数

# 最简单的更新规则：随机梯度下降
# weight = weight - learning_rate * gradient

# 使用python来实现这个规则
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

# 使用内置包
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update 实现参数更新，一般放在反向传播后面