Computer Vision Lab - Trying Out Convolutional Neural Networks with PyTorch

edwin99
2024-02-04 01:23

Convolutional neural networks: finding specific objects in images.

 

Setup:

import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt
from torchinfo import summary
import numpy as np

from pytorchcv import load_mnist, train, plot_results, plot_convolution, display_dataset

load_mnist(batch_size=128)

Convolutional filters: small sliding windows that move over every pixel and compute a weighted sum of the neighboring pixels, extracting the patterns that make up objects; each filter is defined by a matrix of weight coefficients.

 

Two filters useful for MNIST digits (vertical and horizontal edge filters):

plot_convolution(torch.tensor([[-1.,0.,1.],[-1.,0.,1.],[-1.,0.,1.]]), 'Vertical edge filter')
plot_convolution(torch.tensor([[-1.,-1.,-1.],[0.,0.,0.],[1.,1.,1.]]), 'Horizontal edge filter')

 

Vertical edge filter (the first one): produces zero over uniform pixel regions and high values wherever it crosses a vertical edge, while horizontal edges are averaged out.

Horizontal edge filter (the second one): the opposite, it highlights horizontal lines, while vertical edges get averaged out.
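To see this effect without the plotting helper, here is a minimal sketch (not from the original notebook) that applies the vertical edge filter to a tiny synthetic image with torch.nn.functional.conv2d:

import torch
import torch.nn.functional as F

# Vertical edge filter from above, shaped (out_channels, in_channels, H, W)
kernel = torch.tensor([[-1., 0., 1.],
                       [-1., 0., 1.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)

# Toy 5x5 image: dark on the left, bright on the right (one vertical edge)
img = torch.tensor([0., 0., 1., 1., 1.]).repeat(5).reshape(1, 1, 5, 5)

out = F.conv2d(img, kernel)   # valid convolution, output shape 1x1x3x3
print(out)                    # non-zero around the left-right transition, 0 over uniform areas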

Traditional CV vs. deep learning:

  1. Traditional: filters are designed by hand (often many of them), then features are extracted, then a separate classifier is trained

  2. Deep learning: the network learns the optimal filters itself and is trained end to end for classification

 

Convolutional layer: nn.Conv2d; three main parameters (in_channels=1 for a single grayscale input channel, out_channels=9 to apply 9 filters, kernel_size=5 for a 5x5 sliding window).

Input/output size: a 28x28 input produces a 9x24x24 output tensor after the convolution (since 28 - 5 + 1 = 24).
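A quick way to confirm that arithmetic, as a minimal sketch with a standalone Conv2d layer:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=9, kernel_size=5)
x = torch.randn(1, 1, 28, 28)   # one grayscale 28x28 image
print(conv(x).shape)            # torch.Size([1, 9, 24, 24]), since 28 - 5 + 1 = 24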

Network structure: the convolution output is flattened into a 9x24x24 = 5184-dimensional vector, which feeds a linear layer with 10 class outputs; ReLU activation sits in between.

Summary: the number of filters determines how many features are extracted, while the kernel size affects (but does not by itself determine) the output spatial size.

Code:

class OneConv(nn.Module):
    def __init__(self):
        super(OneConv, self).__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=9, kernel_size=(5, 5))
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(5184, 10)

    def forward(self, x):
        x = nn.functional.relu(self.conv(x))
        x = self.flatten(x)
        x = nn.functional.log_softmax(self.fc(x), dim=1)
        return x

net = OneConv()

summary(net, input_size=(1, 1, 28, 28))

Output:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Conv2d: 1-1                            [1, 9, 24, 24]            234
├─Flatten: 1-2                           [1, 5184]                 --
├─Linear: 1-3                            [1, 10]                   51,850
==========================================================================================
Total params: 52,084
Trainable params: 52,084
Non-trainable params: 0
Total mult-adds (M): 0.18
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.04
Params size (MB): 0.21
Estimated Total Size (MB): 0.25
==========================================================================================

 

Result: the network has about 50k trainable parameters, versus roughly 80k for a fully connected multi-layer network, so it can reach good results even on smaller datasets.

Training and plotting:

hist = train(net,train_loader,test_loader,epochs=5)

plot_results(hist)

Result:

Epoch 0, Train acc=0.947, Val acc=0.969, Train loss=0.001, Val loss=0.001

Epoch 1, Train acc=0.979, Val acc=0.975, Train loss=0.001, Val loss=0.001

Epoch 2, Train acc=0.985, Val acc=0.977, Train loss=0.000, Val loss=0.001

Epoch 3, Train acc=0.988, Val acc=0.975, Train loss=0.000, Val loss=0.001

Epoch 4, Train acc=0.988, Val acc=0.976, Train loss=0.000, Val loss=0.001

Visualizing the weights of the trained convolutional layer:

fig, ax = plt.subplots(1, 9)
with torch.no_grad():
    p = next(net.conv.parameters())
    for i, x in enumerate(p):
        ax[i].imshow(x.detach().cpu()[0, ...])
        ax[i].axis('off')

Convolutional layers detect patterns in the image. The process: early layers detect simple patterns (lines and strokes), later layers combine them into more complex features (shapes), and the final layers classify the object.

 

Pooling layers: shrink the spatial size of the image, reduce computation, and help prevent overfitting. Two kinds (see the sketch below):

  1. Average pooling: a sliding window (e.g. 2x2 pixels) is replaced by the average of its values

  2. Max pooling: the window is replaced by its maximum value, which keeps the strongest pattern response
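A minimal sketch (not in the original notes) showing how both pooling layers halve the spatial dimensions of a feature map:

import torch
import torch.nn as nn

x = torch.randn(1, 9, 24, 24)        # e.g. the feature maps from the conv layer above
print(nn.MaxPool2d(2)(x).shape)      # torch.Size([1, 9, 12, 12]) - max of each 2x2 window
print(nn.AvgPool2d(2)(x).shape)      # torch.Size([1, 9, 12, 12]) - mean of each 2x2 window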

CNN architecture: layering (pooling layers interleaved between several convolutional layers), feature maps (spatial size shrinks step by step while the number of filters grows), output (a fully connected layer plus an activation produces the classification).

 

Pyramid architecture: spatial size decreases layer by layer while the number of filters increases, which extracts features for classification efficiently.

Breakdown: 1x28x28 input image → first convolutional layer → second convolutional layer → output (fully connected output layer, with LogSoftmax over the 10 classes); the shapes are traced below.

(Each convolutional layer uses a 5x5 kernel and ReLU activation followed by 2x2 max pooling; the first layer has 10 filters, the second 20.)
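Tracing the shapes explains where the 320 input features of the final linear layer come from. A minimal sketch, assuming the layer sizes of the MultiLayerCNN defined below:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)
x = nn.MaxPool2d(2)(nn.functional.relu(nn.Conv2d(1, 10, 5)(x)))   # -> [1, 10, 12, 12]
x = nn.MaxPool2d(2)(nn.functional.relu(nn.Conv2d(10, 20, 5)(x)))  # -> [1, 20, 4, 4]
print(x.shape, 20 * 4 * 4)   # torch.Size([1, 20, 4, 4]) 320 -> hence nn.Linear(320, 10)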

Code:

class MultiLayerCNN(nn.Module):
    def __init__(self):
        super(MultiLayerCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 320)
        x = nn.functional.log_softmax(self.fc(x), dim=1)
        return x

net = MultiLayerCNN()
summary(net, input_size=(1, 1, 28, 28))

Output:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
├─Conv2d: 1-1                            [1, 10, 24, 24]           260
├─MaxPool2d: 1-2                         [1, 10, 12, 12]           --
├─Conv2d: 1-3                            [1, 20, 8, 8]             5,020
├─MaxPool2d: 1-4                         [1, 20, 4, 4]             --
├─Linear: 1-5                            [1, 10]                   3,210
==========================================================================================
Total params: 8,490
Trainable params: 8,490
Non-trainable params: 0
Total mult-adds (M): 0.47
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.03
Estimated Total Size (MB): 0.09
==========================================================================================

 

Flatten: the tensor is flattened with view inside forward, so no separate Flatten layer is needed.

Pooling: max pooling has no parameters, so one instance can be reused after both convolutions.

Parameters: only ~8.5k in total, which helps against overfitting (a quick check below).
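A minimal sketch (not in the original notes) counting the trainable parameters directly, to match the figure reported by torchinfo:

# Sum of all trainable parameter elements in the MultiLayerCNN defined above
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_params)   # 8490: conv1 (260) + conv2 (5,020) + fc (3,210)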

 

Code:

hist = train(net,train_loader,test_loader,epochs=5)

 

Result: compared with the single-convolution network, accuracy is higher and is reached faster, within just 1-2 epochs.

Epoch 0, Train acc=0.952, Val acc=0.977, Train loss=0.001, Val loss=0.001

Epoch 1, Train acc=0.982, Val acc=0.983, Train loss=0.000, Val loss=0.000

Epoch 2, Train acc=0.986, Val acc=0.983, Train loss=0.000, Val loss=0.000

Epoch 3, Train acc=0.986, Val acc=0.978, Train loss=0.000, Val loss=0.001

Epoch 4, Train acc=0.987, Val acc=0.981, Train loss=0.000, Val loss=0.000

 

 

Using the CIFAR-10 dataset: 60,000 images (32x32, 10 classes: plane, car, bird, cat, deer, dog, frog, horse, ship, truck), more complex than handwritten digits. A classic architecture for it: LeNet.

transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=14, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=14, shuffle=False)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

display_dataset(trainset, classes=classes)

 

LeNet architecture: takes RGB input; the output layer does not apply log_softmax and instead returns the raw fully connected outputs, so the model is optimized with the CrossEntropyLoss loss function.
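Why that works: CrossEntropyLoss applies log_softmax internally, so feeding it raw logits is equivalent to log_softmax followed by NLLLoss. A minimal sketch (not from the original notes):

import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # raw network outputs for a batch of 4, 10 classes
targets = torch.tensor([3, 1, 0, 7])   # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.functional.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))         # True: CrossEntropyLoss = log_softmax + NLLLoss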

 

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.conv3 = nn.Conv2d(16, 120, 5)
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(120, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = nn.functional.relu(self.conv3(x))
        x = self.flat(x)
        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = LeNet()

 

summary(net,input_size=(1,3,32,32))

opt = torch.optim.SGD(net.parameters(),lr=0.001,momentum=0.9)

hist = train(net, trainloader, testloader, epochs=3, optimizer=opt, loss_fn=nn.CrossEntropyLoss())

Output: after 3 epochs, about 50% accuracy is already a reasonable result for this harder task.

Epoch 0, Train acc=0.261, Val acc=0.388, Train loss=0.143, Val loss=0.121
Epoch 1, Train acc=0.437, Val acc=0.491, Train loss=0.110, Val loss=0.101
Epoch 2, Train acc=0.508, Val acc=0.522, Train loss=0.097, Val loss=0.094

 

Further reading:

  1. Pet face recognition lab

  2. Trying out convolutional neural networks with PyTorch

  3. OpenCV lab

  4. Palm movement detection with optical flow

 

 

 

 
