Transfer Learning - Practical Transfer Learning with TensorFlow

edwin99
2024-02-05 23:09

import tensorflow as tf

from tensorflow import keras

import matplotlib.pyplot as plt

import numpy as np

import os

from tfcv import *

 

if not os.path.exists('data/kagglecatsanddogs_5340.zip'):
    !wget -P data https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip

import zipfile

if not os.path.exists('data/PetImages'):
    with zipfile.ZipFile('data/kagglecatsanddogs_5340.zip', 'r') as zip_ref:
        zip_ref.extractall('data')

# check_image_dir comes from the tfcv helper module imported above
check_image_dir('data/PetImages/Cat/*.jpg')
check_image_dir('data/PetImages/Dog/*.jpg')

 

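Loading the dataset: the image dataset is too large to load into memory all at once, so use image_dataset_from_directory(), which loads the data in batches, resizes the images, and splits the dataset into training and validation subsets.

Code: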

data_dir = 'data/PetImages'
batch_size = 64

ds_train = keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split = 0.2,
    subset = 'training',
    seed = 13,
    image_size = (224,224),
    batch_size = batch_size)

ds_test = keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split = 0.2,
    subset = 'validation',
    seed = 13,
    image_size = (224,224),
    batch_size = batch_size)

Output:

Found 24769 files belonging to 2 classes.

Using 19816 files for training.

Found 24769 files belonging to 2 classes.

Using 4953 files for validation.
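The counts above can be reproduced by hand. A minimal sketch, assuming the validation set is int(validation_split * total) files and the training set is the remainder (this matches the numbers printed here; it is not taken from the Keras source):

```python
# Reproduce the split sizes reported by image_dataset_from_directory.
total_files = 24769        # "Found 24769 files belonging to 2 classes."
validation_split = 0.2

# Assumption: the validation count is truncated to an integer.
num_val = int(validation_split * total_files)
num_train = total_files - num_val

print(num_train, num_val)  # 19816 4953
```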

 

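Note: passing the same seed to both calls produces the same random shuffle, which guarantees that the training and validation sets do not overlap. Class names are taken from the directory names and sorted automatically, so the label indices follow that order and match the order of the model's outputs.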
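Why the shared seed matters can be shown with a toy split. This is a sketch of the idea only, not Keras's actual implementation; split_indices is a hypothetical helper that shuffles indices with a seeded generator and slices the result.

```python
import random

def split_indices(n, validation_split, subset, seed):
    """Hypothetical helper: shuffle 0..n-1 with a seeded RNG, then slice."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # same seed -> same order
    cut = n - int(validation_split * n)
    return indices[:cut] if subset == 'training' else indices[cut:]

train = split_indices(100, 0.2, 'training', seed=13)
val = split_indices(100, 0.2, 'validation', seed=13)

print(len(train), len(val))           # 80 20
print(set(train).isdisjoint(val))     # True
```

With two different seeds the two calls would slice two different orderings, and the same index could land in both subsets.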

 

 

ds_train.class_names

Output:

['Cat', 'Dog']

Both datasets (ds_train and ds_test) can be passed directly to fit() to train a model.

 

for x,y in ds_train:
    print(f"Training batch shape: features={x.shape}, labels={y.shape}")
    x_sample, y_sample = x,y
    break

# np.int was removed in NumPy 1.24; use a concrete integer type for display
display_dataset(x_sample.numpy().astype(np.int32),np.expand_dims(y_sample,1),classes=ds_train.class_names)

 

 

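Pre-trained models: each dataset yields batches of images with their corresponding labels, which is why it can be iterated over as above or passed straight to fit(). Before training anything, try a pre-trained network: the code below loads VGG16 (Keras uses ImageNet weights by default) and classifies one image from the sample batch.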

 

vgg = keras.applications.VGG16()

inp = keras.applications.vgg16.preprocess_input(x_sample[:1])

 

res = vgg(inp)

print(f"Most probable class = {tf.argmax(res,1)}")

 

keras.applications.vgg16.decode_predictions(res.numpy())

Result:

[[('n02099712', 'Labrador_retriever', 0.5340957),

('n02100236', 'German_short-haired_pointer', 0.0939442),

('n02092339', 'Weimaraner', 0.08160535),

('n02099849', 'Chesapeake_Bay_retriever', 0.057179328),

('n02109047', 'Great_Dane', 0.03733857)]]

 

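Notes:

  1. Before passing data to a pre-trained network, preprocess it by calling preprocess_input().

  2. For VGG16, this preprocessing normalizes the image by subtracting a predefined mean from each channel.

  3. After the network processes the input batch, it outputs a 1000-element tensor per image (each element is the probability of one class); call argmax() to find the index of the most probable class.

  4. Use decode_predictions() to convert the predictions into class names.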
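Steps 2 and 3 can be illustrated without TensorFlow. The sketch below assumes the per-channel means 103.939, 116.779, 123.68 (in BGR order) that Keras uses for VGG16 in its default "caffe" preprocessing mode, which also reorders RGB to BGR. preprocess_pixel and argmax are hypothetical helper names; the real functions operate on whole tensors.

```python
# ImageNet channel means in BGR order (assumed values, see the text above).
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)

def preprocess_pixel(rgb):
    """Reorder one RGB pixel to BGR and subtract the channel means."""
    r, g, b = rgb
    return tuple(v - m for v, m in zip((b, g, r), IMAGENET_BGR_MEAN))

def argmax(values):
    """Return the index of the largest value, like tf.argmax on one row."""
    return max(range(len(values)), key=lambda i: values[i])

# A pixel equal to the mean color maps to zero in every channel.
print(preprocess_pixel((123.68, 116.779, 103.939)))  # (0.0, 0.0, 0.0)

# Toy 4-class probability vector standing in for the 1000-element output.
probs = [0.05, 0.10, 0.80, 0.05]
print(argmax(probs))  # 2
```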

vgg.summary()

 

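Keras uses a GPU automatically when TensorFlow detects one, so no code changes are needed to speed up computation. To check which GPUs are visible:

tf.config.list_physical_devices('GPU')

Output: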

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

 

 

Extracting VGG features:

 

vgg = keras.applications.VGG16(include_top=False)

 

inp = keras.applications.vgg16.preprocess_input(x_sample[:1])

res = vgg(inp)
print(f"Shape after applying VGG-16: {res[0].shape}")

plt.figure(figsize=(15,3))

plt.imshow(res[0].numpy().reshape(-1,512))

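The feature tensor has shape 7x7x512. The map() method takes a dataset and applies a lambda function to each element, producing the new datasets ds_features_train and ds_features_test; these hold the features extracted by VGG rather than the original images.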

 

num = batch_size*50
ds_features_train = ds_train.take(50).map(lambda x,y : (vgg(x),y))
ds_features_test = ds_test.take(10).map(lambda x,y : (vgg(x),y))

for x,y in ds_features_train:
    print(x.shape,y.shape)
    break

Output:

(64, 7, 7, 512) (64,)
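The 7x7x512 shape follows from the VGG16 architecture: five 2x2 max-pooling layers each halve the spatial size, and the last convolutional block has 512 channels. A small sketch of that arithmetic, together with the number of images that take(50) keeps:

```python
input_size = 224
num_pool_layers = 5      # VGG16 has five 2x2 max-pooling layers
last_block_channels = 512

size = input_size
for _ in range(num_pool_layers):
    size //= 2           # each pooling layer halves height and width

feature_shape = (size, size, last_block_channels)
print(feature_shape)     # (7, 7, 512)

# reshape(-1, 512), used for the plot, turns this into 49 rows of 512 values.
print(size * size)       # 49

batch_size = 64
print(batch_size * 50)   # 3200 training images kept by take(50)
```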

 

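Training a classifier on the pre-trained features:

  1. .take(50) limits the sample size to 50 batches; each feature tensor has shape (7, 7, 512), the intermediate output of the pre-trained model.

  2. Classifier design: the (7, 7, 512) input is flattened into a one-dimensional vector (7x7x512 = 25088), followed by a single output neuron (binary classification: cat vs. dog) with a sigmoid activation (which outputs a probability), trained with the binary cross-entropy loss.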

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(7,7,512)),
    keras.layers.Dense(1,activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
hist = model.fit(ds_features_train, validation_data=ds_features_test)
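To make the classifier design concrete, here is a plain-Python sketch of what the single output neuron computes and how the loss is measured. The helper names are hypothetical, and the clipping constant eps is an assumption added to avoid log(0); Keras's own implementation works on tensors and differs in detail.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one sample with label y_true (0 or 1) and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Flatten turns (7, 7, 512) into 25088 inputs; Dense(1) adds one bias.
num_inputs = 7 * 7 * 512
num_params = num_inputs + 1
print(num_inputs, num_params)          # 25088 25089

print(sigmoid(0.0))                    # 0.5
# An undecided prediction (p = 0.5) costs log(2), about 0.693, for either label.
print(binary_crossentropy(1, 0.5))
```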

Transfer learning with a single VGG network:

 

model = keras.models.Sequential()

model.add(keras.applications.VGG16(include_top=False,input_shape=(224,224,3)))

model.add(keras.layers.Flatten())

model.add(keras.layers.Dense(1,activation='sigmoid'))

 

model.layers[0].trainable = False

 

model.summary()

 

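To avoid pre-computing the features by hand, the original VGG16 is used as a whole during training: the feature extractor becomes the first layer of the network.

To avoid retraining it, freeze the weights of the convolutional feature extractor: access the first layer of the network through model.layers[0] and set its trainable property to False. The rest works the same way as in PyTorch.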
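Freezing the base means only the small dense head is trained. The sketch below counts the parameters from the standard VGG16 layout (3x3 convolutions arranged in five blocks). The block table and helper names are written out here for illustration and are not read from Keras, so compare the totals with the output of model.summary().

```python
# (number of conv layers, output channels) for each VGG16 block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of one kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch

def vgg16_base_params():
    """Parameter count of the convolutional base (pooling layers have none)."""
    total, in_ch = 0, 3
    for n_convs, out_ch in VGG16_BLOCKS:
        for _ in range(n_convs):
            total += conv_params(in_ch, out_ch)
            in_ch = out_ch
    return total

frozen = vgg16_base_params()      # not updated while the base is frozen
trainable = 7 * 7 * 512 + 1       # Flatten -> Dense(1): weights + bias

print(frozen)                     # 14714688
print(trainable)                  # 25089
print(frozen + trainable)         # 14739777
```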

 

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

hist = model.fit(ds_train, validation_data=ds_test)

 

Result:

310/310 [==============================] - 265s 716ms/step - loss: 0.9917 - acc: 0.9512 - val_loss: 0.8156 - val_acc: 0.9671
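The 310 in the progress bar is the number of batches per epoch: 19816 training images in batches of 64, with the last batch only partly full. A quick check:

```python
import math

num_train = 19816   # "Using 19816 files for training."
batch_size = 64

steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 310
print(last_batch)       # 40
```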

 

 

Saving and loading the model:

model.save('data/cats_dogs.tf')

model = keras.models.load_model('data/cats_dogs.tf')

 

Fine-tuning transfer learning: this part works the same way as in PyTorch.

model.layers[0].summary()

 

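Unfreezing:

model.layers[0].trainable = True

# Keep all but the last 4 layers of the VGG base frozen
for i in range(len(model.layers[0].layers)-4):
    model.layers[0].layers[i].trainable = False

# Changes to trainable only take effect after compiling again
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()

hist = model.fit(ds_train, validation_data=ds_test)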
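To see which layers the unfreezing loop leaves trainable, the sketch below replays it on stand-in objects. The layer names follow the usual Keras VGG16 naming (block1_conv1 through block5_pool) plus an input layer whose real name may differ. FakeLayer is a hypothetical stand-in, not a Keras class.

```python
class FakeLayer:
    """Minimal stand-in for a Keras layer: a name and a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# Convolutions per block in VGG16 without the top: 2, 2, 3, 3, 3.
names = ['input']
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    names += [f'block{block}_conv{i}' for i in range(1, n_convs + 1)]
    names.append(f'block{block}_pool')
layers = [FakeLayer(n) for n in names]

# Same loop as in the unfreezing code: freeze all but the last 4 layers.
for i in range(len(layers) - 4):
    layers[i].trainable = False

still_trainable = [layer.name for layer in layers if layer.trainable]
print(len(layers))       # 19
print(still_trainable)
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

Since a pooling layer has no weights, the loop in effect fine-tunes only the three convolutions of the last block.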

 

Other models: same as in PyTorch.

resnet = keras.applications.ResNet50()
resnet.summary()

Output: omitted

 

Further reading:

  1. Practical Transfer Learning with PyTorch

  2. Transfer Learning for Oxford Pets Classification

 

 

 

 

 

 

 

 

 
