
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
import os
import zipfile
from tfcv import *  # helper module; check_image_dir and display_dataset are used below

# Download the dataset archive once (the !wget line is notebook shell syntax)
if not os.path.exists('data/kagglecatsanddogs_5340.zip'):
    !wget -P data https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip

# Unpack the archive once
if not os.path.exists('data/PetImages'):
    with zipfile.ZipFile('data/kagglecatsanddogs_5340.zip', 'r') as zip_ref:
        zip_ref.extractall('data')

# Check the images in both class folders
check_image_dir('data/PetImages/Cat/*.jpg')
check_image_dir('data/PetImages/Dog/*.jpg')

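Loading the dataset: the image dataset is too large to load into memory all at once, so use image_dataset_from_directory(), which loads the data in batches, resizes the images, and splits the dataset automatically.

Code: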

data_dir = 'data/PetImages'
batch_size = 64

ds_train = keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split = 0.2,
    subset = 'training',
    seed = 13,
    image_size = (224,224),
    batch_size = batch_size)

ds_test = keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split = 0.2,
    subset = 'validation',
    seed = 13,
    image_size = (224,224),
    batch_size = batch_size)

Output:

Found 24769 files belonging to 2 classes.

Using 19816 files for training.

Found 24769 files belonging to 2 classes.

Using 4953 files for validation.

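Note: passing the same seed to both calls gives the same random shuffle, which guarantees that the training and validation subsets do not overlap. Class names are sorted automatically by directory name, and this order matches the label indices the model is trained to output.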
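The role of the shared seed can be illustrated with a small standalone sketch. It is only an approximation of what the loader does internally: the helper `split_indices` is hypothetical, and the rule "shuffle, then cut off the last 20% for validation" is an assumption made for illustration. The file count and split fraction are the ones from the output above.

```python
import random

def split_indices(n_files, validation_split, seed):
    """Shuffle indices with a fixed seed, then cut off the tail for validation."""
    indices = list(range(n_files))
    random.Random(seed).shuffle(indices)
    n_val = int(validation_split * n_files)
    return indices[:-n_val], indices[-n_val:]

# Two independent calls with the same seed, like the two dataset calls above.
train_idx, _ = split_indices(24769, 0.2, seed=13)
_, val_idx = split_indices(24769, 0.2, seed=13)

print(len(train_idx), len(val_idx))
print("disjoint with same seed:", set(train_idx).isdisjoint(val_idx))

# With a different seed for the validation call, the shuffles no longer match
# and validation images leak into the training subset.
_, val_idx_other = split_indices(24769, 0.2, seed=14)
print("disjoint with different seeds:", set(train_idx).isdisjoint(val_idx_other))
```

The sizes reproduce the 19816 / 4953 split reported in the output, because 20% of 24769 files is rounded down to 4953.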

ds_train.class_names

Output:

['Cat', 'Dog']
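Because the class list is sorted, 'Cat' gets label 0 and 'Dog' gets label 1, so a single sigmoid output close to 1 means "dog". A minimal sketch of that mapping follows; the helper `decode_binary` and the 0.5 threshold are assumptions for illustration, not part of the original notebook.

```python
# Directory names in arbitrary order; sorting gives the class order shown above.
dir_names = ["Dog", "Cat"]
class_names = sorted(dir_names)
label_of = {name: i for i, name in enumerate(class_names)}

def decode_binary(prob, names=class_names, threshold=0.5):
    """Map a sigmoid output to a class name (hypothetical helper)."""
    return names[int(prob >= threshold)]

print(class_names, label_of)
print(decode_binary(0.93), decode_binary(0.12))
```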

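Both datasets can be passed directly to fit() to train a model. Each element is a batch of images together with the corresponding labels, and the batches can be iterated over as follows: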

for x,y in ds_train:
    print(f"Training batch shape: features={x.shape}, labels={y.shape}")
    x_sample, y_sample = x,y
    break

# np.int was removed in NumPy 1.24; the built-in int is equivalent here
display_dataset(x_sample.numpy().astype(int),np.expand_dims(y_sample,1),classes=ds_train.class_names)

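Pre-trained models: the datasets, which hold the images and their labels, can be given directly to fit() to train a model or iterated over as shown. Before training anything, a pre-trained VGG16 is applied to one sample image: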

vgg = keras.applications.VGG16()

inp = keras.applications.vgg16.preprocess_input(x_sample[:1])

res = vgg(inp)

print(f"Most probable class = {tf.argmax(res,1)}")

keras.applications.vgg16.decode_predictions(res.numpy())

Result:

[[('n02099712', 'Labrador_retriever', 0.5340957),

('n02100236', 'German_short-haired_pointer', 0.0939442),

('n02092339', 'Weimaraner', 0.08160535),

('n02099849', 'Chesapeake_Bay_retriever', 0.057179328),

('n02109047', 'Great_Dane', 0.03733857)]]

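Notes:

Before passing data to the pre-trained network, preprocess it by calling preprocess_input().
For VGG16, this preprocessing normalizes the image by subtracting a predefined mean value from each channel.
After processing an input batch, the network outputs a 1000-element tensor per image (each element is the probability of one class); call argmax() to find the index of the most probable class.
Use decode_predictions() to convert class indices into class names.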
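To make these steps concrete, here is a NumPy-only sketch of what the preprocessing and the argmax lookup amount to. It is an illustration, not the Keras implementation: the helper names are hypothetical, the mean values are the per-channel ImageNet means in BGR order as I understand the VGG16 preprocessing to use them, and the three-class probability vector is toy data standing in for the 1000-element output.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match VGG16 preprocessing).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_style_preprocess(batch_rgb):
    """Convert RGB to BGR and subtract the per-channel mean (no rescaling)."""
    batch_bgr = batch_rgb[..., ::-1].astype(np.float32)
    return batch_bgr - IMAGENET_MEAN_BGR

def top_class(probabilities, class_names):
    """Return (index, name, probability) of the most probable class per row."""
    idx = np.argmax(probabilities, axis=1)
    return [(int(i), class_names[i], float(p[i])) for i, p in zip(idx, probabilities)]

# A uniform grey image batch: every pixel value is 128.
batch = np.full((1, 224, 224, 3), 128, dtype=np.uint8)
processed = vgg_style_preprocess(batch)
print(processed.shape, processed[0, 0, 0])

# Toy stand-in for the 1000-element probability vector.
toy_names = ["cat", "dog", "bird"]
toy_probs = np.array([[0.1, 0.7, 0.2]])
print(top_class(toy_probs, toy_names))
```

The image keeps its shape; only the channel order and the value range change, which is why skipping this step silently degrades predictions rather than raising an error.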

vgg.summary()

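GPU acceleration: Keras uses an available GPU automatically, so no extra code is needed. To check whether TensorFlow can see a GPU: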

tf.config.list_physical_devices('GPU')

Output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Extracting VGG features:

vgg = keras.applications.VGG16(include_top=False)

inp = keras.applications.vgg16.preprocess_input(x_sample[:1])

res = vgg(inp)

print(f"Shape after applying VGG-16: {res[0].shape}")

plt.figure(figsize=(15,3))

plt.imshow(res[0].numpy().reshape(-1,512))

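The feature tensor has dimensions 7x7x512. map() takes a dataset and transforms it with a lambda function, producing the new datasets ds_features_train and ds_features_test, which contain the features extracted by VGG rather than the original images.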

num = batch_size*50

# Apply the VGG feature extractor to 50 training batches and 10 validation batches
ds_features_train = ds_train.take(50).map(lambda x,y : (vgg(x),y))
ds_features_test = ds_test.take(10).map(lambda x,y : (vgg(x),y))

for x,y in ds_features_train:
    print(x.shape,y.shape)
    break

Output:

(64, 7, 7, 512) (64,)
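The 7x7x512 shape follows from the VGG16 architecture: the convolutional base contains five 2x2 max-pooling stages, each halving the spatial size, and the last block has 512 channels. The short calculation below is a sketch under that assumption (the function name is hypothetical); it also shows how many images the take(50) and take(10) subsets cover at most.

```python
def vgg16_feature_shape(height, width, n_pools=5, channels=512):
    """Spatial size after n_pools halvings, with the channel count of the last block."""
    for _ in range(n_pools):
        height, width = height // 2, width // 2
    return height, width, channels

h, w, c = vgg16_feature_shape(224, 224)
print((h, w, c), "->", h * w * c, "values per image")

batch_size = 64
print("training images used:", batch_size * 50)
print("validation images used:", batch_size * 10)
```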

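Training a classifier on the pre-trained features:

.take(50) limits the number of samples to 50 batches; each feature tensor has shape (7, 7, 512), the output of an intermediate layer of the pre-trained model.
Classifier design: the (7, 7, 512) input is flattened into a one-dimensional vector (7x7x512 = 25088) and fed to a single output neuron (binary classification: cat vs. dog) with a sigmoid activation, which outputs a probability. The loss function is binary cross-entropy.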

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(7,7,512)),
    keras.layers.Dense(1,activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

hist = model.fit(ds_features_train, validation_data=ds_features_test)
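What this two-layer classifier computes can be written out in NumPy. This is a sketch on random stand-in features rather than real VGG outputs, with zero-initialized weights chosen so the result is easy to check by hand; it is not how Keras initializes or trains the layer.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 7, 7, 512)).astype(np.float32)  # stand-in for VGG features
labels = np.array([0, 1, 1, 0], dtype=np.float32)              # 0 = cat, 1 = dog

# Flatten: (4, 7, 7, 512) -> (4, 25088)
flat = features.reshape(len(features), -1)

# Dense(1): one weight per input value plus one bias
weights = np.zeros((flat.shape[1], 1), dtype=np.float32)
bias = np.zeros(1, dtype=np.float32)
n_params = weights.size + bias.size

# Sigmoid turns the single logit into a probability of class 1
logits = flat @ weights + bias
probs = (1.0 / (1.0 + np.exp(-logits)))[:, 0]

# Binary cross-entropy, clipped for numerical safety
eps = 1e-7
p = np.clip(probs, eps, 1 - eps)
loss = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

print(flat.shape, n_params, probs, loss)
```

With all-zero weights every prediction is 0.5 and the loss equals ln 2 (about 0.693), the value of an uninformed binary classifier. The parameter count, 25089, is what the summary of this small model should report.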

Transfer learning with a single VGG network:

model = keras.models.Sequential()

model.add(keras.applications.VGG16(include_top=False,input_shape=(224,224,3)))

model.add(keras.layers.Flatten())

model.add(keras.layers.Dense(1,activation='sigmoid'))

model.layers[0].trainable = False

model.summary()

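During training, the original VGG16 network is used as a whole, which avoids precomputing the features by hand: the feature extractor simply becomes the first layer of the network.

To avoid retraining it, freeze the weights of the convolutional feature extractor: model.layers[0] accesses the first layer of the network, and setting its trainable property to False freezes it. The rest works the same way as in PyTorch.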
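The effect of freezing shows up in the parameter counts. The sketch below recomputes them by hand, assuming the standard VGG16 layout of 13 convolutional layers with 3x3 kernels; the channel list and the helper are written out for illustration and are not read from the model.

```python
# (in_channels, out_channels) of the 13 convolutional layers of VGG16, all 3x3.
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution."""
    return k * k * c_in * c_out + c_out

frozen = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONVS)
trainable = 7 * 7 * 512 + 1  # the Dense(1) head on the flattened features

print("frozen (VGG16 base):", frozen)
print("trainable (Dense head):", trainable)
print("share of trainable parameters: {:.3%}".format(trainable / (frozen + trainable)))
```

Only the small dense head is updated during training, which is why one epoch is enough to reach high accuracy while the pre-trained filters stay intact.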

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

hist = model.fit(ds_train, validation_data=ds_test)

Result:

310/310 [==============================] - 265s 716ms/step - loss: 0.9917 - acc: 0.9512 - val_loss: 0.8156 - val_acc: 0.9671
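The 310/310 in the progress bar is the number of batches per epoch, which follows from the dataset sizes reported earlier. A quick check, using only the file counts and batch size already shown in this article:

```python
import math

n_train, n_val, batch_size = 19816, 4953, 64

# The last batch is smaller than 64, so the division is rounded up.
train_steps = math.ceil(n_train / batch_size)
val_steps = math.ceil(n_val / batch_size)
last_batch = n_train - (train_steps - 1) * batch_size

print(train_steps, val_steps, last_batch)
```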

Saving and loading the model:

model.save('data/cats_dogs.tf')

model = keras.models.load_model('data/cats_dogs.tf')

Fine-tuning transfer learning: this part works the same way as in PyTorch.

model.layers[0].summary()

Unfreezing:

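model.layers[0].trainable = True

# Keep all but the last 4 layers of the VGG base frozen
for i in range(len(model.layers[0].layers)-4):
    model.layers[0].layers[i].trainable = False

model.summary()

# Re-compile so that the changed trainable flags take effect
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

hist = model.fit(ds_train, validation_data=ds_test)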
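Which layers the loop leaves trainable can be traced without loading the model. The sketch below mimics the loop on a plain list of layer names; the names are assumed to follow the usual Keras naming of the VGG16 base (the input layer's actual name may differ), so treat the list as an illustration rather than the output of model.layers[0].summary().

```python
# Layer names of the VGG16 base (include_top=False), assumed Keras naming.
VGG16_BASE_LAYERS = (
    ["input"]
    + ["block1_conv1", "block1_conv2", "block1_pool"]
    + ["block2_conv1", "block2_conv2", "block2_pool"]
    + ["block3_conv1", "block3_conv2", "block3_conv3", "block3_pool"]
    + ["block4_conv1", "block4_conv2", "block4_conv3", "block4_pool"]
    + ["block5_conv1", "block5_conv2", "block5_conv3", "block5_pool"]
)

def trainable_after_unfreeze(layer_names, n_last=4):
    """Mimic the loop: everything trainable, then freeze all but the last n_last."""
    flags = {name: True for name in layer_names}
    for i in range(len(layer_names) - n_last):
        flags[layer_names[i]] = False
    return [name for name in layer_names if flags[name]]

unfrozen = trainable_after_unfreeze(VGG16_BASE_LAYERS)
print(unfrozen)

# Three 512 -> 512 convolutions with 3x3 kernels, plus the Dense(1) head.
conv_params_512 = 3 * 3 * 512 * 512 + 512
trainable_params = 3 * conv_params_512 + (7 * 7 * 512 + 1)
print(trainable_params)
```

Under these assumptions the last four layers are the three convolutions of block 5 plus its pooling layer, so fine-tuning adjusts only the most task-specific filters while the earlier, more generic ones stay frozen.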

Other models: as with PyTorch, other pre-trained architectures are available and are used the same way.

resnet = keras.applications.ResNet50()

resnet.summary()

Output: omitted.

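Further reading:

Transfer learning in practice with PyTorch
Transfer learning on Oxford Pets classification

Transfer Learning - Transfer Learning in Practice with TensorFlow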
