In real-world applications, training data is often insufficient, so an entire network is rarely trained from scratch. The common practice is to pretrain a model on a very large base dataset and then use it to initialize the network's weights or serve as a fixed feature extractor for a specific task. This chapter applies transfer learning to classify wolf and dog images from the ImageNet dataset.
If you are interested in MindSpore, you can follow the 昇思MindSpore community.
I. Environment Setup

1. Go to the ModelArts website

The cloud platform helps users quickly create and deploy models and manage full-lifecycle AI workflows. To get started with 昇思MindSpore, choose a cloud platform, obtain the installation command, and install the MindSpore version you need. You can reach the ModelArts website from the 昇思 tutorials.

Select CodeLab below to try it out immediately.

Wait for the environment to finish setting up.
2. Use CodeLab to run a Notebook instance

Download the sample NoteBook code for Pix2Pix image translation; the sample code is an .ipynb file.

Select ModelArts Upload Files and upload the .ipynb file.

Select the Kernel environment.

Switch to the GPU environment, choosing the first option (free for a limited time).

Go to the 昇思MindSpore website and click Install at the top to obtain the installation command.

Back in the Notebook, add the following commands before the first code cell.
conda update -n base -c defaults conda

Install the MindSpore GPU version:

conda install mindspore=2.0.0a0 -c mindspore -c conda-forge

Install mindvision:

pip install mindvision

Install download:

pip install download

II. Basic Principles
The generator of a cGAN differs in principle from that of a traditional GAN. A cGAN generator takes the input image as guidance and keeps trying to produce a "fake" image that fools the discriminator; converting the input image into the corresponding "fake" image is essentially a mapping from pixels to pixels. A traditional GAN generator, by contrast, produces an image from a given random noise vector, with the output controlled by other constraints. This is the difference between cGAN and GAN in image translation tasks. In Pix2Pix, the discriminator's task is to judge whether an image coming out of the generator is a real training image or a generated "fake". As the generator and discriminator play this game, the model reaches an equilibrium at which the discriminator can distinguish the generator's output from the real training data correctly only 50% of the time.
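The adversarial game described above can be written as an explicit objective. As given in the original Pix2Pix paper (with the noise term z dropped here, since this tutorial's generator conditions only on the input image x):

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y}\left[\lVert y - G(x) \rVert_1\right] \\
G^{*} &= \arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
\end{aligned}
\]
```

The L1 term pulls the generated image toward the ground truth at the pixel level, while the cGAN term pushes it to look realistic to the discriminator.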
III. Preparation

Configuring the environment

This example supports both dynamic and static graph modes on the GPU, CPU, and Ascend platforms.
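A minimal configuration sketch using mindspore.set_context; the mode and device_target values below are placeholders to adjust to your own platform:

```python
import mindspore

# Select static graph (GRAPH_MODE) or dynamic graph (PYNATIVE_MODE) execution,
# and the target device: "GPU", "CPU", or "Ascend" (placeholder choices).
mindspore.set_context(mode=mindspore.GRAPH_MODE, device_target="GPU")
```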
Preparing the data

In this tutorial we use a specified dataset of preprocessed facades data, which can be read directly with the mindspore.dataset API.
```python
from download import download

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/application/dataset_pix2pix.tar"
download(url, "./dataset", kind="tar", replace=True)
```

Displaying the data
Call Pix2PixDataset and create_train_dataset to read the training set; here we use the already-processed dataset downloaded above.
```python
from mindspore import dataset as ds
import matplotlib.pyplot as plt

dataset = ds.MindDataset("./dataset/dataset_pix2pix/train.mindrecord",
                         columns_list=["input_images", "target_images"], shuffle=True)
data_iter = next(dataset.create_dict_iterator(output_numpy=True))
# Visualize part of the training data
plt.figure(figsize=(10, 3), dpi=140)
for i, image in enumerate(data_iter['input_images'][:10], 1):
    plt.subplot(3, 10, i)
    plt.axis("off")
    plt.imshow((image.transpose(1, 2, 0) + 1) / 2)
plt.show()
```

IV. Building the Network
Once the data is processed, we can build the network. We will walk through the generator, the discriminator, and the loss functions in turn. The generator G uses a U-Net structure: the input contour map x is encoded and then decoded into a realistic image. The discriminator D uses the conditional discriminator PatchGAN proposed by the paper's authors: conditioned on the contour map x, it should judge the generated image G(x) as fake and the real image as real.
Generator G structure
U-Net is a fully convolutional architecture proposed by the Pattern Recognition and Image Processing group at the University of Freiburg, Germany. It consists of two parts: the left side is a contracting path composed of convolution and downsampling operations, and the right side is an expanding path composed of convolution and upsampling, where the input to each block of the expanding path is the concatenation of the upsampled features from the previous layer and the corresponding features from the contracting path. The overall model is U-shaped, hence the name U-Net. Compared with the common encoder-decoder structure that first downsamples to a low resolution and then upsamples back to the original resolution, U-Net adds skip connections: the corresponding feature maps are concatenated channel-wise with the same-sized decoded feature maps, preserving pixel-level detail at every resolution.
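The channel-wise concatenation behind a skip connection can be sketched with plain NumPy; the shapes below are hypothetical, not those of this tutorial's network:

```python
import numpy as np

# NCHW layout: a decoder feature map is concatenated along the channel axis
# with the same-resolution feature map saved from the contracting path.
encoder_feat = np.zeros((1, 64, 32, 32))  # from the contracting (encoder) path
decoder_feat = np.zeros((1, 64, 32, 32))  # upsampled on the expanding (decoder) path
merged = np.concatenate((decoder_feat, encoder_feat), axis=1)
print(merged.shape)  # (1, 128, 32, 32)
```

This doubling of the channel count is why the U-Net blocks defined later in this tutorial feed `inner_nc * 2` channels into their transposed convolutions.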
Defining the U-Net skip connection block
```python
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops

class UNetSkipConnectionBlock(nn.Cell):
    def __init__(self, outer_nc, inner_nc, in_planes=None, dropout=False,
                 submodule=None, outermost=False, innermost=False, alpha=0.2,
                 norm_mode='batch'):
        super(UNetSkipConnectionBlock, self).__init__()
        down_norm = nn.BatchNorm2d(inner_nc)
        up_norm = nn.BatchNorm2d(outer_nc)
        use_bias = False
        if norm_mode == 'instance':
            down_norm = nn.BatchNorm2d(inner_nc, affine=False)
            up_norm = nn.BatchNorm2d(outer_nc, affine=False)
            use_bias = True
        if in_planes is None:
            in_planes = outer_nc
        down_conv = nn.Conv2d(in_planes, inner_nc, kernel_size=4, stride=2,
                              padding=1, has_bias=use_bias, pad_mode='pad')
        down_relu = nn.LeakyReLU(alpha)
        up_relu = nn.ReLU()
        if outermost:
            up_conv = nn.Conv2dTranspose(inner_nc * 2, outer_nc, kernel_size=4,
                                         stride=2, padding=1, pad_mode='pad')
            down = [down_conv]
            up = [up_relu, up_conv, nn.Tanh()]
            model = down + [submodule] + up
        elif innermost:
            up_conv = nn.Conv2dTranspose(inner_nc, outer_nc, kernel_size=4,
                                         stride=2, padding=1, has_bias=use_bias, pad_mode='pad')
            down = [down_relu, down_conv]
            up = [up_relu, up_conv, up_norm]
            model = down + up
        else:
            up_conv = nn.Conv2dTranspose(inner_nc * 2, outer_nc, kernel_size=4,
                                         stride=2, padding=1, has_bias=use_bias, pad_mode='pad')
            down = [down_relu, down_conv, down_norm]
            up = [up_relu, up_conv, up_norm]
            model = down + [submodule] + up
            if dropout:
                model.append(nn.Dropout(p=0.5))
        self.model = nn.SequentialCell(model)
        self.skip_connections = not outermost

    def construct(self, x):
        out = self.model(x)
        if self.skip_connections:
            out = ops.concat((out, x), axis=1)
        return out
```

U-Net-based generator
```python
class UNetGenerator(nn.Cell):
    def __init__(self, in_planes, out_planes, ngf=64, n_layers=8,
                 norm_mode='bn', dropout=False):
        super(UNetGenerator, self).__init__()
        unet_block = UNetSkipConnectionBlock(ngf * 8, ngf * 8, in_planes=None, submodule=None,
                                             norm_mode=norm_mode, innermost=True)
        for _ in range(n_layers - 5):
            unet_block = UNetSkipConnectionBlock(ngf * 8, ngf * 8, in_planes=None, submodule=unet_block,
                                                 norm_mode=norm_mode, dropout=dropout)
        unet_block = UNetSkipConnectionBlock(ngf * 4, ngf * 8, in_planes=None, submodule=unet_block,
                                             norm_mode=norm_mode)
        unet_block = UNetSkipConnectionBlock(ngf * 2, ngf * 4, in_planes=None, submodule=unet_block,
                                             norm_mode=norm_mode)
        unet_block = UNetSkipConnectionBlock(ngf, ngf * 2, in_planes=None, submodule=unet_block,
                                             norm_mode=norm_mode)
        self.model = UNetSkipConnectionBlock(out_planes, ngf, in_planes=in_planes, submodule=unet_block,
                                             outermost=True, norm_mode=norm_mode)

    def construct(self, x):
        return self.model(x)
```

The original cGAN takes two inputs, the condition x and the noise z. The generator here uses only the condition information, so by itself it cannot produce diverse results. Pix2Pix therefore applies dropout at both training and test time, which lets it generate diverse outputs.
PatchGAN-based discriminator

The discriminator uses the PatchGAN structure, which can be viewed as a convolution: each value in the output matrix corresponds to a small region (patch) of the input image, and each value judges whether the corresponding patch is real or fake.
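The size of the patch that each output value judges is the receptive field of one output unit, which can be back-computed from the kernel/stride chain of the discriminator. A small sketch (the helper below is illustrative, not part of the tutorial code):

```python
# Walking backwards through a (kernel, stride) chain:
# rf_in = rf_out * stride + (kernel - stride)
def receptive_field(layers):
    rf = 1
    for k, s in reversed(layers):
        rf = rf * s + (k - s)
    return rf

# Layer chain of the Discriminator below with n_layers=3: three stride-2 4x4 convs,
# then two stride-1 4x4 convs (the last ConvNormRelu and the final 1-channel conv).
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # 70
```

This is the classic 70x70 PatchGAN from the Pix2Pix paper: each output value judges a 70x70 patch of the input.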
```python
import mindspore.nn as nn
import mindspore.ops as ops

class ConvNormRelu(nn.Cell):
    def __init__(self, in_planes, out_planes, kernel_size=4, stride=2, alpha=0.2,
                 norm_mode='batch', pad_mode='CONSTANT', use_relu=True, padding=None):
        super(ConvNormRelu, self).__init__()
        norm = nn.BatchNorm2d(out_planes)
        if norm_mode == 'instance':
            norm = nn.BatchNorm2d(out_planes, affine=False)
        has_bias = (norm_mode == 'instance')
        if not padding:
            padding = (kernel_size - 1) // 2
        if pad_mode == 'CONSTANT':
            conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad',
                             has_bias=has_bias, padding=padding)
            layers = [conv, norm]
        else:
            paddings = ((0, 0), (0, 0), (padding, padding), (padding, padding))
            pad = nn.Pad(paddings=paddings, mode=pad_mode)
            conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad',
                             has_bias=has_bias)
            layers = [pad, conv, norm]
        if use_relu:
            relu = nn.ReLU()
            if alpha > 0:
                relu = nn.LeakyReLU(alpha)
            layers.append(relu)
        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        output = self.features(x)
        return output

class Discriminator(nn.Cell):
    def __init__(self, in_planes=3, ndf=64, n_layers=3, alpha=0.2, norm_mode='batch'):
        super(Discriminator, self).__init__()
        kernel_size = 4
        layers = [nn.Conv2d(in_planes, ndf, kernel_size, 2, pad_mode='pad', padding=1),
                  nn.LeakyReLU(alpha)]
        nf_mult = ndf
        for i in range(1, n_layers):
            nf_mult_prev = nf_mult
            nf_mult = min(2 ** i, 8) * ndf
            layers.append(ConvNormRelu(nf_mult_prev, nf_mult, kernel_size, 2, alpha,
                                       norm_mode, padding=1))
        nf_mult_prev = nf_mult
        nf_mult = min(2 ** n_layers, 8) * ndf
        layers.append(ConvNormRelu(nf_mult_prev, nf_mult, kernel_size, 1, alpha,
                                   norm_mode, padding=1))
        layers.append(nn.Conv2d(nf_mult, 1, kernel_size, 1, pad_mode='pad', padding=1))
        self.features = nn.SequentialCell(layers)

    def construct(self, x, y):
        x_y = ops.concat((x, y), axis=1)
        output = self.features(x_y)
        return output
```

Initializing the Pix2Pix generator and discriminator
Instantiate the Pix2Pix generator and discriminator.
```python
import mindspore.nn as nn
from mindspore.common import initializer as init

g_in_planes = 3
g_out_planes = 3
g_ngf = 64
g_layers = 8
d_in_planes = 6
d_ndf = 64
d_layers = 3
alpha = 0.2
init_gain = 0.02
init_type = 'normal'

net_generator = UNetGenerator(in_planes=g_in_planes, out_planes=g_out_planes,
                              ngf=g_ngf, n_layers=g_layers)
for _, cell in net_generator.cells_and_names():
    if isinstance(cell, (nn.Conv2d, nn.Conv2dTranspose)):
        if init_type == 'normal':
            cell.weight.set_data(init.initializer(init.Normal(init_gain), cell.weight.shape))
        elif init_type == 'xavier':
            cell.weight.set_data(init.initializer(init.XavierUniform(init_gain), cell.weight.shape))
        elif init_type == 'constant':
            cell.weight.set_data(init.initializer(0.001, cell.weight.shape))
        else:
            raise NotImplementedError('initialization method [%s] is not implemented' % init_type)
    elif isinstance(cell, nn.BatchNorm2d):
        cell.gamma.set_data(init.initializer('ones', cell.gamma.shape))
        cell.beta.set_data(init.initializer('zeros', cell.beta.shape))

net_discriminator = Discriminator(in_planes=d_in_planes, ndf=d_ndf,
                                  alpha=alpha, n_layers=d_layers)
for _, cell in net_discriminator.cells_and_names():
    if isinstance(cell, (nn.Conv2d, nn.Conv2dTranspose)):
        if init_type == 'normal':
            cell.weight.set_data(init.initializer(init.Normal(init_gain), cell.weight.shape))
        elif init_type == 'xavier':
            cell.weight.set_data(init.initializer(init.XavierUniform(init_gain), cell.weight.shape))
        elif init_type == 'constant':
            cell.weight.set_data(init.initializer(0.001, cell.weight.shape))
        else:
            raise NotImplementedError('initialization method [%s] is not implemented' % init_type)
    elif isinstance(cell, nn.BatchNorm2d):
        cell.gamma.set_data(init.initializer('ones', cell.gamma.shape))
        cell.beta.set_data(init.initializer('zeros', cell.beta.shape))

class Pix2Pix(nn.Cell):
    """Pix2Pix model network"""
    def __init__(self, discriminator, generator):
        super(Pix2Pix, self).__init__(auto_prefix=True)
        self.net_discriminator = discriminator
        self.net_generator = generator

    def construct(self, reala):
        fakeb = self.net_generator(reala)
        return fakeb
```

V. Training
Training has two main parts: training the discriminator and training the generator. The discriminator is trained to maximize the probability of correctly telling real images from fake ones; the generator is trained to produce ever more convincing fakes. In both parts we record the losses during training and aggregate the statistics at the end of each epoch.

Now run the training:
```python
import numpy as np
import os
import datetime
from mindspore import value_and_grad, Tensor

epoch_num = 100
ckpt_dir = "results/ckpt"
dataset_size = 400
val_pic_size = 256
lr = 0.0002
n_epochs = 100
n_epochs_decay = 100

def get_lr():
    lrs = [lr] * dataset_size * n_epochs
    lr_epoch = 0
    for epoch in range(n_epochs_decay):
        lr_epoch = lr * (n_epochs_decay - epoch) / n_epochs_decay
        lrs += [lr_epoch] * dataset_size
    lrs += [lr_epoch] * dataset_size * (epoch_num - n_epochs_decay - n_epochs)
    return Tensor(np.array(lrs).astype(np.float32))

dataset = ds.MindDataset("./dataset/dataset_pix2pix/train.mindrecord",
                         columns_list=["input_images", "target_images"],
                         shuffle=True, num_parallel_workers=16)
steps_per_epoch = dataset.get_dataset_size()
loss_f = nn.BCEWithLogitsLoss()
l1_loss = nn.L1Loss()

def forword_dis(reala, realb):
    lambda_dis = 0.5
    fakeb = net_generator(reala)
    pred0 = net_discriminator(reala, fakeb)
    pred1 = net_discriminator(reala, realb)
    loss_d = loss_f(pred1, ops.ones_like(pred1)) + loss_f(pred0, ops.zeros_like(pred0))
    loss_dis = loss_d * lambda_dis
    return loss_dis

def forword_gan(reala, realb):
    lambda_gan = 0.5
    lambda_l1 = 100
    fakeb = net_generator(reala)
    pred0 = net_discriminator(reala, fakeb)
    loss_1 = loss_f(pred0, ops.ones_like(pred0))
    loss_2 = l1_loss(fakeb, realb)
    loss_gan = loss_1 * lambda_gan + loss_2 * lambda_l1
    return loss_gan

d_opt = nn.Adam(net_discriminator.trainable_params(), learning_rate=get_lr(),
                beta1=0.5, beta2=0.999, loss_scale=1)
g_opt = nn.Adam(net_generator.trainable_params(), learning_rate=get_lr(),
                beta1=0.5, beta2=0.999, loss_scale=1)
grad_d = value_and_grad(forword_dis, None, net_discriminator.trainable_params())
grad_g = value_and_grad(forword_gan, None, net_generator.trainable_params())

def train_step(reala, realb):
    loss_dis, d_grads = grad_d(reala, realb)
    loss_gan, g_grads = grad_g(reala, realb)
    d_opt(d_grads)
    g_opt(g_grads)
    return loss_dis, loss_gan

if not os.path.isdir(ckpt_dir):
    os.makedirs(ckpt_dir)

g_losses = []
d_losses = []
data_loader = dataset.create_dict_iterator(output_numpy=True, num_epochs=epoch_num)

for epoch in range(epoch_num):
    for i, data in enumerate(data_loader):
        start_time = datetime.datetime.now()
        input_image = Tensor(data["input_images"])
        target_image = Tensor(data["target_images"])
        dis_loss, gen_loss = train_step(input_image, target_image)
        end_time = datetime.datetime.now()
        delta = (end_time - start_time).microseconds
        if i % 2 == 0:
            print("ms per step:{:.2f} epoch:{}/{} step:{}/{} Dloss:{:.4f} Gloss:{:.4f} ".format(
                (delta / 1000), (epoch + 1), epoch_num, i, steps_per_epoch,
                float(dis_loss), float(gen_loss)))
        d_losses.append(dis_loss.asnumpy())
        g_losses.append(gen_loss.asnumpy())
    if (epoch + 1) == epoch_num:
        mindspore.save_checkpoint(net_generator, ckpt_dir + "Generator.ckpt")
```

VI. Inference
Take the ckpt file produced by the training above, load its weight parameters into the model with load_checkpoint and load_param_into_net, then fetch some data, run inference, and display the results (due to time constraints, training ran for only 100 epochs).
```python
from mindspore import load_checkpoint, load_param_into_net

param_g = load_checkpoint(ckpt_dir + "Generator.ckpt")
load_param_into_net(net_generator, param_g)
dataset = ds.MindDataset("./dataset/dataset_pix2pix/train.mindrecord",
                         columns_list=["input_images", "target_images"], shuffle=True)
data_iter = next(dataset.create_dict_iterator())
predict_show = net_generator(data_iter["input_images"])
plt.figure(figsize=(10, 3), dpi=140)
for i in range(10):
    plt.subplot(2, 10, i + 1)
    plt.imshow((data_iter["input_images"][i].asnumpy().transpose(1, 2, 0) + 1) / 2)
    plt.axis("off")
    plt.subplots_adjust(wspace=0.05, hspace=0.02)
    plt.subplot(2, 10, i + 11)
    plt.imshow((predict_show[i].asnumpy().transpose(1, 2, 0) + 1) / 2)
    plt.axis("off")
    plt.subplots_adjust(wspace=0.05, hspace=0.02)
plt.show()
```

The inference results on each dataset are shown below.