AlexNet
AlexNet is rarely used in practice today, but it is historically significant: it was the first CNN architecture to win the ImageNet competition (ILSVRC 2012).
VGG
Like AlexNet, VGGNet is essentially a stack of convolutional layers, just with more of them. The commonly used variants are VGG11 (configuration A), VGG16, and VGG19. Their structures are shown below:
```
# VGG16
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (18): ReLU(inplace)
  (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace)
  (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (25): ReLU(inplace)
  (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (27): ReLU(inplace)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace)
  (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

# Convs
0.conv1_1  [3, 64, 3, 3]
2.conv1_2  [64, 64, 3, 3]
5.conv2_1  [64, 128, 3, 3]
7.conv2_2  [128, 128, 3, 3]
10.conv3_1 [128, 256, 3, 3]
12.conv3_2 [256, 256, 3, 3]
14.conv3_3 [256, 256, 3, 3]
17.conv4_1 [256, 512, 3, 3]
19.conv4_2 [512, 512, 3, 3]
21.conv4_3 [512, 512, 3, 3]
24.conv5_1 [512, 512, 3, 3]
26.conv5_2 [512, 512, 3, 3]
28.conv5_3 [512, 512, 3, 3]

# VGG19
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace)
  (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (17): ReLU(inplace)
  (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (20): ReLU(inplace)
  (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (22): ReLU(inplace)
  (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (24): ReLU(inplace)
  (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (26): ReLU(inplace)
  (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (29): ReLU(inplace)
  (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (31): ReLU(inplace)
  (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (33): ReLU(inplace)
  (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (35): ReLU(inplace)
  (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

# Convs
0.conv1_1  [3, 64, 3, 3]
2.conv1_2  [64, 64, 3, 3]
5.conv2_1  [64, 128, 3, 3]
7.conv2_2  [128, 128, 3, 3]
10.conv3_1 [128, 256, 3, 3]
12.conv3_2 [256, 256, 3, 3]
14.conv3_3 [256, 256, 3, 3]
16.conv3_4 [256, 256, 3, 3]
19.conv4_1 [256, 512, 3, 3]
21.conv4_2 [512, 512, 3, 3]
23.conv4_3 [512, 512, 3, 3]
25.conv4_4 [512, 512, 3, 3]
28.conv5_1 [512, 512, 3, 3]
30.conv5_2 [512, 512, 3, 3]
32.conv5_3 [512, 512, 3, 3]
34.conv5_4 [512, 512, 3, 3]

# Total number of parameters: 20024384 (convolutional layers of VGG19 only)
```
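Printouts like these can be reproduced from torchvision's reference implementations (a sketch; it assumes torchvision is installed, and the exact printed format varies slightly across versions):

```python
import torchvision

# Instantiate VGG16 with randomly initialized weights and print the
# convolutional part; VGG19 works the same way via vgg19().
vgg16 = torchvision.models.vgg16()
print(vgg16.features)
```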
ResNet
arXiv: https://arxiv.org/abs/1512.03385
VGG made the network deeper by stacking more convolutional layers. Does that mean the more layers a network has, the better it performs?
As the network gets deeper, vanishing and exploding gradients appear during backpropagation. Initializing parameters from a normal distribution and using Batch Normalization can mitigate these problems somewhat, but accuracy still fails to improve with added depth. The problem ResNet solves is how to keep improving accuracy while deepening the network (or at least ensure accuracy does not drop).
To this end, the authors introduce the notion of a residual mapping: let H(x) = F(x) + x. As the figure below shows, F(x) is a simple stack of weight layers; the paper uses stacks of two or three layers (a single layer brings no obvious benefit). Adding F(x) to x realizes the "shortcut connection".
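To make the idea concrete, here is a minimal PyTorch sketch of a two-layer residual block computing H(x) = F(x) + x. It is an illustration only: the blocks in the paper also apply batch normalization, and the shortcut needs a projection whenever the shapes of F(x) and x differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x, where F(x) is a stack of two 3x3 convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))   # first weight layer + nonlinearity
        out = self.conv2(out)         # second weight layer
        return F.relu(out + x)        # shortcut connection: add the input

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)     # torch.Size([1, 64, 56, 56])
```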
In principle, a deeper network should be no worse than a shallower one, since the extra layers could simply learn the identity mapping; in practice the opposite happens, as the figure below shows. Because ResNet adds the input to the output of the stacked layers, once performance has saturated, optimization can simply drive F(x) → 0 in the extra layers, so additional depth no longer degrades performance. In fact, ResNet readily gains accuracy as depth grows substantially.
ResNet is much deeper than its predecessors; the commonly used variants are ResNet-50 (50 layers) and ResNet-101 (101 layers). Despite the depth, ResNet has fewer parameters than VGG yet achieves better results.
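Both variants are available off the shelf, for example in torchvision (a sketch; the `weights` argument assumes torchvision >= 0.13, older releases use `pretrained=True` instead):

```python
from torchvision.models import resnet50, resnet101

model = resnet50(weights="IMAGENET1K_V1")     # ImageNet-pretrained ResNet-50
# model = resnet101(weights="IMAGENET1K_V1")  # deeper variant, same API
model.eval()
```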
DenseNet
DenseNet builds on ResNet's idea: within a dense block, every layer is connected to all preceding layers, which both increases feature reuse and reduces the redundancy that comes with depth. The printout of the first dense block below shows this pattern, followed by a sketch of how it works.
```
Dense(
  (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (dense_block1): _DenseBlock(   # dense_block1 has 6 layers
    (denselayer1): _DenseLayer(
      (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )                            # 64 in, 32 out
    (denselayer2): _DenseLayer(
      (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )                            # (64+32) in, 32 out
    (denselayer3): _DenseLayer(
      (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )                            # (64+32×2) in, 32 out
    (denselayer4): _DenseLayer(
      (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )                            # (64+32×3) in, 32 out
    (denselayer5): _DenseLayer(
      (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )                            # (64+32×4) in, 32 out
    (denselayer6): _DenseLayer(
      (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )                            # (64+32×5) in, 32 out
  )
  (dense_block2): ...
```
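The growth pattern in the printout (each layer adds 32 channels on top of everything before it) comes from concatenation. Below is a minimal sketch of one dense layer; the BN/ReLU that the real `_DenseLayer` applies before each convolution is omitted for brevity:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """1x1 bottleneck + 3x3 conv; output = input concatenated with 32 new channels."""
    def __init__(self, in_channels, growth_rate=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False)
        self.conv2 = nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3,
                               padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv2(self.conv1(x))
        # each layer sees all preceding feature maps via concatenation
        return torch.cat([x, new_features], dim=1)

x = torch.randn(1, 64, 56, 56)
out = DenseLayer(96)(DenseLayer(64)(x))   # 64 -> 96 -> 128 channels
print(out.shape)                          # torch.Size([1, 128, 56, 56])
```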
Parameter comparison
```
Model      n_params      receptive field*
Alexnet    61,100,840               51
Vgg16      138,357,544              27
Vgg19      143,667,240              43
Res50      25,557,032              983
Res101     44,549,160            2,071
Res152     60,192,808            2,967
Dense121   7,978,856               239
```
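Counts like these can be reproduced by summing parameter element counts over a torchvision model (a sketch; the receptive-field sizes require a separate computation not shown here):

```python
import torchvision.models as models

for name, fn in [("Alexnet", models.alexnet), ("Vgg16", models.vgg16),
                 ("Res50", models.resnet50), ("Dense121", models.densenet121)]:
    n_params = sum(p.numel() for p in fn().parameters())
    print(f"{name}: {n_params:,}")   # e.g. Res50: 25,557,032
```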