From Basic Concepts to Implementation: Simpler Image Recognition with the All Convolutional Network

Posted by llman 7 years ago | 47K views | Tags: image recognition, neural networks

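An image is, at bottom, a collection of pixel values. That view helps computer scientists and researchers build neural networks that resemble the human brain and perform specialized tasks. On some tasks, such networks even exceed human accuracy.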

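The figure above illustrates this well: an image is represented by its pixel values. These small blocks of pixels are the most basic building material of a convolutional neural network.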
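As a minimal illustration of the idea that an image is a grid of pixel values, the sketch below builds a tiny grayscale image by hand. The 3x3 size and the intensity values are made up for the example.

```python
# A hypothetical 3x3 grayscale image: each entry is a pixel intensity in [0, 255].
image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 16],
]

height = len(image)
width = len(image[0])
brightest = max(max(row) for row in image)
mean_intensity = sum(sum(row) for row in image) / (height * width)

print(height, width, brightest, mean_intensity)
```

A colour image such as those in CIFAR-10 simply adds a third axis, one grid per colour channel.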

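Convolutional neural networks are very similar to ordinary neural networks: both are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a dot product (a scalar), and optionally follows it with a nonlinearity. The whole network still expresses a single differentiable score function, from raw image pixels at one end to class probabilities at the other. It still has a loss function (for example SVM or Softmax) on the last (fully connected) layer, and the tips and tricks developed for training ordinary neural networks still apply.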
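The pieces named in this paragraph can be sketched in a few lines of plain Python: a neuron that takes a dot product and applies a nonlinearity, a softmax that turns scores into class probabilities, and a cross-entropy loss. The function names and input values are illustrative assumptions, not part of the original implementation.

```python
import math

def neuron(inputs, weights, bias):
    """Dot product of inputs and weights plus bias, followed by ReLU."""
    pre_activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, pre_activation)

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probabilities[true_class])

activation = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.5], bias=0.1)
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, true_class=0)
print(activation, probs, loss)
```

Every step here is differentiable (ReLU almost everywhere), which is what lets the whole network be trained end to end by gradient descent.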

How convolution works: each pixel is replaced by a weighted sum of its neighbouring pixels, and the network learns these weights.
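The weighted sum described above can be written out directly. The following is a minimal sketch without padding, using the cross-correlation form that CNN libraries usually call convolution; the image and kernel values are hand-picked for illustration, whereas a network would learn the kernel.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D convolution in the CNN sense (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw neighbourhood anchored at (i, j)
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
# Hand-picked weights for illustration; a network would learn these.
kernel = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, -6], [-11, -19]]
```

Without padding, a 3x3 kernel shrinks the 4x4 input to a 2x2 feature map; padding the border is what keeps the output the same size as the input.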

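Recently, with the large increase in available data and computing power, ConvNets have performed very well at face recognition, object recognition, traffic sign recognition, robotics, and autonomous driving.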

The figure below shows the four main operations in a ConvNet:

1. Convolution

2. Nonlinearity (such as ReLU)

3. Pooling or subsampling

4. Classification

An image of a car passes through the ConvNet, and the fully connected layer outputs the class "car".
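Operations 2 and 3 from the list above are small enough to sketch in plain Python. The helper names and the 4x4 feature map below are assumptions made for the example.

```python
def relu(feature_map):
    """Nonlinearity: replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out

feature_map = [
    [1, -2, 3, 0],
    [-4, 5, -6, 1],
    [7, -8, 9, -2],
    [-1, 0, -3, 4],
]
activated = relu(feature_map)
pooled = max_pool(activated)
print(pooled)  # [[5, 3], [7, 9]]
```

Pooling halves each spatial dimension here, which is the effect the all convolutional network later obtains with a strided convolution instead.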

The All Convolutional Network

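Most modern convolutional neural networks (CNNs) for object recognition are built on the same principle: alternating convolution and max-pooling layers, followed by a small number of fully connected layers. An earlier paper showed that max-pooling can simply be replaced by a convolutional layer with increased stride, with no loss of accuracy on image recognition benchmarks. Another interesting point in the paper is the replacement of the fully connected layers with global average pooling.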
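Global average pooling reduces each feature map to a single number, so a final convolution with one feature map per class can feed a softmax directly. A minimal sketch, with three made-up 2x2 feature maps standing in for three classes:

```python
def global_average_pooling(feature_maps):
    """Average each feature map over all spatial positions: one number per map."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]

# Three hypothetical 2x2 feature maps, one per class
feature_maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
class_scores = global_average_pooling(feature_maps)
print(class_scores)  # [4.0, 1.0, 4.0]
```

Unlike a fully connected layer, this step has no weights of its own, so it adds no parameters to the model.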

For details on the all convolutional network, see the paper: https://arxiv.org/abs/1412.6806#

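Dropping the fully connected layers may not come as a big surprise, since they have not been used much for quite some time. Not long ago Yann LeCun even said on Facebook that he had not used fully connected layers from the very beginning.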

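This makes sense. The only difference between fully connected and convolutional layers is that neurons in a convolutional layer are connected only to a local region of the input, and many neurons in a convolutional volume share parameters. Neurons in both kinds of layer still compute dot products, so their functional form is identical. It is therefore possible to convert between fully connected and convolutional layers, and in some cases a convolutional layer can replace a fully connected one.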
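The equivalence can be checked on a toy case: a fully connected neuron over a flattened 2x2 input gives the same number as a convolution whose kernel covers the whole input. The helper names and values below are assumptions for the sketch.

```python
def dense_neuron(flat_input, flat_weights, bias):
    """Fully connected neuron: dot product over the flattened input plus bias."""
    return sum(x * w for x, w in zip(flat_input, flat_weights)) + bias

def conv_valid(image, kernel, bias):
    """Valid convolution with a bias term, returning a 2D feature map."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)) + bias
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

image = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.5, -1.0], [0.25, 2.0]]
bias = 0.5

flat_image = [v for row in image for v in row]
flat_weights = [w for row in weights for w in row]

dense_out = dense_neuron(flat_image, flat_weights, bias)
conv_out = conv_valid(image, weights, bias)
print(dense_out, conv_out)  # 7.75 [[7.75]]
```

When the kernel is as large as the input, the convolution produces a single 1x1 output per filter, which is exactly one fully connected neuron.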

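As mentioned above, the next step is to remove the spatial pooling operations from the network. This may cause some confusion, so let us look at the idea in detail.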

Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map while retaining the most important information.

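Take max pooling as an example. We define a spatial window and take the largest element of the feature map within that window. Now recall Figure 2 (how convolution works). Intuitively, a convolutional layer with a larger stride can serve as a subsampling or downsampling layer, making the input representation smaller and more manageable. It also reduces the number of parameters and computations in the network, which helps control overfitting.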
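To make the intuition concrete, here is a rough sketch of a convolution whose window moves `stride` pixels at a time. The helper name and the 2x2 kernel are illustrative assumptions; the point is only that stride 2 halves each spatial dimension, as 2x2 max pooling would.

```python
def strided_conv(image, kernel, stride):
    """Valid convolution that moves the window `stride` pixels at a time."""
    k = len(kernel)
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(k) for dj in range(k)))
        out.append(row)
    return out

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 0], [0, 1]]  # illustrative weights; a network would learn them

dense_map = strided_conv(image, kernel, stride=1)    # 3x3 output
downsampled = strided_conv(image, kernel, stride=2)  # 2x2 output
print(dense_map)
print(downsampled)  # [[7, 11], [23, 27]]
```

The difference from max pooling is that the downsampling weights are learned rather than fixed to "take the maximum".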

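To reduce the size of the representation, using a larger stride in a convolutional layer is sometimes the best choice. Dropping pooling layers has also proved important when training good generative models such as variational autoencoders (VAEs) or generative adversarial networks (GANs). Future neural network architectures may well feature very few pooling layers, or none at all.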

Because the tips and tweaks above matter, we have published a Keras implementation of the all convolutional network on GitHub: https://github.com/MateLabs/All-Conv-Keras

Importing libraries and dependencies

from __future__ import print_function
import tensorflow as tf
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
# merge is the Keras 1.x functional merge helper used by make_parallel below
from keras.layers import Dropout, Activation, Convolution2D, GlobalAveragePooling2D, merge
from keras.utils import np_utils
from keras.optimizers import SGD
from keras import backend as K
from keras.models import Model
from keras.layers.core import Lambda
from keras.callbacks import ModelCheckpoint
import pandas

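Training on multiple GPUs

For the multi-GPU implementation of the model, we use a custom function that distributes the training data across the available GPUs.

The computation is done on the GPUs, and the outputs are passed to the CPU, where they are merged to complete the model.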

def make_parallel(model, gpu_count):
    # Return the idx-th of `parts` equal slices of a batch along axis 0
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        # Note: tf.concat(0, values) is the pre-1.0 TensorFlow argument order;
        # TensorFlow 1.0 and later expect tf.concat(values, 0).
        size = tf.concat(0, [shape[:1] // parts, shape[1:]])
        stride = tf.concat(0, [shape[:1] // parts, shape[1:] * 0])
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    # Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:
                inputs = []

                # Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx': i, 'parts': gpu_count})(x)
                    inputs.append(slice_n)

                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # Merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(merge(outputs, mode='concat', concat_axis=0))
        return Model(input=model.inputs, output=merged)
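The slicing arithmetic inside get_slice can be followed without TensorFlow. The sketch below mirrors it on a plain Python list; get_slice_bounds is a hypothetical helper written for this illustration, and the list of integers stands in for a batch of 32 samples.

```python
def get_slice_bounds(batch_size, idx, parts):
    """Start and end sample index of the idx-th of `parts` equal slices."""
    size = batch_size // parts
    start = size * idx
    return start, start + size

batch = list(range(32))  # stand-in for a batch of 32 samples
gpu_count = 4
slices = []
for i in range(gpu_count):
    start, end = get_slice_bounds(len(batch), i, gpu_count)
    slices.append(batch[start:end])

# Concatenating the per-GPU pieces in order restores the original batch
merged = [sample for piece in slices for sample in piece]
print([len(piece) for piece in slices], merged == batch)
```

Because the slice size comes from integer division, a batch whose size is not a multiple of the GPU count would leave its remainder samples out of every slice, so it seems safest to pick a batch size that divides evenly.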

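Configuring batch size, number of classes, and number of epochs

Since we use the CIFAR-10 dataset, which has 10 classes (categories of objects), the number of classes is 10, and the batch size is 32. The number of epochs depends on how much time and computing power you have available. In this example we train for 1000 epochs.

The images are 32x32 pixels with 3 colour channels (RGB).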

batch_size = 32
nb_classes = 10
nb_epoch = 1000
rows, cols = 32, 32
channels = 3
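A quick back-of-the-envelope calculation shows what these settings imply for training length. It assumes the standard CIFAR-10 training set of 50,000 images and one weight update per batch.

```python
import math

train_samples = 50000   # size of the CIFAR-10 training set
batch_size = 32
nb_epoch = 1000

batches_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * nb_epoch
values_per_image = 32 * 32 * 3   # rows * cols * channels

print(batches_per_epoch, last_batch_size, total_updates, values_per_image)
```

With these numbers an epoch takes 1563 batches, the last of which holds only 16 images, and 1000 epochs amount to roughly 1.56 million weight updates.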

Loading the dataset as training and test sets (the test set also serves as the validation data during training)

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
print (X_train.shape[1:])

Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
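The to_categorical calls turn integer class labels into one-hot vectors so they can be compared with the softmax output. A minimal sketch of that encoding follows; to_one_hot is a hypothetical stand-in written for this illustration, not the Keras function itself.

```python
def to_one_hot(labels, num_classes):
    """Encode each integer label as a vector with a single 1.0 at the label's index."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded

labels = [3, 0, 9]
one_hot = to_one_hot(labels, 10)
for row in one_hot:
    print(row)
```

Each row has exactly one non-zero entry, which is what the categorical cross-entropy loss expects as its target.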

Building the model

model = Sequential()

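# input_shape must match the layout of X_train: (rows, cols, channels) assumes the
# channels-last ('tf') dimension ordering; use (channels, rows, cols) for 'th' ordering.
model.add(Convolution2D(96, 3, 3, border_mode='same', input_shape=(rows, cols, channels)))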
model.add(Activation('relu'))
model.add(Convolution2D(96, 3, 3,border_mode='same'))
model.add(Activation('relu'))

#The next layer is the substitute of max pooling, we are taking a strided convolution layer to reduce the dimensionality of the image.

model.add(Convolution2D(96, 3, 3, border_mode='same', subsample = (2,2)))
model.add(Dropout(0.5))
model.add(Convolution2D(192, 3, 3, border_mode = 'same'))
model.add(Activation('relu'))
model.add(Convolution2D(192, 3, 3,border_mode='same'))
model.add(Activation('relu'))

# The next layer is the substitute of max pooling, we are taking a strided convolution layer to reduce the dimensionality of the image.

model.add(Convolution2D(192, 3, 3,border_mode='same', subsample = (2,2)))
model.add(Dropout(0.5))
model.add(Convolution2D(192, 3, 3, border_mode = 'same'))
model.add(Activation('relu'))
model.add(Convolution2D(192, 1, 1,border_mode='valid'))
model.add(Activation('relu'))
model.add(Convolution2D(10, 1, 1, border_mode='valid'))

model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))
model = make_parallel(model, 4)
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
# Pass the configured SGD instance; the string 'sgd' would use default settings instead
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

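Print the model. This gives you an overview of the model, which is very helpful for visualizing its dimensions and number of parameters.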

# summary() prints the table itself and returns None, so no print() is needed
model.summary()
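As a hand calculation to compare against the printed summary, the sketch below traces the spatial size and counts the parameters of the nine convolutions in the model above. It assumes that 'same' padding with stride s yields ceil(n / s) outputs per dimension (the TensorFlow convention) and that every convolution has one bias per filter; the dropout, activation, and global average pooling layers contribute no parameters.

```python
import math

# (filters, kernel_size, stride, padding) for each convolution in the model above
layers = [
    (96, 3, 1, 'same'), (96, 3, 1, 'same'), (96, 3, 2, 'same'),
    (192, 3, 1, 'same'), (192, 3, 1, 'same'), (192, 3, 2, 'same'),
    (192, 3, 1, 'same'), (192, 1, 1, 'valid'), (10, 1, 1, 'valid'),
]

size, channels = 32, 3   # CIFAR-10 input: 32x32 pixels, 3 colour channels
total_params = 0
trace = []
for filters, kernel, stride, padding in layers:
    if padding == 'same':
        size = math.ceil(size / stride)
    else:  # 'valid'
        size = (size - kernel) // stride + 1
    # Weights (kernel * kernel * in_channels * filters) plus one bias per filter
    params = kernel * kernel * channels * filters + filters
    total_params += params
    channels = filters
    trace.append((size, channels, params))

for size_out, channels_out, params in trace:
    print('%2dx%-2d x %3d channels, %7d parameters' % (size_out, size_out, channels_out, params))
print('total parameters:', total_params)
```

Under these assumptions the two strided convolutions take the feature maps from 32x32 to 16x16 and then 8x8, and the whole network has about 1.37 million parameters.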

Data augmentation

datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,  # randomly flip images horizontally
    vertical_flip=False)  # randomly flip images vertically

datagen.fit(X_train)
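With the settings above, the only augmentation actually enabled is a random shift of up to 10% of the width and height, which is about 3 pixels on a 32x32 image. The sketch below is a simplified illustration of a horizontal shift using whole pixels and repeated edge pixels; it is an assumption-laden stand-in, not the interpolation that ImageDataGenerator performs internally.

```python
import random

def shift_horizontally(image, shift):
    """Shift each row by `shift` pixels (positive = right), repeating the edge pixel."""
    width = len(image[0])
    shifted = []
    for row in image:
        new_row = []
        for x in range(width):
            source = min(max(x - shift, 0), width - 1)  # clamp to the nearest valid column
            new_row.append(row[source])
        shifted.append(new_row)
    return shifted

def random_shift(image, shift_range, rng):
    """Shift by a random whole number of pixels within shift_range * width."""
    max_shift = int(shift_range * len(image[0]))  # e.g. 0.1 * 32 -> 3 pixels
    return shift_horizontally(image, rng.randint(-max_shift, max_shift))

image = [[1, 2, 3, 4], [5, 6, 7, 8]]
right_by_one = shift_horizontally(image, 1)
left_by_one = shift_horizontally(image, -1)
augmented = random_shift(image, 0.25, random.Random(0))
print(right_by_one, left_by_one)
```

Each epoch then sees slightly different versions of the same training images, which is how augmentation helps against overfitting.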

Saving the best weights of your model and adding checkpoints

filepath="weights.{epoch:02d}-{val_loss:.2f}.hdf5"

checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='max')

callbacks_list = [checkpoint]

# Fit the model on the batches generated by datagen.flow().

history_callback = model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size), samples_per_epoch=X_train.shape[0], nb_epoch=nb_epoch, validation_data=(X_test, Y_test), callbacks=callbacks_list, verbose=0)
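The checkpoint settings above (save_best_only=True, monitor='val_acc', mode='max') mean that a file is written only when validation accuracy improves. The loop below sketches that rule with made-up metrics; the epoch numbering starts at 0 here, which may differ from the numbering Keras writes into the file name.

```python
filepath = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"

# Hypothetical per-epoch validation metrics, for illustration only
history = [
    {'val_loss': 1.52, 'val_acc': 0.45},
    {'val_loss': 1.31, 'val_acc': 0.53},
    {'val_loss': 1.35, 'val_acc': 0.51},
    {'val_loss': 1.10, 'val_acc': 0.61},
]

best_val_acc = float('-inf')   # mode='max': higher is better
saved_files = []
for epoch, logs in enumerate(history):
    if logs['val_acc'] > best_val_acc:
        best_val_acc = logs['val_acc']
        # The template is filled in from the epoch number and the logged metrics
        saved_files.append(filepath.format(epoch=epoch, **logs))

print(saved_files)
```

The third epoch is skipped because its validation accuracy is lower than the best seen so far, so only three files would be written for these four epochs.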

Finally, collect the training log and save your model

pandas.DataFrame(history_callback.history).to_csv("history.csv")
model.save('keras_allconv.h5')

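The model above easily reaches more than 90% accuracy after the first 350 epochs. If you want higher accuracy, you can trade computation time for it and try more extensive data augmentation.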

Source: http://www.jiqizhixin.com/article/2461
