The Right Way to Understand TensorFlow, the God-Tier Deep Learning Framework
On November 9, 2015, Google released its artificial intelligence system TensorFlow and announced it would be open source. The move had a huge impact on the deep learning field and drew intense attention from deep learning developers. There are, of course, still plenty of skeptical voices around artificial intelligence, but it is hard to deny that AI remains the direction the future is heading.
TensorFlow became the most-watched project on GitHub the very day it landed there. As the best-known way to build deep learning models and the leader among deep learning frameworks, it easily picked up more than 10,000 stars in its release week. That owes much to Google's remarkable research record in artificial intelligence and its god-tier bench of technical talent. Another factor is AlphaGo, the program that beat a human professional at Go for the first time and whose upgraded version, Master, then ran up a 60-game unbeaten streak; its reinforcement learning framework is also built on TensorFlow's high-level APIs.
TensorFlow: Why This One?
As Google's second-generation deep learning framework, TensorFlow, which expresses computation as data flow graphs, has become one of the most popular frameworks in machine learning and deep learning. Since its release it has kept improving and adding new features, and at the first annual TensorFlow Developer Summit, held in Mountain View this February, TensorFlow 1.0 was officially released. Its headline feature was model optimization for speed, speed fast enough to be hard to believe; more surprising still, many enthusiasts took the release of TensorFlow 1.0 to mark the "year one" of AI.
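To make "computing with data flow graphs" concrete, here is a minimal sketch in the TensorFlow 1.x style (my illustration, not from the original article): operations are first assembled into a graph, and nothing runs until a Session executes it.

import tensorflow as tf

# Build the graph: these lines only declare nodes, nothing is computed yet.
a = tf.placeholder(tf.float32, name='a')
b = tf.placeholder(tf.float32, name='b')
c = tf.add(tf.multiply(a, b), b, name='c')  # c = a * b + b

# Execute the graph: the Session runs just the subgraph that c depends on.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))  # prints 9.0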
A Google Trends comparison shows deep learning sitting at the top of today's trending technologies.
TensorFlow's track record so far includes the following:
- TensorFlow is used in many Google products, including Gmail, Google Play recommendations, Search, Translate, Maps, and more;
- In medicine, scientists have used TensorFlow to build models that detect diabetic retinopathy from retinal images before it causes blindness (a Stanford PhD student, mentioned later, also used TensorFlow to predict skin cancer, work that made the cover of Nature);
- Deep learning models built with TensorFlow for music and painting are helping people better understand art;
- TensorFlow combined with specialized hardware has been used to build automated marine-life detection systems that help scientists understand what is happening in the oceans;
- TensorFlow is gaining ground on mobile: several apps run it on-device for translation, style transfer, and similar tasks;
- On mobile CPUs (e.g. the Qualcomm Snapdragon 820), TensorFlow achieves higher performance at lower power;
- The TensorFlow ecosystem, combined with other open source projects, makes it quick to stand up a high-performance production environment;
- TensorBoard's embedding-vector visualization;
- It helps PhD students and researchers get project work off the ground quickly.
Google's first-generation distributed machine learning framework, DistBelief, no longer met Google's internal needs, so the team redesigned it from the ground up. The new framework supports a wide range of compute devices, including CPUs, GPUs, and TPUs; runs well on mobile and embedded platforms such as Android, iOS, and the Raspberry Pi; and supports multiple languages through its various high-level APIs (training is Python-only, while inference also supports C++, Go, Java, and others). It also ships with excellent tools such as TensorBoard, which markedly improve the productivity of deep learning researchers.
TensorFlow's adoption inside Google has grown just as quickly: it is used in many Google products such as Gmail, Google Play recommendations, Search, Translate, and Maps, and close to 100 internal projects and papers have built on it.
In the 14 months before the official 1.0 release, TensorFlow also racked up impressive numbers: 475+ non-Google contributors, 14,000+ commits, more than 5,500 GitHub projects with TensorFlow in the title, 5,000+ answered questions on Stack Overflow, and an average of 80+ new issues per week. It has also been used by top-tier research projects, including Neural Machine Translation, Neural Architecture Search, and Show and Tell.
At bottom, deep learning replaces hand-engineered features with unsupervised or semi-supervised feature learning and efficient hierarchical feature extraction. TensorFlow is of course not the only framework researchers and practitioners use: Torch, Caffe, Theano, Deeplearning4j, and others also do excellent work in vision, speech, natural language processing, and bioinformatics.
Below, we dig into some of the key techniques, algorithms, and ideas behind this leader of the deep learning field.
GoogLeNet
GoogLeNet won ILSVRC 2014. The name is a tribute to the classic LeNet-5, and the work was done mainly by a Google team; the paper is "Going Deeper with Convolutions". Related prior work includes LeNet-5, Gabor filters, and Network-in-Network. Network-in-Network improved on the traditional CNN, handily beating AlexNet with a small fraction of the parameters: the final Network-in-Network Caffe model is only about 29 MB. GoogLeNet borrows the ideas of Network-in-Network, which we walk through in detail below.
1) Network-in-Network
On the left of the figure is an ordinary linear convolution layer. Linear convolutions work well for extracting linearly separable features, but when the features are highly nonlinear we need far more filters to capture all the latent patterns, and that creates a problem: too many filters means too many parameters, and the network becomes too complex and too expensive to compute.
The paper improves on this in two ways. First, it upgrades the convolution layer itself to "mlpconv": each local patch goes through a richer computation than a plain convolution (figure above, right), raising each layer's ability to recognize complex features. A rough analogy: in a traditional CNN each convolution layer can only do one simple job, so you must pile on huge numbers of filters to cover every feature type, whereas each mlpconv layer is more capable, handles several kinds of features at once, and therefore needs only a small fraction of the filters. Second, it replaces the final fully connected layers, which account for most of a traditional CNN's parameters and hurt its generalization (AlexNet had to use dropout to compensate), with global average pooling. A minimal sketch of both ideas follows.
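Here is a minimal illustration of both ideas in plain TensorFlow (the shapes and layer sizes are hypothetical, chosen for illustration; this is not the paper's code). An mlpconv block is an ordinary convolution followed by 1×1 convolutions, i.e. a tiny per-position MLP across channels, and global average pooling collapses each final feature map into a single score with zero parameters:

import tensorflow as tf

def mlpconv(x, k, n_out, name):
    # an mlpconv block: a k x k convolution followed by two 1x1 convolutions,
    # which act as a small fully connected net applied at every spatial position
    h = tf.layers.conv2d(x, n_out, k, padding='same',
                         activation=tf.nn.relu, name=name + '_conv')
    h = tf.layers.conv2d(h, n_out, 1, activation=tf.nn.relu, name=name + '_cccp1')
    return tf.layers.conv2d(h, n_out, 1, activation=tf.nn.relu, name=name + '_cccp2')

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
h = mlpconv(x, 11, 96, 'block1')          # ...stack more blocks (with pooling) in practice...
h = mlpconv(h, 3, 1000, 'block_last')     # last block emits one feature map per class
logits = tf.reduce_mean(h, axis=[1, 2])   # global average pooling: no weights at all

This is also why the cccp layers in the Caffe model further below are literally convolutions with 1×1 kernels.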
Finally the authors built a four-layer Network-in-Network topped with a global average pooling layer for the ImageNet classification task:
from kaffe.tensorflow import Network

class NiN(Network):
def setup(self):
(self.feed('data')
.conv(11, 11, 96, 4, 4, padding='VALID', name='conv1')
.conv(1, 1, 96, 1, 1, name='cccp1')
.conv(1, 1, 96, 1, 1, name='cccp2')
.max_pool(3, 3, 2, 2, name='pool1')
.conv(5, 5, 256, 1, 1, name='conv2')
.conv(1, 1, 256, 1, 1, name='cccp3')
.conv(1, 1, 256, 1, 1, name='cccp4')
.max_pool(3, 3, 2, 2, padding='VALID', name='pool2')
.conv(3, 3, 384, 1, 1, name='conv3')
.conv(1, 1, 384, 1, 1, name='cccp5')
.conv(1, 1, 384, 1, 1, name='cccp6')
.max_pool(3, 3, 2, 2, padding='VALID', name='pool3')
.conv(3, 3, 1024, 1, 1, name='conv4-1024')
.conv(1, 1, 1024, 1, 1, name='cccp7-1024')
.conv(1, 1, 1000, 1, 1, name='cccp8-1024')
.avg_pool(6, 6, 1, 1, padding='VALID', name='pool4')
.softmax(name='prob'))
The basic structure of the network is shown above; the code is at https://github.com/ethereon/caffe-tensorflow. Because of a recent job change I have no machine to run it on at the moment, nor to draw the basic network structure diagram; I will add both later. One thing worth pointing out: the intermediate cccp1 and cccp2 (cross channel pooling) layers are equivalent to convolution layers with 1×1 kernels. The Caffe implementation of NIN is as follows:
name: "nin_imagenet"
layers {
top: "data"
top: "label"
name: "data"
type: DATA
data_param {
source: "/home/linmin/IMAGENET-LMDB/imagenet-train-lmdb"
backend: LMDB
batch_size: 64
}
transform_param {
crop_size: 224
mirror: true
mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
}
include: { phase: TRAIN }
}
layers {
top: "data"
top: "label"
name: "data"
type: DATA
data_param {
source: "/home/linmin/IMAGENET-LMDB/imagenet-val-lmdb"
backend: LMDB
batch_size: 89
}
transform_param {
crop_size: 224
mirror: false
mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
}
include: { phase: TEST }
}
layers {
bottom: "data"
top: "conv1"
name: "conv1"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
mean: 0
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "conv1"
top: "conv1"
name: "relu0"
type: RELU
}
layers {
bottom: "conv1"
top: "cccp1"
name: "cccp1"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp1"
top: "cccp1"
name: "relu1"
type: RELU
}
layers {
bottom: "cccp1"
top: "cccp2"
name: "cccp2"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp2"
top: "cccp2"
name: "relu2"
type: RELU
}
layers {
bottom: "cccp2"
top: "pool0"
name: "pool0"
type: POOLING
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
bottom: "pool0"
top: "conv2"
name: "conv2"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "conv2"
top: "conv2"
name: "relu3"
type: RELU
}
layers {
bottom: "conv2"
top: "cccp3"
name: "cccp3"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp3"
top: "cccp3"
name: "relu5"
type: RELU
}
layers {
bottom: "cccp3"
top: "cccp4"
name: "cccp4"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp4"
top: "cccp4"
name: "relu6"
type: RELU
}
layers {
bottom: "cccp4"
top: "pool2"
name: "pool2"
type: POOLING
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
bottom: "pool2"
top: "conv3"
name: "conv3"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "conv3"
top: "conv3"
name: "relu7"
type: RELU
}
layers {
bottom: "conv3"
top: "cccp5"
name: "cccp5"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 384
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp5"
top: "cccp5"
name: "relu8"
type: RELU
}
layers {
bottom: "cccp5"
top: "cccp6"
name: "cccp6"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 384
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp6"
top: "cccp6"
name: "relu9"
type: RELU
}
layers {
bottom: "cccp6"
top: "pool3"
name: "pool3"
type: POOLING
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layers {
bottom: "pool3"
top: "pool3"
name: "drop"
type: DROPOUT
dropout_param {
dropout_ratio: 0.5
}
}
layers {
bottom: "pool3"
top: "conv4"
name: "conv4-1024"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 1024
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "conv4"
top: "conv4"
name: "relu10"
type: RELU
}
layers {
bottom: "conv4"
top: "cccp7"
name: "cccp7-1024"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 1024
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.05
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp7"
top: "cccp7"
name: "relu11"
type: RELU
}
layers {
bottom: "cccp7"
top: "cccp8"
name: "cccp8-1024"
type: CONVOLUTION
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 1000
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
bottom: "cccp8"
top: "cccp8"
name: "relu12"
type: RELU
}
layers {
bottom: "cccp8"
top: "pool4"
name: "pool4"
type: POOLING
pooling_param {
pool: AVE
kernel_size: 6
stride: 1
}
}
layers {
name: "accuracy"
type: ACCURACY
bottom: "pool4"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
layers {
bottom: "pool4"
bottom: "label"
name: "loss"
type: SOFTMAX_LOSS
include: { phase: TRAIN }
}
NIN can also be seen as deepening the network: by increasing depth (each NIN block has more representational power per layer) and replacing the fully connected layers with an average pooling layer, it dramatically cuts the number of filters required and hence the model's parameter count. The paper's experiments show it matches AlexNet's accuracy with a final model of only about 29 MB.
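To see where the savings come from, here is a quick back-of-the-envelope count of AlexNet's fully connected parameters (the arithmetic is mine, not from the paper):

# Weights in AlexNet's three fully connected layers (biases ignored):
fc6 = 6 * 6 * 256 * 4096     # flattened conv5 output -> 4096 units: ~37.7M
fc7 = 4096 * 4096            # ~16.8M
fc8 = 4096 * 1000            # ~4.1M
print((fc6 + fc7 + fc8) / 1e6)   # ~58.6M parameters in the FC layers alone

# Global average pooling adds zero parameters: each of NIN's final
# 1000 feature maps is simply averaged into one class score.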
With NIN understood, GoogLeNet no longer feels mysterious.
Pain points
- The bigger a CNN gets, the more parameters it has and the more compute it needs, and an overly complex model will overfit;
- In a CNN, every added layer brings added demand for compute resources;
- Sparse networks are acceptable in principle, but sparse data structures are usually very inefficient to compute with.
Inception module
The Inception module starts from the observation that convolution kernels of several different sizes can capture image structure at different scales; for convenience the paper uses 1×1, 3×3, and 5×5 kernels in parallel, plus a 3×3 max pooling branch. This naive design hides a serious computational problem, though: a module's output filter count is the sum of the filter counts of all its branches, so after a few stacked modules the channel count balloons and the naive Inception becomes extremely compute-hungry. As the Network-in-Network section showed, 1×1 convolutions perform effective dimensionality reduction (expressing as much information as possible with fewer channels), so the paper proposes the "Inception module with dimension reduction": cut the number of filters, and with it the model's complexity, without sacrificing representational power. A standalone sketch of one such module follows.
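As an illustration, a single dimension-reduced Inception module can be sketched in tflearn like this (the filter-count parameters here are placeholders; the full network below uses the real values):

from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.merge_ops import merge

def inception_module(net, n1x1, n3x3r, n3x3, n5x5r, n5x5, npool, name):
    # four parallel branches over the same input
    b1 = conv_2d(net, n1x1, 1, activation='relu', name=name + '_1x1')
    b2 = conv_2d(net, n3x3r, 1, activation='relu', name=name + '_3x3_reduce')  # 1x1 reduction
    b2 = conv_2d(b2, n3x3, 3, activation='relu', name=name + '_3x3')
    b3 = conv_2d(net, n5x5r, 1, activation='relu', name=name + '_5x5_reduce')  # 1x1 reduction
    b3 = conv_2d(b3, n5x5, 5, activation='relu', name=name + '_5x5')
    b4 = max_pool_2d(net, kernel_size=3, strides=1, name=name + '_pool')
    b4 = conv_2d(b4, npool, 1, activation='relu', name=name + '_pool_proj')
    # stride 1 and 'same' padding keep every branch at the same H x W,
    # so the outputs can be concatenated along the channel axis
    return merge([b1, b2, b3, b4], mode='concat', axis=3, name=name + '_output')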
The Overall GoogLeNet Architecture
The basic code to build GoogLeNet in TensorFlow:
from kaffe.tensorflow import Network
class GoogleNet(Network):
def setup(self):
(self.feed('data')
.conv(7, 7, 64, 2, 2, name='conv1_7x7_s2')
.max_pool(3, 3, 2, 2, name='pool1_3x3_s2')
.lrn(2, 2e-05, 0.75, name='pool1_norm1')
.conv(1, 1, 64, 1, 1, name='conv2_3x3_reduce')
.conv(3, 3, 192, 1, 1, name='conv2_3x3')
.lrn(2, 2e-05, 0.75, name='conv2_norm2')
.max_pool(3, 3, 2, 2, name='pool2_3x3_s2')
.conv(1, 1, 64, 1, 1, name='inception_3a_1x1'))
(self.feed('pool2_3x3_s2')
.conv(1, 1, 96, 1, 1, name='inception_3a_3x3_reduce')
.conv(3, 3, 128, 1, 1, name='inception_3a_3x3'))
(self.feed('pool2_3x3_s2')
.conv(1, 1, 16, 1, 1, name='inception_3a_5x5_reduce')
.conv(5, 5, 32, 1, 1, name='inception_3a_5x5'))
(self.feed('pool2_3x3_s2')
.max_pool(3, 3, 1, 1, name='inception_3a_pool')
.conv(1, 1, 32, 1, 1, name='inception_3a_pool_proj'))
(self.feed('inception_3a_1x1',
'inception_3a_3x3',
'inception_3a_5x5',
'inception_3a_pool_proj')
.concat(3, name='inception_3a_output')
.conv(1, 1, 128, 1, 1, name='inception_3b_1x1'))
(self.feed('inception_3a_output')
.conv(1, 1, 128, 1, 1, name='inception_3b_3x3_reduce')
.conv(3, 3, 192, 1, 1, name='inception_3b_3x3'))
(self.feed('inception_3a_output')
.conv(1, 1, 32, 1, 1, name='inception_3b_5x5_reduce')
.conv(5, 5, 96, 1, 1, name='inception_3b_5x5'))
(self.feed('inception_3a_output')
.max_pool(3, 3, 1, 1, name='inception_3b_pool')
.conv(1, 1, 64, 1, 1, name='inception_3b_pool_proj'))
(self.feed('inception_3b_1x1',
'inception_3b_3x3',
'inception_3b_5x5',
'inception_3b_pool_proj')
.concat(3, name='inception_3b_output')
.max_pool(3, 3, 2, 2, name='pool3_3x3_s2')
.conv(1, 1, 192, 1, 1, name='inception_4a_1x1'))
(self.feed('pool3_3x3_s2')
.conv(1, 1, 96, 1, 1, name='inception_4a_3x3_reduce')
.conv(3, 3, 208, 1, 1, name='inception_4a_3x3'))
(self.feed('pool3_3x3_s2')
.conv(1, 1, 16, 1, 1, name='inception_4a_5x5_reduce')
.conv(5, 5, 48, 1, 1, name='inception_4a_5x5'))
(self.feed('pool3_3x3_s2')
.max_pool(3, 3, 1, 1, name='inception_4a_pool')
.conv(1, 1, 64, 1, 1, name='inception_4a_pool_proj'))
(self.feed('inception_4a_1x1',
'inception_4a_3x3',
'inception_4a_5x5',
'inception_4a_pool_proj')
.concat(3, name='inception_4a_output')
.conv(1, 1, 160, 1, 1, name='inception_4b_1x1'))
(self.feed('inception_4a_output')
.conv(1, 1, 112, 1, 1, name='inception_4b_3x3_reduce')
.conv(3, 3, 224, 1, 1, name='inception_4b_3x3'))
(self.feed('inception_4a_output')
.conv(1, 1, 24, 1, 1, name='inception_4b_5x5_reduce')
.conv(5, 5, 64, 1, 1, name='inception_4b_5x5'))
(self.feed('inception_4a_output')
.max_pool(3, 3, 1, 1, name='inception_4b_pool')
.conv(1, 1, 64, 1, 1, name='inception_4b_pool_proj'))
(self.feed('inception_4b_1x1',
'inception_4b_3x3',
'inception_4b_5x5',
'inception_4b_pool_proj')
.concat(3, name='inception_4b_output')
.conv(1, 1, 128, 1, 1, name='inception_4c_1x1'))
(self.feed('inception_4b_output')
.conv(1, 1, 128, 1, 1, name='inception_4c_3x3_reduce')
.conv(3, 3, 256, 1, 1, name='inception_4c_3x3'))
(self.feed('inception_4b_output')
.conv(1, 1, 24, 1, 1, name='inception_4c_5x5_reduce')
.conv(5, 5, 64, 1, 1, name='inception_4c_5x5'))
(self.feed('inception_4b_output')
.max_pool(3, 3, 1, 1, name='inception_4c_pool')
.conv(1, 1, 64, 1, 1, name='inception_4c_pool_proj'))
(self.feed('inception_4c_1x1',
'inception_4c_3x3',
'inception_4c_5x5',
'inception_4c_pool_proj')
.concat(3, name='inception_4c_output')
.conv(1, 1, 112, 1, 1, name='inception_4d_1x1'))
(self.feed('inception_4c_output')
.conv(1, 1, 144, 1, 1, name='inception_4d_3x3_reduce')
.conv(3, 3, 288, 1, 1, name='inception_4d_3x3'))
(self.feed('inception_4c_output')
.conv(1, 1, 32, 1, 1, name='inception_4d_5x5_reduce')
.conv(5, 5, 64, 1, 1, name='inception_4d_5x5'))
(self.feed('inception_4c_output')
.max_pool(3, 3, 1, 1, name='inception_4d_pool')
.conv(1, 1, 64, 1, 1, name='inception_4d_pool_proj'))
(self.feed('inception_4d_1x1',
'inception_4d_3x3',
'inception_4d_5x5',
'inception_4d_pool_proj')
.concat(3, name='inception_4d_output')
.conv(1, 1, 256, 1, 1, name='inception_4e_1x1'))
(self.feed('inception_4d_output')
.conv(1, 1, 160, 1, 1, name='inception_4e_3x3_reduce')
.conv(3, 3, 320, 1, 1, name='inception_4e_3x3'))
(self.feed('inception_4d_output')
.conv(1, 1, 32, 1, 1, name='inception_4e_5x5_reduce')
.conv(5, 5, 128, 1, 1, name='inception_4e_5x5'))
(self.feed('inception_4d_output')
.max_pool(3, 3, 1, 1, name='inception_4e_pool')
.conv(1, 1, 128, 1, 1, name='inception_4e_pool_proj'))
(self.feed('inception_4e_1x1',
'inception_4e_3x3',
'inception_4e_5x5',
'inception_4e_pool_proj')
.concat(3, name='inception_4e_output')
.max_pool(3, 3, 2, 2, name='pool4_3x3_s2')
.conv(1, 1, 256, 1, 1, name='inception_5a_1x1'))
(self.feed('pool4_3x3_s2')
.conv(1, 1, 160, 1, 1, name='inception_5a_3x3_reduce')
.conv(3, 3, 320, 1, 1, name='inception_5a_3x3'))
(self.feed('pool4_3x3_s2')
.conv(1, 1, 32, 1, 1, name='inception_5a_5x5_reduce')
.conv(5, 5, 128, 1, 1, name='inception_5a_5x5'))
(self.feed('pool4_3x3_s2')
.max_pool(3, 3, 1, 1, name='inception_5a_pool')
.conv(1, 1, 128, 1, 1, name='inception_5a_pool_proj'))
(self.feed('inception_5a_1x1',
'inception_5a_3x3',
'inception_5a_5x5',
'inception_5a_pool_proj')
.concat(3, name='inception_5a_output')
.conv(1, 1, 384, 1, 1, name='inception_5b_1x1'))
(self.feed('inception_5a_output')
.conv(1, 1, 192, 1, 1, name='inception_5b_3x3_reduce')
.conv(3, 3, 384, 1, 1, name='inception_5b_3x3'))
(self.feed('inception_5a_output')
.conv(1, 1, 48, 1, 1, name='inception_5b_5x5_reduce')
.conv(5, 5, 128, 1, 1, name='inception_5b_5x5'))
(self.feed('inception_5a_output')
.max_pool(3, 3, 1, 1, name='inception_5b_pool')
.conv(1, 1, 128, 1, 1, name='inception_5b_pool_proj'))
(self.feed('inception_5b_1x1',
'inception_5b_3x3',
'inception_5b_5x5',
'inception_5b_pool_proj')
.concat(3, name='inception_5b_output')
.avg_pool(7, 7, 1, 1, padding='VALID', name='pool5_7x7_s1')
.fc(1000, relu=False, name='loss3_classifier')
.softmax(name='prob'))
The code is in https://github.com/ethereon/caffe-tensorflow, where the author has wrapped up the basic operations; once you understand the network structure, building GoogLeNet is straightforward. After I settle in at my new company, I will try writing the GoogLeNet network code on top of tflearn.
GoogLeNet on TensorFlow
To make the implementation easier, I rewrote GoogLeNet with tflearn. The only difference from the Caffe model is the placement of some padding: matching it exactly is fiddly because the concat inside each Inception block needs consistent shapes, and I did not know how to translate the pad values from the Caffe prototxt, so I set the padding uniformly to 'same' (see the note after the code for why this keeps the concats valid). The code is as follows:
# -*- coding: utf-8 -*-
""" GoogLeNet.
Applying 'GoogLeNet' to Oxford's 17 Category Flower Dataset classification task.
References:
- Szegedy, Christian, et al.
Going deeper with convolutions.
- 17 Category Flower Dataset. Maria-Elena Nilsback and Andrew Zisserman.
Links:
- [GoogLeNet Paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)
- [Flower Dataset (17)](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/)
"""
from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression
import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))
network = input_data(shape=[None, 227, 227, 3])
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name = 'conv1_7_7_s2')
pool1_3_3 = max_pool_2d(conv1_7_7, 3,strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64,1, activation='relu',name = 'conv2_3_3_reduce')
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192,3, activation='relu', name='conv2_3_3')
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')
inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96,1, activation='relu', name='inception_3a_3_3_reduce')
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128,filter_size=3, activation='relu', name = 'inception_3a_3_3')
inception_3a_5_5_reduce = conv_2d(pool2_3_3,16, filter_size=1,activation='relu', name ='inception_3a_5_5_reduce' )
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name= 'inception_3a_5_5')
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1, )
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')
# merge the inception_3a__
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode='concat', axis=3)
inception_3b_1_1 = conv_2d(inception_3a_output, 128,filter_size=1,activation='relu', name= 'inception_3b_1_1' )
inception_3b_3_3_reduce = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_3_3_reduce')
inception_3b_3_3 = conv_2d(inception_3b_3_3_reduce, 192, filter_size=3, activation='relu',name='inception_3b_3_3')
inception_3b_5_5_reduce = conv_2d(inception_3a_output, 32, filter_size=1, activation='relu', name = 'inception_3b_5_5_reduce')
inception_3b_5_5 = conv_2d(inception_3b_5_5_reduce, 96, filter_size=5, activation='relu', name = 'inception_3b_5_5')
inception_3b_pool = max_pool_2d(inception_3a_output, kernel_size=3, strides=1, name='inception_3b_pool')
inception_3b_pool_1_1 = conv_2d(inception_3b_pool, 64, filter_size=1,activation='relu', name='inception_3b_pool_1_1')
#merge the inception_3b_*
inception_3b_output = merge([inception_3b_1_1, inception_3b_3_3, inception_3b_5_5, inception_3b_pool_1_1], mode='concat',axis=3,name='inception_3b_output')
pool3_3_3 = max_pool_2d(inception_3b_output, kernel_size=3, strides=2, name='pool3_3_3')
inception_4a_1_1 = conv_2d(pool3_3_3, 192, filter_size=1, activation='relu', name='inception_4a_1_1')
inception_4a_3_3_reduce = conv_2d(pool3_3_3, 96, filter_size=1, activation='relu', name='inception_4a_3_3_reduce')
inception_4a_3_3 = conv_2d(inception_4a_3_3_reduce, 208, filter_size=3, activation='relu', name='inception_4a_3_3')
inception_4a_5_5_reduce = conv_2d(pool3_3_3, 16, filter_size=1, activation='relu', name='inception_4a_5_5_reduce')
inception_4a_5_5 = conv_2d(inception_4a_5_5_reduce, 48, filter_size=5, activation='relu', name='inception_4a_5_5')
inception_4a_pool = max_pool_2d(pool3_3_3, kernel_size=3, strides=1, name='inception_4a_pool')
inception_4a_pool_1_1 = conv_2d(inception_4a_pool, 64, filter_size=1, activation='relu', name='inception_4a_pool_1_1')
inception_4a_output = merge([inception_4a_1_1, inception_4a_3_3, inception_4a_5_5, inception_4a_pool_1_1], mode='concat', axis=3, name='inception_4a_output')
inception_4b_1_1 = conv_2d(inception_4a_output, 160, filter_size=1, activation='relu', name='inception_4b_1_1')
inception_4b_3_3_reduce = conv_2d(inception_4a_output, 112, filter_size=1, activation='relu', name='inception_4b_3_3_reduce')
inception_4b_3_3 = conv_2d(inception_4b_3_3_reduce, 224, filter_size=3, activation='relu', name='inception_4b_3_3')
inception_4b_5_5_reduce = conv_2d(inception_4a_output, 24, filter_size=1, activation='relu', name='inception_4b_5_5_reduce')
inception_4b_5_5 = conv_2d(inception_4b_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4b_5_5')
inception_4b_pool = max_pool_2d(inception_4a_output, kernel_size=3, strides=1, name='inception_4b_pool')
inception_4b_pool_1_1 = conv_2d(inception_4b_pool, 64, filter_size=1, activation='relu', name='inception_4b_pool_1_1')
inception_4b_output = merge([inception_4b_1_1, inception_4b_3_3, inception_4b_5_5, inception_4b_pool_1_1], mode='concat', axis=3, name='inception_4b_output')
inception_4c_1_1 = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu',name='inception_4c_1_1')
inception_4c_3_3_reduce = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_3_3_reduce')
inception_4c_3_3 = conv_2d(inception_4c_3_3_reduce, 256, filter_size=3, activation='relu', name='inception_4c_3_3')
inception_4c_5_5_reduce = conv_2d(inception_4b_output, 24, filter_size=1, activation='relu', name='inception_4c_5_5_reduce')
inception_4c_5_5 = conv_2d(inception_4c_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4c_5_5')
inception_4c_pool = max_pool_2d(inception_4b_output, kernel_size=3, strides=1)
inception_4c_pool_1_1 = conv_2d(inception_4c_pool, 64, filter_size=1, activation='relu', name='inception_4c_pool_1_1')
inception_4c_output = merge([inception_4c_1_1, inception_4c_3_3, inception_4c_5_5, inception_4c_pool_1_1], mode='concat', axis=3,name='inception_4c_output')
inception_4d_1_1 = conv_2d(inception_4c_output, 112, filter_size=1, activation='relu', name='inception_4d_1_1')
inception_4d_3_3_reduce = conv_2d(inception_4c_output, 144, filter_size=1, activation='relu', name='inception_4d_3_3_reduce')
inception_4d_3_3 = conv_2d(inception_4d_3_3_reduce, 288, filter_size=3, activation='relu', name='inception_4d_3_3')
inception_4d_5_5_reduce = conv_2d(inception_4c_output, 32, filter_size=1, activation='relu', name='inception_4d_5_5_reduce')
inception_4d_5_5 = conv_2d(inception_4d_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4d_5_5')
inception_4d_pool = max_pool_2d(inception_4c_output, kernel_size=3, strides=1, name='inception_4d_pool')
inception_4d_pool_1_1 = conv_2d(inception_4d_pool, 64, filter_size=1, activation='relu', name='inception_4d_pool_1_1')
inception_4d_output = merge([inception_4d_1_1, inception_4d_3_3, inception_4d_5_5, inception_4d_pool_1_1], mode='concat', axis=3, name='inception_4d_output')
inception_4e_1_1 = conv_2d(inception_4d_output, 256, filter_size=1, activation='relu', name='inception_4e_1_1')
inception_4e_3_3_reduce = conv_2d(inception_4d_output, 160, filter_size=1, activation='relu', name='inception_4e_3_3_reduce')
inception_4e_3_3 = conv_2d(inception_4e_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_4e_3_3')
inception_4e_5_5_reduce = conv_2d(inception_4d_output, 32, filter_size=1, activation='relu', name='inception_4e_5_5_reduce')
inception_4e_5_5 = conv_2d(inception_4e_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_4e_5_5')
inception_4e_pool = max_pool_2d(inception_4d_output, kernel_size=3, strides=1, name='inception_4e_pool')
inception_4e_pool_1_1 = conv_2d(inception_4e_pool, 128, filter_size=1, activation='relu', name='inception_4e_pool_1_1')
inception_4e_output = merge([inception_4e_1_1, inception_4e_3_3, inception_4e_5_5,inception_4e_pool_1_1],axis=3, mode='concat')
pool4_3_3 = max_pool_2d(inception_4e_output, kernel_size=3, strides=2, name='pool4_3_3')
inception_5a_1_1 = conv_2d(pool4_3_3, 256, filter_size=1, activation='relu', name='inception_5a_1_1')
inception_5a_3_3_reduce = conv_2d(pool4_3_3, 160, filter_size=1, activation='relu', name='inception_5a_3_3_reduce')
inception_5a_3_3 = conv_2d(inception_5a_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_5a_3_3')
inception_5a_5_5_reduce = conv_2d(pool4_3_3, 32, filter_size=1, activation='relu', name='inception_5a_5_5_reduce')
inception_5a_5_5 = conv_2d(inception_5a_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5a_5_5')
inception_5a_pool = max_pool_2d(pool4_3_3, kernel_size=3, strides=1, name='inception_5a_pool')
inception_5a_pool_1_1 = conv_2d(inception_5a_pool, 128, filter_size=1,activation='relu', name='inception_5a_pool_1_1')
inception_5a_output = merge([inception_5a_1_1, inception_5a_3_3, inception_5a_5_5, inception_5a_pool_1_1], axis=3,mode='concat')
inception_5b_1_1 = conv_2d(inception_5a_output, 384, filter_size=1,activation='relu', name='inception_5b_1_1')
inception_5b_3_3_reduce = conv_2d(inception_5a_output, 192, filter_size=1, activation='relu', name='inception_5b_3_3_reduce')
inception_5b_3_3 = conv_2d(inception_5b_3_3_reduce, 384, filter_size=3,activation='relu', name='inception_5b_3_3')
inception_5b_5_5_reduce = conv_2d(inception_5a_output, 48, filter_size=1, activation='relu', name='inception_5b_5_5_reduce')
inception_5b_5_5 = conv_2d(inception_5b_5_5_reduce,128, filter_size=5, activation='relu', name='inception_5b_5_5' )
inception_5b_pool = max_pool_2d(inception_5a_output, kernel_size=3, strides=1, name='inception_5b_pool')
inception_5b_pool_1_1 = conv_2d(inception_5b_pool, 128, filter_size=1, activation='relu', name='inception_5b_pool_1_1')
inception_5b_output = merge([inception_5b_1_1, inception_5b_3_3, inception_5b_5_5, inception_5b_pool_1_1], axis=3, mode='concat')
pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
pool5_7_7 = dropout(pool5_7_7, 0.4)
loss = fully_connected(pool5_7_7, 17,activation='softmax')
network = regression(loss, optimizer='momentum',
loss='categorical_crossentropy',
learning_rate=0.001)
model = tflearn.DNN(network, checkpoint_path='model_googlenet',
max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
show_metric=True, batch_size=64, snapshot_step=200,
snapshot_epoch=False, run_id='googlenet_oxflowers17')
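A note on the uniform padding='same' choice above (my reasoning, not something stated in the Caffe prototxt): with 'same' padding the spatial output size depends only on the stride, out = ceil(in / stride), so the stride-1 branches of each Inception block all produce the same height and width and the channel-axis concat is valid, whereas with 'valid' padding, out = ceil((in - k + 1) / stride), the 3×3 and 5×5 branches would come out smaller than the 1×1 branch:

import math

def out_size(n, k, s, padding):
    # spatial output size of a conv/pool layer with kernel k and stride s
    if padding == 'same':
        return int(math.ceil(n / float(s)))
    return int(math.ceil((n - k + 1) / float(s)))      # 'valid'

# the three stride-1 conv branches of one inception block on a 28x28 input:
print([out_size(28, k, 1, 'same') for k in (1, 3, 5)])    # [28, 28, 28] -> concat works
print([out_size(28, k, 1, 'valid') for k in (1, 3, 5)])   # [28, 26, 24] -> shapes differ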
If you are interested, please take a look at the corresponding Caffe model prototxt and help check the code for problems; I have submitted it to the official tflearn repository ("add GoogLeNet (Inception) in Example"). Anyone with TensorFlow installed can pip-install tflearn and help verify it. Since I have no GPU machine here, training is slow, and the TensorBoard curves are not as clean as the earlier AlexNet ones, mainly because I have not run as many epochs. The host also ran out of disk space mid-run, so I had to write a restore step and continue from a checkpoint, which seems to have confused the TensorBoard plots somewhat (they look a little different on each reload), but the raw logs show the loss steadily converging; the plots are attached for reference.
As for the network structure graph, there seems to be a bug, possibly in TensorBoard itself: GoogLeNet's graph is probably too large (about 1.3 MB) and would not download in Chrome; Firefox seemed to work.
Source: http://ai.51cto.com/art/201703/536061.htm