Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Question

Can anyone clearly explain the difference between 1D, 2D, and 3D convolutions in CNNs (deep learning), using examples?

runhani · Answer

I want to explain with pictures from C3D.

In a nutshell, the convolution direction and the output shape are what matter.

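↑↑↑↑↑ 1D convolution - basic ↑↑↑↑↑

Just 1 direction (the time axis) along which to compute the conv
Input = [W], filter = [k], output = [W]
e.g. input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (away from the borders; with zero padding the two end values come out as 0.75)
The output shape is a 1D array
e.g. smoothing a graph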
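The smoothing example above can be checked with plain NumPy. This is only a minimal sketch, not the answer's own code: it assumes np.convolve with mode="same", which zero-pads the borders much like padding='SAME' does, and the zig-zag signal is made up for illustration.

```python
import numpy as np

smooth = np.array([0.25, 0.5, 0.25])   # filter = [k], k = 3

# Constant input: interior values stay at 1, the zero-padded ends drop to 0.75.
ones_out = np.convolve(np.ones(5), smooth, mode="same")
print(ones_out)      # values: 0.75, 1, 1, 1, 0.75

# Hypothetical zig-zag signal: the filter slides along one axis only,
# and the output is again a 1D array of the same length.
signal = np.array([0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0])
smoothed = np.convolve(signal, smooth, mode="same")
print(smoothed)      # interior values are flattened to 1
```

np.convolve flips the kernel while tf.nn.conv1d does not, but for a symmetric filter like this one the two give the same numbers.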

Toy example with tf.nn.conv1d

    import tensorflow as tf
    import numpy as np

    # TensorFlow 1.x graph-mode API; under TensorFlow 2.x, tf.Session is only
    # available as tf.compat.v1.Session with eager execution disabled.
    sess = tf.Session()

    ones_1d = np.ones(5)
    weight_1d = np.ones(3)
    strides_1d = 1

    in_1d = tf.constant(ones_1d, dtype=tf.float32)
    filter_1d = tf.constant(weight_1d, dtype=tf.float32)

    in_width = int(in_1d.shape[0])
    filter_width = int(filter_1d.shape[0])

    # conv1d expects input [batch, width, channels]
    # and filter [width, in_channels, out_channels]
    input_1d = tf.reshape(in_1d, [1, in_width, 1])
    kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
    output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
    print(sess.run(output_1d))

↑↑↑↑↑ 2D convolution - basic ↑↑↑↑↑

2 directions (x, y) along which to compute the conv
The output shape is a 2D matrix
Input = [W, H], filter = [k, k], output = [W, H]
e.g. Sobel edge filter
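To make the Sobel example concrete, here is a small NumPy sketch. The helper conv2d_same and the test image are hypothetical names and data chosen for illustration; the helper assumes an odd-sized kernel and zero padding, and computes cross-correlation (no kernel flip), as tf.nn.conv2d does.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation; the output has the input's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    # The filter moves in two directions: down the rows and along the columns.
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Hypothetical 5x6 image: dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edges = conv2d_same(image, sobel_x)
print(edges.shape)   # (5, 6): a 2D matrix, same size as the input
print(edges[2])      # row 2: 0, 0, 4, 4, 0, -4 (the -4 is a zero-padding border effect)
```

The response is non-zero only around the vertical edge between columns 2 and 3, plus the artificial edge that zero padding creates at the right border.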

tf.nn.conv2d - toy example

    # Reuses tf, np and sess from the first example.
    ones_2d = np.ones((5, 5))
    weight_2d = np.ones((3, 3))
    strides_2d = [1, 1, 1, 1]

    in_2d = tf.constant(ones_2d, dtype=tf.float32)
    filter_2d = tf.constant(weight_2d, dtype=tf.float32)

    in_width = int(in_2d.shape[0])
    in_height = int(in_2d.shape[1])

    filter_width = int(filter_2d.shape[0])
    filter_height = int(filter_2d.shape[1])

    # conv2d expects input [batch, height, width, channels]
    # and filter [height, width, in_channels, out_channels]
    input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
    kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

    output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
    print(sess.run(output_2d))

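↑↑↑↑↑ 3D convolution - basic ↑↑↑↑↑

3 directions (x, y, z) along which to compute the conv
The output shape is a 3D volume
Input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It is what makes the output a volume
e.g. C3D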
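Why d < L matters can be seen from the shapes alone. The sketch below is an assumption-laden illustration in NumPy, not the answer's TensorFlow code: conv_valid is a hypothetical helper that uses no padding at all, so every axis shrinks by (filter size - 1) and the sliding room along z is easy to read off.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(x, kernel):
    """Cross-correlation without padding, for inputs of any rank."""
    windows = sliding_window_view(x, kernel.shape)
    # Contract the trailing window axes against the kernel axes.
    return np.tensordot(windows, kernel, axes=kernel.ndim)

volume = np.ones((5, 5, 5))                        # input = [W, H, L], L = 5

out_d3 = conv_valid(volume, np.ones((3, 3, 3)))    # d = 3 < L
out_d5 = conv_valid(volume, np.ones((3, 3, 5)))    # d = 5 = L

print(out_d3.shape)   # (3, 3, 3): the filter also slides along z
print(out_d5.shape)   # (3, 3, 1): no room left to slide along z
```

With d = L the third axis collapses to size 1, which is the 2D-convolution-with-3D-input case rather than a true 3D convolution.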

tf.nn.conv3d - toy example

    # Reuses tf, np and sess from the first example.
    ones_3d = np.ones((5, 5, 5))
    weight_3d = np.ones((3, 3, 3))
    strides_3d = [1, 1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_3d = tf.constant(weight_3d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])
    in_depth = int(in_3d.shape[2])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])
    filter_depth = int(filter_3d.shape[2])

    # conv3d expects input [batch, depth, height, width, channels]
    # and filter [depth, height, width, in_channels, out_channels]
    input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
    kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

    output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
    print(sess.run(output_3d))

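↑↑↑↑↑ 2D convolution with 3D input - LeNet, VGG, ... ↑↑↑↑↑

Even though the input is 3D, e.g. 224x224x3, 112x112x32
the output shape is not a 3D volume but a 2D matrix
because the filter depth = L must match the input channels = L
2 directions (x, y) along which to compute the conv! Not 3D
Input = [W, H, L], filter = [k, k, L], output = [W, H]
The output shape is a 2D matrix
What if we want to train N filters (N is the number of filters)?
Then the output shape is (stacked 2D) 3D = 2D x N matrix.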
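The shape bookkeeping above can be reproduced without TensorFlow. This is a rough NumPy sketch under stated assumptions: conv2d_multichannel is a hypothetical helper, it assumes an odd, square k and zero padding in x and y only, and all values are ones as in the toy examples.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_multichannel(x, kernels):
    """x: [H, W, L]; kernels: [k, k, L, N]; returns [H, W, N] (zero-padded)."""
    pad = kernels.shape[0] // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    # Each window spans the full channel depth, so it can only move along x and y.
    windows = sliding_window_view(padded, kernels.shape[:3])   # [H, W, 1, k, k, L]
    return np.tensordot(windows, kernels, axes=3)[:, :, 0, :]

x = np.ones((5, 5, 32))                  # input = [W, H, L]
one_filter = np.ones((3, 3, 32, 1))      # a single [k, k, L] filter
n_filters = np.ones((3, 3, 32, 64))      # N = 64 filters

out_2d = conv2d_multichannel(x, one_filter)[:, :, 0]
out_3d = conv2d_multichannel(x, n_filters)
print(out_2d.shape)   # (5, 5): one filter gives a 2D matrix
print(out_3d.shape)   # (5, 5, 64): N filters give N stacked 2D matrices
```

The third output axis here counts filters; it is not a spatial direction the filter moved along.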

conv2d - LeNet, VGG, ... for 1 filter

    # Reuses tf, np and sess from the first example.
    in_channels = 32  # 3 for RGB, 32, 64, 128, ...
    ones_3d = np.ones((5, 5, in_channels))  # input is 3D, in_channels = 32
    # filter must have a 3D shape with in_channels
    weight_3d = np.ones((3, 3, in_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_3d = tf.constant(weight_3d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

    output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
    print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

    # Reuses tf, np and sess from the first example.
    in_channels = 32   # 3 for RGB, 32, 64, 128, ...
    out_channels = 64  # 128, 256, ...
    ones_3d = np.ones((5, 5, in_channels))  # input is 3D, in_channels = 32
    # filter must have a 3D shape x number of filters = 4D
    weight_4d = np.ones((3, 3, in_channels, out_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_4d = tf.constant(weight_4d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_4d.shape[0])
    filter_height = int(filter_4d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

    # output stacked shape is 3D = 2D x N matrix
    output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
    print(sess.run(output_3d))

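↑↑↑↑↑ Bonus: 1x1 conv in CNN - GoogLeNet, ... ↑↑↑↑↑

1x1 conv is confusing if you think of it as a 2D image filter like Sobel
For 1x1 conv in a CNN, the input has a 3D shape, as in the picture above
It computes filtering along the depth (channel) direction
Input = [W, H, L], filter = [1, 1, L], output = [W, H]
The stacked output shape is 3D = 2D x N matrix.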
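One way to see what a 1x1 conv does: it applies the same L-to-N linear map to the channel vector at every pixel. The NumPy sketch below illustrates that reading with random data; the array names are made up for the example, and it is not meant as how any framework implements it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5, 32))    # input = [W, H, L]
w = rng.normal(size=(32, 64))      # N = 64 filters, each of shape [1, 1, L]

# A 1x1 conv is one matrix multiply over the channel axis, broadcast over pixels.
out = x @ w                        # shape [W, H, N]

# Explicit per-pixel version for comparison.
check = np.empty((5, 5, 64))
for i in range(5):
    for j in range(5):
        check[i, j] = w.T @ x[i, j]

print(out.shape)                   # (5, 5, 64)
print(np.allclose(out, check))     # True
```

No neighbouring pixels are mixed, which is why thinking of it as a Sobel-style image filter is misleading.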

tf.nn.conv2d - special case: 1x1 conv

    # Reuses tf, np and sess from the first example.
    in_channels = 32   # 3 for RGB, 32, 64, 128, ...
    out_channels = 64  # 128, 256, ...
    ones_3d = np.ones((5, 5, in_channels))  # input is 3D, in_channels = 32
    # filter must have a 3D shape x number of filters = 4D;
    # a 1x1 conv has spatial size 1x1
    weight_4d = np.ones((1, 1, in_channels, out_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_4d = tf.constant(weight_4d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_4d.shape[0])
    filter_height = int(filter_4d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

    # output stacked shape is 3D = 2D x N matrix
    output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
    print(sess.run(output_3d))

Animation (2D conv with 3D input)

- Original link: LINK
- Author: Martin Görner
- Twitter: @martin_gorner
- Google+: plus.google.com/+MartinGorne

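Bonus: 1D convolution with 2D input

↑↑↑↑↑ 1D convolution with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D convolution with 2D input ↑↑↑↑↑

Even though the input is 2D, e.g. 20x14
the output shape is not 2D but a 1D matrix
because the filter height = L must match the input height = L
1 direction (x) along which to compute the conv! Not 2D
Input = [W, L], filter = [k, L], output = [W]
The output shape is a 1D matrix
What if we want to train N filters (N is the number of filters)?
Then the output shape is (stacked 1D) 2D = 1D x N matrix.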
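The same shape argument, one dimension down, can be sketched in NumPy. As before this is an illustration under assumptions: conv1d_multichannel is a hypothetical helper, k is assumed odd, padding is zeros along W only, and N = 8 is an arbitrary choice.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_multichannel(x, kernels):
    """x: [W, L]; kernels: [k, L, N]; returns [W, N] (zero-padded along W)."""
    pad = kernels.shape[0] // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    # Each window covers the full height L, so it can only move along W.
    windows = sliding_window_view(padded, kernels.shape[:2])   # [W, 1, k, L]
    return np.tensordot(windows, kernels, axes=2)[:, 0, :]

x = np.ones((20, 14))                                         # input = [W, L]
out_1 = conv1d_multichannel(x, np.ones((3, 14, 1)))[:, 0]     # one [k, L] filter
out_n = conv1d_multichannel(x, np.ones((3, 14, 8)))           # N = 8 filters

print(out_1.shape)   # (20,): a 1D output
print(out_n.shape)   # (20, 8): N stacked 1D outputs
```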

Bonus: C3D

    # Reuses tf, np and sess from the first example.
    in_channels = 32   # 3, 32, 64, 128, ...
    out_channels = 64  # 3, 32, 64, 128, ...
    ones_4d = np.ones((5, 5, 5, in_channels))
    weight_5d = np.ones((3, 3, 3, in_channels, out_channels))
    strides_3d = [1, 1, 1, 1, 1]

    in_4d = tf.constant(ones_4d, dtype=tf.float32)
    filter_5d = tf.constant(weight_5d, dtype=tf.float32)

    in_width = int(in_4d.shape[0])
    in_height = int(in_4d.shape[1])
    in_depth = int(in_4d.shape[2])

    filter_width = int(filter_5d.shape[0])
    filter_height = int(filter_5d.shape[1])
    filter_depth = int(filter_5d.shape[2])

    input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
    kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

    output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
    print(sess.run(output_4d))

    sess.close()

Input & output in TensorFlow
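As a rough reminder of the channels-last tensor layouts the TensorFlow examples above rely on, here is a small pure-Python sketch. The function conv_output_shape is a hypothetical helper, not a TensorFlow API; it assumes one stride shared by all spatial axes and uses the usual 'SAME' and 'VALID' output-size rules.

```python
import math

# Channels-last layouts used by the examples:
#   conv1d: input [batch, width, in_ch],                filter [k, in_ch, out_ch]
#   conv2d: input [batch, height, width, in_ch],        filter [k, k, in_ch, out_ch]
#   conv3d: input [batch, depth, height, width, in_ch], filter [k, k, k, in_ch, out_ch]

def conv_output_shape(input_shape, filter_shape, stride=1, padding="SAME"):
    """Output shape for a channels-last input and a TensorFlow-style filter."""
    batch, *spatial, in_ch = input_shape
    *ksize, filter_in, out_ch = filter_shape
    if filter_in != in_ch or len(ksize) != len(spatial):
        raise ValueError("filter does not match the input layout")
    if padding == "SAME":
        out = [math.ceil(n / stride) for n in spatial]
    else:  # "VALID"
        out = [math.ceil((n - k + 1) / stride) for n, k in zip(spatial, ksize)]
    return [batch, *out, out_ch]

print(conv_output_shape([1, 5, 1], [3, 1, 1]))                  # [1, 5, 1]
print(conv_output_shape([1, 5, 5, 32], [3, 3, 32, 64]))         # [1, 5, 5, 64]
print(conv_output_shape([1, 5, 5, 5, 32], [3, 3, 3, 32, 64]))   # [1, 5, 5, 5, 64]
```

The number of spatial axes between batch and channels is the number of directions the filter slides in, which is what 1D, 2D, and 3D refer to.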

Summary

Jerry Liu · Answer

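1D, 2D, or 3D in a CNN refers to the direction(s) of the convolution, not to the dimensionality of the input or the filter.
For a 1-channel input, a 2D CNN is equivalent to a 1D CNN when the kernel length equals the input length (1 conv direction).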
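That equivalence can be checked numerically. The following NumPy sketch is an illustration with random data and no padding, and the variable names are made up: a single-channel 2D input convolved with a kernel as tall as the input gives the same numbers as a 1D convolution that treats the L axis as channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 14))        # single-channel 2D input, [W, L]
kernel = rng.normal(size=(3, 14))    # kernel length along L equals the input length L

# 2D view: slide a [k, L] window over the [W, L] input (no padding).
windows = sliding_window_view(x, kernel.shape)     # [W - k + 1, 1, k, L]
out_2d = np.tensordot(windows, kernel, axes=2)     # [W - k + 1, 1]

# 1D view: treat the L axis as channels and sum the per-channel 1D correlations.
out_1d = sum(np.correlate(x[:, c], kernel[:, c], mode="valid")
             for c in range(x.shape[1]))

print(out_2d.shape, out_1d.shape)           # (18, 1) (18,)
print(np.allclose(out_2d[:, 0], out_1d))    # True
```

The 2D output has size 1 along L, so only one convolution direction is left.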