Keras classification - object detection

Question

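I am working on classification and then object detection with Keras and Python. I have classified cats/dogs with 80%+ accuracy, and I'm OK with that result for now. My question is: how do I detect a cat or a dog in an input image? I'm completely confused. I want to use my own weights, not pre-trained ones from the internet.

Here is my code currently: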

    from keras.preprocessing.image import ImageDataGenerator
    from keras.models import Sequential
    from keras.layers import Convolution2D, MaxPooling2D
    from keras.layers import Activation, Dropout, Flatten, Dense
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib
    from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

    ##########################################################################
    # VALUES
    # dimensions of our images.
    img_width, img_height = 150, 150

    train_data_dir = 'data/train'
    validation_data_dir = 'data/validation'
    nb_train_samples = 2000  # 1000 cats/dogs
    nb_validation_samples = 800  # 400 cats/dogs
    nb_epoch = 50

    ##########################################################################
    # MODEL
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, input_shape=(3, img_width, img_height)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Convolution2D(32, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Convolution2D(64, 3, 3))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

    # this is the augmentation configuration we will use for training
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

    ##########################################################################
    # TEST AUGMENTATION
    img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
    x = img_to_array(img)  # this is a Numpy array with shape (3, 150, 150)
    x = x.reshape((1,) + x.shape)  # this is a Numpy array with shape (1, 3, 150, 150)

    # the .flow() command below generates batches of randomly transformed images
    # and saves the results to the `preview/` directory
    i = 0
    for batch in train_datagen.flow(x, batch_size=1,
                                    save_to_dir='data/TEST AUGMENTATION',
                                    save_prefix='cat', save_format='jpeg'):
        i += 1
        if i > 20:
            break  # otherwise the generator would loop indefinitely

    ##########################################################################
    # this is the augmentation configuration we will use for testing:
    # only rescaling
    test_datagen = ImageDataGenerator(rescale=1./255)

    # PREPARE TRAINING DATA
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,  # data/train
        target_size=(img_width, img_height),  # RESIZE to 150/150
        batch_size=32,
        class_mode='binary')  # since we are using binary_crossentropy we need binary labels

    # PREPARE VALIDATION DATA
    validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,  # data/validation
        target_size=(img_width, img_height),  # RESIZE 150/150
        batch_size=32,
        class_mode='binary')

    # START model.fit
    history = model.fit_generator(
        train_generator,  # train data
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,  # validation data
        nb_val_samples=nb_validation_samples)

    model.save_weights('savedweights.h5')

    # list all data in history
    print(history.history.keys())

    # ACC VS VAL_ACC
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy ACC VS VAL_ACC')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

    # summarize history for loss
    # LOSS VS VAL_LOSS
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss LOSS vs VAL_LOSS')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

    model.load_weights('first_try.h5')

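So now that I have classified cats and dogs, what do I need to do, and how, to take an input image and find the cat or dog in it with a bounding box? I'm completely new to this and not even sure I'm tackling it the right way. Thank you.

UPDATE: Hi, I'm importing my image and reshaping it to (1, 3, 150, 150), because a (150, 150) shape raises this error: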

Exception: Error when checking : expected convolution2d_input_1 to have 4 dimensions, but got array with shape (150L, 150L)

Importing the image:

    # load test image
    img = load_img('data/prediction/cat.155.jpg')

    # reshape to 1,3,150,150
    img = np.arange(1* 150 * 150).reshape((1,3,150, 150))

    # check shape
    print(img.shape)

Then I changed def predict_function(x) to:

    def predict_function(x):
        # example of prediction function for simplicity, you
        # should probably use `return model.predict(x)`
        # random.seed(x[0][0])
        # return random.random()
        return model.predict(img)

Now when I run:

    best_box = get_best_bounding_box(img, predict_function)
    print('best bounding box %r' % (best_box, ))

I get this output: best bounding box None

So I just ran:

    model.predict(img)

and got the following:

    model.predict(img)
    Out[54]: array([[ 0.]], dtype=float32)

So it is not checking whether it's a cat or a dog at all... Any ideas?

Note: if def predict_function(x) uses:

    random.seed(x[0][0])
    return random.random()

I do get output: it checks the boxes and gives the best one.

ShmulikA · Accepted Answer

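The machine learning model you built and the task you are trying to achieve are not the same. The model solves a classification task, while your goal is to detect an object inside the image, which is an object detection task.

Classification here answers a yes/no question (cat or dog?), while a detection question has more than two possible answers: it also has to say where in the image the object is.

What can you do?

I can suggest three possibilities to try: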

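1. Use a sliding window combined with your model

Crop boxes of defined sizes (e.g. from 20x20 to 160x160) and slide them over the image. For each window, predict the probability that it contains a dog, and at the end take the window with the highest predicted probability.

This generates multiple candidate bounding boxes, and you pick the bounding box with the highest probability.

This may be slow, since you need to run predictions on hundreds of samples or more.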
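To get a feel for the cost, the number of crops can be counted before running anything. This is a small sketch, not part of the original answer; count_windows is a hypothetical helper, and the parameters (a 256x256 image, square windows of 20, 40, ..., 140 pixels, a stride of 10) are just example values.

```python
def count_windows(img_h, img_w, window_sizes, step):
    """Count how many square crops a sliding-window search would score."""
    total = 0
    for win in window_sizes:
        # Number of valid top/left positions for this window size;
        # an empty range (window larger than the image) contributes 0.
        n_rows = len(range(0, img_h - win + 1, step))
        n_cols = len(range(0, img_w - win + 1, step))
        total += n_rows * n_cols
    return total


print(count_windows(256, 256, range(20, 160, 20), step=10))  # 2380
```

So even a small image needs a couple of thousand forward passes with these settings. A larger stride or fewer window sizes cuts that down quickly, at the price of a coarser box.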

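Another option is to try implementing an RCNN or Faster-RCNN network on top of your network. These networks basically reduce the number of candidate bounding-box windows that have to be evaluated.

Update - sliding window computation example

The following code demonstrates how to run the sliding window algorithm. You can change the parameters.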

    import random
    import numpy as np

    WINDOW_SIZES = [i for i in range(20, 160, 20)]


    def get_best_bounding_box(img, predict_fn, step=10, window_sizes=WINDOW_SIZES):
        best_box = None
        best_box_prob = -np.inf

        # loop over window sizes: 20x20, 40x40, ..., 140x140
        for win_size in window_sizes:
            for top in range(0, img.shape[0] - win_size + 1, step):
                for left in range(0, img.shape[1] - win_size + 1, step):
                    # compute the (top, left, bottom, right) of the bounding box
                    box = (top, left, top + win_size, left + win_size)

                    # crop the original image
                    cropped_img = img[box[0]:box[2], box[1]:box[3]]

                    # predict how likely this cropped image is a dog and,
                    # if higher than the best so far, save it
                    print('predicting for box %r' % (box, ))
                    box_prob = predict_fn(cropped_img)
                    if box_prob > best_box_prob:
                        best_box = box
                        best_box_prob = box_prob

        return best_box


    def predict_function(x):
        # example prediction function for simplicity; you
        # should probably use `return model.predict(x)`
        # int(): newer Python versions reject NumPy integers as seeds
        random.seed(int(x[0][0]))
        return random.random()


    # dummy array of 256x256
    img = np.arange(256 * 256).reshape((256, 256))

    best_box = get_best_bounding_box(img, predict_function)
    print('best bounding box %r' % (best_box, ))

Example output:

    predicting for box (0, 0, 20, 20)
    predicting for box (0, 10, 20, 30)
    predicting for box (0, 20, 20, 40)
    ...
    predicting for box (110, 100, 250, 240)
    predicting for box (110, 110, 250, 250)
    best bounding box (140, 80, 160, 100)
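To plug a trained classifier into get_best_bounding_box, predict_fn has to turn each crop into the kind of input the network was trained on. The following is only a sketch under assumptions, not a tested recipe for this exact model: it assumes the image is passed as a (height, width, 3) array, that the network expects channels-first 150x150 input rescaled by 1/255 as in the question's code, and that the sigmoid output is the dog probability (as far as I know, flow_from_directory orders class folders alphabetically, so cats would be 0 and dogs 1). make_predict_fn and the NumPy nearest-neighbour resize are illustrative helpers, not part of Keras.

```python
import numpy as np


def make_predict_fn(model, input_size=(150, 150), channels_first=True):
    """Wrap any object with .predict(batch) as predict_fn(crop) -> float."""

    def predict_fn(cropped_img):
        # Nearest-neighbour resize in plain NumPy (an assumption to keep the
        # sketch dependency-free; PIL or OpenCV would be the usual choice).
        h, w = cropped_img.shape[:2]
        rows = np.arange(input_size[0]) * h // input_size[0]
        cols = np.arange(input_size[1]) * w // input_size[1]
        resized = cropped_img[rows][:, cols]

        # Same rescaling as ImageDataGenerator(rescale=1./255) in training.
        x = resized.astype("float32") / 255.0
        if channels_first:
            x = np.transpose(x, (2, 0, 1))  # (H, W, 3) -> (3, H, W)

        batch = x[np.newaxis, ...]  # add the batch dimension
        # model.predict returns an array of shape (1, 1); unwrap to a float
        return float(model.predict(batch)[0][0])

    return predict_fn
```

With this, the call would be get_best_bounding_box(img, make_predict_fn(model)). The image passed in must keep its spatial axes first: with the (1, 3, 150, 150) batch from the question's update, img.shape[0] is 1 and img.shape[1] is 3, so no window fits, the loops never run, and the function returns None.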

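2. Train a new network for the object detection task

For that, have a look at the Pascal dataset (examples here), which contains 20 classes; two of them are cats and dogs.

The dataset includes the location of each object, which serves as the Y target.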
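As an illustration of what such a target looks like, Pascal VOC stores its annotations as XML with one bndbox element (xmin, ymin, xmax, ymax in pixels) per object. The sketch below parses a made-up annotation with the standard library and scales the box to the 0..1 range; the sample values and the parse_voc_boxes helper are invented for this example, and normalising the coordinates is just one common choice of regression target.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC layout (values are illustrative only).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>100</xmin><ymin>75</ymin><xmax>300</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_boxes(xml_text, wanted=("cat", "dog")):
    """Return [(class_name, (xmin, ymin, xmax, ymax))] with coords in 0..1."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))

    targets = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in wanted:
            continue  # skip the other Pascal classes
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        targets.append(
            (name, (xmin / width, ymin / height, xmax / width, ymax / height))
        )
    return targets


print(parse_voc_boxes(SAMPLE_XML))  # [('dog', (0.2, 0.2, 0.6, 0.8))]
```

A detection network would then be trained to output these four numbers (plus the class) instead of a single cat/dog probability.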

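3. Use an existing network for this task

Last but not least, you can reuse an existing network, or even do "knowledge transfer" (Keras example here) for your specific task.

Have a look at the convnets-keras lib.

So choose the method that suits you best, and update us with the results.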