fit_generatorの精度がKerasのevaluate_generatorの精度と異なるのはなぜですか？

Question

私がすること：

Keras fit_generator()を使用して、事前トレーニング済みのCNNをトレーニングしています。これにより、各エポックの後に評価メトリック（_loss, acc, val_loss, val_acc_）が生成されます。モデルをトレーニングした後、evaluate_generator()を使用して評価メトリック（_loss, acc_）を生成します。

私が期待すること：

1つのエポックについてモデルをトレーニングする場合、fit_generator()とevaluate_generator()で取得されるメトリックは同じであると期待します。どちらもデータセット全体に基づいて指標を導出する必要があります。

私が観察するもの：

lossとaccはどちらもfit_generator()およびevaluate_generator()とは異なります：

わからないこと：

fit_generator()の精度がevaluate_generator()の精度と異なる理由

私のコード：

_def generate_data(path, imagesize, nBatches): datagen = ImageDataGenerator(rescale=1./255) generator = datagen.flow_from_directory\ (directory=path, # path to the target directory target_size=(imagesize,imagesize), # dimensions to which all images found will be resize color_mode='rgb', # whether the images will be converted to have 1, 3, or 4 channels classes=None, # optional list of class subdirectories class_mode='categorical', # type of label arrays that are returned batch_size=nBatches, # size of the batches of data shuffle=True) # whether to shuffle the data return generator _

[...]

_def train_model(model, nBatches, nEpochs, trainGenerator, valGenerator, resultPath): history = model.fit_generator(generator=trainGenerator, steps_per_Epoch=trainGenerator.samples//nBatches, # total number of steps (batches of samples) epochs=nEpochs, # number of epochs to train the model verbose=2, # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per Epoch callbacks=None, # keras.callbacks.Callback instances to apply during training validation_data=valGenerator, # generator or Tuple on which to evaluate the loss and any model metrics at the end of each Epoch validation_steps= valGenerator.samples//nBatches, # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every Epoch class_weight=None, # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function max_queue_size=10, # maximum size for the generator queue workers=32, # maximum number of processes to spin up when using process-based threading use_multiprocessing=True, # whether to use process-based threading shuffle=False, # whether to shuffle the order of the batches at the beginning of each Epoch initial_Epoch=0) # Epoch at which to start training print("%s: Model trained." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S')) # Save model modelPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelArchitecture.h5') weightsPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelWeights.h5') model.save(modelPath) model.save_weights(weightsPath) print("%s: Model saved." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S')) return history, model _

[...]

_def evaluate_model(model, generator): score = model.evaluate_generator(generator=generator, # Generator yielding tuples steps= generator.samples//nBatches) # number of steps (batches of samples) to yield from generator before stopping print("%s: Model evaluated:" "
						 Loss: %.3f" "
						 Accuracy: %.3f" % (datetime.now().strftime('%Y-%m-%d_%H-%M-%S'), score[0], score[1])) _

[...]

_def main(): # Create model modelUntrained = create_model(imagesize, nBands, nClasses) # Prepare training and validation data trainGenerator = generate_data(imagePathTraining, imagesize, nBatches) valGenerator = generate_data(imagePathValidation, imagesize, nBatches) # Train and save model history, modelTrained = train_model(modelUntrained, nBatches, nEpochs, trainGenerator, valGenerator, resultPath) # Evaluate on validation data print("%s: Model evaluation (valX, valY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S')) evaluate_model(modelTrained, valGenerator) # Evaluate on training data print("%s: Model evaluation (trainX, trainY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S')) evaluate_model(modelTrained, trainGenerator) _

更新

この問題について報告しているサイトをいくつか見つけました。

これまでに成功しなかった提案された解決策のいくつかを試してみました。 accおよびlossは、トレーニングおよび検証に同じジェネレーターで生成されたまったく同じデータを使用する場合でも、fit_generator()およびevaluate_generator()とは異なります。これが私が試したものです：

スクリプト全体または学習済みのレイヤーに新しいレイヤーを追加する前に学習フェーズを静的に設定する

_ K.set_learning_phase(0) # testing K.set_learning_phase(1) # training _

事前トレーニング済みのモデルからすべてのバッチ正規化レイヤーをフリーズ解除

_ for i in range(len(model.layers)): if str.startswith(model.layers[i].name, 'bn'): model.layers[i].trainable=True _

ドロップアウトまたはバッチ正規化をトレーニングされていないレイヤーとして追加しない

_ # Create pre-trained base model basemodel = ResNet50(include_top=False, # exclude final pooling and fully connected layer in the original model weights='imagenet', # pre-training on ImageNet input_tensor=None, # optional tensor to use as image input for the model input_shape=(imagesize, # shape Tuple imagesize, nBands), pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer classes=nClasses) # number of classes to classify images into # Create new untrained layers x = basemodel.output x = GlobalAveragePooling2D()(x) # global spatial average pooling layer x = Dense(1024, activation='relu')(x) # fully-connected layer y = Dense(nClasses, activation='softmax')(x) # logistic layer making sure that probabilities sum up to 1 # Create model combining pre-trained base model and new untrained layers model = Model(inputs=basemodel.input, outputs=y) # Freeze weights on pre-trained layers for layer in basemodel.layers: layer.trainable = False # Define learning optimizer learningRate = 0.01 optimizerSGD = optimizers.SGD(lr=learningRate, # learning rate. momentum=0.9, # parameter that accelerates SGD in the relevant direction and dampens oscillations decay=learningRate/nEpochs, # learning rate decay over each update nesterov=True) # whether to apply Nesterov momentum # Compile model model.compile(optimizer=optimizerSGD, # stochastic gradient descent optimizer loss='categorical_crossentropy', # objective function metrics=['accuracy'], # metrics to be evaluated by the model during training and testing loss_weights=None, # scalar coefficients to weight the loss contributions of different model outputs sample_weight_mode=None, # sample-wise weights weighted_metrics=None, # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing target_tensors=None) # tensor model's target, which will be fed with the target data during training _

事前トレーニング済みのさまざまなCNNをベースモデルとして使用（VGG19、InceptionV3、InceptionResNetV2、Xception）

_ from keras.applications.vgg19 import VGG19 basemodel = VGG19(include_top=False, # exclude final pooling and fully connected layer in the original model weights='imagenet', # pre-training on ImageNet input_tensor=None, # optional tensor to use as image input for the model input_shape=(imagesize, # shape Tuple imagesize, nBands), pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer classes=nClasses) # number of classes to classify images into _

他に解決策がない場合は、お知らせください。

Anakin · Answer

この場合、1つのエポックのトレーニングでは十分な情報が得られない可能性があります。また、ランダムなシードをflow_from_directoryメソッドに設定していないため、トレーニングデータとテストデータは完全に同じではない可能性があります。ご覧くださいこちら。

たぶん、シードを設定し、（もしあれば）拡張を削除し、トレーニングされたモデルの重みを保存して、後でロードして確認することができます。

Ken · Answer

use_multiprocessing=Falseをfit_generatorレベルに設定すると、問題が修正されますが、トレーニングが大幅に遅くなります。以下のコードはkerasのuse_multiprocessing=False関数から変更されているため、検証ジェネレーターのみにfit_generatorを設定することをお勧めしますが、それでも不完全です。

... try: if do_validation: if val_gen and workers > 0: # Create an Enqueuer that can be reused val_data = validation_data if isinstance(val_data, Sequence): val_enqueuer = OrderedEnqueuer(val_data, **use_multiprocessing=False**) validation_steps = len(val_data) else: val_enqueuer = GeneratorEnqueuer(val_data, **use_multiprocessing=False**) val_enqueuer.start(workers=workers, max_queue_size=max_queue_size) val_enqueuer_gen = val_enqueuer.get() ...