Multi-layer Seq2Seq model using LSTM in Keras

Question

I was building a seq2seq model in Keras. I had built a single-layer encoder and decoder, and they were working fine. Now I want to extend it to a multi-layer encoder and decoder. I am building it with the Keras Functional API.

Training:

Encoder code:

encoder_input = Input(shape=(None, vec_dimension))
encoder_lstm = LSTM(vec_dimension, return_state=True, return_sequences=True)(encoder_input)
encoder_lstm = LSTM(vec_dimension, return_state=True)(encoder_lstm)
encoder_output, encoder_h, encoder_c = encoder_lstm

Decoder code:

encoder_state = [encoder_h, encoder_c]
decoder_input = Input(shape=(None, vec_dimension))
decoder_lstm = LSTM(vec_dimension, return_state=True, return_sequences=True)(decoder_input, initial_state=encoder_state)
decoder_lstm = LSTM(vec_dimension, return_state=True, return_sequences=True)(decoder_lstm)
decoder_output, _, _ = decoder_lstm

For testing:

encoder_model = Model(inputs=encoder_input, outputs=encoder_state)
decoder_state_input_h = Input(shape=(None, vec_dimension))
decoder_state_input_c = Input(shape=(None, vec_dimension))
decoder_states_input = [decoder_state_input_h, decoder_state_input_c]
decoder_output, decoder_state_h, decoder_state_c = decoder_lstm  # (decoder_input, initial_state=decoder_states_input)
decoder_states = [decoder_state_h, decoder_state_c]
decoder_model = Model(inputs=[decoder_input] + decoder_states_input, outputs=[decoder_output] + decoder_states)

Now, when I try to increase the number of layers in the decoder, training works fine, but testing does not work and throws an error.

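The actual problem is that, when making the decoder multi-layer, I moved the initial_state from the final layer, where it used to be specified, to what is now an intermediate layer. So when I call it during testing, it throws this error: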

RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_64:0", shape=(?, ?, 150), dtype=float32) at layer "input_64".The following previous layers were accessed without issue: []

How should I pass initial_state=decoder_states_input so that it does not throw an error? In particular, how do I pass initial_state=decoder_states_input to the first decoder layer when the output is taken from the last layer?

Edit:

In the code above I tried to stack multiple decoder LSTM layers, but it gives an error. With a single layer, the correct, working code is:

Encoder (training):

encoder_input = Input(shape=(None, vec_dimension))
encoder_lstm = LSTM(vec_dimension, return_state=True)(encoder_input)
encoder_output, encoder_h, encoder_c = encoder_lstm

Decoder (training):

encoder_state = [encoder_h, encoder_c]
decoder_input = Input(shape=(None, vec_dimension))
decoder_lstm = LSTM(vec_dimension, return_state=True, return_sequences=True)
decoder_output, _, _ = decoder_lstm(decoder_input, initial_state=encoder_state)

Decoder (testing):

decoder_output, decoder_state_h, decoder_state_c = decoder_lstm(decoder_input, initial_state=decoder_states_input)
decoder_states = [decoder_state_h, decoder_state_c]
decoder_model = Model(inputs=[decoder_input] + decoder_states_input, outputs=[decoder_output] + decoder_states)

Jeremy Wortz · Answer

Edit: updated to use the functional API model in Keras rather than the RNN layer.

from keras.models import Model
from keras.layers import Input, LSTM, Dense

# Two-layer encoder: keep the final states of both layers.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
e_outputs, h1, c1 = LSTM(latent_dim, return_state=True, return_sequences=True)(encoder_inputs)
_, h2, c2 = LSTM(latent_dim, return_state=True)(e_outputs)
encoder_states = [h1, c1, h2, c2]

# Two-layer decoder: keep the layer objects so they can be reused for inference.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
out_layer1 = LSTM(latent_dim, return_sequences=True, return_state=True)
d_outputs, dh1, dc1 = out_layer1(decoder_inputs, initial_state=[h1, c1])
out_layer2 = LSTM(latent_dim, return_sequences=True, return_state=True)
final, dh2, dc2 = out_layer2(d_outputs, initial_state=[h2, c2])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(final)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

And here is the inference setup:

encoder_model = Model(encoder_inputs, encoder_states)

# One (h, c) pair of state inputs per decoder layer.
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_state_input_h1 = Input(shape=(latent_dim,))
decoder_state_input_c1 = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c,
                         decoder_state_input_h1, decoder_state_input_c1]

# Reuse the trained decoder layers, feeding each one its own state inputs.
d_o, state_h, state_c = out_layer1(
    decoder_inputs, initial_state=decoder_states_inputs[:2])
d_o, state_h1, state_c1 = out_layer2(
    d_o, initial_state=decoder_states_inputs[-2:])
decoder_states = [state_h, state_c, state_h1, state_c1]
decoder_outputs = decoder_dense(d_o)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
decoder_model.summary()

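Finally, if you are following the Keras seq2seq example, you will have to change the prediction script, because there are now several hidden states to manage instead of just the two in the single-layer example. The number of states is twice the number of layers (one h and one c per layer).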

import numpy as np

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        # NOTICE THE ADDITIONAL HIDDEN STATES
        output_tokens, h, c, h1, c1 = decoder_model.predict(
            [target_seq] + states_value)
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
        # Update states (NOTICE THE ADDITIONAL HIDDEN STATES)
        states_value = [h, c, h1, c1]
    return decoded_sentence

for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Target sentence:', target_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)
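The state bookkeeping in the script above can be illustrated without Keras. The following is only a sketch: `split_states` and `step_outputs` are hypothetical helper names, not part of Keras or of the answer, and plain strings stand in for the real state arrays. It shows that the flat state list holds one (h, c) pair per decoder layer, and that everything after the first element of the decoder's prediction is the next state list.

```python
def split_states(flat_states):
    """Group a flat [h, c, h1, c1, ...] list into per-layer (h, c) pairs."""
    if len(flat_states) % 2 != 0:
        raise ValueError("expected an even number of state tensors")
    return [tuple(flat_states[i:i + 2]) for i in range(0, len(flat_states), 2)]


def step_outputs(predictions):
    """Split a decoder prediction list into (output_tokens, next_states)."""
    return predictions[0], list(predictions[1:])


# Placeholder strings stand in for the real state arrays (assumption).
states = ["h", "c", "h1", "c1"]
pairs = split_states(states)
tokens, next_states = step_outputs(["tokens"] + states)
print(pairs)        # one (h, c) pair per decoder layer
print(next_states)  # same flat layout, ready for the next predict call
```

Unpacking with a slice instead of naming h, c, h1, c1 individually means the loop does not need to change when more layers are added.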

Sam Ragusa · Answer

I generalized Jeremy Wortz's awesome answer to build the model from a list, 'latent_dims', so that it is 'len(latent_dims)' layers deep instead of a fixed depth of 2.

Starting with the 'latent_dims' declaration:

# latent_dims is an array which defines the depth of the encoder/decoder, as well as how large
# the layers should be. As the loops below are written, an array of sizes [a, b, c] produces a
# depth-3 encoder and a depth-3 decoder whose layers are both created in the order c, b, a,
# and each decoder layer is initialized from the encoder layer of the same size.
latent_dims = [1024, 512, 256]

Creating the model for training:

# Define an input sequence and process it by going through a len(latent_dims)-layer deep encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
outputs = encoder_inputs
encoder_states = []
for j in range(len(latent_dims))[::-1]:
    # Only the last encoder layer (j == 0) drops the output sequence.
    outputs, h, c = LSTM(latent_dims[j], return_state=True, return_sequences=bool(j))(outputs)
    encoder_states += [h, c]

# Set up the decoder, setting the initial state of each layer to the state of the encoder layer
# at the same position in the stack, which has the same number of units.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
outputs = decoder_inputs
output_layers = []
for j in range(len(latent_dims)):
    output_layers.append(
        LSTM(latent_dims[len(latent_dims) - j - 1], return_sequences=True, return_state=True)
    )
    outputs, dh, dc = output_layers[-1](outputs, initial_state=encoder_states[2*j:2*(j+1)])

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
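To check which layer sizes and state slices the two loops above produce, the index arithmetic can be replayed in plain Python. This is a sketch only: `layer_plan` is a hypothetical helper, not part of Keras or of the answer, and it mirrors the loop indices without building any layers.

```python
def layer_plan(latent_dims):
    """Replay the index arithmetic of the encoder and decoder loops."""
    n = len(latent_dims)
    plan = []
    # The encoder iterates j = n-1, ..., 0; the decoder iterates position = 0, ..., n-1.
    for position, j in enumerate(range(n)[::-1]):
        plan.append({
            "encoder_units": latent_dims[j],
            "encoder_return_sequences": bool(j),
            "decoder_units": latent_dims[n - position - 1],
            "state_slice": (2 * position, 2 * (position + 1)),
        })
    return plan


for row in layer_plan([1024, 512, 256]):
    print(row)
```

Each row pairs an encoder layer with the decoder layer that receives its states, so the unit counts in a row have to match for `initial_state` to have a compatible shape.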

For inference, it is as follows:

# Define sampling models (modified for n-layer deep network)
encoder_model = Model(encoder_inputs, encoder_states)

d_outputs = decoder_inputs
decoder_states_inputs = []
decoder_states = []
for j in range(len(latent_dims))[::-1]:
    # One (h, c) pair of state inputs per decoder layer.
    current_state_inputs = [Input(shape=(latent_dims[j],)) for _ in range(2)]
    temp = output_layers[len(latent_dims)-j-1](d_outputs, initial_state=current_state_inputs)
    d_outputs, cur_states = temp[0], temp[1:]
    decoder_states += cur_states
    decoder_states_inputs += current_state_inputs

decoder_outputs = decoder_dense(d_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

And finally, a few modifications to Jeremy Wortz's 'decode_sequence' function give the following:

def decode_sequence(input_seq, encoder_model, decoder_model):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    # Creating a list then using "".join() is usually much faster for string creation
    decoded_sentence = []
    while not stop_condition:
        to_split = decoder_model.predict([target_seq] + states_value)
        output_tokens, states_value = to_split[0], to_split[1:]
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, 0])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence.append(sampled_char)
        # Exit condition: either hit max length
        # or find stop character.
        if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
    return "".join(decoded_sentence)
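The depth-independent part of this loop is that the whole state list is passed in and replaced on every step. The sketch below shows that pattern with a toy stand-in for the decoder. `greedy_decode` and `toy_step` are hypothetical names, integers replace the one-hot arrays, and the exit condition is simplified (the stop token is not appended), so this illustrates the control flow rather than replacing the function above.

```python
def greedy_decode(predict_step, initial_states, start_index, stop_index, max_length):
    """Greedy decoding that carries an arbitrary number of state values."""
    states = list(initial_states)
    token = start_index
    decoded = []
    while len(decoded) < max_length:
        # First element: scores over the vocabulary; the rest: next states.
        scores, *states = predict_step(token, states)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if token == stop_index:
            break
        decoded.append(token)
    return decoded


def toy_step(token, states):
    """Toy stand-in for decoder_model.predict: always predicts token + 1."""
    vocab_size = 5
    scores = [0.0] * vocab_size
    scores[(token + 1) % vocab_size] = 1.0
    next_states = [s + 1 for s in states]  # pretend the states were updated
    return [scores] + next_states


# Four state values, as for a two-layer LSTM decoder (h, c, h1, c1).
result = greedy_decode(toy_step, [0, 0, 0, 0], start_index=0, stop_index=4, max_length=10)
print(result)
```

Because the states are unpacked with `*states`, the same function works for any number of decoder layers.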