Elastic Beanstalk log rotation causing Apache restarts

Question

I have already looked at the related question "AWS Elastic Beanstalk - Apache is constantly restarting".

The Elastic Beanstalk instance is reporting the following messages in error_log:

[Mon Jun 26 22:01:01.878892 2017] [mpm_prefork:notice] [pid 8595] AH00173: SIGHUP received. Attempting to restart
*** Error in (wsgi:wsgi) ': double free or corruption (out): 0x00007f564cced560 ***

The error sequence looks like this:

[Tue Jun 27 00:01:01.215260 2017] [:error] [pid 6429] [remote XX.XXX.XX.195:29773] mod_wsgi (pid=6429): Exception occurred processing WSGI script '/opt/python/current/app/site/settings/wsgi/__init__.py'.
[Tue Jun 27 00:01:01.215320 2017] [:error] [pid 6429] [remote XX.XXX.XX.195:29773] OSError: failed to write data
[Tue Jun 27 00:01:01.222407 2017] [:error] [pid 6430] [remote XX.XXX.XX.60:53313] mod_wsgi (pid=6430): Exception occurred processing WSGI script '/opt/python/current/app/site/settings/wsgi/__init__.py'.
[Tue Jun 27 00:01:01.222460 2017] [:error] [pid 6430] [remote XX.XXX.XX.60:53313] OSError: failed to write data
[Tue Jun 27 00:01:04.554810 2017] [core:warn] [pid 8595] AH00045: child process 7614 still did not exit, sending a SIGTERM
[Tue Jun 27 00:01:04.554850 2017] [core:warn] [pid 8595] AH00045: child process 7615 still did not exit, sending a SIGTERM
[Tue Jun 27 00:01:05.555958 2017] [mpm_prefork:notice] [pid 8595] AH00173: SIGHUP received. Attempting to restart
*** Error in (wsgi:wsgi) ': double free or corruption (out): 0x00007f5640cae900 ***
*** Error in (wsgi:wsgi) ': double free or corruption (out): 0x00007f78649b7970 ***

This goes on nearly every hour. The typical message is:

[Mon Jun 26 22:01:01.878892 2017] [mpm_prefork:notice] [pid 8595] AH00173: SIGHUP received. Attempting to restart
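One way to check the "nearly every hour" pattern is to pull the SIGHUP timestamps out of error_log and look at the gaps between them. The following is only a minimal sketch: it assumes the timestamp layout of the log lines quoted above, and the name sighup_times is made up for illustration.

```python
import re
from datetime import datetime

# Captures the leading "[Mon Jun 26 22:01:01.878892 2017]" of AH00173 notices.
SIGHUP_RE = re.compile(r"^\[([^\]]+)\] .*AH00173: SIGHUP received")


def sighup_times(lines):
    """Return the timestamps of all 'SIGHUP received' notices in the log lines."""
    times = []
    for line in lines:
        match = SIGHUP_RE.match(line)
        if match:
            # %a and %b assume English day and month names, as in the log above.
            stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S.%f %Y")
            times.append(stamp)
    return times


def gaps_in_seconds(times):
    """Seconds between consecutive SIGHUP notices."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Gaps that cluster around multiples of 3600 seconds would suggest a scheduled job rather than an error condition.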

I looked for an mpm_prefork module conf block... there isn't one, so all the defaults are being used.

I looked at the logrotate configuration that Elastic Beanstalk runs:

/var/log/httpd/* {
    size 10M
    missingok
    notifempty
    rotate 5
    sharedscripts
    compress
    dateext
    dateformat -%s
    create
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
    olddir /var/log/httpd/rotated
}

Pretty standard stuff. My understanding of reload is that it attempts a graceful restart...

I can trigger the error messages manually by running sudo apachectl -k restart, but I cannot find where this would be executed during log rotation.

There is a downstream service that appears to throw exceptions at the moment this server drops all of its connections.

So my question is: what else could be causing the SIGHUP inside mpm_prefork during logrotate? As far as I can tell, this should not happen outside of an error condition.

Apache/2.4.18 (Amazon) mod_wsgi/3.5 Python/3.4.3

saaj · Answer

In short, the current Elastic Beanstalk log rotation configuration appears to be broken, and it causes service downtime in the form of 504 Gateway Timeout responses. Let's take a look.

Reproduce

Create the simplest possible Python WSGI application.

application.py

import time

def application(environ, start_response):
    # somewhat realistic response duration
    time.sleep(0.5)
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']

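Zip it into application.zip. Then create an Elastic Beanstalk Python application and environment, and upload the archive. Make sure you use a key pair that you own. Leave the other settings at their defaults. Wait until it is done (a few minutes).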

SSH into the underlying EC2 instance (see the EB logs for the instance identifier). Type the following (it is httpd's logrotate post-rotate action, see below):

sudo /sbin/service httpd reload

Then, on your own machine, run:

siege -v -b -c 10 -t 10S http://your-test-eb.you-aws-region.elasticbeanstalk.com/

While it is running, repeat the reload command a few times.

You should then see something like the following:

** SIEGE 3.0.8
** Preparing 10 concurrent users for battle.
The server is now under siege...
HTTP/1.1 200 0.63 secs: 13 bytes ==> GET /
HTTP/1.1 200 0.65 secs: 13 bytes ==> GET /
HTTP/1.1 200 0.64 secs: 13 bytes ==> GET /
HTTP/1.1 200 0.60 secs: 13 bytes ==> GET /
...

Once you run reload, it looks like this:

HTTP/1.1 504 0.06 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.07 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.08 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.10 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.11 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.66 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.19 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.20 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.09 secs: 0 bytes ==> GET /

Then it recovers:

HTTP/1.1 200 1.25 secs: 13 bytes ==> GET /
HTTP/1.1 200 1.24 secs: 13 bytes ==> GET /
HTTP/1.1 200 1.26 secs: 13 bytes ==> GET /
...
Lifting the server siege.. done.

Transactions: 75 hits
Availability: 81.52 %
Elapsed time: 9.40 secs
Data transferred: 0.00 MB
Response time: 1.21 secs
Transaction rate: 7.98 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 9.68
Successful transactions: 75
Failed transactions: 17
Longest transaction: 4.27
Shortest transaction: 0.06
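The availability figure in that summary is consistent with successful transactions divided by all transactions (75 out of 75 + 17). A small sketch of the arithmetic; the helper name availability is hypothetical and is not part of siege:

```python
def availability(successful, failed):
    """Percentage of transactions that succeeded, rounded to two decimals."""
    total = successful + failed
    if total == 0:
        raise ValueError("no transactions recorded")
    return round(100.0 * successful / total, 2)


print(availability(75, 17))  # 81.52
```

In other words, roughly one request in five failed during a ten-second run that included a few reloads.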

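The ELB does not seem to contribute to the problem: the same thing can be reproduced from two SSH sessions on the underlying EC2 instance (the Amazon AMI has no siege), using ab instead: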

ab -v 4 -c 10 -t 10 http://your-test-eb.you-aws-region.elasticbeanstalk.com/
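If neither siege nor ab is at hand, a rough stand-in can be written with the Python standard library. This is a sketch under assumptions, not a replacement for either tool: fetch_status and probe are hypothetical names, and the URL to pass in is the placeholder from the commands above.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10):
    """Return the HTTP status code of one GET, or 'error' if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses such as 504 land here
    except (urllib.error.URLError, OSError):
        return "error"


def probe(url, total=100, concurrency=10, fetch=fetch_status):
    """Issue `total` GETs using `concurrency` worker threads and tally the results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, [url] * total))
```

Calling probe("http://your-test-eb.you-aws-region.elasticbeanstalk.com/") while repeating the reload command in another session should show a mix of 200 and 504 counts if the behavior described above is present.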

Cause

/etc/cron.hourly/cron.logrotate.elasticbeanstalk.httpd.conf

#!/bin/sh
test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf

/var/log/httpd/* {
    size 10M
    missingok
    notifempty
    rotate 5
    sharedscripts
    compress
    dateext
    dateformat -%s
    create
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
    olddir /var/log/httpd/rotated
}
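As far as I understand logrotate, the size 10M directive means a log is rotated only once it has grown beyond that size, and with sharedscripts the postrotate script runs once if at least one matching log was rotated. That would explain why the restart happens on nearly every hourly run rather than every one. The sketch below lists the files an hourly run would be expected to rotate; it assumes that 10M means 10 * 1024 * 1024 bytes, and the function name is made up for illustration.

```python
import os

SIZE_LIMIT = 10 * 1024 * 1024  # assumed meaning of "size 10M" in the config above


def files_due_for_rotation(log_dir="/var/log/httpd", limit=SIZE_LIMIT):
    """List regular files in log_dir that are larger than `limit` bytes."""
    due = []
    for entry in os.scandir(log_dir):
        # Skip the olddir ("rotated") and anything else that is not a plain file.
        if entry.is_file() and entry.stat().st_size > limit:
            due.append(entry.name)
    return sorted(due)
```

An empty result just before the top of the hour would suggest that the next cron run leaves Apache alone.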

Note the postrotate. /sbin/service is a System V wrapper for the scripts in /etc/init.d/. Its man page says:

service runs a System V init script in as predictable an environment as possible, removing most environment variables and with the current working directory set to /.

Note that reload is not a standard Apache maintenance command. It is a downstream addition by the distribution. Let's look at the init script, /etc/init.d/httpd. The relevant part is:

reload() {
    echo -n $"Reloading $prog: "
    check13 || exit 1
    killproc -p ${pidfile} $httpd -HUP
    RETVAL=$?
    echo
}

As you can see, it sends the HUP signal to Apache, which is interpreted as Restart Now:

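Sending the HUP or restart signal to the parent causes it to kill off its children like in TERM, but the parent doesn't exit. It re-reads its configuration files, and re-opens any log files. Then it spawns a new set of children and continues serving hits.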

TERM explains the 504s pretty well. What probably should have been used instead is Graceful Restart, which also re-opens the logs but does not terminate requests that are being served:

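The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.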

...

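The code was written to both minimize the time in which the server is unable to serve new requests (they will be queued up by the operating system, so they're not lost in any event) and to respect your tuning parameters.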
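The difference between the two restarts comes down to which signal reaches the Apache parent process. The sketch below mirrors what the init script's killproc call does, with the signal made selectable. It is illustrative only: the pidfile path is an assumption about the Amazon Linux layout, and the function names are hypothetical.

```python
import os
import signal
from pathlib import Path

# Signals the Apache parent process understands, per the passages quoted above.
APACHE_SIGNALS = {
    "stop": signal.SIGTERM,      # children are killed, parent exits
    "restart": signal.SIGHUP,    # children are killed like in TERM, parent stays
    "graceful": signal.SIGUSR1,  # children finish their current request first
}


def signal_for(action):
    """Map a maintenance action name to the signal sent to the parent."""
    try:
        return APACHE_SIGNALS[action]
    except KeyError:
        raise ValueError("unknown action: %r" % (action,))


def signal_parent(action, pidfile="/var/run/httpd/httpd.pid"):
    """Send the signal for `action` to the pid stored in `pidfile` (assumed path)."""
    pid = int(Path(pidfile).read_text().strip())
    os.kill(pid, signal_for(action))
    return pid
```

Run as root, signal_parent("graceful") would be the counterpart of the reload function with USR1 in place of HUP.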

Workaround

You can use .ebextensions to replace /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf. In the root directory of your application, create .ebextensions/10_logs.config with the following contents (basically replacing "reload" with "graceful"):

files:
  "/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      /var/log/httpd/* {
        size 10M
        missingok
        notifempty
        rotate 5
        sharedscripts
        compress
        dateext
        dateformat -%s
        create
        postrotate
          /sbin/service httpd graceful > /dev/null 2>/dev/null || true
        endscript
        olddir /var/log/httpd/rotated
      }

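Then redeploy your Elastic Beanstalk environment. Note, however, that with subsequent graceful restarts less than a second apart I was able to (sporadically) produce 503 Service Unavailable. That is not the case for log rotation, though: with evenly spaced graceful restarts, no errors occurred.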
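The change the workaround makes to the stock configuration is a single-word substitution. As a sketch, the same edit can be expressed as a text transformation, which is handy for checking after a redeploy that the file on the instance really contains the graceful variant; both function names are hypothetical.

```python
import re

RELOAD_RE = re.compile(r"(/sbin/service\s+httpd\s+)reload\b")


def use_graceful(conf_text):
    """Return the logrotate config text with 'httpd reload' changed to 'httpd graceful'."""
    return RELOAD_RE.sub(r"\1graceful", conf_text)


def still_uses_reload(conf_text):
    """True if the config text still calls 'service httpd reload'."""
    return RELOAD_RE.search(conf_text) is not None
```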