
Unable to export to Monitering service because: GaxError RPC failed, caused by 3

I have a Java application on App Engine, and it recently started logging the following error:

Unable to export to Monitering service because: GaxError RPC failed, caused by 3:One or more TimeSeries could not be written: Metrics cannot be written to gae_app. See https://cloud.google.com/monitoring/custom-metrics/creating-metrics#which-resource for a list of writable resource types.: timeSeries[0]

It appears every time, right after the health check log entry:

Health checks: instance=instanceName start=2020-01-14T14:28:07+00:00 end=2020-01-14T14:28:53+00:00 total=18 unhealthy=0 healthy=18

After a while the instance restarts, and the same thing starts happening again.

app.yaml

#https://cloud.google.com/appengine/docs/flexible/java/reference/app-yaml

#General settings
runtime: java
api_version: '1.0'
env: flex
runtime_config:
  jdk: openjdk8
#service: service_name #Required if creating a service. Optional for the default service.

#https://cloud.google.com/compute/docs/machine-types
#Resource settings
resources:
  cpu: 2
  memory_gb: 6 #memory_gb = cpu * [0.9 - 6.5] - 0.4
#  disk_size_gb: 10 #default

##Liveness checks - Liveness checks confirm that the VM and the Docker container are running. Instances that are deemed unhealthy are restarted.
liveness_check:
  path: "/liveness_check"
  timeout_sec: 20         #1-300   Timeout interval for each request, in seconds.
  check_interval_sec: 30 #1-300   Time interval between checks, in seconds.
  failure_threshold: 6   #1-10    An instance is unhealthy after failing this number of consecutive checks.
  success_threshold: 2   #1-10    An unhealthy instance becomes healthy again after successfully responding to this number of consecutive checks.
  initial_delay_sec: 300 #0-3600  The delay, in seconds, after the instance starts during which health check responses are ignored. This setting can allow an instance more time at deployment to get up and running.

##Readiness checks - Readiness checks confirm that an instance can accept incoming requests. Instances that don't pass the readiness check are not added to the pool of available instances.
readiness_check:
  path: "/readiness_check"
  timeout_sec: 10             #1-300      Timeout interval for each request, in seconds.
  check_interval_sec: 15      #1-300      Time interval between checks, in seconds.
  failure_threshold: 4       #1-10    An instance is unhealthy after failing this number of consecutive checks.
  success_threshold: 2       #1-10    An unhealthy instance becomes healthy after successfully responding to this number of consecutive checks.
  app_start_timeout_sec: 300 #1-3600  The maximum time, in seconds, an instance has to become ready after the VM and other infrastructure are provisioned. After this period, the deployment fails and is rolled back. You might want to increase this setting if your application requires significant initialization tasks, such as downloading a large file, before it is ready to serve.

#Service scaling settings
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 3
  cpu_utilization:
    target_utilization: 0.7

Shb

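This error is caused by the upgrade of the Stackdriver Logging sidecar to version 1.6.25, which starts pushing FluentD metrics to Stackdriver Monitoring via OpenCensus. However, the integration with App Engine Flex does not work yet.

These errors should be log noise only. They are not related to the health check logs, and they should not cause VM restarts. If your VM instances restart frequently, something else is probably responsible. In the Stackdriver Logging UI, search for "Free disk space" in the vm.syslog stream and for "unhealthy sidecars" in the vm.events stream. If either search returns entries, the restarts may be caused by low free disk space or by unhealthy sidecar containers.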
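
If you would rather triage exported log text offline than search in the UI, the same checks can be sketched in a few lines of Java. This is only an illustration under assumptions: it expects log lines already downloaded as plain strings, the class name LogTriage and the Kind enum are made up here, and the last two sample lines are hypothetical, since the exact wording of real vm.syslog and vm.events entries may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LogTriage {

    /** Rough categories for a single log line (names are illustrative). */
    enum Kind { EXPORT_NOISE, LOW_DISK, UNHEALTHY_SIDECAR, OTHER }

    /** Classifies one log line by case-insensitive substring matching. */
    static Kind classify(String line) {
        String s = line.toLowerCase(Locale.ROOT);
        // The monitoring export failure from the question: log noise only.
        if (s.contains("gaxerror rpc failed")
                && s.contains("metrics cannot be written to gae_app")) {
            return Kind.EXPORT_NOISE;
        }
        // Search terms suggested in the answer for possible restart causes.
        if (s.contains("free disk space")) {
            return Kind.LOW_DISK;
        }
        if (s.contains("unhealthy sidecars")) {
            return Kind.UNHEALTHY_SIDECAR;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            // Taken from the question.
            "Unable to export to Monitering service because: GaxError RPC failed, "
                + "caused by 3:One or more TimeSeries could not be written: "
                + "Metrics cannot be written to gae_app.",
            "Health checks: instance=instanceName start=2020-01-14T14:28:07+00:00 "
                + "end=2020-01-14T14:28:53+00:00 total=18 unhealthy=0 healthy=18",
            // Hypothetical lines; real entries may be worded differently.
            "Free disk space: (hypothetical sample entry)",
            "unhealthy sidecars: (hypothetical sample entry)");

        for (String line : lines) {
            System.out.println(classify(line) + " <- " + line);
        }
    }
}
```

Lines classified as EXPORT_NOISE can be ignored when looking for the restart cause; LOW_DISK or UNHEALTHY_SIDECAR hits are the ones worth following up.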

Yanwei Guo