web-dev-qa-db-ja.com

Kubernetes WatchConnectionManager:実行の失敗:HTTP 403

エラーExpected HTTP 101 response but was '403 Forbidden'が発生しました。単一のマスターと2つのワーカーでKubeadmを使用して新しいKubernetesクラスターをセットアップした後、以下のERRORメッセージに遭遇したpysparkサンプルアプリを送信します。

spark-submitコマンド

spark-submit --master k8s://master-Host:port \
--deploy-mode cluster --name test-pyspark \
--conf spark.kubernetes.container.image=mm45/pyspark-k8s-example:2.4.1 \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.executor.instances=1 \
--conf spark.executor.memory=1000m \
--conf spark.driver.memory=1000m \
--conf spark.executor.cores=1 \
--conf spark.driver.cores=1 \
--conf spark.driver.maxResultSize=10g /usr/bin/run.py

エラーの詳細:

19/08/24 19:38:06 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
Java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

クラスターの詳細:

  1. Kubernetesバージョン1.13.0
  2. Sparkバージョン2.4.1
  3. クラウドプラットフォーム:EC2で実行されるAWS

クラスタロールバインディング:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fabric8-rbac
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

完全なポッドログとエラースタックトレース

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=driver-py
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sort -t_ -k4 -n
+ grep SPARK_Java_OPT_
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_Java_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.6.8'
+ export PYTHON_VERSION=3.6.8
+ PYTHON_VERSION=3.6.8
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.32.0.3 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.Apache.spark.deploy.PythonRunner file:/usr/bin/run.py
19/08/24 19:38:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-Java classes where applicable
Using Spark's default log4j profile: org/Apache/spark/log4j-defaults.properties
19/08/24 19:38:04 INFO SparkContext: Running Spark version 2.4.1
19/08/24 19:38:04 INFO SparkContext: Submitted application: calculate_pyspark_example
19/08/24 19:38:04 INFO SecurityManager: Changing view acls to: root
19/08/24 19:38:04 INFO SecurityManager: Changing modify acls to: root
19/08/24 19:38:04 INFO SecurityManager: Changing view acls groups to:
19/08/24 19:38:04 INFO SecurityManager: Changing modify acls groups to:
19/08/24 19:38:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/08/24 19:38:04 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/08/24 19:38:04 INFO SparkEnv: Registering MapOutputTracker
19/08/24 19:38:04 INFO SparkEnv: Registering BlockManagerMaster
19/08/24 19:38:04 INFO BlockManagerMasterEndpoint: Using org.Apache.spark.storage.DefaultTopologyMapper for getting topology information
19/08/24 19:38:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/08/24 19:38:04 INFO DiskBlockManager: Created local directory at /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/blockmgr-718b703d-3587-44a6-8014-02162ae3a48c
19/08/24 19:38:04 INFO MemoryStore: MemoryStore started with capacity 400.0 MB
19/08/24 19:38:04 INFO SparkEnv: Registering OutputCommitCoordinator
19/08/24 19:38:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/08/24 19:38:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://test-pyspark-1566675457745-driver-svc.default.svc:4040
19/08/24 19:38:04 INFO SparkContext: Added file file:///usr/bin/run.py at spark://test-pyspark-1566675457745-driver-svc.default.svc:7078/files/run.py with timestamp 1566675484977
19/08/24 19:38:04 INFO Utils: Copying /usr/bin/run.py to /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/spark-0ee3145c-e088-494f-8da1-5b8f075d3bc8/userFiles-5cfd25bf-4775-404d-86c8-5a392deb1e18/run.py
19/08/24 19:38:06 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
19/08/24 19:38:06 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
Java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
    at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.Java:216)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.Java:183)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.Java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.Java:32)
    at Java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.Java:1149)
    at Java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.Java:624)
    at Java.lang.Thread.run(Thread.Java:748)
19/08/24 19:38:06 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/08/24 19:38:06 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.Java:201)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.Java:543)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.Java:185)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.Java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.Java:32)
    at Java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.Java:1149)
    at Java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.Java:624)
    at Java.lang.Thread.run(Thread.Java:748)
19/08/24 19:38:06 INFO SparkUI: Stopped Spark web UI at http://test-pyspark-1566675457745-driver-svc.default.svc:4040
19/08/24 19:38:06 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/08/24 19:38:06 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/08/24 19:38:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/08/24 19:38:06 INFO MemoryStore: MemoryStore cleared
19/08/24 19:38:06 INFO BlockManager: BlockManager stopped
19/08/24 19:38:06 INFO BlockManagerMaster: BlockManagerMaster stopped
19/08/24 19:38:06 WARN MetricsSystem: Stopping a MetricsSystem that is not running
19/08/24 19:38:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/08/24 19:38:06 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/usr/bin/run.py", line 8, in <module>
    with SparkContext(conf=conf) as sc:
  File "/opt/spark/python/lib/pyspark.Zip/pyspark/context.py", line 136, in __init__
  File "/opt/spark/python/lib/pyspark.Zip/pyspark/context.py", line 198, in _do_init
  File "/opt/spark/python/lib/pyspark.Zip/pyspark/context.py", line 306, in _initialize_context
  File "/opt/spark/python/lib/py4j-0.10.7-src.Zip/py4j/Java_gateway.py", line 1525, in __call__
  File "/opt/spark/python/lib/py4j-0.10.7-src.Zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.Apache.spark.api.Java.JavaSparkContext.
: io.fabric8.kubernetes.client.KubernetesClientException:
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.Java:201)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.Java:543)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.Java:185)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.Java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.Java:32)
    at Java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.Java:1149)
    at Java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.Java:624)
    at Java.lang.Thread.run(Thread.Java:748)

19/08/24 19:38:06 INFO ShutdownHookManager: Shutdown hook called
19/08/24 19:38:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-178fc09d-353c-4906-8899-bdac338c804a
19/08/24 19:38:06 INFO ShutdownHookManager: Deleting directory /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/spark-0ee3145c-e088-494f-8da1-5b8f075d3bc8

これを理解するのを手伝ってくれませんか?

3

エラーが引き続き発生する場合は、Kubernetesバージョン(v.1.14.3)をダウングレードし、spark 2.4.3を使用します。これは、Kubernetesに最新のアップデートがあり、golangで修正されているためです。

参照 https://andygrove.io/2019/08/Apache-spark-regressions-eks/

1
kabir wadhwa