Debugging DNS resolution problems in Kubernetes

Question

I built a Kubernetes cluster on Ubuntu 18.04 using Kubespray and ran into a DNS problem: containers basically cannot communicate with each other by hostname.

What works:

Container-to-container communication via IP address
Internet access from inside a container
Resolving kubernetes.default

Kubernetes master:

root@k8s-1:~# cat /etc/resolv.conf | grep -v ^\#
nameserver 127.0.0.53
search home
root@k8s-1:~#

Pod:

root@k8s-1:~# kubectl exec dnsutils cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local home
options ndots:5
root@k8s-1:~#
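The search and ndots:5 lines in the pod's resolv.conf decide which fully qualified names the resolver actually queries when it is handed a short name. Below is a minimal Python sketch of that expansion, assuming the usual glibc-style rule (a name with fewer than ndots dots goes through the search list first and is tried as given last). candidate_names is a hypothetical helper written for illustration, not part of Kubernetes or any library, and other resolver implementations may differ in details.

```python
def candidate_names(name, search, ndots):
    """List the names a stub resolver would try, in order (glibc-style rule)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search expansion.
        return [name[:-1]]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Too few dots: walk the search list first, try the bare name last.
    return expanded + [name]


# Values taken from the dnsutils pod's /etc/resolv.conf shown above.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local", "home"]
NDOTS = 5

for short_name in ("kubernetes.default", "dnsutils"):
    print(short_name)
    for candidate in candidate_names(short_name, SEARCH, NDOTS):
        print("   ", candidate)
```

For kubernetes.default, the second candidate (kubernetes.default.svc.cluster.local) is the name of a Service, so that lookup can succeed once the first candidate fails.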

The CoreDNS pods are healthy:

root@k8s-1:~# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-58687784f9-8rmlw   1/1     Running   0          35m
coredns-58687784f9-hp8hp   1/1     Running   0          35m
root@k8s-1:~#

CoreDNS pod logs:

root@k8s-1:~# kubectl describe pods --namespace=kube-system -l k8s-app=kube-dns | tail -n 2
  Normal   Started           35m                 kubelet, k8s-2  Started container coredns
  Warning  DNSConfigForming  12s (x33 over 35m)  kubelet, k8s-2  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 4.2.2.1 4.2.2.2 208.67.220.220
root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-8rmlw
.:53
2020-02-09T22:56:14.390Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
2020-02-09T22:56:14.391Z [INFO] CoreDNS-1.6.0
2020-02-09T22:56:14.391Z [INFO] linux/AMD64, go1.12.7, 0a218d3
CoreDNS-1.6.0
linux/AMD64, go1.12.7, 0a218d3
root@k8s-1:~#
root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-hp8hp
.:53
2020-02-09T22:56:20.388Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
2020-02-09T22:56:20.388Z [INFO] CoreDNS-1.6.0
2020-02-09T22:56:20.388Z [INFO] linux/AMD64, go1.12.7, 0a218d3
CoreDNS-1.6.0
linux/AMD64, go1.12.7, 0a218d3
root@k8s-1:~#

CoreDNS appears to be exposed:

root@k8s-1:~# kubectl get svc --namespace=kube-system | grep coredns
coredns   ClusterIP   10.233.0.3   <none>   53/UDP,53/TCP,9153/TCP   37m
root@k8s-1:~#
root@k8s-1:~# kubectl get ep coredns --namespace=kube-system
NAME      ENDPOINTS                                                  AGE
coredns   10.233.64.2:53,10.233.65.3:53,10.233.64.2:53 + 3 more...   37m
root@k8s-1:~#

These are my problematic pods. The whole cluster is affected by this issue:

root@k8s-1:~# kubectl get pods -o wide -n default
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          17m   10.233.66.7   k8s-3   <none>           <none>
dnsutils                 1/1     Running   0          50m   10.233.66.5   k8s-3   <none>           <none>
nginx-86c57db685-p8zhc   1/1     Running   0          43m   10.233.64.3   k8s-1   <none>           <none>
nginx-86c57db685-st7rw   1/1     Running   0          47m   10.233.66.6   k8s-3   <none>           <none>
root@k8s-1:~#

Containers can reach each other by IP address, and DNS and Internet access work from inside a container:

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping 10.233.64.3"
PING 10.233.64.3 (10.233.64.3) 56(84) bytes of data.
64 bytes from 10.233.64.3: icmp_seq=1 ttl=62 time=0.481 ms
64 bytes from 10.233.64.3: icmp_seq=2 ttl=62 time=0.551 ms
...
root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping google.com"
PING google.com (172.217.21.174) 56(84) bytes of data.
64 bytes from fra07s64-in-f174.1e100.net (172.217.21.174): icmp_seq=1 ttl=61 time=77.9 ms
...
root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping kubernetes.default"
PING kubernetes.default.svc.cluster.local (10.233.0.1) 56(84) bytes of data.
64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=2 ttl=64 time=0.069 ms
...

The actual problem:

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping nginx-86c57db685-p8zhc"
ping: nginx-86c57db685-p8zhc: Name or service not known
command terminated with exit code 2
root@k8s-1:~#
root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping dnsutils"
ping: dnsutils: Name or service not known
command terminated with exit code 2
root@k8s-1:~#
root@k8s-1:~# kubectl exec -ti busybox -- nslookup nginx-86c57db685-p8zhc
Server:    169.254.25.10
Address:   169.254.25.10:53
** server can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: NXDOMAIN
*** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.home: No answer
*** Can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.home: No answer
command terminated with exit code 1
root@k8s-1:~#

How can I fix hostname-based communication between containers, or am I missing something?

Many thanks

Update

More checks:

root@k8s-1:~# kubectl exec -ti dnsutils -- nslookup kubernetes.default
Server:    169.254.25.10
Address:   169.254.25.10#53
Name:      kubernetes.default.svc.cluster.local
Address:   10.233.0.1

I created a StatefulSet:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/web/web.yaml

I can resolve the service "nginx":

root@k8s-1:~/kplay# k exec dnsutils -it nslookup nginx
Server:    169.254.25.10
Address:   169.254.25.10#53
Name:      nginx.default.svc.cluster.local
Address:   10.233.66.8
Name:      nginx.default.svc.cluster.local
Address:   10.233.64.3
Name:      nginx.default.svc.cluster.local
Address:   10.233.65.5
Name:      nginx.default.svc.cluster.local
Address:   10.233.66.6

I can also reach the StatefulSet members when using the FQDN:

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0.nginx.default.svc.cluster.local
Server:    169.254.25.10
Address:   169.254.25.10#53
Name:      web-0.nginx.default.svc.cluster.local
Address:   10.233.65.5
root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1.nginx.default.svc.cluster.local
Server:    169.254.25.10
Address:   169.254.25.10#53
Name:      web-1.nginx.default.svc.cluster.local
Address:   10.233.66.8

But not when using the hostname alone:

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0
Server:    169.254.25.10
Address:   169.254.25.10#53
** server can't find web-0: NXDOMAIN
command terminated with exit code 1
root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1
Server:    169.254.25.10
Address:   169.254.25.10#53
** server can't find web-1: NXDOMAIN
command terminated with exit code 1
root@k8s-1:~/kplay#
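The StatefulSet lookups that succeed follow the pattern pod-name.service-name.namespace.svc.cluster-domain, where the service is the one governing the StatefulSet (nginx here). The following Python sketch spells that pattern out and compares it with what a bare hostname becomes once the first search domain is appended. statefulset_pod_fqdn is a hypothetical helper for illustration only; the cluster domain cluster.local and the search domain are taken from the pod's resolv.conf.

```python
def statefulset_pod_fqdn(pod, service, namespace="default", cluster_domain="cluster.local"):
    """Build the DNS name of a StatefulSet pod behind its governing Service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


# StatefulSet "web" with two replicas behind the Service "nginx", as in the transcript.
for ordinal in range(2):
    print(statefulset_pod_fqdn(f"web-{ordinal}", "nginx"))

# First search domain from the pod's /etc/resolv.conf.
search_domain = "default.svc.cluster.local"
fqdn = statefulset_pod_fqdn("web-0", "nginx")

# The bare hostname expands to a name without the Service label: no match.
print(f"web-0.{search_domain}" == fqdn)
# Including the Service label expands to exactly the registered record.
print(f"web-0.nginx.{search_domain}" == fqdn)
```

Because the Service name is part of the record, the bare name web-0 never expands to it, whereas the shorter form web-0.nginx should resolve from within the same namespace.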

They all live in the same namespace:

root@k8s-1:~/kplay# k get pods -n default
NAME                     READY   STATUS    RESTARTS   AGE
busybox                  1/1     Running   22         22h
dnsutils                 1/1     Running   22         22h
nginx-86c57db685-p8zhc   1/1     Running   0          22h
nginx-86c57db685-st7rw   1/1     Running   0          22h
web-0                    1/1     Running   0          11m
web-1                    1/1     Running   0          10m

Another test to confirm that I can resolve services:

kubectl create deployment --image nginx some-nginx
kubectl scale deployment --replicas 2 some-nginx
kubectl expose deployment some-nginx --port=12345 --type=NodePort
root@k8s-1:~/kplay# k exec dnsutils -it nslookup some-nginx
Server:    169.254.25.10
Address:   169.254.25.10#53
Name:      some-nginx.default.svc.cluster.local
Address:   10.233.63.137

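Final thoughts

Funny fact, but maybe this is simply how Kubernetes works? If I want to reach particular pods individually, I can use Service hostnames and StatefulSet member names. Reaching an individual pod that is not part of a StatefulSet does not seem very important, at least for my use of k8s (that may not be true for everyone).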

mWatney · Accepted Answer

I recommend following this guide. It will help you identify possible problems with CoreDNS and confirm that it is working properly.

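"Reaching an individual pod that is not part of a StatefulSet does not seem very important, at least for my use of k8s (that may not be true for everyone)."

Reaching pods through DNS records is possible, but as you say, it is not that important in typical K8s setups.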

When enabled, pods are assigned a DNS A record in the form pod-ip-address.my-namespace.pod.cluster.local.

For example, a pod with IP 1.2.3.4 in the namespace default, in a cluster whose DNS name is cluster.local, has the entry 1-2-3-4.default.pod.cluster.local (source).

Example

$ kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP          NODE                                 NOMINATED NODE   READINESS GATES
dnsutils     1/1     Running   20         20h     10.28.2.3   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>
sample-pod   1/1     Running   0          2m11s   10.28.2.4   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>
$ kubectl exec -ti dnsutils -- nslookup 10-28-2-4.default.pod.cluster.local
Server:    10.31.240.10
Address:   10.31.240.10#53
Name:      10-28-2-4.default.pod.cluster.local
Address:   10.28.2.4
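The IP-based naming rule for pod A records can be written down as a small function. This is only a sketch of the string transformation (dots in the IPv4 address become dashes, followed by the namespace, pod, and the cluster domain); pod_dns_name is a hypothetical helper, and the default cluster domain cluster.local is an assumption that matches the clusters shown in this thread.

```python
import ipaddress


def pod_dns_name(pod_ip, namespace, cluster_domain="cluster.local"):
    """Build the IP-based A record name for a pod (IPv4 only in this sketch)."""
    # Validate the input; raises ValueError for anything that is not an IPv4 address.
    address = ipaddress.IPv4Address(pod_ip)
    dashed = str(address).replace(".", "-")
    return f"{dashed}.{namespace}.pod.{cluster_domain}"


# The documentation example and the sample-pod from the transcript.
print(pod_dns_name("1.2.3.4", "default"))
print(pod_dns_name("10.28.2.4", "default"))
```

The resulting name can be passed to nslookup from any pod in the cluster, as in the example above.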

"Funny fact, but maybe this is simply how Kubernetes works?"

Yes, CoreDNS is working as intended, and everything you described is expected behavior.