Troubleshooting

MicroShift known issues and troubleshooting tips.

On EC2 with RHEL 8.4

service-ca can't be created

If you run MicroShift on EC2 with RHEL 8.4 (check with cat /etc/os-release), you might find that the ingress and service-ca pods do not stay online.

Inside the failing pods, you might see errors such as: 10.43.0.1:443: read: connection timed out.

This is a known issue on RHEL 8.4 and will be resolved in 8.5.
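To confirm that a host is actually on the affected release, the ID and VERSION_ID fields of /etc/os-release can be checked from a script. The following is a minimal sketch; the function name is_rhel_84 is hypothetical and is not part of MicroShift.

```shell
# Return 0 if the given os-release file (default: /etc/os-release)
# describes RHEL 8.4, and non-zero otherwise.
is_rhel_84() {
    os_release_file="${1:-/etc/os-release}"
    [ -r "$os_release_file" ] || return 1
    (
        # os-release is a shell-compatible list of KEY=value assignments,
        # so it can be sourced inside a subshell without leaking variables.
        . "$os_release_file"
        [ "${ID:-}" = "rhel" ] && [ "${VERSION_ID:-}" = "8.4" ]
    )
}

if is_rhel_84; then
    echo "RHEL 8.4 detected: the nm-cloud-setup workaround may apply"
else
    echo "Not RHEL 8.4"
fi
```

The function accepts an optional file path so that it can be pointed at a copy of os-release taken from another machine.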

To work around this on RHEL 8.4, disable the NetworkManager cloud setup service and timer (nm-cloud-setup), then reboot.

Example:

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot
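After the reboot, it may be worth confirming that both units stayed disabled. The sketch below relies on systemctl is-enabled, which prints the unit state (for example "enabled" or "disabled"); the function name nm_cloud_setup_state is hypothetical.

```shell
# Print the enablement state of each nm-cloud-setup unit.
# `systemctl is-enabled` exits non-zero for disabled units, so the exit
# status is ignored and only the printed state is reported.
nm_cloud_setup_state() {
    for unit in nm-cloud-setup.service nm-cloud-setup.timer; do
        state="$(systemctl is-enabled "$unit" 2>/dev/null || true)"
        echo "$unit: ${state:-unknown}"
    done
}

# Only query systemd on hosts where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    nm_cloud_setup_state
fi
```

If the workaround took effect, both units should be reported as disabled; "unknown" means systemctl printed no state, for instance because the unit is not installed.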

You can find the details of this EC2 NetworkManager issue tracked at issue.

OpenShift pods CrashLoopBackOff

A few minutes after MicroShift starts, the OpenShift pods fall into CrashLoopBackOff.

If you check the output of journalctl | grep iptables, you may see the following:


Sep 21 19:12:54 ip-172-31-85-30.ec2.internal microshift[1297]: I0921 19:12:54.399365    1297 server_others.go:185] Using iptables Proxier.
Sep 21 19:13:50 ip-172-31-85-30.ec2.internal kernel: iptables[2438]: segfault at 88 ip 00007feaf5dc0e47 sp 00007fff6f2fea08 error 4 in libnftnl.so.11.3.0[7feaf5dbc000+16000]
Sep 21 19:13:50 ip-172-31-85-30.ec2.internal systemd-coredump[2442]: Process 2438 (iptables) of user 0 dumped core.
Sep 21 20:35:57 ip-172-31-85-30.ec2.internal microshift[1297]: E0921 20:35:57.914558    1297 remote_runtime.go:143] StopPodSandbox "1ae45abde0b46d8ea5176b6a00f0e5b4291e6bb496762ca25a4196a5f18d0475" from runtime service failed: rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_service-ca-64547678c6-2nxnp_openshift-service-ca_6236deba-fc5f-4915-817d-f8699a4accfc_0(1ae45abde0b46d8ea5176b6a00f0e5b4291e6bb496762ca25a4196a5f18d0475): error removing pod openshift-service-ca_service-ca-64547678c6-2nxnp from CNI network "crio": running [/usr/sbin/iptables -t nat -D POSTROUTING -s 10.42.0.3 -j CNI-d5d0edec163ce01e4591c1c4 -m comment --comment name: "crio" id: "1ae45abde0b46d8ea5176b6a00f0e5b4291e6bb496762ca25a4196a5f18d0475" --wait]: exit status 2: iptables v1.8.4 (nf_tables): Chain 'CNI-d5d0edec163ce01e4591c1c4' does not exist
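Because grep iptables also matches ordinary proxier messages, a narrower filter can help isolate the kernel segfault report. The following sketch matches lines shaped like the segfault entry above; the function name iptables_segfaults is hypothetical.

```shell
# Read journal lines on stdin and keep only kernel reports of an
# iptables segfault, e.g. "kernel: iptables[2438]: segfault at ...".
# `|| true` keeps the exit status at 0 when nothing matches.
iptables_segfaults() {
    grep -E 'iptables\[[0-9]+\]: segfault' || true
}

# Usage on a live host (requires journalctl):
#   journalctl --no-pager | iptables_segfaults
```

An empty result suggests that the crash loop has a different cause than the one described in this section.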

In addition, the openshift-ingress pod fails with:

I0921 17:36:17.811391       1 router.go:262] router "msg"="router is including routes in all namespaces"
E0921 17:36:17.914638       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0921 17:36:17.948417       1 router.go:579] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

As a workaround, follow the steps below:

  • Delete the flannel daemonset:

    oc delete ds -n kube-system kube-flannel-ds

  • Restart all the OpenShift pods.

This workaround does not affect single-node MicroShift functionality, because the flannel daemonset is used for multi-node MicroShift.
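The restart step does not name a specific command. One common approach is to delete the pods so that their controllers recreate them. The sketch below assumes that approach and only covers the two namespaces that appear in the logs above (openshift-service-ca and openshift-ingress); any other namespace would be an assumption and has to be added by hand. The function name restart_openshift_pods is hypothetical.

```shell
# Namespaces taken from the log output in this section. Extend the list
# if other OpenShift pods on the node are also in CrashLoopBackOff.
OPENSHIFT_NAMESPACES="openshift-service-ca openshift-ingress"

# Delete every pod in each listed namespace so that its controller
# recreates it.
restart_openshift_pods() {
    for ns in $OPENSHIFT_NAMESPACES; do
        oc delete pods --all -n "$ns"
    done
}

# Usage, after deleting the flannel daemonset:
#   restart_openshift_pods
```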

This issue is tracked at: #296

Last modified October 24, 2023 at 8:51 AM PST : build(deps): bump actions/setup-node from 3 to 4 (#212) (333d7a1)