In most real-world Kubernetes work, infrastructure engineers don’t need to understand how a container works internally; that is reasonably the responsibility of whoever builds the container image. Problems arise, however, once a container is deployed on Kubernetes, a complex deployment and orchestration system in which unexpected components can influence a container’s behavior.
Debugging unexpected container behavior is feasible on self-hosted Kubernetes, where engineers have good observability and can test freely. But if you are using GKE or another cloud provider’s managed Kubernetes, the situation gets complicated.
Here’s a problem I recently faced.
The Challenge: Debugging a Crashing vLLM Container on GKE
Recently, I deployed a Large Language Model (LLM) serving container on GKE as a Deployment (the full manifest is reproduced as cross-reference [1] below). When I ran the serving framework in a non-containerized environment, everything worked perfectly, but it consistently crashed when running inside a Kubernetes pod.
The most challenging aspect was that the application produced no stdout or log output at all. The only information I had came from kubectl, which showed that the container had exited with status code 1.
$ kubectl get pods
NAME                               READY   STATUS             RESTARTS   AGE
meta-deployment-789d98c8f7-4fsz4   0/1     CrashLoopBackOff   5          10m
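As a side note, the exit code itself can be read straight from the pod status, for example:
$ kubectl describe pod meta-deployment-789d98c8f7-4fsz4 | grep -A 3 'Last State'
$ kubectl get pod meta-deployment-789d98c8f7-4fsz4 \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
1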
To better understand why this was happening, I tried the same container with different entry points and commands. The issue turned out to be tightly tied to the keyword “vllm”: every command containing the string “vllm” crashed, except echo vllm. It seemed that echo survived because it is a shell built-in, while any command that involved an exec crashed.
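A convenient way to run these experiments without touching the Deployment is to start a throwaway pod from the same image and override its command. A hypothetical pair of probes (with IMAGE standing in for the actual serving image) would look like this:
# Shell built-in echo: survives
$ kubectl run probe --rm -it --restart=Never --image=IMAGE --command -- bash -c 'echo vllm'
# Same string, but /bin/echo has to be exec'd: crashes
$ kubectl run probe --rm -it --restart=Never --image=IMAGE --command -- bash -c '/bin/echo vllm'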
With this context, I decided to debug the container with gdb. To do that, I needed to rebuild the container image on top of the existing one. However, even gdb --args python -m vllm.entrypoints.api_server .... still resulted in a crash. To move forward, I decided to rely on strace, a system-call tracing utility, to understand what was happening.
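For reference, the rebuild itself was nothing elaborate; a minimal sketch (assuming the serving image is Debian/Ubuntu-based so that apt-get is available, and using a hypothetical vllm-debug tag) looks like this:
$ cat << EOF | docker build -t vllm-debug -
FROM us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240821_1034_RC00
# Layer gdb on top of the existing serving image
RUN apt-get update && apt-get install -y --no-install-recommends gdb
EOF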
Here’s the tricky part: GKE’s default worker node OS image is Container-Optimized OS (COS), a locked-down Linux distribution based on the Chromium OS project. Among other restrictions, it ships without a package manager, and most paths in its file system are mounted non-executable. So, how could I easily download and use strace?
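You can confirm which OS image your nodes run with a plain kubectl query; on a default GKE cluster the OS-IMAGE column reads something like “Container-Optimized OS from Google”:
$ kubectl get nodes -o wide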
Step-by-Step Guide: Debugging GKE CrashLoopBackOff with Strace
Since we were already on a containerized platform, the simplest approach was to deploy another container and perform all operations from within it. This container needed to fulfill three criteria:
- The strace pod must be privileged.
- The strace pod must have the SYS_PTRACE Linux capability.
- The strace pod must share the host PID namespace so it can see all processes on the node.
Here’s an example of the pod manifest I used:
$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: strace
spec:
  hostPID: true
  containers:
  - name: ubuntu-container
    image: ubuntu:latest
    command: ["sleep"]
    args: ["infinity"]
    securityContext:
      privileged: true
      capabilities:
        add: ["SYS_PTRACE"]
EOF
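Before hunting for the target process, it’s worth sanity-checking that the pod really shares the host PID namespace; a quick probe (the systemd answer assumes a COS node, where systemd is the init process) is:
$ kubectl exec -it strace -- cat /proc/1/comm
systemd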
After deploying the pod, I accessed its shell to find the PID of the target process I wanted to trace. The output below is an example (not from my actual pod); here the container’s entry point, bash, has a PID of 4128.
$ kubectl exec -it strace -- bash
root@strace:/$ apt update && apt -y install strace
root@strace:/$ ps auxf
...
root 2863 0.0 0.0 1238252 13984 ? Sl 15:05 0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id eb73d524f84a7813897af328350daf417e26d91661596ce64a47002ce84e0218 -address /run/containerd/conta
root 2936 0.0 0.0 20436 10512 ? Ss 15:05 0:00 \_ /pause
root 4128 1.3 0.2 3005172 88204 ? Ssl 15:05 0:50 \_ /bash
...
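If the process tree is noisy, a shortcut (assuming you know part of the target’s command line) is to match on it directly; pgrep -af prints matching PIDs together with their full command lines:
root@strace:/$ pgrep -af bash    # illustrative output, matching the example tree above
4128 /bash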
The next step was to use strace to monitor the process, including any child processes that would be created. Meanwhile, I re-ran the vllm container’s entry point to observe the issue.
# In strace pod
root@strace:/$ strace -f -p [PID] 2>&1 | tee strace.log  # PID of the target container's entry point, e.g. 4128
# Go back to target pod's shell
root@meta-deployment-789d98c8f7-4fsz4:/$ python -m vllm.entrypoints.api_server ....
# Inside strace pod
wait(-1,
fork(....
After that, I was able to examine the strace log to understand why and when my container was crashing.
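Since strace -f on a process tree like this produces a very large log, a reasonable first pass is to filter for process lifecycle calls; a sketch:
$ grep -E 'execve|exit_group|killed by' strace.log | tail -n 20
# The last execve() before an exit_group(1) or a fatal signal usually pinpoints the failing step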
Summary
In this post, I explained a complicated debugging scenario and the limitations GKE imposes on it. I also described how to overcome those difficulties by bringing the powerful strace utility onto a GKE node. As for the root cause of this strange behavior, that will be discussed in a later post; it’s a different but interesting problem.
Cross-references
[1] vLLM deployment example from GCP Vertex Model Garden
apiVersion: apps/v1
kind: Deployment
metadata:
  name: meta-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: meta-server
  template:
    metadata:
      labels:
        app: meta-server
        ai.gke.io/model: Llama-3-1-8B-Instruct
        ai.gke.io/inference-server: vllm
        examples.ai.gke.io/source: model-garden
    spec:
      containers:
      - name: inference-server
        image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240821_1034_RC00
        resources:
          requests:
            cpu: 8
            memory: 29Gi
            ephemeral-storage: 80Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 8
            memory: 29Gi
            ephemeral-storage: 80Gi
            nvidia.com/gpu: 1
        command:
        args:
        - python
        - -m
        - vllm.entrypoints.api_server
        - --host=0.0.0.0
        - --port=7080
        - --swap-space=16
        - --gpu-memory-utilization=0.9
        - --max-model-len=32768
        - --trust-remote-code
        - --disable-log-stats
        - --model=gs://vertex-model-garden-public-us/llama3.1/Meta-Llama-3.1-8B-Instruct
        - --tensor-parallel-size=1
        - --max-num-seqs=12
        - --enforce-eager
        - --disable-custom-all-reduce
        - --enable-chunked-prefill
        env:
        - name: MODEL_ID
          value: "meta-llama/Llama-3.1-8B-Instruct"
        - name: DEPLOY_SOURCE
          value: "UI_NATIVE_MODEL"
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
---
apiVersion: v1
kind: Service
metadata:
  name: meta-service
spec:
  selector:
    app: meta-server
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 7080