Kubernetes Node Internals — Part 1: Anatomy
Part 1 of a 5-part series: the layered anatomy of a Kubernetes node, from hardware and kernel to kubelet, CRI, OCI, and the control plane boundary.
"A node looks like a virtual machine. But what's actually running on it is a precise stack of six or seven processes — each with one job. Pull any one out and the whole thing stops working. Let's meet them."
When people first learn Kubernetes, they usually start from the top: Pods, Deployments, Services, Ingress, and maybe Helm charts.
But eventually you hit a more fundamental question:
What is actually happening on the machine that runs my Pod?
That machine is the node. And while it may look like a regular VM or Linux server, a Kubernetes node is really a carefully layered runtime stack. Each layer does exactly one thing. The kubelet talks to the container runtime. The container runtime delegates to a lower-level runtime. That lower-level runtime asks the Linux kernel to create isolated processes. And the kernel is the only layer that can actually enforce the illusion that a container is "its own little machine."
This post is Part 1 of a 5-part series on what happens inside a Kubernetes node.
Series roadmap
- Part 1 — The anatomy of a node
- Part 2 — Bootstrap and the secret handshake
- Part 3 — A pod is born
- Part 4 — Keeping the node alive
- Part 5 — CSI, volumes, and mounts on the node
Scope note — what this part intentionally skips
Part 1 is about the execution stack anatomy and component boundaries.
To keep that model clean, it intentionally does not go deep into:
- Pod termination and graceful shutdown timing
- node storage internals (CSI, volume mounts on the node, and ephemeral storage pressure)
- runtime hardening details (seccomp, AppArmor, SELinux, capabilities)
Those are important, but they are easier to understand once the core stack is clear. We will come back to the storage path explicitly in Part 5.
What is a Kubernetes node?
A Kubernetes node is a machine that can run Pods.
That machine might be:
- a cloud VM in EKS, GKE, or AKS
- a bare-metal server in your data center
- a local machine in Minikube or kind
Conceptually, though, a node is not just "a server in the cluster." It is the execution environment where Kubernetes turns a desired state into running Linux processes.
If the control plane is the part of Kubernetes that decides, the node is the part that does.
The scheduler may decide that your Pod should run on node-7, but nothing actually happens until software on node-7 performs the work:
- download the image
- prepare isolation
- create cgroups
- wire networking
- start processes
- report status back
That is why understanding the node is so valuable. If something breaks while a Pod starts, stops, gets OOM-killed, or loses networking, the explanation is almost always somewhere inside the node stack.
The layered stack
The easiest way to understand a node is as a stack of layers.
From bottom to top, it looks like this:
- Hardware
- Linux kernel
- OCI runtime, like runc
- Container runtime / CRI implementation, like containerd or CRI-O
- kubelet
- Kubernetes API server (on the control plane)
Let's walk upward.
Control loops at a glance
Across the series, you can think in three loops:
- Trust loop (Part 2): how a machine becomes a trusted node
- Creation loop (Part 3): how desired Pod state becomes running processes
- Survival loop (Part 4): how the node stays healthy under pressure
Part 1 is the static map of components these loops run through.
1. Hardware
At the bottom is the actual machine: CPUs, memory, disks, network interfaces, NUMA topology, and device drivers.
Containers do not run on abstract magic. They run on real cores, allocate real RAM, and write to real block devices.
When you request:

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
```

you are ultimately asking Kubernetes to carve out a safe slice of real machine resources.
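To make that concrete, here is a minimal Python sketch of how a runtime might translate a CPU value in millicores into the cgroup v2 `cpu.max` string the kernel enforces. This is a simplification: the real kubelet/runtime path is more involved, a hard quota like this corresponds to CPU *limits*, and requests instead influence `cpu.weight`. The function name is illustrative.

```python
def cpu_max_from_millicores(millicores: int, period_us: int = 100_000) -> str:
    """Translate a Kubernetes CPU value in millicores into a cgroup v2
    cpu.max string of the form "<quota_us> <period_us>".
    1000m means one full core per scheduling period."""
    quota_us = millicores * period_us // 1000
    return f"{quota_us} {period_us}"

# 500m -> half a core's worth of runtime per 100ms period
print(cpu_max_from_millicores(500))   # -> "50000 100000"
```

The point is only that a human-friendly unit like "500m" bottoms out in a concrete number the kernel's CPU scheduler accounts against.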
2. Linux kernel
The kernel is the most important layer on the node.
It is the kernel that provides the primitives containers rely on:
- namespaces for isolation
- cgroups for resource control
- virtual networking primitives like veth devices and bridges
- filesystems like overlayfs
- syscalls for process creation and management
This is worth stating clearly:
Containers are not a kernel feature called "containers."
Containers are a packaging idea built out of multiple Linux kernel features working together.
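As a quick illustration, the sketch below lists the kernel namespaces the current process belongs to. Every Linux process, containerized or not, is a member of some set of these namespaces; a container is simply a process placed into fresh ones. The interesting part only works on Linux (it reads `/proc`); elsewhere it returns an empty mapping.

```python
import os

# Namespace types as they appear under /proc/<pid>/ns on Linux.
NAMESPACE_TYPES = ["cgroup", "ipc", "mnt", "net", "pid", "user", "uts"]

def namespaces_of(pid: str = "self") -> dict:
    """Return {namespace_type: identity} for a process, e.g.
    {"pid": "pid:[4026531836]", ...}. Empty dict if /proc is absent."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):  # not on Linux
        return {}
    return {
        ns: os.readlink(f"{ns_dir}/{ns}")
        for ns in NAMESPACE_TYPES
        if os.path.lexists(f"{ns_dir}/{ns}")
    }

print(namespaces_of())
```

Two processes in the same container share these identities; a process in a different container shows different ones. That, plus cgroups and a mounted root filesystem, is most of what "container" means.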
3. OCI runtime
An OCI runtime is the tool that performs the final, low-level act of launching a containerized process.
The most common example is runc.
runc is not a high-level orchestrator. It does not schedule Pods. It does not watch the API server. It does not pull images. It takes a prepared container spec and says, in effect:
"Create this process with these namespaces, these cgroup limits, this root filesystem, and this command."
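A heavily trimmed fragment of such a spec, an OCI `config.json`, might look like the following. Field names follow the OCI runtime specification; the concrete values here are made up for illustration.

```json
{
  "ociVersion": "1.0.2",
  "process": {
    "args": ["/usr/bin/myapp"],
    "env": ["PATH=/usr/bin"],
    "cwd": "/"
  },
  "root": { "path": "rootfs" },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" }
    ],
    "resources": {
      "memory": { "limit": 536870912 }
    }
  }
}
```

runc reads a bundle like this and turns it into exactly one thing: a kernel-enforced process.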
It is very close to the kernel.
4. Container runtime / CRI implementation
Above runc sits a higher-level container runtime such as containerd or CRI-O.
This layer knows how to:
- pull images
- unpack layers
- manage container lifecycle
- expose a stable interface to kubelet
- invoke a lower-level OCI runtime such as runc
Think of this layer as the container operations manager.
If runc is the final worker that starts a process, containerd or CRI-O is the supervisor that knows how to prepare the work and keep track of many containers over time.
5. kubelet
The kubelet is the primary Kubernetes agent running on the node.
It watches the API server and asks a simple question over and over:
"What Pods should exist on this machine, and what do I need to do to make reality match that?"
The kubelet does not launch containers directly. Instead, it delegates to the container runtime via the Container Runtime Interface (CRI).
That separation is one of Kubernetes' most important design choices. Kubernetes does not want to be tied to one runtime implementation.
6. API server
Finally, above the node stack is the API server, which lives on the control plane.
The API server is not on the node to run workloads. It is the source of truth for cluster state.
The kubelet constantly communicates with it:
- reading desired state
- writing Pod status
- renewing heartbeats
- updating Node information
The node is where the work happens. The API server is where the intent and recorded state live.
The cast of characters
If you SSH into a node and inspect the important moving parts, these are the names you should know.
| Component | What it is | Main responsibility | Why it exists |
|---|---|---|---|
| kubelet | Kubernetes node agent | Makes actual state match Pod specs | The node needs a local Kubernetes brain |
| containerd or CRI-O | Container runtime / CRI implementation | Pulls images, manages sandboxes and containers | kubelet needs a runtime it can talk to consistently |
| runc | OCI runtime | Creates the containerized Linux process | Somebody must do the final clone, setns, mount, and exec work |
| kube-proxy | Node networking agent | Programs Service routing rules | Services need local packet steering |
| crictl | Troubleshooting CLI | Debugs CRI-compatible runtimes | Humans need a runtime-level inspection tool |
Let's make each one concrete.
kubelet — the node's brain
If the node had to be explained in one sentence, it would be this:
kubelet is the process that turns a Pod spec into reality.
It is responsible for:
- watching for Pods assigned to the node
- creating Pod sandboxes through the runtime
- starting and stopping containers
- mounting volumes
- running health probes
- reporting container and Pod status back to the API server
- publishing node conditions such as Ready, MemoryPressure, and DiskPressure
The kubelet is not the scheduler. It does not decide which node should run the Pod. That decision was already made elsewhere. The kubelet only handles: "This Pod is now your problem."
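A toy sketch of that reconcile idea, in illustrative Python that looks nothing like the real kubelet code, shows the shape of the loop: compare desired state against actual state, then act on the difference.

```python
class ToyRuntime:
    """Stand-in for the container runtime the kubelet delegates to."""
    def __init__(self):
        self.running = set()
    def start(self, name): self.running.add(name)
    def stop(self, name): self.running.discard(name)

def reconcile(desired_pods: set, actual_pods: set, runtime: ToyRuntime):
    """One pass of a kubelet-style sync loop (heavily simplified):
    start what should exist but doesn't, stop what exists but shouldn't."""
    for name in desired_pods - actual_pods:
        runtime.start(name)
    for name in actual_pods - desired_pods:
        runtime.stop(name)

rt = ToyRuntime()
rt.running = {"old-pod"}                      # actual state on the node
reconcile({"web", "db"}, set(rt.running), rt)  # desired state from the API server
print(sorted(rt.running))  # -> ['db', 'web']
```

The real kubelet runs this comparison continuously, which is why deleting a Pod object eventually stops its containers: the next sync pass notices the difference.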
CRI-O and containerd — the CRI layer
Kubelet needs a runtime, but it does not want runtime-specific code for every vendor and implementation. That is where the Container Runtime Interface (CRI) comes in.
CRI defines a standard contract for operations such as:
- create a Pod sandbox
- pull an image
- start a container
- stop a container
- list containers
- fetch logs and status
containerd and CRI-O are two popular runtimes that implement this contract.
Important nuance: a node usually runs one CRI implementation, not both. In practice you will typically see either:
- containerd: general-purpose and the most widely adopted
- CRI-O: purpose-built for Kubernetes, tightly aligned with CRI and OCI
From kubelet's perspective, both solve the same problem:
"I need a thing behind a socket that knows how to create and manage containers."
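The shape of that contract can be sketched as an interface with swappable implementations. This is illustrative Python, not the real CRI (which is a gRPC protobuf API); the method and class names only mirror the spirit of the RPCs, and `FakeRuntime` is a toy stand-in for containerd or CRI-O.

```python
from abc import ABC, abstractmethod

class ContainerRuntime(ABC):
    """Sketch of the CRI contract shape: kubelet codes against this,
    and any conforming runtime can answer behind the socket."""
    @abstractmethod
    def run_pod_sandbox(self, pod_config: dict) -> str: ...
    @abstractmethod
    def create_container(self, sandbox_id: str, container_config: dict) -> str: ...
    @abstractmethod
    def start_container(self, container_id: str) -> None: ...
    @abstractmethod
    def stop_pod_sandbox(self, sandbox_id: str) -> None: ...

class FakeRuntime(ContainerRuntime):
    """Toy in-memory implementation standing in for containerd/CRI-O."""
    def __init__(self):
        self.sandboxes, self.containers = {}, {}
    def run_pod_sandbox(self, pod_config):
        sid = f"sandbox-{len(self.sandboxes)}"
        self.sandboxes[sid] = pod_config
        return sid
    def create_container(self, sandbox_id, container_config):
        cid = f"container-{len(self.containers)}"
        self.containers[cid] = {"sandbox": sandbox_id, "state": "created"}
        return cid
    def start_container(self, container_id):
        self.containers[container_id]["state"] = "running"
    def stop_pod_sandbox(self, sandbox_id):
        for c in self.containers.values():
            if c["sandbox"] == sandbox_id:
                c["state"] = "exited"

# kubelet-style usage: the same calls work whatever runtime answers
rt = FakeRuntime()
sid = rt.run_pod_sandbox({"name": "web"})
cid = rt.create_container(sid, {"image": "nginx"})
rt.start_container(cid)
print(rt.containers[cid]["state"])  # -> running
```

Swapping `FakeRuntime` for a different implementation changes nothing above the interface, which is exactly the decoupling CRI buys Kubernetes.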
runc — the OCI runtime
If containerd or CRI-O is the manager, runc is the mechanic that actually starts the engine.
It consumes an OCI bundle: a filesystem plus a config.json describing mounts, namespaces, environment variables, capabilities, cgroups, and the command to execute.
Then it asks the Linux kernel to make the process real.
That is an important mental model:
- kubelet does not start your process
- containerd / CRI-O usually does not directly start your process either
- runc performs the low-level launch
- the kernel is the only thing that can enforce the isolation and limits
kube-proxy — the network plumber
Pods get IP addresses, but Kubernetes networking is not only about Pod-to-Pod reachability. Services also need a way to route traffic to the correct backend Pods.
That is where kube-proxy typically enters the picture.
On many clusters, kube-proxy runs on every node and programs network rules using iptables or ipvs.
Its job is to translate a stable Service virtual IP into one of the real backend Pod IPs.
So if a packet arrives for 10.96.12.4:80, kube-proxy may rewrite or steer it toward one of the actual Pod endpoints behind that Service.
It is the node's packet plumber.
Small accuracy note: modern clusters sometimes replace kube-proxy behavior with eBPF-based systems such as Cilium. But kube-proxy is still the default mental model most Kubernetes users should start with.
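That steering idea can be sketched in a few lines. This is illustrative Python with made-up VIP and Pod addresses; real kube-proxy does not handle packets itself, it programs kernel rules (iptables DNAT chains or ipvs virtual servers) that do the equivalent per packet or per connection.

```python
import random

# Hypothetical Service table: (virtual IP, port) -> real Pod endpoints.
# This mirrors what kube-proxy programs into iptables/ipvs rules.
service_endpoints = {
    ("10.96.12.4", 80): [("10.244.1.5", 8080), ("10.244.2.9", 8080)],
}

def steer(dst_ip: str, dst_port: int) -> tuple:
    """Pick one backend for a packet addressed to a Service VIP,
    like a DNAT rule with random endpoint selection."""
    backends = service_endpoints.get((dst_ip, dst_port))
    if backends is None:
        return (dst_ip, dst_port)  # not a Service VIP: leave it untouched
    return random.choice(backends)

print(steer("10.96.12.4", 80))  # one of the two Pod endpoints
```

The stable VIP never corresponds to a real interface; it exists only as a rewrite rule on every node.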
crictl — the debugging CLI, not a dependency
crictl is often misunderstood.
It is not required for Kubernetes to run.
Instead, it is a human-facing troubleshooting tool that can talk directly to a CRI-compatible runtime. If kubectl shows something odd, crictl helps you inspect what the runtime believes is happening.
Examples of what crictl is good for:
- listing Pod sandboxes
- listing containers
- checking image state
- reading container status when kubelet output is not enough
Think of crictl as a runtime-level stethoscope, not a piece of the production dependency chain.
The dependency chain
Now we can connect the pieces.
When a Pod starts on a node, the control flow usually looks like this: the kubelet receives the Pod spec, asks the CRI runtime over its local socket to create the sandbox and containers, the CRI runtime pulls and unpacks the image and invokes the OCI runtime, and runc asks the kernel to create the isolated process.
In compact form: kubelet → CRI runtime (containerd or CRI-O) → OCI runtime (runc) → Linux kernel.
This chain explains a lot of real-world debugging:
- If kubelet is unhealthy, Pods may never start.
- If the CRI runtime is down, kubelet has nobody to delegate to.
- If
runccannot set up the container, the container never becomes a process. - If the kernel cannot provide namespaces, cgroups, mounts, or networking, everything above it fails.
If you only remember one startup debugging heuristic, use this order:
kubelet → CRI runtime → OCI runtime → kernel primitives
The gRPC socket between kubelet and CRI
One of the cleanest architectural decisions in Kubernetes is that kubelet and the runtime usually communicate over a local Unix socket using gRPC.
This gives Kubernetes a stable abstraction boundary.
Kubelet can say things like:
- RunPodSandbox
- CreateContainer
- StartContainer
- StopPodSandbox
without caring whether the implementation behind the socket is containerd or CRI-O.
That decoupling is what makes Kubernetes modular.
What the node does NOT own
A very common beginner mistake is to mentally place every Kubernetes component on the node.
That is not how the architecture works.
Here are the major pieces the node does not own.
| Component | Lives where conceptually | What it does |
|---|---|---|
| etcd | Control plane | Stores cluster state |
| kube-apiserver | Control plane | Exposes the Kubernetes API |
| kube-scheduler | Control plane | Chooses which node should run a Pod |
| kube-controller-manager | Control plane | Runs reconciliation loops for cluster-level controllers |
The node does not schedule itself. It does not store cluster truth. It does not make global placement decisions.
It is the execution worker, not the cluster authority.
One caveat: in local learning setups or single-node clusters, these components may happen to run on the same machine. But logically they still belong to the control plane, not the node execution stack.
Key diagram — the vertical stack
Here is the simplest useful diagram for this entire post, as a vertical stack written top to bottom:
- Kubernetes API server (control plane)
- kubelet
- container runtime (containerd or CRI-O)
- OCI runtime (runc)
- Linux kernel
- hardware
Read the diagram bottom to top when asking, "What makes the container possible?"
Read it top to bottom when asking, "How does a Pod spec become a process?"
Concepts introduced
Before moving to Part 2, make sure these three ideas are solid.
1. Container Runtime Interface (CRI)
CRI is the contract between kubelet and the container runtime.
It exists so Kubernetes can delegate container operations without being tightly coupled to one runtime implementation.
2. Open Container Initiative (OCI)
OCI defines standards around container image and runtime formats.
In practice, when people say OCI runtime, they usually mean a runtime like runc that knows how to launch a container from an OCI-compliant spec.
3. gRPC socket between kubelet and the runtime
The kubelet talks to the runtime over a local gRPC API, typically exposed through a Unix domain socket.
That local socket is the handoff point between "Kubernetes orchestration" and "container lifecycle management."
Final mental model
If you remember only one thing from this post, remember this:
A Kubernetes node is not one thing. It is a stack.
- The kubelet manages desired vs actual state.
- The CRI runtime manages container lifecycle.
- The OCI runtime launches the process.
- The kernel provides the isolation and limits.
- The hardware provides the real resources.
Once that layering clicks, a lot of Kubernetes starts to feel less magical.
In Part 2, we will go one layer deeper: the Linux primitives, TLS bootstrapping, and the security handshake that allows a node to join the cluster in the first place.