Cilium Native HostGateway Deep Dive: Kind Deployment, Traffic Paths, and tc/Iptables Verification

Cilium Native HostGateway forwards cross-node Pod traffic through Linux native routing instead of tunnel encapsulation. It fits Kubernetes clusters where the underlay network can directly reach Pod CIDRs. This article focuses on deployment, the differences between ClusterIP and NodePort handling, and observability through tc and iptables. Keywords: Cilium, eBPF, Native Routing.

The following specification snapshot captures the lab setup

| Parameter | Description |
| --- | --- |
| Project | Cilium Native HostGateway |
| Core Languages | Go, C, eBPF |
| Runtime Environment | Kubernetes 1.27.3, Kind |
| Network Mode | Native Routing / HostGateway |
| Service Handling | ClusterIP is load-balanced by Cilium at tc; NodePort still depends on kube-proxy |
| Key Configuration | routingMode=native, autoDirectNodeRoutes=true, kubeProxyReplacement=false |
| Core Dependencies | Helm, Kind, kubectl, tcpdump, bpftool |
| Underlying Protocols / Mechanisms | Linux routing, tc, iptables, ARP, veth |
| Official Documentation | https://docs.cilium.io/en/stable/network/concepts/routing/ |
| GitHub Stars | Cilium GitHub: ~20k stars |

Cilium Native HostGateway effectively returns cross-node forwarding to Linux routing

You can think of Native Routing as HostGateway: packets destined for non-local Pods do not use VXLAN or Geneve encapsulation. Instead, the kernel routing table handles them directly.

This means the underlay network must be able to forward Pod CIDRs. If nodes share the same Layer 2 network, you can enable autoDirectNodeRoutes=true. If nodes span Layer 3 boundaries, you typically need a BGP control plane to distribute routes.
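On an L2-adjacent cluster you can see this directly in the node routing table. A quick check, assuming the Kind node names and Pod CIDRs used later in this article:

```shell
# Inspect the per-node Pod CIDR routes that autoDirectNodeRoutes installs.
# Node names and addresses are illustrative for this Kind lab.
docker exec cilium-kubeproxy-control-plane ip route | grep 10.244
# Expect plain next-hop routes with no tunnel device, e.g.:
#   10.244.1.0/24 via 172.18.0.3 dev eth0   (the peer node's Pod CIDR)
```

If a peer node's Pod CIDR is missing here, cross-node Pod traffic cannot work in Native mode, regardless of what Cilium reports.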

This mode solves two practical problems

It mainly addresses two pain points: the encapsulation overhead and troubleshooting complexity introduced by overlay networks, and the lack of transparency in the data path. In Native mode, packet flow stays much closer to the standard Linux networking stack, which makes step-by-step verification with tcpdump, ip route, iptables, and bpftool much easier.

helm install cilium cilium/cilium \
  --namespace kube-system \
  --version 1.17.15 \
  --set routingMode=native \
  --set kubeProxyReplacement=false \
  --set autoDirectNodeRoutes=true \
  --set ipam.mode=kubernetes \
  --set ipv4NativeRoutingCIDR="10.0.0.0/8"

This configuration enables Native Routing while keeping kube-proxy responsible for NodePort and related capabilities.
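You can verify that split of responsibilities right after the install. Both commands below use standard tooling; the `ds/cilium` shorthand assumes the default DaemonSet name from the Helm chart:

```shell
# 1) kube-proxy is still present and handling NodePort
kubectl -n kube-system get ds kube-proxy

# 2) Cilium reports native (non-tunnel) routing
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i routing
# Expect a line similar to: Routing: Network: Native ...
```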

Figure: The Linux networking stack and its traffic-processing layers, highlighting the order of NIC ingress and egress, tc, the protocol stack, and kernel forwarding. This helps explain where Cilium performs Service translation at the tc layer.

Figure: The interaction between eBPF/Cilium hook points and the traditional kernel path. A request may have its destination rewritten by a tc program before it enters the IP layer, which directly affects what tcpdump and iptables can observe.

You can reproduce the Native HostGateway scenario quickly in Kind

This lab uses Kind to create a two-node cluster, disables the default CNI, and then installs Cilium 1.17.15. That keeps variables to a minimum and makes it easier to focus on Native-mode datapath behavior.

The minimal workflow creates the cluster and installs Cilium

#!/bin/bash
kind create cluster --name=cilium-kubeproxy --image=kindest/node:v1.27.3 --config=- <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
nodes:
- role: control-plane
- role: worker
EOF

controller_node_ip=$(kubectl get node -o wide --no-headers | awk '/control-plane/{print $6}')

helm repo add cilium https://helm.cilium.io
helm repo update

helm install cilium cilium/cilium \
  --namespace kube-system \
  --set k8sServiceHost=$controller_node_ip \
  --set k8sServicePort=6443 \
  --set routingMode=native \
  --set kubeProxyReplacement=false \
  --set autoDirectNodeRoutes=true \
  --set ipam.mode=kubernetes \
  --set ipv4NativeRoutingCIDR="10.0.0.0/8" \
  --version 1.17.15

This script creates a Kind cluster without a default CNI and installs Cilium in Native Routing mode.
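Before testing Service paths, wait for the agents to come up and deploy a small workload to generate traffic. The deployment name, image, and Service below are illustrative choices for this lab, not part of the original setup:

```shell
# Wait until every Cilium agent is ready
kubectl -n kube-system rollout status ds/cilium --timeout=300s

# Illustrative test workload: two backends behind one ClusterIP Service
kubectl create deployment whoami --image=traefik/whoami --replicas=2
kubectl expose deployment whoami --port=80
kubectl get svc whoami -o wide
```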

The key observation is that ClusterIP and NodePort follow two different paths

When kubeProxyReplacement=false, Cilium does not fully replace kube-proxy, but it still load-balances ClusterIP traffic inside the cluster at the tc layer.

In practice, this means when a Pod accesses a ClusterIP, Cilium rewrites the destination Service IP to a backend Pod IP at tc. In contrast, NodePort traffic still relies primarily on kube-proxy iptables chains.
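You can place the two mechanisms side by side. `cilium service list` shows Cilium's BPF Service table, while `iptables -t nat -S` shows kube-proxy's chains; both exist on the same node (node name is the one from this lab):

```shell
# Cilium's view: ClusterIP frontends and their Pod backends (BPF maps)
kubectl -n kube-system exec ds/cilium -- cilium service list

# kube-proxy's view: generated DNAT chains in the nat table
docker exec cilium-kubeproxy-worker sh -c \
  "iptables -t nat -S | grep -E 'KUBE-(SVC|SEP)' | head"
```

Seeing the same Service in both places is expected; which one actually handles a packet depends on where the packet enters the path.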

The “unexpected” iptables counters have a clear explanation

The lab results show that after a Node or Pod accesses a NodePort, the counters on KUBE-SVC and KUBE-SEP increase, which confirms that kube-proxy DNAT is still active.
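A sketch of how to observe those counters moving. The KUBE-SVC/KUBE-SEP chain suffixes are hash-generated per Service, so grep rather than hard-coding a chain name:

```shell
# Snapshot pkts/bytes counters on kube-proxy's DNAT chains,
# issue a NodePort request, then compare a second snapshot.
docker exec cilium-kubeproxy-worker sh -c \
  "iptables -t nat -nvL | grep -E 'KUBE-(SVC|SEP|NODEPORTS)'"
# After a NodePort request, the pkts column on the matching
# KUBE-SVC and KUBE-SEP entries should increase.
```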

However, when a Pod accesses a ClusterIP, the request succeeds while the corresponding iptables counters do not increase. The issue is not that the rules are broken. The packet has already been rewritten by Cilium tc eBPF to the real Pod IP before it ever reaches iptables.

kubectl exec -it -n kube-system cilium-tjsld -- cilium monitor --type trace --from 2234 -v
# Key observation: the destination changes from 10.96.94.95 to 10.244.1.124

This command directly proves that the ClusterIP-to-PodIP translation happens at the tc/BPF stage.

Packet capture results only make sense when you account for the hook point

If you capture packets on a Pod's eth0 or on the host-side lxc* veth, you will often still see PodIP -> ClusterIP. That is because the tcpdump capture point sits before the tc hook, so it sees the original packet before the BPF rewrite.
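A sketch of that pre-rewrite capture, reusing the ClusterIP from this lab (10.96.94.95). The lxc interface name is generated per endpoint, so look it up first rather than hard-coding it:

```shell
# Find the host-side veth of the first endpoint on the worker node
# (endpoint selection here is illustrative; pick the lxc* for your Pod).
IFACE=$(docker exec cilium-kubeproxy-worker sh -c \
  "ip -br link | awk '/lxc/{print \$1; exit}'")

# Capture before the tc rewrite: the destination is still the ClusterIP
docker exec cilium-kubeproxy-worker tcpdump -pni "${IFACE%@*}" \
  host 10.96.94.95 -c 5
```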

But if you move the capture to the host eth0, you will see that the destination address has already become the backend Pod IP, because the packet has already passed through tc, IP routing, and the forwarding decision.

This is the easiest place to misread Native mode behavior

Many people still see ClusterIP in packet captures taken from the Pod side and assume Cilium did not perform load balancing. In reality, the tc rewrite happens later in the path. You need to combine cilium monitor with packet captures from the host side to interpret the flow correctly.

docker exec -it cilium-kubeproxy-control-plane tcpdump -pnei eth0 host 10.244.1.124
# Capture host egress traffic and confirm that the destination has become the real Pod IP

This command verifies the final forwarding destination after the tc rewrite at the host egress.

cilium_host and veth/ARP behavior explain why the gateway looks local

A Pod routing table usually shows the default gateway pointing to cilium_host, such as 10.244.0.26. But if you inspect the ARP table, the MAC address actually resolves to the host-side lxc* veth.

That means the Pod believes it is sending traffic to a gateway, while in reality it enters the host network namespace directly through a veth pair, where the kernel and Cilium BPF programs take over.
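You can confirm this from inside any Pod. The sketch below uses a throwaway busybox Pod (name and image are illustrative, and it assumes busybox's `ip` applet supports `route` and `neigh`):

```shell
kubectl run nettest --image=busybox --restart=Never -- sleep 3600
kubectl wait --for=condition=Ready pod/nettest --timeout=60s

# Default route points at cilium_host's IP on this node
kubectl exec nettest -- ip route

# But the "gateway" MAC resolves to the host-side lxc* veth,
# not to a distinct gateway device
kubectl exec nettest -- ip neigh
```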

Local-backend delivery behaves differently when the selected Pod is on the same node

If a ClusterIP ultimately selects a local Pod, the packet may not traverse a traditionally visible forwarding path. Cilium can enforce policy and perform redirection through a BPF program on cilium_host egress, and it may even use bpf_redirect() to send the packet directly to the target endpoint, which is not always easy to observe with tcpdump.

bpftool net show | grep cilium_host
bpftool prog dump xlated id 5602 | tail
# The core logic shows a bpf_redirect call

These commands help confirm the eBPF program attached to cilium_host and the final redirection action.

In practice, you should validate the configuration in a specific order

First, check whether Routing: Native is active in cilium status. Second, confirm that Pod CIDRs are reachable between nodes. Third, test both NodePort and ClusterIP access paths, then cross-check the results with iptables and cilium monitor.

If NodePort works, ClusterIP works, but ClusterIP-related iptables counters do not increase, that is expected under this configuration and should not be treated as a fault.
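The ordered checks above can be sketched as one script. Node names and the Pod CIDR prefix are the ones used throughout this lab; adjust them for your cluster:

```shell
#!/bin/bash
set -e

# 1) Routing mode is Native
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i routing

# 2) Pod CIDRs are reachable between nodes (plain next-hop routes)
docker exec cilium-kubeproxy-control-plane ip route | grep 10.244

# 3a) NodePort path: kube-proxy chains exist and their counters move
docker exec cilium-kubeproxy-worker sh -c \
  "iptables -t nat -nvL | grep -E 'KUBE-(SVC|SEP|NODEPORTS)'"

# 3b) ClusterIP path: Cilium's BPF Service table holds the backends
kubectl -n kube-system exec ds/cilium -- cilium service list
```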

The conclusion is that Native HostGateway acts like an observable enhancement to the native network

It does not replace Linux routing. Instead, it uses tc/eBPF at critical points to handle Service translation, policy, and forwarding logic early in the path. To understand it, do not memorize commands in isolation. Focus on where the packet gets rewritten, on which interface it remains visible, and at which point in the path it is no longer visible.

FAQ

1. Are Cilium Native HostGateway and Host-Routing the same thing?

No. Native Routing means cross-node packets are forwarded by the underlay routing layer. Host-Routing usually refers to optimizations in the host networking path. The concepts are related, but they are not identical.

2. Why does Pod-to-ClusterIP traffic succeed while iptables counters do not increase?

Because with kubeProxyReplacement=false, Cilium already performs ClusterIP-to-PodIP load balancing at the tc layer, so the destination address is rewritten before the packet reaches iptables.

3. Why can’t I capture the rewritten destination Pod IP on the Pod network interface?

Because the tcpdump capture point usually sits before tc. To see the complete translation path, combine packet capture on host eth0 with cilium monitor and bpftool.

Summary

This article reconstructs the deployment and validation workflow for Cilium Native HostGateway mode. It explains how the mode handles Pod routing, ClusterIP, and NodePort differently, and uses tcpdump, iptables, cilium monitor, and bpftool to restore the tc eBPF forwarding path. Keywords: Cilium, eBPF, Kubernetes.