The Linux networking stack is a layered architecture that spans from POSIX socket system calls down to hardware transmit queues. Each layer has a well-defined interface so that protocol implementations, device drivers, and packet-processing extensions can evolve independently. This page walks through the major components from top to bottom.
Socket layer overview
User-space applications interact with the network through the BSD socket API: socket(2), bind(2), connect(2), send(2), and recv(2). Inside the kernel, each socket maps to a struct socket (the VFS-facing object) and a struct sock (the protocol-specific transport endpoint).
struct proto_ops is the dispatch table that a protocol family (AF_INET, AF_INET6, AF_UNIX, …) supplies for its sockets. When user space calls send(2), the kernel resolves the file descriptor to its struct socket and calls socket->ops->sendmsg, which in turn calls the transport-layer send function on struct sock.
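The two-level dispatch described above can be sketched in user space. This is a toy model, not kernel code: every `toy_*` name is hypothetical, the family id is arbitrary, and the transport "send" only counts bytes. It is meant to show the shape of the call chain (family table first, then transport table), under the assumption that each layer is reached through one function pointer.

```c
#include <stddef.h>

struct toy_sock;
struct toy_socket;

/* Transport-level operations: plays the role of the protocol behind struct sock. */
struct toy_proto {
    const char *name;
    int (*sendmsg)(struct toy_sock *sk, const void *buf, size_t len);
};

/* Transport endpoint: plays the role of struct sock. */
struct toy_sock {
    const struct toy_proto *prot;
    size_t bytes_queued;            /* bytes accepted by the transport so far */
};

/* Family-level dispatch table: plays the role of proto_ops. */
struct toy_proto_ops {
    int family;
    int (*sendmsg)(struct toy_socket *sock, const void *buf, size_t len);
};

/* File-descriptor-facing object: plays the role of struct socket. */
struct toy_socket {
    const struct toy_proto_ops *ops;
    struct toy_sock *sk;
};

/* Transport send: a real protocol would segment and queue; this one counts. */
int toy_stream_sendmsg(struct toy_sock *sk, const void *buf, size_t len)
{
    (void)buf;
    sk->bytes_queued += len;
    return (int)len;
}

/* Family send: forwards to whichever transport the socket is attached to. */
int toy_family_sendmsg(struct toy_socket *sock, const void *buf, size_t len)
{
    return sock->sk->prot->sendmsg(sock->sk, buf, len);
}

const struct toy_proto toy_stream_proto = {
    .name = "toy-stream",
    .sendmsg = toy_stream_sendmsg,
};

const struct toy_proto_ops toy_family_ops = {
    .family = 2,                    /* arbitrary id for this sketch */
    .sendmsg = toy_family_sendmsg,
};

/* Stand-in for the system call entry: one indirect call per layer. */
int toy_send(struct toy_socket *sock, const void *buf, size_t len)
{
    return sock->ops->sendmsg(sock, buf, len);
}
```

Wiring a `toy_socket` to `toy_family_ops` and a `toy_sock` to `toy_stream_proto`, then calling `toy_send()`, walks both tables in the same order as the paragraph above.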
Network device interface
Every network interface (physical NIC, virtual device, tunnel, or loopback) is represented by struct net_device, defined in include/linux/netdevice.h. Drivers allocate a net_device with alloc_netdev(), fill in the net_device_ops function table, and register it with register_netdev().
Key fields in net_device_ops:
ndo_start_xmit is the transmit hot path: the kernel passes a fully built sk_buff to the driver, which places it on the hardware queue. The driver must return NETDEV_TX_OK unless the queue is genuinely full, in which case it calls netif_stop_queue() before returning NETDEV_TX_BUSY.
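The return-value contract of the transmit hook can be modelled with a small ring in user space. The sketch below is not driver code: `toy_dev`, `toy_start_xmit()`, `toy_tx_complete()`, and the four-slot ring are all hypothetical. It also stops the queue as soon as the last slot is taken, which is an added assumption about how a driver might avoid reaching the busy case at all.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 4

enum toy_tx_status { TOY_TX_OK, TOY_TX_BUSY };

/* Hypothetical device state: a fixed ring of packet pointers. */
struct toy_dev {
    const void *ring[TOY_RING_SIZE];
    size_t tail;          /* index of the oldest in-flight packet */
    size_t count;         /* number of in-flight packets */
    bool queue_stopped;   /* models the effect of stopping the queue */
};

/* Models the transmit hook: accept the packet or report a full ring. */
enum toy_tx_status toy_start_xmit(struct toy_dev *dev, const void *pkt)
{
    if (dev->count == TOY_RING_SIZE) {
        /* Genuinely full: stop the queue first, then report busy. */
        dev->queue_stopped = true;
        return TOY_TX_BUSY;
    }

    dev->ring[(dev->tail + dev->count) % TOY_RING_SIZE] = pkt;
    dev->count++;

    /* Assumption of this sketch: stop early once the last slot is used,
     * so a well-behaved caller never sees TOY_TX_BUSY. */
    if (dev->count == TOY_RING_SIZE)
        dev->queue_stopped = true;

    return TOY_TX_OK;
}

/* Models a transmit-completion event: free one slot and wake the queue. */
void toy_tx_complete(struct toy_dev *dev)
{
    if (dev->count == 0)
        return;
    dev->ring[dev->tail] = NULL;
    dev->tail = (dev->tail + 1) % TOY_RING_SIZE;
    dev->count--;
    dev->queue_stopped = false;
}
```

A caller that checks `queue_stopped` before each call will only ever see `TOY_TX_OK`; the busy path remains as a safety net for callers that do not.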
sk_buff: the socket buffer
struct sk_buff (socket buffer) is the central data structure for every packet in the kernel. It holds pointers into the packet data, offsets that locate each protocol header, and reference counts that let several sk_buff instances share one data buffer on zero-copy paths.
head and end delimit the allocated buffer; data and tail delimit the bytes currently in use. On transmit, skb_push() moves data toward head to prepend a protocol header; on receive, skb_pull() moves data forward to strip one.
skb_clone() creates a copy of the metadata struct without copying the payload data, enabling zero-copy forwarding paths.
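The pointer arithmetic behind head, data, tail, and end is easier to follow in a small user-space model. The `toy_skb` type and `toy_*` functions below are hypothetical stand-ins that mirror the kernel helpers skb_reserve(), skb_put(), skb_push(), and skb_pull(); the sketch models only a linear buffer and leaves out cloning, fragments, and reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical linear buffer with the four sk_buff-style pointers. */
struct toy_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the bytes in use */
    unsigned char *tail;  /* end of the bytes in use */
    unsigned char *end;   /* end of the allocated buffer */
};

struct toy_skb *toy_alloc_skb(size_t size)
{
    struct toy_skb *skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    skb->head = malloc(size);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;   /* empty: no bytes in use yet */
    skb->end = skb->head + size;
    return skb;
}

void toy_free_skb(struct toy_skb *skb)
{
    free(skb->head);
    free(skb);
}

/* Reserve headroom for headers; only valid while the buffer is empty. */
void toy_skb_reserve(struct toy_skb *skb, size_t len)
{
    assert(skb->data == skb->tail);
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes of payload; returns where the caller should write them. */
unsigned char *toy_skb_put(struct toy_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->tail += len;
    return old_tail;
}

/* Prepend a header: data moves toward head (transmit direction). */
unsigned char *toy_skb_push(struct toy_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->data - skb->head));
    skb->data -= len;
    return skb->data;
}

/* Strip a header: data moves away from head (receive direction). */
unsigned char *toy_skb_pull(struct toy_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->tail - skb->data));
    skb->data += len;
    return skb->data;
}

/* Number of bytes currently in use. */
size_t toy_skb_len(const struct toy_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

A typical transmit sequence in this model reserves 34 bytes of headroom (14 for Ethernet plus 20 for a minimal IPv4 header), puts the payload, then pushes each header in turn; the receive side pulls them off in the opposite order.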
Packet receive path
Hardware interrupt
The NIC raises a hardware interrupt. The driver's interrupt handler acknowledges the interrupt, then schedules a NAPI poll by calling napi_schedule(&adapter->napi).
NAPI polling
The kernel's softirq handler calls the driver's napi_poll callback, which dequeues packets from the ring buffer, allocates sk_buff structures, and calls napi_gro_receive() or netif_receive_skb().
Protocol demultiplexing
netif_receive_skb() delivers the frame to registered packet_type handlers. For Ethernet, the driver calls eth_type_trans() to record the frame's EtherType in skb->protocol, and the frame is dispatched on that value to the appropriate L3 protocol handler (IPv4, IPv6, ARP, …).
Packet transmit path
Socket send
User space calls send(2). The socket layer calls into the transport protocol, which segments data, builds sk_buff objects with protocol headers, and calls ip_queue_xmit() (TCP) or udp_send_skb() (UDP).
Routing and neighbour lookup
The IP layer looks up the route via ip_route_output_flow(). If the next-hop MAC address is not cached, the neighbour subsystem (ARP for IPv4) resolves it.
Queueing discipline (qdisc)
The packet enters the queueing discipline attached to the transmit queue of the net_device. The kernel's built-in default is pfifo_fast, although distributions can change it through the net.core.default_qdisc sysctl; tc commands can replace it with fq, htb, or other schedulers.
Netfilter and iptables hooks
Netfilter inserts five hook points into the packet path where registered callbacks can inspect, modify, or drop packets. iptables, nftables, and connection tracking all use these hooks.
| Hook | Location | Typical use |
|---|---|---|
| NF_INET_PRE_ROUTING | Before routing decision | DNAT, raw table |
| NF_INET_LOCAL_IN | After routing, destined for local socket | INPUT chain |
| NF_INET_FORWARD | Forwarded packets | FORWARD chain |
| NF_INET_LOCAL_OUT | Locally generated outbound packets | OUTPUT chain |
| NF_INET_POST_ROUTING | After routing, before transmission | SNAT, MASQUERADE |
Hook functions registered with nf_register_net_hook() return one of NF_ACCEPT, NF_DROP, NF_STOLEN, NF_QUEUE, or NF_REPEAT.
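How a verdict affects the remaining callbacks at a hook point can be illustrated with a user-space model. Everything here is hypothetical: the `toy_*` names, the packet fields, and the two sample callbacks. The sketch covers only an accept verdict and a drop verdict, and it assumes that an accept lets the next callback run while any other verdict ends traversal.

```c
#include <stddef.h>

/* Only two of the five verdicts are modelled in this sketch. */
enum toy_verdict { TOY_DROP, TOY_ACCEPT };

/* Hypothetical packet summary that callbacks may read or modify. */
struct toy_pkt {
    unsigned short dport;   /* destination port */
    unsigned int mark;      /* scratch field a callback may set */
};

typedef enum toy_verdict (*toy_hookfn)(struct toy_pkt *pkt);

struct toy_hook {
    toy_hookfn fn;
};

/* Run callbacks in array order; the first non-accept verdict is final. */
enum toy_verdict toy_run_hooks(const struct toy_hook *hooks, size_t n,
                               struct toy_pkt *pkt)
{
    for (size_t i = 0; i < n; i++) {
        enum toy_verdict v = hooks[i].fn(pkt);
        if (v != TOY_ACCEPT)
            return v;
    }
    return TOY_ACCEPT;
}

/* Sample callback: modifies the packet and lets it continue. */
enum toy_verdict toy_mark_hook(struct toy_pkt *pkt)
{
    pkt->mark = 1;
    return TOY_ACCEPT;
}

/* Sample callback: drops traffic to port 23, accepts everything else. */
enum toy_verdict toy_block_telnet(struct toy_pkt *pkt)
{
    return pkt->dport == 23 ? TOY_DROP : TOY_ACCEPT;
}
```

With `toy_mark_hook` placed before `toy_block_telnet`, a packet to port 80 is marked and accepted, while a packet to port 23 is marked and then dropped by the second callback.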
Connection tracking (nf_conntrack) requires NF_INET_PRE_ROUTING and NF_INET_LOCAL_OUT hooks to track the full flow state. Disabling those hooks or nf_conntrack entirely will break stateful NAT and many firewall rules.
eBPF and XDP
eXpress Data Path (XDP) attaches eBPF programs to a hook in the device driver, before the kernel allocates an sk_buff. This makes it possible to drop, redirect, or modify packets at wire speed with minimal per-packet overhead.
XDP programs can run in one of three modes:
- Native (driver): the program runs inside the NIC driver's receive function, before sk_buff allocation. Requires driver support and offers the highest performance.
- Generic (skb): the program runs on sk_buff objects, available on all drivers but with higher overhead.
- Offloaded: some SmartNICs can execute the eBPF bytecode directly on the NIC hardware.
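Because an XDP program sees a raw frame rather than an sk_buff, its logic is essentially bounds-checked parsing over a start and end pointer. The function below is a plain user-space C sketch of that style, not a loadable eBPF program: `toy_xdp_drop_udp()` and the `TOY_*` constants are hypothetical, and the policy (drop IPv4 UDP, pass everything else) is chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the drop and pass actions. */
enum toy_xdp_action { TOY_XDP_DROP, TOY_XDP_PASS };

#define TOY_ETH_HLEN        14      /* Ethernet header length */
#define TOY_ETHERTYPE_IPV4  0x0800  /* EtherType for IPv4 */
#define TOY_IPV4_MIN_HLEN   20      /* IPv4 header without options */
#define TOY_IPPROTO_UDP     17      /* IPv4 protocol number for UDP */

/* Decide the fate of one raw frame occupying [data, data_end). */
enum toy_xdp_action toy_xdp_drop_udp(const uint8_t *data,
                                     const uint8_t *data_end)
{
    size_t len = (size_t)(data_end - data);

    /* Every access is preceded by a length check, as a verifier-friendly
     * program would do. Frames too short to classify are passed on. */
    if (len < TOY_ETH_HLEN)
        return TOY_XDP_PASS;

    /* EtherType is a big-endian 16-bit field at byte offset 12. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != TOY_ETHERTYPE_IPV4)
        return TOY_XDP_PASS;

    if (len < TOY_ETH_HLEN + TOY_IPV4_MIN_HLEN)
        return TOY_XDP_PASS;

    /* The IPv4 protocol field is at byte offset 9 of the IP header. */
    const uint8_t *ip = data + TOY_ETH_HLEN;
    return ip[9] == TOY_IPPROTO_UDP ? TOY_XDP_DROP : TOY_XDP_PASS;
}
```

The same structure (check length, read field, decide) carries over to a real program; what changes is the entry-point signature and the build toolchain, which this sketch does not attempt to show.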
Netlink for kernel-userspace communication
Netlink is a socket-based IPC mechanism that the kernel uses to expose configuration and monitoring interfaces to user space. It is the primary transport for iproute2 (ip, ss, tc), ethtool, and many other network utilities.
Generic Netlink (NETLINK_GENERIC) allows subsystems to register named families and commands without consuming a fixed Netlink protocol number. Most modern kernel-userspace interfaces, including nl80211 (Wi-Fi), devlink, and ethtool netlink, use Generic Netlink.
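As a concrete user-space illustration of the Netlink socket interface, the sketch below uses the classic NETLINK_ROUTE protocol rather than Generic Netlink, because it needs no family lookup. It builds an RTM_GETLINK dump request (roughly what a link-listing tool would send) and counts the RTM_NEWLINK replies. The `toy_*` names are hypothetical, error handling is minimal, and it assumes a Linux system with the kernel UAPI headers installed.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <unistd.h>

/* Request layout: Netlink header followed by the link-level payload. */
struct toy_getlink_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifm;
};

/* Fill in an RTM_GETLINK dump request ("send me every interface"). */
void toy_build_getlink(struct toy_getlink_req *req, unsigned int seq)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->ifm.ifi_family = AF_UNSPEC;
}

/* Send the request and count RTM_NEWLINK replies; returns -1 on error. */
int toy_count_links(void)
{
    struct toy_getlink_req req;
    /* Array of headers rather than char, so the buffer is aligned. */
    struct nlmsghdr buf[8192 / sizeof(struct nlmsghdr)];
    int count = 0;
    int done = 0;

    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    toy_build_getlink(&req, 1);
    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    /* A dump may span several datagrams; NLMSG_DONE marks the end. */
    while (!done) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        int len = (int)n;
        for (struct nlmsghdr *h = buf; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
            if (h->nlmsg_type == NLMSG_DONE) {
                done = 1;
                break;
            }
            if (h->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return -1;
            }
            if (h->nlmsg_type == RTM_NEWLINK)
                count++;
        }
    }

    close(fd);
    return count;
}
```

Calling `toy_count_links()` on a typical host returns the number of interfaces visible in the current network namespace; the exact value depends on the machine, so none is shown here.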
Further reading
Memory management
Learn how sk_buff allocations interact with the slab allocator and GFP flags.
Locking and concurrency
Understand the spinlocks and RCU patterns used throughout the network stack.
