The Linux networking stack is a layered architecture that processes packets from hardware up through protocol handlers to userspace sockets, and back down in reverse for transmission. Understanding how packets move through the stack — and where to intercept or accelerate them — is essential for network driver development, performance tuning, and packet filtering.
The socket buffer: sk_buff
Every packet in the Linux networking stack is represented by a struct sk_buff (skb). This is the central data structure that carries a packet and its metadata through every layer of the stack.
The head, data, tail, and end pointers define the linear buffer. Protocol headers are prepended or stripped by adjusting data. Additional data can be stored in page fragments (skb_shinfo(skb)->frags[]) to support zero-copy I/O.
skb_put() extends the data area toward end; skb_push() extends it toward head (prepending a header). skb_pull() removes bytes from the front (stripping a header during receive).
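A minimal sketch of these helpers in use, in kernel-module context with error handling trimmed; the payload buffer and sizes are illustrative, not from any particular driver:

```c
#include <linux/skbuff.h>
#include <linux/etherdevice.h>

/* Illustrative only: build a small frame, reserving headroom so the
 * Ethernet header can later be pushed in front of the payload. */
static struct sk_buff *build_frame(const void *payload, unsigned int len)
{
	struct sk_buff *skb;
	struct ethhdr *eth;

	/* Allocate linear space for the L2 header plus the payload. */
	skb = alloc_skb(ETH_HLEN + len, GFP_KERNEL);
	if (!skb)
		return NULL;

	skb_reserve(skb, ETH_HLEN);        /* leave headroom for L2 */
	skb_put_data(skb, payload, len);   /* extend tail, copy payload in */

	eth = skb_push(skb, ETH_HLEN);     /* prepend the Ethernet header */
	eth->h_proto = htons(ETH_P_IP);    /* MAC addresses omitted here */

	return skb;
}
```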
Protocol layers
The stack follows the classic layered model. On receive, each layer strips its header and passes the skb upward; on transmit, each layer prepends its header and passes the skb downward.
L2 — Ethernet and the link layer
The network driver delivers received skbs to netif_receive_skb(). The Ethernet header is examined and the appropriate L3 handler is called based on skb->protocol (ETH_P_IP, ETH_P_IPV6, ETH_P_ARP, etc.). VLAN tags are handled here; skb_vlan_tag_present() checks whether an 802.1Q tag was offloaded by the NIC or is still inline in the header.
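As a sketch of how a handler gets wired into this dispatch, a module can register its own struct packet_type with dev_add_pack(); the handler body and the choice of ETH_P_ALL here are illustrative:

```c
#include <linux/netdevice.h>
#include <linux/if_ether.h>
#include <linux/skbuff.h>

/* Called for every matching frame after netif_receive_skb() dispatch. */
static int sniff_rcv(struct sk_buff *skb, struct net_device *dev,
		     struct packet_type *pt, struct net_device *orig_dev)
{
	pr_info("got %u bytes on %s, proto 0x%04x\n",
		skb->len, dev->name, ntohs(skb->protocol));
	kfree_skb(skb);      /* drop our reference; a real handler would process it */
	return NET_RX_SUCCESS;
}

static struct packet_type sniff_pt __read_mostly = {
	.type = cpu_to_be16(ETH_P_ALL),   /* or ETH_P_IP, ETH_P_ARP, ... */
	.func = sniff_rcv,
};

/* In module init/exit:
 *   dev_add_pack(&sniff_pt);      register the handler
 *   dev_remove_pack(&sniff_pt);   unregister on cleanup
 */
```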
L3 — IP processing
ip_rcv() is the entry point for IPv4. It validates the IP header, performs routing lookups via ip_route_input(), and either delivers the packet locally (ip_local_deliver()) or forwards it (ip_forward()).
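A condensed sketch of that decision point — not the exact kernel code, and the exact signatures drift between kernel versions:

```c
#include <linux/ip.h>
#include <linux/skbuff.h>
#include <net/route.h>
#include <net/dst.h>

/* Condensed illustration of what happens after ip_rcv()'s header checks. */
static int ip_rcv_finish_sketch(struct sk_buff *skb)
{
	const struct iphdr *iph = ip_hdr(skb);

	/* The routing lookup attaches a dst_entry to the skb; dst->input
	 * ends up pointing at ip_local_deliver() or ip_forward(). */
	if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, skb->dev)) {
		kfree_skb(skb);
		return NET_RX_DROP;
	}

	return dst_input(skb);   /* local delivery or forwarding */
}
```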
L4 — TCP and UDP
TCP (tcp_v4_rcv()) locates the matching socket via a hash table lookup, validates the segment, and inserts it into the socket’s receive queue. The TCP state machine handles SYN/ACK, retransmission timers, and congestion control.
UDP (udp_rcv()) is simpler: the socket is found and the datagram is enqueued. If no socket is found, an ICMP port-unreachable message is sent.
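A condensed sketch of the UDP side; lookup_udp_socket() is a hypothetical stand-in for the real hash-table lookup (__udp4_lib_lookup()), whose signature varies across kernel versions:

```c
#include <linux/udp.h>
#include <linux/icmp.h>
#include <net/sock.h>

/* Hypothetical helper standing in for the real socket hash lookup. */
struct sock *lookup_udp_socket(struct sk_buff *skb);

static int udp_rcv_sketch(struct sk_buff *skb)
{
	struct sock *sk = lookup_udp_socket(skb);

	if (!sk) {
		/* No listener: tell the sender the port is unreachable. */
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
		kfree_skb(skb);
		return 0;
	}

	/* Enqueue on the socket's receive queue; a blocked reader is woken. */
	if (sock_queue_rcv_skb(sk, skb) < 0)
		kfree_skb(skb);
	return 0;
}
```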
Netfilter hooks and iptables/nftables
Netfilter inserts five hook points into the packet path. Packet filtering frameworks (iptables, nftables, conntrack) register callbacks at these hooks:

| Hook | Location |
|---|---|
| NF_INET_PRE_ROUTING | After L2, before routing decision |
| NF_INET_LOCAL_IN | After routing, for locally-destined packets |
| NF_INET_FORWARD | For packets being forwarded |
| NF_INET_LOCAL_OUT | Locally generated packets, before routing |
| NF_INET_POST_ROUTING | After routing, before transmission |
Connection tracking (nf_conntrack) maintains a table of active connections. Each packet for a tracked flow is associated with an nf_conn entry that records the connection state, enabling stateful filtering and NAT.
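A minimal sketch of registering a hook at NF_INET_PRE_ROUTING and reading the conntrack state of each packet; the priority and the logging logic are illustrative only:

```c
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <net/netfilter/nf_conntrack.h>

static unsigned int my_hookfn(void *priv, struct sk_buff *skb,
			      const struct nf_hook_state *state)
{
	enum ip_conntrack_info ctinfo;
	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);

	/* ct is NULL for untracked packets; ctinfo distinguishes NEW,
	 * ESTABLISHED, and RELATED flows and their direction. */
	if (ct && ctinfo == IP_CT_NEW)
		pr_debug("new connection seen\n");

	return NF_ACCEPT;   /* filtering or NAT decisions would go here */
}

static struct nf_hook_ops my_ops = {
	.hook     = my_hookfn,
	.pf       = NFPROTO_IPV4,
	.hooknum  = NF_INET_PRE_ROUTING,
	.priority = NF_IP_PRI_FILTER,
};

/* In module init: nf_register_net_hook(&init_net, &my_ops);   */
/* In module exit: nf_unregister_net_hook(&init_net, &my_ops); */
```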
eBPF and XDP
eBPF (extended Berkeley Packet Filter) and XDP (eXpress Data Path) enable programmable packet processing without modifying the kernel. XDP runs eBPF programs at the earliest possible point in the receive path — either in the NIC driver (native XDP) or just after the skb is allocated (generic XDP). This makes it the fastest packet processing option available in-kernel. An XDP program returns a verdict: XDP_DROP discards the packet immediately; XDP_PASS continues normal processing; XDP_TX retransmits on the same interface; XDP_REDIRECT sends to another interface or CPU.
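A minimal XDP sketch, compiled with clang for the bpf target, that drops IPv4/UDP and passes everything else; the section name and how it is attached depend on your loader:

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_udp(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;

	struct ethhdr *eth = data;
	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;                 /* truncated frame */
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	struct iphdr *iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;

	if (iph->protocol == IPPROTO_UDP)
		return XDP_DROP;                 /* verdict: discard before skb allocation */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```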
TC eBPF programs attach at the traffic control layer (after skb allocation) and can inspect or modify packets in both ingress and egress directions with full access to the skb.
Network device and NAPI polling
Network devices are registered and managed through struct net_device. Drivers call alloc_netdev() (or the Ethernet wrapper alloc_etherdev()) to allocate the structure, then register_netdev() to make it visible to the system. Modern drivers use NAPI: the receive interrupt schedules a poll function via napi_schedule(), and the poll function processes up to a budget of packets before re-enabling interrupts, amortizing interrupt overhead under load.
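A sketch of the registration and NAPI setup a driver performs; note that netif_napi_add() has changed signature across kernel versions (older kernels took an explicit weight argument), and a real driver would also fill in netdev_ops and set up its rings:

```c
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

struct my_priv {
	struct napi_struct napi;
	/* rings, DMA handles, ... */
};

static const struct net_device_ops my_netdev_ops = {
	/* .ndo_open, .ndo_stop, .ndo_start_xmit, ... required in a real driver */
};

/* NAPI poll: process up to 'budget' packets, then re-enable interrupts. */
static int my_poll(struct napi_struct *napi, int budget)
{
	int done = 0;

	/* ... pull packets from the RX ring, build skbs, and hand them to
	 *     the stack with napi_gro_receive() ... */

	if (done < budget)
		napi_complete_done(napi, done);  /* re-arm device interrupts */
	return done;
}

static int my_probe_sketch(void)
{
	struct net_device *dev = alloc_etherdev(sizeof(struct my_priv));
	struct my_priv *priv;

	if (!dev)
		return -ENOMEM;

	dev->netdev_ops = &my_netdev_ops;
	priv = netdev_priv(dev);
	netif_napi_add(dev, &priv->napi, my_poll);  /* recent kernels; older also pass a weight */

	return register_netdev(dev);                /* device becomes visible as ethN */
}
```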
Traffic control
The Linux traffic control (tc) subsystem implements queueing disciplines (qdiscs) that control how packets are enqueued and dequeued on a network device’s TX path. Common qdiscs:

| Qdisc | Use case |
|---|---|
| pfifo_fast | Default; three-band priority FIFO |
| fq_codel | Flow-aware fair queuing with AQM; reduces bufferbloat |
| htb | Hierarchical Token Bucket; rate limiting and shaping |
| tbf | Token Bucket Filter; simple rate limiting |
| netem | Network emulation; adds delay, loss, and reordering |
Key networking syscalls
The POSIX socket API maps to in-kernel operations as follows:

| Syscall | Kernel entry | Purpose |
|---|---|---|
| socket() | __sys_socket() | Create a socket and allocate a struct socket |
| bind() | __sys_bind() | Assign a local address |
| connect() | __sys_connect() | Initiate a connection (TCP) or set remote addr (UDP) |
| listen() | __sys_listen() | Mark socket as passive; set backlog |
| accept() | __sys_accept4() | Dequeue a completed connection |
| send() / sendmsg() | sock_sendmsg() | Transmit data |
| recv() / recvmsg() | sock_recvmsg() | Receive data |
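For reference, a minimal userspace TCP echo server exercising the table above; the port is arbitrary and error handling is omitted:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int lfd = socket(AF_INET, SOCK_STREAM, 0);         /* __sys_socket()  */
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port   = htons(9000),
		.sin_addr   = { .s_addr = htonl(INADDR_ANY) },
	};

	bind(lfd, (struct sockaddr *)&addr, sizeof(addr)); /* __sys_bind()    */
	listen(lfd, 128);                                  /* __sys_listen()  */

	int cfd = accept(lfd, NULL, NULL);                 /* __sys_accept4() */
	char buf[512];
	ssize_t n = recv(cfd, buf, sizeof(buf), 0);        /* sock_recvmsg()  */
	if (n > 0)
		send(cfd, buf, n, 0);                      /* sock_sendmsg()  */

	close(cfd);
	close(lfd);
	return 0;
}
```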
sendmsg() with MSG_ZEROCOPY allows the kernel to DMA data directly from userspace buffers, avoiding a copy into kernel memory for large transmissions on supported NICs.
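A sketch of the userspace side: enable SO_ZEROCOPY, send with MSG_ZEROCOPY, then reap the completion notification from the socket's error queue before reusing the buffer. The fallback defines cover older libc headers; a real application would parse the cmsgs (struct sock_extended_err, SO_EE_ORIGIN_ZEROCOPY) rather than discard them:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60            /* kernel uapi value, if libc lacks it */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* fd is a connected TCP socket; buf/len must stay untouched until the
 * completion notification arrives. */
static int send_zerocopy(int fd, const char *buf, size_t len)
{
	int one = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) < 0)
		return -errno;                    /* kernel or NIC lacks support */

	if (send(fd, buf, len, MSG_ZEROCOPY) < 0)
		return -errno;

	/* Completion arrives on the error queue; error-queue reads never
	 * block, so retry on EAGAIN (real code would poll() for EPOLLERR). */
	char control[128];
	struct msghdr msg = { .msg_control = control,
			      .msg_controllen = sizeof(control) };
	while (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0) {
		if (errno != EAGAIN)
			return -errno;
	}

	return 0;
}
```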
Memory management
How the kernel allocates and frees socket buffers and network data structures.
Locking primitives
RCU usage in routing tables and spinlocks in the socket and netdev layer.
Filesystems
Socket file descriptors, the VFS file object model, and splice/sendfile internals.
Scheduling
Softirq scheduling, NAPI budget interaction with the CPU scheduler.
