The Linux networking stack is a layered architecture that spans from POSIX socket system calls down to hardware transmit queues. Each layer has a well-defined interface so that protocol implementations, device drivers, and packet-processing extensions can evolve independently. This page walks through the major components from top to bottom.

Socket layer overview

User-space applications interact with the network through the BSD socket API — socket(2), bind(2), connect(2), send(2), recv(2). Inside the kernel, each socket maps to a struct socket (the VFS-facing object) and a struct sock (the protocol-specific transport endpoint).
/* include/linux/net.h */
struct socket {
    socket_state            state;
    short                   type;
    unsigned long           flags;
    struct file            *file;
    struct sock            *sk;
    const struct proto_ops *ops;
    struct socket_wq        wq;
};
proto_ops is the dispatch table for protocol families (AF_INET, AF_INET6, AF_UNIX, …). When user space calls send(2), the VFS routes through the file descriptor to socket->ops->sendmsg, which in turn calls the transport-layer send function on struct sock.

Network device interface

Every network interface — physical NIC, virtual device, tunnel, or loopback — is represented by struct net_device defined in include/linux/netdevice.h. Drivers allocate a net_device with alloc_netdev(), fill in the net_device_ops function table, and register it with register_netdev(). Key fields in net_device_ops:
/* include/linux/netdevice.h (abbreviated) */
struct net_device_ops {
    int         (*ndo_open)(struct net_device *dev);
    int         (*ndo_stop)(struct net_device *dev);
    netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
                                  struct net_device *dev);
    int         (*ndo_set_mac_address)(struct net_device *dev, void *addr);
    void        (*ndo_get_stats64)(struct net_device *dev,
                                   struct rtnl_link_stats64 *stats);
    int         (*ndo_change_mtu)(struct net_device *dev, int new_mtu);
};
ndo_start_xmit is the transmit hot path: the kernel passes a fully built sk_buff to the driver, which places it on the hardware queue. The driver must return NETDEV_TX_OK unless the queue is genuinely full, in which case it calls netif_stop_queue() before returning NETDEV_TX_BUSY.

sk_buff: the socket buffer

struct sk_buff (socket buffer) is the central data structure for every packet in the kernel. It carries a pointer to the packet payload, offsets to each protocol header, and metadata such as the owning socket, the device, checksum state, and a reference count used by cloning and zero-copy paths.
/* Simplified — see include/linux/skbuff.h for full definition */
struct sk_buff {
    struct sk_buff  *next;
    struct sk_buff  *prev;

    struct sock     *sk;
    struct net_device *dev;

    unsigned char   *head;      /* start of allocated buffer */
    unsigned char   *data;      /* start of valid data */
    unsigned char   *tail;      /* end of valid data */
    unsigned char   *end;       /* end of allocated buffer */

    unsigned int    len;        /* length of actual data */
    unsigned int    data_len;   /* bytes in paged frags */
    __be16          protocol;
    __u8            pkt_type;
    /* ... headers, checksum info, timestamps, etc. ... */
};
The four pointers head, data, tail, and end delimit the buffer. Protocol headers are pushed towards head during transmit and pulled away during receive:
/* Reserve headroom for headers during allocation */
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);

/* Push a header into the headroom */
struct ethhdr *eth = (struct ethhdr *)skb_push(skb, ETH_HLEN);

/* Pull a header off the front during receive */
struct ethhdr *eth = (struct ethhdr *)skb_pull(skb, ETH_HLEN);
skb_clone() creates a copy of the metadata struct without copying the payload data, enabling zero-copy forwarding paths.

Packet receive path

1. Hardware interrupt

The NIC raises a hardware interrupt. The driver's interrupt handler acknowledges the interrupt, then schedules a NAPI poll by calling napi_schedule(&adapter->napi).

2. NAPI polling

The kernel's softirq handler calls the driver's napi_poll callback, which dequeues packets from the ring buffer, allocates sk_buff structures, and calls napi_gro_receive() or netif_receive_skb().

3. Protocol demultiplexing

netif_receive_skb() delivers the frame to registered packet_type handlers. The driver has already called eth_type_trans() to set skb->protocol, which selects the appropriate L3 protocol handler (IPv4, IPv6, ARP, …).

4. Transport layer

The L3 handler performs routing, then passes the packet to the L4 handler (TCP, UDP, …), which enqueues it on the socket's receive buffer and wakes any blocked recv(2) callers.

Packet transmit path

1. Socket send

User space calls send(2). The socket layer calls into the transport protocol, which segments data, builds sk_buff objects with protocol headers, and calls ip_queue_xmit() (TCP) or udp_send_skb() (UDP).

2. Routing and neighbour lookup

The IP layer looks up the route via ip_route_output_flow(). If the next-hop MAC address is not cached, the neighbour subsystem (ARP) resolves it.

3. Queueing discipline (qdisc)

The packet enters the queueing discipline attached to the transmit queue of the net_device. The default is pfifo_fast; tc commands can replace it with fq, htb, or other schedulers.

4. Driver transmit

dev_hard_start_xmit() calls ndo_start_xmit. The driver maps the sk_buff fragments for DMA, writes the descriptor ring, and kicks the hardware.
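The qdisc in step 3 is the one stage of this path routinely reconfigured from user space. For example (eth0 is a placeholder interface name; root privileges are required to change a qdisc):

```shell
# Show the qdisc currently attached to the device
tc qdisc show dev eth0

# Replace the root qdisc with fq (fair queueing, used for pacing)
tc qdisc replace dev eth0 root fq

# Remove it again, restoring the default
tc qdisc del dev eth0 root
```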

Netfilter and iptables hooks

Netfilter inserts five hook points into the packet path where registered callbacks can inspect, modify, or drop packets. iptables, nftables, and connection tracking all use these hooks.
| Hook                  | Location                                    | Typical use      |
|-----------------------|---------------------------------------------|------------------|
| NF_INET_PRE_ROUTING   | Before routing decision                     | DNAT, raw table  |
| NF_INET_LOCAL_IN      | After routing, destined for a local socket  | INPUT chain      |
| NF_INET_FORWARD       | Forwarded packets                           | FORWARD chain    |
| NF_INET_LOCAL_OUT     | Locally generated outbound packets          | OUTPUT chain     |
| NF_INET_POST_ROUTING  | After routing, before transmission          | SNAT, MASQUERADE |
Callbacks registered with nf_register_net_hook() return one of NF_ACCEPT, NF_DROP, NF_STOLEN, NF_QUEUE, or NF_REPEAT.
Connection tracking (nf_conntrack) requires NF_INET_PRE_ROUTING and NF_INET_LOCAL_OUT hooks to track the full flow state. Disabling those hooks or nf_conntrack entirely will break stateful NAT and many firewall rules.
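The hook points map directly onto the base chains user space defines. A minimal nftables ruleset attaching chains to two of the hooks above (illustrative policy, not a recommended firewall):

```text
table inet filter {
    chain input {
        # Attaches to NF_INET_LOCAL_IN
        type filter hook input priority 0; policy accept;
        ct state established,related accept
        tcp dport 22 accept
    }
    chain forward {
        # Attaches to NF_INET_FORWARD
        type filter hook forward priority 0; policy drop;
    }
}
```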

eBPF and XDP

eXpress Data Path (XDP) attaches eBPF programs to a hook in the device driver, before the kernel allocates an sk_buff. This makes it possible to drop, redirect, or modify packets at wire speed with minimal per-packet overhead.
/* An XDP program returns one of these verdicts */
enum xdp_action {
    XDP_ABORTED  = 0,   /* drop and increment error counter  */
    XDP_DROP     = 1,   /* drop silently                     */
    XDP_PASS     = 2,   /* hand to normal networking stack   */
    XDP_TX       = 3,   /* retransmit out the same interface */
    XDP_REDIRECT = 4,   /* redirect to another interface/CPU */
};
/* Minimal XDP program that drops all ICMP packets */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>   /* SEC() macro */

SEC("xdp")
int xdp_prog(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    if (ip->protocol == IPPROTO_ICMP)
        return XDP_DROP;

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
XDP programs run in three modes:
  • Native (driver): the program runs inside the NIC driver’s receive function, before sk_buff allocation. Requires driver support and offers the highest performance.
  • Generic (skb): the program runs on sk_buff objects, available on all drivers but with higher overhead.
  • Offloaded: some SmartNICs can execute the eBPF bytecode directly on the NIC hardware.
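Once compiled to BPF bytecode, a program like the one above can be attached and detached with iproute2 (eth0 and xdp_prog.o are placeholders):

```shell
# Compile to BPF bytecode
clang -O2 -g -target bpf -c xdp_prog.c -o xdp_prog.o

# Attach in native (driver) mode
ip link set dev eth0 xdp obj xdp_prog.o sec xdp

# Or force generic (skb) mode, which works on any driver
ip link set dev eth0 xdpgeneric obj xdp_prog.o sec xdp

# Detach
ip link set dev eth0 xdp off
```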
Netlink

Netlink is a socket-based IPC mechanism that the kernel uses to expose configuration and monitoring interfaces to user space. It is the primary transport for iproute2 (ip, ss, tc), ethtool, and many other network utilities.
/* Opening a NETLINK_ROUTE socket from user space */
int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

struct sockaddr_nl addr = {
    .nl_family = AF_NETLINK,
    .nl_pid    = getpid(),
    .nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR,
};
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
Inside the kernel, Generic Netlink (NETLINK_GENERIC) allows subsystems to register named families and commands without consuming a fixed Netlink protocol number. Most modern kernel-userspace interfaces — including nl80211 (Wi-Fi), devlink, and ethtool netlink — use Generic Netlink.

Further reading

Memory management

Learn how sk_buff allocations interact with the slab allocator and GFP flags.

Locking and concurrency

Understand the spinlocks and RCU patterns used throughout the network stack.