The Virtual File System (VFS) is the software layer in the kernel that presents a uniform filesystem interface to user-space programs and provides an abstraction within the kernel that allows different filesystem implementations — ext4, btrfs, xfs, tmpfs, NFS, and many others — to coexist. System calls such as open(2), read(2), write(2), and stat(2) always go through the VFS, which dispatches them to the appropriate filesystem-specific code.
Core data structures
The VFS is built around four primary objects. Each maps to a concept in a Unix filesystem:
struct super_block — the mounted filesystem
A super_block represents one mounted instance of a filesystem. It holds the block size, filesystem flags, a pointer to the root dentry, and the super_operations dispatch table. When a filesystem is unmounted, the VFS calls put_super on the superblock to allow cleanup.
struct inode — a filesystem object
An inode represents a file, directory, symbolic link, device node, FIFO, or socket. It stores the object’s permissions, size, timestamps, and a pointer to inode_operations. A single inode may be referenced by multiple dentry objects (hard links). Inodes for block-device filesystems are cached in memory and written back to disk when dirty.
struct dentry — a directory entry
A dentry is the kernel’s in-memory representation of a pathname component. The dentry cache (dcache) translates a pathname like /home/user/file.c into a chain of dentries terminating at an inode. Dentries are never written to disk; they exist purely as a performance cache.
struct file — an open file description
A file object is created when a process calls open(2). It holds the current file offset, open flags, and a pointer to file_operations. Closing the last file descriptor referencing a file object calls release and drops the reference on the underlying dentry and inode.
super_operations
super_operations is the dispatch table that the VFS uses to manage a mounted filesystem instance:
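A filesystem supplies its own handlers for these hooks. As an illustrative sketch — the field names are the real VFS hooks, but the myfs_* handler names are hypothetical:

```c
/* Illustrative super_operations table. The method names are real VFS
 * hooks; the myfs_* implementations are hypothetical placeholders. */
static const struct super_operations myfs_super_ops = {
	.alloc_inode   = myfs_alloc_inode,   /* allocate a filesystem-specific inode */
	.destroy_inode = myfs_destroy_inode, /* free it once all references are gone */
	.write_inode   = myfs_write_inode,   /* flush a dirty inode to disk */
	.evict_inode   = myfs_evict_inode,   /* tear down an inode being removed from memory */
	.put_super     = myfs_put_super,     /* called at unmount for cleanup */
	.sync_fs       = myfs_sync_fs,       /* flush pending filesystem state */
	.statfs        = myfs_statfs,        /* back statfs(2) and df(1) */
};
```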
inode_operations
inode_operations describes how the VFS manipulates individual inodes:
lookup is among the most important methods — it is called whenever the VFS needs to resolve a pathname component against a parent directory inode. A successful lookup must call d_add() to install the found inode into the dentry.
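A minimal lookup for a simple disk filesystem might follow the sketch below, assuming hypothetical myfs_find_entry and myfs_iget helpers that search the on-disk directory and load the inode:

```c
/* Sketch of a directory lookup. myfs_find_entry and myfs_iget are
 * hypothetical filesystem-specific helpers. */
static struct dentry *myfs_lookup(struct inode *dir, struct dentry *dentry,
				  unsigned int flags)
{
	struct inode *inode = NULL;
	ino_t ino;

	/* Search the on-disk directory for dentry->d_name; 0 means not found. */
	ino = myfs_find_entry(dir, &dentry->d_name);
	if (ino) {
		inode = myfs_iget(dir->i_sb, ino);
		if (IS_ERR(inode))
			return ERR_CAST(inode);
	}

	/* d_add() installs the inode; a NULL inode creates a negative dentry,
	 * caching the fact that the name does not exist. */
	d_add(dentry, inode);
	return NULL;
}
```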
file_operations
file_operations defines how the VFS operates on open file descriptions. As of kernel 4.18, filesystems implement read_iter/write_iter rather than read/write, because iov_iter supports scatter-gather I/O and integrates with the io_uring subsystem.
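For regular files on a page-cache-backed filesystem, many implementations simply point these methods at the generic VFS helpers. A representative sketch — myfs_file_ops is a hypothetical name, while the generic_* functions are real VFS library helpers:

```c
/* Representative file_operations for regular files, wired to the
 * generic page-cache helpers. myfs_file_ops is a hypothetical name. */
static const struct file_operations myfs_file_ops = {
	.owner       = THIS_MODULE,
	.llseek      = generic_file_llseek,
	.read_iter   = generic_file_read_iter,    /* iov_iter-based read */
	.write_iter  = generic_file_write_iter,   /* iov_iter-based write */
	.mmap        = generic_file_mmap,
	.open        = generic_file_open,
	.fsync       = generic_file_fsync,
	.splice_read = generic_file_splice_read,
};
```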
Registering a filesystem
To make a filesystem mountable, you register it with the VFS using register_filesystem.
struct file_system_type describes the filesystem to the VFS:
init_fs_context is the modern entry point for mounting. It populates a fs_context with filesystem-specific state, then the VFS calls get_tree to obtain or create a super_block. You can see all registered filesystems in /proc/filesystems.
1. Define file_system_type: populate the name, fs_flags, init_fs_context, and kill_sb fields, and set owner to THIS_MODULE.
2. Implement init_fs_context: allocate a private context structure, assign fc->ops to your fs_context_operations, and set fc->fs_private.
3. Implement get_tree: call one of the helpers — get_tree_bdev (block device filesystem), get_tree_nodev (pseudo filesystem), or get_tree_single (singleton mount) — to fill fc->root.
Major filesystem implementations
- ext4
- btrfs
- xfs
- tmpfs
ext4 is the default Linux filesystem for most distributions. It is a journalling filesystem descended from ext2 and ext3, supporting extents (contiguous block ranges), online defragmentation, delayed allocation, and large volumes. The journal writes metadata changes to a circular log before applying them, ensuring consistency after a crash.
Key features: extents, dir_index (htree directories), flex_bg, 64bit mode for volumes over 16 TB, inline_data for small files, metadata_csum.
Page cache and writeback
The page cache is the kernel’s unified buffer for file data. When a process reads from a file, the VFS checks the page cache first; if the page is present and up to date, no I/O occurs. When data is written, the page is marked dirty in the cache and the write returns to user space immediately. A background thread (bdflush/pdflush, now replaced by per-backing-device writeback threads managed by bdi_writeback) periodically scans for dirty pages and writes them back to storage. The vm.dirty_ratio and vm.dirty_background_ratio sysctl parameters control how much dirty data is allowed before writeback is triggered.
Further reading
Memory management
The page cache is backed by the memory management subsystem; understanding zones and reclaim is essential.
Locking and concurrency
VFS operations use a mix of mutexes, spinlocks, and RCU to protect shared data structures.
