A gentle introduction to making and mounting Linux filesystems
For a long time as a Linux user, I was confused about what "mounting" a filesystem means. I copied and pasted the commands all right, but it always felt somewhat arbitrary. If you feel this way, or are curious to learn more about playing with devices, filesystems, and mounts, then this article may be of interest to you.
When using a Linux system, the main abstraction we interact with is a single unified hierarchy of files growing tree-like from the root, or /. What this hides is that there can be multiple storage devices managed by multiple filesystems. The kernel associates "mount points" in the hierarchy with the filesystems mounted there, and this indirection decouples the contents of a filesystem on a device from its location in the hierarchy. This significantly increases the flexibility of how we can organize our filesystems. It becomes especially desirable when we factor in things like NFS or NBD, which can be shared by multiple client systems. If the location a filesystem is accessible at were somehow baked into the filesystem itself, every client would have to use the same path, which would make such systems more cumbersome to use.
Mounting also serves a second, simpler purpose. Fundamentally, filesystems serve to manage block devices: devices that are accessed in fixed-size blocks. A filesystem is an abstraction implemented in the bits of the block device, and as we write to it, it persists as a structure on the device. Whether or not the computer is even turned on, the bit pattern representing the filesystem is present. However, until we mount the filesystem, it is in a latent state. Mounting triggers the kernel to create all sorts of ephemeral state, like hints at where free space might be, a list of pages to write back, background threads for deferred operations, and much more. Mounting transitions the filesystem to a living state where its files are available for use via filesystem APIs.
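To make the first purpose a bit more concrete, here is a minimal sketch of how a mount point can be recognized from userspace: a directory whose device number differs from its parent's is a place where another filesystem has been attached. The function name is_mount_point is made up for this illustration, the stat -c format assumes GNU coreutils, and the check misses bind mounts within a single filesystem. The mountpoint utility from util-linux is the proper tool for this.

```shell
# is_mount_point DIR  (hypothetical helper, not a standard command)
# Succeeds if DIR looks like the root of a mounted filesystem.
is_mount_point() {
    local dir="$1" here parent
    # "%d:%i" prints the device number and inode number (GNU stat).
    here=$(stat -c '%d:%i' "$dir") || return 2
    parent=$(stat -c '%d:%i' "$dir/..") || return 2
    # Different device number: a different filesystem starts here.
    # Same device and same inode: DIR is its own parent, i.e. "/".
    [ "${here%%:*}" != "${parent%%:*}" ] || [ "$here" = "$parent" ]
}
```

For example, `is_mount_point / && echo "/ is a mount point"` should report the root, while a freshly created directory is not a mount point.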
Basic Tour
With mounting's dual purposes in mind, let's consider a simple example system with a handful of devices and filesystems from a few different perspectives. This will hopefully shed light on the more practical implications of this model of filesystems. We will modify the state of the devices, mounts and filesystems on the system, while examining how these changes play out in the views we have of the system's state.
the block devices
Suppose that the block devices on a system (a qemu VM on my development machine) currently look like:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 253:0 0 10G 0 disk /
The only device is "vda" and it is mounted at the root of the filesystem. So (setting aside fake filesystems like /proc) every file we interact with is on this device. We can't create a new filesystem on top of this existing one, so to experiment, we will need to create some new devices. Just to show off the options for doing so, we can create a loop device (a fake device backed by a file on some filesystem), and carve it up into two logical volumes with LVM (Logical Volume Management, a userspace system taking advantage of Linux's "device mapper" to create virtual block devices on top of other block devices).
# create a "slightly bigger than 2G" file to serve as storage for our loop device
$ truncate -s 2052M /tmp/backing-file
# create a loop device using the file
$ losetup -f /tmp/backing-file
# inspect the loop device we created
$ losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO LOG-SEC
/dev/loop0 0 0 0 0 /tmp/backing-file 0 512
# observe that it appears as a new block device
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 2G 0 loop
vda 253:0 0 10G 0 disk /
# create an LVM volume group to further cut up this new device
$ vgcreate vg0 /dev/loop0
# create two 1G LVM logical volumes in the volume group
$ lvcreate -L 1G vg0 -n lv0
$ lvcreate -L 1G vg0 -n lv1
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 2G 0 loop
├─vg0-lv0 252:0 0 1G 0 lvm
└─vg0-lv1 252:1 0 1G 0 lvm
vda 253:0 0 10G 0 disk /
the "raw" filesystems
Now that we have some blank devices, we can create filesystems on them.
$ mkfs.btrfs -f /dev/vg0/lv0
btrfs-progs v5.14.1
See http://btrfs.wiki.kernel.org for more information.
Detected a SSD, turning off metadata duplication. Mkfs with -m dup if you want to force metadata duplication.
Label: (null)
UUID: 4eb2c574-fdd2-4399-b2ca-f773b489f251
Node size: 16384
Sector size: 4096
Filesystem size: 1.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: single 8.00MiB
System: single 4.00MiB
SSD detected: yes
Zoned device: no
Incompat features: extref, skinny-metadata
Runtime features:
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 1.00GiB /dev/vg0/lv0
$ mkfs.ext4 /dev/vg0/lv1
mke2fs 1.46.4 (18-Aug-2021)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: b11223b2-14dc-4b8b-b9c9-10b2787cfbd0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
We haven't yet mounted these filesystems, so they are currently pretty bare. They contain just the basic structures put on the disk by mkfs. However, they are valid filesystems and we can interact with them directly, just not with any files on them, yet. For example, we can see some high level information about the btrfs filesystem on /dev/vg0/lv0:
$ btrfs filesystem show /dev/vg0/lv0
Label: none uuid: 4eb2c574-fdd2-4399-b2ca-f773b489f251
Total devices 1 FS bytes used 128.00KiB
devid 1 size 1.00GiB used 20.00MiB path /dev/mapper/vg0-lv0
mounts
With our filesystems instantiated on the two devices, we can start to play with mounting. First let's look at the current mounted filesystems by reading the special file /proc/mounts. Another useful tool for visualizing them in a more human friendly way is findmnt:
# see all mounts, mostly virtual file systems implemented by Linux.
$ cat /proc/mounts
/dev/root / ext4 rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=4062532k,nr_inodes=1015633,mode=755 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sys /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
run /run tmpfs rw,nosuid,nodev,relatime,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
tracefs /sys/kernel/debug/tracing tracefs rw,relatime 0 0
none /sys/fs/cgroup cgroup2 rw,relatime 0 0
$ findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/vda ext4 rw,relatime
├─/dev devtmpfs devtmpfs rw,relatime,size=4062532k,nr_inodes=1015633,mode=755
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ └─/dev/shm shm tmpfs rw,nosuid,nodev,relatime
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
├─/sys sys sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup none cgroup2 rw,relatime
│ └─/sys/kernel/debug debugfs debugfs rw,relatime
│ └─/sys/kernel/debug/tracing tracefs tracefs rw,relatime
├─/run run tmpfs rw,nosuid,nodev,relatime,mode=755
└─/tmp tmpfs tmpfs rw,nosuid,nodev,relatime
$ findmnt --real
TARGET SOURCE FSTYPE OPTIONS
/ /dev/vda ext4 rw,relatime
$ findmnt -t ext4
TARGET SOURCE FSTYPE OPTIONS
/ /dev/vda ext4 rw,relatime
$ findmnt -t btrfs
Let's mount our two new filesystems and see what happens. A mount roots the files stored by the filesystem at a point in the file hierarchy, called the mount point. So if a filesystem has a directory d with files f1 and f2 in it, and is mounted at /mnt/m1, then we can see those files with ls /mnt/m1/d. So to mount the filesystems, we need to pick a mount point, typically in /mnt for filesystems besides the root filesystem.
# make the mountpoints (in the ext4 root filesystem)
$ mkdir -p /mnt/m0
$ mkdir -p /mnt/m1
# mount the filesystems
$ sudo mount -o noatime /dev/vg0/lv0 /mnt/m0
$ sudo mount /dev/vg0/lv1 /mnt/m1
And now, the new mounts look like this:
$ cat /proc/mounts
/dev/root / ext4 rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=4062532k,nr_inodes=1015633,mode=755 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sys /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
run /run tmpfs rw,nosuid,nodev,relatime,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
tracefs /sys/kernel/debug/tracing tracefs rw,relatime 0 0
none /sys/fs/cgroup cgroup2 rw,relatime 0 0
/dev/mapper/vg0-lv0 /mnt/m0 btrfs rw,noatime,ssd,space_cache,subvolid=5,subvol=/ 0 0
/dev/mapper/vg0-lv1 /mnt/m1 ext4 rw,relatime 0 0
$ findmnt
TARGET SOURCE FSTYPE OPTIONS
/ /dev/vda ext4 rw,relatime
├─/dev devtmpfs devtmpfs rw,relatime,size=4062532k,nr_inodes=1015633,mode=755
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ └─/dev/shm shm tmpfs rw,nosuid,nodev,relatime
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
├─/sys sys sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup none cgroup2 rw,relatime
│ └─/sys/kernel/debug debugfs debugfs rw,relatime
│ └─/sys/kernel/debug/tracing tracefs tracefs rw,relatime
├─/run run tmpfs rw,nosuid,nodev,relatime,mode=755
├─/mnt/m1 /dev/mapper/vg0-lv1 ext4 rw,relatime
├─/mnt/m0 /dev/mapper/vg0-lv0 btrfs rw,noatime,ssd,space_cache,subvolid=5,subvol=/
└─/tmp tmpfs tmpfs rw,nosuid,nodev,relatime
$ findmnt --real
TARGET SOURCE FSTYPE OPTIONS
/ /dev/vda ext4 rw,relatime
├─/mnt/m1 /dev/mapper/vg0-lv1 ext4 rw,relatime
└─/mnt/m0 /dev/mapper/vg0-lv0 btrfs rw,noatime,ssd,space_cache,subvolid=5,subvol=/
$ findmnt -t ext4
TARGET SOURCE FSTYPE OPTIONS
/ /dev/vda ext4 rw,relatime
└─/mnt/m1 /dev/mapper/vg0-lv1 ext4 rw,relatime
$ findmnt -t btrfs
TARGET SOURCE FSTYPE OPTIONS
/mnt/m0 /dev/mapper/vg0-lv0 btrfs rw,noatime,ssd,space_cache,subvolid=5,subvol=/
the file hierarchy
Finally, let's see how mounting and the file hierarchy interact by writing some actual files on the filesystems, then moving the mounts around.
# nothing in there yet (apart from the lost+found directory that mkfs.ext4 creates)
$ ls /mnt/m0
$ ls /mnt/m1
lost+found
# write 10 directories with 10 files each into both.
$ for fs in 0 1; do
    for i in $(seq 0 9); do
      mkdir -p /mnt/m$fs/d$i
      for j in $(seq 0 9); do echo "$fs $i $j" > /mnt/m$fs/d$i/f$j; done
    done
  done
# read a couple files
$ cat /mnt/m0/d4/f3
0 4 3
$ cat /mnt/m1/d2/f8
1 2 8
# unmount the ext4 fs
$ umount /mnt/m1
# try to read the file: with the mount gone, Linux once again looks in the
# root filesystem, where this path does not exist.
$ cat /mnt/m1/d2/f8
cat: /mnt/m1/d2/f8: No such file or directory
$ ls /mnt/m1
# mount the ext4 fs in a new spot
$ mkdir /mnt/m0/m1
$ mount /dev/vg0/lv1 /mnt/m0/m1
# the file is back, relative to the new mount point!
$ cat /mnt/m0/m1/d2/f8
1 2 8
Notice how the structure of the filesystem on /dev/vg0/lv1 persisted on the device while the filesystem was unmounted (and inaccessible to normal filesystem APIs), and reappeared at the new location when it was once again mounted.
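The lookup rule at work here can be sketched as a longest-prefix match over the mount table. The helper below, mount_for_path (a name invented for this illustration), reads a table in /proc/mounts format and prints the source and mount point covering a given path. It is a simplification: the kernel resolves paths component by component rather than by string matching, and the sketch ignores escaped characters in mount point names.

```shell
# mount_for_path TABLE PATH  (hypothetical helper)
# TABLE is a file in /proc/mounts format: source, mount point, type, options...
mount_for_path() {
    local table="$1" path="$2"
    awk -v path="$path" '
        {
            mp = $2
            # A mount point covers the path if it is "/", is the path itself,
            # or is a leading directory of the path.
            if (mp == "/" || path == mp || index(path, mp "/") == 1) {
                # Prefer the longest match; on a tie the later entry wins.
                if (length(mp) >= best_len) {
                    best_len = length(mp)
                    best = $1 " " mp
                }
            }
        }
        END { if (best != "") print best }
    ' "$table"
}
```

Running `mount_for_path /proc/mounts /mnt/m1/d2/f8` while the ext4 filesystem is unmounted would find only / as a match, which is why cat went looking in the root filesystem.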
Finally, to hammer home the point that the filesystem exists when it isn't mounted, we will find where a file's data lives on the device, unmount the file system, and read it directly off the device:
# (this assumes the ext4 filesystem has been mounted at /mnt/m1 again)
# list the extents of the file; units are 512 bytes (historical block size)
$ xfs_io -c fiemap /mnt/m1/d0/f0
0: [0..7]: 1314816..1314823
# unmount the filesystem
$ umount /mnt/m1
# we can't read the file through the filesystem any more
$ cat /mnt/m1/d0/f0
cat: /mnt/m1/d0/f0: No such file or directory
# read the 6 bytes of the file from that offset in the device via the device file
$ dd if=/dev/vg0/lv1 bs=1 count=6 skip=$((1314816 * 512)) 2>/dev/null
1 0 0
A little deeper
With the basics under our belt, we can now consider some less straightforward aspects of the mounting abstraction, and hopefully they will make decent sense in context. The exceptions prove the rule, after all.
interesting failure cases
- If we try to mount a device without making a filesystem on it first, the kernel doesn't know what to do with the device, and the mount fails:
mount: /mnt/m1: wrong fs type, bad option, bad superblock on /dev/mapper/vg0-lv1, missing codepage or helper program, or other error.
- If we keep a file in the filesystem busy, perhaps by having it open for reading or writing by this silly application:
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    /* open a file on the filesystem and never close it */
    int fd = open("/mnt/m0/d0/f0", O_RDONLY);
    if (fd < 0)
        return 1;
    /* sleep forever, keeping the file descriptor (and the mount) busy */
    while (1) {
        sleep(60);
    }
    return 0;
}
then we are unable to unmount the filesystem until the file is closed, since we can't safely tear down the live state of the filesystem while a userspace process is still using it.
$ ./keep-open &
[1] 4186
$ umount /mnt/m0
umount: /mnt/m0: target is busy.
# so kill the thing keeping it busy
$ kill 4186
# now we can unmount
$ umount /mnt/m0
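In the transcript above we already knew which process was to blame. When we don't, tools such as fuser -vm /mnt/m0 or lsof /mnt/m0 are the usual way to find out. As a rough sketch of the idea, the function below (holders_of is a made-up name) walks /proc and prints the PIDs that have a file open under a directory. It only looks at open file descriptors, so it misses processes whose working directory or memory mappings keep the filesystem busy, and without root it can only inspect our own processes.

```shell
# holders_of DIR  (hypothetical helper)
# Print the PIDs of processes with an open file descriptor under DIR.
holders_of() {
    local dir="${1%/}" fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # Each entry is a symlink to the opened file; skip unreadable ones.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*)
                pid=${fd#/proc/}       # strip the leading "/proc/"
                echo "${pid%%/*}"      # keep only the PID component
                ;;
        esac
    done | sort -un
}
```

With keep-open still running, `holders_of /mnt/m0` would be expected to list its PID.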
the root filesystem
I intentionally crafted an example where we created some toy filesystems from scratch that we could mount and unmount at will. However, you might wisely be wondering how this works for the root filesystem. We can't exactly run /usr/bin/mount when / is not even mounted yet! This is resolved by the bootloader and kernel cooperating during boot to give the root filesystem special treatment. An extra wrinkle is that most systems now boot with an initrd, so in practice the process is:
- the kernel boots into the initrd filesystem on a RAM disk block device.
- the initrd prepares the real root mount at some mount point like "/newroot".
- the initrd starts the real system by using a special "switch root" operation to change the root to the system's "real" root.
remount
As we saw above, you can't unmount a file system that's actively in use, but luckily, filesystems have a special codepath for "remounting". This is typically done to change the mount options on a live filesystem. Naturally, not every transition is necessarily supported, as it can be difficult to enable a feature on a mounted filesystem with live data structures.
$ < /proc/mounts grep /mnt/m0
/dev/mapper/vg0-lv0 /mnt/m0 btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/ 0 0
$ mount -o remount,ro /mnt/m0
$ < /proc/mounts grep /mnt/m0
/dev/mapper/vg0-lv0 /mnt/m0 btrfs ro,relatime,ssd,space_cache,subvolid=5,subvol=/ 0 0
$ mount -o remount,rw /mnt/m0
$ < /proc/mounts grep /mnt/m0
/dev/mapper/vg0-lv0 /mnt/m0 btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/ 0 0
bind mounts
We can create additional mounts of the same single live filesystem at various places in the hierarchy. This is called bind mounting.
$ mount /dev/vg0/lv0 /mnt/m0
$ cat /mnt/m0/d4/f3
0 4 3
$ mkdir /mnt/m0-2
$ mount /dev/vg0/lv0 /mnt/m0-2
$ cat /mnt/m0-2/d4/f3
0 4 3
$ touch /mnt/m0-2/foo
$ ls /mnt/m0/foo
/mnt/m0/foo
Additionally, we can do this with an individual file or directory inside a mounted filesystem, using mount --bind:
$ mkdir /mnt/d0-2
$ mount --bind /mnt/m0/d0 /mnt/d0-2
$ touch /mnt/m0/d0/foo
$ ls /mnt/d0-2/foo
/mnt/d0-2/foo
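One way to spot mounts like this afterwards is /proc/self/mountinfo, where the fourth field is the directory within the filesystem that forms the root of the mount and the fifth field is the mount point. A mount whose root is not / exposes only part of a filesystem. The sketch below (bind_subdir_mounts is a name I made up) lists such mounts. Note that btrfs subvolume mounts also show a root other than /, so this is a hint rather than a definitive test.

```shell
# bind_subdir_mounts FILE  (hypothetical helper)
# FILE is in /proc/self/mountinfo format. Print "mount-point root" for
# every mount whose root within its filesystem is not "/".
bind_subdir_mounts() {
    awk '$4 != "/" { print $5, $4 }' "$1"
}
```

It can be run as `bind_subdir_mounts /proc/self/mountinfo`.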