2026 JAN 22 · Linux / Storage · 10 MIN READ

ZFS on Linux: Why I Made the Switch and Never Looked Back

A practical look at setting up ZFS for home lab storage — pools, datasets, snapshots, and the peace of mind that comes with real data integrity.

I lost data once. Not catastrophically — a single ext4 volume quietly developing bad sectors while I assumed everything was fine, because nothing was loudly failing. By the time I noticed, the damage was done. That experience sent me down a rabbit hole that ended at ZFS, and I haven't looked at another filesystem for storage since.

Why ZFS?

Most filesystems trust that the data written to disk is the data you'll read back. ZFS doesn't. It checksums every block on write and verifies that checksum on every read. Silent data corruption — the kind that accumulates invisibly over years — is caught and, with redundancy, corrected automatically. For a home lab storing years of photos, documents, and media, this matters.

  • Copy-on-Write: data is never overwritten in-place, eliminating torn writes
  • Checksumming: every block is verified on read; corruption is detected, not silently passed through
  • Snapshots: atomic, near-instantaneous, and space-efficient — the best backup primitive available
  • Compression: transparent LZ4 compression is essentially free on modern hardware
  • Self-healing: with redundancy, ZFS scrubs find and repair errors automatically
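
Self-healing isn't passive: a scrub walks every block in the pool, verifies each checksum, and repairs anything that fails from the healthy side of the mirror. A minimal routine, assuming a pool named tank as in the setup below:

```shell
# Walk every block in the pool, verifying checksums and
# repairing bad blocks from redundancy; runs in the background
sudo zpool scrub tank

# Progress, plus per-device read/write/checksum error counters
zpool status tank
```

Many distro packages schedule a periodic scrub for you — on Debian/Ubuntu, check /etc/cron.d/ after installing zfsutils-linux. Monthly is a common cadence for home use.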

Pool Setup

I run a mirror pool — two drives, fully redundant. Not the storage efficiency of RAIDZ, but the fastest reads and the simplest recovery path if a drive fails.

# Install ZFS (Ubuntu/Debian)
sudo apt install zfsutils-linux

# Create a mirror pool using device IDs (never /dev/sdX — they shift)
sudo zpool create tank mirror \
  /dev/disk/by-id/ata-WDC_WD40EFRX-XXXXXX \
  /dev/disk/by-id/ata-WDC_WD40EFRX-YYYYYY

# Verify
zpool status tank

Always use /dev/disk/by-id paths when creating pools. Device names like /dev/sdb are assigned at boot and can change. Your pool should survive a kernel update without your drives getting shuffled.

Enable Compression

# Enable LZ4 compression on the pool root
sudo zfs set compression=lz4 tank

# Verify compression ratio after some data is written
zfs get compressratio tank
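
Set at the pool root, the property flows down: every dataset created under tank inherits lz4 unless you override it locally. That's worth knowing for already-compressed data (video, JPEGs), where compression buys little — the override below, on the media dataset created in the next section, is illustrative:

```shell
# Show compression across the pool and all child datasets;
# the SOURCE column distinguishes inherited vs. locally set values
zfs get -r compression tank

# Optionally override on a dataset holding already-compressed data
sudo zfs set compression=off tank/media
```

In practice lz4 aborts early on incompressible data, so leaving it on everywhere is also a defensible choice.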

Datasets & Snapshots

Datasets are ZFS's way of organizing storage within a pool. Think of them like mountpoints that each inherit or override pool-level properties. I create a dataset per service rather than a single monolithic volume — it makes snapshots surgical.

# Create datasets per service
sudo zfs create tank/nextcloud
sudo zfs create tank/media
sudo zfs create tank/backups

# Take a manual snapshot
sudo zfs snapshot tank/nextcloud@before-upgrade

# List snapshots
zfs list -t snapshot

# Roll back if something went wrong
sudo zfs rollback tank/nextcloud@before-upgrade
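
Rollback reverts the entire dataset, and rolling back past newer snapshots means destroying them. For recovering a single file, each dataset also exposes its snapshots as a hidden read-only directory — the file name below is just an example:

```shell
# Snapshot contents are browsable read-only under the hidden .zfs directory
ls /tank/nextcloud/.zfs/snapshot/before-upgrade/

# Copy one file back instead of rolling back the whole dataset
cp /tank/nextcloud/.zfs/snapshot/before-upgrade/config.php /tank/nextcloud/
```

The .zfs directory doesn't appear in ls by default; set snapdir=visible on the dataset if you want it listed.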

For automated snapshots, I use sanoid — a policy-driven snapshot manager that handles retention automatically. Fifteen-minute snapshots during the day, hourly for 24 hours, daily for 30 days. The peace of mind is disproportionate to the setup effort.
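
For reference, a sanoid policy along those lines might look like the sketch below. The path and option names follow sanoid's conventions as I understand them — verify the exact option set against sanoid's own documentation before relying on it:

```ini
# /etc/sanoid/sanoid.conf (sketch -- verify option names against sanoid docs)

[tank/nextcloud]
	use_template = production

[template_production]
	frequently = 8      # sanoid's "frequent" interval defaults to 15 minutes
	hourly = 24
	daily = 30
	monthly = 0
	autosnap = yes
	autoprune = yes
```

Run sanoid from its packaged cron job or systemd timer and it handles both taking and pruning snapshots to match the policy.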

Performance Notes

ZFS is memory-hungry by design — the ARC (Adaptive Replacement Cache) will consume available RAM to cache frequently-accessed data. On a dedicated NAS this is purely a benefit. On a machine running other workloads, set an ARC maximum:

# Limit ARC to 4GB (add to /etc/modprobe.d/zfs.conf)
options zfs zfs_arc_max=4294967296
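
The modprobe.d setting takes effect at module load, but the same tunable is exposed at runtime, so you can apply and verify a limit without rebooting. The arithmetic also shows where 4294967296 comes from:

```shell
# 4 GiB expressed in bytes -- the raw value zfs_arc_max expects
echo $((4 * 1024 * 1024 * 1024))    # prints 4294967296

# Apply at runtime (requires the ZFS module to be loaded)
echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max

# Current ARC size ("size") and ceiling ("c_max")
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats
```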

The initial learning curve is steeper than with ext4 or XFS. The mental model is different. But once it clicks, going back feels like removing your seatbelt. I run ZFS on every machine where data integrity matters — which, increasingly, is all of them.

// END OF TRANSMISSION — T. SEUGLING