Troubleshooting

Never use the pool until the root issue is resolved and understood.

Look at where the errors occur. If they are spread across all devices, the cause is likely a fault common to the whole system rather than a single disk. Memory, controller cards, and power are all suspect here.

Individual disk failures will be evident.
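
One way to tell whether errors are spread across devices or confined to one disk is to list every row of the `zpool status` device table that has a non-zero READ, WRITE, or CKSUM counter. The helper below is a minimal sketch, not a definitive tool: the function name `zpool_error_rows` is hypothetical, and it assumes the usual five-column `NAME STATE READ WRITE CKSUM` layout. Pool and vdev rows are printed too when their own counters are non-zero.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints each
# row of the device table whose READ, WRITE, or CKSUM counter is not "0".
# Assumes the five-column NAME STATE READ WRITE CKSUM layout.
zpool_error_rows() {
  awk '
    $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ && NF >= 5 {
      if ($3 != "0" || $4 != "0" || $5 != "0")
        printf "%s READ=%s WRITE=%s CKSUM=%s\n", $1, $3, $4, $5
    }
  '
}

# Usage (pool name taken from the examples on this page):
#   zpool status hundo | zpool_error_rows
```

If most or all disks appear in the output, suspect shared hardware first. If only one disk appears, start with that disk.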

ZFS-8000-8A One or more devices has experienced an error

ZFS has detected data corruption in the pool.

status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A

Check SMART status

# Always run smartctl to verify the health of every disk in the pool. Look at
# the attributes and determine if anything is close to, or over, its threshold.
# Also compare against similar disks in the pool.
lsblk
smartctl -a /dev/{DEVICE}
> ...
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> ...
> SMART Error Log Version: 1
> No Errors Logged
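
The attribute check described above can be partly automated. The following is a rough sketch under stated assumptions: the function name `smart_flag_attrs` and its `margin` argument are hypothetical, and it only understands the ATA attribute table printed by `smartctl -A` (NVMe devices report health in a different format). It flags attributes whose normalized VALUE is at or below THRESH plus the margin, and skips attributes with a threshold of 0.

```shell
# Hypothetical helper: reads `smartctl -A` output (ATA attribute table) on
# stdin and prints attributes whose normalized VALUE is at or below
# THRESH + margin. Attributes with a threshold of 0 are skipped.
# $1: margin (assumed default 0); a larger margin flags "close to" cases.
smart_flag_attrs() {
  awk -v margin="${1:-0}" '
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      value = $4 + 0; thresh = $6 + 0
      if (thresh > 0 && value <= thresh + margin)
        printf "%s value=%s thresh=%s\n", $2, $4, $6
    }
  '
}

# Usage:
#   smartctl -A /dev/{DEVICE} | smart_flag_attrs 10
```

This does not replace reading the full `smartctl -a` output. Raw values and the error log still need a manual look.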

Run Memtest86

Boot Memtest86 from a USB disk and stress test the memory. Errors should be immediately apparent (either a freeze or reported errors). Complete at least one full test cycle under the conditions that produced the error.

Run mprime

Run mprime from a StresKit USB disk. mprime is the headless version of Prime95. Each test explicitly reports completion, and the system should not freeze.

Rollback updates

Updates containing microcode updates, AGESA changes, etc. may cause issues. Roll back the updates and re-test to confirm they are not the cause.

Run through Crash troubleshooting.

Resolve ZFS error

Determine what files are affected

Either delete these files or restore them from backup or a previous snapshot.

Encrypted pools must be unlocked and mounted before the affected files are listed.

zpool status
>   pool: hundo
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>   corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>   entire pool from backup.
>    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
> ...
> errors: 2 data errors, use '-v' for a list

# Encrypted pools require unlocking.
zpool status -v
> ...
> errors: List of errors unavailable: permission denied

zfs mount -a -l

zpool status -v
> ...
> errors: Permanent errors have been detected in the following files:
>
>         /hundo/some_file

rm /hundo/some_file
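
When more than a few files are affected, the list can be pulled out of the `zpool status -v` output instead of copied by hand. This is a minimal sketch: the function name `zpool_error_files` is hypothetical, and it keeps only entries that begin with `/`. Entries that are not plain paths (for example snapshot or metadata entries) are skipped and still need manual handling.

```shell
# Hypothetical helper: reads `zpool status -v` output on stdin and prints the
# absolute paths listed after the "Permanent errors" line. Entries that are
# not plain paths (for example snapshot or metadata entries) are skipped.
zpool_error_files() {
  awk '
    /Permanent errors have been detected/ { in_list = 1; next }
    in_list {
      sub(/^[ \t]+/, "")
      if ($0 ~ /^\//) print
    }
  '
}

# Usage (review the list before deleting or restoring anything):
#   zpool status -v hundo | zpool_error_files
```

The helper deliberately only prints the paths. Deleting or restoring each file remains a manual decision.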

See syncing datasets to restore a dataset to a previous snapshot.

Re-scrub the ZFS pool

Never use the pool until the root issue is resolved and understood.

# Run the scrub and verify no additional errors are found.
zpool clear hundo
zpool scrub hundo

# If the scrub is clean and the root issue is resolved, remount the pool.
zfs mount -a -l
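
The scrub-and-verify step above can be guarded so that the remount only happens when the pool reports no errors. The sketch below is illustrative only: the function name `zpool_is_clean` is hypothetical, and the usage lines assume OpenZFS 2.0 or later, where `zpool scrub -w` waits for the scrub to finish.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and succeeds only
# if it contains the "errors: No known data errors" line.
zpool_is_clean() {
  grep -q '^errors: No known data errors'
}

# Usage sketch (assumes OpenZFS 2.0 or later for `zpool scrub -w`):
#   zpool clear hundo
#   zpool scrub -w hundo          # -w waits for the scrub to finish
#   zpool status hundo | zpool_is_clean && zfs mount -a -l
```

A clean scrub only shows that no further errors were found. It does not prove that the root issue is resolved.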