Case study: recovery of a corrupted 12 TB multi-device pool

Article URL: https://github.com/kdave/btrfs-progs/issues/1107
Comments URL: https://news.ycombinator.com/item?id=47656303
Points: 108, Comments: 51

Hello, and thanks in advance for reading.

A hard power cycle on a 3-device pool (data single, metadata DUP, DM-SMR disks) left the extent tree and the free space tree in a state that no native repair path could resolve.

A subsequent btrfs check --repair run entered an infinite loop of 46,000+ commits with zero net progress, rotating the 4 backup_roots slots past any pre-crash rollback point.
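
To illustrate why the rollback points were lost, here is a toy model of a four-slot rotating history. It is not btrfs-progs code: the function name and the generation numbers are made up for the example, and the only detail taken from the post is that there are 4 backup_roots slots and that the repair loop committed more than 46,000 times.

```python
from collections import deque

NUM_BACKUP_ROOTS = 4  # number of backup_roots slots mentioned in the post


def surviving_generations(first_generation, commits, slots=NUM_BACKUP_ROOTS):
    """Return the generations still recorded after `commits` commits, oldest first.

    Toy model: every commit records its generation in the next slot,
    overwriting the oldest entry once all slots are in use.
    """
    ring = deque(maxlen=slots)
    for generation in range(first_generation, first_generation + commits):
        ring.append(generation)
    return list(ring)


# Hypothetical numbers: the last good pre-crash generation is 1000,
# and the repair loop starts committing at generation 1001.
PRE_CRASH_GENERATION = 1000
survivors = surviving_generations(PRE_CRASH_GENERATION + 1, 46_000)
print(survivors)                              # [46997, 46998, 46999, 47000]
print(min(survivors) > PRE_CRASH_GENERATION)  # True: no pre-crash slot is left
```

In this model, four commits are already enough to displace every pre-crash entry, so a loop of tens of thousands of commits leaves nothing to roll back to.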

Recovery eventually succeeded through a set of 14 custom C tools built against the internal btrfs-progs API, with a final data loss of about 7.2 MB out of 4.59 TB (0.00016 percent).
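
For readers who want to check the loss figure, the arithmetic can be reproduced as below. This is a small sketch that assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the post does not say which unit convention was used.

```python
MB = 10**6   # assumption: decimal megabyte
TB = 10**12  # assumption: decimal terabyte


def loss_percent(lost_bytes, total_bytes):
    """Share of the data that was lost, expressed as a percentage."""
    return 100.0 * lost_bytes / total_bytes


pct = loss_percent(7.2 * MB, 4.59 * TB)
print(f"{pct:.5f} percent")  # 0.00016 percent
```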

The pool is now fully operational.

I wrote the case up in a structured way that covers environment, timeline, root cause classification, the bulletproof safety criterion we derived empirically, and 9 specific areas where a relatively small upstream change would have prevented the need for most of the custom tooling.

https://github.com/msedek/btrfs_fixes/blob/main/INCIDENT-ANALYSIS.md

The nine proposed improvement areas are ordered by expected impact on operators hitting similar cases.

All 14 custom tools, along with the single-line patch to alloc_reserved_tree_block, are published under GPL-2.0.

Every tool runs in a read-only scan mode by default; writing requires the opt-in --write flag.
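
The read-only-by-default convention can be sketched as follows. This is a hypothetical skeleton, not code from the published tools (which are written in C): the names build_parser, open_mode and plan_actions are invented for illustration, and only the --write flag name comes from the description above.

```python
import argparse


def build_parser():
    """Command line for a hypothetical scan/repair tool."""
    parser = argparse.ArgumentParser(
        description="Scan a device and report problems; modify it only with --write."
    )
    parser.add_argument("device", help="block device or image file to inspect")
    parser.add_argument(
        "--write",
        action="store_true",
        help="apply fixes; without this flag nothing on the device is modified",
    )
    return parser


def open_mode(write):
    """File mode for the device: read-only unless writes were requested."""
    return "r+b" if write else "rb"


def plan_actions(findings, write):
    """Pair each finding with what the tool will do about it."""
    verb = "fix" if write else "report"
    return [(verb, finding) for finding in findings]


def main(argv=None):
    args = build_parser().parse_args(argv)
    print(f"opening {args.device} with mode {open_mode(args.write)}")
    # The findings below are placeholders; a real tool would scan the device.
    for verb, finding in plan_actions(["example finding"], args.write):
        print(f"{verb}: {finding}")
```

Calling main(["pool.img"]) only reports, while main(["pool.img", "--write"]) marks the same findings for fixing. The point of the design is that the destructive path can never be reached by omitting an argument.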

The README.md explains the execution order used during recovery.

I am not proposing these as upstream patches directly.

Most of the proposals above are not single-function changes, and getting any of them accepted would require a design discussion with people who know the subsystems involved better than I do.

Sharing the reference implementation felt more useful than opening nine separate pull requests without context.

How I would like this to be received
Please treat this as input, not as a demand.

If any single observation or proposal is worth pursuing, I am happy to expand the analysis, provide additional evidence from the session logs, or test any proposed patch against the class of damage we hit.

If none of it is useful, no problem, and thanks for the tool set that got us most of the way there.

Source: This article was originally published by Hacker News