Btrfs:RAID 5 Rsync Freeze ========================= :author: Aaron Ball :email: nullspoon@iohq.net My server's _/home/_ directory is a btrfs RAID 5, spanning three drives (I did a blog post about it Btrfs:RAID_Setup[here]). Everything worked fine, until I used rsync to sync my files from my laptop to my server. At that point, the sync would go well for a little while and then slow to a crawl. I couldn't cancel the sync with a ctrl+c. If I could get on my server over ssh, I'd find that one of my cpus was pegged at 100%. Sometimes though it got so bogged down I couldn't even get to the server at all. If I were already on the server and I did a kill -9 on rsync, it'd go defunct. I checked my logs after trying to umount /home/ and found... ---- Nov 03 12:01:18 zion kernel: device label home devid 1 transid 1173 /dev/sdb Nov 03 12:01:19 zion kernel: btrfs: disk space caching is enabled Nov 03 12:11:53 zion kernel: INFO: task umount:1668 blocked for more than 120 seconds. Nov 03 12:11:53 zion kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 03 12:11:53 zion kernel: umount D ffff880037afbc60 0 1668 1653 0x00000000 Nov 03 12:11:53 zion kernel: ffff880037afbbd0 0000000000000086 0000000000014500 ffff880037afbfd8 Nov 03 12:11:53 zion kernel: ffff880037afbfd8 0000000000014500 ffff8800aa0caa30 0000000000000010 Nov 03 12:11:53 zion kernel: 000000000d6fffff ffff880037afbb98 ffffffff8113a911 ffff8800afedb728 Nov 03 12:11:53 zion kernel: Call Trace: Nov 03 12:11:53 zion kernel: [<ffffffff8113a911>] ? free_pcppages_bulk+0x3b1/0x3f0 Nov 03 12:11:53 zion kernel: [<ffffffff81132700>] ? filemap_fdatawait+0x30/0x30 Nov 03 12:11:53 zion kernel: [<ffffffff814e1029>] schedule+0x29/0x70 Nov 03 12:11:53 zion kernel: [<ffffffff814e12cf>] io_schedule+0x8f/0xe0 Nov 03 12:11:53 zion kernel: [<ffffffff8113270e>] sleep_on_page+0xe/0x20 Nov 03 12:11:53 zion kernel: [<ffffffff814ddb5b>] __wait_on_bit_lock+0x5b/0xc0 Nov 03 12:11:53 zion kernel: [<ffffffff8113284a>] __lock_page+0x6a/0x70 Nov 03 12:11:53 zion kernel: [<ffffffff81084800>] ? wake_atomic_t_function+0x40/0x40 Nov 03 12:11:53 zion kernel: [<ffffffff81141fa3>] truncate_inode_pages_range+0x613/0x660 Nov 03 12:11:53 zion kernel: [<ffffffff81142005>] truncate_inode_pages+0x15/0x20 Nov 03 12:11:53 zion kernel: [<ffffffffa07df172>] btrfs_evict_inode+0x42/0x380 [btrfs] Nov 03 12:11:53 zion kernel: [<ffffffff811b97b0>] evict+0xb0/0x1b0 Nov 03 12:11:53 zion kernel: [<ffffffff811b98e9>] dispose_list+0x39/0x50 Nov 03 12:11:53 zion kernel: [<ffffffff811ba56c>] evict_inodes+0x11c/0x130 Nov 03 12:11:53 zion kernel: [<ffffffff811a1cc8>] generic_shutdown_super+0x48/0xe0 Nov 03 12:11:53 zion kernel: [<ffffffff811a1f22>] kill_anon_super+0x12/0x20 Nov 03 12:11:53 zion kernel: [<ffffffffa07a8ee6>] btrfs_kill_super+0x16/0x90 [btrfs] Nov 03 12:11:53 zion kernel: [<ffffffff811a22fd>] deactivate_locked_super+0x3d/0x60 Nov 03 12:11:53 zion kernel: [<ffffffff811a28e6>] deactivate_super+0x46/0x60 Nov 03 12:11:53 zion kernel: [<ffffffff811bdeaf>] mntput_no_expire+0xef/0x150 Nov 03 12:11:53 zion kernel: [<ffffffff811bf0b1>] SyS_umount+0x91/0x3b0 Nov 03 12:11:53 zion kernel: [<ffffffff814ea5dd>] system_call_fastpath+0x1a/0x1f ---- The only way to solve the problem was to perform a restart. After that, the problem would come back as soon as I started rsync again. [[the-solution]] == The Solution I hunted around for a while until I finally just searched for the name of the pegged process, **btrfs-endio-wri**, and cpu time. It turns out, the btrfs folks have https://btrfs.wiki.kernel.org/index.php/Gotchas[a page] detailing a list of current "gotchas" btrfs has. This issue was one of them. They describe it as <pre> Files with a lot of random writes can become heavily fragmented (10000+ extents) causing trashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or large amount a RAM. ... Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files. </pre> One of the best parts of rsync is that is syncs deltas instead of resyncing the entire file. What does that result in? Lots of little random writes. Sounds like a match to me. **To fix this**, I defragged all of /home/ (with _compression=lzo_ of course :) ), and remounted using the *autodefrag* option. Now I can run rsync with no problems. One last thing to note. Their gotchas page says that once they've worked out a few potential kinks with the autodefrag mount option, they'll make it the default, which should prevent this from being an issue in future versions. Category:Linux Category:Btrfs Category:Storage Category:RAID // vim: set syntax=asciidoc: