Btrfs:RAID 5 Rsync Freeze
=========================
:author: Aaron Ball
:email: nullspoon@iohq.net


== {doctitle}

My server's _/home/_ directory is a btrfs RAID 5 spanning three drives (I wrote
a blog post about the setup Btrfs:RAID_Setup[here]). Everything worked fine
until I used rsync to sync my files from my laptop to the server. The sync would
run well for a little while and then slow to a crawl, and I couldn't cancel it
with ctrl+c. If I could get onto the server over ssh, I'd find one of my CPUs
pegged at 100%. Sometimes, though, it got so bogged down that I couldn't reach
the server at all. If I was already logged in and ran `kill -9` on rsync, the
process would just go defunct.

After trying to unmount _/home/_, I checked my logs and found this:

----
Nov 03 12:01:18 zion kernel: device label home devid 1 transid 1173 /dev/sdb
Nov 03 12:01:19 zion kernel: btrfs: disk space caching is enabled
Nov 03 12:11:53 zion kernel: INFO: task umount:1668 blocked for more than 120 seconds.
Nov 03 12:11:53 zion kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 03 12:11:53 zion kernel: umount D ffff880037afbc60 0 1668 1653 0x00000000
Nov 03 12:11:53 zion kernel: ffff880037afbbd0 0000000000000086 0000000000014500 ffff880037afbfd8
Nov 03 12:11:53 zion kernel: ffff880037afbfd8 0000000000014500 ffff8800aa0caa30 0000000000000010
Nov 03 12:11:53 zion kernel: 000000000d6fffff ffff880037afbb98 ffffffff8113a911 ffff8800afedb728
Nov 03 12:11:53 zion kernel: Call Trace:
Nov 03 12:11:53 zion kernel: [<ffffffff8113a911>] ?  free_pcppages_bulk+0x3b1/0x3f0
Nov 03 12:11:53 zion kernel: [<ffffffff81132700>] ? filemap_fdatawait+0x30/0x30
Nov 03 12:11:53 zion kernel: [<ffffffff814e1029>] schedule+0x29/0x70
Nov 03 12:11:53 zion kernel: [<ffffffff814e12cf>] io_schedule+0x8f/0xe0
Nov 03 12:11:53 zion kernel: [<ffffffff8113270e>] sleep_on_page+0xe/0x20
Nov 03 12:11:53 zion kernel: [<ffffffff814ddb5b>] __wait_on_bit_lock+0x5b/0xc0
Nov 03 12:11:53 zion kernel: [<ffffffff8113284a>] __lock_page+0x6a/0x70
Nov 03 12:11:53 zion kernel: [<ffffffff81084800>] ?  wake_atomic_t_function+0x40/0x40
Nov 03 12:11:53 zion kernel: [<ffffffff81141fa3>] truncate_inode_pages_range+0x613/0x660
Nov 03 12:11:53 zion kernel: [<ffffffff81142005>] truncate_inode_pages+0x15/0x20
Nov 03 12:11:53 zion kernel: [<ffffffffa07df172>] btrfs_evict_inode+0x42/0x380 [btrfs]
Nov 03 12:11:53 zion kernel: [<ffffffff811b97b0>] evict+0xb0/0x1b0
Nov 03 12:11:53 zion kernel: [<ffffffff811b98e9>] dispose_list+0x39/0x50
Nov 03 12:11:53 zion kernel: [<ffffffff811ba56c>] evict_inodes+0x11c/0x130
Nov 03 12:11:53 zion kernel: [<ffffffff811a1cc8>] generic_shutdown_super+0x48/0xe0
Nov 03 12:11:53 zion kernel: [<ffffffff811a1f22>] kill_anon_super+0x12/0x20
Nov 03 12:11:53 zion kernel: [<ffffffffa07a8ee6>] btrfs_kill_super+0x16/0x90 [btrfs]
Nov 03 12:11:53 zion kernel: [<ffffffff811a22fd>] deactivate_locked_super+0x3d/0x60
Nov 03 12:11:53 zion kernel: [<ffffffff811a28e6>] deactivate_super+0x46/0x60
Nov 03 12:11:53 zion kernel: [<ffffffff811bdeaf>] mntput_no_expire+0xef/0x150
Nov 03 12:11:53 zion kernel: [<ffffffff811bf0b1>] SyS_umount+0x91/0x3b0
Nov 03 12:11:53 zion kernel: [<ffffffff814ea5dd>] system_call_fastpath+0x1a/0x1f
----

The only way out was a reboot, and the problem came back as soon as I started
rsync again.


[[the-solution]]
== The Solution

I hunted around for a while until I finally searched for the name of the pegged
process, **btrfs-endio-wri**, together with "cpu time". It turns out the btrfs
folks have https://btrfs.wiki.kernel.org/index.php/Gotchas[a page] listing
current btrfs "gotchas", and this issue is one of them. They describe it like
this:

____
Files with a lot of random writes can become heavily fragmented (10000+
extents) causing trashing on HDDs and excessive multi-second spikes of CPU load
on systems with an SSD or large amount a RAM. ... Symptoms include
btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes,
possibly triggered by syncs). You can use filefrag to locate heavily fragmented
files.
____
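
The Gotchas page points to filefrag for finding the worst offenders. Below is a
minimal Python sketch of that search, assuming e2fsprogs' `filefrag` is
installed and prints its usual one-line summary (`<path>: <N> extents found`)
for each file. The helper names `parse_filefrag_line` and `fragmented_files`
are hypothetical, and the default threshold simply reuses the 10000-extent
figure from the Gotchas quote.

```python
import os
import re
import subprocess

# Matches filefrag's summary line, e.g. "/home/u/vm.img: 12345 extents found"
# or "/etc/hostname: 1 extent found". The greedy path group tolerates ": "
# inside file names because the count and suffix must match at the end.
_SUMMARY = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found$")


def parse_filefrag_line(line):
    """Return (path, extent_count) for one filefrag summary line, else None."""
    match = _SUMMARY.match(line.strip())
    if match is None:
        return None
    return match.group("path"), int(match.group("count"))


def fragmented_files(root, threshold=10000):
    """Yield (path, extents) for regular files under root at/above threshold."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; filefrag wants regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Assumption: filefrag is on PATH; a failed run leaves stdout
            # empty, which parses to None and is skipped.
            result = subprocess.run(["filefrag", path],
                                    capture_output=True, text=True)
            parsed = parse_filefrag_line(result.stdout)
            if parsed is not None and parsed[1] >= threshold:
                yield parsed
```

Usage might look like `for path, n in fragmented_files("/home"): print(n, path)`.
It runs filefrag once per file, so expect it to be slow on a big tree. On a
compressed btrfs filesystem the counts may also look inflated, since filefrag
reports each compressed extent (at most 128 KiB) separately.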

One of the best parts of rsync is that it syncs deltas instead of resending the
entire file. What does that result in? Lots of small random writes. Sounds like
a match to me.

**To fix this**, I defragmented all of _/home/_ (with lzo compression, of
course :) ) and remounted it with the *autodefrag* mount option.
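
The exact commands aren't listed above. As a hedged sketch, assuming
btrfs-progs' `btrfs filesystem defragment` (`-r` to recurse, `-c<algo>` to
recompress while defragmenting) and a plain `mount -o remount`, the fix could be
scripted in Python like this. `fix_commands` and `apply_fix` are hypothetical
helpers, and `dry_run` defaults to only printing the commands.

```python
import subprocess


def fix_commands(mountpoint="/home", compress="lzo"):
    """Return the two command lines; nothing is executed here."""
    return [
        # Recursively defragment every file under the mount point,
        # recompressing the rewritten extents with the given algorithm.
        ["btrfs", "filesystem", "defragment", "-r", f"-c{compress}", mountpoint],
        # Remount with autodefrag so future small random writes get queued
        # for background defragmentation.
        ["mount", "-o", "remount,autodefrag", mountpoint],
    ]


def apply_fix(mountpoint="/home", compress="lzo", dry_run=True):
    """Print each command; run it (as root) only when dry_run is False."""
    for cmd in fix_commands(mountpoint, compress):
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True stops before the remount if the defragment fails.
            subprocess.run(cmd, check=True)
```

`apply_fix()` previews the commands; `apply_fix(dry_run=False)` runs them. To
keep *autodefrag* across reboots it also has to go in the options field of the
_/home/_ entry in _/etc/fstab_. Be aware that defragmenting unshares extents
with snapshots and reflinked copies, so it can increase space usage if _/home/_
has snapshots.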

Now I can run rsync with no problems.

One last thing to note: their gotchas page says that once they've worked out a
few potential kinks in the *autodefrag* mount option, they'll make it the
default, which should keep this from being an issue in future versions.

Category:Linux
Category:Btrfs
Category:Storage
Category:RAID


// vim: set syntax=asciidoc:
