blob: bcdcb84e09200d7173033edc33d6016915b3569e (
plain)
1 Btrfs:RAID 5 Rsync Freeze
2 =========================
3 :author: Aaron Ball
4 :email: nullspoon@iohq.net
5
6
7 == {doctitle}
8
9 My server's _/home/_ directory is a btrfs RAID 5, spanning three drives (I did
10 a blog post about it Btrfs:RAID_Setup[here]). Everything worked fine, until I
11 used rsync to sync my files from my laptop to my server. At that point, the
12 sync would go well for a little while and then slow to a crawl. I couldn't
13 cancel the sync with a ctrl+c. If I could get on my server over ssh, I'd find
14 that one of my cpus was pegged at 100%. Sometimes though it got so bogged down
15 I couldn't even get to the server at all. If I were already on the server and I
16 did a kill -9 on rsync, it'd go defunct.
17
18 I checked my logs after trying to umount /home/ and found...
19
20 ----
21 Nov 03 12:01:18 zion kernel: device label home devid 1 transid 1173 /dev/sdb
22 Nov 03 12:01:19 zion kernel: btrfs: disk space caching is enabled
23 Nov 03 12:11:53 zion kernel: INFO: task umount:1668 blocked for more than 120 seconds.
24 Nov 03 12:11:53 zion kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
25 Nov 03 12:11:53 zion kernel: umount D ffff880037afbc60 0 1668 1653 0x00000000
26 Nov 03 12:11:53 zion kernel: ffff880037afbbd0 0000000000000086 0000000000014500 ffff880037afbfd8
27 Nov 03 12:11:53 zion kernel: ffff880037afbfd8 0000000000014500 ffff8800aa0caa30 0000000000000010
28 Nov 03 12:11:53 zion kernel: 000000000d6fffff ffff880037afbb98 ffffffff8113a911 ffff8800afedb728
29 Nov 03 12:11:53 zion kernel: Call Trace:
30 Nov 03 12:11:53 zion kernel: [<ffffffff8113a911>] ? free_pcppages_bulk+0x3b1/0x3f0
31 Nov 03 12:11:53 zion kernel: [<ffffffff81132700>] ? filemap_fdatawait+0x30/0x30
32 Nov 03 12:11:53 zion kernel: [<ffffffff814e1029>] schedule+0x29/0x70
33 Nov 03 12:11:53 zion kernel: [<ffffffff814e12cf>] io_schedule+0x8f/0xe0
34 Nov 03 12:11:53 zion kernel: [<ffffffff8113270e>] sleep_on_page+0xe/0x20
35 Nov 03 12:11:53 zion kernel: [<ffffffff814ddb5b>] __wait_on_bit_lock+0x5b/0xc0
36 Nov 03 12:11:53 zion kernel: [<ffffffff8113284a>] __lock_page+0x6a/0x70
37 Nov 03 12:11:53 zion kernel: [<ffffffff81084800>] ? wake_atomic_t_function+0x40/0x40
38 Nov 03 12:11:53 zion kernel: [<ffffffff81141fa3>] truncate_inode_pages_range+0x613/0x660
39 Nov 03 12:11:53 zion kernel: [<ffffffff81142005>] truncate_inode_pages+0x15/0x20
40 Nov 03 12:11:53 zion kernel: [<ffffffffa07df172>] btrfs_evict_inode+0x42/0x380 [btrfs]
41 Nov 03 12:11:53 zion kernel: [<ffffffff811b97b0>] evict+0xb0/0x1b0
42 Nov 03 12:11:53 zion kernel: [<ffffffff811b98e9>] dispose_list+0x39/0x50
43 Nov 03 12:11:53 zion kernel: [<ffffffff811ba56c>] evict_inodes+0x11c/0x130
44 Nov 03 12:11:53 zion kernel: [<ffffffff811a1cc8>] generic_shutdown_super+0x48/0xe0
45 Nov 03 12:11:53 zion kernel: [<ffffffff811a1f22>] kill_anon_super+0x12/0x20
46 Nov 03 12:11:53 zion kernel: [<ffffffffa07a8ee6>] btrfs_kill_super+0x16/0x90 [btrfs]
47 Nov 03 12:11:53 zion kernel: [<ffffffff811a22fd>] deactivate_locked_super+0x3d/0x60
48 Nov 03 12:11:53 zion kernel: [<ffffffff811a28e6>] deactivate_super+0x46/0x60
49 Nov 03 12:11:53 zion kernel: [<ffffffff811bdeaf>] mntput_no_expire+0xef/0x150
50 Nov 03 12:11:53 zion kernel: [<ffffffff811bf0b1>] SyS_umount+0x91/0x3b0
51 Nov 03 12:11:53 zion kernel: [<ffffffff814ea5dd>] system_call_fastpath+0x1a/0x1f
52 ----
53
54 The only way to solve the problem was to perform a restart. After that, the
55 problem would come back as soon as I started rsync again.
56
57
58 [[the-solution]]
59 == The Solution
60
61 I hunted around for a while until I finally just searched for the name of the
62 pegged process, **btrfs-endio-wri**, and cpu time. It turns out, the btrfs
63 folks have https://btrfs.wiki.kernel.org/index.php/Gotchas[a page] detailing a
64 list of current "gotchas" btrfs has. This issue was one of them. They describe
65 it as <pre> Files with a lot of random writes can become heavily fragmented
66 (10000+ extents) causing trashing on HDDs and excessive multi-second spikes of
67 CPU load on systems with an SSD or large amount a RAM. ... Symptoms include
68 btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes,
69 possibly triggered by syncs). You can use filefrag to locate heavily fragmented
70 files. </pre>
71
72 One of the best parts of rsync is that is syncs deltas instead of resyncing the
73 entire file. What does that result in? Lots of little random writes. Sounds
74 like a match to me.
75
76 **To fix this**, I defragged all of /home/ (with _compression=lzo_ of course :)
77 ), and remounted using the *autodefrag* option.
78
79 Now I can run rsync with no problems.
80
81 One last thing to note. Their gotchas page says that once they've worked out a
82 few potential kinks with the autodefrag mount option, they'll make it the
83 default, which should prevent this from being an issue in future versions.
84
85 Category:Linux
86 Category:Btrfs
87 Category:Storage
88 Category:RAID
89
90
91 // vim: set syntax=asciidoc:
|