Help: BTRFS reports filesystem corruption after upgrading to kernel 4.19 or later, but works fine on kernel 4.9
Hi all, I've run into a strange BTRFS problem and would appreciate some help analyzing it.
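On my server, the BTRFS filesystem works perfectly with kernel 4.9 or earlier. As soon as I boot kernel 4.19 or later, however, the whole system crashes and the kernel log fills with filesystem corruption errors: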
[Fri Oct 27 23:44:22 2023] BTRFS info (device dm-1): using crc32c (crc32c-intel) checksum algorithm
[Fri Oct 27 23:44:22 2023] BTRFS info (device dm-1): disk space caching is enabled
[Fri Oct 27 23:44:22 2023] BTRFS info (device dm-1): bdev /dev/mapper/vg0-root errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
[Fri Oct 27 23:44:23 2023] BTRFS info (device dm-1): checking UUID tree
[Fri Oct 27 23:44:25 2023] BTRFS critical (device dm-1): corrupt leaf: root=5 block=7800258560 slot=0 ino=22544639, invalid inode transid: has 140710402240512 expect [0, 9501327]
[Fri Oct 27 23:44:25 2023] BTRFS error (device dm-1): read time tree block corruption detected on logical 7800258560 mirror 1
[Fri Oct 27 23:44:26 2023] BTRFS critical (device dm-1): corrupt leaf: root=5 block=86444384256 slot=1 ino=8651007, invalid inode transid: has 140710402240512 expect [0, 9501327]
[Fri Oct 27 23:44:26 2023] BTRFS error (device dm-1): read time tree block corruption detected on logical 86444384256 mirror 1
[Fri Oct 27 23:44:26 2023] BTRFS critical (device dm-1): corrupt leaf: root=5 block=86032809984 slot=41 ino=24641791, invalid inode transid: has 140710402240512 expect [0, 9501327]
[Fri Oct 27 23:44:26 2023] BTRFS error (device dm-1): read time tree block corruption detected on logical 86032809984 mirror 1
[Fri Oct 27 23:44:26 2023] BTRFS critical (device dm-1): corrupt leaf: root=5 block=8081629184 slot=0 ino=8651599, invalid inode transid: has 140710402240512 expect [0, 9501327]
[Fri Oct 27 23:44:26 2023] BTRFS error (device dm-1): read time tree block corruption detected on logical 8081629184 mirror 1
[Fri Oct 27 23:44:26 2023] BTRFS critical (device dm-1): corrupt leaf: root=5 block=86032809984 slot=41 ino=24641791, invalid inode transid: has 140710402240512 expect [0, 9501327]
[Fri Oct 27 23:44:26 2023] BTRFS error (device dm-1): read time tree block corruption detected on logical 86032809984 mirror 1
[Fri Oct 27 23:44:26 2023] BTRFS critical (device dm-1): corrupt leaf: root=5 block=7800258560 slot=0 ino=22544639, invalid inode transid: has 140710402240512 expect [0, 9501327]
[Fri Oct 27 23:44:26 2023] BTRFS error (device dm-1): read time tree block corruption detected on logical 7800258560 mirror 1
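Each `corrupt leaf` line states the check that failed: the inode's transid is expected to lie in `[0, 9501327]`, but the stored value is 140710402240512. The short Python sketch below restates that range check using only the numbers from the log and prints the stored value in hexadecimal for easier inspection. It is an illustration inferred from the message text, not the kernel's own checking code; treating both bounds as inclusive is an assumption.

```python
# Numbers copied verbatim from the "corrupt leaf" lines in the kernel log.
EXPECTED_MIN = 0
EXPECTED_MAX = 9501327            # from "expect [0, 9501327]"
STORED_TRANSID = 140710402240512  # from "has 140710402240512"


def transid_in_range(transid: int, lo: int = EXPECTED_MIN, hi: int = EXPECTED_MAX) -> bool:
    """Return True if transid lies in [lo, hi]; inclusive bounds are an assumption."""
    return lo <= transid <= hi


if __name__ == "__main__":
    print("in range:", transid_in_range(STORED_TRANSID))  # in range: False
    print("hex:", hex(STORED_TRANSID))                     # hex: 0x7ff9b18ab000
```

The same stored value appears for every flagged inode in the log, which is easier to spot when it is compared against the expected range rather than read as a long decimal number.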
Running btrfs check on the partition under a 5.x or newer kernel also reports many errors:
root@rescue ~ # btrfs check /dev/mapper/vg0-root
Opening filesystem to check...
Checking filesystem on /dev/mapper/vg0-root
UUID: 18c05fd2-0569-420b-bb74-676b2327d3d5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 8651007 errors 100000, invalid inode generation or transid
root 5 inode 8651008 errors 100000, invalid inode generation or transid
root 5 inode 8651599 errors 100000, invalid inode generation or transid
root 5 inode 8651600 errors 100000, invalid inode generation or transid
root 5 inode 22544639 errors 100000, invalid inode generation or transid
root 5 inode 24641791 errors 100000, invalid inode generation or transid
root 5 inode 40419460 errors 200, dir isize wrong
root 5 inode 44021300 errors 200, dir isize wrong
ERROR: errors found in fs roots
found 187456020480 bytes used, error(s) found
total csum bytes: 179106780
total tree bytes: 2667675648
total fs tree bytes: 2107199488
total extent tree bytes: 305291264
btree space waste bytes: 643077002
file data blocks allocated: 496942690304
 referenced 175125786624
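With several inodes flagged, it can help to collect the affected inode numbers by error type before digging further. A minimal Python sketch, assuming the btrfs check output is available as text in the line format shown above; the function name `group_inode_errors` is hypothetical, and the regular expression only covers the `root N inode N errors X, message` lines.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   root 5 inode 8651007 errors 100000, invalid inode generation or transid
LINE_RE = re.compile(r"^root (\d+) inode (\d+) errors ([0-9a-fA-F]+), (.+)$")


def group_inode_errors(check_output: str) -> dict[str, list[tuple[int, int]]]:
    """Group (root, inode) pairs from btrfs check output by their error message."""
    groups: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in check_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            root, inode, _code, message = match.groups()
            groups[message].append((int(root), int(inode)))
    return dict(groups)


# Excerpt of the btrfs check output above; lines that do not match are ignored.
SAMPLE = """\
[4/7] checking fs roots
root 5 inode 8651007 errors 100000, invalid inode generation or transid
root 5 inode 8651008 errors 100000, invalid inode generation or transid
root 5 inode 8651599 errors 100000, invalid inode generation or transid
root 5 inode 8651600 errors 100000, invalid inode generation or transid
root 5 inode 22544639 errors 100000, invalid inode generation or transid
root 5 inode 24641791 errors 100000, invalid inode generation or transid
root 5 inode 40419460 errors 200, dir isize wrong
root 5 inode 44021300 errors 200, dir isize wrong
ERROR: errors found in fs roots
"""

if __name__ == "__main__":
    for message, inodes in group_inode_errors(SAMPLE).items():
        print(f"{message}: {[ino for _, ino in inodes]}")
```

On a live system the output could first be captured to a file, for example with `btrfs check /dev/mapper/vg0-root > check.txt 2>&1`, and the file's contents passed to the function instead of `SAMPLE`.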
After mounting the partition, the directory listing is also badly broken:
root@rescue ~ # ll /mnt/
ls: cannot access '/mnt/proc': Input/output error
ls: cannot access '/mnt/run': Input/output error
ls: cannot access '/mnt/sys': Input/output error
ls: cannot access '/mnt/etc': Input/output error
ls: cannot access '/mnt/lib64': Input/output error
ls: cannot access '/mnt/usr': Input/output error
total 32K
drwxr-xr-x 1 root root 2.0K Oct 27 20:18 bin
drwxr-xr-x 1 root root 0 Sep 27 2014 boot
drwxr-xr-x 1 root root 0 Sep 27 2014 dev
d????????? ? ? ? ? ? etc
drwxr-xr-x 1 root root 12 Dec 23 2014 home
lrwxrwxrwx 1 root root 31 Oct 26 11:04 initrd.img -> boot/initrd.img-5.10.0-26-amd64
lrwxrwxrwx 1 root root 30 Oct 26 17:24 initrd.img.old -> boot/initrd.img-4.9.0-13-amd64
-rw-r----- 1 root root 659 Sep 27 2014 installimage.conf
-rw-r----- 1 root root 9.8K Sep 27 2014 installimage.debug
drwxr-xr-x 1 root root 420 Oct 26 17:22 lib
d????????? ? ? ? ? ? lib64
drwx------ 1 root root 0 Sep 12 2012 lost+found
drwxr-xr-x 1 root root 22 Sep 12 2012 media
drwxr-xr-x 1 root root 0 Apr 2 2019 mnt
drwxr-xr-x 1 root root 24 Dec 27 2019 opt
d????????? ? ? ? ? ? proc
drwx------ 1 root root 560 Oct 27 22:47 root
d????????? ? ? ? ? ? run
drwxr-xr-x 1 root root 3.1K Oct 27 20:18 sbin
drwxr-xr-x 1 root root 0 Sep 12 2012 srv
d????????? ? ? ? ? ? sys
drwxrwxrwt 1 root root 106 Oct 27 23:35 tmp
d????????? ? ? ? ? ? usr
drwxr-xr-x 1 root root 118 Sep 28 2014 var
lrwxrwxrwx 1 root root 28 Oct 26 11:04 vmlinuz -> boot/vmlinuz-5.10.0-26-amd64
lrwxrwxrwx 1 root root 27 Oct 26 17:24 vmlinuz.old -> boot/vmlinuz-4.9.0-13-amd64
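The `d?????????` rows are entries that the directory listing returns but whose metadata cannot be read, matching the `Input/output error` lines printed by ls. A small Python sketch that lists such entries programmatically, assuming the filesystem is mounted and readable enough to list the directory; the helper name `unreadable_entries` is hypothetical.

```python
import errno
import os


def unreadable_entries(path: str) -> list[tuple[str, str]]:
    """Return (name, errno name) for entries listed in `path` whose lstat() fails."""
    failures = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                # Equivalent to lstat(); raises OSError (e.g. EIO) if the inode cannot be read.
                entry.stat(follow_symlinks=False)
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((entry.name, name))
    return sorted(failures)
```

Called as `unreadable_entries("/mnt")` on the mount shown above, it would be expected to report the same six directories with `EIO`, giving a list that is easier to compare between kernels than raw ls output.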
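Strangely, as soon as I boot back into the 4.9 kernel, all of these errors disappear and file reads and writes work normally. I have verified this repeatedly: files can be read, written, re-read, and rewritten without any problems.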
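To repeat the read, write, re-read, and rewrite test in a consistent way under each kernel, a minimal Python sketch could look like this. The helper name `roundtrip_check` and any scratch path passed to it are hypothetical. Reading back immediately after writing may be served from the page cache, so a passing result does not by itself prove that the data on disk is intact.

```python
import os


def roundtrip_check(path: str, size: int = 1 << 20, rounds: int = 2) -> bool:
    """Write random bytes, fsync, and read them back; later rounds rewrite and re-read.

    `path` should be a new scratch file: it is overwritten and deleted afterwards.
    """
    try:
        for _ in range(rounds):
            data = os.urandom(size)
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the kernel to flush the data to storage
            with open(path, "rb") as f:
                if f.read() != data:
                    return False
        return True
    finally:
        if os.path.exists(path):
            os.remove(path)
```

For example, `roundtrip_check("/mnt/tmp/roundtrip.bin")` would exercise the mounted filesystem; running it under both kernels makes the comparison explicit.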
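I also have other servers running BTRFS that showed no such problem after upgrading to newer kernels, so this does not appear to be a general compatibility issue.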
Any help analyzing the cause, or pointers on where to start troubleshooting, would be much appreciated. Thanks!
Note: this content comes from Stack Exchange; the question was asked by gmelis.




