Help needed: ping packets are dropped once the MTU crosses a threshold
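Hi everyone, I've run into a very strange MTU-related ping loss problem. I've spent several days on it without finding the cause, so I'd appreciate help analyzing it: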
1. Normal behavior at the default MTU (1500)
The default MTU on hosta is 1500:
    hosta$ ifconfig eth0 | grep mtu
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
Pinging hosta from hostb with a 1500-byte payload gets replies as expected:
    hostb$ ping -s 1500 -c 2 hosta
    PING hosta (hosta) 1500(1528) bytes of data.
    1508 bytes from hosta: icmp_seq=1 ttl=64 time=0.273 ms
    1508 bytes from hosta: icmp_seq=2 ttl=64 time=0.314 ms
    --- hosta ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1025ms
    rtt min/avg/max/mdev = 0.273/0.293/0.314/0.020 ms
tcpdump on hosta shows the packets being received and answered normally:
    12:01:40.237047 IP hostb > hosta: ICMP echo request, id 3052, seq 1, length 1480
    12:01:40.237048 IP hostb > hosta: icmp
    12:01:40.237116 IP hosta > hostb: ICMP echo reply, id 3052, seq 1, length 1480
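The sizes in the ping and tcpdump output above follow from simple header arithmetic. Here is a minimal Python sketch of it, assuming a 20-byte IPv4 header without options, an 8-byte ICMP header, and that hostb fragments against an MTU of 1500. The function name `fragment_layout` is made up for illustration.

```python
IP_HEADER = 20    # IPv4 header without options (assumption)
ICMP_HEADER = 8   # ICMP echo header

def fragment_layout(icmp_payload, sender_mtu):
    """Return (offset, length, more_fragments) for each IP fragment."""
    total = icmp_payload + ICMP_HEADER  # bytes carried inside the IP datagram
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (sender_mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < total:
        length = min(per_fragment, total - offset)
        more = offset + length < total
        fragments.append((offset, length, more))
        offset += length
    return fragments

for offset, length, more in fragment_layout(1500, 1500):
    print(f"off={offset} ip_payload={length} "
          f"ip_total={length + IP_HEADER} MF={int(more)}")
```

For `ping -s 1500` this yields a first fragment carrying 1480 bytes of IP payload (the `length 1480` on the tcpdump line) and a second fragment of 28 bytes at offset 1480. The sum 1500 + 8 + 20 = 1528 matches the `1500(1528)` in the ping banner.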
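2. Still normal with the MTU lowered as far as 1488
When I lower hosta's MTU to any value down to and including 1488 (for example 1490 or 1488), the same `ping -s 1500 -c 2 hosta` command still gets replies.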
3. The critical threshold: complete packet loss at MTU=1487
After setting hosta's MTU to 1487:
    hosta$ ifconfig eth0
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1487
Pinging from hostb with a 1500-byte payload now gets no replies at all:
    hostb$ ping -s 1500 -c 2 hosta
    PING hosta (hosta) 1500(1528) bytes of data.
    --- hosta ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1032ms
tcpdump on hosta shows only two terse icmp entries and no complete echo request:
    12:01:07.421196 IP hostb > hosta: icmp
    12:01:08.443698 IP hostb > hosta: icmp
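To make the threshold concrete: the first fragment is built by hostb, so its size does not depend on hosta's MTU. The small sketch below tabulates how far that fragment exceeds hosta's MTU at each setting mentioned in this post. It assumes hostb's MTU is 1500 and a 14-byte untagged Ethernet header, neither of which is confirmed above.

```python
FIRST_FRAGMENT_IP_BYTES = 1500   # assumes hostb fragments against MTU 1500
ETHERNET_HEADER = 14             # assumes untagged Ethernet II framing

# (hosta MTU, ping result reported in this post)
observations = [
    (1500, "reply"),
    (1490, "reply"),
    (1488, "reply"),
    (1487, "no reply"),
]

def overshoot(receiver_mtu):
    """Bytes by which the first fragment exceeds the receiver's MTU."""
    return FIRST_FRAGMENT_IP_BYTES - receiver_mtu

frame_len = FIRST_FRAGMENT_IP_BYTES + ETHERNET_HEADER
for mtu, result in observations:
    print(f"hosta mtu={mtu}: first fragment is {overshoot(mtu)} bytes over "
          f"the MTU ({frame_len}-byte frame) -> {result}")
```

Under these assumptions a 12-byte overshoot is still accepted while a 13-byte overshoot is not. The sketch only restates the observations. It does not explain where the cutoff comes from.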
4. Relevant system parameters
I checked the IPv4-related sysctl parameters on hosta and confirmed that neither PMTU discovery nor ICMP responses are disabled:
    net.ipv4.ip_forward_use_pmtu = 0
    net.ipv4.ip_no_pmtu_disc = 0
    net.ipv4.route.min_pmtu = 552
    net.ipv4.route.mtu_expires = 600
    net.ipv4.tcp_mtu_probe_floor = 48
    net.ipv4.tcp_mtu_probing = 0
    net.ipv4.icmp_echo_ignore_all = 0
    net.ipv4.icmp_echo_ignore_broadcasts = 1
    net.ipv4.icmp_errors_use_inbound_ifaddr = 0
    net.ipv4.icmp_ignore_bogus_error_responses = 1
    net.ipv4.icmp_msgs_burst = 50
    net.ipv4.icmp_msgs_per_sec = 1000
    net.ipv4.icmp_ratelimit = 1000
    net.ipv4.icmp_ratemask = 6168
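The same check can be done mechanically instead of by eye. A rough sketch, assuming the `key = value` text format shown above: the sample string is an excerpt of the values from this post, and the helper names `parse_sysctl` and `mismatches` are hypothetical.

```python
SYSCTL_OUTPUT = """\
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_ratelimit = 1000
"""

def parse_sysctl(text):
    """Turn 'key = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

# Values that leave PMTU discovery and echo replies enabled.
EXPECTED = {
    "net.ipv4.ip_no_pmtu_disc": "0",
    "net.ipv4.icmp_echo_ignore_all": "0",
}

def mismatches(settings, expected=EXPECTED):
    """Return the keys whose value differs from the expected one."""
    return {key: settings.get(key)
            for key, value in expected.items()
            if settings.get(key) != value}

settings = parse_sysctl(SYSCTL_OUTPUT)
print(mismatches(settings)
      or "PMTU discovery and ICMP echo replies are not disabled")
```

An empty result from `mismatches` agrees with the manual check: with these values, nothing in the listed sysctls turns off PMTU discovery or echo replies.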
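5. Additional test details
- I tried ping's -M option (do/want/dont) to control fragmentation explicitly. As long as hosta's MTU is 1487 or lower, no reply arrives, whichever option is used.
- Further packet capture analysis:
  - With MTU=1487, hosta receives only a single fragment, shown as:
        1 2023-02-15 22:40:24.095129 10.50.107.83 10.50.107.129 IPv4 562 Fragmented IP protocol (proto=ICMP 1, off=1480, ID=1fb9)
    This frame is 562 bytes long, but the first fragment never arrives.
  - With the MTU set back to 1500, all fragments arrive and the reply is sent normally. The capture shows:
        2 2023-02-15 22:40:42.093639 10.50.107.83 10.50.107.129 IPv4 1514 Fragmented IP protocol (proto=ICMP 1, off=0, ID=2c62) [Reassembled in #3]
        3 2023-02-15 22:40:42.093639 10.50.107.83 10.50.107.129 ICMP 562 Echo (ping) request id=0x1004, seq=1/256, ttl=64 (reply in 5)
        4 2023-02-15 22:40:42.093698 10.50.107.129 10.50.107.83 IPv4 1514 Fragmented IP protocol (proto=ICMP 1, off=0, ID=fe1a) [Reassembled in #5]
        5 2023-02-15 22:40:42.093717 10.50.107.129 10.50.107.83 ICMP 562 Echo (ping) reply id=0x1004, seq=1/256, ttl=64 (request in 3)
- Comparing the fragment data, the fragment received at MTU=1487 does not match the corresponding fragment in the working case (the ping payload is a repeating 00-ff pattern). I suspect that either the sender never sent the first fragment or the receiver never received it.
- Environment: both servers are identically configured Ubuntu virtual machines. From my laptop, going through a VPN/firewall, I can ping hosta at MTU=1487 without problems, which rules out a problem with hosta itself.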
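The capture lines above can be turned into a quick reassembly check. The sketch below is illustrative only and assumes a 14-byte untagged Ethernet header and a 20-byte IPv4 header. The helpers `ip_payload_bytes` and `missing_ranges` are hypothetical names, not part of any tool.

```python
ETHERNET_HEADER = 14   # assumes untagged Ethernet II framing
IP_HEADER = 20         # assumes IPv4 without options

def ip_payload_bytes(frame_len):
    """IP payload carried by a captured frame of frame_len bytes."""
    return frame_len - ETHERNET_HEADER - IP_HEADER

def missing_ranges(fragments, total_len):
    """fragments: list of (offset, length). Return uncovered byte ranges."""
    gaps = []
    position = 0
    for offset, length in sorted(fragments):
        if offset > position:
            gaps.append((position, offset))
        position = max(position, offset + length)
    if position < total_len:
        gaps.append((position, total_len))
    return gaps

tail = ip_payload_bytes(562)   # the 562-byte frame at off=1480
total = 1480 + tail            # datagram length implied by the last fragment

print("MTU=1487:", missing_ranges([(1480, tail)], total))
print("MTU=1500:", missing_ranges([(0, 1480), (1480, tail)], total))
```

With only the off=1480 fragment present, bytes 0 to 1480 of the datagram are missing. The echo request therefore cannot be reassembled and no reply is generated. The 562-byte frame works out to a 528-byte tail, which suggests this capture was taken with a payload larger than 1500 bytes. That is an inference from the frame sizes, not something stated in the post.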
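I really can't work out why lowering the MTU to 1487 or below triggers this loss, given that I haven't changed any fragmentation-related configuration. Has anyone seen something similar, or can anyone suggest how to troubleshoot it? Thanks!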
Note: this content comes from Stack Exchange; the question was asked by Srikanth.




