
Help troubleshooting abnormal ping packet loss triggered at an MTU threshold

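Hi all, I've recently run into a very strange MTU-related ping packet loss problem. I've spent several days on it without finding the cause, and I'd appreciate help analyzing it: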

I. Normal behavior at the default MTU (1500)

The default MTU on hosta is 1500:

hosta$ ifconfig eth0 | grep mtu
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

A ping with a 1500-byte payload sent from hostb gets replies as expected:

hostb$ ping -s 1500 -c 2 hosta
PING hosta (hosta) 1500(1528) bytes of data.
1508 bytes from hosta: icmp_seq=1 ttl=64 time=0.273 ms
1508 bytes from hosta: icmp_seq=2 ttl=64 time=0.314 ms

--- hosta ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1025ms
rtt min/avg/max/mdev = 0.273/0.293/0.314/0.020 ms

tcpdump on hosta shows the packets being received and answered normally:

12:01:40.237047 IP hostb > hosta: ICMP echo request, id 3052, seq 1, length 1480
12:01:40.237048 IP hostb  > hosta: icmp
12:01:40.237116 IP hosta > hostb: ICMP echo reply, id 3052, seq 1, length 1480
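
The sizes in this output match ordinary IPv4 fragmentation arithmetic. The sketch below is an illustration, not part of the original question. It assumes a 20-byte IP header without options and that hostb sends with an MTU of 1500; `fragment_sizes` is a hypothetical helper name.

```python
def fragment_sizes(ip_payload_len, mtu, ip_header_len=20):
    """Return (offset, data_len, more_fragments) for each IPv4 fragment."""
    max_unfragmented = mtu - ip_header_len
    if ip_payload_len <= max_unfragmented:
        # Fits in a single packet, so no fragmentation is needed.
        return [(0, ip_payload_len, False)]
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_data = max_unfragmented // 8 * 8
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        data_len = min(max_data, ip_payload_len - offset)
        more = offset + data_len < ip_payload_len
        fragments.append((offset, data_len, more))
        offset += data_len
    return fragments


ICMP_HEADER = 8
# ping -s 1500 sends 1500 bytes of echo data plus the 8-byte ICMP header.
for offset, data_len, more in fragment_sizes(1500 + ICMP_HEADER, mtu=1500):
    print(f"offset={offset} data={data_len} ip_total={data_len + 20} more={more}")
```

Under these assumptions the request is split into a first fragment carrying 1480 bytes (the "length 1480" line in tcpdump, a full 1500-byte IP packet) and a second fragment carrying the remaining 28 bytes, which tcpdump prints as the bare "icmp" line. The 1528 in ping's banner is 1500 + 8 + 20.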

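II. Still normal with the MTU lowered to 1488 or above

When I lower hosta's MTU to any value of 1488 or above (for example 1488 or 1490), the same command, ping -s 1500 -c 2 hosta, still gets replies normally.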

III. The critical threshold: total packet loss at MTU=1487

After setting hosta's MTU to 1487:

hosta $ ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1487

Now a ping with a 1500-byte payload sent from hostb gets no reply at all:

hostb $ ping -s 1500 -c 2 hosta
PING hosta (hosta) 1500(1528) bytes of data.

--- hosta ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1032ms

tcpdump on hosta shows only two bare icmp records, with no complete request packet:

12:01:07.421196 IP hostb > hosta: icmp
12:01:08.443698 IP hostb > hosta: icmp

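IV. Relevant system parameters

I checked the IPv4-related sysctl parameters on hosta and confirmed that neither PMTU discovery nor ICMP responses are disabled: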

net.ipv4.ip_forward_use_pmtu = 0
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.route.min_pmtu = 552
net.ipv4.route.mtu_expires = 600
net.ipv4.tcp_mtu_probe_floor = 48
net.ipv4.tcp_mtu_probing = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_msgs_burst = 50
net.ipv4.icmp_msgs_per_sec = 1000
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
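
As an illustration of how the two settings behind that claim can be checked, here is a minimal sketch that parses sysctl-style "key = value" lines. It is not from the original question; `parse_sysctl` is a hypothetical helper, and it assumes every value in the listing is a single integer, which holds for the parameters shown above.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, as printed by sysctl, into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = int(value.strip())
    return settings


# A subset of the listing above, copied verbatim.
listing = """
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.route.min_pmtu = 552
"""

settings = parse_sysctl(listing)
# ip_no_pmtu_disc = 0 means path MTU discovery is left enabled.
pmtu_discovery_enabled = settings["net.ipv4.ip_no_pmtu_disc"] == 0
# icmp_echo_ignore_all = 0 means the kernel answers echo requests.
echo_replies_enabled = settings["net.ipv4.icmp_echo_ignore_all"] == 0
print(pmtu_discovery_enabled, echo_replies_enabled)
```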

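V. Additional test details

  1. I tried ping's -M option (do/want/dont) to control fragmentation behavior explicitly. Whenever hosta's MTU is ≤ 1487, no reply comes back regardless of which option is used;
  2. Further packet capture analysis:
    • With MTU=1487, hosta receives only one fragment, shown as: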
      1   2023-02-15 22:40:24.095129  10.50.107.83    10.50.107.129   IPv4    562 Fragmented IP protocol (proto=ICMP 1, off=1480, ID=1fb9)
      
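      This frame is 562 bytes long, but the first fragment never arrives;
    • With the MTU set back to 1500, both fragments arrive and the reply is sent normally. The capture shows: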
      2   2023-02-15 22:40:42.093639  10.50.107.83    10.50.107.129   IPv4    1514    Fragmented IP protocol (proto=ICMP 1, off=0, ID=2c62) [Reassembled in #3]
      3   2023-02-15 22:40:42.093639  10.50.107.83    10.50.107.129   ICMP    562 Echo (ping) request  id=0x1004, seq=1/256, ttl=64 (reply in 5)
      4   2023-02-15 22:40:42.093698  10.50.107.129   10.50.107.83    IPv4    1514    Fragmented IP protocol (proto=ICMP 1, off=0, ID=fe1a) [Reassembled in #5]
      5   2023-02-15 22:40:42.093717  10.50.107.129   10.50.107.83    ICMP    562 Echo (ping) reply    id=0x1004, seq=1/256, ttl=64 (request in 3)
      
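  3. Comparing fragment data, the fragment received at MTU=1487 does not match the fragment data from the normal case (the ping payload is a repeating 00-ff pattern), so I suspect that either the sender never sent the first fragment or the receiver never received it;
  4. Environment: the two servers are identically configured Ubuntu virtual machines. From my laptop, through a VPN/firewall, I can ping hosta at MTU=1487 without problems, which rules out a problem with hosta itself.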
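
The frame lengths in the captures above can be cross-checked with a little arithmetic. This sketch is not part of the original question. It assumes untagged Ethernet II frames (14-byte header) and an IPv4 header without options (20 bytes); `echo_payload_from_last_fragment` is a hypothetical helper name.

```python
ETH_HEADER = 14   # assumed: untagged Ethernet II frame
IP_HEADER = 20    # assumed: IPv4 header without options
ICMP_HEADER = 8


def echo_payload_from_last_fragment(frame_len, frag_offset):
    """Infer the echo payload size from the last fragment's frame length and offset."""
    ip_data = frame_len - ETH_HEADER - IP_HEADER  # ICMP bytes carried by this fragment
    icmp_total = frag_offset + ip_data            # length of the whole ICMP message
    return icmp_total - ICMP_HEADER


# The 562-byte frame at fragment offset 1480 from the capture.
print(echo_payload_from_last_fragment(562, 1480))
```

Under these assumptions the 1514-byte frames are full 1500-byte IP packets, and the 562-byte last fragment corresponds to an echo payload of 2000 bytes, so this capture may have been taken with a larger -s value than the 1500 used in the ping commands earlier in the post.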

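I really can't work out why lowering the MTU to 1487 or below triggers this loss when I haven't changed any fragmentation-related configuration. Has anyone run into something similar, or does anyone have troubleshooting suggestions? Thanks, everyone!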

Note: content sourced from Stack Exchange; question author: Srikanth
