Troubleshooting TailScale Network
source link: https://nyan.im/p/troubleshooting-tailscale-network-en
Chinese version: 一次TailScale网络问题的调试过程 (Debugging a TailScale Network Issue) – Frank’s Weblog
As mentioned in an earlier post, I use TailScale to create a mesh network connecting all of my devices, with a cloud server in AliCloud Beijing as an exit node, so that I can access geographically restricted network services.
However, I noticed that I could not access the Internet at all when using that exit node. I thought it was a network connectivity issue with the relays, so I didn’t worry too much about it. But afterward I noticed that some other services on that server were not functioning, so I looked into it and found that the problem was not so simple.
First I noticed that I couldn’t access the Internet at all from the server, yet curl against a raw IP address worked, which pointed to a problem with DNS resolution.
resolvectl status showed two DNS servers. Since the IPs started with 100.100, I assumed they were DNS servers for the TailScale internal network (actually not, as explained later):

Link 2 (eth0)
    Current DNS Server: 100.100.2.136
           DNS Servers: 100.100.2.136
                        100.100.2.138
I ran dig @100.100.2.136 baidu.com to check the response from the DNS server and got connection timed out; no servers could be reached. The command returned a normal response after shutting down TailScale, so TailScale was somehow affecting DNS resolution on the system.
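The diagnosis above can be reproduced with a short Python sketch. This is only an illustration: build_dns_query and probe_resolver are hypothetical helper names, not part of any tool mentioned here. It hand-builds a minimal DNS A query with the standard library and sends it over UDP with a timeout, mimicking dig @server.

```python
import socket
import struct

def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query packet (QTYPE 1 = A record)."""
    # Header: ID=0x1234, flags=0x0100 (recursion desired), QDCOUNT=1
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode()
                     for label in hostname.split("."))
    question = qname + b"\x00" + struct.pack(">HH", qtype, 1)  # QCLASS 1 = IN
    return header + question

def probe_resolver(server: str, hostname: str = "baidu.com",
                   timeout: float = 3.0) -> bool:
    """Return True if the resolver answers at all, False on timeout
    (the behavior observed with 100.100.2.136 while TailScale was up)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_dns_query(hostname), (server, 53))
        sock.recvfrom(512)
        return True
    except socket.timeout:
        return False
    finally:
        sock.close()

# Usage (requires network access):
# probe_resolver("100.100.2.136")  # timed out while TailScale was running
```

Unlike dig, this probe only reports whether any reply came back at all, which is enough to distinguish "resolver unreachable" from "bad answer".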
Changing the DNS configuration on the server works around this problem. Edit /etc/netplan/99-netcfg.yaml and add a public DNS server to the nameservers section:

network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: yes
      dhcp6: no
      nameservers:
        addresses: [22.214.171.124]
Run sudo netplan apply to apply the changes; dig baidu.com then returns the correct response.
However, while changing the DNS server lets the server reach the Internet, many services inside AliCloud still require internal DNS resolution, for example AliCloud’s internal apt mirror (mirrors.cloud.aliyuncs.com) and products such as cloud databases. For the apt mirror, pointing the apt sources at public mirrors is a workaround.
To locate the problem, we need to find out why the IP address 100.100.2.136 is unreachable. I thought these two DNS servers were IPs in TailScale’s internal network, but they were inaccessible by all means. After some searching, I found that 100.100.2.136 and 100.100.2.138 are actually internal DNS servers provided by AliCloud. Some other AliCloud internal services use similar IPs as well, for example the apt mirror at 100.100.2.148, which is also unreachable.
We can therefore draw a preliminary conclusion that TailScale somehow affected access to the 100.100.x.x IP range.
My first thought was that TailScale was routing the entire 100.100.x.x IP range. However, according to the TailScale documentation, TailScale only routes the assigned IP address, not the entire CIDR.
ip route list also confirms this.
ip route list table 52

100.69.x.x dev tailscale0
100.90.x.x dev tailscale0
100.96.x.x dev tailscale0
100.98.x.x dev tailscale0
100.100.100.100 dev tailscale0
100.104.x.x dev tailscale0
100.121.x.x dev tailscale0
100.127.x.x dev tailscale0
ip route get 100.100.2.136 returns the following result, showing that the packet will be routed through the eth0 interface. The routing table is therefore correct, and the problem does not lie with routing.
100.100.2.136 via 172.24.63.253 dev eth0 src 172.24.4.100 uid 0 cache
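The lookup above can be mimicked with Python’s standard ipaddress module. The peer addresses below are hypothetical stand-ins for the masked entries in table 52; the point is that TailScale installs one /32 host route per peer, so 100.100.2.136 sits inside the CGNAT block yet matches none of those routes, and the main table sends it out eth0, exactly as ip route get showed.

```python
import ipaddress

# Hypothetical stand-ins for the masked per-peer host routes in table 52.
table_52 = [ipaddress.ip_network(r) for r in
            ["100.100.100.100/32", "100.96.0.5/32", "100.127.3.9/32"]]

target = ipaddress.ip_address("100.100.2.136")

# Inside the CGNAT block, but not covered by any /32 peer route:
in_cgnat = target in ipaddress.ip_network("100.64.0.0/10")
via_tailscale = any(target in route for route in table_52)

print(in_cgnat, via_tailscale)  # True False
```

So the packet never enters tailscale0 at all; whatever is blocking it must act after the routing decision.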
Another thing that may interfere with packets in transit is iptables. iptables -S reveals the following entries related to TailScale:
-A ts-forward -i tailscale0 -j MARK --set-xmark 0x40000/0xffffffff
-A ts-forward -m mark --mark 0x40000 -j ACCEPT
-A ts-forward -s 100.64.0.0/10 -o tailscale0 -j DROP
-A ts-forward -o tailscale0 -j ACCEPT
-A ts-input -s 100.92.187.56/32 -i lo -j ACCEPT
-A ts-input -s 100.115.92.0/23 ! -i tailscale0 -j RETURN
-A ts-input -s 100.64.0.0/10 ! -i tailscale0 -j DROP
The last of these rules drops incoming packets from the entire 100.64.0.0/10 CIDR unless they arrive on tailscale0. The problem was solved after removing that rule with iptables -D (which takes the same rule specification as -A).
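As a sketch, the ts-input chain above can be simulated in Python to show why replies from AliCloud’s resolver never reach the client: they arrive on eth0 with a source address inside 100.64.0.0/10 and hit the final DROP rule. (ts_input_verdict is an illustrative helper, not part of TailScale.)

```python
import ipaddress

def ts_input_verdict(src: str, iface: str) -> str:
    """Walk the ts-input rules from the iptables -S listing above, in order."""
    addr = ipaddress.ip_address(src)
    # Presumably the node's own TailScale address, allowed on loopback:
    if addr == ipaddress.ip_address("100.92.187.56") and iface == "lo":
        return "ACCEPT"
    # Non-Tailscale traffic in this sub-range falls through to normal INPUT:
    if addr in ipaddress.ip_network("100.115.92.0/23") and iface != "tailscale0":
        return "RETURN"
    # Everything else claiming a CGNAT source on a non-Tailscale interface dies:
    if addr in ipaddress.ip_network("100.64.0.0/10") and iface != "tailscale0":
        return "DROP"
    return "RETURN"

# A DNS reply from AliCloud's resolver arrives on eth0 from a CGNAT address:
print(ts_input_verdict("100.100.2.136", "eth0"))  # DROP
# A reply from an ordinary public resolver is unaffected:
print(ts_input_verdict("8.8.8.8", "eth0"))        # RETURN
```

This matches the observed symptom: dig @100.100.2.136 sends its query out fine, but the reply is dropped on the way back in, so dig only ever sees a timeout.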
After some searching, I found related issues that had already been posted earlier this year:
 tailscale drops 100.64.0.0/10 on firewall when ipv4 is disabled · Issue #3837 · tailscale/tailscale · GitHub
 FR: netfilter CGNAT mode when non-Tailscale CGNAT addresses should be allowed · Issue #3104 · tailscale/tailscale · GitHub
To sum up, the problem was caused by a firewall rule set by TailScale that blocks non-Tailscale traffic in the 100.64.0.0/10 CIDR; some services on AliCloud’s internal network reside in this IP range and were therefore blocked. According to the TailScale CLI documentation, adding the
--netfilter-mode=off parameter when starting TailScale avoids setting this rule. However, since TailScale then stops managing firewall rules entirely, this poses some security risks.
TailScale sets this rule because the IP range it uses for the TailScale network (100.64.0.0/10) is reserved for Carrier-Grade NAT (CGNAT) and is assumed not to be used by private networks. AliCloud, however, uses this range for its internal services, which causes the conflict.
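The overlap is easy to check with Python’s ipaddress module: the CGNAT block spans 100.64.0.0 through 100.127.255.255 (about four million addresses), and every AliCloud internal service address mentioned in this post falls inside it.

```python
import ipaddress

cgnat = ipaddress.ip_network("100.64.0.0/10")
print(cgnat[0], cgnat[-1], cgnat.num_addresses)
# 100.64.0.0 100.127.255.255 4194304

# AliCloud's internal DNS servers and apt mirror all live in the CGNAT range:
for svc in ("100.100.2.136", "100.100.2.138", "100.100.2.148"):
    print(svc, ipaddress.ip_address(svc) in cgnat)  # True for each
```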