How to make iproute2 multiple uplinks work with masquerading

Question

I have the following problem with routing and NAT: I have two ISPs and a default route with two nexthops, with MASQUERADE on both ISP links. The routing cache gets regenerated, and sometimes packets sent over the new link (after the cache is regenerated) use the wrong source address for masquerading.

Here is the config.

I have two links to the outside via two different providers, eth1 and eth2; eth0 is the LAN.

$ ip a (part of output, since we have 3 more interfaces disabled)
2: eth1: mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.1.254/24 brd 192.168.1.255 scope global eth1
3: eth2: mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.2.254/24 brd 192.168.2.255 scope global eth2
6: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.5.1/24 brd 192.168.5.255 scope global eth0

Routing tables:

$ ip r
192.168.5.0/24 dev eth0 proto kernel scope link src 192.168.5.1
192.168.2.0/24 dev eth2 proto kernel scope link src 192.168.2.254
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.254
default nexthop via 192.168.1.1 dev eth1 weight 1
nexthop via 192.168.2.1 dev eth2 weight 1

$ ip r s t eth1
default via 192.168.1.1 dev eth1

$ ip r s t eth2
default via 192.168.2.1 dev eth2

Rules:

$ ip ru
0: from all lookup local
32450: from 192.168.2.254 lookup eth2
32717: from 192.168.5.124 lookup eth1
32766: from all lookup main
32767: from all lookup default

Q1: If I run ping 195.60.1.1 on two PCs in the LAN, 192.168.5.137 and 192.168.5.147, how can the pings to the same IP go out via different links?

$ ip r g 195.60.1.1 from 192.168.5.137 iif eth0
195.60.169.6 from 192.168.5.137 via 192.168.1.1 dev eth1 src 192.168.5.1
cache mtu 1500 advmss 1460 hoplimit 128 iif eth0

$ ip r g 195.60.1.1 from 192.168.5.147 iif eth0
195.60.169.6 from 192.168.5.147 via 192.168.2.1 dev eth2 src 192.168.5.1
cache mtu 1500 advmss 1460 hoplimit 128 iif eth0

The routing in my case should be the same for all users: it should always send packets to the same destination via the same link, even if the source IP is different. Shouldn't it?

Q2: Sometimes I see in tcpdump on the external interfaces that the routing cache was regenerated. This can be forced with ip r f t cache. It sometimes changes the link used for my pings, and then one of the two machines suddenly loses its connection. From tcpdump I found that this happens because routing has decided to use another link, but MASQUERADE was not updated accordingly:

$ tcpdump -i eth1
IP 192.168.2.254 > 195.60.1.1: ICMP echo request, id 10677, seq 242, length 64
IP 192.168.1.254 > 195.60.1.1: ICMP echo request, id 37387, seq 244, length 64
IP 195.60.1.1 > 192.168.1.254: ICMP echo reply, id 37387, seq 244, length 64

The second and third packets are request-reply from/to 5.137

The first packet is the request from 5.147. It carries the wrong source address for that interface because MASQUERADE was not updated after the routing cache purge, so no reply ever comes back.

Here is my MASQUERADE setting

$ iptables -L -t nat
Chain POSTROUTING (policy ACCEPT 752K packets, 48M bytes)
pkts bytes target prot opt in out source destination
2840K 256M MASQUERADE all -- any eth1 192.168.5.0/24 anywhere
2491K 229M MASQUERADE all -- any eth2 192.168.5.0/24 anywhere

I understand that I can use conntrack to mark packets, but that is a bit more complicated. I would prefer to use the destination IP as the routing key. What is wrong in this scenario? Why don't routing cache purges notify the NAT engine about changes in routing?

PoltoS · Accepted Answer · 2010-12-17T13:06:55

Ok, the answer was finally found using search engines.

This particular behaviour is a bug in the Linux kernel, known since at least 2005. Julian Anastasov has written a patch to work around it (see http://www.ssi.bg/~ja/#routes).
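For anyone who does not want to patch the kernel: the question mentions conntrack marks as the alternative. A rough sketch of that idea, not something from the patch page, pins every connection to the uplink it first left through, so later packets of the same connection bypass the multipath decision. The mark values (1, 2), the rule priorities (1000, 1001) and the helper names run and pin_connections are my own assumptions; the tables eth1 and eth2 are the ones shown in the question. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: pin each NATed connection to the uplink it first left through.
# Assumptions: marks 1/2 and priorities 1000/1001 are arbitrary choices;
# routing tables "eth1" and "eth2" already exist, as in the question.
# Prints the commands by default; set DRY_RUN=0 and run as root to apply them.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pin_connections() {
    lan_if=$1; shift
    mark=1
    # Copy the saved connection mark onto packets arriving from the LAN,
    # before the routing decision is made.
    run iptables -t mangle -A PREROUTING -i "$lan_if" -j CONNMARK --restore-mark
    for wan_if in "$@"; do
        # Remember which uplink a new connection first left through.
        run iptables -t mangle -A POSTROUTING -o "$wan_if" -m conntrack --ctstate NEW -j CONNMARK --set-mark "$mark"
        # Route packets carrying that mark through the uplink's own table.
        run ip rule add fwmark "$mark" lookup "$wan_if" prio $((999 + mark))
        mark=$((mark + 1))
    done
}

pin_connections eth0 eth1 eth2
```

New connections carry no mark yet, so they still fall through to the multipath default route in the main table; only established connections are pinned.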

Anyway, we found that the chosen scenario, load balancing combined with NAT, is not a good one: it can break authorization on some sites and makes Jabber and Skype flicker. Each route cache refresh can change the route used for a destination, and since we use NAT the external IP changes too, so Skype and other services see you as logged in from another computer.

A much better way to share multiple links across a big office is to split users by channel. We assigned a preferred channel to each computer in our network, and if that channel is down we choose any other channel for that computer. This strategy keeps each computer's external IP (after NAT) stable. Setting a preferred channel also lets us send critical employees via the faster channels, while employees dealing with big files use the low-cost, wide but slow channels. We use 4 channels, since every ISP in our region goes down at least twice a week for several hours.
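The per-computer assignment can be sketched with source-based ip rule entries, the same kind of rule as the "from 192.168.5.124 lookup eth1" line in the question. This is only an illustration, not our production script: the host-to-uplink mapping in HOSTS, the priorities starting at 2000, the function names and the ping-the-gateway health check are all assumptions. Pinging the gateway only proves the local link is alive, so a real check would probably probe something beyond the ISP. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: give each LAN host a preferred uplink, falling back to another
# uplink when the preferred one is down.
# Assumptions: HOSTS mapping, priorities from 2000 and the ping check are
# illustrative; tables are named after the interfaces, as in the question.

UPLINKS="eth1:192.168.1.1 eth2:192.168.2.1"     # interface:gateway pairs
HOSTS="192.168.5.137:eth1 192.168.5.147:eth2"   # hypothetical host:preferred-uplink pairs

run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Print the uplinks whose gateway answers one ping sent through that interface.
up_uplinks() {
    for pair in $UPLINKS; do
        ifname=${pair%%:*}; gw=${pair#*:}
        if ping -c 1 -W 2 -I "$ifname" "$gw" >/dev/null 2>&1; then
            printf '%s ' "$ifname"
        fi
    done
}

# Print the preferred uplink if it is in the list of live uplinks,
# otherwise the first live one; fail if no uplink is live.
pick_uplink() {
    preferred=$1; shift
    for ifname in "$@"; do
        [ "$ifname" = "$preferred" ] && { echo "$preferred"; return 0; }
    done
    [ $# -gt 0 ] && echo "$1"
}

# Replace each host's source rule so it points at the chosen uplink table.
# Arguments: the uplinks that are currently up.
apply_rules() {
    prio=1999
    for pair in $HOSTS; do
        prio=$((prio + 1))                      # one fixed priority per host
        host=${pair%%:*}; preferred=${pair#*:}
        uplink=$(pick_uplink "$preferred" "$@") || continue
        # Deleting fails harmlessly when the rule does not exist yet.
        run ip rule del from "$host" prio "$prio" 2>/dev/null
        run ip rule add from "$host" lookup "$uplink" prio "$prio"
    done
}
```

Something like apply_rules $(up_uplinks) could then be run periodically, for example from cron. Connections that were already NATed on a failed uplink still break on failover, but each computer gets a stable external IP for as long as its uplink stays up.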
