8
votes

When a main node fails, its IP (IPv6) floats to the standby node, which is supposed to provide service on that IP from then on.

Because both nodes coexist on the same LAN, the standby node often becomes unreachable. The interface is UP and RUNNING with the IPv6 address assigned, but all IP operations have stopped.

One possibility is that Duplicate Address Detection (DAD) kicks in when the IP is configured on the standby. The RFC (RFC 4862) says that IP operation on the interface should then be disabled.

My question is about the specifics of the Linux kernel's IPv6 implementation. From reading the kernel code, I initially assumed the sysctl variable "disable_ipv6" was being set. However, the kernel is not disabling IPv6; it just stops all IP operations on that interface.

Can anyone explain what the Linux kernel's IPv6 code does when it "disables IP operations" on DAD failure? Can this be reset to normal without bringing the interface down and up again? Any pointers into the code would be very helpful.

An OpenStack bug along similar lines: bugs.launchpad.net/nova/+bug/1011134 - user31986
Does dmesg show the "duplicate address detected" message? - ssnobody
Yes, it does: a message like the one in net/ipv6/addrconf.c that reports the DAD failure. - user31986
Does anyone know whether a fix for this bug has been proposed? bugs.launchpad.net/nova/+bug/1011134 - user31986

2 Answers

2
votes

This article explains the specification and what actually happens in the kernel's IPv6 implementation when a floating IP is configured. It also suggests a solution: http://criticalindirection.com/2015/06/30/ipv6_dad_floating_ips/

According to the article, for a "user-assigned link-local" address, the IPv6 address gets stuck in the tentative state, marked by IFA_F_TENTATIVE in the kernel. This state means DAD is in progress and the address has not yet been validated. For an "auto-assigned link-local" address, if DAD fails the kernel retries accept_dad times (with a new auto-generated address each time), and after that it disables IPv6 on that interface.
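One way to check which of these states an address is in is /proc/net/if_inet6, whose fifth column is the low 8 bits of the kernel's address flags, so IFA_F_TENTATIVE shows up as 0x40 and IFA_F_DADFAILED as 0x08 (values from <linux/if_addr.h>). The following is a minimal sketch that decodes that file, assuming the six-column layout of current kernels; the function name parse_if_inet6 is hypothetical and not part of any library.

```python
import ipaddress
import os

# Address flag bits from <linux/if_addr.h>; /proc/net/if_inet6 prints the
# low 8 bits of each address's flags as a two-digit hex column.
IFA_FLAGS = {
    0x01: "temporary",
    0x02: "nodad",
    0x04: "optimistic",
    0x08: "dadfailed",
    0x10: "homeaddress",
    0x20: "deprecated",
    0x40: "tentative",
    0x80: "permanent",
}


def parse_if_inet6(text):
    """Parse /proc/net/if_inet6 content into a list of dicts (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        raw_addr, _ifindex, plen, _scope, flags_hex, dev = fields
        flags = int(flags_hex, 16)
        entries.append({
            "dev": dev,
            # The address is printed as 32 hex digits without colons.
            "addr": str(ipaddress.IPv6Address(int(raw_addr, 16))),
            "prefixlen": int(plen, 16),
            "flags": [name for bit, name in IFA_FLAGS.items() if flags & bit],
        })
    return entries


if __name__ == "__main__" and os.path.exists("/proc/net/if_inet6"):
    with open("/proc/net/if_inet6") as f:
        for e in parse_if_inet6(f.read()):
            stuck = {"tentative", "dadfailed"} & set(e["flags"])
            note = "  <-- DAD pending or failed" if stuck else ""
            print(f'{e["dev"]:8} {e["addr"]}/{e["prefixlen"]} {",".join(e["flags"])}{note}')
```

An address listed with both "tentative" and "dadfailed" is the stuck case described above.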

The solution it suggests is to disable DAD before configuring the floating IP and to re-enable it once the address is out of the tentative state.
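A minimal Python sketch of that sequence, assuming it runs as root on Linux with iproute2's ip command installed: it sets the per-interface accept_dad knob to 0, adds the address, polls `ip -6 addr show` until the address no longer carries the "tentative" flag, and then restores the previous value. The helper names (is_tentative, add_floating_ip) and the 5-second timeout are illustrative assumptions, not taken from the article.

```python
import ipaddress
import subprocess
import time
from pathlib import Path

SYSCTL_ROOT = Path("/proc/sys/net/ipv6/conf")


def is_tentative(ip_output, addr):
    """True if `addr` is listed with the tentative flag in `ip -6 addr show`
    output, False if listed without it, None if not listed at all."""
    wanted = ipaddress.IPv6Address(addr)
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet6":
            if ipaddress.IPv6Address(fields[1].split("/")[0]) == wanted:
                return "tentative" in fields
    return None


def add_floating_ip(iface, cidr, timeout=5.0, root=SYSCTL_ROOT):
    """Add `cidr` to `iface` with DAD disabled, then restore accept_dad (hypothetical helper)."""
    knob = Path(root) / iface / "accept_dad"
    saved = knob.read_text().strip()
    knob.write_text("0\n")  # disable DAD on this interface only
    try:
        subprocess.run(["ip", "-6", "addr", "add", cidr, "dev", iface], check=True)
        addr = cidr.split("/")[0]
        deadline = time.monotonic() + timeout
        while True:
            out = subprocess.run(["ip", "-6", "addr", "show", "dev", iface],
                                 check=True, capture_output=True, text=True).stdout
            if is_tentative(out, addr) is False:
                break  # address left the tentative state; safe to re-enable DAD
            if time.monotonic() > deadline:
                raise TimeoutError(f"{cidr} is still tentative on {iface}")
            time.sleep(0.1)
    finally:
        knob.write_text(saved + "\n")  # restore the previous accept_dad value
```

On the standby node this would be called as, for example, add_floating_ip("eth0", "2001:db8::10/64"), where the interface name and address are placeholders. Restoring the old value in a finally block means DAD is re-enabled even if adding the address fails or times out.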

For more details, see the link above.

0
votes

This is related to a bug in Nova (OpenStack), bug #1011134.

The documentation for accept_dad says:

    accept_dad - INTEGER
        Whether to accept DAD (Duplicate Address Detection).
        0: Disable DAD
        1: Enable DAD (default)
        2: Enable DAD, and disable IPv6 operation if MAC-based duplicate
           link-local address has been found.

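So you can run sysctl -w net.ipv6.conf.default.accept_dad=0 to work around the bug by disabling DAD. Note that the "default" setting only applies to interfaces created after the change; for an interface that already exists, also set net.ipv6.conf.<interface>.accept_dad (for example net.ipv6.conf.eth0.accept_dad).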

Alternatively, you can fix the bug by applying the patch to nova/virt/libvirt/firewall.py proposed in that same bug report.

If it is not already present in the NWFilterFirewall class, add the following method:

def nova_no_nd_reflection_filter(self):
    """Return the nwfilter XML that protects against false positives
    in IPv6 Duplicate Address Detection (DAD).
    """
    uuid = self._get_filter_uuid('nova-no-nd-reflection')
    return '''<filter name='nova-no-nd-reflection' chain='ipv6'>
              <!-- no nd reflection -->
              <!-- drop if destination mac is v6 mcast mac addr and
                   we sent it. -->
              <uuid>%s</uuid>
              <rule action='drop' direction='in'>
                  <mac dstmacaddr='33:33:00:00:00:00'
                       dstmacmask='ff:ff:00:00:00:00' srcmacaddr='$MAC'/>
              </rule>
              </filter>''' % uuid

Then, register this filter in _ensure_static_filters() by adding:

self._define_filter(self.nova_no_nd_reflection_filter())

after filter_set is defined.
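To make the rule's match condition concrete, here is a small Python model of it. This is not libvirt or Nova code, and the function names are hypothetical. An incoming frame is dropped when its destination MAC, masked with ff:ff:00:00:00:00, equals 33:33:00:00:00:00 (an IPv6 multicast MAC, such as the solicited-node group a DAD Neighbor Solicitation is sent to) and its source MAC is the guest's own ($MAC). In other words, the guest's own probe reflected back to it is discarded, which is the kind of DAD false positive the docstring refers to.

```python
def mac_to_int(mac):
    """Convert a colon-separated MAC address to an integer."""
    return int(mac.replace(":", ""), 16)


V6_MCAST_MAC = mac_to_int("33:33:00:00:00:00")
V6_MCAST_MASK = mac_to_int("ff:ff:00:00:00:00")


def dropped_by_no_nd_reflection(src_mac, dst_mac, own_mac):
    """Model of the nova-no-nd-reflection rule for an incoming frame."""
    is_v6_mcast = (mac_to_int(dst_mac) & V6_MCAST_MASK) == V6_MCAST_MAC
    sent_by_us = mac_to_int(src_mac) == mac_to_int(own_mac)
    return is_v6_mcast and sent_by_us
```

Frames from other hosts to the same multicast group still pass, so genuine duplicates can still be detected.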