by Craig Miller
Previously I wrote about resurrecting the old, forgotten routing protocol RIPng. In a small network of more than one router, you need a routing protocol to share information between the routers. I used RIPng for about six months: I turned it on and pretty much forgot that it was running. It worked like a charm in my wired network.
I moved to a new (to me) house this summer and thought it was a good opportunity to try out a routing protocol that handles not only wired networks but also wireless ones. Babel seemed just the thing for this environment.
Enter Babel
Babel is a loop-avoiding distance-vector routing protocol that is robust and efficient both in ordinary wired networks and in wireless mesh networks. Based on the loss of hellos, the cost of wireless links can be increased, making sketchy wireless links less preferred.
RFC 6126 defines the routing protocol. There are two implementations supported on OpenWrt routers: babeld and bird.
Creating a network with redundant paths
Like anything in networking, it starts with the physical layer (wireless is a form of physical layer). I attached the wireless links of the backup link router to the production and test routers, creating a redundant path of connectivity within my house.
Running BIRD with Babel
I chose bird6 (the IPv6 version of bird on OpenWrt) because I already had it installed on the routers for RIPng. It was merely a matter of commenting out the RIP section in the /etc/bird6.conf file and enabling Babel.
The Bird documentation provides an example. Add the following to /etc/bird6.conf to get Babel running in bird6:
protocol babel {
    interface "wlan0", "wlan1" {
        type wireless;
        hello interval 1;
        rxcost 512;
    };
    interface "br-lan" {
        type wired;
    };
    import all;
    export all;
}
In the example above, wlan0 is the 2.4 GHz radio and wlan1 is the 5 GHz radio.
Checking the path of connectivity
When determining the connectivity path, traceroute6 (the IPv6 version of traceroute) is your friend. Checking between the laptop and the DNS server, the path is:
$ traceroute6 6dns
traceroute to 6lilikoi.hoomaha.net (2001:db8:ebbd:4118::1) from 2001:db8:ebbd:bac0:d999:cd8a:cd9b:2037, port 33434, from port 49819, 30 hops max, 60 bytes packets
1 2001:db8:ebbd:bac0::1 (2001:db8:ebbd:bac0::1) 4.561 ms 0.510 ms 0.487 ms
2 2001:db8:ebbd:4118::1 (2001:db8:ebbd:4118::1) 2.562 ms 2.193 ms 1.927 ms
$
The traceroute shows the path going clockwise through the 2.4 GHz wireless link.
Network Failure!
To test how well Babel can automatically route around failed links, I started a ping from the laptop to the DNS server, disabled the 2.4 GHz radio (blocking the link the pings were using), and waited...
$ ping6 6dns
PING 6dns(2001:db8:ebbd:4118::1) 56 data bytes
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=1 ttl=63 time=3.54 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=2 ttl=63 time=1.64 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=3 ttl=63 time=2.02 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=4 ttl=63 time=1.64 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=5 ttl=63 time=1.51 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=6 ttl=63 time=1.65 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=7 ttl=63 time=1.58 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=8 ttl=63 time=5.80 ms
From 2001:db8:ebbd:bac0::1 icmp_seq=33 Destination unreachable: No route
From 2001:db8:ebbd:bac0::1 icmp_seq=34 Destination unreachable: No route
...
From 2001:db8:ebbd:bac0::1 icmp_seq=48 Destination unreachable: No route
From 2001:db8:ebbd:bac0::1 icmp_seq=49 Destination unreachable: No route
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=101 ttl=61 time=2.12 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=102 ttl=61 time=3.42 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=103 ttl=61 time=3.16 ms
As you can see, the outage was about 93 seconds (101 - 8, since ping6 sends one request per second by default). Not a record time (OSPF would converge much faster), but the network did fix itself without human intervention.
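The outage arithmetic above (the jump in icmp_seq between the last reply before the failure and the first reply after it) is easy to script when repeating the test. Below is a minimal Python sketch, assuming the ping output has been saved as text and that ping sent one request every `interval_s` seconds; the name `longest_outage` is purely illustrative.

```python
import re

# A successful reply line looks like:
# "64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=8 ttl=63 time=5.80 ms"
# "Destination unreachable" lines contain no "bytes from", so they never match.
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+)")


def longest_outage(ping_output: str, interval_s: float = 1.0) -> float:
    """Return the largest gap between successful replies, in seconds.

    Assumes one echo request every `interval_s` seconds, so the jump in
    icmp_seq between two consecutive replies approximates elapsed time.
    """
    seqs = [int(m.group(1)) for m in REPLY_RE.finditer(ping_output)]
    if len(seqs) < 2:
        return 0.0
    gap = max(later - earlier for earlier, later in zip(seqs, seqs[1:]))
    return gap * interval_s


# A few lines taken from the ping6 output above.
sample = """\
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=7 ttl=63 time=1.58 ms
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=8 ttl=63 time=5.80 ms
From 2001:db8:ebbd:bac0::1 icmp_seq=33 Destination unreachable: No route
64 bytes from 2001:db8:ebbd:4118::1: icmp_seq=101 ttl=61 time=2.12 ms
"""

print(longest_outage(sample))  # 93.0
```

Measuring the gap between successful replies, rather than counting error lines, also covers the stretch where ping printed nothing at all (sequence numbers 50 to 100 above).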
Checking the connectivity path again with traceroute6:
$ traceroute6 6dns
traceroute to 6lilikoi.hoomaha.net (2001:db8:ebbd:4118::1) from 2001:db8:ebbd:bac0:d999:cd8a:cd9b:2037, port 33434, from port 47725, 30 hops max, 60 bytes packets
1 2001:db8:ebbd:bac0::1 (2001:db8:ebbd:bac0::1) 0.541 ms 0.445 ms 0.437 ms
2 2001:db8:ebbd:2080::1 (2001:db8:ebbd:2080::1) 1.705 ms 1.832 ms 1.817 ms
3 2001:db8:ebbd:2000::1 (2001:db8:ebbd:2000::1) 2.273 ms 1.891 ms 2.584 ms
4 2001:db8:ebbd:4118::1 (2001:db8:ebbd:4118::1) 2.348 ms 2.822 ms 2.289 ms
$
The path now travels counter-clockwise around the circle via the 5 GHz link: Babel is routing packets around the failure.
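Comparing two traceroutes by eye works for a handful of hops. For longer paths, a small Python sketch like the following can pull the hop addresses out of saved traceroute6 output and list the hops that appear only after the failure. It assumes the hop-line layout shown above (hop number, then address), and the helper names `hops` and `new_hops` are hypothetical.

```python
import re

# Hop lines look like " 2  2001:db8:ebbd:2080::1 (2001:db8:ebbd:2080::1)  1.705 ms ...":
# a hop number at the start of the line, then the responding address.
# The "traceroute to ..." header does not start with a number, so it is skipped.
HOP_RE = re.compile(r"^[ \t]*(\d+)[ \t]+(\S+)", re.MULTILINE)


def hops(traceroute_output: str) -> list[str]:
    """Return the hop addresses from traceroute6 output, in order."""
    return [m.group(2) for m in HOP_RE.finditer(traceroute_output)]


def new_hops(before: str, after: str) -> list[str]:
    """Hops that appear in the `after` path but not in the `before` path."""
    seen = set(hops(before))
    return [h for h in hops(after) if h not in seen]


before = """\
traceroute to 6lilikoi.hoomaha.net (2001:db8:ebbd:4118::1) from 2001:db8:ebbd:bac0:d999:cd8a:cd9b:2037, port 33434, from port 49819, 30 hops max, 60 bytes packets
 1  2001:db8:ebbd:bac0::1 (2001:db8:ebbd:bac0::1)  4.561 ms  0.510 ms  0.487 ms
 2  2001:db8:ebbd:4118::1 (2001:db8:ebbd:4118::1)  2.562 ms  2.193 ms  1.927 ms
"""
after = """\
 1  2001:db8:ebbd:bac0::1 (2001:db8:ebbd:bac0::1)  0.541 ms  0.445 ms  0.437 ms
 2  2001:db8:ebbd:2080::1 (2001:db8:ebbd:2080::1)  1.705 ms  1.832 ms  1.817 ms
 3  2001:db8:ebbd:2000::1 (2001:db8:ebbd:2000::1)  2.273 ms  1.891 ms  2.584 ms
 4  2001:db8:ebbd:4118::1 (2001:db8:ebbd:4118::1)  2.348 ms  2.822 ms  2.289 ms
"""

print(len(hops(before)), len(hops(after)))  # 2 4
print(new_hops(before, after))  # ['2001:db8:ebbd:2080::1', '2001:db8:ebbd:2000::1']
```

A hop that does not answer would show up as `*` in this sketch, since it simply takes the first field after the hop number.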
Wireless is great, except ...
As more and more things come online using wireless, there will be more interference and contention for bandwidth, especially in the 2.4 GHz band. Babel can route packets around wireless links made sketchy by interference in a crowded Wi-Fi environment.
Your Metric may vary
Because wireless is variable, Babel applies differing metrics to routes as the wireless signal changes. An unfortunate side effect is that the network is continuously converging (or changing). The route to a remote host that was used one minute may be invalid the next.
I noticed this when my previously very stable IPv6-only servers started disconnecting or, worse, became unreachable.
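To see why the metric keeps moving, here is a simplified Python sketch of the ETX-style cost that RFC 6126 (Appendix A.2.2) describes for wireless links: cost = 256 / (alpha * beta), where beta is the estimated probability of hearing the neighbour's hellos and alpha the probability that the neighbour hears ours. The 16-entry hello window and the nominal cost of 256 are assumptions for illustration; this is not BIRD's source, which scales its configured rxcost (512 in the config above) and may differ in detail.

```python
def etx_cost(hello_history: list[bool], txcost: int = 256) -> float:
    """Simplified ETX-style link cost (after RFC 6126, Appendix A.2.2).

    hello_history: one entry per expected hello from the neighbour,
        True if it arrived, False if it was lost (assumed 16-entry window).
    txcost: the neighbour's estimate of how well it hears us; in this
        model 256 means a perfect link.
    """
    if not any(hello_history):
        return float("inf")  # nothing heard: treat the link as unusable
    beta = sum(hello_history) / len(hello_history)  # P(we hear its hellos)
    alpha = min(1.0, 256 / txcost)                  # P(it hears our hellos)
    return 256 / (alpha * beta)


perfect = [True] * 16
two_lost = [True] * 14 + [False] * 2
four_lost = [True] * 12 + [False] * 4

for history in (perfect, two_lost, four_lost):
    print(round(etx_cost(history)))
# prints 256, then 293, then 341
```

In this model, losing just two of sixteen hellos raises the cost by roughly 14%, so on a noisy radio the advertised metrics drift constantly. When two paths have similar costs they can keep swapping places, which fits the churn described here.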
Route Flapping!
Looking at the OpenWrt syslog (using the logread command), I could see that the routes were continually changing:
Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:46:46 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:01 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:01 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:02 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:33 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:33 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:34 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:49 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:49 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:50 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:48:53 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:48:53 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:48:54 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
...
The problem with this route flapping is that it was propagated to the other routers, which were kept busy adding and removing routes, making parts of my network unreachable. Not a desired behaviour.
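The churn in the log can be quantified by counting the "default route change" messages per minute. A minimal Python sketch, assuming the logread output has been saved as text in the format shown above; `changes_per_minute` is a hypothetical helper, and it counts matching log lines rather than distinct route changes (note that the messages in the log above appear in pairs).

```python
import re
from collections import Counter

# Captures HH:MM from lines such as:
# "Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change"
CHANGE_RE = re.compile(r"(\d{2}:\d{2}):\d{2} \d{4} .*default route change")


def changes_per_minute(log_text: str) -> Counter:
    """Count 'default route change' log lines per HH:MM bucket."""
    return Counter(m.group(1) for m in CHANGE_RE.finditer(log_text))


sample = """\
Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:46:45 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:46:46 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:01 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:01 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:02 2018 daemon.info odhcpd[778]: Using a RA lifetime of 1800 seconds on br-lan
Tue Jul 24 14:47:33 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
Tue Jul 24 14:47:33 2018 daemon.info odhcpd[778]: Raising SIGUSR1 due to default route change
"""

print(sorted(changes_per_minute(sample).items()))
# [('14:46', 2), ('14:47', 4)]
```

Running the same count before and after a configuration change gives a simple way to confirm that the churn has actually stopped.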
Settling things down
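To rid my network of the route churn, I changed the Babel type of the wireless interfaces to wired, giving them a stable metric that is no longer tied to the variability of the wireless signal quality (signal to noise). I also lengthened the hello interval to 5 seconds and dropped the custom rxcost.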
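For comparison, RFC 6126 (Appendix A.2.1) describes a "k-out-of-j" cost for wired links: if at least k of the last j hellos arrived, the link gets a fixed nominal cost; otherwise it is treated as down. Here is a simplified Python sketch, assuming k = 2, j = 3 and a nominal cost of 96 (which BIRD documents as its default wired rxcost); BIRD's actual implementation may differ.

```python
INFINITY = 0xFFFF  # Babel's "infinite" (unreachable) metric


def k_out_of_j_cost(hello_history: list[bool], k: int = 2, j: int = 3,
                    nominal: int = 96) -> int:
    """Wired-style link cost (after RFC 6126, Appendix A.2.1).

    Looks only at the last `j` hellos (most recent last). If at least `k`
    of them arrived, the link costs `nominal`; otherwise it is down.
    """
    recent = hello_history[-j:]
    return nominal if sum(recent) >= k else INFINITY


print(k_out_of_j_cost([True, True, True]))    # 96
print(k_out_of_j_cost([True, False, True]))   # 96: one lost hello, same cost
print(k_out_of_j_cost([True, False, False]))  # 65535: link treated as down
```

In this model occasional hello loss leaves the metric unchanged, which is consistent with the routes no longer flapping. The trade-off is that a degraded but still working wireless link is no longer penalized; it is only dropped once it misses too many hellos.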
The /etc/bird6.conf now looks like:
protocol babel {
    interface "wlan0", "wlan1" {
        type wired;
        hello interval 5;
    };
    interface "br-lan" {
        type wired;
    };
    import all;
    export all;
}
After restarting bird6, the syslog shows a brief burst of activity, then the route churn stops and the network is stable.
My ssh connection was dropped as the network did an initial reconverge, and then I was able to log back in and examine the syslog.
Babel, still a work in progress
Babel is still being actively developed, and it takes a more modern approach to wireless links (something that was nearly non-existent when RIPng was being standardized back in 1997). Like RIPng, it is easy to set up on OpenWrt routers without having to understand the complexities of OSPF, and it provides redundancy in your network. That said, the wireless functionality as implemented by Bird (v1.6.3) is not quite there. Fortunately, Bird v2.0 is out, and I look forward to giving it a try when it comes to OpenWrt.
Postscript
Although the route churn had subsided, I re-measured Babel's convergence time, and it was quite long: 317 seconds, probably due to the hello interval being set to 5 seconds.
In the end, I reverted my house network to RIPng. Running the same convergence test yielded an outage of only 11 seconds, with no route churn.
Perhaps many of the Babel issues are specific to Bird's implementation, and there may be tweaks to reduce network convergence times. I'd happily give Babel another chance, but for now, I'll stick with good ol' RIPng.
** If you are running a firewall (the default on OpenWrt/LEDE), you will need to add a rule to accept IPv6 UDP port 6696, the port Babel uses.
Originally posted (with more detail) on www.makikiweb.com