Mesh Gateway Settings and Bridge Loop Avoidance
Connecting a SECN Network to a LAN
Authors: T Gillett, Elektra
http://www.open-mesh.org/projects/batman-adv/wiki/Doc-overview
http://www.open-mesh.org/projects/batman-adv/wiki/Gateways
http://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance
When setting up a SECN mesh network and interconnecting it with a LAN using mesh gateway devices to provide access to the Internet or LAN resources, care must be taken with how the two networks are configured.
This article covers two areas that need to be considered in the design: Mesh Gateway Settings and Bridge Loop Avoidance.
Mesh Gateway Settings
The batman-adv protocol, which provides the mesh network in SECN firmware, allows each node on the mesh to be set to one of three modes:
- Off
- Client
- Server
The intention of these settings is to allow batman-adv to work out the most appropriate way to route traffic around the mesh, based on the mode of each node and the quality of available links to each node.
In SECN firmware, the mesh Gateway mode is set to Off when a device is first flashed. In this mode, data traffic will travel around the mesh and to an attached LAN, but not necessarily in the most efficient manner.
If one or more nodes are attached to a LAN, the intention is that these nodes would normally be set to mesh gateway Server mode so that the batman-adv routing algorithm can use this information to assist in routing data packets.
Similarly, all other devices on the mesh would normally be set to mesh gateway Client mode.
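The Server/Client rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the node names are made up, and the use of the batctl command "batctl gw_mode <mode>" to apply the setting is an assumption, since SECN devices are normally configured through the firmware's own configuration screens.

```python
from typing import Dict, Iterable, List, Set

VALID_MODES = ("off", "client", "server")


def plan_gateway_modes(nodes: Iterable[str], lan_attached: Set[str]) -> Dict[str, str]:
    """Return 'server' for every node wired to the LAN and 'client' for the rest."""
    return {node: ("server" if node in lan_attached else "client") for node in nodes}


def gw_mode_command(mode: str) -> List[str]:
    """Build (but do not run) the batctl command line for a gateway mode.

    Assumption: batctl is installed on the node and is used to set the mode.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown gateway mode: {mode!r}")
    return ["batctl", "gw_mode", mode]


# Hypothetical three-node mesh in which only node-a is plugged into the LAN.
plan = plan_gateway_modes(["node-a", "node-b", "node-c"], {"node-a"})
for node, mode in plan.items():
    print(node, "->", " ".join(gw_mode_command(mode)))
```

The sketch only prints the command each node would need; nothing is executed.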
The batman-adv algorithm then assigns each Client node to the most appropriate gateway Server node. You can see this assignment in the "Available Gateways" box on the SECN Wireless Status screen: all Gateway devices are listed, and the assigned Gateway is marked by the " => " symbol on the left-hand side.
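If the gateway list needs to be read by a script rather than by eye, the " => " marker is easy to pick out. The following Python sketch assumes that each non-empty line describes one gateway and that the first field on the line identifies it; the sample text and its addresses are invented for illustration and are not real output from a SECN device.

```python
from typing import List, Optional, Tuple


def parse_gateways(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (all gateway identifiers, the one marked with '=>' or None)."""
    gateways: List[str] = []
    assigned: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        selected = line.startswith("=>")
        if selected:
            line = line[2:].strip()  # drop the marker, keep the rest
        fields = line.split()
        if not fields:
            continue  # blank line, or a marker with nothing after it
        gateways.append(fields[0])
        if selected:
            assigned = fields[0]
    return gateways, assigned


# Invented sample in an assumed layout: marker, then a gateway identifier.
SAMPLE = """\
   02:00:00:00:00:01
=> 02:00:00:00:00:02
   02:00:00:00:00:03
"""

all_gateways, current = parse_gateways(SAMPLE)
print("gateways:", all_gateways)
print("assigned:", current)
```

A node in Off mode would list gateways without any marker, in which case the second value is None.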
Note that devices set to mesh Gateway Off mode will list the available gateways but not show any assignment, and will generally be able to access the LAN resources, with one significant exception - DHCP.
The batman-adv gateway feature is built around DHCP requests, which it intercepts in Gateway nodes. As a result, if a client device such as a PC is attached to a mesh node and wants to get an IP address from a DHCP server on the LAN, this is only possible if the relevant nodes are set up as Client and Server respectively, or if both nodes are set to Off.
Note particularly that if all the mesh nodes attached to the LAN are set to mesh gateway Server mode, and a PC is attached to a mesh node that is set to mesh gateway mode of Off, then the PC will not be able to acquire a DHCP IP address from the LAN.
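The DHCP rule described above can be written down as a small truth table. The Python sketch below encodes only what this article states (Client with Server works, Off with Off works, everything else does not); the function name is made up, and this is not a model of batman-adv's internal behaviour.

```python
MODES = ("off", "client", "server")

# Mode pairs (node the PC is attached to, node attached to the LAN) for which
# this article says a DHCP lease from the LAN can be obtained.
WORKING_PAIRS = {("client", "server"), ("off", "off")}


def dhcp_from_lan_possible(mesh_node_mode: str, lan_node_mode: str) -> bool:
    """Return True if a PC on the mesh node can get a DHCP address from the LAN."""
    for mode in (mesh_node_mode, lan_node_mode):
        if mode not in MODES:
            raise ValueError(f"unknown gateway mode: {mode!r}")
    return (mesh_node_mode, lan_node_mode) in WORKING_PAIRS


# Print the full table of combinations.
for mesh_mode in MODES:
    for lan_mode in MODES:
        result = "DHCP ok" if dhcp_from_lan_possible(mesh_mode, lan_mode) else "no DHCP"
        print(f"PC node {mesh_mode:<6} / LAN node {lan_mode:<6}: {result}")
```

The case called out in the note above is the pair ("off", "server"), which the table reports as not working.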
Bridge Loop Avoidance
Bridge Loops are potentially created when multiple batX interfaces on different hosts are bridged into the same Ethernet segment (e.g. LAN).
This problem is not unique to batman-adv; it also exists in other protocols.
For a good description of the problem and how batman-adv handles it, see the Bridge-loop-avoidance reference above.
The batman-adv designers have developed a facility known as Bridge Loop Avoidance, which is available in batman-adv 2012.2 and later.
SECN-2 firmware is based on the stable release of batman-adv 2012.3, and Bridge Loop Avoidance is enabled by default.
This should permit complex mesh network topologies to work satisfactorily, including the provision of multiple gateway nodes connected to a given LAN.
This is a common requirement for avoiding a single point of failure in providing, for example, reliable Internet access from the mesh via a LAN and its router.
In SECN-1.1 firmware, the mesh routing is based on batman-adv 2011.2.0, and so the Bridge Loop Avoidance facility is not available.
If no other steps were taken, connecting two SECN-1.1 nodes to a LAN would result in a Bridge Loop condition, rendering the LAN and mesh network unusable as they are flooded with looping data traffic.
In order to prevent this problem, SECN-1.1 firmware has Spanning Tree Protocol (STP) enabled by default on the br-lan bridge network interface.
STP is designed to prevent similar looping issues in conventional networks.
However, the operation of STP can interfere with the operation of the batman-adv mesh routing algorithm, resulting in less than optimum routing and potentially some weird behaviour. For an explanation of this effect, see the section below, written by Elektra.
So SECN-1.1 firmware is a compromise. It has STP enabled for safety to prevent accidental loop conditions, but at the potential cost of interaction with batman-adv.
With SECN-1.1 devices, it is best to avoid using multiple gateway nodes connected to the same LAN, but if you do use them, the presence of STP will prevent a loop condition from arising and disturbing the LAN and mesh operation.
If you have a single SECN-1.1 gateway node connected to a LAN, then you can disable STP to prevent any possible interaction with batman-adv, but you run the risk of creating a loop condition should someone unwittingly connect a second node to the LAN.
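To see which loop protection a given node actually has active, the kernel's own state can be read directly. The Python sketch below assumes a Linux node where the bridge exposes its STP state at /sys/class/net/br-lan/bridge/stp_state and where batman-adv exposes /sys/class/net/bat0/mesh/bridge_loop_avoidance; the second path and its "enabled" value are assumptions based on batman-adv releases of this era, and the file will be absent on firmware without Bridge Loop Avoidance.

```python
from pathlib import Path
from typing import Dict, Optional


def _read(path: Path) -> Optional[str]:
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def loop_protection_status(
    bridge: str = "br-lan",
    mesh_iface: str = "bat0",
    sysfs_root: str = "/sys/class/net",
) -> Dict[str, Optional[bool]]:
    """Report STP and Bridge Loop Avoidance state; None means 'not available'."""
    root = Path(sysfs_root)
    stp = _read(root / bridge / "bridge" / "stp_state")
    bla = _read(root / mesh_iface / "mesh" / "bridge_loop_avoidance")
    return {
        # stp_state is "0" when STP is off; any other value means it is on.
        "stp": None if stp is None else stp != "0",
        # Assumed values: "enabled" (or "1") when the facility is switched on.
        "bridge_loop_avoidance": None if bla is None else bla in ("enabled", "1"),
    }


print(loop_protection_status())
```

On a machine without these interfaces both values are simply reported as None.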
Mixing STP and batman-adv
The idea is to mix two routing protocols, both managing connectivity in the same network on the same layer. Let's have a brief look at the two:
STP has a simple logic: If there are redundant connections between ports, break all redundant connections except one, to build a tree.
Conversely, the logic of Batman is: If there are redundant connections, we are happy because our network is resilient and we select the best one, to build a well-performing mesh. Managing multiple links between LANs or nodes and selecting the best is its business.
Since in all SECN devices all interfaces except the ad-hoc transport interface are part of a bridge, batman-adv will transport the STP packets to all nodes, simply because STP will enter the mesh via bat0. Hence, STP and batman-adv will both make routing decisions.
STP will try to set up a spanning tree in complete ignorance of the actual topology. It also assumes that all links are equal, since it is unaware of packet loss.
STP believes that all bat0 interfaces are simply part of the same physical bridge. To STP the wireless part of the mesh looks like a single physical switch with N ports (N = number of nodes that have a bat0 interface).
The mesh is a black switch box to STP. This is not a problem, as long as STP does not detect a redundant connection...
Let's assume a simple topology where we have 2 LANs, each having 2 SECN devices interconnected locally via Ethernet and the mesh. In this case we are taking chances.
STP will detect that there are four ports of the "big mesh switch" interconnecting the two LANs, i.e. in each LAN there are two bat0 interfaces providing a link to the mesh. STP's job is to break redundant connections, in order to break loops.
Hence STP will detect two possible loops while there are actually none, since Batman-adv is selecting the best path. STP is not aware that another MAC routing protocol is already doing its job. Hence, STP detects a call for action and blocks one connection between a SECN device and the LAN.
If we have bad luck, STP will decide to close the wrong wireless link between the LANs. For example, Batman-adv selects SECN device #1 as the connection to the LAN, while STP has decided to block it, and only allows connection to the LAN via SECN device #2.
=> So, whether the link between two LANs via the mesh works or breaks is a matter of luck. Wireless packet loss might mitigate the problem, but there is no guarantee. We can have a perfectly working mesh, but no working connection between two LANs.
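The count of "two possible loops" in the two-LAN example can be checked with a little graph arithmetic. The Python sketch below treats each SECN device as a bridge and the mesh and each LAN as a shared segment, then counts how many bridge ports close a loop. It says nothing about which ports STP would block (real STP elects a root bridge and compares path costs); the device and segment names are made up.

```python
from typing import Dict, Hashable, List, Tuple


def redundant_ports(attachments: List[Tuple[str, str]]) -> int:
    """Count (bridge, segment) attachments that close a loop.

    Uses union-find: an attachment is redundant when the bridge and the
    segment are already connected through earlier attachments.
    """
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    redundant = 0
    for bridge, segment in attachments:
        a, b = find(("bridge", bridge)), find(("segment", segment))
        if a == b:
            redundant += 1
        else:
            parent[a] = b
    return redundant


# Two LANs, each with two SECN devices; every device also has a bat0 port
# on the mesh, which STP sees as one big switch.
two_lans = [
    ("dev1", "mesh"), ("dev1", "lan-a"),
    ("dev2", "mesh"), ("dev2", "lan-a"),
    ("dev3", "mesh"), ("dev3", "lan-b"),
    ("dev4", "mesh"), ("dev4", "lan-b"),
]
print(redundant_ports(two_lans))  # 2

# A single LAN with two SECN devices: one redundant path (mesh versus wire).
one_lan = [("dev1", "mesh"), ("dev1", "lan-a"), ("dev2", "mesh"), ("dev2", "lan-a")]
print(redundant_ports(one_lan))  # 1
```

With only one SECN device per LAN the count drops to zero, which is the case where STP has nothing to block.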
STP can also decide to route traffic in the local wired LAN via bat0 and break the link via the wire in order to break the loop, since it is not aware of the properties of the links.
Both SECN devices in the same LAN can either communicate via bat0 (mesh) or eth0. A redundant path to STP! Hence STP will have to break one!
Now it is again a matter of luck whether SECN devices connected to the same LAN will communicate via LAN or over the air!
To top it all off, SECN device #1 and #2 may actually need a multihop path in the mesh to be able to connect to each other via WiFi.
While this doesn't cause any immediately obvious problems to the user if STP decides to cut the LAN connection (things seem to work, albeit terribly slowly...), we are wasting our precious wireless bandwidth.
The batman-adv developers have learned this lesson and now drop STP packets that would otherwise be flooded over the mesh (patches from April 2012). Hence bat0 no longer transports STP messages, and STP's effect is limited to the local LAN segment.
However, routing paths in a LAN with batman-adv and STP combined can still be weird. If STP decides to break the path between the bat0 interfaces of SECN devices in the same LAN, it is fine.
Because wireless links have packet loss, STP may be more likely to choose to break the mesh link. However, there is no guarantee.
So, not enabling STP will immediately cause bridge loops in any LAN segment that is connected to two or more SECN nodes with batman-adv.
If the SECN networks that people build have no SECN devices interconnected via LAN, STP might be OK, at the risk of routing the LAN traffic via the mesh.