Introduction
Twice a year, an event is held in the middle of Sweden. It is a large festival where visitors bring their computers and play games with each other, known in our industry as a LAN-party. This write-up is by two members of the team that builds the network and infrastructure for the event. It goes in depth into one issue that has come up while building the world's largest LAN-party: the struggles it caused and how we solved it.
To start with, we want to emphasize how happy we are to work with the latest hardware from Juniper when building this network. This post is not about the performance of the hardware; it is a write-up of what we have done and how we planned the network. We are aware that some of the technology is at a very early stage and may have bugs, but that is also what makes it fun for us.
This write-up covers one recurring problem we have run into over the last few years of running Juniper MX series routers. We are writing it because we quickly noticed how little information there is about MC-AE (multichassis aggregated Ethernet) problems when you search for it, and we hope it helps other network engineers who face problems similar to ours. Before this we used Cisco ASR routers as our core. When the chance to work with Juniper came up we were very happy, and it has given the team the opportunity to use completely new hardware and software. Today the environment consists of equipment from several different vendors.
We also want to make clear that the way we have built the network is not the only way, and we do not claim that it is the right way either.
POP: https://i.imgur.com/UVDrkki.jpg (Current event DreamHack Summer 2019)
MX10003, QFX5110 & SRX4200: https://i.imgur.com/1IOSCxC.jpg (Current event DreamHack Summer 2019)
Setup
Over the past years we have used a few different router models, but the basic setup has stayed the same.
We have two MX series routers as the core, with separate interconnections between them: one at L2 for switching and one at L3 for routing. The distribution switches for services and participants connect to these routers, and the access switches connect to the distribution switches.
https://i.imgur.com/LMks9FF.jpg
The routers terminate our internet connection and run BGP with BFD towards the ISP. They are standalone (no virtual chassis) but run MC-AE with ICCP (Inter-Chassis Control Protocol) for state synchronization. Since we have several VRFs, we run MP-BGP with MPLS L3VPN to distribute prefixes between the routers, which handle all layer 3 routing in the network.
Each distribution switch has one link to each core router, bundled in a LACP AE. Since some VLAN IDs are reused across distribution switches, the routers have one virtual switch per distribution switch, and the matching vswitches on the two routers communicate over the interlink using one VLAN per vswitch in a QinQ setup.
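One way to picture the QinQ interlink is as a mapping from each virtual switch to its own outer (S-VLAN) tag, with the distribution switch's VLAN ID carried unchanged as the inner tag. The sketch below assumes exactly that reading; the vswitch names and the outer tag values (2001, 2002) are hypothetical and not taken from our configuration.

```python
from dataclasses import dataclass

# Hypothetical model of the vswitch-to-vswitch interlink: each virtual switch
# (one per distribution switch) gets its own outer S-VLAN on the interlink,
# and the client VLAN ID travels unchanged as the inner C-VLAN tag.
# Names and tag values are illustrative assumptions.
OUTER_VLAN = {"vs-dist-a": 2001, "vs-dist-b": 2002}


@dataclass(frozen=True)
class InterlinkFrame:
    s_vlan: int  # outer tag: identifies the virtual switch
    c_vlan: int  # inner tag: VLAN ID used on the distribution switch


def encapsulate(vswitch: str, vlan_id: int) -> InterlinkFrame:
    """Double-tag a frame from `vswitch` for transport over the interlink."""
    return InterlinkFrame(OUTER_VLAN[vswitch], vlan_id)


def decapsulate(frame: InterlinkFrame) -> tuple[str, int]:
    """Map a received frame back to (virtual switch, VLAN ID) on the other router."""
    by_outer = {tag: name for name, tag in OUTER_VLAN.items()}
    return by_outer[frame.s_vlan], frame.c_vlan


if __name__ == "__main__":
    # VLAN 104 is reused on both distribution switches but stays separated.
    a = encapsulate("vs-dist-a", 104)
    b = encapsulate("vs-dist-b", 104)
    print(a, "->", decapsulate(a))
    print(b, "->", decapsulate(b))
```

The outer tag is what keeps a reused inner VLAN ID from leaking between virtual switches on the shared interlink.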
The picture below tries to visualize how all these logical parts connect to each other.
https://i.imgur.com/rVvCsS5.png
Problem
The problem we have identified every time is that clients have trouble getting IPv4 addresses from DHCP. The issue varies between networks (VLANs): clients in some of them do not get a DHCP lease at all, while clients in others get an address immediately.
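The problem seems to arise when both unicast DHCP OFFERs happen to be load-shared to the “wrong” router by the service distribution switch. Both routers relay the client's request, so the DHCP server sends one OFFER to each router. When an OFFER with R1’s gateway address as destination is sent to R2, R2 routes it out via the IRB onto the local subnet and switches it over the vswitch interlink to R1, which drops it. If this happens to both OFFERs, the client receives none. Since the load sharing is hash-based, the outcome is consistent for a given network, which would explain why some VLANs are affected and others are not.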
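The failure pattern can be sketched as a small simulation. It assumes the service distribution switch picks an AE member link by hashing the source and destination IP; the CRC32-based hash, the DHCP server address and the per-VLAN relay addresses are all hypothetical, and the real hash on the switch is different. The point is only that the result is deterministic per VLAN: both OFFERs lost, one delivered, or both delivered.

```python
import ipaddress
import zlib

# Hypothetical model: one AE member link towards each core router, chosen by a
# hash of source and destination IP. CRC32 is an illustrative assumption, not
# the switch's real load-sharing algorithm.
LINKS = ("R1", "R2")
DHCP_SERVER = "10.0.0.10"  # hypothetical DHCP server address


def pick_link(src: str, dst: str) -> str:
    """Return the router an OFFER from src to dst is load-shared to."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return LINKS[zlib.crc32(key) % len(LINKS)]


def offers_delivered(relay_addrs: dict[str, str]) -> int:
    """Count OFFERs that arrive at the router owning their destination address.

    relay_addrs maps router name -> that router's own address in the VLAN.
    An OFFER that lands on the other router is routed out the IRB, switched
    over the interlink and dropped, so it is not counted.
    """
    return sum(1 for owner, addr in relay_addrs.items()
               if pick_link(DHCP_SERVER, addr) == owner)


def client_gets_address(relay_addrs: dict[str, str]) -> bool:
    # Assumption: one surviving OFFER is enough for the exchange to continue.
    return offers_delivered(relay_addrs) > 0


if __name__ == "__main__":
    # Hypothetical VLANs: R1 uses .2 and R2 uses .3 in each /24.
    for vlan in range(100, 108):
        relays = {"R1": f"10.{vlan}.0.2", "R2": f"10.{vlan}.0.3"}
        status = "OK" if client_gets_address(relays) else "NO ADDRESS"
        print(f"VLAN {vlan}: {offers_delivered(relays)}/2 OFFERs delivered -> {status}")
```

Running it repeatedly gives the same result for each VLAN, which mirrors how the same networks kept failing while others worked.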
Solutions
Active/Backup LACP from DHCP Server
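By putting the AE facing the DHCP server in LACP active/backup mode, we force both replies towards one of the routers: the OFFER addressed to that router always succeeds and the one addressed to the other router always fails, so the client always receives one OFFER. The drawback is that half of the link capacity is lost, since only one link carries traffic.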
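A minimal sketch of the trade-off, under the same simplified model as the problem description: one OFFER is addressed to each router, and an OFFER is only accepted if it arrives at the router that owns its destination. The 2 x 10G AE used for the capacity figures is a hypothetical example, not our actual link setup.

```python
# Hypothetical model of the AE towards the core routers in two modes.
# One OFFER is addressed to each router; it is only accepted if it arrives
# on the link to the router that owns its destination address.
ROUTERS = ("R1", "R2")


def offers_accepted(link_for_offer: dict[str, str]) -> int:
    """link_for_offer maps destination router -> router the OFFER was sent to."""
    return sum(1 for owner, sent_to in link_for_offer.items() if owner == sent_to)


def active_backup(active: str) -> dict[str, str]:
    """In active/backup mode every OFFER leaves on the link to the active router."""
    return {owner: active for owner in ROUTERS}


def usable_capacity_gbps(link_gbps: int, links: int, is_active_backup: bool) -> int:
    """Aggregate AE capacity; in active/backup mode the backup link is idle."""
    return link_gbps if is_active_backup else link_gbps * links


if __name__ == "__main__":
    # Active/backup: exactly one OFFER is accepted whichever link is active.
    for active in ROUTERS:
        print(f"active link to {active}: {offers_accepted(active_backup(active))}/2 accepted")
    # Load sharing, worst case: both OFFERs hashed to the wrong router.
    print("load-shared worst case:", offers_accepted({"R1": "R2", "R2": "R1"}), "/2 accepted")
    # Hypothetical 2 x 10G AE.
    print("capacity load-shared:", usable_capacity_gbps(10, 2, False), "Gbps")
    print("capacity active/backup:", usable_capacity_gbps(10, 2, True), "Gbps")
```

Active/backup trades the hash-dependent worst case (zero OFFERs) for a guaranteed single OFFER, at the cost of the idle backup link.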
Announce local /32
By default, each address configured on the routers has a /32 entry in the routing table from the protocol Local, marked NoReadvrt so that it is not advertised to any peers. By setting “set routing-instances VPN_DH routing-options interface-routes family inet export lan” we remove this marking, and the /32 prefixes are distributed to the other router. When a DHCP reply arrives at the “wrong” router, its routing lookup now matches this more specific /32 instead of the /25 Direct route, so the reply is routed over MPLS to the correct router instead of being routed to the local subnet and sent over the switching link.
Here’s a part of the routing table after the fix:
77.80.129.128/25   *[Direct/0] 3d 01:27:59
                    > via irb.104
                    [BGP/170] 2d 23:02:30, localpref 100, from 10.255.0.2
                      AS path: I, validation-state: unverified
                    > to 10.255.1.2 via ae1.2, Push 16
77.80.129.129/32   *[Local/0] 3d 01:27:56
                      Local via irb.104
                    [BGP/170] 2d 23:02:30, localpref 100, from 10.255.0.2
                      AS path: I, validation-state: unverified
                    > to 10.255.1.2 via ae1.2, Push 16
77.80.129.130/32   *[Local/0] 3d 01:27:59
                      Local via irb.104
77.80.129.131/32   *[BGP/170] 2d 23:02:30, localpref 100, from 10.255.0.2   <<<<<< This is the other router's local /32, advertised after the fix.
                      AS path: I, validation-state: unverified
                    > to 10.255.1.2 via ae1.2, Push 16
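The effect of the fix is plain longest-prefix matching. The sketch below is a simplified model of the lookup for a reply addressed to 77.80.129.131 on the router whose table is shown above; the prefixes and next hops come from that table, while protocol preference and the alternative BGP paths are deliberately left out.

```python
import ipaddress

# Simplified model of the lookup on the router that received a DHCP reply
# addressed to the other router's address (77.80.129.131). Prefixes and next
# hops are from the routing table above; protocol preference is ignored.
BEFORE_FIX = {
    "77.80.129.128/25": "Direct via irb.104 (switched over the interlink)",
    "77.80.129.130/32": "Local",
}
AFTER_FIX = dict(BEFORE_FIX)
AFTER_FIX["77.80.129.131/32"] = "BGP to 10.255.1.2 via ae1.2, Push 16 (MPLS)"


def lookup(table: dict[str, str], dst: str) -> tuple[str, str]:
    """Return (prefix, next hop) of the longest prefix in table that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in table if addr in ipaddress.ip_network(p)]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), table[str(best)]


if __name__ == "__main__":
    for name, table in (("before", BEFORE_FIX), ("after", AFTER_FIX)):
        prefix, nexthop = lookup(table, "77.80.129.131")
        print(f"{name} fix: {prefix} -> {nexthop}")
```

Before the fix only the /25 matches, so the reply goes out irb.104; after the fix the /32 learned over BGP is more specific and wins, sending the reply over MPLS.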
MPC5E line cards
We have only experienced this problem on MX960 with MPC4E (MPC4E 3D 32XGE) line cards and on MX10003 (LC2103 with MIC1-MACSEC), not on MX960 with MPC5E (MPC5E 3D 24XGE+6XLGE).
We tried multiple Junos releases when troubleshooting the MPC4E, including 15.1R6-S3 and 17.3R1.10.
The end
This article was written by
Oscar Ekeroth
@zmegolaz
Markus Viitamäki
@suom1