Enabling Anycast DNS with Esgob

May 15th, 2014 by

Nat Morris, a director of the UK Network Operators’ Forum, recently gave a presentation to the DNS Operations, Analysis and Research Center (DNS-OARC), which included this remarkably nice slide:

Screen Shot 2014-05-12 at 12.25.22

What is Anycast?

Normally a server has a globally unique IP address, and the Internet knows how to send traffic from any other machine in the world to that IP address. With Anycast we share a single address across multiple machines, and your traffic is sent to the nearest machine with that address. This means that UK customers can be answered from a server in the UK and Australian customers from a server in Australia. Responses to things like DNS queries are very fast because you’re always served by a server that’s close by, rather than your query having to travel halfway around the world.

To set up an Anycast network, you need your own address space, your own Autonomous System Number (ASN), multiple BGP-aware routers that can announce your address space, and multiple servers that can answer the queries. Typically this would require a pretty hefty budget. But if you’re Nat Morris, you know what you’re doing with software routing on Linux, and you know all the right providers, then you can bring up a global Anycast network with 10+ servers and sites on an annual budget of well under $1,000.

The key to doing this is finding ISPs, ideally well-connected ISPs in key internet hubs, who will provide a BGP feed to your hosted server. That’s where a clueful UK hosting company comes into the picture, with excellent connectivity, inexpensive virtual machines (VMs) and a willingness to support customers with more unusual configurations.

Quick introduction to BGP and routing

Normally when you have a VM you get a default route, which looks like this:

# ip route 
...
default via 93.93.128.1 dev eth1 

which says that to get to anywhere on the internet, send packets to our router at 93.93.128.1.

Over BGP, we instead send you the whole routing table:

# ip route 
...
1.0.7.0/24 via 5.57.80.128 dev eth3.4  proto zebra  metric 1 
1.0.20.0/23 via 93.93.133.46 dev eth6.220  proto zebra  metric 142080 
... 
500,000 more lines like this

For every block on the whole internet you have a different gateway depending on what you’ve decided is the preferred route. At today’s count this is about 490,000 entries in the routing table. Don’t type ‘route’ if you’re logged in over 3G!
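To make the lookup concrete, here is a minimal Python sketch of how a router picks a gateway: the most specific prefix that covers the destination wins. The three routes are copied from the listings above; putting the default route in the same table as the BGP routes is our simplification for illustration, and a real kernel uses a far more efficient structure than this linear scan.

```python
import ipaddress

# Routes copied from the listings above. A full table has about 490,000 entries.
# Mixing the default route with BGP routes in one table is an illustrative assumption.
ROUTES = [
    ("0.0.0.0/0", "93.93.128.1", "eth1"),
    ("1.0.7.0/24", "5.57.80.128", "eth3.4"),
    ("1.0.20.0/23", "93.93.133.46", "eth6.220"),
]


def lookup(destination, routes=ROUTES):
    """Return (gateway, device) for the longest prefix that contains destination."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, gateway, device in routes:
        network = ipaddress.ip_network(prefix)
        if address not in network:
            continue
        # Keep the candidate with the longest (most specific) prefix.
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, gateway, device)
    if best is None:
        raise LookupError(f"no route to {destination}")
    return best[1], best[2]


for target in ("1.0.7.42", "1.0.21.9", "8.8.8.8"):
    print(target, "->", lookup(target))
```

Only the last address falls through to the default route; the other two match a more specific prefix and leave via a different gateway.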

So for this VM, instead of having a default route, Nat has four full BGP sessions: two to each of our two routers at the site. On each router, one session provides 490,000 IPv4 routes and the other provides 18,000 IPv6 routes, and the VM gets to decide which router to send data to.

The other side of the BGP relationship, and the important bit for Anycasting, is that we receive an advert from Nat’s VM for his /24 of IPv4 space and /48 of IPv6 space, which we then advertise out to the world. The 10+ other providers in this Anycast setup will do the same, and hosts will direct traffic to whichever is nearest.
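A toy Python model of what “nearest” means here: each remote network hears the same prefix from several sites and prefers the announcement with the shortest AS path. The site names and AS numbers below are invented for illustration (the numbers come from the range reserved for documentation), and real BGP best-path selection weighs other attributes, such as local preference, before path length.

```python
# AS paths by which one hypothetical client network hears the same anycast prefix.
# Site names and AS numbers are made up; 64496 stands in for the anycast operator.
PATHS_SEEN_BY_CLIENT = {
    "london": [64500, 64496],
    "new-york": [64504, 64505, 64496],
    "sydney": [64501, 64502, 64503, 64496],
}


def nearest_site(paths):
    """Pick the site whose announcement has the shortest AS path (ties by name)."""
    return min(paths, key=lambda site: (len(paths[site]), site))


print(nearest_site(PATHS_SEEN_BY_CLIENT))

# If the London site withdraws its announcement, traffic moves to the next-best site.
without_london = {k: v for k, v in PATHS_SEEN_BY_CLIENT.items() if k != "london"}
print(nearest_site(without_london))
```

The second call shows why Anycast also gives you failover for free: when one site stops announcing, the same address is simply served from somewhere else.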

Filtering

As Paul Vixie pointed out in the first question to Nat, the main customers of VMs with BGP are spammers who hijack address space for nefarious purposes. At Mythic Beasts we filter our announcements and our customer routes, so if Nat messes up his configuration and accidentally announces that his VM is responsible for the whole of YouTube, we’ll drop the announcement rather than expecting one very small VM to handle one fifth of the internet.
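The idea behind such a filter can be sketched in a few lines of Python. This is not our router configuration: the allowed prefixes are documentation addresses standing in for a customer’s registered space, and accepting any more-specific prefix inside the allocation is a simplification (real filters usually also limit prefix length).

```python
import ipaddress

# Hypothetical customer allocation, using prefixes reserved for documentation.
ALLOWED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/48"),
]


def accept(announcement, allowed=ALLOWED):
    """Accept an announced prefix only if it lies inside a registered prefix."""
    prefix = ipaddress.ip_network(announcement)
    # Compare versions first: subnet_of() raises TypeError on mixed IPv4/IPv6.
    return any(
        prefix.version == net.version and prefix.subnet_of(net) for net in allowed
    )


for candidate in ("192.0.2.0/24", "2001:db8::/48", "198.51.100.0/24", "0.0.0.0/0"):
    print(candidate, "accepted" if accept(candidate) else "dropped")
```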

BGP on a virtual or dedicated server

If you’re a DNS provider or a content delivery network, you’ll probably want to have an Anycast setup at some point. At Mythic Beasts we remember what it was like to be the little guy, which is why we offer full BGP routing (including IPv6 BGP) as an option on any virtual server, dedicated server, colocated server or router. Provided you have your own ASN and IP space, we can transit it for you, keep the start-up costs very low and scale with you. You can locate your VM or server directly with us in Telecity, mere tens of metres from LINX and LoNAP, for minimal latency and maximal available bandwidth.

If you’ve no idea what an ASN, BGP, an LIR or RIPE are, we can help arrange your ASN, IP space and BGP config.

Router fails, no packets dropped.

January 29th, 2014 by

This morning one of our routers in our Cambridge data centre stopped reporting bandwidth data to our billing system. We investigated and, whilst it was still routing packets without issue, it appeared to be experiencing a hardware failure.

We’ve powered the router down, pending a full investigation during our data centre visit this afternoon. All traffic from our Cambridge site is currently being handled by our other router, which took over seamlessly with no customer impact.

Depending on your choice of terminology, either ‘Redundancy has been reduced to N’ or ‘The network is at-risk’. At Mythic Beasts we like to speak English, so this translates to: if something else fails before the router is restored to service, there is a risk of a network outage at our Cambridge data centre.

Update: on Friday 31st we fully restored our network to its usual redundant configuration by replacing the router with a similarly over-specified replacement. Customers may have received free bandwidth for some of this period.

More bits

January 10th, 2014 by

At the end of last year we took the decision to significantly upgrade our two connections to LINX – our busiest connections to the outside world.

This turned out to be a good plan, as Mythic Beasts got a Christmas present in the form of a new company bandwidth record, thanks to two customers, Blinkbox Music and Raspberry Pi, getting a substantial spike in hits as people unwrapped their Christmas presents.

And it seems that the excitement of all the presents hasn’t worn off, as the Christmas Day record was toppled by a new all-time high yesterday. With the Blinkbox apps very high in the free music app charts, we’re not expecting it to stand for long.

IPv6, End users starting to care

September 17th, 2013 by

We’ve had an IPv6-aware network for quite some time, and we’ve been gradually rolling it out to our services with the aim of eventually having every service we offer fully available over IPv6 and IPv4. We host the Raspberry Pi website, which has an IPv6-only internal network, IPv6-only virtual machines and IPv4 on the front end to help out those of you with the ‘legacy’ internet.

A quick skim over the logfiles suggests that about 96% of you still access the site through the legacy IPv4 network. About 4% of hosts are now connecting over IPv6, which is starting to become a non-trivial fraction of the traffic. Of course this is much higher than for typical sites, as Raspberry Pi users are much more technically aware than the general population.
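A skim like this can be approximated with a short Python sketch that counts distinct client hosts by IP version. It assumes a log format in which the client address is the first whitespace-separated field, which may not match your web server; the sample lines are invented and use documentation addresses.

```python
import ipaddress


def ip_version_share(log_lines):
    """Return the fraction of distinct client hosts seen over IPv4 and IPv6."""
    hosts = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            hosts.add(ipaddress.ip_address(fields[0]))
        except ValueError:
            continue  # first field is not an IP address, so skip the line
    if not hosts:
        return {4: 0.0, 6: 0.0}
    return {
        version: sum(1 for host in hosts if host.version == version) / len(hosts)
        for version in (4, 6)
    }


# Invented sample lines; a real run would iterate over the access log file.
SAMPLE = [
    '192.0.2.10 - - "GET / HTTP/1.1" 200',
    '192.0.2.10 - - "GET /downloads HTTP/1.1" 200',
    '192.0.2.11 - - "GET / HTTP/1.1" 200',
    '198.51.100.7 - - "GET / HTTP/1.1" 200',
    '2001:db8::1 - - "GET / HTTP/1.1" 200',
]
print(ip_version_share(SAMPLE))
```

Counting distinct hosts rather than requests matches the “4% of hosts” figure above; counting requests or bytes instead would answer a slightly different question.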

Yesterday we had our first real connectivity problem to investigate: an end user within Ja.net (the UK academic network) was unable to access files from the Raspberry Pi download server about half of the time. Further investigation showed that they could access the load balancers in our Sovereign House site with connectivity via the London Internet Exchange Juniper LAN, but not the load balancers in our Harbour Exchange site with connectivity over the London Internet Exchange Extreme LAN.

When we started investigating, we confirmed that it seemed to be a problem with the Extreme LAN: if we forced the connectivity via the Juniper LAN it worked from both sites, and if we forced it via the Extreme LAN it failed from both sites. Odder still, a packet dump on our LINX interface didn’t show us passing the packets on.

Our IPv4 peering worked fine; this was IPv6-specific.

We then started looking at the routing table on the router. Over IPv4 it looks like

131.111.0.0/16 via 195.66.236.15 dev eth0

and over IPv6 it looks like

2001:630::/32 via fe80::5e5e:abff:fe23:2fc2 dev eth0

That gives us the netblock, and the next hop to send the packet to.

So the next step is to check you can reach the gateway happily enough.

# ping 195.66.236.15
PING 195.66.236.15 (195.66.236.15) 56(84) bytes of data.
64 bytes from 195.66.236.15: icmp_seq=1 ttl=64 time=0.220 ms

and

# ping6 fe80::5e5e:abff:fe23:2fc2
connect: Invalid argument

Odd. Then I realised that fe80:: in IPv6 means a link-local address. Such an address is only meaningful on a particular link, so to ping it you have to specify both the destination address and the network interface.

# ping6 fe80::5e5e:abff:fe23:2fc2 -I eth9
PING fe80::5e5e:abff:fe23:2fc2(fe80::5e5e:abff:fe23:2fc2) from fe80::21b:21ff:fe65:a4c5 eth9: 56 data bytes
64 bytes from fe80::5e5e:abff:fe23:2fc2: icmp_seq=1 ttl=64 time=0.451 ms

Then the penny dropped. The routing table has eth0 in it but we’re actually connected to eth9. Under IPv4 this is fine because the next-hop address is globally unique and only accessible over eth9 so we send the packets out of eth9 and they go to the correct destination. Under IPv6 it’s a link local address and therefore valid over any interface, so we obey the routing table and throw the packets out of eth0 whereupon they fall onto the floor because there’s no fibre connected.
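This class of mistake is easy to check for mechanically. The Python sketch below flags routes whose next hop is link-local but whose configured device differs from the interface that is actually cabled. The function names and the idea of passing in the cabled interface are ours for illustration; the addresses and interface names are the ones from the example above.

```python
import ipaddress


def needs_explicit_interface(address):
    """Link-local next hops are ambiguous without an interface name."""
    return ipaddress.ip_address(address).is_link_local


def route_problem(next_hop, route_device, cabled_device):
    """Return a warning string for a risky route, or None if it looks fine."""
    if needs_explicit_interface(next_hop) and route_device != cabled_device:
        return (
            f"next hop {next_hop} is link-local but the route says {route_device}; "
            f"the peer is on {cabled_device}"
        )
    return None


# The IPv4 route survives the wrong device name; the IPv6 one does not.
print(route_problem("195.66.236.15", "eth0", "eth9"))
print(route_problem("fe80::5e5e:abff:fe23:2fc2", "eth0", "eth9"))
```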

Fixing the config to put the right interface description in made it all work, and our end user is happily able to access all the load balancers on all the v6 addresses in all of the buildings.

Obviously if you’re a Mythic Beasts customer and you don’t already have an IPv6 allocation for your real or virtual server, drop us an email and we’ll hand you your own address space to play with.

New DNS resolvers

August 28th, 2013 by

We’ve upgraded our DNS resolvers in our SOV and HEX data centres. New features include DNSSEC validation and IPv6.

The addresses are:

SOV : 2a00:1098:0:80:1000::12 / 93.93.128.2
HEX : 2a00:1098:0:82:1000::10 / 93.93.130.2

They’re now DNSSEC-aware, validating resolvers. That means that if a site has correctly configured DNSSEC and we receive an answer that fails the security check, we will return no answer rather than an incorrect or forged one.

To demonstrate the difference:

A non-DNSSEC-validating resolver:
# dig +short sigfail.verteiltesysteme.net
134.91.78.139

A Mythic Beasts server using our resolvers:
# dig +short sigfail.verteiltesysteme.net
<no answer>
#

and on the DNS server it logs an error:

debug.log:28-Aug-2013 15:44:57.565 dnssec: info: validating @0x7fba880b69e0: sigfail.verteiltesysteme.net A: no valid signature found

and correctly drops the reply.
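The same comparison can be made from a script. A validating resolver normally signals a failed check with the SERVFAIL response code, which is what dig +short shows as an empty answer. The Python sketch below builds a minimal A-record query by hand and reads the response code from the reply header; the function names are ours, it uses UDP over IPv4 only, and it does no retries or truncation handling.

```python
import socket
import struct

RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, query_id=0x1234):
    """Build a DNS query packet for the A record of name (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def response_rcode(packet):
    """Extract the response code from the low four bits of the flags field."""
    flags = struct.unpack("!H", packet[2:4])[0]
    code = flags & 0x000F
    return RCODE_NAMES.get(code, str(code))


def ask(resolver, name, timeout=3.0):
    """Send the query to resolver over UDP and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        packet, _ = sock.recvfrom(4096)
    return response_rcode(packet)
```

Calling ask("93.93.128.2", "sigfail.verteiltesysteme.net") from a machine that is allowed to use our resolvers should report SERVFAIL, assuming the test domain is still deliberately mis-signed, while a non-validating resolver would report NOERROR.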

Google’s DNS servers on 8.8.8.8 work the same way as ours, so we’re fairly confident that there will be no compatibility issues.

Downstream ASN

August 12th, 2013 by

We have set up a customer of ours with their own full BGP network, split across two of our London sites. Along the way we have

  • Helped them join RIPE as an LIR
  • Helped them apply for an IPv6 /32 and an ASN
  • Set up a full BGP IPv6 only network
  • Helped them apply for a final /22 of IPv4 space
  • Configured this in the global routing table

They now have the option to cable or fibre directly to peering exchanges and other ISPs, should they wish to do so, on individual machines hosted within our rackspace. In the meantime they’re taking advantage of our co-location, out-of-band access to their routers via serial, and our IPv4 and IPv6 transit.

Joining the London Internet Exchange

August 7th, 2013 by

We’ve now joined the London Internet Exchange and are present on both of their peering LANs for redundancy. We’re connected to the Juniper LAN in Sovereign House and the Extreme LAN in Harbour Exchange. This means we’re now connected to three peering exchanges in Sovereign House (Edge-ix, LoNAP and LINX-juniper) and two in Harbour Exchange (LINX-extreme and LoNAP).

You can see the current traffic over the LINX public exchanges here

which is best described as rather a lot. We’re in the process of setting up more direct peers in addition to the route servers, which provided immediate peering with hundreds of ISPs and tens of thousands of routes. Many UK destinations are now a few hops shorter, which probably won’t be very noticeable, but we have improved redundancy and increased capacity.

Dark Fibre

August 5th, 2013 by

Over the last twelve months we’ve made a series of networking changes and completely failed to blog about them. Our first announcement is that we now have a dark fibre ring around our core London sites.

This isn’t actually true. We now have a lit fibre ring around our core London sites. It’s currently running at 10Gbps and connects all of our routers together. All our routers connect to the local networks at 10Gbps, so our entire network core is now 10Gbps. We also have some customers with direct connections who use our fibre as a layer 2 interlink between Telecity Sovereign House, Telecity Meridian Gate and Telecity Harbour Exchange 6/7. Our standard is to offer a pair of ports in each site on redundant switches (so 6 x 1Gbps ports) with unlimited traffic between them.

As a result of our upgrade we’re able to continue to offer free traffic between all London hosted servers irrespective of the building the machines are in or which customer owns them – we bill only for traffic that leaves our network. Upgrading to progressively higher bandwidths is now straightforward as we can add CWDM / DWDM as required to increment in multiples of 10Gbits, or to 40Gbits or multiples of 40Gbits.

For those of you that are interested, the fibre lengths are

  • MER <-> SOV : 1672ns (or 1122ft)
  • SOV <-> HEX : 6423ns (or 4310ft)
  • HEX <-> MER : 5456ns (or 3687ft)

and the latencies across the network from core router to core router (average over 10 pings) are

  • MER <-> SOV : 0.096ms
  • SOV <-> HEX : 0.082ms
  • HEX <-> MER : 0.076ms

and from customer machine in SOV to customer machine in HEX, passing through at least two routers – 0.5ms.
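A little arithmetic on these figures is instructive. The Python sketch below works out the signal speed implied by each length and delay pair, and how much of each measured ping is spent in the glass. It assumes the nanosecond figures are one-way delays and the ping figures are round-trip times; neither is stated explicitly above.

```python
FEET_TO_METRES = 0.3048
C_METRES_PER_NS = 0.299792458  # speed of light in a vacuum

# (fibre delay in ns, fibre length in ft, router-to-router ping in ms), from above.
LINKS = {
    "MER <-> SOV": (1672, 1122, 0.096),
    "SOV <-> HEX": (6423, 4310, 0.082),
    "HEX <-> MER": (5456, 3687, 0.076),
}


def velocity_factor(delay_ns, length_ft):
    """Signal speed in the fibre as a fraction of the speed of light in a vacuum."""
    return (length_ft * FEET_TO_METRES / delay_ns) / C_METRES_PER_NS


def fibre_share_of_ping(delay_ns, ping_ms):
    """Fraction of the round-trip ping spent propagating through the fibre."""
    return (2 * delay_ns) / (ping_ms * 1_000_000)


for name, (delay_ns, length_ft, ping_ms) in LINKS.items():
    print(
        f"{name}: velocity factor {velocity_factor(delay_ns, length_ft):.3f}, "
        f"fibre is {fibre_share_of_ping(delay_ns, ping_ms):.1%} of the ping"
    )
```

Under those assumptions the light travels at roughly two thirds of its vacuum speed, and the fibre accounts for only a small part of each ping. The rest is time spent in the routers, which is why the shortest span does not have the lowest latency.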

10 GigE networking

August 5th, 2010 by

In May we upgraded our Telecity Meridian Gate site to have 10 Gigabit at the core. Early this week we upgraded the core network in Telecity Sovereign House to run at 10 Gigabit. We are planning to upgrade Telecity Harbour Exchange in the near future and to continue the rollout of 10GigE from the core switches. This means we’ve plenty of spare capacity for very high bandwidth customers in our Docklands data centres.

Power failure in Telehouse North

July 22nd, 2010 by

Yesterday there was, we believe, a power failure in Telehouse North. Mythic Beasts doesn’t have any equipment located in Telehouse, but the effects were quite noticeable.

Two internet exchanges, LONAP and LINX, were affected. The LONAP looking glass and traffic graph tell us that LONAP saw all of the peers located in Telehouse North disconnect.

Lonap Traffic Graph



We don’t believe that LINX was directly affected by the power failure, but all sessions on the Brocade LAN were reset and brought up slowly over the course of about an hour, as you can see from the looking glass.

LINX Looking glass for the Brocade LAN

The Extreme LAN, by contrast, wasn’t affected at all.

LINX Looking glass for the Extreme LAN

LINX Traffic Graph



Mythic Beasts saw no overall change in our traffic levels; we escaped unscathed.

Mythic Beasts Total Traffic



We did, however, see a brief drop on LONAP as various high-bandwidth peers disconnected in Telehouse North.

Mythic Beasts LONAP Traffic



We didn’t see any measurable effect over Edge-IX (this traffic pattern is normal for this time of day).

Mythic Beasts Edge-IX Traffic



Mythic Beasts doesn’t currently peer directly on LINX, but we have two partial transit suppliers that do. Partial transit suppliers provide us with routes only from their peering partners so when they lose contact with a peer, we stop being able to route traffic to that network through them.

The first partial transit supplier has 10G into the LINX Brocade LAN, 1G into the LINX Extreme LAN and 2G into LONAP plus private peers.

Mythic Beasts Transit 1



The second partial transit supplier has 10G into LINX Brocade, 10G into LINX Extreme, 10G into AMSIX, 2.5G into Decix, 2G into LONAP and 1G into Edge-ix plus private peers.

Mythic Beasts Transit 2



We take partial transit from two suppliers, one in Telecity HEX 6/7 and one in Telecity MER. Whilst this is more expensive than a single supplier or joining LINX ourselves, we’ve always felt that the additional redundancy was worth paying extra for. We discovered today that one partial transit supplier has almost no redundancy in the event of a failure of the LINX Brocade LAN. We’ve brought this up with the supplier in question and will be pressuring them to add resiliency to their partial transit service. We do intend to join LINX, but when we do so we’ll join both of the peering LANs from different data centres to maximise our resiliency.
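The difference between the two suppliers shows up with some quick arithmetic on the port capacities quoted above. This Python sketch computes what fraction of each supplier’s public peering capacity survives the loss of one exchange LAN. It is only a rough proxy: port capacity is not the same as traffic carried or routes learned, and the private peers are left out because we have no figures for them.

```python
# Public peering port capacities in Gbps, as listed above (private peers excluded).
SUPPLIERS = {
    "transit 1": {"LINX Brocade": 10, "LINX Extreme": 1, "LONAP": 2},
    "transit 2": {
        "LINX Brocade": 10,
        "LINX Extreme": 10,
        "AMSIX": 10,
        "Decix": 2.5,
        "LONAP": 2,
        "Edge-ix": 1,
    },
}


def surviving_fraction(ports, failed_exchange):
    """Fraction of total port capacity left when one exchange is lost."""
    total = sum(ports.values())
    remaining = total - ports.get(failed_exchange, 0)
    return remaining / total


for name, ports in SUPPLIERS.items():
    share = surviving_fraction(ports, "LINX Brocade")
    print(f"{name}: {share:.0%} of peering capacity left without LINX Brocade")
```

By this measure the first supplier keeps less than a quarter of its peering capacity when the Brocade LAN goes away, while the second keeps well over two thirds.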