Although the DNS protocol has a number of attributes (such as the stateless nature of the UDP-based DNS transport, and automatic failover between nameservers) which make it possible to sustain 100% uptime, something we’re proud to have achieved for well over two decades, doing so is by no means easy: it requires significant, sustained investment to develop and maintain a DNS platform that can support hundreds of top-level domains. We have invested a lot of time and effort into building and maintaining our anycast DNS infrastructure, so that it can grow alongside the TLDs that run on our registry platform.
As with any mission-critical, large-scale technology platform, effective monitoring and analysis are essential to maintaining our DNS system. We make use of a huge range of monitoring tools and systems that give us insight into almost every aspect of the platform: from low-level details such as how much storage, memory and compute is being used, through to high-level metrics such as overall service availability and performance, zone update speed, and DNSSEC chain-of-trust validity.
Many of the tools we use are the standard monitoring tools used by many organisations, such as Checkmk, LibreNMS, Pingdom, and DNSperf. However, some have been developed internally by our engineering teams: either because our requirements are sufficiently specialised that an off-the-shelf solution doesn’t exist (there aren’t a lot of organisations who need to monitor a Whois server, for example), or because existing solutions can’t scale to the size of our infrastructure.
For many years, our DNS analytics were based on DSC, the DNS Statistics Collector from DNS-OARC: a daemon runs on each nameserver, capturing DNS packet data and periodically writing it to disk as XML files. These files are then transferred to a centralised aggregator, which analyses them and writes statistical information into a database. Many organisations use Hedgehog as a GUI to present this data to operators, and for many years, CentralNic did too.
However, as the number of domains on our platform grew, and the volume of query traffic exploded, DSC stopped being a feasible solution for DNS analytics: the volume of data being captured exceeded the aggregator’s capacity to process it. We ended up with enormous backlogs of XML files on both the aggregator and the anycast nodes, which filled disks, produced analytics reports that were already out of date by the time they could be generated, and caused a lot of pain for our operations teams.
As a result, we decided to retire our DSC-based analytics system and build a replacement of our own. What follows is a description of what we built, and why we built it.
As it happens, at the time we made this decision, many other DNS operators were having similar issues, and in the last year a number of developments have occurred in the DNS monitoring space, many of which use similar technologies to those we used when building our system. We’ll talk a bit about those later on in this article.
Requirements for a new DNS analytics system
We built our new analytics system to the following requirements:
- No impact on authoritative DNS performance: the new system should not interfere with the performance of the DNS system.
- Real-time analytics: we don’t want to wait for offline processing. The system should provide information that is updated in real-time (or as close to real-time as we can manage).
- Only measure what we care about: to ensure the privacy of end users, and reduce storage requirements, the system should be customizable so that we only capture what we need, and discard everything else (for example, EDNS client subnet information).
- Make data available to our stakeholders: we wanted to create a system that would let us provide access to analytics data to our registry partners. This means, for example, that the system needs to make it easy to segregate data by TLD, since the registry operator for .xyz should not be able to see data about .site (and vice versa).
- API-driven design: while a pretty dashboard is always nice, the real value of this system will come from its API, which will allow us to integrate it with other systems (and with our registry partners’ systems).
Smart collector vs smart aggregator
A key decision in the design process was whether to have a smart collector or a smart aggregator. In any large-scale distributed data processing system, there is a choice to be made about where data gets processed: if packets are analysed on the anycast nodes themselves, they consume compute resources that would otherwise be available for answering DNS queries; if they are analysed on the aggregator instead, the workload might exceed the available capacity, resulting either in data loss or in the extra cost and complexity of scaling the aggregator across multiple servers.
In the end, we decided to split the difference, and have the collector do the DNS packet parsing while the aggregator does the statistical analysis. This allows us to leverage the compute resources of our anycast network, which uses less than 0.5% of its total capacity during normal operations.
Adaptive sampling
Our existing solution, DSC, captured every single DNS query handled by our anycast nodes. Since we handle billions of queries every day, this produced an enormous amount of data, even though individual DNS packets are usually quite small; the problem was aggravated by DSC’s use of XML (a rather verbose format) to serialise the data sent to the aggregator.
We decided that our new analytics system should implement adaptive sampling. By default, only a fraction of all the DNS queries seen by the server would be captured for processing. The system would allow us to vary the sample rate, so we could (if necessary) reduce the sampling rate to reduce the load on the anycast nodes, or increase the sample rate to obtain more accurate data.
We also designed the system so that we could implement two different sampling styles: a deterministic style, where every nth packet is captured (and the rest discarded), and a random style, where the probability of an individual packet being captured is 1/n. This design choice also allows us to tune the configuration of the system in order to get the right balance between data quality and system performance (since random sampling provides better data quality but requires more computing power to implement, and consumes entropy that may impact on cryptographic operations happening elsewhere on the system).
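To make that trade-off concrete, here is a minimal sketch of the two sampling styles in Python. The class and parameter names are illustrative, not taken from capd itself:

```python
import random


class Sampler:
    """Decides whether a captured DNS packet should be kept.

    rate=100 means roughly 1 in 100 packets is kept. Illustrative only;
    this is not the actual capd implementation.
    """

    def __init__(self, rate: int, deterministic: bool = True):
        self.rate = rate
        self.deterministic = deterministic
        self._counter = 0

    def keep(self) -> bool:
        if self.deterministic:
            # Deterministic style: keep every nth packet, discard the rest.
            self._counter = (self._counter + 1) % self.rate
            return self._counter == 0
        # Random style: keep each packet with probability 1/n.
        return random.random() < 1.0 / self.rate
```

Calling keep() for every captured packet, and adjusting rate up or down at runtime, corresponds to the tunable sample rate described above.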
Data serialisation
DSC uses a daemon running on the DNS server which writes DNS packet data to disk, serialised in XML format; these files are then copied onto the aggregator. This approach has a number of disadvantages that we wanted to avoid with the new system.
Rather than write data to disk, our system would directly transmit that data to the aggregation server using a persistent UDP socket. UDP was chosen as the transport since we’re not especially concerned if some of the packets get lost - we’re already sampling a relatively small fraction of the DNS traffic, so a modest amount of packet loss is acceptable and would not materially affect the accuracy of the system.
To avoid packet fragmentation, we therefore needed a serialisation format that was much more compact than a verbose syntax such as XML, or even JSON. A binary format such as Protocol Buffers or CBOR seemed more appropriate, and in the end we chose CBOR, since it is schemaless like JSON and is well supported by many different programming languages.
To provide a modicum of confidentiality, the UDP packets are encrypted using a symmetric AES cipher. Strong guarantees of confidentiality are not required since an on-path attacker would already be able to see the DNS packets from which the CBOR packets are derived.
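As an illustration of how a collector might assemble and transmit a sample, here is a sketch in Python. It is an assumption-laden sketch rather than our actual code: the aggregator address, port, record fields and the choice of AES-GCM (the text above only specifies "a symmetric AES cipher") are all invented for illustration, and key handling is deliberately simplified.

```python
import os
import socket

import cbor2  # pip install cbor2
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

AGGREGATOR = ("aggregator.example.net", 9999)  # hypothetical host and port
KEY = bytes.fromhex("00" * 32)                 # placeholder 256-bit shared key

# One socket is created up front and reused for every sample, mirroring the
# persistent UDP socket described above.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
aesgcm = AESGCM(KEY)


def send_sample(record: dict) -> None:
    """Serialise one sampled query as CBOR, encrypt it, and send it."""
    plaintext = cbor2.dumps(record)
    nonce = os.urandom(12)                     # per-packet nonce for AES-GCM
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)
    # The datagram stays well under a typical MTU, avoiding fragmentation.
    sock.sendto(nonce + ciphertext, AGGREGATOR)


send_sample({
    "node": "lhr01",                           # illustrative field names only
    "zone": "example",
    "qtype": 1,
    "rcode": 0,
    "transport": "udp",
    "cc": "GB",
})
```

Because the transport is UDP, a lost datagram simply means one fewer sample, which is consistent with the sampling approach described above.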
Phase 1 (prototype): capd and aggd
The initial prototype for the analytics system was written and deployed in 2019. It consisted of two components:
- capd, which ran on each anycast node and captured DNS packets;
- aggd, which ran on the central collector and received analytics data from the anycast nodes.
Both components were written in Perl, which offers excellent support for DNS, libpcap, and CBOR. The capd component used the Net::Pcap, NetPacket and Net::DNS libraries to capture and parse network packets, and then construct an encrypted CBOR packet (containing the data we wanted to analyse) which it then sent to the aggregator. It also used the MaxMind GeoIP2 database to determine the country of origin of each query, based on the destination address of the response packet.
The aggd component listened on a UDP port, received packets from the anycast nodes, decrypted them, and then wrote the raw data into an SQLite database. By handing off the packets into a staging database, we were able to reduce the amount of work aggd had to do to process each incoming packet, which maximised the overall throughput of the system.
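A minimal sketch of that receive-and-stage path might look like the following, reusing the same cbor2/cryptography libraries and datagram layout as the collector sketch above; the port number and staging table layout are illustrative assumptions, not the actual aggd schema.

```python
import socket
import sqlite3
import time

import cbor2
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY = bytes.fromhex("00" * 32)   # same placeholder shared key as the collector sketch

db = sqlite3.connect("staging.db")
db.execute("CREATE TABLE IF NOT EXISTS raw (received INTEGER, payload BLOB)")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))     # hypothetical port
aesgcm = AESGCM(KEY)

while True:
    datagram, _addr = sock.recvfrom(2048)
    try:
        plaintext = aesgcm.decrypt(datagram[:12], datagram[12:], None)
        record = cbor2.loads(plaintext)
    except Exception:
        continue                  # silently drop anything we can't decrypt or parse
    # Stage the raw record with as little processing as possible; the heavy
    # lifting happens later, in the periodic aggregation job.
    db.execute("INSERT INTO raw (received, payload) VALUES (?, ?)",
               (int(time.time()), cbor2.dumps(record)))
    db.commit()                   # a real implementation would batch commits
```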
A third component, which was run every few minutes from crond, extracted the raw packet data from the SQLite database, and updated a MariaDB database, which hosted the time-series tables containing the analytics data. Each time series tracked a specific metric, whether global, per-node or per-zone. This script could digest five minutes’ worth of data in a few seconds. We implemented multiple backends: one to feed data into our Registry DNS Analytics system, and one for our Managed DNS environment.
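A simplified sketch of that digestion step is shown below, assuming the staging layout from the previous sketch and an invented per-zone, per-minute time-series table; the real schema and backends are more involved than this.

```python
import sqlite3
from collections import Counter

import cbor2
import pymysql  # pip install pymysql

# Illustrative connection details and table names only.
staging = sqlite3.connect("staging.db")
tsdb = pymysql.connect(host="localhost", user="analytics",
                       password="secret", database="dns_analytics")

# Roll the staged records up into one-minute, per-zone buckets.
counts = Counter()
rows = staging.execute("SELECT rowid, received, payload FROM raw").fetchall()
for rowid, received, payload in rows:
    record = cbor2.loads(payload)
    bucket = received - (received % 60)
    counts[(record["zone"], bucket)] += 1

# Upsert the buckets into the time-series table.
with tsdb.cursor() as cur:
    for (zone, bucket), n in counts.items():
        cur.execute(
            "INSERT INTO zone_queries (zone, ts, queries) "
            "VALUES (%s, FROM_UNIXTIME(%s), %s) "
            "ON DUPLICATE KEY UPDATE queries = queries + VALUES(queries)",
            (zone, bucket, n),
        )
tsdb.commit()

# Clear the staging rows we have just digested.
if rows:
    staging.executemany("DELETE FROM raw WHERE rowid = ?",
                        [(rowid,) for rowid, _, _ in rows])
    staging.commit()
```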
Phase 2 (ready for production): pycapd and pyaggd
Once the prototype had successfully been deployed and tested, it was passed to the Registry operations team to implement as a permanent feature of the Anycast DNS system. At the time, the Operations Team had recently decided to standardize on Python for its internal tooling. This meant that the original capd and aggd would be rewritten in Python, while maintaining backwards compatibility with the original Perl prototypes.
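By way of illustration only, a Python capture path along the lines of pycapd might look like the sketch below. The article doesn’t describe which Python libraries were used, so the choice of scapy and geoip2 here is an assumption; it reuses the Sampler and send_sample() helpers from the earlier sketches.

```python
from scapy.all import DNS, DNSQR, IP, sniff   # pip install scapy
import geoip2.database                        # pip install geoip2
import geoip2.errors

reader = geoip2.database.Reader("GeoLite2-Country.mmdb")
sampler = Sampler(rate=100)                   # the Sampler sketched earlier


def handle(pkt):
    # Only look at sampled DNS responses (qr == 1) leaving this node.
    if DNS not in pkt or IP not in pkt or pkt[DNS].qr != 1 or not sampler.keep():
        return
    client = pkt[IP].dst                      # the resolver the response goes to
    try:
        cc = reader.country(client).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        cc = None
    qname = pkt[DNSQR].qname.decode().rstrip(".").lower() if DNSQR in pkt else ""
    send_sample({                             # send_sample() as sketched earlier
        "qname": qname,
        "zone": qname.rsplit(".", 1)[-1],     # crude TLD extraction, for illustration
        "qtype": pkt[DNSQR].qtype if DNSQR in pkt else 0,
        "rcode": pkt[DNS].rcode,
        "cc": cc,
    })


# A BPF filter keeps the capture volume manageable; store=False avoids
# accumulating captured packets in memory.
sniff(filter="udp and src port 53", prn=handle, store=False)
```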
Analytics front-ends: Grafana and the Registry DNS Analytics API and Dashboard
We use Grafana to provide a global, system-wide dashboard of DNS system performance for internal use. The screenshot below provides an example of what the dashboard looks like:
The Dashboard gives our DNS Operations Team real-time access to a variety of metrics, allowing us to quickly detect and respond to unexpected changes in query volume and other KPIs.
We also implemented a simple analytics front-end for our registry partners, including a dashboard and a REST API, allowing them to access key performance metrics for their TLDs: overall query rate; query rate by source country, query type, transport, protocol and response code; and a report on the most commonly queried domain names. This data is extremely useful in a number of areas, including identifying potentially abusive, premium or popular domain names in a given TLD.
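To give a flavour of how a partner might consume such an API, here is a hypothetical example; the endpoint, parameters and response fields are invented for illustration and are not the actual API.

```python
import requests  # pip install requests

# Hypothetical endpoint and field names, purely to illustrate an API-driven,
# per-TLD analytics feed; the real API may look quite different.
resp = requests.get(
    "https://analytics.example.net/api/v1/tlds/xyz/queries",
    params={"from": "2020-11-01", "to": "2020-11-30", "group_by": "country"},
    headers={"Authorization": "Bearer <api-token>"},
)
for row in resp.json()["results"]:
    print(row["country"], row["queries"])
```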
Future work
While the internal Registry DNS Analytics Dashboard is already being used by our DNS operations team, we still have some work to do to make the external Dashboard available to our Registry partners: we hope to make it available as a beta version in early 2021.
Around the community
In the time that we’ve been working on our own solution, a number of open source projects have been improving the state of the art of DNS analytics. dnstap is now widely supported by many open source DNS servers, including Knot, Unbound, BIND, NSD, and PowerDNS, and Jerry Lundström of DNS-OARC has also added dnstap support to DSC. dnstap packets (which are encoded using Protocol Buffers rather than CBOR) can be transmitted to a local process over a UNIX socket, or to a central aggregator over TCP or UDP (much as our capd transmits its packets).