The DNS Operations, Analysis and Research Center (DNS-OARC) was created nearly a decade ago in recognition that the DNS’s key position in the Internet’s architecture put it at significant risk of being both a victim of, and a vehicle for, various types of misuse. DNS-OARC was, however, inspired by a vision that co-operation, together with data gathering, sharing and analysis between the operator and research communities, could both protect against such misuse and lead to a deeper understanding of the DNS and Internet operations.
Since OARC’s founding in 2004, these issues have only become more critical, and the organization has grown from a project within ISC to an independent, neutral, non-profit, membership organization with dedicated staff and over 70 members. Its mission is to improve the security, stability, and understanding of the Internet’s DNS infrastructure.
As well as running twice-yearly workshops, various public benefit tools and inter-member co-operation platforms, OARC operates a number of large-scale data gathering initiatives, which collect data from its members’ infrastructure. One of these, initiated in 2004 in co-operation with CAIDA and funded by the NSF, is “Day in the Life of the Internet” (DITL). This gathers detailed data-sets of DNS queries to root and top-level DNS operators for a 48-hour period at least once a year. The idea is to have a baseline data archive which can be compared year-on-year. Data has also been gathered during significant change points in the global DNS, such as the IPv6 delegation and DNSSEC signing of the root. Over the past decade, OARC has accumulated a data-set in excess of 40TB of DITL queries.
A critical component of OARC’s capabilities and contribution is the availability of raw operational data. Many underlying engineering principles and failure modes of Internet traffic are poorly understood, and it’s important to note that not all the threats to reliable, secure Internet operation are malicious. Studies performed by OARC partners such as CAIDA demonstrate that a significant amount of unwanted DNS traffic and operational problems are caused by misconfiguration of the DNS or of applications that depend on it. The only way to improve this state of affairs is to apply the scientific method to the study of these problems on a large scale.
ICANN has been a committed supporter of OARC since becoming a member in 2008. It has worked with OARC as L-Root operator by supplying DNS data, supporting various joint events and service infrastructure, and recently providing a Board member.
During 2012, a potential new obstacle on the path to deployment of ICANN’s new TLDs became apparent to the ICANN SSAC (Security and Stability Advisory Committee). A risk was identified that some of the proposed new TLDs were already in widespread internal-only use within enterprises, and on top of this, SSL certificates which had only ever been intended for such internal use had already been issued to these organizations. This could lead to a risk of collisions between valid internal use of these TLDs, and potentially malicious misuse of these certificates on the global Internet.
Clearly this was a potentially significant problem, with a tension between the interests of new TLD operators who want to see their new domains deployed as quickly as possible, versus some very real risks of abusive activity, or even just unintended consequences, either or both of which could have global impact.
When determining policies on how to proceed in such situations, it’s important to have data to base them upon. Given the tight deployment timescales, gathering new data from scratch could have been a significant and time-consuming exercise. Fortunately, it was quickly identified that OARC’s DITL data-set could contain the evidence needed to help determine whether the SSAC’s concerns were real ones in practice and, if so, the extent of their severity. The log of queries to the root and TLD servers contains not just valid top-level domain strings, but also “leakage” of strings intended for internal-only use which escape into the wider Internet due to various mis-configurations. It is exactly these kinds of unintended consequences which can lead to the concerns expressed in the study, making the data gathered a useful sample of what could go astray or be exploited.
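To make the idea of “leakage” concrete, here is a minimal sketch in Python of how such strings could be tallied from a list of query names. It is an illustration only, not the method used in the ICANN-sponsored analysis: the `DELEGATED_TLDS` set, the function names and the sample queries are all hypothetical, and real DITL data is stored as packet captures that would first need parsing.

```python
from collections import Counter

# Hypothetical, tiny stand-in for the set of TLDs delegated in the root zone.
DELEGATED_TLDS = {"com", "net", "org", "uk"}


def tld_of(qname):
    """Return the rightmost label of a query name, lower-cased ('' for the root)."""
    labels = qname.rstrip(".").split(".")
    return labels[-1].lower()


def count_leaked_tlds(qnames, delegated=DELEGATED_TLDS):
    """Count queries whose rightmost label is not a delegated TLD."""
    leaked = Counter()
    for qname in qnames:
        tld = tld_of(qname)
        if tld and tld not in delegated:
            leaked[tld] += 1
    return leaked


# Made-up sample of query names as they might be seen at a root server.
sample = [
    "www.example.com.",
    "fileserver.corp.",
    "printer.home",
    "intranet.CORP",
    "mail.example.org",
    ".",
]

for tld, hits in count_leaked_tlds(sample).most_common():
    print(tld, hits)
```

Ranking the non-delegated strings by query volume in this way gives a rough indication of which proposed TLDs are already in heavy internal use, subject to the sampling limits of the data.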
While OARC’s DITL data set was recognized as being of high relevance for this particular need, it is important to understand that it is only one view of the DNS, and by no means a definitive or complete one. For example, it only includes some queries to some root operators for a short period, and does not include queries to many other TLD operators, or to ISPs providing DNS resolver services to their subscribers. It is probably impossible to get a complete view of the DNS by traffic gathering techniques, and the value of multiple different approaches should not be overlooked.
Having identified the problem and the data-set which could be a solution, ICANN engaged Interisle and their subcontractors RTFM to perform the analysis. In the short term, this work was started using computing capacity on loan from CAIDA and located at OARC to produce the initial report.
In the meantime, however, a number of requirements needed to be tackled to perform further analysis of the data:
- DNS data submitted to OARC from jurisdictions across the world is potentially sensitive, and held in trust by OARC under strict confidentiality terms. This allows data submission by a much wider community than would otherwise be possible. However, these terms prevent the copying of the data from OARC’s archive to third-party systems.
- While the systems hosting OARC’s growing data set have been regularly updated over the years, much of its supporting infrastructure, including computing resources for doing in-situ data analysis by members and researchers, had not been upgraded since the original NSF bootstrap funding a decade earlier, and was in sore need of upgrading.
- Many new TLD operators wanted to become OARC members in order to both support its mission and carry out their own analysis of the DITL data sets at OARC, independently of the ICANN-sponsored work that had been carried out by Interisle/RTFM.
- This was all happening in the context of the pressing timescales of new TLD deployment and ICANN’s analysis and comments timescales.
Fortunately, as a result of a re-development plan committed to by OARC’s Board earlier in 2013, a major hardware and software refresh was already under way, and at the time the Collisions Strings study requirement was identified, OARC’s new Systems Engineer, William Sotomayor, was ready to deploy the new compute resources needed.
OARC was thus quickly able to take delivery of, and bring into service, the significant equipment donation of 4 x Dell r820-grade servers from ICANN, in addition to other similar servers donated by interested OARC members. These are very high spec machines, with 64-core processors and at least 48GB of RAM. They take OARC’s analysis capability firmly into the present, and will be of immense value not just for ongoing Collisions studies, but for the general-purpose needs of OARC members and researchers for some years into the future.
OARC has been in the business of “Big Data” for much of its existence, but it is only recently that the value of such large-scale data gathering has been widely defined and recognized. With this major contribution, and further donations of equipment and space to host it pending, OARC looks forward to participating in the innovation revolution of Cloud Computing and Big Data.
OARC’s ability to provide a solution to a problem that was not envisaged at its founding underlines the value of neutral general-purpose data gathering from the DNS in the wider context of “Internet Science”.
OARC wishes to gratefully acknowledge ICANN’s generosity in making this equipment donation, and we look forward to working with ICANN, our other members, partners and the research community to continue meeting this need.
Making this happen quickly took the committed help of a number of parties to whom OARC is grateful. We’d like in particular to thank Terry Manderson, ICANN’s new Director of DNS operations, and his team for procuring the servers, CAIDA for lending compute capacity to allow Interisle/RTFM to progress their work in the meantime, and the Operations team at ISC, OARC’s hosting provider, for prompt remote hands efforts to get our servers up and running.