This is the virtual conference schedule for PAM 2023.
All times are in Central European Time (CET).
The conference proceedings are available from Springer; all papers are open access from 14 March 2023 until 14 April 2023.
Recordings of the talks will be available on the PAM YouTube channel after the conference.
Program at a glance
|Time|Tuesday, 21 March 2023|Wednesday, 22 March 2023|Thursday, 23 March 2023|
|---|---|---|---|
|14:00|Opening session + Award announcements|Keynote II: Andra Lutu (Telefónica Research)|Security & Privacy (3 papers)|
|14:30|Keynote I: Roya Ensafi (University of Michigan)| | |
|15:00| |Measurement tools (3 papers)|Topology II (2 papers, 40 min)|
|15:30|VPN and Infrastructure (3 papers)| |20 min break|
|16:00| |30 min break|DNS (3 papers)|
|16:30|30 min break|Networks performance (3 papers)| |
|17:00|TLS (3 papers)| |Web (3 papers) + Closing remarks|
|17:30| |Topology I (2 papers) + Wrap up day!| |
|18:00|Applications (2 papers) + Wrap up day!| | |
Tuesday, 21 March 2023
- 14:00 - 14:30 - Opening session + Award announcements - Anna Brunström, Marcel Flores and Marco Fiore
- 14:30 - 15:30 - Keynote I: Protecting Users from Adversarial Networks, Roya Ensafi (University of Michigan) - Session Chair: Marcel Flores (Edgio)
Abstract: The Internet has become a hostile place for users' traffic. Network-based actors, including ISPs and governments, increasingly practice sophisticated forms of censorship, content injection, and traffic throttling, as well as surveillance and other privacy violations. My work attempts to expose these threats and develop technologies to better safeguard users. Detecting and defending against adversarial networks is challenging, especially at global scale, due to the Internet’s vast size and heterogeneity, the powerful capabilities of in-network threat actors, and the lack of ground-truth on the counterfactual traffic that would exist in the absence of interference. Overcoming these challenges requires new techniques and systems, both for collecting and interpreting evidence of hostile networks and for building defensive tools that effectively meet user needs. In this talk, I’ll first cover my approach to monitoring Internet censorship. I introduced an entirely new family of censorship measurement techniques, based on network side-channels, that can remotely detect censorship events occurring between distant pairs of network locations. To overcome the systems and data science challenges of operating these techniques and synthesizing their results into a holistic view of online censorship, my students and I created Censored Planet, a censorship observatory that continuously tests the reachability of thousands of popular or sensitive sites from over 100,000 vantage points in 221 countries. Next, I’ll discuss our efforts to understand and defend the consumer VPN ecosystem. Although millions of end-users rely on VPNs to protect their privacy and security, this multibillion-dollar industry includes numerous snake-oil products, is laxly regulated, and remains severely understudied. To address this, my lab created VPNalyzer, a project that aims to bring transparency and better security to consumer VPNs. Our work includes a cross-platform test suite that crowd-sources VPN security testing, coupled with large-scale user studies that aim to understand the needs and threat models of VPN users.
- 15:30 - 16:30 - VPN and Infrastructure - Session Chair: Matt Calder (Meta / Columbia University)
Measuring the Performance of iCloud Private Relay
Martino Trevisan (University of Trieste), Idilio Drago (University of Turin), Paul Schmitt (University of Hawaii), Francesco Bronzino (École normale supérieure de Lyon)
Abstract: Recent developments in Internet protocols and services aim to provide enhanced security and privacy for user traffic. Apple's iCloud Private Relay is a premier example of this trend, introducing a well-provisioned, multi-hop architecture to protect the privacy of users' traffic while minimizing the traditional drawbacks of additional network hops (e.g., latency). Announced in 2021, the service is currently in the beta stage, offering an easy and cheap privacy-enhancing alternative directly integrated into Apple's operating systems and core applications. This seamless integration makes a future massive adoption of the technology very likely, calling for studies on its impact on the Internet. Indeed, the iCloud Private Relay architecture inherently introduces computational and routing overheads, possibly hampering performance. In this work, we study the service from a performance perspective, across a variety of scenarios and locations. We show that iCloud Private Relay not only reduces speed test performance (up to 10x decrease) but also negatively affects page load time and download/upload throughput in different scenarios. Interestingly, we find that the overlay routing introduced by the service may increase performance in some cases. Our results call for further investigations into the effects of a large-scale deployment of similar multi-hop privacy-enhancing architectures. For increasing the impact of our work and to aid in reproducibility, we contribute our testbed software and measurements to the community.
Characterizing the VPN Ecosystem in the Wild
Aniss Maghsoudlou (Max Planck Institute for Informatics), Lukas Vermeulen (Max Planck Institute for Informatics), Ingmar Poese (BENOCS), Oliver Gasser (Max Planck Institute for Informatics)
Abstract: With the shift to working remotely after the COVID-19 pandemic, the use of Virtual Private Networks (VPNs) around the world has nearly doubled. Therefore, measuring the traffic and security aspects of the VPN ecosystem is more important now than ever. It is, however, challenging to detect and characterize VPN traffic since some VPN protocols use the same port number as web traffic and port-based traffic classification will not help. VPN users are also concerned about the vulnerabilities of their VPN connections due to privacy issues. In this paper, we aim at detecting and characterizing VPN servers in the wild, which facilitates detecting the VPN traffic. To this end, we perform Internet-wide active measurements to find VPN servers in the wild, and characterize them based on their vulnerabilities, certificates, locations, and fingerprinting. We find 9.8M VPN servers distributed around the world using OpenVPN, SSTP, PPTP, and IPsec, and analyze their vulnerability. We find SSTP to be the most vulnerable protocol with more than 90% of detected servers being vulnerable to TLS downgrade attacks. Of all the servers that respond to our VPN probes, 2% also respond to HTTP probes and therefore are classified as Web servers. We apply our list of VPN servers to the traffic from a large European ISP and observe that 2.6% of all traffic is related to these VPN servers.
Stranger VPNs: Investigating the Geo-Unblocking Capabilities of Commercial VPN Providers
Etienne Khan (University of Twente), Anna Sperotto (University of Twente), Jeroen van der Ham (University of Twente & NCSC-NL), Roland van Rijswijk-Deij (University of Twente)
Abstract: Commercial Virtual Private Network (VPN) providers have steadily increased their presence in Internet culture. Their most advertised use cases are preserving the user's privacy, or circumventing censorship. However, a number of VPN providers nowadays have added what they call a streaming unblocking service. In practice, such VPN providers allow their users to access streaming content that Video-on-Demand (VOD) providers do not provide in a specific geographical region. In this work, we investigate the mechanisms by which commercial VPN providers facilitate access to geo-restricted content, de-facto bypassing VPN-detection countermeasures by VOD providers (blocklists). We actively measure the geo-unblocking capabilities of 6 commercial VPN providers in 4 different geographical regions during two measurement periods of 7 and 4 months respectively. Our results identify two methods to circumvent the geo-restriction mechanisms. These methods consist of: (1) specialized ISPs/hosting providers which do not appear on the blocklists used by content providers to geo-restrict content and (2) the use of residential proxies, which due to their nature also do not appear in those blocklists. Our analysis shows that the ecosystem of the geo-unblocking VPN providers is highly dynamic, adapting their chosen geo-unblocking mechanisms not only over time, but also according to different geographical regions.
- 16:30 - 17:00 - Break
- 17:00 - 18:00 - TLS - Session Chair: Oliver Gasser (Max Planck Institute for Informatics)
Exploring the evolution of the TLS certificate ecosystem
Syed Muhammad Farhan (Virginia Tech), Taejoong Chung (Virginia Tech)
Abstract: A vast majority of popular communication protocols for the Internet employ the use of TLS (Transport Layer Security) to secure communication. As a result, there have been numerous efforts including the introduction of Certificate Transparency logs and Free Automated CAs to improve the TLS certificate ecosystem. Our work highlights the effectiveness of these efforts using the Certificate Transparency dataset as well as certificates collected via full IPv4 scans. We show that a large proportion of invalid certificates still exists and outline the reasons why these certificates are invalid and where they are hosted. Additionally, an increasing proportion of certificates are issued by a handful of CAs using a handful of keys. Moreover, we show that the incorrect use of template certificates has led to incorrect SCTs being embedded in the certificates. Taken together, our results emphasize the continued involvement of the research community to improve the web’s PKI ecosystem.
Analysis of TLS Prefiltering for IDS Acceleration
Lukas Sismis (CESNET), Jan Korenek (Faculty of Information Technology, Brno University of Technology)
Abstract: Network intrusion detection systems (IDS) and intrusion prevention systems (IPS) have proven to play a key role in securing networks. However, due to their computational complexity, their deployment is difficult and expensive. Therefore, the IDS is often not powerful enough to handle all network traffic on high-speed network links without uncontrolled packet drop. High-speed packet processing can be achieved using many CPU cores or an appropriate acceleration. But the acceleration has to preserve the detection quality and has to be flexible to handle ever-emerging security threats. One of the common acceleration methods among intrusion detection/prevention systems is the bypass of encrypted packets of the Transport Layer Security (TLS) protocol. This is based on the fact that IDS/IPS cannot match signatures in the encrypted packet payload. The paper provides an analysis and comparison of available TLS bypass solutions and proposes a high-speed encrypted TLS Prefilter for further acceleration. We demonstrate that, using our technique, IDS performance is tripled while at the same time achieving a lower rate of false positives. It is designed as a software-only architecture with support for commodity cards. However, the architecture allows smooth transfer of the proposed method to a HW-based solution in Field-programmable gate array (FPGA) network interface cards (NICs).
DissecTLS: A Scalable Active Scanner for TLS Server Configurations, Capabilities, and TLS Fingerprinting
Markus Sosnowski (Technical University of Munich), Johannes Zirngibl (Technical University of Munich), Patrick Sattler (Technical University of Munich), Georg Carle (Technical University of Munich)
Abstract: Collecting metadata from Transport Layer Security (TLS) servers on a large scale allows to draw conclusions about their capabilities and configuration. This provides not only new insights into the Internet but it enables use cases like detecting malicious Command and Control (C&C) servers. However, active scanners can only observe and interpret the behavior of TLS servers; the underlying configuration and implementation causing the behavior remains hidden. Existing approaches struggle between resource-intensive scans that can reconstruct this data and light-weight fingerprinting approaches that aim to differentiate servers without making any assumptions about their inner working. With this work we propose DissecTLS, an active TLS scanner that is both light-weight enough to be used for Internet measurements and able to reconstruct the configuration and capabilities of the TLS stack. This was achieved by modeling the parameters of the TLS stack and deriving an active scan that dynamically creates scanning probes based on the model and the previous responses from the server. We provide a comparison of five active TLS scanning and fingerprinting approaches in a local testbed and on toplist targets. We conducted a measurement study over 9 weeks to fingerprint C&C servers and to analyze popular and deprecated TLS parameter usage. Similar to related work, the fingerprinting achieved a 99% precision; however, we improved the recall by a factor of 2.8.
- 18:00 - 19:00 - Applications - Session Chair: Simone Ferlin (Red Hat AB, Karlstad University)
A Measurement-Derived Functional Model for the Interaction between Congestion Control and QoE in Video Conferencing
Jia He (Georgia Institute of Technology), Mostafa Ammar (Georgia Institute of Technology), Ellen Zegura (Georgia Institute of Technology)
Abstract: Video Conferencing Applications (VCAs) that support remote work and education have increased in use over the last two years, contributing to Internet bandwidth usage. VCA clients transmit video and audio to each other in peer-to-peer mode or through a bridge known as a Selective Forwarding Unit (SFU). Popular VCAs implement congestion control in the application layer over UDP and accomplish rate adjustment through video rate control, ultimately affecting end user Quality of Experience (QoE). Researchers have reported on the throughput and video metric performance of specific VCAs using structured experiments. Yet prior work rarely examines the interaction between congestion control mechanisms and rate adjustment techniques that produces the observed throughput and QoE metrics. Understanding this interaction at a functional level paves the way to explain observed performance, to pinpoint commonalities and key functional differences across VCAs, and to contemplate opportunities for innovation. To that end, we first design and conduct detailed measurements of three VCAs (WebRTC/Jitsi, Zoom, BlueJeans) to develop understanding of their congestion and video rate control mechanisms. We then use the measurement results to derive our functional models for the VCA client and SFU. Our models reveal the complexity of these systems and demonstrate how, despite some uniformity in function deployment, there is significant variability among the VCAs in the implementation of these functions.
Effects of Political Bias and Reliability on Temporal User Engagement with News Articles Shared on Facebook
Alireza Mohammadinodooshan (Linköping University), Niklas Carlsson (Linköping University)
Abstract: The reliability and political bias differ substantially between news articles published on the Internet. Recent research has examined how these two variables impact user engagement on Facebook, reflected by measures like the volume of shares, likes, and other interactions. However, most of this research is based on the ratings of publishers (not news articles), considers only bias or reliability (not combined), focuses on a limited set of user interactions, and ignores the users' engagement dynamics over time. To address these shortcomings, this paper presents a temporal study of user interactions with a large set of labeled news articles capturing the temporal user engagement dynamics, bias, and reliability ratings of each news article. For the analysis, we use the public Facebook posts sharing these articles and all user interactions observed over time for those posts. Using a broad range of bias/reliability categories, we then study how the bias and reliability of news articles impact users' engagement and how it changes as posts become older. Our findings show that the temporal interaction level is best captured when bias, reliability, time, and interaction type are evaluated jointly. We highlight many statistically significant disparities in the temporal engagement patterns (as seen across several interaction types) for different bias-reliability categories. The shared insights into these differences in engagement dynamics are expected to benefit both publishers and content moderators. For example, publishers may want to augment their prediction models that encompass temporal aspects to more accurately predict user engagement, and moderators can adjust their efforts based on the category-dependent stages of the posts' lifecycle.
Wednesday, 22 March 2023
- 14:00 - 15:00 - Keynote II: Deep Dive into Interconnection in the Mobile Ecosystem, Andra Lutu (Telefónica Research) - Session Chair: Anna Brunstrom (Karlstad University)
Abstract: The IP eXchange (IPX) Network interconnects about 800 Mobile Network Operators (MNOs) worldwide and a range of other service providers (such as cloud and content providers) to enable global data roaming. Global roaming now supports the fast growth of the Internet of Things (IoT), while it also responds to the insatiable demand coming from digital nomads, who adhere to a lifestyle where they connect from anywhere in the world. In this talk, we’ll take a first look into this so-far opaque mobile ecosystem, and present a first-of-its-kind in-depth analysis of an operational IPX Provider (IPX-P). The IPX Network is a private network formed by a small set of tightly interconnected IPX-Ps. We analyze an operational dataset from a large IPX-P that includes BGP data as well as statistics from signaling. We shed light on the structure of the IPX Network as well as on the temporal, structural and geographic features of the IPX traffic. Our results are a first step to fully understand the global mobile Internet, especially since it now represents a pivotal part in connecting IoT devices and digital nomads all over the world. Finally, we discuss the different operator models, the limitations of current “global” operators on the market that leverage the IPX Network, and how we envision the next generation global operator model.
- 15:00 - 16:00 - Measurement tools - Session Chair: Vasileios Giotsas (Lancaster University)
Efficient continuous latency monitoring with eBPF
Simon Sundberg (Karlstad University), Anna Brunstrom (Karlstad University), Simone Ferlin-Reiter (Red Hat), Toke Høiland-Jørgensen (Red Hat), Jesper Dangaard Brouer (Red Hat)
Abstract: Network latency is a critical factor for the perceived quality of experience for many applications. With an increasing focus on interactive and real-time applications, which require reliable and low latency, the ability to continuously and efficiently monitor latency is becoming more important than ever. Always-on passive monitoring of latency can provide continuous latency metrics without injecting any traffic into the network. However, software-based monitoring tools often struggle to keep up with traffic as packet rates increase, especially on contemporary multi-Gbps interfaces. We investigate the feasibility of using eBPF to enable efficient passive network latency monitoring by implementing an evolved Passive Ping (ePPing). Our evaluation shows that ePPing delivers accurate RTT measurements and can handle over 1 Mpps, or correspondingly over 10 Gbps, on a single core, greatly improving on state-of-the-art software-based solutions, such as PPing.
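ePPing itself is implemented in eBPF; purely as an illustration of the underlying passive-ping idea, the following hypothetical Python sketch (function and packet format are not from the paper) matches each outgoing TCP timestamp value (TSval) against its echo (TSecr) on incoming packets to produce RTT samples without injecting any traffic:

```python
def passive_rtt(packets):
    """Derive passive RTT samples from observed TCP timestamp options.

    packets: ordered (time_ms, direction, tsval, tsecr) tuples, where
    direction is "out" or "in". Returns a list of RTT samples in ms.
    """
    sent = {}   # TSval -> time we first saw it on an outgoing packet
    rtts = []
    for t, direction, tsval, tsecr in packets:
        if direction == "out":
            # Only the first occurrence of a TSval counts, so repeated
            # packets carrying the same TSval do not shrink the sample.
            sent.setdefault(tsval, t)
        elif tsecr in sent:
            rtts.append(t - sent.pop(tsecr))
    return rtts

# One request/response exchange in each direction:
pkts = [(0, "out", 100, 0), (30, "in", 500, 100),
        (40, "out", 101, 500), (65, "in", 501, 101)]
print(passive_rtt(pkts))  # → [30, 25]
```

A real implementation keys the state per flow and, as the paper stresses, must do this bookkeeping fast enough to keep up with multi-Gbps line rates, which is why ePPing hooks into the kernel with eBPF.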
Back-to-the-Future Whois: An IP Address Attribution Service for Working with Historic Datasets
Florian Streibelt (Max Planck Institute for Informatics), Martina Lindorfer (TU Wien), Seda Gürses (TU Delft), Carlos H. Gañán (TU Delft), Tobias Fiebig (Max Planck Institute for Informatics)
Abstract: Researchers and practitioners often face the issue of having to attribute an IP address to an organization. For current data this is comparably easy, using services like whois or other databases. Similarly, for historic data, several entities like the RIPE NCC maintain websites that provide access to historic records. For large-scale network measurement work, though, researchers often have to attribute millions of addresses. For current data, Team Cymru provides a bulk whois service which allows bulk address attribution. However, at the time of writing, there is no service available that allows historic bulk attribution of IP addresses. Hence, in this paper, we introduce and evaluate our ‘Back-to-the-Future whois’ service, allowing historic bulk attribution of IP addresses on a daily granularity based on CAIDA Routeviews aggregates. We provide this service to the community for free, and also share our implementation so researchers can run instances themselves.
Towards diagnosing accurately the performance bottleneck of software-based network function implementation
Ru Jia (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Sorbonne University), Heng Pan (Institute of Computing Technology, Chinese Academy of Sciences; Purple Mountain Laboratories), Haiyang Jiang (Institute of Computing Technology, Chinese Academy of Sciences), Serge Fdida (Sorbonne University), Gaogang Xie (Computer Network Information Center, Chinese Academy of Sciences)
Abstract: Software-based Network Functions (NFs) improve the flexibility of network services. Compared with hardware, NFs have specific behavioral characteristics. Performance diagnosis is the first and most difficult step during NFs' performance optimization. Do the existing instrumentation-based and sampling-based performance diagnosis methods work well in the NF scenario? In this paper, we first re-think the challenges of NF performance diagnosis and correspondingly propose three requirements: fine granularity, flexibility, and freedom from perturbation. We investigate existing methods and find that none of them can simultaneously meet these requirements. We propose a quantitative indicator, the Coefficient of Interference (CoI): the fluctuation between per-packet latency measurements with and without performance diagnosis, which represents the degree of performance perturbation caused by the diagnosis process. We measure the CoI of typical performance diagnosis tools with different types of NFs and find that the perturbation caused by instrumentation-based diagnosis solutions is 7.39% to 74.31% of that caused by sampling-based solutions. On this basis, we propose a hybrid NF performance diagnosis method to trace the performance bottleneck of an NF accurately.
- 16:00 - 16:30 - Break
- 16:30 - 17:30 - Networks performance - Session Chair: Andrea Morichetta (Vienna University of Technology)
Evaluation of the ProgHW/SW Architectural Design Space of Bandwidth Estimation
Tianqi Fang (University of Nebraska-Lincoln), Lisong Xu (University of Nebraska-Lincoln), Witawas Srisa-an (University of Nebraska-Lincoln), Jay Patel (University of Nebraska-Lincoln)
Abstract: Bandwidth estimation (BWE) is a fundamental functionality in congestion control, load balancing, and many network applications. Therefore, researchers have conducted numerous BWE evaluations to improve its estimation accuracy. Most current evaluations focus on the algorithmic aspects or network conditions of BWE. However, as the architectural aspects of BWE gradually become the bottleneck in multi-gigabit networks, many solutions derived from current works fail to provide satisfactory performance. In contrast, this paper focuses on the architectural aspects of BWE in the current trend of programmable hardware (ProgHW) and software (SW) co-designs. Our work makes several new findings to improve BWE accuracy from the architectural perspective. For instance, we show that offloading components that can directly affect inter-packet delay (IPD) is an effective way to improve BWE accuracy. In addition, to handle architectural deployment difficulties not encountered in past studies, we propose a modularization method to increase evaluation efficiency.
An In-Depth Measurement Analysis of 5G mmWave PHY Latency and its Impact on End-to-End Delay
Rostand A. K. Fezeu (University of Minnesota - Twin Cities), Eman Ramadan (University of Minnesota - Twin Cities), Wei Ye, Benjamin Minneci (University of Minnesota - Twin Cities), Jack Xie (University of Minnesota - Twin Cities), Arvind Narayanan (University of Minnesota - Twin Cities), Ahmad Hassan (University of Minnesota - Twin Cities), Feng Qian (University of Minnesota - Twin Cities), Zhi-Li Zhang (University of Minnesota - Twin Cities), Jaideep Chandrashekar (InterDigital), Myungjin Lee (Cisco Systems)
Abstract: 5G aims to offer not only significantly higher throughput than previous generations of cellular networks, but also promises milli-second (ms) and sub-ms (ultra-)low latency support at the 5G physical (PHY) layer for future applications. While prior measurement studies have confirmed that commercial 5G deployments can achieve up to several Gigabits per second (Gbps) throughput (especially with the mmWave 5G radio), are they able to deliver on the (sub) milli-second latency promise? With this question in mind, we conduct to our knowledge the first in-depth measurement study of 5G mmWave PHY latency using detailed physical channel events and messages. Through carefully designed experiments and data analytics, we dissect various factors that influence 5G PHY latency of both downlink and uplink data transmissions, and explore their impacts on end-to-end delay. We find that while under the best cases, the 5G (mmWave) PHY layer is capable of delivering ms/sub-ms latency (with a minimum of 0.09 ms for downlink and 0.76 ms for uplink), these happen rarely. A variety of factors such as channel conditions, re-transmissions, physical layer control and scheduling mechanisms, mobility, and application (edge) server placement can all contribute to increased 5G PHY latency (and thus end-to-end delay). Our study provides insights to 5G vendors, carriers as well as application developers/content providers on how to better optimize or mitigate these factors for improved 5G latency performance.
A Characterization of Route Variability in LEO Satellite Networks
Vaibhav Bhosale (Georgia Institute of Technology), Ahmed Saeed (Georgia Institute of Technology), Ketan Bhardwaj (Georgia Institute of Technology), Ada Gavrilovska (Georgia Institute of Technology)
Abstract: LEO satellite networks possess highly dynamic topologies, with satellites moving at 27,000 km/hour to maintain their orbit. As satellites move, the characteristics of the satellite network routes change, triggering rerouting events. Frequent rerouting can cause poor performance for path-adaptive algorithms (e.g., congestion control). In this paper, we provide a thorough characterization of route variability in LEO satellite networks, focusing on route churn and RTT variability. We show that high route churn is common, with most paths used for less than half of their lifetime, and some paths used for just a few seconds. This churn is also largely unnecessary, with rerouting leading to marginal gains in most cases (e.g., less than a 15% reduction in RTT). Moreover, we show that the high route churn is harmful to network utilization and congestion control performance. By examining RTT variability, we find that the smallest achievable RTT between two ground stations can increase by 2.5× as satellites move in their orbits. We show that the magnitude of RTT variability depends on the location of the communicating ground stations, exhibiting a spatial structure. Finally, we show that adding more satellites, and providing more routes between stations, does not necessarily reduce route variability. Rather, constellation configuration (i.e., the number of orbits and their inclination) plays a more significant role. We hope that the findings of this study will help with designing more robust routing algorithms for LEO satellite networks.
- 17:30 - 18:30 - Topology I + Wrap up day! - Session Chair: Lars Prehn (Max Planck Institute for Informatics)
Improving the Inference of Sibling Autonomous Systems (Best Community Artifact)
Zhiyi Chen (Georgia Institute of Technology), Zachary S. Bischof (Georgia Institute of Technology), Cecilia Testart (Georgia Institute of Technology), Alberto Dainotti (Georgia Institute of Technology)
Abstract: Mapping Autonomous Systems (AS) to the owner organizations is critical to connect AS-level and organization-level research. Unfortunately, constructing an accurate dataset of AS-to-organization mappings is difficult due to a lack of ground truth information. CAIDA AS-to-organization (CA2O), the current state-of-the-art dataset, relies heavily on Whois databases maintained by Regional Internet Registries (RIRs) to infer the AS-to-organization mappings. However, inaccuracies in Whois data can dramatically impact the accuracy of CA2O, particularly on inferences of ASes owned by the same organization (sibling ASes). In this work, we leverage PeeringDB (PDB) as an additional data source to detect the potential errors of sibling relations in CA2O. By conducting a meticulous semi-manual investigation, we discover that the sources of inaccuracies in CA2O are two pitfalls of Whois data, and we systematically analyze how the pitfalls jointly influence CA2O. We also build an improved dataset on sibling relations, which corrects mappings of 12.5% of CA2O organizations with sibling ASes (1,028 CA2O organizations, associated with 3,772 ASNs). To make the process more scalable, we design an automatic approach to reproduce our manually-built dataset with high fidelity. The approach is able to automatically improve inferences of sibling ASes for each new version of CA2O.
A Global Measurement of Routing Loops on the Internet
Abdulrahman Alaraj (University of Colorado Boulder), Kevin Bock (University of Maryland), Dave Levin (University of Maryland), Eric Wustrow (University of Colorado Boulder)
Abstract: Persistent routing loops on the Internet are a common misconfiguration that can lead to packet loss, reliability issues, and can even exacerbate denial of service attacks. Unfortunately, obtaining a global view of routing loops is difficult. Distributed traceroute datasets from many vantage points can be used to find instances of routing loops, but they are typically sparse in the number of destinations they probe. In this paper, we perform high-TTL traceroutes to the entire IPv4 Internet from a single vantage point in order to enumerate routing loops, and validate our results from a different vantage point. Our datasets contain traceroutes to two orders of magnitude more destinations than prior approaches that traceroute one IP per /24. Our results reveal over 24 million IP addresses with persistent routing loops on path, or approximately 0.6% of the IPv4 address space. We analyze the root causes of these loops and uncover previously unknown types of loops. We also shed new light on their potential impact on the Internet. We find over 320k /24 subnets with at least one routing loop present. In contrast, sending traceroutes only to the .1 address in each /24 (as prior approaches have done) finds only 26.5% of these looping subnets. Our findings complement prior, more distributed approaches by providing a more complete view of routing loops in the Internet. To further assist in future work, we made our data publicly available.
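In a high-TTL traceroute, a loop surfaces as a hop address that recurs along the recorded path. As a minimal sketch of that classification step (a hypothetical helper, not the authors' tooling; real analysis must also rule out routers that merely answer for several TTLs), one cycle of a loop can be extracted like this:

```python
from collections import Counter

def find_routing_loop(hops, min_repeats=3):
    """Return one cycle of a routing loop found in a traceroute path,
    or None if no hop recurs. A loop is reported when some responding
    hop address appears at least `min_repeats` times; "*" marks
    non-responding hops and is ignored."""
    counts = Counter(h for h in hops if h != "*")
    for hop, seen in counts.items():
        if seen >= min_repeats:
            first = hops.index(hop)
            second = hops.index(hop, first + 1)
            return hops[first:second]  # hops between two occurrences
    return None

# A forwarding loop between two routers shows up as alternating hops:
trace = ["10.0.0.1", "10.0.0.2", "203.0.113.1", "203.0.113.2",
         "203.0.113.1", "203.0.113.2", "203.0.113.1"]
print(find_routing_loop(trace))  # → ['203.0.113.1', '203.0.113.2']
```

Requiring several repetitions (rather than a single recurrence) is one simple way to favor persistent loops over transient routing changes during the probe.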
Thursday, 23 March 2023
- 14:00 - 15:00 - Security and privacy - Session Chair: Giovane Moura (SIDN Labs and TU Delft)
Intercept and Inject: DNS Response Manipulation in the Wild
Yevheniya Nosyk (Université Grenoble Alpes), Qasim Lone (RIPE NCC), Yury Zhauniarovich (TU Delft), Carlos H. Gañán (TU Delft), Emile Aben (RIPE NCC), Giovane C. M. Moura (SIDN Labs and TU Delft), Samaneh Tajalizadehkhoob (ICANN), Andrzej Duda (Grenoble INP), Maciej Korczyński (Université Grenoble Alpes)
Abstract: DNS is a protocol responsible for translating human-readable domain names into IP addresses. Despite being essential for many Internet services to work properly, it is inherently vulnerable to manipulation. In November 2021, users from Mexico received bogus DNS responses when resolving whatsapp.net. It appeared that a BGP route leak diverted DNS queries to the local instance of the k-root located in China. Those queries, in turn, encountered middleboxes that injected fake DNS responses. In this paper, we analyze that event from the RIPE Atlas point of view and observe that its impact was more significant than initially thought - the Chinese root server instance was reachable from at least 15 countries several months before being reported. We then launch a nine-month longitudinal measurement campaign using RIPE Atlas probes and locate 11 probes outside China reaching the same instance, although this time over IPv6. More broadly, motivated by the November 2021 event, we study the extent of DNS response injection when contacting root servers. While less than 1% of queries are impacted, they originate from 7% of RIPE Atlas probes in 66 countries. We conclude by discussing several countermeasures that limit the probability of DNS manipulation.
A First Look at Brand Indicators for Message Identification (BIMI)
Masanori Yajima (Waseda University), Daiki Chiba (NTT Security), Yoshiro Yoneya (JPRS), Tatsuya Mori (Waseda University/NICT/RIKEN AIP)
Abstract: As promising approaches to thwarting the damage caused by phishing emails, DNS-based email security mechanisms such as the Sender Policy Framework (SPF), Domain-based Message Authentication, Reporting & Conformance (DMARC), and DNS-based Authentication of Named Entities (DANE) have been proposed and widely adopted. Nevertheless, the number of victims of phishing emails continues to increase, suggesting that there should be a mechanism for supporting end-users in correctly distinguishing such emails from legitimate ones. To address this problem, the standardization of Brand Indicators for Message Identification (BIMI) is underway. BIMI is a mechanism that helps an email recipient visually distinguish between legitimate and phishing emails. With Google officially supporting BIMI in July 2021, the approach shows signs of spreading worldwide. Against this backdrop, we conduct an extensive measurement of the adoption of BIMI and its configuration. The results of our measurement study revealed that, as of November 2022, 3,538 of the one million most popular domain names had a BIMI record set, whereas only 396 (11%) of the BIMI-enabled domain names had valid logo images and verified mark certificates. The study also revealed several misconfigurations in such logo images and certificates.
A Second Look at DNS QNAME Minimization
Jonathan Magnusson (Karlstad University), Moritz Müller (SIDN Labs), Anna Brunström (Karlstad University), Tobias Pulls (Karlstad University)
Abstract: The Domain Name System (DNS) is a critical Internet infrastructure that translates human-readable domain names to IP addresses. It was originally designed over 35 years ago, and multiple enhancements have since been made, in particular to make DNS lookups more secure and privacy-preserving. Query name minimization (qmin) was introduced in 2016 to limit the exposure of query names sent across the DNS and thereby enhance privacy. In this paper, we take a look at the adoption of qmin, building upon and extending measurements made by De Vries et al. in 2018. We analyze qmin adoption on the Internet using active measurements, both on resolvers used by RIPE Atlas probes and on open resolvers. Aside from adding more vantage points when measuring qmin adoption on open resolvers, we also increase the number of repetitions, which reveals conflicting resolvers -- resolvers that support qmin for some queries but not for others. For the passive measurements at root and Top-Level Domain (TLD) name servers, we extend the analysis over a longer period of time, introduce additional sources, and filter out non-valid queries. Furthermore, our controlled experiments measure the performance and result quality of newer versions of the qmin-enabled open-source resolvers used in the previous study, with the addition of PowerDNS. Our results, using extended methods from previous work, show that the adoption of qmin has significantly increased since 2018. New controlled experiments also show a trend toward a higher number of packets used by resolvers and lower error rates in DNS queries. Since qmin is a balance between performance and privacy, we further discuss the depth limit for minimizing labels and propose the use of a public suffix list for setting this limit.
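The minimization that qmin performs is easy to illustrate: instead of sending the full query name to every name server in the delegation chain, a qmin-enabled resolver reveals one additional label per step. A minimal sketch (illustrative only; real resolvers also cap the number of minimized labels, which is the depth limit discussed above):

```python
def qmin_query_names(fqdn):
    """Return the sequence of query names a qmin-enabled resolver
    would send while walking the delegation chain for `fqdn`,
    exposing one more label at each step (root -> TLD -> ...)."""
    labels = fqdn.rstrip(".").split(".")
    # Build suffixes of increasing length: "com", "example.com", ...
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]
```

A non-qmin resolver would instead send the full name, e.g. `a.b.example.com`, to the root, `com`, and `example.com` servers alike, exposing every label at every level.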
- 15:00 - 15:40 - Topology II - Session Chair: Oliver Hohlfeld (University of Kassel)
as2org+: Enriching AS-to-Organization Mappings with PeeringDB
Augusto Arturi (Universidad de Buenos Aires), Esteban Carisimo (Northwestern University), Fabian Bustamante (Northwestern University)
Abstract: An organization-level topology of the Internet is a valuable resource with uses that range from the study of organizations' footprints and Internet centralization trends to analysis of the dynamics of the Internet's corporate structures as a result of (de)mergers and acquisitions. Current approaches to infer this topology rely exclusively on WHOIS databases and are thus impacted by their limitations, including errors and outdated data. We argue that a collaborative, operator-oriented database such as PeeringDB can bring a complementary perspective to the legally bound information available in WHOIS records. We present as2org+, a new framework that leverages self-reported information available on PeeringDB to boost state-of-the-art WHOIS-based methodologies. We discuss the challenges and opportunities of using PeeringDB records for AS-to-organization mappings, present the design of as2org+, and demonstrate its value by identifying companies operating across multiple continents, as well as mergers and acquisitions, over a five-year period.
RPKI Time-of-Flight: Tracking Delays in the Management, Control, and Data Planes
Romain Fontugne (IIJ Research Lab), Amreesh Phokeer (Internet Society), Cristel Pelsser (UCLouvain), Kevin Vermeulen (LAAS-CNRS), Randy Bush (Internet Initiative Japan & Arrcus Inc)
Abstract: As RPKI is becoming part of ISPs' daily operations and Route Origin Validation is getting widely deployed, one wonders how long it takes for the effect of RPKI changes to appear in the data plane. Does an operator that adds, fixes, or removes a Route Origin Authorization (ROA) have time to brew coffee, or rather enjoy a long meal, before the Internet routing infrastructure integrates the new information and the operator can assess the changes and resume work? The chain of ROA publication, from creation at Certification Authorities all the way to the routers and the effect on the data plane, involves a large number of players, is not instantaneous, and is often dominated by ad hoc administrative decisions. This is the first comprehensive study to measure the entire ecosystem of ROA manipulation by all five Regional Internet Registries (RIRs); propagation on the management plane to Relying Parties (RPs) and to routers; the effect on BGP as seen by global control-plane monitors; and finally the effects on data-plane latency and reachability. We found that RIRs usually publish new RPKI information within five minutes, except APNIC, which averages ten minutes slower. We observe significant disparities in ISPs' reaction time to new RPKI information, ranging from a few minutes to one hour. The delay for ROA deletion is significantly longer than for ROA creation, as RPs and BGP strive to maintain reachability. Incidentally, we found and reported significant issues in the management plane of two RIRs and a Tier-1 network.
- 15:40 - 16:00 - Break
- 16:00 - 17:00 - DNS - Session Chair: Tijay Chung (Virginia Tech)
How Ready Is DNS for an IPv6-Only World? (Best Paper Award)
Florian Streibelt (Max Planck Institute for Informatics), Patrick Sattler (Technical University of Munich), Franziska Lichtblau (Max Planck Institute for Informatics), Carlos H. Gañán (Delft University of Technology), Anja Feldmann (Max Planck Institute for Informatics), Oliver Gasser (Max Planck Institute for Informatics), Tobias Fiebig (Max Planck Institute for Informatics)
Abstract: DNS is one of the core building blocks of the Internet. In this paper, we investigate DNS resolution in a strict IPv6-only scenario and find that a substantial fraction of zones cannot be resolved. We point out that the presence of an AAAA resource record for a zone's nameserver does not necessarily imply that it is resolvable in an IPv6-only environment, since the full DNS delegation chain must resolve via IPv6 as well. Hence, in an IPv6-only setting, zones may experience an effect similar to what is commonly referred to as lame delegation. Our longitudinal study shows that the continuing centralization of the Internet has a large impact on IPv6 readiness, i.e., a small number of large DNS providers has influenced, and still can influence, IPv6 readiness for a large number of zones. A single operator that enabled IPv6 DNS resolution (by adding IPv6 glue records) was responsible for around 15.5% of all non-resolving zones in our dataset until January 2017. Even today, 10% of DNS operators are responsible for more than 97.5% of all zones that do not resolve using IPv6.
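The condition the paper highlights, that every step of the delegation chain must be reachable over IPv6, can be expressed compactly. A sketch under simplified assumptions (`chain` and `aaaa` are hypothetical inputs standing in for data that would come from actual DNS lookups):

```python
def resolvable_ipv6_only(chain, aaaa):
    """chain: list of (zone, nameserver hostnames) pairs from the
    root down to the target zone; aaaa: set of nameserver hostnames
    that have an AAAA record (with IPv6 glue where required).
    A zone resolves in an IPv6-only world only if *every* delegation
    step offers at least one IPv6-reachable nameserver -- a single
    IPv4-only link in the chain breaks resolution."""
    return all(any(ns in aaaa for ns in nses) for _zone, nses in chain)
```

This is why an AAAA record on the target zone's own nameserver is not enough: if, say, the parent zone's servers are IPv4-only, the chain still fails in an IPv6-only environment.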
TTL Violation of DNS Resolvers in the Wild
Protick Bhowmick (Virginia Tech), Mohammad Ishtiaq Ashiq Khan (Virginia Tech), Casey Deccio (Brigham Young University), Taejoong Chung (Virginia Tech)
Abstract: The Domain Name System (DNS) provides a scalable name resolution service. It uses extensive caching to improve its resiliency and performance; every DNS record contains a time-to-live (TTL) value, which specifies how long a DNS record can be cached before being discarded. Since the TTL can play an important role in both DNS security (e.g., determining a DNSSEC-signed response's caching period) and performance (e.g., responsiveness of CDN-controlled domains), it is crucial to measure and understand how resolvers violate TTLs. Unfortunately, measuring how DNS resolvers manage TTLs at scale remains difficult, since it usually requires the cooperation of many nodes spread across the globe. In this paper, we present a methodology that measures TTL-violating resolvers at scale using an HTTP/S proxy service called BrightData, which allows us to cover more than 27K resolvers in 9.5K ASes. Out of the 8,524 resolvers that we could measure through at least five different vantage points, we find that 8.74% of them extend the TTL arbitrarily, which can potentially degrade the performance of at least 38% of the popular websites that use CDNs. We also report that 43.1% of DNSSEC-validating resolvers incorrectly serve DNSSEC-signed responses from the cache even after their RRSIGs have expired.
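The violation being measured can be phrased as a simple predicate over repeated cache probes. An illustrative check, not the paper's exact methodology (`samples` is assumed to hold responses already identified as cache hits for the same record):

```python
def extends_ttl(auth_ttl, samples):
    """auth_ttl: TTL set by the authoritative server (seconds).
    samples: (seconds_since_first_cached_answer, ttl_in_response)
    pairs for responses known to be served from the resolver cache.
    A resolver extends the TTL if it either reports a TTL larger
    than the authoritative one, or keeps serving the cached record
    past its expiry time."""
    return any(ttl > auth_ttl or elapsed > auth_ttl
               for elapsed, ttl in samples)
```

For a record with a 300-second authoritative TTL, a cached answer observed 400 seconds after it first entered the cache would flag the resolver, regardless of the TTL value it reports.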
Operational Domain Name Classification: From Automatic Ground Truth Generation to Adaptation to Missing Values
Jan Bayer (Université Grenoble Alpes, CNRS, Grenoble INP, LIG), Ben Chukwuemeka Benjamin (Université Grenoble Alpes, CNRS, Grenoble INP, LIG), Sourena Maroofi (KOR Labs Cybersecurity), Thymen Wabeke (SIDN Labs), Cristian Hesselman (SIDN Labs, University of Twente), Andrzej Duda (Université Grenoble Alpes, CNRS, Grenoble INP, LIG), Maciej Korczyński (Université Grenoble Alpes, CNRS, Grenoble INP, LIG)
Abstract: With more than 350 million active domain names and at least 200,000 newly registered domains per day, it is technically and economically challenging for Internet intermediaries involved in domain registration and hosting to monitor them and accurately assess whether they are benign, likely registered with malicious intent, or have been compromised. This observation motivates the design and deployment of automated approaches to support investigators in preventing or effectively mitigating security threats. However, building a domain name classification system suitable for deployment in an operational environment requires meticulous design: from feature engineering and acquiring the underlying data to handling missing values resulting from, for example, data collection errors. The design flaws in some existing systems make them unsuitable for such usage despite their high theoretical accuracy. Even worse, they may lead to erroneous decisions, for example by registrars, such as suspending a benign domain name that has been compromised at the website level, causing collateral damage to the legitimate registrant and website visitors. In this paper, we propose novel approaches to designing domain name classifiers that overcome the shortcomings of some existing systems.
We validate these approaches with a prototype based on the COMAR (COmpromised versus MAliciously Registered domains) system focusing on its careful design, automated and reliable ground truth generation, feature selection, and the analysis of the extent of missing values. First, our classifier takes advantage of automatically generated ground truth based on publicly available domain name registration data. We then generate a large number of machine-learning models, each dedicated to handling a set of missing features: if we need to classify a domain name with a given set of missing values, we use the model without the missing feature set, thus allowing classification based on all other features. We estimate the importance of features using scatter plots and analyze the extent of missing values due to measurement errors. Finally, we apply the COMAR classifier to unlabeled phishing URLs and find, among other things, that 73% of corresponding domain names are maliciously registered. In comparison, only 27% are benign domains hosting malicious websites. The proposed system has been deployed at two ccTLD registry operators to support their anti-fraud practices.
- 17:00 - 18:30 - Web + Closing remarks - Session Chair: Shuai Hao (Old Dominion)
A First Look at Third-Party Service Dependencies of Web Services in Africa
Aqsa Kashaf (Carnegie Mellon University), Jiachen Dou (Carnegie Mellon University), Margarita Belova (Princeton University), Maria Apostolaki (Princeton University), Yuvraj Agarwal (Carnegie Mellon University), Vyas Sekar (Carnegie Mellon University)
Abstract: Third-party dependencies expose websites to shared risks and cascading failures. These dependencies affect African websites as well, e.g., the Afrihost outage in 2022. While the prevalence of third-party dependencies has been studied for globally popular websites, Africa is largely underrepresented in those studies. Hence, in this work, we analyze the prevalence of third-party infrastructure dependencies for Africa-centric websites from 4 African vantage points. We consider websites that fall into one of four categories: Africa-visited (popular in Africa), Africa-hosted (hosted in Africa), Africa-dominant (targeted toward users in Africa), and Africa-operated (operated in Africa). Our key findings are: 1) 93% of the Africa-visited websites critically depend on a third-party DNS, CDN, or CA provider. For perspective, US-visited websites are up to 25% less critically dependent. 2) 97% of Africa-dominant, 96% of Africa-hosted, and 95% of Africa-operated websites are critically dependent on a third-party DNS, CDN, or CA provider. 3) The use of third-party services is concentrated: just 3 providers can affect 60% of the Africa-centric websites. Our findings have key implications for the present usage, and recommendations for the future evolution, of the Internet in Africa.
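The notion of critical dependence used in this line of work (a site relies exclusively on third-party providers for a service, leaving no first-party fallback) can be sketched as follows (hypothetical helper and inputs; the paper's actual classification pipeline is more involved):

```python
def critical_dependencies(providers, first_party):
    """providers: mapping of service type ("DNS", "CDN", "CA") to
    the set of provider names a website uses for that service.
    first_party: provider names operated by the website itself.
    A site is critically dependent on a service when it uses at
    least one provider for it and all of them are third-party,
    so an outage at those providers can take the site down."""
    return {svc: bool(provs) and provs.isdisjoint(first_party)
            for svc, provs in providers.items()}
```

Under this definition, a site that runs its own CDN alongside a commercial one is not critically dependent on CDNs, while a site whose only DNS provider is external is.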
Exploring the Cookieverse: A Multi-Perspective Analysis of Web Cookies
Ali Rasaii (Max Planck Institute for Informatics), Shivani Singh (New York University), Devashish Gosain (KU Leuven / Max Planck Institute for Informatics), Oliver Gasser (Max Planck Institute for Informatics)
Abstract: Web cookies have been the subject of many research studies over the last few years. However, most existing research does not consider several crucial perspectives that can influence the cookie landscape, such as the client's location, the impact of cookie banner interaction, and the operating system from which a website is visited. In this paper, we conduct a comprehensive measurement study to analyze the cookie landscape for the Tranco top-10k websites from different geographic locations, covering multiple such perspectives. One important factor that influences cookies is the use of cookie banners. We develop a tool, BannerClick, to automatically detect, accept, and reject cookie banners with an accuracy of 99%, 97%, and 87%, respectively. We find banners to be 56% more prevalent when visiting websites from within the EU region. Moreover, we analyze the effect of banner interaction on different types of cookies (i.e., first-party, third-party, and tracking). For instance, we observe that websites send, on average, 5.5x more third-party cookies after clicking "accept", underlining that it is critical to interact with banners when performing Web measurements. Additionally, we analyze statistical consistency, evaluate the widespread deployment of consent management platforms, compare landing to inner pages, and assess the impact of visiting a website on a desktop compared to a mobile phone. Our study highlights that all of these factors substantially impact the cookie landscape, and thus a multi-perspective approach should be taken when performing Web measurement studies.
Quantifying User Password Exposure to Third-Party CDNs
Rui Xin (Duke University), Shihan Lin (Duke University), Xiaowei Yang (Duke University)
Abstract: Web services commonly employ Content Distribution Networks (CDNs) for performance and security. As web traffic moves toward 100% HTTPS, more and more websites allow CDNs to terminate their HTTPS connections. This practice may expose users' sensitive information, such as login passwords, to a third-party CDN. In this paper, we measure and quantify the extent of user password exposure to third-party CDNs. We find that among the Alexa top 50K websites, at least 12,451 use CDNs and contain user login entrances. Among those websites, 33% expose users' passwords to the CDNs, and a popular CDN may observe passwords from more than 40% of its customers. This result suggests that if a CDN's infrastructure has a vulnerability or suffers an insider attack, many users' accounts will be at risk. Assuming the attacker is a passive eavesdropper, a website can avoid this exposure by encrypting users' passwords within HTTPS connections. Our measurement shows that less than 17% of the websites adopt this countermeasure.