Usenixsecurity24 Ramesh

The paper presents CalcuLatency, a solution that utilizes cross-layer network latency measurements to detect proxy-enabled abuse in privacy-focused ad delivery systems. By analyzing round-trip time differences between application-layer and network-layer connections, the system can identify users employing VPNs or proxies without compromising their privacy. The evaluation shows that CalcuLatency effectively distinguishes between regular users and those attempting to game the reward system, achieving low false positive and negative rates.

CalcuLatency: Leveraging Cross-Layer Network Latency Measurements to Detect Proxy-Enabled Abuse


Reethika Ramesh, University of Michigan; Philipp Winter, Independent;
Sam Korman and Roya Ensafi, University of Michigan

This paper is included in the Proceedings of the 33rd USENIX Security Symposium.
August 14–16, 2024 • Philadelphia, PA, USA
978-1-939133-44-1

Open access to the Proceedings of the 33rd USENIX Security Symposium is sponsored by USENIX.
Abstract

Efforts from emerging technology companies aim to democratize the ad delivery ecosystem and build systems that are privacy-centric and even share ad revenue benefits with their users. Other providers offer remuneration for users on their platform for interacting with and making use of services. But these efforts may suffer from coordinated abuse efforts aiming to defraud them. Attackers can use VPNs and proxies to fabricate their geolocation and earn disproportionate rewards. Balancing proxy-enabled abuse-prevention techniques with a privacy-focused business model is a hard challenge. Can service providers use minimal connection features to infer proxy use without jeopardizing user privacy?

In this paper, we build and evaluate a solution, CalcuLatency, that incorporates various network latency measurement techniques and leverages the application-layer and network-layer differences in round-trip times when a user connects to the service using a proxy. We evaluate our four measurement techniques individually, and as an integrated system using a two-pronged evaluation. CalcuLatency is an easy-to-deploy, open-source solution that can serve as an inexpensive first step to label proxies.

1 Introduction

Online advertising revenue in the U.S. reached over $209.7 billion in 2022, almost three times as much as the revenue from U.S. TV advertisements [1, 2]. Companies and service providers increasingly thrive on collecting, processing, and sharing large amounts of data on users, and monetize this data by enabling targeted ads and tracking. Reports have criticised leading companies such as Google and Facebook, calling them surveillance giants, and asserting that their business models are a threat to privacy and even to human rights [3].

With the growing awareness and demand for privacy among the general public, emerging tech companies are aiming to democratize the ad delivery ecosystem by building services that are privacy-focused and even share ad revenue with users. A plethora of companies offer ad and tracker-blocking for users’ regular browsing experience and provide alternative search engines geared toward privacy. Some also create solutions that promote user agency by allowing users to opt in to viewing and earning rewards from privacy-preserving ads. On a technical level, such “ad reward networks” make use of browsing activity to generate ads, but all the computational processing occurs locally on the user’s device, without any data being exfiltrated to the companies. They then use privacy-preserving protocols to confirm ad event activity and reward users in cryptocurrency based on ad-providers and region-specific pricing. There are several companies that have similar rewards and provide remuneration to users on their platform for making use of their services [4, 5, 6, 7].

One critical threat to these novel types of reward networks is attackers who actively game the reward system by leveraging VPNs or proxies. These attackers fabricate their geolocation to gain access to more high-reward ads and falsify interaction activity to earn disproportionate rewards. Ensuring the long-term viability of these privacy-focused business models necessitates cost-effective and easily deployable techniques to differentiate regular users from those who use VPNs or proxies, for further monitoring for suspicious activity.

Balancing abuse-prevention techniques with rigorous privacy requirements is a hard challenge, especially if the service provider seeks to uphold user privacy. The state of the art in generic VPN and proxy detection uses a series of heuristics developed by surveilling client traffic at large. Deployed by large platforms, these services use user data to build IP reputation metrics, generate user-specific metrics, and even use black box services that claim to detect “suspicious” users and IP addresses. But to get these metrics, the service provider will need to compromise on protecting the privacy of their users. In short, we seek to answer: Can service providers build a system using minimum connection features, such as latency, to infer VPN or proxy use, without the need for data collection or jeopardizing user privacy?

In this paper, we build and evaluate our solution to this question, CalcuLatency, that leverages cross-layer network latency

USENIX Association 33rd USENIX Security Symposium 2263


measurement techniques to differentiate users that are using a remote, long-distance VPN or proxy from those who are not. We leverage the fact that the application-layer latency when a user connects through a proxy is end-to-end (browser to server), whereas any measurement on the network layer only reaches the proxy, and the round-trip time (RTT) difference between the two can be a reliable indicator. To this end, we combine existing techniques such as WebSocket RTT, TCP handshake RTT, and ICMP ping, as well as implementing and evaluating a modified traceroute method (0trace), which has not been done before. 0trace conducts hop enumeration from within an existing, established TCP connection such as a WebSocket session. Using CalcuLatency, we can even differentiate between network-layer and application-layer proxies.

To evaluate CalcuLatency, we evaluate each of the measurement techniques on their own and also evaluate the system as a whole by using a comprehensive (geo)diverse set of clients. The former is necessary to characterize and quantify, for each technique, the effects of network jitter and the reliability of the methods. For the latter, we implement the system as a web service and conduct a two-pronged evaluation of CalcuLatency as a whole: (i) we perform an in-depth testbed evaluation where we maximize the number of VPN products, servers, protocols and browsers tested, from four different user geolocations; (ii) we expand this to include a real-world, crowdsourced evaluation to collect and analyze more diverse user geolocations. To this end, we rally user participation on Twitter and through personal contacts, using the authors’ accounts. We have participants from 37 different countries located in all (six) continents, 144 autonomous systems, and collect a large crowdsourced dataset.

We find empirically that a viable threshold to consider a particular client as a remote VPN or proxy connection is 50 milliseconds. In 98% of all direct measurements from both sets of evaluation, we find that the RTT difference is below this threshold of 50 ms. Conversely, 89.1% of all VPN measurements in the testbed evaluation and 63.9% of the crowdsourced evaluation have an RTT difference of above 50 ms. Through location analysis, we were able to attribute a majority of the remaining 10.9% and 36.1% VPN measurements to the fact that the VPN server and the user were located very close to each other.

Investigating the VPN measurements whose RTT difference was below 50 ms, we find that in a majority of such cases (66.2%) the VPN server is located close to the user (within 650mi). This can be equated to the straight line distance between Washington DC and Boston, MA (635mi) or the straight line distance between Mountain View, CA and San Diego, CA (685mi). Surprisingly, this indicates that we can reliably detect proxy use when a user and the VPN or proxy are just 650 miles apart or more.

Overall, considering 50 ms to be the RTT difference threshold for a connection to be labelled as a remote VPN or proxy, our system has a false negative rate of 2.9% (28/964) and a low false positive rate of 0.95% (2/210).

Adopting a more conservative analysis strategy, if we only consider measurements where ICMP ping is successful and 0trace reaches the client or its network (i.e., 96% of all measurements), we find that the RTT difference is below 50 ms for the 210 direct measurements, and none were wrongly flagged using our system. These measurements include 137 unique user IPs from 84 different Autonomous Systems (ASes), from six different continents and over 34 different countries. We also reduce our false negatives to 2.77% (26/937).

While CalcuLatency cannot detect all proxy use, especially if the user and VPN are close to each other, CalcuLatency is an easy-to-deploy, open-source solution that can serve as an inexpensive defense against proxy-enabled abuse for server-side operators. We incorporate existing methods to calculate network latency and implement a modified traceroute method to overcome the challenges of our probes getting blocked due to stateful firewalls or NATs. Though this traceroute concept was first discussed over two decades ago, we are the first to implement it, deploy it in a real-world system, and evaluate its performance.

The following are our contributions in this paper: we propose a system, CalcuLatency, that combines existing building blocks and implements measurement techniques to help detect VPNs and proxies. We extensively evaluate our technique and elaborate on its calibration. We perform several controlled and crowdsourced experiments with our system deployed in real networks and collect data to quantify thresholds and characterize the reliability of our techniques.

Roadmap: In Section 2, we discuss the background of our techniques and the architecture of proxies. In Section 3, we introduce our system architecture, followed by our techniques to measure round-trip times, the implementation of CalcuLatency (§ 3.5), and our ethics discussion (§ 6). Next, we evaluate each of our building blocks (§ 4), followed by our two-pronged evaluation of the system (§ 5.1, § 5.2). In § 7, we present our limitations, followed by related work (§ 8) and finally, we discuss CalcuLatency, its implications, and conclude in Section 9.

2 Background

Below, we discuss how network proxies differ in their architecture and we explain each of our measurement methods.

2.1 Architecture of Network Proxies

Figure 1 illustrates how a proxy’s type affects the underlying protocol stack. We divide the protocol stack into three layers as per the OSI model: the application layer (HTTP), the transport layer (TCP), and the network layer (IP). Application-layer proxies terminate the application protocol, e.g., to inspect content for malware (Figure 1a). This includes Web



[Figure 1 diagram: three panels, each showing the IP, TCP, and HTTP protocol stacks of Client, Proxy, and Server.]
(a) Application-layer proxies, e.g., proxies that work via HTTP CONNECT.
(b) Transport- and Session-layer proxies, e.g., SOCKS, Tor, and SSH.
(c) Network-layer proxies, e.g., VPNs like OpenVPN and WireGuard.

Figure 1: Architecture of Network Proxies. We distinguish between three types of proxies that differ based on the layer at which they terminate client connections. Clients connecting to application-layer (1a) and transport-layer proxies (1b) first establish a TCP connection with the proxy and initiate the proxying using protocol-specific messages. The proxies create new TCP connections to the web server, and then establish the tunnel. Network-layer proxies (1c), in contrast, authenticate the client and establish a tunnel, after which all packets including TCP SYN packets are encapsulated and proxied through the server.
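
Read as a lookup table, the caption says where each kind of proxy ends the client's TCP connection. The Python sketch below encodes that reading. The class, field, and function names are hypothetical, and the mapping from proxy type to handshake peer is an inference from the caption, not code taken from CalcuLatency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyType:
    """One panel of Figure 1 (names are illustrative, not from the paper)."""
    panel: str
    layer: str
    examples: tuple
    terminates_tcp: bool  # True if the proxy ends the client's TCP connection


# Per the caption: application- and transport-layer proxies accept the client's
# TCP connection and open a new one to the web server; network-layer proxies
# encapsulate every packet, TCP SYNs included, and forward it.
PROXY_TYPES = (
    ProxyType("1a", "application", ("HTTP CONNECT",), True),
    ProxyType("1b", "transport", ("SOCKS", "Tor", "SSH"), True),
    ProxyType("1c", "network", ("OpenVPN", "WireGuard"), False),
)


def handshake_peer(proxy: ProxyType) -> str:
    """Who completes the TCP handshake that the web server observes."""
    return "proxy" if proxy.terminates_tcp else "client"


for p in PROXY_TYPES:
    print(f"Figure {p.panel} ({p.layer}-layer): "
          f"server's TCP handshake is with the {handshake_peer(p)}")
```

Under this reading, a server-side TCP handshake measurement would span only the server-to-proxy leg for panels 1a and 1b, but the full path to the client for panel 1c.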

servers that support the HTTP CONNECT method. Transport-layer proxies like Tor and SSH (both build on top of the SOCKS protocol) pass through the application protocol but terminate the client’s TCP connection (cf. Figure 1b). Clients use SOCKS’s signalling mechanism to tell the proxy what destination to connect to. Finally, network-layer proxies like OpenVPN or WireGuard perform NAT but pass through the client’s TCP connection (Figure 1c).

2.2 The 0trace Technique

In a traceroute, a client sends multiple packets to a server, with each packet containing an incrementing time-to-live value (TTL) in the IP header. These packets are stateless “stray” packets, meaning that they do not belong to an open network connection. Stateful firewalls may reject such packets, terminating the traceroute.

In 2007, Zalewski invented a traceroute technique that can get past stateful firewalls [8]. This technique, called 0trace, creates trace packets that match the five-tuple of an already-established TCP connection. Unable to tell apart 0trace packets from packets belonging to this TCP connection, stateful firewalls will let 0trace’s trace packets pass. 0trace achieves its goal by manipulating network packet headers and does so by crafting its own network packets using Linux’s raw socket API. However, this technique comes at the cost of potentially corrupting the TCP connection at the end: the receiving endpoint may terminate the TCP connection upon receiving an unexpected trace packet.

While 0trace’s purpose is hop enumeration in the presence of firewalls, we use it for RTT measurements. The go-to technique for RTT measurements is ICMP echo requests (colloquially called “pings”), but we found that many residential ISPs block pings (§ 3.4.1). While 0trace does not always work in such settings, it does allow for more accurate RTTs in the presence of firewalling, as we show in Section 4.3.

2.3 The WebSocket API

The WebSocket protocol is an application-layer protocol on top of TCP, which offers Web applications a bidirectional socket for communication. While distinct from HTTP, the WebSocket protocol is compatible with HTTP and uses the HTTP Upgrade header in its handshake. This allows Web servers to handle both HTTP and WebSocket connections on the same port. Once a WebSocket connection is established, client and server exchange binary data. Web applications can use WebSockets by taking advantage of the JavaScript WebSocket API that’s supported by all modern browsers [9]. Later in this work, we use WebSockets to determine the application-layer round-trip time between a client and server.

3 Method

Our aim is to measure round-trip times on the application, transport, and network layers, and use a combination of these different measurements to reason about whether a particular connection is coming through a proxy server. We combine four cross-layer latency measurement techniques into a single software service that we call CalcuLatency. Our system integrates well into existing service provider infrastructure, and we create and publish a pipeline to analyze the collected data.

3.1 System Architecture and Assumptions

A typical client-proxy-service scenario consists of three entities: (i) a service provider that makes available one or more HTTP endpoints to its clients; (ii) clients that use the service provider’s services; and (iii) proxy servers that some clients use to disguise their topological (i.e., IP address) and physical (i.e., their home country) location. While most clients are honest, there are some malicious ones that seek to defraud the service provider while using network proxies to disguise their



[Figure 2 diagram: a user reaches the Web server either through a VPN or proxy (top) or directly (bottom). In both cases the numbered steps are: (1) TCP handshake RTT (transport layer); (2) initial connection through WebSocket; (3) WebSocket-based RTT (application layer); (4) modified traceroute 0trace and ICMP ping (network layer).]

Figure 2: Measurement Setup. The user connects to the Web server either via a VPN or proxy (top) or directly (bottom). We illustrate the measurements done in both cases. The network-layer measurements only reach the proxy server if the user is using a proxy, but reach all the way to the user’s public IP if they are connecting directly.
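
The comparison this setup enables can be sketched in a few lines of Python. The 50 ms threshold is the value reported in the introduction. The function names, the choice to compare against the smallest lower-layer RTT, and the sample values (the first pair is taken from the India/Canada example in Section 3.1, the second is made up) are assumptions for illustration and do not reproduce CalcuLatency's actual decision logic.

```python
THRESHOLD_MS = 50.0  # RTT-difference threshold reported in the paper's introduction


def rtt_difference_ms(application_rtt_ms, lower_layer_rtts_ms):
    """Largest gap between the application-layer RTT and any lower-layer RTT.

    lower_layer_rtts_ms holds whichever transport- or network-layer RTTs were
    obtained (TCP handshake, ICMP, 0trace); failed measurements are left out.
    """
    if not lower_layer_rtts_ms:
        raise ValueError("need at least one transport- or network-layer RTT")
    return application_rtt_ms - min(lower_layer_rtts_ms)


def looks_like_remote_proxy(application_rtt_ms, lower_layer_rtts_ms,
                            threshold_ms=THRESHOLD_MS):
    """Flag the connection when the cross-layer gap exceeds the threshold."""
    return rtt_difference_ms(application_rtt_ms, lower_layer_rtts_ms) > threshold_ms


# WebSocket RTT of 250 ms against an ICMP RTT of 35 ms.
print(rtt_difference_ms(250.0, [35.0]), looks_like_remote_proxy(250.0, [35.0]))
# A direct connection where all layers agree closely (illustrative values).
print(rtt_difference_ms(42.0, [40.0, 38.5]), looks_like_remote_proxy(42.0, [40.0, 38.5]))
```

Taking the smallest lower-layer RTT is a conservative simplification: it flags a connection as soon as any single layer stops short of the client.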

location. For instance, an ad reward network or survey distribution service provides different rewards for different geolocations, and users may seek to earn disproportionate rewards using proxies. Service providers need an inexpensive and practical solution that can tell apart clients that use remote proxies to mask their geolocation from those that do not, without user surveillance using blackbox detection techniques. We introduce CalcuLatency to fill this gap.

Broadly, our technique allows for the detection of the three proxy types illustrated in Figure 1. CalcuLatency’s key insight is that the use of a proxy affects the layers in the OSI model differently. If we find a non-trivial difference between the application-layer RTT (∆_AL) and the transport-layer RTT (∆_TL) and/or the network-layer RTT (∆_NL), we can conclude that the client is using a proxy server.

Consider a concrete example: A client in India is using a VPN server in Canada to make an HTTP connection to our CalcuLatency server in the U.S. Upon accepting the connection, our server now determines three types of round-trip times to the client, as shown in Figure 2. First, it upgrades the HTTP connection to WebSocket and sends several pings (using JavaScript) to determine the application-layer RTT. Next, our server inspects the (previously recorded) TCP handshake to infer the transport-layer RTT. Finally, our server determines the network-layer RTT by sending ICMP echo requests and by running a 0trace measurement, which piggybacks onto the already-established HTTP connection.

Having determined all RTTs, our server notes that the WebSocket RTT is 250 ms, the time taken for WebSocket pings to travel from the U.S. to Canada, to India, and back. The ICMP echo responses exhibit an RTT of only 35 ms, the time it takes to go from the U.S. to Canada, and back again. This difference manifests because the WebSocket ping was answered by the client’s browser in India while the network-layer ping was answered by the proxy in Canada. Confronted with the RTT difference of 250 − 35 = 215 ms, the service provider concludes that the client is using a proxy.

Assumptions: First, CalcuLatency requires an HTTP connection between the client and the server. We chose HTTP because of its ubiquity but other application-layer protocols work equally well. Second, the client’s proxy must not be geographically close to the user. This assumption is reasonable in the case of service providers that seek to identify users who spoof their country of origin. Third, we assume that clients do not control the network behavior of the proxy and therefore cannot have it delay selected network packets. Instead, we assume that clients either use open proxies or rent them, e.g., as part of a VPN subscription. Even if the client does control the proxy, it is not guaranteed to evade CalcuLatency’s detection, as our 0trace component can estimate the RTT to the client using adjacent hops on the path that are outside the client’s control.

While not universally applicable, we believe that these assumptions are both realistic and commonplace. What’s more, CalcuLatency does not suffer from the shortcomings of existing, established proxy detection techniques like IP reputation blocklists that just use IP-to-Geolocation databases [10, 11, 12, 13].

3.2 Determining the Application-layer RTT

How can the service provider determine the application-layer RTT to the client? We chose WebSocket for this task: As of June 2023, WebSocket is supported by all major browsers [14], it is compatible with HTTP, and it is purpose-built for real-time communication, making it a natural choice for determining round-trip times. As mentioned above, CalcuLatency is



not dependent on WebSocket and could employ alternatives, like WebRTC, XMPP, or HTTP page load times [15, § II.B].

[Figure 3 diagram: message sequence between Client, SOCKSv5 Proxy, and Service provider. The TCP handshake segments (SYN_C, SYN/ACK_P, ACK_C, SYN_P, SYN/ACK_S, ACK_P) yield ∆_TCP; the WebSocket measurement (nonce #1 through nonce #n) yields ∆_WS; the ICMP measurement (ICMP request, ICMP response) yields ∆_ICMP; the 0trace measurement (TTL=1, TTL=2, ..., TTL=n) yields ∆_0T.]

Figure 3: Measurement Flow. In this example, the client uses a SOCKSv5 proxy. The service provider determines four round-trip times (on three layers) to infer what kind of proxy the client uses.

Specifically, a client establishes a WebSocket connection with the service provider. The server then sends n WebSocket-based pings to the client as illustrated in Figure 3. Section 4.1 suggests that n ≥ 10 is useful to make the measurement more robust against transient networking issues or asymmetric routing, both of which interfere with our measurement. In our CalcuLatency system, we send 100 WebSocket-based pings to the client. Upon receiving a WebSocket ping, the client’s browser echoes back each ping to the server. Upon receiving the echo response, the server determines the WebSocket RTT on the application layer (∆_AL) between itself and the client as follows: ∆_AL = min{∆_AL,1, ..., ∆_AL,n}.

Importantly, our technique must work in an adversarial environment because malicious clients seek to disguise their use of a proxy. These clients control the server-provided JavaScript, and can therefore choose to either delay the echo or send an “anticipatory” echo before having received the corresponding request. The former is against a malicious client’s interest because it would increase the odds of CalcuLatency concluding that the client is using a proxy. Malicious clients are therefore incentivized to respond as quickly as possible. To thwart the “anticipatory response” attack, we include a unique nonce in every echo request. The client has to embed the nonce in its echo response, preventing it from sending responses before having received the request.

3.3 Determining the Transport-layer RTT

We take advantage of the fact that in our setting, the client establishes a TCP connection with our server, meaning that we are in control of the TCP three-way handshake. We estimate the RTT between client and server by calculating the time difference between the server responding with a SYN/ACK segment and the client acknowledging receipt with an ACK segment, as discussed by Ding and Rabinovich [16]. We refer to this RTT as ∆_TL. Again, clients have no incentive to delay their ACK segment and are unable to send an “anticipatory” segment because of the difficulty of predicting TCP sequence numbers. We find that ∆_TL is reliable and straightforward to determine, but we have only a single sample per connection, which makes this technique susceptible to transient conditions like network congestion. Indeed, Høiland-Jørgensen et al. [17] showed that “20% of the clients experience increased delays of more than about 80 ms, at least 5% of the time.”

We considered taking advantage of the TCP timestamp option to measure RTT. We chose not to because TCP timestamps are not universally used [18, § 6] and often exhibit poor granularity: Veal et al. showed in their PAM’05 paper that the majority of 500 popular web servers used a granularity of either 10 ms or 100 ms [19, § 3].

3.4 Determining the Network-layer RTT

Finally, we focus on the lowest layer for which we determine the round-trip time: the network layer. Determining the RTT to an IP address is challenging because it is difficult to reliably get a remote network stack to respond to unsolicited IP packets. Prior work reported that it is increasingly rare for hosts to respond to ICMP echo requests, or send a TCP RST segment in response to unsolicited SYN segments [20, § 4]. To maximize success, we draw on two separate techniques to estimate ∆_NL, our network-layer RTT: ICMP (see § 3.4.1) and 0trace (see § 3.4.2).

3.4.1 ICMP RTT

CalcuLatency sends five ICMP echo requests to the client. We refer to this measurement as ∆_ICMP. Prior work had found that some proxies don’t answer to ICMP: 90% of VPN servers tested by Weinberg et al. [10] reportedly ignored ICMP requests. However, they only tested servers belonging to seven undisclosed VPN providers. Due to ICMP’s potential usefulness to CalcuLatency, we revisit this topic by asking the following research question:

Do VPN servers respond to ICMP pings? To answer this question, we sent ICMP requests to IP addresses belonging to 80 VPN providers that past work studied [21, 11]. These VPN providers host servers that run various VPN protocols including OpenVPN, WireGuard, and other proprietary protocols. We augment our list of IP addresses with addresses present in the same /24 network as the VPN IP addresses, resulting in a



list of 492 unique IP addresses. We chose to ping adjacent addresses in CalcuLatency because a response from an adjacent address is likely to have a near-identical RTT to that of the actual proxy, which may not respond to ICMP requests. Our ICMP requests to these addresses resulted in 369 addresses (75%) that responded, a higher percentage than what past work found.

VPN servers appear likely to respond to ICMP requests but what about a random sample of the IPv4 population? To answer this question, we ran a ZMap scan [22] targeting a randomly selected set of 1 million IP addresses. This resulted in 10.52% of IP addresses that responded to our ICMP requests. This is in line with prior work by Bano et al., which found that approximately 10% of IP addresses respond to ICMP pings [20, § 4.1], making the odds of a response low. We conclude that VPN servers are significantly more likely to respond to ICMP requests than randomly selected IP addresses, which include residential clients that don’t use a proxy. We therefore decide to augment our ICMP measurement with 0trace, which we discuss in the next section.

3.4.2 0trace RTT

We use Zalewski’s 0trace technique [8] for the purpose of determining the round-trip time between the server (where the 0trace measurement is initiated) and a connecting client. Again, we take advantage of the fact that our setting has the client establish a WebSocket connection to the server, meaning that we have an already-established TCP connection that 0trace can use for its TTL and RTT measurement.

We developed a Go package that implements 0trace. This package uses Linux’s raw socket API to send manually crafted TCP segments with incrementing IP time-to-live (TTL) values. We carefully craft 0trace packets to match the TCP five-tuple of the WebSocket connection¹ so that firewalls between client and server will not interfere. Our 0trace implementation keeps incrementing the TTL until either (i) the responding IP address is identical to the client’s IP address; or (ii) TTL 32 is reached. For each TTL, we send three redundant probes to account for potential packet loss [23]. If we don’t receive a response to any of our three probe packets within five seconds, we mark the given TTL as unresponsive. The RTT of our 0trace measurement (∆_0T) is the round-trip time of the probe packet that made it the farthest to the client, i.e., the probe packet with the largest TTL.

¹ The five-tuple consists of IP source and destination addresses; TCP source and destination ports; and the IP protocol.

We specifically use 0trace because it can get past (CG)NAT and stateful firewalls by operating on existing TCP connections. Furthermore, even if 0trace measurements do terminate at the NAT box due to network-level rules, we argue that the distance between a NAT and the user’s machine is less than 1,000 km as most are typically deployed by their ISP [24].

[Figure 4 plot: ECDF against latency in ms (log scale), with one curve per platform: Internet and Loopback.]

Figure 4: Latency distribution of TCP handshake measurements over two networks.

Being equipped with four techniques that measure the RTT across three OSI layers, we now devise a decision tree that helps us collapse all our measurements into a single verdict: does the client use a proxy?

3.5 Implementation and Deployment

CalcuLatency combines our application, transport, and network-layer measurements into a single service. We implemented CalcuLatency in both Go and JavaScript (WebSocket pings). The Go service consists of (i) a Web server with an endpoint that speaks WebSocket, (ii) a Go package that records TCP handshakes to extract round-trip times, and (iii) a package that runs a 0trace measurement to the client.

While developing our 0trace Go package, we take advantage of the ICMP error requirements stated in RFC 1812 and RFC 792, which state that the returned ICMP error message will contain (parts of) the original datagram’s data [25, 26]. We find that the ICMP error message reliably contains the IP header of the original datagram in the ICMP datagram. Hence, we design our server to keep track of the IP ID for each TTL-limited probe it sends out, and it uses the IP ID obtained from the IP header of the original datagram in the received ICMP error packet to match the hop IP to the particular TTL value. Using the packet sent time and packet received time, the server calculates the RTT for each hop that responds with the ICMP error. It repeats the process until (if) it reaches the VPN IP, or until the maximum TTL value. All our code is available online under a free software license but we omit the URL to preserve our anonymity.

In real-world systems, CalcuLatency can be deployed as part of a CAPTCHA. The CAPTCHA can be non-interactive and consist of HTML and JavaScript, which initiates a WebSocket connection to the CalcuLatency server. In the first phase, CalcuLatency sends one hundred WebSocket pings to the client to estimate the application-layer RTT. After CalcuLatency determines the application-layer latency, we calculate the RTT of the TCP handshake, send ICMP pings, and use 0trace to determine the network-layer RTT, as discussed

2268 33rd USENIX Security Symposium USENIX Association


above. Clients that CalcuLatency deems to be using a proxy can be flagged for manual inspection.

4 Building Block Evaluation

Before evaluating CalcuLatency in its entirety, we study its building blocks in isolation.

4.1 Reliability of WebSocket Pings

Unlike our network-layer RTT methods, which are typically handled by the kernel’s network stack, our application-layer method (WebSocket) traverses not only the kernel’s network stack but also the client’s browser, and is therefore subject to more jitter. To characterize and approximate this jitter, we built an HTTPS-based Web service that sends 10,000 sequential WebSocket echo requests to the client and measured the round-trip time ∆WS1, ..., ∆WS10,000 for each request. We made these measurements over the loopback interface to eliminate networking delays and isolate processing delays.

Figure 11 in the Appendix illustrates the results for two consumer laptops (ThinkPad X1, MacBook Pro) running two browsers. The median RTT for all distributions is less than 1.4 ms, and 99% of RTT measurements completed in less than 2.4 ms. All distributions exhibit a long tail, presumably due to transient load spikes. CalcuLatency must account for this by running multiple measurements, spaced out over time.

4.2 Reliability of TCP Handshake RTT

We conduct an experiment to understand the reliability of the TCP handshake in measuring RTT. We used Go to build an HTTP server that served a static index page. In parallel to serving this index page, the server uses libpcap to record the TCP handshake; in particular, the time delta between the server responding to the client’s SYN with its SYN/ACK segment and the subsequent receipt of the client’s ACK. We built a shell script that establishes 86,400 TCP connections: one connection per second for 24 hours. We ran this experiment in two network settings. In the “Loopback” setting, client and server run on the same machine, communicating over the loopback interface. By excluding the Internet, this setting highlights computational latency. In the “Internet” setting, client and server run on separate machines, communicating over the Internet. The client ran on a VPS in Ohio, U.S. while the server ran on a VPS in Paris, France.

Figure 4 illustrates the results of this experiment. In the “Internet” setting, we see a minimum, median, and maximum latency of 87, 88, and 171 ms, respectively. 99% of handshakes exhibit a latency of less than 94 ms, only 7 ms more than the minimum. In the “Loopback” setting, we observe a median latency of 18 µs. 99% of handshakes exhibit a latency of less than 34 µs. As expected, we observe a long tail in both settings, and we account for outliers by running repeated measurements.

The TCP RTT is readily available and straightforward to calculate, but we only get a single measurement per TCP connection. One could work around this limitation by having the client establish multiple TCP connections to the server.

4.3 Reliability of 0trace Pings: Considering the Variance of Latency Across the Internet

To evaluate the reliability of 0trace, we conduct a large-scale measurement with endpoints having a variety of configurations to understand potential effects on the results. For this experiment, we take advantage of the RIPE Atlas network. First, we set up an HTTPS server that runs our 0trace code whenever a client connects to the server. Next, we instruct the RIPE Atlas probe to run an ICMP and a TCP traceroute using both the IPv4 and v6 addresses towards the server, and an “SSL” measurement to fetch our server’s TLS certificate, which triggers our 0trace measurement toward the client’s IPv4 address. For each RIPE Atlas probe, we measure three different RTT values (0trace and two traceroute RTTs). We repeat each of these measurements three times to account for any transient errors. We then determine the absolute difference between the 0trace RTT and the other traceroute RTTs, which serve as ground truth in each case. Appendix B contains a breakdown of the available RIPE probes and our selection.

Our initial experiments to explore whether running latency measurements at different times of the day influences the measured RTT difference (presented in Appendix C and Figure 13) revealed no significant differences even when the RTTs were measured every 3 hours. To maximize the utility of our measurements considering concurrency limits and resource constraints, we trade off the small benefit in reliability of running multiple experiments per day for the diversity of a larger number of RIPE probes over a long period of time.

For our measurements, we selected probes that are “connected” (not abandoned or disconnected), with a public IPv4 and IPv6 address, a valid country code, and all the desirable tags. Note that while we request all eligible probes to participate in our measurements, not all of them do [27]. We had 2,350 RIPE Atlas probe IP-probe ID pairs that fully completed the SSL, ICMP and TCP traceroute measurements, i.e. they either reached our server directly or their last hop IP address was in the same AS as our server. The distribution of these probes over continents, ASNs, and different access technology is described in Appendix B.

Differences in local and last-mile access technology. We evaluate our hypothesis of whether the probe’s last-mile access technology (derived from the probe’s tags) has large effects on the latency seen from the server side.

We present our results for the RTT difference between the 0trace result and the probes’ TCP and ICMP traceroutes separately. We find that VDSL has three outlier RTT differences



measuring 388ms, 1970ms, and 2640ms, and upon investigation we note that these were due to two German RIPE probes in AS8881 and AS3320 which failed to reach far in the traceroute. To improve readability of our figure, we have removed these three cases and note them separately in Figure 14 in Appendix F. From Figure 5, we see that 95% of all ICMP measurements have an RTT difference lower than 111ms in Cable, 94.93ms in DSL, 86ms in Fibre (see footnote 2), and 31.3ms in VDSL. Similarly, 95% of all the TCP measurements have an RTT difference lower than 46.3ms in Cable, 93.8ms in DSL, 87.9ms in Fibre, and 30.7ms in VDSL (including outliers).

2 We are preserving the spelling used in the RIPE Probe Tag.

(a) Difference between 0trace-determined RTT and RIPE Atlas-determined ICMP RTT in ms.
(b) Difference between 0trace-determined RTT and RIPE Atlas-determined TCP RTT.

Figure 5: Comparing the RTT differences between probes with different local last-mile access technology. In 80% of all measurements the RTT difference is less than 18ms. However, RTT differences in TCP are lower and more stable. Removing three outliers from VDSL, we see that the differences between the access technologies become more pronounced for the remaining 20% of the values, with cable performing worse, and Fibre performing marginally better with no outliers.

In summary, we see that the RTT differences in TCP are lower and more stable, that is, 0trace measurements are closest to the TCP ground truth RTTs. We note that in 80% of all measurements the RTT difference is less than 18ms. We see that the differences between the access technologies become more pronounced for the remaining 20% of the measurements, with cable performing worse, and Fibre performing slightly better than others considering it has no outliers.

Our experiments show that 0trace is able to match ground truth in more than 80% of cases, and in 95% of cases has a maximum difference of 111ms in RTT when compared with ground truth. Given the results of our experiment, we can reason about the generalizability of our measurements. We can be confident that when measuring long distance proxies, these differences of up to 111ms will not hinder CalcuLatency’s ability to detect the proxies, and in the remaining 5% of cases, it may inflate the endpoint RTT, which may lead to false negatives that CalcuLatency can tolerate.

5 Evaluating CalcuLatency in Practice

We conduct a thorough evaluation of CalcuLatency using a two-pronged approach. First, we conduct an evaluation with a controlled testbed where we control the webserver and the client, and test our system with a variety of VPN products and proxy servers that run multiple protocols. We test the system with mobile connections, various user locations, without any proxies connected, and then by connecting to various VPN and tunnelling protocols. Next, we extend this experiment by conducting a large evaluation with volunteer Internet users. This mimics a real-world deployment test for CalcuLatency because we draw on a geographically diverse population that’s representative of the average Internet user. For this experiment, we set up our measurement server at a university network and advertised our crowdsourced measurement page via social media channels.

5.1 Control Testbed Evaluation

We conduct a controlled evaluation of our CalcuLatency service by creating a testbed and collecting various measurements ourselves. In our experiment, we controlled the webserver that conducted the measurements. We had team members run tests from their devices from fifteen different home and mobile networks using four popular browsers: Chrome, Mozilla Firefox, Safari, and Brave.

Our network locations include three cities in the U.S. and one city each in Canada, the UAE, and India, each from multiple networks. We conduct direct measurements from these networks and also run measurements while connecting to various VPN and proxy servers around the world. We instrumented an automation script using Selenium to trigger measurements from multiple browsers on a computer at the same time. In both the direct connection case and when we turned on a VPN, we use this automation script to conduct multiple experiments from different browsers towards the same server. We re-ran a direct measurement each time we started experiments, from each browser, and repeated



it between different experiments. The VPNs and proxies used to conduct these experiments include ten popular, commercial VPN providers: Astrill VPN, CyberGhost VPN, ExpressVPN, IPVanish, IVPN, Mozilla, Mullvad, NordVPN, Private Internet Access, and Surfshark. These VPN providers offer multiple protocols such as WireGuard, OpenVPN, and other proprietary protocols such as OpenWeb and Lightway. We also instrumented our own SOCKS5 proxy servers and tested them as well. We do not measure or reason about security and privacy aspects of the tested commercial VPN providers; our aim is to simply connect to various servers they offer and evaluate our CalcuLatency measurement service.

[Figure 6 diagram: decision tree. Input measurement data: WebSocket min RTT, ICMP min RTT, 0trace (last hop, last RTT), TCP handshake RTT. Leaf labels and RTT differences: “WsICMP” (min WS RTT - ICMP min RTT); “Ws0TClient”, “Ws0TNetwork”, “BestEffortWs0T”, “E2EWs0tClient” (min WS RTT - 0trace RTT); “E2EWsTCPICMP” (min WS RTT - min{TCP RTT, ICMP RTT}); “E2EWsTCP0t” (min WS RTT - min{TCP RTT, 0trace RTT}); “E2EBestEffortWsTCP” (min WS RTT - TCP RTT). ASNs are compared through a network-info query for the IP; the URL is elided in the source.]

Figure 6: Decision Tree. We create a decision tree that uses the round-trip times measured using our four different techniques. Based on their results, we use a combination of these values to decide whether a particular experiment can be labeled as a possible proxy connection. The two yellow nodes refer to cases that lack some of the measurement results and are hence labeled “best effort” calculations.

5.1.1 Data Characterization

In total, we conducted 891 unique experiments with 354 unique client IPs. Our measurements came from 337 unique VPN or proxy server IPs belonging to 82 different autonomous systems (ASes), and 17 unique direct client IPs belonging to 12 different ASes. Overall, we collected data from four different countries.

[Figure 7 plot: ECDF over RTT difference in ms (0 to 300); series by type: Direct, Proxy.]

Figure 7: ECDF of RTT difference for the 891 control testbed evaluation measurements. From our measurements, we find that over 98% of direct measurements and less than 11% of VPN measurements have an RTT difference below 50 ms (shaded in yellow).

5.1.2 Results

We calculate the RTT difference for each experiment using a decision tree that we devised (Figure 6).

First, we check if the TCP handshake RTT (∆TL) ⪆ WebSocket RTT (∆AL), i.e. if they are within ± 10 ms. If so, then the experiment is a network layer proxy or a direct measurement. In this case, ∆TL cannot be used for calculating the RTT difference. Hence, we use the ICMP RTT, if it exists, and if not, we use the 0trace RTT as ∆NL. In order to identify which value was used as ∆NL and to record the conditions that are met according to the decision tree, we assign labels for each end node of the tree. For instance, in this case, if the ICMP RTT does not exist (i.e. ICMP was not supported by the client IP), we use the 0trace RTT and assign different labels based on whether 0trace reached the client IP (*0TClient), or 0trace did not reach the client but reached the same ASN as the client (*0TNetwork), and label it a “best effort” if none of these conditions is met. We only use the 0trace RTT from the last successful hop for the calculation.

However, if ∆TL ⪉ ∆AL, then the experiment could be an application layer proxy. In this case, we check if ∆TL is approximately equal to either of the ∆NL candidates (ICMP RTT, 0trace RTT). If not, we follow the same procedure as above and use the ICMP RTT if it exists, and if not, fall back to 0trace if it reaches the client IP or its ASN. Finally, if none of those conditions are met, we label it a “best effort” and use the TCP RTT to calculate the RTT difference.

In our control testbed evaluation, we find that in 98.0% of the direct measurements (48 of 49) the calculated RTT difference is less than 50 ms, as shown in Figure 7. Upon investigating, we found that the anomalous direct measurement belonged to an Indian mobile network provider, where the ICMP failed and 0trace was unable to reach even the same network, hence it was a “best effort” calculation. We will revisit the “best effort” cases in §5.3.

Next, we investigate the VPN measurements. We find that the RTT difference is above 50 ms in 89.1% of all VPN measurements (750 of 842), shown in Figure 7. Among the remaining 10.9% of VPN experiments, we find that in 60.9% of them (56 of 92), the VPN server is close to the user geographically, i.e. the straight line distance between user and VPN server is less than 650 miles. This is roughly the straight line distance between Washington DC and Boston, MA (635 miles) or the straight line distance between Mountain View, California and San Diego, California (685 miles). This is expected: as specified in our assumptions, our technique primarily seeks to identify longer-distance remote VPN users.
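The labelling procedure of §5.1.2 and the 50 ms threshold used in these results can be sketched in a few lines of Python. This is only an illustrative reading of the prose and of the labels legible in Figure 6, not the authors' implementation: the `Measurement` fields, the function names, the choice to reuse the “WsICMP” and “Ws0TNetwork” labels in the application-layer branch, and the “rerun” verdict for best-effort cases are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CLOSE_MS = 10.0      # "within ± 10 ms" from the text
THRESHOLD_MS = 50.0  # empirical RTT-difference threshold from the text


@dataclass
class Measurement:
    ws_min_rtt: float                     # application-layer RTT (∆AL), ms
    tcp_rtt: float                        # TCP handshake RTT (∆TL), ms
    icmp_min_rtt: Optional[float] = None  # None if the client ignores ICMP pings
    zt_rtt: Optional[float] = None        # RTT of the last successful 0trace hop
    zt_reached_client: bool = False       # last 0trace hop is the client IP
    zt_reached_client_asn: bool = False   # last 0trace hop is in the client's ASN


def close(a: float, b: float) -> bool:
    """True if two RTTs are within the ±10 ms tolerance."""
    return abs(a - b) <= CLOSE_MS


def _fallback(m: Measurement, client_label: str) -> Optional[Tuple[str, float]]:
    """ICMP RTT if present, else 0trace RTT if it reached the client or its ASN."""
    if m.icmp_min_rtt is not None:
        return "WsICMP", m.icmp_min_rtt
    if m.zt_rtt is not None and m.zt_reached_client:
        return client_label, m.zt_rtt
    if m.zt_rtt is not None and m.zt_reached_client_asn:
        return "Ws0TNetwork", m.zt_rtt
    return None


def label_and_network_rtt(m: Measurement) -> Tuple[str, Optional[float]]:
    """Pick the network-layer RTT (∆NL) and the label of the leaf that chose it."""
    if close(m.tcp_rtt, m.ws_min_rtt):
        # Network-layer proxy or direct connection: ∆TL cannot serve as ∆NL.
        picked = _fallback(m, "Ws0TClient")
        if picked is not None:
            return picked
        return "BestEffortWs0T", m.zt_rtt  # may be None if 0trace got nothing
    # ∆TL is clearly below ∆AL: possibly an application-layer proxy.
    if m.icmp_min_rtt is not None and close(m.tcp_rtt, m.icmp_min_rtt):
        return "E2EWsTCPICMP", min(m.tcp_rtt, m.icmp_min_rtt)
    if m.zt_rtt is not None and close(m.tcp_rtt, m.zt_rtt):
        return "E2EWsTCP0t", min(m.tcp_rtt, m.zt_rtt)
    picked = _fallback(m, "E2EWs0tClient")
    if picked is not None:
        return picked
    return "E2EBestEffortWsTCP", m.tcp_rtt


def classify(m: Measurement) -> Tuple[str, Optional[float], str]:
    """Return (label, RTT difference in ms, verdict)."""
    label, network_rtt = label_and_network_rtt(m)
    if network_rtt is None:
        return label, None, "rerun"
    diff = m.ws_min_rtt - network_rtt
    if "BestEffort" in label:
        return label, diff, "rerun"  # best-effort cases are candidates for a re-run
    return label, diff, "proxy" if diff > THRESHOLD_MS else "direct"
```

For instance, `classify(Measurement(ws_min_rtt=180, tcp_rtt=175, icmp_min_rtt=20))` takes the first branch, because the TCP and WebSocket RTTs agree, and compares the WebSocket RTT against the ICMP RTT; the 160 ms difference exceeds the threshold. A deployment would likely tune both constants rather than hard-code them.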
We investigated the 14 unique VPN server IPs belonging to 36 of 92 measurements (39.1%) where the user to VPN distance was above 650 miles. We found that six of those 14 IPs belonged to AS9009 (corresponding to 17 of 36 measurements), and for all these IPs, the VPN provider claims their location to be in a Latin American country (Mexico, Costa Rica, The Bahamas, Venezuela, Chile, Argentina). However, when we investigated their ICMP RTTs as measured from our server in the U.S. Midwest region, we find that all their RTTs ranged between 28 ms and 52 ms. We used the speed of the Internet approximation provided by Katz-Bassett et al. [28], which is 4/9 c, where c is the speed of light traveling in a vacuum, and find that for all of these experiments, the location of the VPN server as reported by the VPN provider is an impossibility. Their true geolocation appears to be much closer than what is reported.

For the remaining 19 experiments with eight unique VPN server IPs, we were not able to disprove the advertised location of the server with the RTT measurements. However, five VPN servers belonged to the same VPN provider, which may have a policy to inflate ICMP values. The other two IPs did not have a reliable ICMP value for us to confirm or disprove their advertised location.

Overall, we find that 50 ms is a viable threshold for the RTT difference to distinguish a direct measurement from one coming through a remote VPN or proxy. We expand on this evaluation by conducting a real-world evaluation to increase the diversity of our “direct” connections and user locations.

5.2 Real-world Crowdsourced Evaluation

Next, we conduct an evaluation of our system in practice, using real-world measurements. We developed the CalcuLatency system, created a web server, and deployed it on a subdomain of our university. To obtain ground truth about the measurements, we create a form that users fill out to give us information regarding their setup. We ask users to optionally provide us with an email with which we can reach them, whether they are connecting to us through a mobile or a desktop, who their internet service provider is, and their current location (to reason about the measured latencies and physical distance). We then ask users if they are connecting to us via a VPN, or directly. If they are using a VPN, we request them to enter their VPN server location if known, and any details about their VPN provider.

We recruited users by publicizing a call for participation on Twitter using the authors’ Twitter handles and collected data for this experiment over a period of 15 days. We did not target or advertise towards specific Twitter users and chose not to use platforms like Prolific or MTurk as they have not been evaluated to be generalizable for network measurement (but rather for qualitative studies).

Figure 8: ECDF of RTT difference for the 283 public crowdsourced evaluation measurements. From our collected measurements, we find that over 98% of direct measurements and 36.1% of VPN measurements have an RTT difference below 50 ms (shaded in yellow).

5.2.1 Data Characterization

We collected 283 unique experiments from 252 unique client IPs, with (self-reported) 161 direct measurements and 122 VPN measurements. Our 161 direct measurements came from 145 unique client IPs from 93 different autonomous systems (ASes). Our 122 VPN measurements came from 109 unique VPN IPs belonging to 51 different ASes. The largest number of direct measurements came from ASN 7922, followed by ASNs 701, 7018, 21928 and 3320, belonging to Comcast, Verizon, AT&T, T-Mobile, and Deutsche Telekom, respectively. We provide a more detailed breakdown of these IPs by ASN in Appendix E. Overall, we collected data from over 37 different countries from all (six) continents.

5.2.2 Results

Based on the same decision tree explained in §5.1 and Figure 6, we calculate the RTT difference and label each experiment with the appropriate label from the tree. We find that 98.8% of all direct measurements have an RTT difference below 50 ms (159 of 161) as shown in Figure 8, following the same trend as our controlled testbed evaluation, despite a large increase in the variety and diversity of our collected direct measurements. Upon investigating the remaining 1.2% (2 of 161) measurements, we find that one measurement was conducted through Safari and indicates that the IP is an iCloud



Private Relay IP; we conclude that the user may have overlooked the fact that Private Relay was turned on, and hence this is a “proxy” measurement. The other experiment had an abnormally large 0trace RTT value, labeled a “best effort” calculation (we explore this in §5.3).

Next, we investigate the VPN measurements. The RTT difference in 63.9% of measurements (78 of 122) is above the threshold of 50 ms, also shown in Figure 8. Of the 36.1% of VPN measurements whose RTT difference is below 50 ms, we find that in 77.3% of the cases (34 of 44), the user and the VPN server are less than 650 miles apart, which we consider as the minimum threshold for it to be a “remote proxy”, as shown in Figure 9. We use the locations provided by the user in our form and also compare the VPN IP-geolocation. Another 2.3% (1 of 44) was an experiment mislabeled as a VPN measurement, although the client IP indicates it is in the same location as the user. Of the remaining 20.5% of measurements (9 of 44), we did not see any proof to disprove the apparent location of the VPN.

[Figure 9 plot: CDF; y-axis “Proportion of all measurements”; x-axis printed as “Difference in RTT in ms” with ticks at 0, 500, 650, 1000, 1500, 2000; series: Public, Testbed.]

Figure 9: CDF of the distance between user and VPN when RTT difference < 50ms. The majority of VPN experiments with a low RTT difference (below 50ms) are located close to the user. In 66.2% of these cases, the proxy is at most 650 miles away from the user (in yellow), and in 89% of these cases, the proxy is < 1000 miles from the user (in orange).

5.3 Results Combining the two Evaluations

Overall, if we consider 50 ms to be the RTT difference threshold for a connection to be labelled as a remote proxy, our method has a low false positive rate of 0.95% (2/210 direct measurements). On the other hand, we find that 14.1% (136 of 964) of VPN measurements had an RTT difference below this threshold. However, only 2.9% (28/964) of VPN measurements are located over 650 miles away from the user and are legitimate (i.e. excluding VPNs with fake locations, and mislabeled experiments). Thus, we consider 2.9% our method’s false negative rate.

What percentage of VPN servers respond to ICMP pings? In our evaluations, we collected a total of 434 unique VPN IPs, and we find that 94.2% of these IPs (409/434) responded to ICMP pings. A majority of the VPN server IPs that did not respond to ICMP pings belong to Astrill VPN (14/25), which may have a policy for some/all of its servers to not respond to ICMP pings. Of the rest, four were personal OpenVPN servers, three were SOCKS5 proxies, and the remaining four were miscellaneous VPN services including iCloud Private Relay.

Does removing all best-effort calculations improve our method? Yes! Combining both sets of evaluation, we see that 4.0% of all measurements (47/1174) were labeled “best effort calculations”, which means that the experiment did not contain successful ICMP measurements, that the TCP RTT measurement (∆TL) is too close in value to ∆AL, and that 0trace did not reach the client or even the same network as the client IP. Since these anomalous measurements only comprise 4% of all measurements, we investigate what our analysis would look like if we removed these measurements. When a service provider deploys CalcuLatency, they can achieve this by simply labeling all “best effort” experiments as requiring a re-run and have the client conduct another round of measurements.

From Figure 10, we see that 100% of all direct measurements have an RTT difference less than 50 ms, and 86.2% of VPN measurements (808 of 937) have an RTT difference greater than 50 ms. Of the 13.8% (129 of 937) of VPN measurements that have an RTT difference below 50 ms, we find that 66.7% of measurements are less than 650 miles away from the user and another 13.2% advertise fake VPN locations as we found in §5.1. The remaining 26 VPN experiments whose RTT difference is less than 50 ms but whose distance from the user is more than 650 miles are false negatives, and the average distance between the VPN server and user in these cases is 1089 miles.

In summary, in a more conservative analysis setting where we only consider the experiments where we obtain all measurement data points (96% of all collected measurements), we do not find any false positives and the false negative rate is 2.77% (26/937). In production systems, the experiments that we exclude in this analysis, i.e. those that are labelled “best-effort”, can be marked by service providers for deeper inspection and for re-running of the experiment to remove transient issues.

6 Ethics

Before running a public, crowd-sourced evaluation of our system detailed in § 5.2, we contacted our institution’s IRB and were informed that our project is exempt from regulation. Our crowdsourcing evaluation web service presents an input form and does not trigger any measurements until the user reads, inputs data about the measurement, and consents to the measurements. The index page also outlines the measurements conducted, data collected, and that any latency data collected may possibly be published to help future research, after anonymizing the last octet of each visitor’s IP address. From the web service, we measure the user’s latency using



four different methods, and any data collected is only accessible to the authors of this paper.

[Figure 10 plot: ECDF over RTT difference in ms (0 to 400); series by type: Direct, Proxy.]

Figure 10: ECDF of RTT difference of 1127 reliable measurements. This analysis contains measurements that did not have a “best effort” label from our decision tree. 100% of all direct measurements have an RTT difference below 50 ms (in yellow), and 86.2% of all VPN measurements have an RTT difference above 50 ms.

We acknowledge that there are potential misuses of our CalcuLatency system that server-side operators and service providers could use to penalize VPN users. However, services currently employ black-box techniques to do similar blocking that may be even more discriminatory. Our solution aims to identify users that connect to far-off, remote VPNs specifically to abuse systems, and will serve as one of several anti-fraud identifiers. Previous work has found that a majority of VPN users use it for privacy and security purposes [29]. Such legitimate VPN users can still connect to servers closer to themselves to enjoy the privacy benefits of the VPN and still evade our remote proxy detection.

7 Limitations

CalcuLatency cannot reveal proxy users in all cases. We can only detect proxy users if the proxy is sufficiently far from the user; from our measurements this distance appears to be at least 650 miles. Otherwise, if the user and proxy are closer, the RTT difference may be too small to detect the proxy. This may be a prohibitive limitation for some service providers, and a non-issue for others because users may deliberately choose proxy servers that are geographically far away from them, to disguise their physical location. For example, users defrauding rewards networks want access to higher-reward locations, or users of streaming services often use VPNs to appear to be in another country, to get access to geo-restricted content. Each service operator can decide based on their business needs if they are conservative (treat ambiguous users as direct users) or more aggressive (treat ambiguous users as proxy users). The service provider can also choose to subject ambiguous users’ connections to further tests to confirm observations.

Our technique becomes less accurate in the presence of highly restrictive firewalls that discard all ICMP traffic. For perfect accuracy, we need to receive ICMP error packets from the client itself. The farther away we are from the client, topologically, the less accurate our RTT measurements become. While such restrictive firewalls do exist, our results suggest that they are the exception rather than the rule. We find from our evaluation that 94.2% of all VPN or proxy server IPs respond to ICMP pings, and over 56% of direct connections respond to ICMP pings, which is much higher than reported before. Additionally, even if direct ICMP pings are disallowed, 0trace can function as long as the firewall does not discard ICMP error responses.

If a server-client pair uses the Multipath TCP (MPTCP) protocol, it could reduce the effectiveness of CalcuLatency, because this protocol uses multiple network routes concurrently within the same TCP connection, which could cause skewed results when comparing the 0trace RTT measurement to the application-layer RTT measurement since they could have traveled different paths. However, Shreedhar et al. determine that MPTCP usage is extremely low compared to TCP (0.5%), and the vast majority of MPTCP traffic is from a single company (Apple), which means that CalcuLatency would still be viable in the vast majority of use-cases [30].

Finally, we also present an analysis of how latency measurements can vary when it comes to dual-stack endpoints, where IPv6 connections to the server side can inflate the measured latency differences. We present an analysis using RIPE Atlas probes in Appendix D. However, CalcuLatency only supports IPv4 based 0trace measurements, and we leave the optimization of IPv6 based latency calculation for detecting proxy-enabled abuse for future work.

8 Related Work

We provide an overview of related work on latency measurement techniques and their applications, Internet liveness tests, and proxy detection methods.

Latency Measurements: In a 2021 blog post, Tschacher writes that by combining in-browser measurements with a server-side measurement of the RTT of the incoming TCP/IP handshake, website owners can use the latency difference to infer that a visitor is using a proxy [31]. However, the blog post conducts a limited evaluation and relies on only one incoming TCP/IP connection and its handshake to identify RTT, which may be affected by packet loss and transient factors.

Weinberg et al. [10] used ping-time measurements to hosts in known locations to estimate the locations of 2,269 proxy servers. They also found that over 90% of VPN servers and their first hop routers tested ignore ICMP pings and do not send time exceeded packets. Hopper et al. [32] estimate that an Internet host in 2007 can be uniquely identified by knowing its RTT to five randomly chosen hosts. Pelsser et al. [33] studied whether ICMP ping provides a good estimation of
Our technique becomes less accurate in the presence of studied whether ICMP ping provides a good estimation of



delay and find that from an application perspective, ICMP ping actually gives a poor estimate of the delay and jitter as it can vary between flows.

Jiang et al. propose two techniques that a passive in-path monitor can use to measure the TCP round-trip time including the handshake [34]. Unlike this work, CalcuLatency has access to bidirectional flows and can analyze all three segments of the TCP handshake. In 2020, Livadariu et al. found that using RTT measurements from probing an IP from multiple vantage points (including from within its own AS) is a more viable strategy than using geolocation databases [35].

Cloud Providers: Dang and Mohan [36] investigated latency specifically towards cloud service providers in 2021, with a focus on developing regions. They found that geographical distance has a high impact on latency towards cloud systems. Kashaf et al. [37] determine that a majority ( 95%) of websites that are hosted in or serve traffic primarily to Africa rely on third-party cloud infrastructure. These two conclusions support the efficacy of CalcuLatency in lower traffic regions. Dang and Mohan conclude that the last-mile technology used is the greatest factor for latency when reaching a cloud provider, but the results from our RIPE Atlas measurements show that the last-mile technology does not have a large impact on the measured RTT.

Internet Liveness: In their CCR’18 paper, Bano et al. perform Internet-wide scans to study the population of addresses that respond to probes [20]. They found ICMP probes are the most effective to discover alive hosts but TCP probes can

However, their method must be trained for each IP address, and hence is not practical.

9 Discussion and Conclusion

In this paper, we present and evaluate our system CalcuLatency, which incorporates several network RTT measurement techniques and leverages the application-layer and network-layer differences in round-trip times when a user connects to a service using a proxy (which are absent when the user connects to the service directly). We implement and evaluate each building block of our system individually: WebSocket RTT measurement on the application-layer, TCP handshake RTT recorded on the transport-layer, and ICMP ping and 0trace hop-enumeration RTT on the network-layer. We integrate all these techniques into one system called CalcuLatency and conduct a two-pronged evaluation. The first is a control testbed environment where we test multiple VPN products, proxy protocols, and server locations, with 337 unique VPN or proxy server IPs tested. To expand the diversity of our evaluation, we rally users and collect a public, crowdsourced, real-world evaluation of our system, which gained us 283 measurements from 37 different countries in all six continents.

Our evaluations reveal that a round-trip time difference of 50 milliseconds between the application and network-layer latency is a viable, empirical threshold to consider a particular client as a remote VPN or proxy connection. We open-source our code in order to help encourage future research [41, 42].

CalcuLatency provides a preliminary, labelling technique
further add to the population of alive hosts. A combination for service-providers, and must not serve as the only signal
of ICMP, TCP, and (to a lesser extent) UDP results in the for detecting “malicious proxy traffic”. Not all VPN users are
most complete picture of liveness. Bano et al.’s results in- attackers and not all VPN or proxy traffic is abusive. Hence,
form how CalcuLatency attempts to determine an address’s our method only serves as one of the signals and service
network-layer or transport-layer latency. providers must leverage business-specific logic and make
decisions on flagging connections as possible abuse traffic.
Proxy Detection: Much work has focused on detecting For example, reward networks can flag certain connections as
proxies; mostly on detecting Web proxies, VPNs, and Tor. malicious and use such signals over time to detect and curb
Weaver et al.’s PAM’14 paper studied the prevalence of in- abusive consumers.
path Web proxies by sending controlled application-layer We acknowledge that service providers could potentially
measurements between clients and a server, controlled by the misuse this method to penalize all proxy use. However, we
researchers using Netalyzr [38]. In an S&P’19 paper, Mi et emphasize that our method’s inherent design of detecting long-
al. studied the emerging ecosystem of end user systems that distance proxies and VPNs still protect legitimate proxy use
serve as proxies to others—often without the knowledge of and its users. Users could evade detection by using proxies
the owners of the proxies [39]. close to their location, which gives them better performance
Hoogstraaten [40] explored several server-side VPN detec- and the requisite privacy and security features of the VPN.
tion methods, such as using existing IP information databases Ensuring the adoption, success, and sustainability of
(WHOIS, rDNS), fingerprinting TCP options like advertised privacy-focused business models rely heavily on the availabil-
MSS, and even timing based measurements. But they only pro- ity of computationally cost-effective and easily deployable
pose limited latency based measurements to identify internet- techniques. Service providers must strike a delicate balance
layer proxies, and conducted a short proof of concept. The between implementing robust abuse-prevention mechanisms
closest related work is by Webb et al. [15] who also proposed and maintaining a rigorous commitment to privacy. With Cal-
detecting proxies and VPNs based on traffic timing and la- cuLatency, we provide an open-source technique that can be
tency. They measure the RTT for each connecting IP address readily deployed as a software service that can support efforts
and flag anomalies in the distribution of these RTTs as proxies. that prioritize user interests and their privacy.
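The 50 ms decision rule from the conclusion, together with the conservative and aggressive handling of ambiguous users, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `label_connection`, `resolve`, and `Label` are hypothetical, and two details are assumptions made only for illustration, namely that a connection is ambiguous when no network-layer RTT could be measured, and that the comparison against the threshold is strict.

```python
from enum import Enum
from typing import Optional

# Empirical threshold reported in the conclusion (50 ms).
THRESHOLD_MS = 50.0


class Label(Enum):
    DIRECT = "direct"
    PROXY = "proxy"
    AMBIGUOUS = "ambiguous"


def label_connection(app_rtt_ms: float,
                     net_rtt_ms: Optional[float],
                     threshold_ms: float = THRESHOLD_MS) -> Label:
    """Label one connection from its application- and network-layer RTTs."""
    # Assumption: a missing network-layer RTT makes the result ambiguous.
    if net_rtt_ms is None:
        return Label.AMBIGUOUS
    # Assumption: strict comparison (difference must exceed the threshold).
    if app_rtt_ms - net_rtt_ms > threshold_ms:
        return Label.PROXY
    return Label.DIRECT


def resolve(label: Label, aggressive: bool = False) -> Label:
    """Apply an operator policy to ambiguous results.

    Conservative (default): treat ambiguous users as direct users.
    Aggressive: treat ambiguous users as proxy users.
    """
    if label is not Label.AMBIGUOUS:
        return label
    return Label.PROXY if aggressive else Label.DIRECT


if __name__ == "__main__":
    # Hypothetical RTT pairs in milliseconds: (application layer, network layer).
    for app, net in [(180.0, 20.0), (25.0, 20.0), (120.0, None)]:
        label = label_connection(app, net)
        print(app, net, label.value, resolve(label).value)
```

In line with the discussion above, such a label would be only one signal among several, and an operator could route ambiguous connections to further tests instead of resolving them immediately.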



10 Acknowledgment

The authors are grateful to Armin Huremagic, Riya Agarwal, Farzad Siraj for their help with the evaluations. We thank the reviewers for their feedback, and RIPE Atlas for generously granting credits to conduct an evaluation of our techniques. This material is based upon work supported by the National Science Foundation under grant numbers CNS-2141512, and CNS-2237552.

References

[1] Todd Spangler. Digital Advertising Slowed in 2022 but Was Still Up 10.8%. URL: https://variety.com/2023/digital/news/digital-advertising-revenue-2022-growth-pwc-research-iab-1235601801/ (visited on 05/03/2023).

[2] PwC. Outlook 2022: The US Digital Advertising Ecosystem. URL: https://www.pwc.com/us/en/industries/tmt/library/assets/pwc-iab-2022-outlook.pdf (visited on 10/01/2021).

[3] Amnesty International. Surveillance giants: How the business model of Google and Facebook threatens human rights. https://www.amnesty.org/download/Documents/POL3014042019ENGLISH.PDF. Amnesty International, 2019.

[4] Presearch. [Link]

[5] Prolific. URL: [Link]

[6] Brave. [Link]

[7] Qmee. [Link]

[8] Michal Zalewski. 0trace – traceroute on established connections. Jan. 2007. URL: [Link]217023/.

[9] The WebSocket API (WebSockets). [Link][Link]/en-US/docs/Web/API/WebSockets_API. May 31, 2023.

[10] Zachary Weinberg, Shinyoung Cho, Nicolas Christin, Vyas Sekar, and Phillipa Gill. “How to Catch when Proxies Lie: Verifying the Physical Locations of Network Proxies with Active Geolocation”. In: IMC. ACM, 2018. URL: [Link]edu/~nicolasc/publications/[Link].

[11] Reethika Ramesh, Leonid Evdokimov, Diwen Xue, and Roya Ensafi. “VPNalyzer: Systematic Investigation of the VPN Ecosystem”. In: NDSS. The Internet Society, 2022. URL: [Link]24285.

[12] Manaf Gharaibeh, Anant Shah, Bradley Huffaker, Han Zhang, Roya Ensafi, and Christos Papadopoulos. “A look at router geolocation in public and commercial databases”. In: Proceedings of the 2017 Internet Measurement Conference. 2017.

[13] Ingmar Poese, Steve Uhlig, Mohamed Ali Kaafar, Benoit Donnet, and Bamba Gueye. “IP geolocation databases: Unreliable?” In: ACM SIGCOMM Computer Communication Review (2011).

[14] Can I use WebSocket? URL: https://caniuse.com/?search=websocket (visited on 06/06/2023).

[15] Allen T. Webb and A. L. Narasima Reddy. “Finding Proxy Users at the Service Using Anomaly Detection”. In: CNS. IEEE, 2016. URL: [Link]document/7860473.

[16] Hao Ding and Michael Rabinovich. “TCP Stretch Acknowledgements and Timestamps: Findings and Implications for Passive RTT Measurement”. In: ACM CCR 45.3 (2015). URL: [Link]sites/default/files/ccr/papers/2015/July/0000000-[Link].

[17] Toke Høiland-Jørgensen, Bengt Ahlgren, Per Hurtig, and Anna Brunstrom. “Measuring Latency Variation in the Internet”. In: CoNEXT. ACM, 2016. URL: https://[Link]/doi/pdf/10.1145/2999572.2999603.

[18] Gregor Maier, Anja Feldmann, Vern Paxson, and Mark Allman. “On Dominant Characteristics of Residential Broadband Internet Traffic”. In: IMC. ACM, 2009. URL: https://www.icir.org/vern/papers/imc102-[Link].

[19] Bryan Veal, Kang Li, and David Lowenthal. “New Methods for Passive Estimation of TCP Round-Trip Times”. In: PAM. Springer, 2005. URL: [Link][Link]/publications/papers/[Link].

[20] Shehar Bano et al. “Scanning the Internet for Liveness”. In: ACM CCR 48.2 (2018). URL: https://ccronline.[Link]/wp-content/uploads/2018/05/sigcomm-[Link].

[21] Diwen Xue, Reethika Ramesh, Arham Jain, Michalis Kallitsis, J. Alex Halderman, Jedidiah R. Crandall, and Roya Ensafi. “OpenVPN is Open to VPN Fingerprinting”. In: Security. USENIX, 2022. URL: [Link][Link]/system/files/[Link].

[22] Zakir Durumeric, Eric Wustrow, and J. Alex Halderman. “ZMap: Fast Internet-Wide Scanning and its Security Applications”. In: Security. USENIX, 2013. URL: [Link]

[23] Ram Sundara Raman, Mona Wang, Jakub Dalek, Jonathan Mayer, and Roya Ensafi. “Network Measurement Methods for Locating and Examining Censorship Devices”. In: ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT). 2022.



[24] Philipp Richter et al. “A Multi-perspective Analysis of Carrier-Grade NAT Deployment”. In: IMC. ACM, 2016. URL: https://dl.acm.org/doi/pdf/10.1145/2987443.2987474.

[25] IETF. RFC 1812: Requirements for IP Version 4 Routers. [Link] 1995.

[26] IETF. RFC 792: Internet Control Message Protocol. [Link] 1981.

[27] RIPE Atlas Docs. Starting your own Measurements (User-defined Measurements). [Link]docs/getting-started/user-defined-measurements.html#the-user-defined-measurements.

[28] Ethan Katz-Bassett, John P John, Arvind Krishnamurthy, David Wetherall, Thomas Anderson, and Yatin Chawathe. “Towards IP Geolocation Using Delay and Topology Measurements”. In: IMC. ACM, 2006. URL: [Link][Link].

[29] Reethika Ramesh, Anjali Vyas, and Roya Ensafi. “"All of them claim to be the best": Multi-perspective study of VPN users and VPN providers”. In: 32nd USENIX Security Symposium (USENIX Security 23). Anaheim, CA: USENIX Association, 2023.

[30] Tanya Shreedhar, Danesh Zeynali, Oliver Gasser, Nitinder Mohan, and Jörg Ott. A Longitudinal View at the Adoption of Multipath TCP. 2022. arXiv: 2205.12138 [[Link]].

[31] Nikolai Tschacher. Detecting Proxies and VPN’s [sic] with Latency Measurements. June 2021. URL: https://web.archive.org/web/20210614154432/https://[Link]/2021/06/07/detecting-proxies-and-vpn-with-latencies/ (visited on 02/15/2023).

[32] Nicholas Hopper, Eugene Y. Vasserman, and Eric Chan-Tin. “How Much Anonymity does Network Latency Leak?” In: ACM Transactions on Information and System Security 13.2 (2010). URL: [Link][Link]/~hoppernj/[Link].

[33] Cristel Pelsser, Luca Cittadini, Stefano Vissicchio, and Randy Bush. “From Paris to Tokyo: On the Suitability of ping to Measure Latency”. In: IMC. ACM, 2013. URL: [Link][Link].

[34] Hao Jiang and Constantinos Dovrolis. “Passive Estimation of TCP Round-Trip Times”. In: ACM CCR 32.3 (2002). URL: [Link]jul02/[Link].

[35] Ioana Livadariu, Thomas Dreibholz, Anas Saeed Al-Selwi, Haakon Bryhni, Olav Lysne, Steinar Bjørnstad, and Ahmed Elmokashfi. “On the Accuracy of Country-Level IP Geolocation”. In: Proceedings of the Applied Networking Research Workshop. ANRW ’20. Virtual Event, Spain: Association for Computing Machinery, 2020, pp. 67–73. ISBN: 9781450380393. DOI: 10.1145/3404868.3406664. URL: https://doi.org/10.1145/3404868.3406664.

[36] The Khang Dang, Nitinder Mohan, Lorenzo Corneo, Aleksandr Zavodovski, Jörg Ott, and Jussi Kangasharju. “Cloudy with a chance of short RTTs: analyzing cloud connectivity in the internet”. In: Proceedings of the 21st ACM Internet Measurement Conference. IMC ’21. Virtual Event: Association for Computing Machinery, 2021, pp. 62–79. ISBN: 9781450391290. DOI: 10.1145/3487552.3487854. URL: https://doi.org/10.1145/3487552.3487854.

[37] Aqsa Kashaf, Jiachen Dou, Margarita Belova, Maria Apostolaki, Yuvraj Agarwal, and Vyas Sekar. “A First Look at Third-Party Service Dependencies of Web Services in Africa”. In: Passive and Active Measurement. Ed. by Anna Brunstrom, Marcel Flores, and Marco Fiore. Cham: Springer Nature Switzerland, 2023, pp. 595–622. ISBN: 978-3-031-28486-1.

[38] Nicholas Weaver, Christian Kreibich, Martin Dam, and Vern Paxson. “Here Be Web Proxies”. In: PAM. Springer, 2014. URL: [Link][Link].

[39] Xianghang Mi et al. “Resident Evil: Understanding Residential IP Proxy as a Dark Service”. In: Security & Privacy. IEEE, 2019. URL: [Link]org/stamp/[Link]?tp=&arnumber=8835239.

[40] Hans Hoogstraaten. “Evaluating server-side internet proxy detection methods”. MA thesis. Leiden University, 2018. URL: [Link]nl/access/item%3A2701711/view.

[41] Zerotrace. URL: [Link]zerotrace-code.

[42] CalcuLatency Evaluation Code. URL: [Link]com/censoredplanet/calculatency-code.

[43] RIPE Atlas Probes Archive, Accessed 01-11-2024. [Link][Link].bz2.



A Reliability of WebSocket Pings

Figure 11 illustrates the results from our WebSocket ping measurement with 10,000 WebSocket echo requests from two consumer laptops (ThinkPad X1, MacBook Pro), each with two browsers. The ThinkPad is an X1 Carbon 8th generation equipped with an i5-10210U CPU, and ran Chrome 111 and Firefox 111. The 2023 MacBook Pro is equipped with an M2 Pro CPU, and ran Safari 16.3 and Chrome 111. We find that the median RTT is less than 1.4ms for all measurements and 99% of the RTT measurements completed in less than 2.4ms. To account for the residual variation, CalcuLatency runs multiple measurements, spaced out over time.

[Figure 11 plot: ECDF (y-axis, 0.00 to 1.00) against RTT in ms (x-axis, log scale, 1 to 1,000); one curve per platform: MacBook (Chrome), MacBook (Safari), ThinkPad (Chrome), ThinkPad (Firefox).]

Figure 11: RTT distribution of WebSocket measurements on four platforms.

B RIPE Atlas Probes Characterization

In our analysis outlined in Section 4.3, we downloaded the latest RIPE Atlas probes list on January 11, 2024 [43]. The total number of probes by unique probe ID was 38,554. But many probes have the same public IPv4 address, and the number of unique IPv4 addresses is 27,004.

We had to filter probes by eligibility based on multiple criteria. We found that 1,497 (3.9% of 38,554) probes had no public IPv4 or IPv6 addresses. We found that 588 (1.5%) probes had a bad country code, where the country code either did not exist or the length of the country code was not two (as specified by ISO-3166). We found that there were 1,107 (2.9%) probes that had undesirable tags attached to them, namely "system-geoloc-disputed", "core", "cloud", "vps", "ixp", "ipv6-only", or tags starting with “data-center” or “datacenter”.

Importantly, we found that 24,454 (63.4%) probes were either disconnected and/or abandoned, and were hence unusable. We also had to filter out 126 (0.3%) probes that do not have an IPv4 address, specifically because 0trace relies on the IPv4 stack on the endpoint contacting us. We identified a set of 97 valid “tags” that determine usable probes. We eliminated 5,382 (14%) probes that had none of the 97 desirable tags.

Of the 38,554 probes in the RIPE Atlas probe file, the total number of ineligible probes as defined by the criteria above was 33,154 (85.99% of 38,554). Therefore, the number of eligible probes was 5,400 (14%). Note that while we request all these eligible probes to participate in our measurements, not all of them do [27]. RIPE Atlas documentation mentions that it is possible for probes to not participate if the requested probes were too busy to take on new jobs, or had changed statuses, i.e. become disconnected or offline.

As mentioned in Section 4.3, we have 2,350 RIPE Atlas probe IP-probe ID pairs that fully completed the SSL, ICMP and TCP traceroute measurements. Overall, there are 2,304 unique RIPE Atlas probe IPs belonging to 2,193 unique probe IDs, because some probe IPs have multiple probe IDs associated with them and some probes have multiple IPs. The probes we have in our measurement covered all continents: Europe (1539), North America (378), Australia/Oceania (73), South America (32), Asia (155), and Africa (16).

The distribution of these probes over the top 1%, 10% and 50% of ASes per continent is shown in Table 1. We see that the distribution of eligible probes closely follows the distribution of the probes that participated in our measurements, thereby showing that our selection strategy did not skew our results. The distribution of eligible and participating probes across different access technologies is shown in Table 2.

Continent     Top 1% ASes      Top 10% ASes     Top 50% ASes
              Elig.   Com.     Elig.   Com.     Elig.   Com.
Europe        29.8    31.3     65.4    66.2     87.2    87.3
N. America    36.5    24.6     67.3    61.4     88.8    87.8
Oceania       28.9    30.1     53.0    39.7     87.2    82.2
S. America     7.2    15.6     32.5    28.1     66.2    65.6
Asia          10.9     9.7     41.5    40.0     77.2    74.2
Africa        15.9    43.8     39.7    43.8     71.4    75.0

Table 1: Distribution of probes over the top 1%, 10% and 50% of ASes, stratified by the continent of the probes. “Elig.” refers to the percentage of eligible probes that exist in RIPE Atlas and “Com.” refers to the percentage of probes that actually participated in and completed our measurements successfully. This table shows that the distribution of eligible probes closely follows the distribution of the probes that actually participated in our measurements, showing that our selection strategy did not skew our results.

Tags (Associated RIPE Tags)      Eligible   Participated
Fibre (Fibre, FTTH)              2124       906
Cable (Cable)                    1071       365
VDSL (Only VDSL or VDSL2)        475        268
DSL (DSL and not VDSL/VDSL2)     356        160

Table 2: Distribution of probes over access technology in the set of eligible probes and probes that actually participated in our measurement. This table shows that the distribution of tags follows the participation rate of ≈46%

C Measurements done at different times of day and days of the week

We tested the hypothesis that running latency measurements at different times of the day in the local time of the probe could lead to variance in expected RTT due to different traffic patterns at different times. So, we restricted our measurements to run at eight different times during the day, with respect to the local time of the probe, extrapolated from the latitude and longitude of the probe. We also ran these measurements on weekdays and weekends over two weeks to observe any differences. We had 1,345 probes that completed the TCP traceroute measurements, and 1,196 that completed the ICMP traceroutes over multiple days.

Our measurements show that in both ICMP and TCP traceroutes, the standard deviation of the RTTs throughout the day is low per IP address. Figure 13 shows that in 90% of probes, the standard deviation is lower than 9.2ms and 10.9ms in ICMP and TCP respectively. Moreover, in over 95% of probes, the standard deviation is lower than 15.7ms and 21.1ms in ICMP and TCP respectively. The outlying 5% of measurements do not belong to any specific group of probes and are equally spread out among the tags, locations, and times.

Figure 13: CDF of Standard Deviation of RTTs per RIPE Probe IP measured eight times a day. We see that in both ICMP and TCP traceroutes the standard deviation in over 90% of all probes is less than 9.2ms and 10.9ms respectively.

Since RIPE Atlas has a daily quota and a concurrency limit for measurements, our aim is to maximize the usefulness of our measurements. If we run measurements at different times of the day and repeat each measurement thrice to account for errors, we can only measure a maximum of 595 probes a day (maximum credits per day: 1,000,000, divided by credits needed per measurement round per probe (420) times 4 repeats). But since our experiments revealed that there is no significant difference in latency for 95% of the cases when measured at different times of the day, we make the trade-off of missing out the 5% of probes with some variance in measurements taken at different times of the day, and rather choose to expand our measurements to include a larger variety and diversity of probes from different geolocations and last-mile technology. In the rest of this section, we present results from our wider, diverse measurement run that was aimed at maximizing the number of probes, and conducted over a three day period (Sunday–Tuesday).

D RIPE Atlas Probes v4 vs v6 Differences

Since RIPE Atlas allows us to run Internet measurements from probes that include IPv4 and IPv6 dual-stack probes, we wanted to identify whether CalcuLatency and its evaluation are significantly affected when the tested endpoints are dual-stack. However, we note that our 0trace system is currently only able to run and measure towards an IPv4 address, since we rely on the IPID field and ICMP(v4) error responses in our system design. If the endpoint makes a TCP handshake via their IPv6 address, 0trace will not be able to run towards that host.

Nevertheless, we conducted measurements from the IPv4 and IPv6 addresses of the same probe. We compare the RTTs of the terminal hops in the ICMP traceroute and in the TCP traceroute. We find that in ≈ 90% of the probes, the RTT difference between the IPv4 and IPv6 traceroutes in a given protocol (ICMP/TCP) is below 50ms, as shown in Figure 12. There is a long tail in the RTT differences, and upon manual investigation we notice that in each case, the IPv6 traceroute ended early and had bloated RTT values, and hence the difference turned out to be large.

Figure 12: RIPE Atlas IPv4 and v6 RTT Differences. From our measurements, we find that in over 85.4% of the probes the difference between the RTTs of the terminal hops from the traceroutes of IPv4 and IPv6 addresses (ICMP and TCP separately) is less than 50ms, which is exactly our margin threshold. In all cases above 50ms, the IPv6 traceroute RTT was markedly higher.

We note that CalcuLatency at the moment only supports IPv4-based 0trace measurements, and so we present this analysis in the appendix, and leave the optimization of IPv6-based latency calculation for detecting proxy-enabled abuse for future work.

E Public Real-World Crowdsourced Evaluation ASN distribution

As mentioned in Section 5.2, we had 161 direct measurements that came from 145 unique client IPs belonging to 93 different autonomous systems (ASes), and 122 VPN measurements that came from 109 unique VPN IPs belonging to 51 different ASes. Overall, we collected data from over 37 different countries from all (six) continents.

We provide a more detailed breakdown of the IPs by autonomous system number. Our direct measurements had at least two measurements from the following AS numbers: 7922, 701, 7018, 21928, 3320, 36375, 33915, 3209, 24309, 1257, 22773, 35807, 577, 61832, 27, 20115, 5089, 3215, 6167. The rest of the 74 ASes had one measurement from them. Our VPN measurements had at least two measurements from the following AS numbers: 9009, 212238, 60068, 136787, 39351, 16509, 136557, 132825, 137409, 63949, 20473, 13335, 60781, 197706. The rest of the 34 ASes had one measurement from them.

F Differences in local and last-mile access technology

Figure 14 contains all the data from our evaluation of our hypothesis of whether different last-mile access technologies have large effects on the latency seen from the server side. We divide and present our results for the RIPE Atlas probe’s TCP and ICMP traceroutes separately. In Section 4.3, we presented results without outliers for better visibility. Here, we present all our results.

(a) Difference between 0trace-determined RTT and RIPE Atlas-determined ICMP RTT in ms.
(b) Difference between 0trace-determined RTT and RIPE Atlas-determined TCP RTT.

Figure 14: Comparing the RTT differences between probes with different local last-mile access technology

We see that there are only three outliers, all of which were tagged with VDSL as the access technology. As noted in Section 4.3, these outliers had an RTT difference of 388ms, 1970ms, and 2640ms, and upon investigation we note that these were due to two German RIPE probes in AS8881 and AS3320 which failed to reach far in the 0trace traceroute. These cases would be characterized as “best effort” in the real world operation of the CalcuLatency system and would be marked for retrying and not used for determination.
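The probe accounting in Appendix B and the daily measurement budget in Appendix C are plain arithmetic, and a short sanity check makes the numbers easy to re-derive. The sketch below is illustrative only: the function and dictionary names are hypothetical, and it assumes the six exclusion counts are disjoint (filters applied in sequence), which the text does not state explicitly but which is consistent with the reported total.

```python
# Counts reported in Appendix B (RIPE Atlas probe list of January 11, 2024).
TOTAL_PROBES = 38_554

# Assumption: filters are applied in sequence, so these counts do not overlap.
EXCLUDED = {
    "no public IPv4 or IPv6 address": 1_497,
    "bad country code": 588,
    "undesirable tags": 1_107,
    "disconnected or abandoned": 24_454,
    "no IPv4 address": 126,
    "none of the 97 desirable tags": 5_382,
}


def eligibility_summary(total, excluded):
    """Return (ineligible, eligible, ineligible %, eligible %)."""
    ineligible = sum(excluded.values())
    eligible = total - ineligible
    return (ineligible, eligible,
            100 * ineligible / total, 100 * eligible / total)


def max_probes_per_day(daily_credits=1_000_000, credits_per_round=420, rounds=4):
    """Appendix C budget: probes measurable per day under the credit quota."""
    return daily_credits // (credits_per_round * rounds)


if __name__ == "__main__":
    ineligible, eligible, pct_in, pct_el = eligibility_summary(TOTAL_PROBES, EXCLUDED)
    print(f"ineligible: {ineligible} ({pct_in:.2f}%)")
    print(f"eligible:   {eligible} ({pct_el:.2f}%)")
    print(f"max probes per day: {max_probes_per_day()}")
```

With the reported counts, the exclusions sum to the 33,154 ineligible probes and leave the 5,400 eligible probes stated in Appendix B, and the credit budget yields the 595 probes per day stated in Appendix C.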

