Papers by Katherine Barabash

Hash-based stateful load-balancers employ connection tracking to avoid per-connection-consistency (PCC) violations that lead to broken connections. In this paper, we propose Just Enough Tracking (JET), a new algorithmic framework that significantly reduces the size of the connection tracking tables for hash-based stateful load-balancers without increasing PCC violations. Under mild assumptions on how backend servers are added, JET adapts consistent hash techniques to identify which connections do not need to be tracked. We provide a model to identify these safe connections and a pluggable framework with appealing theoretical guarantees that supports a variety of consistent hash and connection-tracking modules. We implement JET in two different environments and with four different consistent hash techniques. Using a series of evaluations, we demonstrate that JET requires connection-tracking tables that are an order of magnitude smaller than those required with full connection tracking while preserving PCC and balance properties. In addition, JET often increases the lookup rate due to improved caching.
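The "safe connections" intuition can be demonstrated with a small sketch. This is not JET's actual implementation, only an illustration under the abstract's assumption that backends are added (not removed): with a rendezvous (highest-random-weight) hash, adding a backend remaps only the keys the new backend wins, so every other connection keeps its backend without needing a tracking-table entry. All names here are illustrative.

```python
import hashlib

def score(key, node):
    # Pseudo-random per-(key, node) weight; the highest-scoring node wins.
    return int.from_bytes(hashlib.sha256(f"{key}|{node}".encode()).digest()[:8], "big")

def assign(key, backends):
    # Rendezvous-hash lookup: no per-connection state needed.
    return max(backends, key=lambda n: score(key, n))

backends = [f"b{i}" for i in range(8)]
keys = [f"conn-{i}" for i in range(10000)]

before = {k: assign(k, backends) for k in keys}
after = {k: assign(k, backends + ["b8"]) for k in keys}

# Only the connections "stolen" by the new backend change their mapping;
# the rest are safe and would never have needed a tracking entry.
moved = [k for k in keys if before[k] != after[k]]
assert all(after[k] == "b8" for k in moved)
print(f"{len(moved)} of {len(keys)} connections remapped")
```

Under this property, a tracker only needs entries for the small fraction of connections whose mapping can change, which is the kind of reduction the paper reports.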
arXiv (Cornell University), Dec 23, 2018
Consistent hashing is a central building block in many networking applications, such as maintaining connection affinity of TCP flows. However, current consistent hashing solutions do not ensure full consistency under arbitrary changes or scale poorly in terms of memory footprint, update time and key lookup complexity. We present AnchorHash, a scalable and fully-consistent hashing algorithm. AnchorHash achieves high key lookup rate, low memory footprint and low update time. We formally establish its strong theoretical guarantees, and present an advanced implementation with a memory footprint of only a few bytes per resource. Moreover, evaluations indicate that AnchorHash scales on a single core to 100 million resources while still achieving a key lookup rate of more than 15 million keys per second.
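"Full consistency" here means that removing a resource remaps only the keys that were on it, and re-adding it restores the original mapping exactly. The sketch below is not AnchorHash (which achieves this with much better lookup complexity and only a few bytes per resource); it demonstrates the same property with a plain rendezvous hash, whose O(n) lookup is the cost AnchorHash avoids.

```python
import hashlib

def score(node, key):
    # Per-(node, key) pseudo-random weight; the highest weight wins.
    return int.from_bytes(hashlib.sha256(f"{node}#{key}".encode()).digest()[:8], "big")

def assign(key, nodes):
    return max(nodes, key=lambda n: score(n, key))

nodes = [f"n{i}" for i in range(10)]
keys = [f"k{i}" for i in range(5000)]
orig = {k: assign(k, nodes) for k in keys}

# Remove a resource: only the keys that lived on it are remapped...
reduced = [n for n in nodes if n != "n3"]
removed = {k: assign(k, reduced) for k in keys}
assert all(orig[k] == removed[k] for k in keys if orig[k] != "n3")

# ...and re-adding it restores the original mapping exactly.
restored = {k: assign(k, nodes) for k in keys}
assert restored == orig
```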
Flow-Logs Pipeline (FLP) is an observability tool that consumes raw network flow-logs and transforms them from their original format (e.g., NetFlow or IPFIX) into a numeric metrics format. FLP allows defining transformations of data to generate condensed metrics that encapsulate network domain knowledge.
Proceedings of the 16th ACM International Conference on Systems and Storage
Enterprises often deploy their business applications in multiple clouds as well as in multiple traditional environments. This work focuses on the connectivity aspects of this new way of operating and consuming digital services. We define the related requirements, analyze the challenges, and present ClusterLink, our solution for interconnecting today's and future multi-cloud applications.

A case for an open customizable cloud network
ACM SIGCOMM Computer Communication Review
Cloud computing has been transforming the networking landscape over the last few years. The first order of business for major cloud providers today is to attract as many organizations as possible to their own clouds. To that end, cloud providers offer a new generation of managed network solutions to connect the premises of enterprises to their clouds. To serve their customers better and to innovate fast, major cloud providers are currently on the route to building their own "private Internets", which are idiosyncratic. On the other hand, customers that do not want to be locked in by vendors, and who want the flexibility of using best-for-the-task services spanning multiple clouds and, possibly, their own premises, seek solutions that provide smart overlay connectivity across clouds. The result of these developments is a multiplication of closed idiosyncratic solutions rather than an open standardized ecosystem. In this editorial note, we argue for the desirability of such an open ecosystem.
Proceedings of the 15th ACM International Conference on Systems and Storage
Flow-Logs Pipeline (FLP) is an observability tool that consumes raw network flow-logs and transforms them from their original format (e.g., NetFlow or IPFIX) into a numeric metrics format. FLP allows defining transformations of data to generate condensed metrics that encapsulate network domain knowledge.
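The core transformation FLP performs — condensing raw flow records into labeled numeric metrics — can be sketched as follows. The field names and aggregation are illustrative (NetFlow/IPFIX-style), not FLP's actual configuration or API.

```python
from collections import defaultdict

# Each record mimics a decoded flow-log entry with NetFlow/IPFIX-style fields.
flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 1200, "proto": "TCP"},
    {"src": "10.0.0.1", "dst": "10.0.0.3", "bytes": 800,  "proto": "UDP"},
    {"src": "10.0.0.4", "dst": "10.0.0.2", "bytes": 500,  "proto": "TCP"},
]

def aggregate(flows, group_by, value_field):
    """Condense raw flow records into a numeric metric keyed by labels."""
    metric = defaultdict(int)
    for f in flows:
        labels = tuple(f[k] for k in group_by)
        metric[labels] += f[value_field]
    return dict(metric)

bytes_per_src = aggregate(flows, group_by=("src",), value_field="bytes")
print(bytes_per_src)  # {('10.0.0.1',): 2000, ('10.0.0.4',): 500}
```

A metric like this (e.g., bytes per source, exported as a Prometheus-style counter) is far smaller than the raw flow-log stream while retaining the domain knowledge an operator cares about.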

Estimating client QoE from measured network QoS
Proceedings of the 12th ACM International Conference on Systems and Storage, 2019
This research is done in the context of the SliceNet project [4], which aims to extend 5G infrastructure with cognitive management of cross-domain, cross-layer network slices [1], with emphasis on Quality of Experience (QoE) for vertical industries. The provisioning of network slices with proper QoE guarantees is seen as one of the key enablers of future 5G-enabled networks. The challenge is to assess the QoE experienced by the vertical application and its users without requiring the applications or the users to measure and report QoE-related metrics back to the provider. To address this challenge, we propose a method for deriving application-level QoE from network-level Quality of Service (QoS) measurements, which are easily accessible to the provider. In particular, we describe a PoC where the QoE perceived by application users is estimated from low-level network monitoring data by applying cognitive methods. Our main goal is to enable the cloud provider to support the desired E2E QoE-based Service Level Agreements (SLAs), e.g., by monitoring QoS metrics within the provider's domain to optimize resource allocation through the provider's actuators. An additional benefit can be achieved by applying the same technique to troubleshoot issues in the provider's infrastructure. In this work, we employed classical statistical methods to assess the relationship between application-level QoE and network-level QoS.
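The "classical statistical methods" approach can be illustrated with a minimal sketch: fit a linear model mapping network-level QoS features (here, latency and packet loss) to an application-level QoE score (MOS-like, 1 to 5). The data below is synthetic and the coefficients are invented for illustration; the paper's actual features, model, and data differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic QoS samples: columns are [latency_ms, packet_loss_pct].
qos = np.column_stack([rng.uniform(10, 200, 500), rng.uniform(0, 5, 500)])

# Synthetic "ground truth" QoE (MOS-like): degrades with latency and loss.
qoe = 5.0 - 0.012 * qos[:, 0] - 0.35 * qos[:, 1] + rng.normal(0, 0.1, 500)

# Fit a linear QoS -> QoE model by ordinary least squares.
X = np.column_stack([np.ones(len(qos)), qos])
coef, *_ = np.linalg.lstsq(X, qoe, rcond=None)

def estimate_qoe(latency_ms, loss_pct):
    """Estimate application-level QoE from provider-side QoS measurements."""
    return float(coef @ np.array([1.0, latency_ms, loss_pct]))
```

Once fitted offline against a labeled sample, such a model lets the provider estimate per-user QoE continuously from QoS metrics it already collects, without any reporting from applications or users.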
Efficient data transmission in an overlay virtualized network
Bandwidth Control in Multi-Tenant Virtual Networks
Defining And Managing Virtual Networks In Multi-Tenant Virtualized Data Centers
Distributed Address Resolution Service for Virtualized Networks
Failover of blade servers in a data center
Distributed Policy Service

Networking is currently a field of innovation and massive change, driven by several factors, the most prominent of which is the shift to cloud computing. It is already widely understood that the networking requirements of cloud computing infrastructures differ from anything that the established networking technologies can offer. We present Distributed Overlay Virtual nEtwork (DOVE), an architecture based on a novel network abstraction that allows defining provider-tenant contracts at the application level. The DOVE network abstraction is decoupled from the particulars of the physical infrastructure design. Still, it captures the network functionality that matters: connectivity, security, performance, etc. The DOVE architecture allows cloud providers to consolidate multiple abstractly defined networks on large-scale commodity physical infrastructures, utilizing the advantages of specialized hardware appliances, and delegating full control to their tenants, who can now manage and administer their virtual networks.

Dynamic Slice Scaling Mechanisms for 5G Multi-domain Environments
2021 IEEE 7th International Conference on Network Softwarization (NetSoft)
Network slicing is an essential 5G innovation whereby the network is partitioned into logical segments, so that Communication Service Providers (CSPs) can offer differentiated services for verticals and use cases. In many 5G use cases, network requirements vary over time, and CSPs must dynamically adapt network slices to satisfy the contractual network slice QoS, cooperating and using each other's resources, e.g., when the resources of a single CSP are not sufficient or suitable to maintain all its current SLAs. While this need for dynamic cross-CSP cooperation is widely recognized, realizing it is not yet possible due to gaps in both business processes and technical capabilities. In this paper, we present the 5GZORRO approach to dynamic cross-CSP slice scaling. Our approach enables CSPs to collaborate, providing security and trust with smart multi-party contracts, and leverages this collaboration to enable resource sharing across multiple administrative domains, either during slice establishment or when an already established slice needs to expand or shrink. Our approach allows automating both the business and technical processes involved in the dynamic lifecycle management of cross-CSP network slices, following ETSI's Zero-Touch Network and Service Management (ZSM) closed-loop architecture and relying on a resource-sharing Marketplace, a Distributed Ledger (DL), and an Operational Data Lake. We show how this approach is realized in a truly Cloud Native way, with Kubernetes as both the business and technical cross-domain orchestrator. We then showcase the applicability of the proposed solution for dynamic scaling of a Content Delivery Network (CDN) service.

Proceedings of the 12th International Conference on emerging Networking EXperiments and Technologies, 2016
Hybrid switching combines a high-bandwidth optical circuit switch in parallel with a low-bandwidth electronic packet switch. It presents an appealing solution for scaling datacenter architectures. Unfortunately, it does not fit many traffic patterns produced by typical datacenter applications, in particular the skewed traffic patterns that involve highly intensive one-to-many and many-to-one communications. In this paper, we introduce composite-path switching, allowing for composite circuit/packet paths between the two switches. We show how this enables the datacenter network to deal with skewed traffic patterns, and offer a practical scheduling algorithm that can directly extend any hybrid-switching scheduling algorithm. Through extensive evaluations using modern datacenter workloads, we show how our solution outperforms two recently proposed state-of-the-art scheduling techniques, both in completion time and in circuit utilization.

Proceedings of the 3rd Workshop on Data Center - Converged and Virtual Ethernet Switching, Sep 9, 2011
Server virtualization has brought tremendous value to the modern computing landscape, and in particular to data center and cloud infrastructures. Virtual server deployments have become ubiquitous in many development and production sites, in cloud infrastructures, and in disaster recovery solutions. Network connectivity is a vital aspect of modern computing. In this work, we explore the new requirements server virtualization brings to the networking world and show how these requirements differ from those of physical server connectivity. We then describe the Distributed Overlay Virtual Ethernet (DOVE) network architecture for building virtual network infrastructure. We show how host-based overlays answer the novel networking requirements and describe a working example of their usage for network virtualization. We discuss the benefits and drawbacks of the method, outline options for its efficient implementation, and show what additional work is required in order to base DCN virtualization on overlays.

CogNETive
Proceedings of the 10th ACM International Systems and Storage Conference, 2017
Operating a cloud-scale service is a huge challenge. There are millions of users worldwide and millions of requests per second. For example, in 2013 Amazon's Simple Storage Service (S3) contained two trillion objects, and its logs contained 1.1 million log lines per second, approximately 10 PB of log records per year (see [1]). Cloud scale implies thousands of servers and network elements, and hundreds of services from multiple cross-regional data centers. Cloud service operation data is scattered over various types of semi-structured and unstructured logs (e.g., application, error, debug), telemetry and network data, as well as customer service records. It is therefore extremely difficult for the multiple owners and administrators of such systems, coming from different units of the organization, to follow the possible paths and system alternatives in order to detect problems, solve issues, and understand the service operation.