
Analyzing NFS Traces

Tyler Wolf

12 May 2008

1 Introduction

Distributed file storage is becoming an increasingly widespread paradigm in computing. Once reserved for large data centers and research environments in academia, distributed storage is now commonplace in businesses and is even percolating into the home.

Just as network storage is becoming more prevalent, so too is the leading protocol for interfacing with network storage: the Network File System (NFS). Whether due to its versatility or its conveniently simple name, NFS has become the de facto standard in network file systems. Because NFS is so widely used, a great many people stand to benefit from any performance optimizations that can be found. It is in this light that we examine four NFS network traces.

The goal of this paper is twofold. It summarizes the concept and methodology of NFS tracing, and it analyzes NFS trace data in an attempt to draw conclusions about NFS usage. In particular, the paper considers whether or not NFS lends itself to being cast as a log-structured file system.

2 Log-Structured File Systems

The entire premise of log-structured file systems is predicated on the concept that disk reads are slow, memory accesses are fast, and reading a file from disk into the cache enables fast subsequent reads [4].

With modern computer systems shipping with ever-increasing amounts of built-in memory, the cache size is constantly growing, and conventional file systems, optimized for fast read access times, are perhaps becoming obsolete in the process. The log-structured file system takes a new approach: optimize for fast write times and assume that advances in caching technology will speed up read access times simultaneously [7].

3 The NFS Protocol

3.1 Brief Overview of the Protocol

Developed by Sun, NFS was initially a stateless protocol^1 based on the client-server model. NFS was intended to be completely machine- and platform-independent, a requirement that resulted in the development of the VFS layer in Unix. This platform independence is perhaps one of the most important factors in the success of NFS. NFS is easily ported to almost any operating system, operates transparently to the client, and functions independently of the client's platform [8].

^1 While the protocol itself was technically stateless, stateful locking was implemented outside the core of the protocol [8]. As of version 4, NFS itself is no longer stateless [5].

3.2 Remote Procedure Calls

Rather than adhering to UNIX system calls, NFS adopted the Sun Remote Procedure Call (RPC) mechanism [8]. Each function that NFS can perform is assigned an RPC. As NFS is (for the most part) stateless, the RPC parameters contain all of the information needed to execute the operation. NFS network traffic, then, is simply a series of these RPCs between the server and the client.
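
To make the shape of such a call concrete, the following is a minimal sketch (in Python, purely illustrative and not part of the paper's original tooling) of the arguments an NFSv3 READ carries, with field names following RFC 1813. Because the protocol is stateless, the call itself names the file and the byte range, and the server needs no per-client state to service it.

    from dataclasses import dataclass

    @dataclass
    class Read3Args:
        """Arguments of an NFSv3 READ RPC (field names per RFC 1813)."""
        file_handle: bytes  # opaque handle identifying the file on the server
        offset: int         # starting byte offset within the file
        count: int          # number of bytes the client wants returned

    # Example: a client requesting the first 8 KB of some file.
    example_call = Read3Args(file_handle=b"\x01\x02\x03\x04", offset=0, count=8192)
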
4 Tracing NFS Traffic

Tracing the traffic on NFS consists of two processes: intercepting the packets, and distilling the trace from the network data.

4.1 Network Monitoring

Unlike classical file systems that only operate on the local machine, monitoring the file system calls in NFS requires no kernel hooks, APIs, or special software.^2 All that is required is access to the network. To intercept the NFS traffic, one need only configure a computer to listen to network traffic and attach that computer to a mirrored port on the network that the NFS server is running on. The packets to and from the NFS server will be mirrored to the tracing computer, which can then record the trace for later analysis [1].

The ease with which complete traces of NFS traffic can be taken highlights the glaring security inadequacies in the protocol. Whereas a trace of FFS requires kernel cooperation, an NFS trace only requires a free port on a hub. Some of the security concerns have been addressed in NFS version 4 [3]; these concerns, however, are outside the scope of this paper.

^2 Analyzing and parsing this information does require software; however, simply accessing the data is not as difficult as it is in other file systems.
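
In practice the capture itself is done with tcpdump or, for these traces, nfsdump (described below). As a rough illustration of how little is needed to pick NFS traffic out of a mirrored-port capture, the sketch below scans a standard libpcap capture file for packets to or from the NFS port (2049). The Ethernet/IPv4 framing assumptions and the file name are illustrative only; this is not part of the original study.

    import struct

    NFS_PORT = 2049  # well-known port for NFS over both UDP and TCP

    def count_nfs_packets(pcap_path):
        """Count packets to or from the NFS port in a classic libpcap capture.

        Assumes Ethernet framing and untagged IPv4; a real tool such as
        nfsdump goes much further and decodes the RPC payloads themselves.
        """
        with open(pcap_path, "rb") as f:
            magic = f.read(4)
            if magic == b"\xd4\xc3\xb2\xa1":       # little-endian pcap
                endian = "<"
            elif magic == b"\xa1\xb2\xc3\xd4":     # big-endian pcap
                endian = ">"
            else:
                raise ValueError("not a classic pcap file")
            f.read(20)                             # skip the rest of the global header

            matches = 0
            while True:
                rec = f.read(16)                   # per-packet record header
                if len(rec) < 16:
                    break
                _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
                pkt = f.read(incl_len)
                if len(pkt) < 14 + 20 + 4 or pkt[12:14] != b"\x08\x00":
                    continue                       # not IPv4 over Ethernet
                ihl = (pkt[14] & 0x0F) * 4         # IPv4 header length in bytes
                proto = pkt[14 + 9]                # 6 = TCP, 17 = UDP
                if proto not in (6, 17) or len(pkt) < 14 + ihl + 4:
                    continue
                sport, dport = struct.unpack("!HH", pkt[14 + ihl:14 + ihl + 4])
                if NFS_PORT in (sport, dport):
                    matches += 1
            return matches

    # Usage (file name is a placeholder):
    # print(count_nfs_packets("mirror-port.pcap"))
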
4.2 Trace-Collection Software

Once the NFS traffic is being mirrored to a tracing computer, the tracing computer has to actually take the trace. One such approach (the one employed to collect the traces analyzed in this paper) is the nfsdump application, created by Daniel Ellard et al. [2], which is a heavily modified version of the tcpdump utility.

Nfsdump is extremely adaptable. It can handle any combination of NFSv2 or NFSv3, TCP or UDP, Jumbo Frames, and Gigabit Ethernet [2]. Anonymization capabilities are built into nfsdump. The tool anonymizes customizable fields with unpredictable, but unique, values. Unlike a hash, where a given path will always hash to the same anonymized result, this nondeterministic process will give different results each time it is run [2]. As such, traces from a known system cannot be used to uncover anonymized information about another system. For example, with a hash, an attacker could trace their own system and examine the hashed result of a common file like /etc/fstab. Given that, the attacker could find /etc/fstab in the anonymized trace, giving them a foothold into deanonymizing the trace. With the nondeterministic process, this is not possible.
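
The contrast can be sketched as follows. This is not nfsdump's actual algorithm, only an illustration of why a per-run random mapping resists the known-path attack just described while a hash does not.

    import hashlib
    import secrets

    def hash_anonymize(path):
        """Deterministic: the same path maps to the same token in every trace,
        so an attacker who hashes /etc/fstab can find it in any anonymized trace."""
        return hashlib.sha256(path.encode()).hexdigest()[:12]

    class NondeterministicAnonymizer:
        """Per-run mapping: unique and consistent within one trace, but freshly
        random on every run, so tokens from one trace reveal nothing about another."""
        def __init__(self):
            self._mapping = {}

        def anonymize(self, path):
            if path not in self._mapping:
                self._mapping[path] = secrets.token_hex(6)
            return self._mapping[path]

    anon = NondeterministicAnonymizer()
    print(hash_anonymize("/etc/fstab"))   # identical on every machine, every run
    print(anon.anonymize("/etc/fstab"))   # different every run, stable within a run
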
5 The Trace Data

Daniel Ellard et al. [2] used this method of passive NFS trace collection to gather a series of NFS traces of various systems within Harvard. These traces are the source of all trace data analyzed in this paper.

5.1 The Networks

The general topologies of two of the three networks traced in the data employed by this paper are available. I have no information about the third network (from the DEASNA traces).

5.1.1 CAMPUS

The CAMPUS NFS system serves the majority of the school and administration at Harvard; it has over 10,000 active accounts. It consists of three NFS servers, each of which has several network ports and connects to fourteen disk arrays (see Fig. 1). Each disk array contains 53 GB, most of which is used as home directories for a subset of the students and faculty.

The workload on this system is predominantly web services and email; we expect a fairly read-heavy trace as a result.

Figure 1: The topology of the CAMPUS network. The students and faculty connect to one
of three NFS servers, each of which is connected to all of the fourteen 53-GB disk arrays. Figure
drawn from the description given in [2].

5.1.2 EECS

The EECS NFS system provides the network storage for the computer science department. Its workload is predominantly research; there is no mail on this system. The topology is simply a single NetApp filer connected to the network.

With a research workload we expect to see a much lower read-write ratio.

5.2 The Traces

5.2.1 HOME02

The HOME02 trace was collected from one of the fourteen disk arrays in the CAMPUS system. The data contains 4,099,012,325 operations, collected between 9/1/2001 and 11/30/2001. Compressed, it occupies 48 GB on disk.

5.2.2 LAIR62

The LAIR62 trace was collected from the NetApp filer that is the sole NFS storage device on the EECS system. LAIR62 contains 416,323,390 operations, collected over the same time period as HOME02. It occupies 9.5 GB of disk space compressed.

5.2.3 DEASNA

The DEASNA trace was taken from the NFS system at Harvard's Department of Engineering and Applied Sciences. This system's workload is a mixture of HOME02 and LAIR62: there is some web service, some email, and some research. DEASNA has 1,746,890,523 operations between 10/17/2002 and 11/22/2002. Its compressed size is 26 GB.

5.2.4 DEASNA2

The DEASNA2 trace was taken from the same NFS system as DEASNA, simply at a later date. DEASNA2 covers 1/29/2003 to 3/10/2003 and contains 2,183,639,296 operations. It occupies 35 GB on disk, compressed.

6 Issues

In the course of collecting, parsing, and analyzing the traces, the following issues were encountered.

6.1 Issues with the Specific Traces

6.1.1 Anonymization is a Hindrance

Certain data fields in the traces were either omitted or nondeterministically anonymized. I was not able to conduct many of the experiments I would have liked to because of insufficient data. Examples will be presented in Section 7.

6.1.2 The Traces Have Problems

There are several problems with the specific traces and their format, many of which have already been documented by Ellard et al. [2]:

• Some RPC calls have timing inconsistencies. Of course, this is a distributed system, with several computers communicating asynchronously, so timing inconsistencies are inevitable. However, this problem still complicates analysis.

• Some of the size values for certain setattr calls are recorded in the wrong-endian format. The only way to catch this error is by examining the values and noticing abnormally large ones (see the sketch after this list). This bug has been fixed in the software since the release of the traces, so it will not affect subsequent traces.

• The terms "count" and "size" are used inconsistently throughout the various RPC mappings.

• Some RPC calls were lost. In the HOME02 trace, the network was running faster than the maximum speed of the NIC on the tracing machine. As a result, during especially bursty periods, some calls are simply lost. This results in some calls with no returns, and some returns that appear to have been unasked for.
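
For the wrong-endian size values in the second bullet, a crude check is to byte-swap any implausibly large value and see whether it becomes reasonable. This is illustrative only; the 1 TB cutoff and the 64-bit field width are assumptions, not documented properties of the traces.

    def plausible_size(value, width_bytes=8, max_reasonable=1 << 40):
        """Return a (possibly corrected) size for a suspect setattr value.

        If the recorded size is absurdly large but its byte-swapped form is
        plausible, assume the field was written in the wrong endianness.
        The 1 TB threshold is an arbitrary illustrative cutoff.
        """
        if value <= max_reasonable:
            return value
        swapped = int.from_bytes(value.to_bytes(width_bytes, "big"), "little")
        return swapped if swapped <= max_reasonable else value

    # A 512-byte size mistakenly recorded with its bytes reversed:
    bogus = int.from_bytes((512).to_bytes(8, "little"), "big")
    print(plausible_size(bogus))  # 512
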
6.2 Issues with Trace Analysis in General

6.2.1 Obtaining Traces

There are simply not very many publicly available traces. Those that do exist are quite large and only available from a few sources. Those sources that are offering traces are not always reliable. Obtaining these traces involved several weeks of cumulative download time and repeated download resumption as the filer hosting the data crashed. This experience highlights the need for a consolidated repository of these traces, preferably one that is reliable, highly available, and with a great amount of bandwidth available.

6.2.2 Analyses are Expensive

Running analyses on such a large data set is computationally very expensive. Searching a single trace for a simple statistic, for instance the number of reads in HOME02, took upwards of a day of dedicated computation.
Granted, my scripts were optimized for programming simplicity, not efficiency, but the process is certainly time-consuming. On more than one occasion throughout the project, I enqueued an analysis and returned the next day to find that my code had a bug and I had wasted an entire day of computation. Needless to say, this project is not the best for someone who procrastinates.
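
For a sense of what such a pass involves, the sketch below streams a compressed text trace once and counts lines mentioning a given operation. The field handling is deliberately naive and the file name is a placeholder; this is not the script used to produce the numbers in this paper.

    import gzip

    def count_operations(trace_path, op_name="read"):
        """Stream a compressed nfsdump-style text trace once and count lines
        containing a given RPC operation name as a whitespace-delimited token.

        Matching on a token is an approximation; a real analysis would parse
        the call and reply fields properly. The point is that even one
        sequential pass over billions of records takes a long time.
        """
        hits = 0
        with gzip.open(trace_path, "rt", errors="replace") as f:
            for line in f:
                if op_name in line.split():
                    hits += 1
        return hits

    # Usage (file name is a placeholder):
    # print(count_operations("home02-week.txt.gz"))
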
7 Results

In the following results, all graphs, unless otherwise noted, span the entire duration of the trace. As general trends are more important in my discussion than specific data points, and due to the sheer volume of data points, the time scale has been omitted. This is largely due to my inability to convince my graphing program to label only selected points on the x axis. I intended to use a more customizable graphing solution, but had to give up because of time constraints.

Besides general access trends, I focused on two major components:

• Read-Write Ratio. My original goal was to examine the feasibility of NFS as a log-structured file system. As such, I was looking for how much more prevalent reads are than writes.

• Cache Effects. This is the major area where anonymization impacted my work. In addition to not knowing directory structures, I also could not determine the size of the caches on any of the NFS hardware. However, examining cache effects was central to my study, and so I decided that a cursory, rough statistic was better than no statistic at all.

In an attempt to discern approximately how often NFS accesses deal with data that is potentially cached, I executed the following LRU algorithm on a week-long period from each of the traces:

(0) Initialize a "cache" C: a linked list of 4K filenames, initially empty.

(1) Select the next data access A; if there are no more, return.

(2) Check if A is in C. If it is, record a cache hit, remove A from C and reinsert it at the back of C, and go back to (1); otherwise, record a cache miss and continue.

(3) Check if C has free space. If it does, add A to the back and go back to (1); otherwise, continue.

(4) Remove the first element from C, add A to the back of C, and go back to (1).

Clearly, this is not an actual cache simulation. There is no consideration of spatial locality at all (since I do not have access to this information), only temporal locality. Also, evicting files from the cache has nothing to do with where the data is addressed; that is to say, the position in the cache is unrelated to the data. Nonetheless, I thought that this would at least provide some indication of the caching effects we might see from temporal locality.
Figure 2: The access pattern for the duration of the HOME02 trace. There seems to be very little
variation at all, save for a handful of random peaks.

7.1 HOME02

The HOME02 trace had approximately 33.61% reads and 10.47% writes (see Table 1). Its access pattern was very cyclic, predictable, and stable (see Fig. 2). There is very little variation to speak of; the only major dips occur on weekends.

The most noticeable anomaly in this data (when viewed at a much higher resolution) is a sudden jump in NFS activity on September 11, 2001. Not only did the total number of operations per hour increase dramatically, but the peaks became wider, meaning that NFS activity became more sustained throughout the day. Compared with the week before, the network saw a 75% increase in operations. Upon cross-checking this date with Harvard's academic calendar, I conclude that this phenomenon is probably due to the fact that new-student orientation began on September 10.

7.2 LAIR62

The LAIR62 trace had approximately 6.77% reads and 6.83% writes, the only trace in which writes outnumber reads (see Table 1). Unlike HOME02, LAIR62 is extremely erratic (see Fig. 3). There is no discernible pattern to speak of, only random spikes in activity. This makes sense: research is not a constant, unwavering activity. It is characterized by occasional bursts of heavy computation, whereas email is much more predictable and constant.

It is also worth noting that over 85% of the operations are other than read and write; this workload is dominated by metadata updates [2].

7.3 DEASNA and DEASNA2

We expect the DEASNA and DEASNA2 traces to resemble a sort of mixture between the HOME02 and LAIR62 extremes. Moreover, we expect them to be very similar to one another, as they cover the same machines, one trace taken a month after the other.

DEASNA had approximately 18.58% reads and 5.88% writes. This is, as we expected, somewhere in the middle. DEASNA2 had about 24.64% reads and 7.91% writes, again in the middle (see Table 1).

However, these traces are significantly less similar than one might expect, considering that they cover the same set of computers (see Fig. 4). DEASNA2 is closer to HOME02 than DEASNA is: it has more operations, it has relatively more reads and writes, and the graph of its operations (Fig. 4) is closer to the graph of HOME02's operations (Fig. 2), in that the peaks are much more uniform. This might be a result of a change in how the network is being utilized. The DEASNA NFS installation serves a mixture of web and email services and research workloads. It is conceivable that the change in the data is evidence of the DEASNA network's workload for the later time period having a relatively lighter research workload and increased web and email service than it had in the first trace.

Figure 3: The access pattern for the duration of the LAIR62 trace. This trace is extremely sporadic,
with a few periods of bursty activity.

Table 1: Fraction of Operations as Reads and Writes, by Trace.

Trace        % Reads   % Writes        Total Ops
HOME02         33.61      10.47    4,099,012,325
LAIR62          6.77       6.83      416,323,390
DEASNA         18.58       5.88    1,746,890,523
DEASNA2        24.64       7.91    2,183,639,296

[Plot omitted: DEASNA operations per hour.]

Figure 4: The access patterns for the DEASNA traces. They seem like a mixture of the other two
traces, but are fairly dissimilar to one another.

7.4 Read-Write Ratios

Of the four traces, only one (LAIR62) had a read-write ratio below one (meaning more writes), and then only barely. The other traces were very clearly read-dominated (see Fig. 5).

7.5 Cache Test

The results of the cache test are listed in Table 2.

8 Related Work

My work is based upon the work done by Ellard et al. The traces used were all collected by Ellard and his team. The characterization of HOME02 and LAIR62, specifically the differences in workloads, was explored previously [2]. DEASNA and DEASNA2, however, were not included in that study. The comparison of DEASNA and DEASNA2 can be seen as an extension of the techniques employed before ([2], [6], [4]) on new trace data, specifically to compare NFS traffic over time.

To my knowledge, there has never before been any work comparing NFS caching effects with respect to log-structured file systems.

9 Conclusions

I have been able to draw a few conclusions from my work with the Harvard traces:

• As has been posited before [2], [6], [4], the workload on the file system dramatically impacts the access patterns and overall characteristics of the activity on the file system.

• As is shown by the DEASNA and DEASNA2 traces, access patterns on a given system are by no means constant. A difference of only a month had a significant impact on DEASNA2's statistics and overall access pattern. This should not be a new conclusion: humans are not constant creatures, and human usage is largely what determines the access pattern.

[Plot omitted: read-write ratios over an average week, Sunday through Saturday.]

Figure 5: The read-write ratios for the four traces over their durations. The solid line near the x
axis shows the read=write baseline.

Table 2: Cache Hits and Misses

Trace          Hits      Misses   % Hits   % Misses
HOME02    7,559,271     547,602       93          7
LAIR62    1,971,714     520,896       79         21
DEASNA    5,029,381   1,293,373       80         20
DEASNA2   5,305,877   1,043,795       84         16

• Regardless of use case or read-write ratio, all the NFS traces I looked at exhibit a high degree of temporal locality. This suggests that server-side cache effects may play a significant role, perhaps enough to overshadow reads.

10 Future Work

The results of the caching test suggest that server-side caching may actually be a large factor in NFS performance. If this actually turns out to be the case, then the hypothesis that NFS is well-suited to a log-structured file system may actually be validated. A more thorough study of these caching effects would include file sizes, offsets, legitimate cache collisions, etc. With some rough estimates of NFS cache sizes, and examination of the concrete structure of the underlying file system, more can be learned in this area.

References

[1] M. Blaze. NFS Tracing by Passive Network Monitoring. In Proceedings of the USENIX Winter 1992 Technical Conference, pages 333–343, San Francisco, CA, USA, January 1992.

[2] D. Ellard, J. Ledlie, P. Malkani, and M. Seltzer. Passive NFS Tracing of Email and Research Workloads. In Proceedings of the Second USENIX Conference on File and Storage Technologies (FAST '03), pages 203–216, San Francisco, CA, March 2003.

[3] C. Odhner. Security in NFS Storage Networks. Technical Report 3387, Network Appliance, February 2005.

[4] J. K. Ousterhout, H. Da Costa, D. Harrison, J. A. Kunze, M. D. Kupfer, and J. G. Thompson. A Trace-Driven Analysis of the UNIX 4.2 BSD File System. In Proceedings of the 10th Symposium on Operating Systems Principles, pages 15–24, December 1985.

[5] B. Pawlowski, S. Shepler, C. Beame, B. Callaghan, M. Eisler, D. Noveck, D. Robinson, and R. Thurlow. The NFS Version 4 Protocol. In Proceedings of the Second International System Administration and Networking (SANE) Conference, Maastricht, The Netherlands, May 2000.

[6] D. Roselli, J. R. Lorch, and T. E. Anderson. A Comparison of File System Workloads. In Proceedings of the 2000 USENIX Annual Technical Conference, pages 41–54, June 2000.

[7] M. Rosenblum and J. K. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems, 10(1):26–52, 1992.

[8] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design and Implementation of the Sun Network Filesystem. In Proceedings of the Summer 1985 USENIX Conference, pages 119–130, Portland, OR, USA, 1985.
