File Sharing

How remote file sharing works at the UIUC Learning and Language Lab

Introduction

The file-server

The lab purchased and set up a single server in 2018. It is a System76 Silverback WS (version silw3) running Ubuntu 18.04. Technical details can be found at https://system76.com/guides/silw3. It is located in the communications room on the 4th floor of the UIUC Psychology building. The machine's IP address is static: 130.126.181.53.

The file server has one 256GB hard disk drive that holds the operating system, and four additional 3.7TB hard disk drives that form a RAID array, where the lab's files are stored and shared.

To view all available drives and partitions, log in to the file-server and run:

$ lsblk

Basics

Sharing files between computers requires a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user’s computer while the data is being processed and is then returned to the server.

Samba

Samba was installed on the file-server to enable remote access to the file-server's disk space. Samba is a client/server-based application that allows accessing folders located on remote machines via a network connection. It uses the SMB protocol, which allows file sharing across operating systems. Samba ships with a set of tools and a configuration file, in which the lab manager or lab administrator can configure what is shared, from where, and with whom. The configuration file is located at:

/etc/samba/smb.conf

and requires administrator credentials to modify. To use the samba software (to add or remove users or drives, etc.), a person has to log in to the file-server via ssh (e.g. ssh [email protected]).

Shared Drives

Samba can be configured to expose multiple folders on the file-server as "shares" (or "drives"). Currently, three shares are in use:

ra_data

All RAs have full read & write access to this share. It is accessible from the Macs in the lab by clicking on the IP address in Finder (or by pressing Cmd+K to open the Connect to Server dialog). It is recommended not to work directly in this folder: download files before beginning your work, and upload them only after completing it. Please be very careful not to delete any files.

archival_data

This share is for archival material that is not directly related to research or to work performed by RAs, for example IRB documents, experimental stimuli, and any data associated with publications. This drive can be used to increase the longevity of your data by providing a secondary secure location, in addition to cloud storage.

ludwig_data

This is where Ludwig stores data. When used correctly, there is no need to manually create folders or files: Ludwig automagically creates a folder for each research project. RAs do not have access; anyone with superra credentials has administrative privileges, meaning full read & write access.

Off-Campus Access

The server is connected to the university network. This means that it is behind a firewall and cannot be accessed from outside the university network. To access the server from off-campus, follow the instructions provided by the University for accessing the network off-campus: download and install the VPN client, then connect to campus through the VPN before connecting to the server.

Connecting to a Drive (Linux)

Auto-mounting on startup

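To auto-mount a shared drive on your machine at startup, first open the file that lists filesystems to mount at boot (on most Linux distributions, /etc/fstab) with administrator privileges.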

You will need to specify the IP of the server, the name of the share, and your samba credentials. For example, the user ph would mount the ludwig_data share by adding the following line:
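A minimal sketch of such a line, assuming it is added to /etc/fstab and that the cifs-utils package is installed on the client. The mount point /media/ludwig_data and the PASSWORD and LOCAL_USERNAME placeholders are hypothetical; substitute your own values.

```text
# <share>                       <mount point>        <type>  <options>                                          <dump> <pass>
//130.126.181.53/ludwig_data    /media/ludwig_data   cifs    username=ph,password=PASSWORD,uid=LOCAL_USERNAME   0      0
```

Because this stores the samba password in a world-readable file, some setups prefer the cifs credentials= option pointing at a file readable only by root.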

The username and password depend on the level of access required. The credentials are specific to samba and need not match those used to log in to the machine remotely (e.g. via SSH). More information can be found in the Users and Credentials section below.

Make sure to add uid=LOCAL_USERNAME as this ensures that your local user inherits the correct permissions to read and write data on the shared drive.

Next, make the directory where the drive will be mounted, and then mount it:
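One way to do both steps, sketched as a small shell function so that nothing runs until it is called. It assumes the share already has an entry in /etc/fstab; the function name and the example path /media/ludwig_data are hypothetical.

```shell
# Create the mount point (if it does not exist yet) and mount the share.
# "mount <dir>" with a single argument looks up the remaining details in /etc/fstab.
mount_share() {
    sudo mkdir -p "$1" || return 1
    sudo mount "$1"
}
```

For example: `mount_share /media/ludwig_data`.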

Connecting to a Drive (MacOS)

To connect to, for example, ludwig_data, open Finder, select Go > Connect to Server, and enter smb://130.126.181.53/ludwig_data.

Users and Credentials

To access a shared drive, a person must have credentials to log in to the shared drive. These are not the credentials one might use to log in to the file server itself; they are created using the samba software. New samba users must meet with an administrator to receive credentials to access the shared drive. The administrator will decide how to proceed; there are two options:

  • create new credentials (not recommended)

  • provide the new user with existing credentials (recommended)

It should be very rare that new user credentials need to be created: new graduate students should be given the superra credentials, and new RAs should be given the ra credentials. Creating a new samba user is recommended only if special permissions are required. To create a new samba user, we first have to create a user on the file-server (samba uses already existing UNIX authentication). To add a user to the file-server:
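A minimal sketch, assuming Ubuntu's standard adduser tool; the function name is hypothetical, and newuser stands in for the real username.

```shell
# Create a regular (non-administrator) UNIX account on the file-server.
# adduser prompts interactively for a UNIX password and optional contact details.
add_unix_user() {
    sudo adduser "$1"
}
```

For example: `add_unix_user newuser`.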

Then, add a password:
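A sketch using Samba's smbpasswd utility, whose -a flag registers an existing UNIX user with samba and prompts for a samba password. The function name is hypothetical.

```shell
# Add an existing UNIX user to samba's password database.
# smbpasswd asks for the new samba password twice; it need not match the UNIX password.
set_samba_password() {
    sudo smbpasswd -a "$1"
}
```

For example: `set_samba_password newuser`.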

A new user on the file-server should not be given administrative privileges, unless that lab member is taking over administrative duties. To list all samba users:
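One way to do this is with Samba's pdbedit tool, which reads the samba user database and therefore needs administrator privileges. The function name is hypothetical.

```shell
# Print one line per samba account (username, UID and full name).
list_samba_users() {
    sudo pdbedit -L
}
```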

If a new samba user was created, additional steps are needed before that user can access the shared drives. First, open the samba configuration file (/etc/samba/smb.conf) with administrator privileges.

Next, add the username to the configuration of each share the user needs access to. For example, to provide a hypothetical user called newuser access to the share ra_data:
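A sketch of what the relevant part of /etc/samba/smb.conf might look like, not a copy of the lab's actual file. The path and the existing ra entry are assumptions; only the valid users parameter, which is a standard smb.conf setting, matters here.

```text
[ra_data]
   # hypothetical location of the share on the RAID array
   path = /mnt/md0/ra_data
   # space-separated list of samba users allowed to connect
   valid users = ra newuser
```

Any other settings already present in the share's section stay unchanged.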

After modifying the configuration file, it is necessary to restart the samba daemon:
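A minimal sketch, assuming the samba daemon runs as the systemd service smbd (its usual name on Ubuntu). It first checks the configuration with Samba's testparm tool, so a typo does not take the shares offline. The function name is hypothetical.

```shell
# Validate smb.conf, then restart the samba daemon only if the file loads cleanly.
restart_samba() {
    # testparm exits with a non-zero status if the configuration has errors
    if ! testparm -s >/dev/null 2>&1; then
        echo "smb.conf has errors; not restarting samba" >&2
        return 1
    fi
    sudo systemctl restart smbd
}
```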

To remove a samba user:
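A sketch using smbpasswd's -x flag, which deletes the account from samba's password database. It does not delete the underlying UNIX account or the username from any share in smb.conf. The function name is hypothetical.

```shell
# Remove a user from samba's password database; the UNIX account is left in place.
remove_samba_user() {
    sudo smbpasswd -x "$1"
}
```

For example: `remove_samba_user newuser`.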

Advanced

RAID

Basics

RAID stands for either Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks. The intention of RAID is to spread data across several disks, such that a single disk failure will not lose that data. Some forms of RAID store multiple copies of the data, so if you lose a disk, you have an identical copy elsewhere. This facility is sometimes used for backups: remove one of the disks from the array and store it safely, replacing it with another disk. Because storing multiple copies can be very wasteful of space, other forms of RAID store parity along with the data, so that if a drive fails, the contents of that drive can be calculated from the other drives.

Array creation

A software-based Linux RAID-10 was created in October 2019 using the command-line tool mdadm. For information about the tool, check out its man page (man mdadm).

The primary benefit of RAID-10 is the combination of data redundancy and high performance, at the expense of a large amount of disk space. The array was set up with the following command:
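A sketch of what such a command typically looks like, not necessarily the exact invocation used on the server. The function name and the member device names in the usage example are hypothetical; --layout=n2 requests two "near" copies, which is also mdadm's default for RAID-10.

```shell
# Create a RAID-10 array from the given member devices.
# WARNING: this destroys any data on the member devices.
create_raid10() {
    array="$1"
    shift
    sudo mdadm --create --verbose "$array" --level=10 --layout=n2 --raid-devices="$#" "$@"
}
```

For example: `create_raid10 /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd`.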

The resulting array includes four 3.7TB disks, with two copies of all data. To ensure the array is automatically assembled and mounted at each boot, the following operations were performed:

  • an ext4 filesystem was created on the array

  • a mount point at /mnt/md0 was created.

  • /etc/mdadm/mdadm.conf was updated

  • the initial RAM file system was updated so that the array will be available during the early boot process

  • an entry for the array was added to /etc/fstab
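The steps above can be sketched as follows. This is an illustration using standard Ubuntu tools, not a record of the commands actually run; the function name and the fstab options (defaults,nofail) are assumptions.

```shell
# Make an md array persistent across reboots. Defaults follow the text above.
# WARNING: the mkfs step erases any existing filesystem on the array.
persist_array() {
    array="${1:-/dev/md0}"
    mount_point="${2:-/mnt/md0}"

    # 1. Create an ext4 filesystem on the array.
    sudo mkfs.ext4 "$array" || return 1
    # 2. Create the mount point.
    sudo mkdir -p "$mount_point" || return 1
    # 3. Record the array definition so it is assembled under the same name.
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf || return 1
    # 4. Rebuild the initial RAM file system so the array is known during early boot.
    sudo update-initramfs -u || return 1
    # 5. Add an fstab entry so the filesystem is mounted at boot.
    echo "$array $mount_point ext4 defaults,nofail 0 0" | sudo tee -a /etc/fstab
}
```

The nofail option lets the machine finish booting even if the array cannot be mounted.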

Warning

Altogether, there are 5 physical drives attached to the server: 4 are part of the array, and one contains the boot and swap partitions. If the latter fails, the RAID does not offer any protection.

Check free space

To check how much free space is available on each disk, run:
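One common way to do this, assuming the standard df utility; the -h flag prints sizes in human-readable units such as G and T.

```shell
# List every mounted filesystem with its size, used space and available space.
df -h
```

To limit the output to the RAID array, pass its mount point: `df -h /mnt/md0`.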

In the output, the RAID array /dev/md0 should be listed with about 7.3TB of storage space. Why? There are 4 physical drives, each with 3.7TB of storage, and the array keeps two copies of all data, so only half of the raw capacity is usable: (4 x 3.7) / 2, or approximately 7.3TB, allowing for rounding in the per-drive figure.

Email Alerts

When mdadm detects something unusual (e.g. a drive failure), it sends an email alert to [email protected]. Additional email addresses may be added by editing /etc/mdadm/mdadm.conf.

The sender email address is [email protected], and can be changed by editing /etc/msmtprc.

Disk Failure

In general, when a disk fails you need to “remove” it from the array, shut down the machine, replace the failing drive, and then “add” the new drive to the array, after creating the appropriate disk layout (partitions) on it if necessary. For example, to remove a disk called /dev/sda1 from an array called /dev/md0:
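A sketch using mdadm's manage mode, where a member is first marked as failed and then removed from the array. The function name is hypothetical.

```shell
# Mark a member device as failed, then remove it from the array.
remove_disk() {
    sudo mdadm "$1" --fail "$2" --remove "$2"
}
```

For example: `remove_disk /dev/md0 /dev/sda1`.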

Then, add a disk to an existing array:
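A matching sketch for adding the replacement, again using mdadm's manage mode; the function name is hypothetical. The device name of the new disk may differ from the old one, so check it with lsblk first.

```shell
# Add a device to an existing array; mdadm starts rebuilding onto it automatically.
add_disk() {
    sudo mdadm "$1" --add "$2"
}
```

For example: `add_disk /dev/md0 /dev/sda1`.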

Once that’s done, make sure the array is rebuilding and watch the progress with:
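The kernel reports the state of all md arrays, including rebuild progress, in /proc/mdstat. A minimal sketch; the function name is hypothetical, and the optional argument exists only so the function can be pointed at a saved copy of that file.

```shell
# Print the current state of all md arrays (defaults to the live kernel view).
array_progress() {
    cat "${1:-/proc/mdstat}"
}
```

To refresh the view every few seconds, run `watch -n 5 cat /proc/mdstat`.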

General information about the array can be viewed by running:
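A sketch using mdadm's --detail option, which reports the array's level, state, and member devices. The function name is hypothetical, and the default of /dev/md0 follows the text above.

```shell
# Show level, size, state and member devices of an array (default: /dev/md0).
array_details() {
    sudo mdadm --detail "${1:-/dev/md0}"
}
```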

Backing up Data

No automatic backup plan exists. If a physical disk in the RAID array fails, a copy of its data exists on another drive. However, RAID is not a substitute for backing up data. It is each lab member's own responsibility to periodically back up their data to the cloud (e.g. Google Drive storage space is provided to each member of UIUC for free).

While setting up automatic backups on the server would make life a lot easier, the author (Philip Huebner) does not feel sufficiently confident in being able to implement such a solution.

Ludwig

The file-server is not only where files are shared between members of the lab; it is also where the eight "computational" machines (also physically located in the 4th floor IT room of the UIUC Psychology building) save files, so that these can be accessed without having to log in to each "computational" machine directly.

Lab members interested in submitting computationally expensive Python jobs to the "computational" machines should consult the GitHub homepage of Ludwig, a program written by the author to facilitate job submission and interface with the "computational" machines. Each machine has access to the file-server.
