File Sharing
How remote file sharing works at the UIUC Learning and Language Lab
Introduction
The file-server
The lab purchased and set up a single server in 2018. It is a System76 Silverback WS (version silw3) running Ubuntu 18.04. Technical details can be found at https://system76.com/guides/silw3. It is located in the communications room on the 4th floor of the UIUC Psychology building. The machine's IP address is static: 130.126.181.53.
The file server has one 256GB hard disk drive that holds the operating system, and four additional 3.7TB hard disk drives, which are combined in a RAID array where the lab's files are stored and shared.
To view all available drives and partitions, log in to the file-server and run:
$ lsblk
Basics
Sharing files between computers requires a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user’s computer while the data is being processed and is then returned to the server.
Samba
Samba was installed on the file-server to enable remote access to the file-server's disk space. Samba is a client/server-based application which allows accessing folders located on remote machines via a network connection. Samba uses the smb protocol, which allows file sharing across operating systems. Samba ships with a set of tools, including a configuration file, where the lab manager or lab administrator can configure what is shared, where, by whom, and for whom. The file is located at:
/etc/samba/smb.conf
Modifying it requires administrator credentials. To use the samba software (to add or remove users or drives, etc.), a person has to log in to the file-server via ssh (e.g. ssh [email protected]).
Shared Drives
Samba can be configured to expose multiple folders on the file-server as "shares" (or "drives"). Currently, three shares are in use:
ra_data
All RAs have full read & write access to this share. It is accessible from the Macs in the lab by clicking on the IP address in Finder (or by pressing Cmd+K). It is recommended not to work directly in this folder. Instead, download files before beginning your work and upload them only after completing it. Please be very careful not to delete any files.
archival_data
This share is for archival material that is not directly related to research or to work performed by RAs, for example IRB documents, experimental stimuli, and any data associated with publications. This drive can be used to increase the longevity of your data by providing a secondary secure location, in addition to cloud storage.
ludwig_data
This is where Ludwig stores data. When used correctly, there is no need to manually create folders or files - Ludwig automagically creates a folder for each research project. RAs do not have access, and anyone with superra credentials has administrative privileges, meaning full read & write access.
Off-Campus Access
The server is connected to the university network. This means that it is behind a firewall and cannot be accessed by anyone who is not connected to the university network. To access the server from off-campus, follow the instructions provided by the University for accessing the network off-campus. This entails downloading and installing a VPN client, and then connecting to campus with that client before connecting to the server.
Connecting to a Drive (Linux)
Auto-mounting on startup
To auto-mount a shared drive on your machine at startup, first:
You will need to specify the IP of the server, the name of the share, and your samba credentials. For example, the user ph would mount the ludwig_data share by adding the following line:
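As a minimal sketch, assuming the line is added to /etc/fstab and the share is mounted with the cifs filesystem type (provided by the cifs-utils package), the entry could be assembled as follows. The mount point /media/ludwig_data and the <password> placeholder are hypothetical and are not taken from the lab's setup.

```shell
# Server IP, share name, and user name are from this page;
# the mount point is a hypothetical choice.
server="130.126.181.53"
share="ludwig_data"
mount_point="/media/ludwig_data"
smb_user="ph"

# Build the /etc/fstab entry. Replace <password> with the samba password.
# uid/gid make the mounted files owned by the current local user.
fstab_line="//${server}/${share} ${mount_point} cifs username=${smb_user},password=<password>,uid=$(id -u),gid=$(id -g) 0 0"

# Print the entry for review before appending it to /etc/fstab.
echo "${fstab_line}"
```

After review, the printed line can be appended to /etc/fstab with administrator privileges, for example with `echo "${fstab_line}" | sudo tee -a /etc/fstab`.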
The username and password depend on the level of access required. The credentials are specific to samba and need not match those to remotely login to the machine (e.g. via SSH). More information about this can be found in the next section.
Next, make the directory where the drive will be mounted, and then mount it:
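A minimal sketch of these two steps, assuming the share already has an entry in /etc/fstab and that /media/ludwig_data (a hypothetical path) is the mount point named in that entry. The function name mount_share is illustrative only.

```shell
# Hypothetical helper: create the mount point, then mount the share
# using the matching /etc/fstab entry. Nothing runs until it is called.
mount_share() {
    mount_point="$1"
    sudo mkdir -p "${mount_point}" &&   # create the directory if it is missing
    sudo mount "${mount_point}"         # mount options are read from /etc/fstab
}

# Usage: mount_share /media/ludwig_data
```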
Connecting to a Drive (MacOS)
To connect to, for example, ludwig_data, open Finder, select Go > Connect to Server, and enter smb://130.126.181.53/ludwig_data
Users and Credentials
To access a shared drive, a person must have credentials for that drive. These are not the credentials used to log in to the file server itself; they are created using the samba software. New samba users must meet with an administrator to receive credentials to access the shared drive. The administrator will decide how to proceed; there are two options:
create new credentials (not recommended)
provide the new user with existing credentials (recommended)
It should be very rare that new user credentials need to be created: new graduate students should be given the superra credentials, and new RAs should be given the ra credentials. Creating a new samba user is recommended only if special permissions are required. To create a new samba user, we first have to create a user on the file-server (samba relies on existing UNIX accounts). To add a user to the file-server:
Then, add a password:
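The two steps above might look like the following sketch. It assumes the standard Ubuntu adduser tool is used to create the UNIX account and that "add a password" refers to registering the user with samba via smbpasswd -a. The function name add_samba_user and the user name newuser are hypothetical.

```shell
# Hypothetical helper; nothing runs until the function is called.
add_samba_user() {
    user="$1"
    sudo adduser "${user}" &&       # create the UNIX account (interactive prompts)
    sudo smbpasswd -a "${user}"     # add the user to samba and set a samba password
}

# Usage: add_samba_user newuser
```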
A new user on the file-server should not be given administrative privileges unless that lab member is taking over administrative duties. To list all samba users:
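One way to list samba users, assuming samba's default password database is in use, is the pdbedit tool that ships with samba. The function name list_samba_users is illustrative.

```shell
# Hypothetical helper: list every user known to samba, one per line.
list_samba_users() {
    sudo pdbedit -L
}

# Usage: list_samba_users
```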
If a new samba user was created, additional steps are needed before that user can access the shared drives. First, open the samba configuration file (/etc/samba/smb.conf) with administrator privileges.
Next, add the username to the configurations matching those drives the user is requesting access to. For example, to provide a hypothetical user called newuser access to the share ra_data:
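As an illustration, the share definition in /etc/samba/smb.conf might look like the stanza printed below. Only the share name ra_data and the user name newuser come from this page. The path, the ra user name, and the remaining settings are assumptions, and the lab's actual configuration may differ.

```shell
# Print an example share definition (hypothetical path and settings).
# "valid users" lists the samba users allowed to connect to the share.
stanza=$(cat <<'EOF'
[ra_data]
    path = /mnt/md0/ra_data
    valid users = ra newuser
    read only = no
EOF
)
printf '%s\n' "${stanza}"
```

The key change is appending newuser to the "valid users" line of each share the user should be able to reach.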
After modifying the configuration file, it is necessary to restart the samba daemon:
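A sketch of the restart step, assuming the samba daemon runs under its default Ubuntu service name, smbd. The testparm check beforehand is an added safety step that is not described on this page, and the function name restart_samba is hypothetical.

```shell
# Hypothetical helper; nothing runs until the function is called.
restart_samba() {
    # Validate the configuration first; stop if it contains errors.
    testparm -s /etc/samba/smb.conf > /dev/null || return 1
    # Restart the daemon so the new configuration takes effect.
    sudo systemctl restart smbd
}

# Usage: restart_samba
```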
After removing a UNIX user, make sure to remove the corresponding samba user. This is important because if a UNIX user with the same name as an existing samba user were to be re-created, the samba user would be invalid. In such a case, the samba user must be deleted, and then re-added.
To remove a samba user:
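A minimal sketch, assuming the user was added with smbpasswd. The function name remove_samba_user and the user name newuser are hypothetical.

```shell
# Hypothetical helper: delete a user from samba's password database.
# This does not remove the UNIX account of the same name.
remove_samba_user() {
    user="$1"
    sudo smbpasswd -x "${user}"
}

# Usage: remove_samba_user newuser
```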
Advanced
RAID
Basics
RAID stands for either Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks. The intention of RAID is to spread data across several disks, such that a single disk failure will not lose that data. Some forms of RAID store multiple copies of the data, so if you lose a disk, you have an identical copy elsewhere. This facility is sometimes used for backups: remove one of the disks from the array and store it safely, replacing it with another disk. Because storing multiple copies can be very wasteful of space, other forms of RAID store parity along with the data, so that if a drive fails, the contents of that drive can be calculated from the other drives.
Array creation
A software-based Linux RAID-10 was created in October 2019 using the command-line tool mdadm. For information about the tool, check out its man page (man mdadm).
The primary benefits of RAID-10 are data redundancy and high performance, at the expense of large amounts of disk space. The array was set up with the following command:
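The exact command is not reproduced here. A sketch of how a four-disk RAID-10 with two copies is typically created with mdadm is given below. The function name create_raid10 and the device names in the usage line are hypothetical, and running the command destroys any data on the given devices.

```shell
# Hypothetical helper; nothing runs until the function is called.
create_raid10() {
    # Expect exactly four block devices as arguments.
    if [ "$#" -ne 4 ]; then
        echo "usage: create_raid10 DEV1 DEV2 DEV3 DEV4" >&2
        return 1
    fi
    # --level=10      RAID-10
    # --layout=n2     "near" layout with 2 copies of each block
    # --raid-devices  number of member disks
    sudo mdadm --create --verbose /dev/md0 \
        --level=10 --layout=n2 --raid-devices=4 "$@"
}

# Usage (device names are placeholders):
#   create_raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
```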
The resulting array includes four 3.7TB disks and stores two copies of the data. To ensure the array is automatically assembled and mounted at each boot, the following operations were performed:
an ext4 filesystem was created on the array
a mount point at /mnt/md0 was created
/etc/mdadm/mdadm.conf was updated
the initial RAM file system was updated so that the array will be available during the early boot process
an entry for the array was added to /etc/fstab
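The operations listed above correspond roughly to the following commands. This is a sketch under assumptions rather than a record of what was run: the function name persist_array and the fstab mount options are hypothetical, and creating the filesystem erases whatever is on the array.

```shell
# Hypothetical helper; nothing runs until the function is called.
persist_array() {
    # 1. Create an ext4 filesystem on the array (erases existing data).
    sudo mkfs.ext4 /dev/md0 &&
    # 2. Create the mount point.
    sudo mkdir -p /mnt/md0 &&
    # 3. Record the array definition so it is assembled at boot.
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf &&
    # 4. Rebuild the initial RAM file system so the array is known early in boot.
    sudo update-initramfs -u &&
    # 5. Add an /etc/fstab entry so the array is mounted at boot.
    echo '/dev/md0 /mnt/md0 ext4 defaults,nofail 0 0' | sudo tee -a /etc/fstab
}

# Usage: persist_array
```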
Warning
Altogether, there are 5 physical drives attached to the server: 4 are part of the array, and one contains the boot and swap partitions. If the latter fails, the RAID does not offer any protection.
Check free space
To check how much free space is available on each disk, run:
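One common way to do this, assuming the standard df utility is what was intended, is:

```shell
# Show size, used, and available space for every mounted filesystem,
# in human-readable units (K, M, G, T).
df -h
```

To limit the output to the RAID array, the mount point can be passed as an argument, for example `df -h /mnt/md0`.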
You should see something like the following:
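The RAID array /dev/md0 has 7.3TB of storage space. Why? There are 4 physical drives, each with 3.7TB of storage, for a raw total of about 14.8TB. Because the array keeps two copies of all data, only half of that is usable: (4 x 3.7) / 2 ≈ 7.3TB, once rounding in the reported sizes is taken into account.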
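The capacity calculation can be checked with a few lines of shell arithmetic. This is a sketch that uses the rounded per-disk size of 3.7TB from this page, and the variable names are illustrative. Shell arithmetic is integer-only, so sizes are expressed in tenths of a terabyte.

```shell
disks=4            # physical drives in the array
disk_tenths=37     # 3.7TB per drive, in tenths of a TB
copies=2           # each block of data is stored twice

raw_tenths=$(( disks * disk_tenths ))        # total raw capacity
usable_tenths=$(( raw_tenths / copies ))     # the second copy consumes half

echo "raw:    $(( raw_tenths / 10 )).$(( raw_tenths % 10 ))TB"
echo "usable: $(( usable_tenths / 10 )).$(( usable_tenths % 10 ))TB"
```

With the rounded input this prints a usable capacity of 7.4TB, slightly above the 7.3TB reported for /dev/md0. The gap most likely comes from rounding in the sizes shown by the tools and from space used by the filesystem itself.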
Email Alerts
When mdadm detects something unusual (e.g. drive failure), it sends an email alert to [email protected]. Additional email addresses may be added by editing /etc/mdadm/mdadm.conf.
The sender email address is [email protected], and can be changed by editing /etc/msmtprc.
Disk Failure
In general, when a disk fails you need to "remove" it from the array, shut down the machine, swap the failing drive for a replacement, and then "add" the new drive to the array, after first creating the appropriate disk layout (partitions) on it if necessary. For example, to remove a disk called /dev/sda1 from an array called /dev/md0:
Then, add a disk to an existing array:
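The removal and addition steps can be sketched with mdadm's manage mode as follows. The function names remove_disk and add_disk are hypothetical. Note that mdadm only removes a device that has been marked as failed, so the sketch marks it as failed first.

```shell
# Hypothetical helpers; nothing runs until a function is called.

# Mark a disk as failed, then remove it from the array.
remove_disk() {
    array="$1"
    disk="$2"
    sudo mdadm "${array}" --fail "${disk}" --remove "${disk}"
}

# Add a (replacement) disk to an existing array; the rebuild starts automatically.
add_disk() {
    array="$1"
    disk="$2"
    sudo mdadm "${array}" --add "${disk}"
}

# Usage:
#   remove_disk /dev/md0 /dev/sda1
#   add_disk    /dev/md0 /dev/sda1
```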
Once that’s done, make sure the array is rebuilding and watch the progress with:
General information about the array can be viewed by running:
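A sketch of both monitoring steps, assuming the array is /dev/md0 as elsewhere on this page. The function names watch_rebuild and array_details are hypothetical.

```shell
# Hypothetical helpers; nothing runs until a function is called.

# Refresh the kernel's RAID status every 5 seconds; shows rebuild progress.
watch_rebuild() {
    watch -n 5 cat /proc/mdstat
}

# Print the array's state, member disks, and layout.
array_details() {
    sudo mdadm --detail /dev/md0
}

# Usage:
#   watch_rebuild     (press Ctrl+C to stop)
#   array_details
```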
Backing up Data
No automatic backup plan exists. If a physical disk that is part of the RAID array fails, a copy of its data exists on another drive. However, RAID is not a substitute for backing up data. It is each lab member's own responsibility to periodically back up their data to the cloud (e.g. Google Drive storage space is provided to each member of UIUC for free).
While setting up automatic backups on the server would make life a lot easier, the author (Philip Huebner) does not feel sufficiently confident in being able to implement such a solution.
Ludwig
The file-server is not only where files are shared between members of the lab. It is also where the eight "computational" machines (also physically located in the 4th floor IT room of the UIUC Psychology building) save files, so that those files can be accessed without having to log in to each of the "computational" machines directly.
Lab members interested in submitting computationally expensive Python jobs to the "computational" machines should consult the Github homepage of Ludwig, a program written by the author to facilitate job submission and interface with the "computational" machines. Each machine has access to the file-server.