Available Storage

This page describes how to access the storage available at the Open Cirrus site at Intel Labs.


Local Disks

If you are running Maui/Torque jobs, /x and /y are local disks (in the case of physical machines) or are backed by local disks (in the case of virtual machines). Space in these locations is limited, but their behavior and performance should be similar to that of a real disk.
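
For example, to see how much space remains on the local disks before writing large intermediate files (a standard df invocation against the mount points described above):

$ df -h /x /y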

We have avoided using hardware RAID in most places in the cluster, which means that any data you put on a local disk is only as reliable and durable as the disk itself. If your results are important to you, move them to one of the bdstore machines, to GlusterFS, or to HDFS.


Archival Storage - NFS

There are five NFS servers for general use, bdstore01 through bdstore05. Each server has 7.3TB of usable storage, though they are gradually filling up. These servers and their RAID arrays are maintained by members of the SRE group, so they can be considered reasonably reliable storage. However, with only five servers, aggregate bandwidth is low when many users access them at once.

For historical reasons, each server exports two targets, bdstore0n:/x and bdstore0n:/y, which share the same 7.3TB. If you are using Maui/Torque, these are already mounted at /mnt/bdstore0n/x and /mnt/bdstore0n/y. In VMs, you need to mount them yourself; you may need to install nfs-common first.
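
A minimal sketch of mounting one of the shares from inside a VM, using bdstore01 as an example (this assumes a Debian/Ubuntu guest; the local mount point is arbitrary):

$ sudo apt-get install nfs-common
$ sudo mkdir -p /mnt/bdstore01/x
$ sudo mount -t nfs bdstore01:/x /mnt/bdstore01/x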


Archival Storage - GlusterFS

We are currently experimenting with using GlusterFS to expose all five bdstore nodes as one target. If you are interested in using it, please contact one of the members of the SRE group (Michael or Richard).


HDFS

HDFS is the primary data store in the cluster. Although its access mechanism differs from that of conventional distributed file systems, it exposes the capacity and throughput of the entire cluster (~300TB across 480 spindles).

The current version of Hadoop (and HDFS) on the cluster is 0.18.0. You can download the Hadoop package and a JVM, and then access the file system via the namenode at hdfs://hadoop-master:8547/.
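
For instance, once the Hadoop tarball is unpacked and JAVA_HOME is set, you can point the dfs shell at the namenode explicitly (a sketch; equivalently, you can set fs.default.name to this URI in conf/hadoop-site.xml):

$ ./bin/hadoop dfs -fs hdfs://hadoop-master:8547/ -ls /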

If you only need read-only access to HDFS, a VM in the cluster called hdfs-access provides it: either point a browser at http://hdfs-access/hdfs/ or use NFS to mount hdfs-access:/hdfs. Additionally, a configured and working Hadoop 0.18.0 tarball is available at http://hdfs-access/hadoop.tar.gz; you can install this package, along with a JVM, in a VM and then access HDFS.
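
A sketch of the NFS route (the local mount point is arbitrary):

$ sudo mkdir -p /mnt/hdfs
$ sudo mount -t nfs hdfs-access:/hdfs /mnt/hdfs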

In addition to the programmatic APIs for HDFS, you can use the command line utilities to upload and download files:

hadoop@r1r4u13:/usr/local/hadoop/hadoop$ ./bin/hadoop dfs -ls /
Found 17 items
drwxr-xr-x   - hadoop supergroup          0 2009-07-18 09:47 /b
drwxr-xr-x   - hadoop supergroup          0 2009-08-27 11:56 /backup
drwxr-xr-x   - hadoop supergroup          0 2009-07-09 14:28 /c
drwxrwxrwx   - hadoop supergroup          0 2009-09-02 16:04 /caremedia
drwxrwxrwx   - hadoop supergroup          0 2007-12-12 10:46 /examples
drwxrwxrwx   - hadoop supergroup          0 2009-06-25 18:34 /homes
drwxrwxrwx   - hadoop supergroup          0 2008-11-20 09:51 /lost+found
drwxrwxrwx   - hadoop supergroup          0 2008-11-17 15:06 /mnt
drwxrwxrwx   - hadoop supergroup          0 2007-12-12 10:46 /oc2
drwxr-xr-x   - hadoop supergroup          0 2009-08-10 14:01 /shared
drwxr-xr-x   - hadoop supergroup          0 2009-06-15 14:16 /system
drwxrwxrwx   - hadoop supergroup          0 2009-07-15 11:38 /tmp
drwxrwxrwx   - hadoop supergroup          0 2007-12-12 10:46 /ueer
drwxrwxrwx   - hadoop supergroup          0 2009-08-19 16:31 /user
drwxrwxrwx   - hadoop supergroup          0 2007-12-12 10:46 /userrchen
drwxrwxrwx   - hadoop supergroup          0 2008-12-04 17:58 /x
drwxrwxrwx   - hadoop supergroup          0 2009-06-15 10:49 /y
hadoop@r1r4u13:/usr/local/hadoop/hadoop$ ./bin/hadoop dfs -copyFromLocal
Usage: java FsShell [-copyFromLocal <localsrc> ... <dst>]
hadoop@r1r4u13:/usr/local/hadoop/hadoop$ ./bin/hadoop dfs -copyToLocal
Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
hadoop@r1r4u13:/usr/local/hadoop/hadoop$ ./bin/hadoop dfs -help
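
Putting those together, a typical upload and retrieval looks like this (the file name and HDFS path are purely illustrative):

$ ./bin/hadoop dfs -copyFromLocal results.tar.gz /user/yourname/results.tar.gz
$ ./bin/hadoop dfs -copyToLocal /user/yourname/results.tar.gz ./results.tar.gz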

Of course, the most expedient way to use HDFS is through Hadoop itself.