Hardware
Our cluster consists of 24 machines: one storage and management server, 20 Dell R220 servers, and 3 Dell R730 servers. They are connected by a Dell N4000 10Gb Ethernet switch. The configurations are as follows:
- 1 management node:
  - Hardware: 4-core, 16GB of memory, two 3TB disks as a mirror (RAID 1), and 1Gb Ethernet
  - Name: node-ops.cse.ohio-state.edu from outside, headnode from inside
  - Usage: this node is the proxy between the department's network and our internal network. Every user logs into this node first and then logs into the other nodes (R220s and R730s) from here. This node is also the storage server for all source code and papers.
  - Note: DO NOT run experiments on this node.
- 20 Dell R220 servers:
  - Hardware: 4-core, 16GB of memory, two disks, and 1Gb Ethernet
  - Name: node220-1 to node220-20
  - Usage: for normal experiments
  - Note: DO NOT store important data on these nodes.
- 3 Dell R730 servers:
  - Hardware: 16-core, 64GB of memory, 8 disks, one SSD, and 10Gb Ethernet
  - Name: node730-1 to node730-3
  - Usage: for experiments that require a high-speed network or an SSD
  - Note: DO NOT store important data on these nodes.
- 1 Dell N4000 switch:
  - Hardware: 48 10Gb ports, with enough bandwidth for all connected machines to run at 10Gb/s at the same time.
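The totals above can be checked with a little arithmetic. The sketch below tallies the per-class figures from this list; it assumes each machine uses one N4000 port at its NIC's listed speed, and the names (`NodeClass`, `NODE_CLASSES`, `summarize`) are hypothetical. Even if every machine saturated its own link at once, the total demand stays well below the combined line rate of the 48 ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClass:
    name: str
    count: int
    cores: int       # cores per machine
    memory_gb: int   # memory per machine, in GB
    nic_gbps: int    # link speed per machine, in Gb/s


# Figures copied from the hardware list; assumes one switch port per machine.
NODE_CLASSES = [
    NodeClass("management", 1, 4, 16, 1),
    NodeClass("R220", 20, 4, 16, 1),
    NodeClass("R730", 3, 16, 64, 10),
]

SWITCH_PORTS = 48
PORT_GBPS = 10


def summarize(classes):
    """Return (machines, cores, memory_gb, peak_demand_gbps) for the cluster."""
    machines = sum(c.count for c in classes)
    cores = sum(c.count * c.cores for c in classes)
    memory_gb = sum(c.count * c.memory_gb for c in classes)
    # Worst case: every machine saturates its own link in one direction.
    demand_gbps = sum(c.count * c.nic_gbps for c in classes)
    return machines, cores, memory_gb, demand_gbps


if __name__ == "__main__":
    machines, cores, memory_gb, demand = summarize(NODE_CLASSES)
    capacity = SWITCH_PORTS * PORT_GBPS
    print(f"{machines} machines, {cores} cores, {memory_gb} GB RAM")
    print(f"peak demand {demand} Gb/s vs. {capacity} Gb/s combined port line rate")
    assert machines <= SWITCH_PORTS, "more machines than switch ports"
```

With the numbers above this reports 24 machines, 132 cores, 528 GB of RAM, and a worst-case demand of 51 Gb/s against 480 Gb/s of combined port line rate.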
Usage
If you would like to use this cluster, please email me your preferred username and your Google account. I will create an account for you and email you a temporary password. Then log in with "ssh node-ops.cse.ohio-state.edu" using your username and password. From there you can ssh to any R220 or R730 node without a password.
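If you log in from your own machine often, OpenSSH's `ProxyJump` option (OpenSSH 7.3 or later) can hop through node-ops.cse.ohio-state.edu automatically. The sketch below only generates `~/.ssh/config` entries using the node names listed above; the function name and the `YOUR_USERNAME` placeholder are hypothetical. It assumes the internal names resolve on node-ops, and that your own machine's public key is in `~/.ssh/authorized_keys` in your home directory (which is shared with the nodes), since the second hop authenticates from your machine rather than from node-ops. Review the output before appending it to your config.

```python
"""Hypothetical helper: print ~/.ssh/config entries for reaching the cluster."""

GATEWAY = "node-ops.cse.ohio-state.edu"
USER = "YOUR_USERNAME"  # placeholder: replace with your cluster username


def ssh_config(user=USER, gateway=GATEWAY):
    """Return ssh_config text: a gateway alias plus a ProxyJump rule for the nodes."""
    lines = [
        "Host node-ops",
        f"    HostName {gateway}",
        f"    User {user}",
        "",
        # Wildcard patterns cover node220-1..20 and node730-1..3. With ProxyJump,
        # the target name is resolved on the gateway, not on your own machine.
        # Assumes your local public key is authorized in your shared home directory.
        "Host node220-* node730-*",
        f"    User {user}",
        "    ProxyJump node-ops",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ssh_config(), end="")
```

With these entries, `ssh node220-5` from your own machine goes through node-ops in one step; `ssh -J` on the command line does the same thing ad hoc.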
- Choose a strong password. Machines with weak passwords do get hacked.
- Your home directory is shared among all nodes over NFS, so you can keep your source code there. However, the shared NFS directory is usually slow, so if your experiment needs storage, use the local disks instead.
- You have sudo privileges on all R220s and R730s. Please be very careful when using sudo. You do not have sudo on the management node; if you really need something special done on it, please email me.
- If you need to install software, first try to install it in your own home directory without sudo, which minimizes its impact on other users. If that is not possible and you have to install it with sudo, please 1) install it on all machines and 2) document it in /share/install_history so that we can keep track of it.
- Similarly, you may modify kernels or change OS parameters with sudo, but document the changes in /share/install_history. If you no longer need those modifications, please revert them.
- Please reserve your machines on the Google Calendar (I will add your account to it). Include your name and phone number as an emergency contact. Nothing enforces reservations, but processes running without a reservation may be killed without notice.
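When a sudo install really is unavoidable, the two required steps (install on every machine, then record it) can be scripted. This is a minimal sketch, not an existing group tool: it runs one command on each R220 and R730 over ssh, relying on the passwordless login from the management node, and appends a tab-separated line to /share/install_history. The function names and the log line format are our own assumptions, so follow whatever format the file already uses. It defaults to a dry run that only prints the commands. If sudo on the nodes asks for a password, a non-interactive run like this will likely fail on those hosts.

```python
import datetime
import getpass
import shlex
import subprocess

# Path taken from the usage notes; the line format written below is our own choice.
INSTALL_HISTORY = "/share/install_history"


def node_names():
    """Experiment nodes from the hardware list (the management node is excluded)."""
    r220 = [f"node220-{i}" for i in range(1, 21)]
    r730 = [f"node730-{i}" for i in range(1, 4)]
    return r220 + r730


def run_everywhere(command, hosts=None, dry_run=True):
    """Run `command` on each host over ssh and return {host: exit_code}.

    With dry_run=True, only print the ssh command lines and report success.
    """
    targets = hosts if hosts is not None else node_names()
    results = {}
    for host in targets:
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        argv = ["ssh", "-o", "BatchMode=yes", host, command]
        if dry_run:
            print(shlex.join(argv))
            results[host] = 0
        else:
            results[host] = subprocess.run(argv).returncode
    return results


def record_install(note, command, path=INSTALL_HISTORY, user=None):
    """Append one tab-separated line: timestamp, user, command, note."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    line = "\t".join([stamp, user or getpass.getuser(), command, note])
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line


if __name__ == "__main__":
    # Hypothetical command: the nodes' OS and package manager are not stated
    # in this guide, so adapt it. A dry run only prints what would be executed.
    cmd = "sudo apt-get install -y htop"
    status = run_everywhere(cmd, dry_run=True)
    # After a real run (dry_run=False) succeeds on every node, log it:
    # if all(rc == 0 for rc in status.values()):
    #     record_install("htop, for monitoring experiments", cmd)
```

Logging only after every node reports exit code 0 keeps the history from claiming an install that is missing on some machines.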
Resources
- Your home directory in /users is shared among all machines. There is no quota on your home directory, so please delete unnecessary files.
- The /share directory is also shared among all machines. It is meant for files that are valuable to the whole group, such as trace files, benchmarks, and papers. DO NOT store anything you do not have the proper copyright permission for.
- An SVN repository is available. Its root directory is svn+ssh://node-ops.cse.ohio-state.edu/projects/svn/repos. All users have read access; please email me if you need write access.
- Your home directory and the /share directory are stored on a mirrored (RAID 1) disk on the management node. Back up your important files (source code, papers, etc.) elsewhere in case the management node is destroyed, but be careful not to make those files public.
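For the backup advice above, one low-effort approach is to pack your important directories into a compressed archive that only you can read, then copy it off the cluster to storage you control. This is a minimal sketch: `make_private_backup` and the example directory names are hypothetical, and where you copy the archive afterwards (for example with scp) is up to you, as long as the destination is not public.

```python
import os
import tarfile
import time


def make_private_backup(sources, dest_dir):
    """Create dest_dir/backup-YYYYmmdd-HHMMSS.tar.gz readable only by you."""
    os.makedirs(dest_dir, mode=0o700, exist_ok=True)
    name = time.strftime("backup-%Y%m%d-%H%M%S.tar.gz")
    path = os.path.join(dest_dir, name)
    # Create the file with owner-only permissions before writing any data,
    # so the archive is never readable by other users, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as raw, tarfile.open(fileobj=raw, mode="w:gz") as tar:
        for src in sources:
            src = os.path.expanduser(src)
            # Store each source under its own name, e.g. "papers/draft.tex".
            tar.add(src, arcname=os.path.basename(os.path.normpath(src)))
    return path


if __name__ == "__main__":
    # Hypothetical directory names; list whatever you cannot afford to lose.
    wanted = [os.path.expanduser(p) for p in ("~/src", "~/papers")]
    existing = [p for p in wanted if os.path.exists(p)]
    if existing:
        print("created", make_private_backup(existing, os.path.expanduser("~/backups")))
    else:
        print("none of the example directories exist; edit the list first")
```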