| server name | server type | processor specifications | operating system | RAM |
| grove | Dell PowerEdge 2950 (master) | (2) Intel Xeon 5345 (Quad-Core 2.33GHz) | RedHat Enterprise Linux AS 4 | 8GB |
| | | (2) Intel Xeon 5345 (Quad-Core 2.33GHz) each; 80 processor equivalent for running jobs | RedHat Enterprise Linux WS 4 | 4GB |
| elm | Sun V20z | (2) AMD Opteron 250 (2.4 GHz) | Solaris 10 | 4GB |
All jobs on our Linux cluster must be submitted to a queue. The following queues are available:
| queue name | priority | nice | availability | hosts | limitations |
| normal | 30 | 20 | all users | all | none |
| short | 35 | 20 | all users | all | 15 minutes |
Queues are managed by Platform Lava. The full guide is available for viewing here. Quick Start instructions (also see "man bsub"):
| bsub my_job | submits my_job to the "normal" queue; my_job is an executable |
| bsub -n 4 my_job | submits my_job as a parallel job that starts when 4 processors are available |
| bkill jobID | kills the job with ID "jobID" |
| bjobs | reports the status of Lava jobs |
To submit a parallel (MPI) job:
bsub -n <num_processors> mpirun -np <num_processors> <MPI_JOB> <ARGUMENTS>
The MCNP executables are mcnp5.mpi and mcnp5, both in /usr/local/bin. To run MCNP on multiple processors, you must use the mcnp5.mpi executable and submit your job using MPI. To submit an MCNP job to the queue:
bsub -n <num_processors> mpirun -np <num_processors> /usr/local/bin/mcnp5.mpi i=input eol
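The processor count appears twice in the submission line, once for bsub and once for mpirun, and the two values should agree. As a rough sketch of how to keep them in step, the helper below prints the submission line instead of running it, so it can be inspected first. The function name mcnp_submit_cmd is hypothetical and is not part of Lava or MCNP.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lava or MCNP): print the submission
# line for an MCNP MPI job so that -n and -np always carry the same value.
mcnp_submit_cmd() {
    nproc=$1      # number of processors to request
    input=$2      # MCNP input file name
    case $nproc in
        ''|*[!0-9]*)
            echo "usage: mcnp_submit_cmd <num_processors> <input>" >&2
            return 1 ;;
    esac
    if [ -z "$input" ]; then
        echo "usage: mcnp_submit_cmd <num_processors> <input>" >&2
        return 1
    fi
    echo "bsub -n $nproc mpirun -np $nproc /usr/local/bin/mcnp5.mpi i=$input eol"
}

# Print the line for a 4-processor run of the file "input"
mcnp_submit_cmd 4 input
```

Once the printed line looks right, it can be pasted at the prompt to submit the job.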
/scratch space is available on the cluster. To create a working directory for your executables, run "cd /scratch" and then "mkdir username" (substitute your username). It is recommended that you run your jobs from /scratch if possible. Remember that /scratch is not backed up, so copy your important results back to your home space (/emchome/username).
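The scratch workflow described above (create a personal directory, run there, copy results home) could be scripted roughly as follows. This is a sketch under assumptions: the variables SCRATCH_ROOT and RESULTS_DIR and both function names are illustrative only, and "outp" in the usage comment is just an example output file name.

```shell
#!/bin/sh
# Illustrative settings (assumptions, not cluster-defined variables).
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}   # where scratch space lives
RESULTS_DIR=${RESULTS_DIR:-$HOME}        # backed-up space for results

# Create (if needed) and print your personal scratch directory.
make_scratch_dir() {
    dir="$SCRATCH_ROOT/$(id -un)"
    mkdir -p "$dir" && echo "$dir"
}

# Copy the named result files from a scratch directory to RESULTS_DIR.
save_results() {
    src=$1; shift
    for f in "$@"; do
        cp -p "$src/$f" "$RESULTS_DIR/" || return 1
    done
}

# Typical use on the cluster:
#   work=$(make_scratch_dir)
#   cd "$work"            # then submit the job from here
#   save_results "$work" outp
```

Copying only the files you name keeps temporary files out of your home space, which is the reason for running in /scratch in the first place.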
Commands you may need to use if you run jobs that do not completely finish:
| ipcs | provides information on IPC (inter-process communication) facilities; used to determine whether a user has a job still held in system memory |
| ipcrm | removes a message queue, semaphore set, or shared memory ID; used to remove the resources that ipcs reports |
| cluster-fork | runs the command that follows on all nodes; e.g., cluster-fork ipcrm |
If you see an error message such as p4_error: alloc_p4_msg failed: 0, you have requested more memory than the node(s) can handle. Reduce your job size.
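To illustrate how ipcs and ipcrm fit together, the sketch below picks your own shared memory segments out of "ipcs -m" output and prints the matching ipcrm commands without running them. It assumes the Linux column layout (shmid in column 2, owner in column 3) and the "ipcrm -m" option; check "man ipcs" and "man ipcrm" on the node before relying on either. Both function names are hypothetical.

```shell
#!/bin/sh
# Read "ipcs -m" style output on stdin and print the shmid of every
# shared memory segment owned by the given user.
# Assumption: column 2 is the shmid and column 3 is the owner.
my_shm_ids() {
    awk -v user="$1" '$3 == user && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Dry run: turn each id read on stdin into an ipcrm command line,
# printing it instead of executing it.
show_ipcrm_cmds() {
    while read -r id; do
        echo "ipcrm -m $id"
    done
}

# On a node:
#   ipcs -m | my_shm_ids "$(id -un)" | show_ipcrm_cmds
```

Printing the commands first gives you a chance to confirm that the segments really belong to a finished job before removing anything.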
The following link allows you to download PuTTY.
To install PuTTY SSH to your computer's start menu and desktop, download the file putty-0.60-installer.exe, listed under "A Windows Installer for everything except PuTTYtel".
If you just want a copy of the executable file to run the program, download the file putty.exe, listed under "For Windows 95, 98, ME, NT, 2000, XP, and Vista on Intel x86".

1. Open PuTTY and enter the Hostname as above, click 'Open'.

2. If prompted, click 'Yes' to add the key to PuTTY's cache.

3. Type your NE username and your NE password at the PuTTY prompt that appears.
elm runs Solaris 10, which provides the multi-threading support necessary to obtain peak performance on parallel machines. elm contains two 2.4 GHz AMD Opteron processors and operates in a shared memory configuration. The compilers now offer options for automatic parallelization of source code as well as direct access to the multi-thread libraries for serious hand-coding of parallel implementations. See the man pages on f90, f77, and cc for more information.
Jobs that are disk I/O intensive should not be run directly from your account. This will cause the system to write the data across the network instead of taking advantage of the fast SCSI system local to the compute server (a difference of 1.25Mb/s vs. 10+Mb/s!). So, any job that will write a large amount of output and/or temporary files (greater than 10Mb) should be run on the local disk. A scratch space has been set up for exactly this purpose. To do this, simply transfer your input file to the temporary space:
% cd /scratch/elm
% mkdir your-login
% cd your-login
% cp $HOME/input .
% command < input > output &
Submitting the job in this manner will write the output of the job to /scratch/elm/your-login, taking advantage of the higher transfer rates.
The /scratch disk space has a capacity of approximately 13GB on elm and is temporary file space -- files left here will not be backed up and, if not accessed for long periods of time, are subject to removal. When you have completed your job, please transfer the useful output files to your account and remove all remaining files. Everybody must share this file space, so please be courteous and clean up your files regularly.
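One way to find candidates for cleanup is to list files that have not been accessed recently. The sketch below does this with find; the function name stale_files and the 30-day default are illustrative choices, not a site policy.

```shell
#!/bin/sh
# List regular files under a directory that have not been accessed for
# more than a given number of days (default 30, an arbitrary example).
stale_files() {
    dir=$1
    days=${2:-30}
    find "$dir" -type f -atime +"$days" -print
}

# On elm:
#   stale_files /scratch/elm/your-login        # older than 30 days
#   stale_files /scratch/elm/your-login 7      # older than 7 days
#   du -sk /scratch/elm/your-login             # space used, in KB
```

The function only lists files; review the list and copy anything useful to your account before removing the rest.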
elm should have little to no interactive use, so jobs do not need to be submitted with the nice command as explained below.
UNIX systems are intended to be multi-tasking systems. As such, they are capable of responding to interactive use while, at the same time, running large computational programs. Typically, large jobs are submitted and then the user logs off; this is called running "in the background". You can submit background jobs by simply appending an ampersand, "&", to your command line:
% command < input > output &
Note that the input and output redirection are optional, and are shown for the sake of generality. This will launch the job and then return you to the interactive prompt.
If you forget to launch a job in the background (and have lost your interactive command prompt), type "CTRL-Z" (control Z) to suspend the job. You can then put the job in the background using the "bg" command:
% command < input > output
CTRL-Z
Suspended
% bg
[1] command < input > output &
Since all other UNIX machines are available for interactive use, you should submit any long-running jobs (greater than 10 CPU minutes) using the nice command:
% nice +15 command < input > output &
If you submit a long-running job without the nice command, all jobs submitted properly (with the nice command) will cease to collect CPU time -- an unfair situation. Please cooperate so that everybody has equal access to the CPU cycles.
If you forget to nice a job or discover a "quick" job that has run past 10 CPU minutes, you can renice the job:
% renice 15 PID
where PID is the process ID obtained from the "top" command. You can also renice a job from within top.
Scheduling Many Jobs for Execution
You should not dump a large number of background jobs on the machine at once. You will only slow down the machine, hurting yourself and everyone else using the system. At most times, you should limit yourself to two concurrent jobs. If they are extremely CPU-intensive, or if several other jobs are also running, limit yourself to only one. You can check the number and owner of running processes via the "top" command.
Since the CPU has finite resources, it divides them among the jobs submitted, spending time determining which process to perform next and transferring the appropriate data. So 10 jobs launched in sequence will complete more quickly than 10 jobs launched in parallel. A simple way to submit jobs in sequence is to create a shell script, simply a file of commands for the computer to execute. A sample shell script might be:
#!/bin/sh
# shell script to run jobs in sequence
NICE=/usr/bin/nice
COMMAND=$HOME/bin/command
$NICE -15 $COMMAND < input_1 > output_1
$NICE -15 $COMMAND < input_2 > output_2
Executing this script then launches the jobs in sequence, starting the next as soon as the previous has completed.
In the example above, the command is launched using the nice command. The nice command reduces the priority of your job on the system so that the interactive use of the system takes precedence. Note the difference between the shell script example and the command line: a minus "-" versus a plus "+". The shell script example accesses the UNIX command "nice" whereas the command line example utilizes the C-shell "nice" command. Please consult the csh and nice man pages for more information.
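When there are many input files, the same idea can be written as a loop. The following is a minimal sketch for a Bourne shell script, assuming inputs named input_N that should produce output_N in the same directory; the function name run_in_sequence is hypothetical. It uses "nice -n 15", the POSIX spelling of the increment, which applies to the nice utility rather than to the C-shell built-in.

```shell
#!/bin/sh
# Run a command once per input file, one after another, at reduced
# priority. Each input_N is fed to the command on standard input and
# the result is written to output_N in the same directory.
run_in_sequence() {
    cmd=$1; shift
    for inp in "$@"; do
        dir=$(dirname "$inp")
        base=$(basename "$inp")
        out="$dir/output_${base#input_}"
        # The next job starts only after this one has finished.
        nice -n 15 "$cmd" < "$inp" > "$out" || return 1
    done
}

# Example:
#   run_in_sequence "$HOME/bin/command" input_1 input_2 input_3
```

The loop stops at the first job that fails, so a bad input does not silently waste CPU time on the jobs queued behind it.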
Users are not allowed to install UNIX software on the server or terminals. Please contact staff@ne.tamu.edu if you would like software installed.
File Security
The default read/write/execute permission for file creation is group readable and world unreadable, meaning that anyone in your group (typically your research group) can look at your files but everybody else on the system cannot. If you have private files you don't want others accessing, it is your responsibility to make sure those files have the proper permissions. This can be accomplished with the chmod command:
% chmod 700 private.file
Consult the chmod man page for more details.
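To confirm that a permission change took effect, the mode can be read back with ls. The sketch below wraps the two steps; the function names perm_string and make_private are illustrative only.

```shell
#!/bin/sh
# Print the 10-character permission string of a file, e.g. -rwx------
perm_string() {
    ls -ld "$1" | cut -c1-10
}

# Restrict a file to its owner only, then report the resulting mode.
make_private() {
    chmod 700 "$1" && perm_string "$1"
}

# Example:
#   make_private private.file
#
# Optional: after running "umask 077" in a shell, files created from
# that shell start out readable and writable by the owner only.
```

For a plain data file, mode 600 is sufficient, since the execute bit is only needed for programs, scripts, and directories.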
To run MCNP in the background successfully, you must redirect its standard output to a file. For example:
servername% mcnp i=input_file o=output_file > out.out &
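In a Bourne shell script the same redirection can be wrapped in a small function that also records the process ID, which is handy for a later renice or kill. This is only a sketch: the function name start_bg is hypothetical, and capturing standard error with "2>&1" is an addition here, not something the recipe above requires.

```shell
#!/bin/sh
# Start a command in the background with standard output and standard
# error sent to a log file. The process ID is left in bg_pid.
start_bg() {
    log=$1; shift
    "$@" > "$log" 2>&1 &
    bg_pid=$!
}

# On the server:
#   start_bg out.out mcnp i=input_file o=output_file
#   echo "MCNP running as process $bg_pid"
```

Call the function directly rather than inside $( ), so that bg_pid is set in your current shell.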
MCNP Manuals are located in Zach 2A.
To change your password, run the following command:
% /opt/quest/bin/vastool passwd