tutorial:torque

Previous revision: tutorial:torque [2017/03/22 14:09] sey212 [Running Gurobi]

Current revision: tutorial:torque [2024/02/28 13:12] mjm519 [Table]
| polyp1--polyp15  | 16 AMD Opteron(tm) Processor 6128 | 32 GB | --- |
| polyp30 | 24 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 128 GB | 2x K80 (4 GPUs) |

Configured resources, as provided in the Maui scheduler (pulled from Torque):
  PROCS: 16
  MEM: 31G
  SWAP: 63G
  
===== Submitting Jobs =====
#PBS -o /home/mat614/TEST.out
#PBS -l nodes=1:ppn=4
#PBS -l pmem=2GB,vmem=1GB
#PBS -q batch
  
</code>
If you do not want to write a submission script, you can pass the same options directly on the command line:
<code>qsub -N JobName -q batch -l nodes=1:ppn=2  myscript.sh</code>
This runs the same code, but the job parameters are set as command-line options prefixed with ''-'' (e.g. ''-N JobName'') instead of ''#PBS'' directives.
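
For comparison, the command above corresponds roughly to a script that carries the same settings as ''#PBS'' directives. The sketch below assumes Torque's default of starting a job in your home directory and uses ''PBS_O_WORKDIR'' (the directory ''qsub'' was called from) to move back to where you submitted; the ''echo'' line is only a placeholder for your real work.

<code bash>
#!/bin/bash
#PBS -N JobName
#PBS -q batch
#PBS -l nodes=1:ppn=2

# Torque starts a job in your home directory by default; PBS_O_WORKDIR holds
# the directory qsub was called from. Outside a job, stay where we are.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Placeholder for the real work of the job.
echo "JobName running in $PWD"
</code>

Outside Torque the directives are ordinary comments, so ''bash myscript.sh'' gives a quick local dry run before you submit it with ''qsub myscript.sh''.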
  
===== Options =====
  
^ Option ^ Description ^
| ''-q <queue>'' | Set the queue. Most jobs use the standard queue, so you usually do not need to set this. |
| ''-V'' | Pass all environment variables to the job. |
| ''-v var[=value]'' | Pass the environment variable ''var'' (optionally set to ''value'') to the job. |
| ''-b y'' | Allow the command to be a binary file instead of a script. |
| ''-w e'' | Verify the options and abort if there is an error. |
| ''-N <jobname>'' | Name of the job. This is the name you will see when you check the status of your jobs with ''qstat''. |
| ''-l resource_list'' | Specify resources. |
| ''-l h_rt=<hh:mm:ss>'' | Specify the maximum (hard) run time in hours, minutes and seconds. |
| ''-l s_rt=<hh:mm:ss>'' | Specify the soft run time limit in hours, minutes and seconds. Remember to set both ''s_rt'' and ''h_rt''. |
| ''-cwd'' | Run in the current working directory. |
| ''-wd <dir>'' | Set the working directory for this job to ''<dir>''. |
| ''-o <output_logfile>'' | Name of the output log file. |
| ''-e <error_logfile>'' | Name of the error log file. |
| ''-m ea'' | Send email when the job ends or aborts. |
| ''-P <projectName>'' | Set the job's project. |
| ''-M <emailaddress>'' | Email address to send notifications to. |
| ''-t <start>-<end>:<incr>'' | Submit a job array with indices from ''<start>'' to ''<end>'' in increments of ''<incr>''. |
  
You can find detailed information in the [[http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html|Grid Engine qsub manual page]].
  
<note tip>Use the ''-V'' option to pass your environment variables to the job; solvers such as CPLEX, Gurobi and MOSEK need them to run. See [[tutorial:torque#running_solvers|Running Solvers]].</note>
===== Monitoring and Removing jobs =====
  
We have several queues; you can list them with ''qstat -Q'':
<code>
Queue              Max    Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T   Cpt
----------------   ---   ----    --    --   ---   ---   ---   ---   ---   --- -   ---
MOSEK               48      0   yes   yes                         0 E     0
AMPL                      0   yes   yes                         0 E     0
long                30      1   yes   yes                         0 E     0
gpu                  4        yes   yes                         0 E     0
verylong            20      0   yes   yes                             0
medium             100        yes   yes                         0 E     0
coraverylong                 no    no         0                 0 E     0
special             24        yes   yes                             0
batch                     1   yes   yes                         0 E     0
short                0        yes   yes                         0 E     0
urgent                       no    no         0                 0 E     0
background                  yes   yes                             0
mediumlong          60      0   yes   yes                         0 E     0
</code>
  
  
You can see the limits of each queue with ''qstat -f -Q''. They are summarized in the table below:
^ Queue       ^ Wall Time  ^ Max Queueable  ^ Max Running  ^ Max User Run  ^ Max User Queueable  ^ Notes                          ^
| urgent      |            |                |              |               |                     | high priority - upon request   |
| batch       | 01:00:00   |                |              |               |                     |                                |
| short       | 02:00:00   |                |              |               |                     |                                |
| medium      | 04:00:00   |                | 100          | 40            | 200                 |                                |
| mediumlong  | 24:00:00   | 1200           | 60           |               |                     |                                |
| long        | 72:00:00   |                | 30           | 20            | 900                 |                                |
| verylong    | 240:00:00  |                | 20           | 10            | 600                 |                                |
| special     | 72:00:00   |                | 24           |               |                     |                                |
| background  | unlimited  |                |              |               |                     | low priority                   |
| gpu         |            |                | 4            | 1             |                     | GPU node is not in Torque      |
| AMPL        |            | 200            | 8            | 6             |                     |                                |
| MOSEK       |            | 50             | 48           |               |                     |                                |
  
Notes:
  * The ''urgent'' queue has no limits, and its jobs have a higher priority than jobs in all other queues. Please be respectful of others when using this queue to complete time-sensitive or critical jobs.
  * The ''background'' queue has no limits, and its jobs have a lower priority than jobs in all other queues.
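
A simple rule for choosing a queue is to take the one with the shortest wall-time limit that still covers the run time you expect. The helper below is a hypothetical sketch of that rule, with the limits copied from the table above; it ignores the per-user limits as well as the ''urgent'', ''special'', GPU and solver queues, which serve their own purposes.

<code bash>
#!/bin/bash
# pick_queue HOURS: print the queue with the shortest wall-time limit that
# still fits a job of HOURS (whole) hours. Hypothetical helper; limits are
# taken from the queue table and per-user limits are not checked.
pick_queue() {
  local hours="$1"
  if   (( hours <= 1 ));   then echo batch
  elif (( hours <= 2 ));   then echo short
  elif (( hours <= 4 ));   then echo medium
  elif (( hours <= 24 ));  then echo mediumlong
  elif (( hours <= 72 ));  then echo long
  elif (( hours <= 240 )); then echo verylong
  else echo background   # only listed queue with unlimited wall time (low priority)
  fi
}

# Example: a job expected to run for about 10 hours.
queue=$(pick_queue 10)
echo "qsub -q $queue test.pbs"
</code>

For a 10-hour job this suggests ''mediumlong''; always double-check against the current output of ''qstat -f -Q'', since limits can change.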
===== Examples =====
  
==== Submitting a Small or Large Memory Job ====
  
You can use the option ''-l pmem=size,vmem=size'' to limit the memory usage of your job: ''pmem'' limits the physical memory of any single process, and ''vmem'' limits the virtual memory used by all processes of the job together.
  
<code bash limited.sh>
qsub -l pmem=4gb,vmem=4gb test.pbs
</code>
  
Sometimes your job needs more memory. You can request a larger memory size with the same option:
  
<code bash large.pbs>qsub  -l pmem=20gb  test.pbs</code>
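
Because ''pmem'' applies to each process, a job that starts several processes can use roughly ''ppn'' × ''pmem'' in total. The sketch below does that arithmetic for the request used in the example scripts on this page (''nodes=1:ppn=4'' with ''pmem=2GB'') and compares it with the 31G of memory configured in the scheduler. ''job_mem_gb'' is a hypothetical helper, and the comparison assumes the job runs on a single node with that configuration.

<code bash>
#!/bin/bash
# job_mem_gb PPN PMEM_GB: total memory (GB) a job may use when each of its
# PPN processes is limited to PMEM_GB by "-l pmem=...".
job_mem_gb() {
  echo $(( $1 * $2 ))
}

node_mem_gb=31   # MEM: 31G as configured in the scheduler (assumed per node)
ppn=4            # "#PBS -l nodes=1:ppn=4"
pmem_gb=2        # "#PBS -l pmem=2GB"

total=$(job_mem_gb "$ppn" "$pmem_gb")
if (( total > node_mem_gb )); then
  echo "requesting ${total} GB, more than the ${node_mem_gb} GB of one node" >&2
else
  echo "requesting ${total} GB of the ${node_mem_gb} GB on one node"
fi
</code>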
To see which resources the batch queuing system has assigned to your job, run the ''ulimit'' command (bash) or the ''limit'' command (csh/tcsh) inside the job. For example, after starting an interactive job with:
<code bash>qsub -I -l nodes=1:ppn=1 -l pmem=30GB,vmem=4GB -q short -N test -e TEST.err -o TEST.out -w e</code>
<code bash ulimit>user@polyp13:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 31457280
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128344
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 31457280
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128344
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited</code>
  
For more information on the ''ulimit'' command and on soft and hard limits, see [[https://www.geeksforgeeks.org/ulimit-soft-limits-and-hard-limits-in-linux|this article]].
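
The memory values in the ''ulimit -a'' output above are in kilobytes. Converting ''data seg size'' and ''max memory size'' back to gigabytes shows that they match the ''pmem=30GB'' request (31457280 / 1024 / 1024 = 30). A small sketch of the conversion; ''kb_to_gb'' is a hypothetical helper name:

<code bash>
#!/bin/bash
# kb_to_gb VALUE: convert a ulimit value given in kbytes to whole GB
# (integer division); "unlimited" is passed through unchanged.
kb_to_gb() {
  if [ "$1" = "unlimited" ]; then
    echo "unlimited"
  else
    echo "$(( $1 / 1024 / 1024 )) GB"
  fi
}

kb_to_gb 31457280                      # data seg size from the output above
echo "in this shell: $(kb_to_gb "$(ulimit -d)")"
</code>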
==== Running MATLAB ====
  
#PBS -o /home/mat614/TEST.out
#PBS -l nodes=1:ppn=4
#PBS -l pmem=2GB,vmem=1GB
#PBS -q batch
  
<note tip>Use the **-singleCompThread** [[https://www.mathworks.com/help/matlab/ref/maxnumcompthreads.html|option]] to make MATLAB use a single computational thread. A similar option may be needed for the program or solver you are using.</note>
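
For programs that use OpenMP, the usual equivalent is the ''OMP_NUM_THREADS'' variable. The sketch below sets it to the number of cores requested with ''ppn'', assuming that your program honours ''OMP_NUM_THREADS'' and that our Torque version sets ''PBS_NUM_PPN'' inside the job; outside a job it falls back to one thread.

<code bash>
#!/bin/bash
# Give OpenMP-based programs one thread per core requested with
# "#PBS -l nodes=1:ppn=N". PBS_NUM_PPN is assumed to be set by Torque inside
# the job; outside a job we fall back to a single thread.
export OMP_NUM_THREADS="${PBS_NUM_PPN:-1}"
echo "using $OMP_NUM_THREADS thread(s)"

# For MATLAB, use the flag from the note above instead
# (myscript.m is a hypothetical script name):
# matlab -nodisplay -singleCompThread -r "myscript; exit"
</code>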
  
==== Running Solvers ====
  
To run solvers (such as Gurobi, CPLEX, MOSEK, AMPL, ...), you need to use the ''-V'' option (upper case), i.e.:
  
<code>qsub -V submitFile.pbs</code>
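
Since ''-V'' can only forward variables that are exported in the shell where you run ''qsub'', it can help to check them before submitting. The following is a sketch with a hypothetical helper ''check_env''; ''GUROBI_HOME'' and ''GRB_LICENSE_FILE'' are Gurobi's variables and serve only as an example, so check your solver's documentation for the ones it needs.

<code bash>
#!/bin/bash
# check_env VAR...: warn about every VAR that is not exported in this shell.
# qsub -V can only pass on variables that are in the environment of qsub.
check_env() {
  local var missing=0
  for var in "$@"; do
    if ! printenv "$var" > /dev/null; then
      echo "warning: $var is not exported" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example with Gurobi's variables (adjust the names for your solver):
# check_env GUROBI_HOME GRB_LICENSE_FILE && qsub -V submitFile.pbs
</code>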
  
However, you first need permission to use the GPU (granted by Prof. Takac). This is just a formality that allows certain users to use the video driver on polyp30.
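
If you are using TensorFlow in Python, you can limit the amount of GPU memory it allocates with:
<code python>import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TensorFlow 2)
config_tf = tf.ConfigProto()
config_tf.gpu_options.per_process_gpu_memory_fraction = p
sess = tf.Session(config=config_tf)  # the limit applies only to sessions created with this config</code>
where **//p//** is the fraction of GPU memory to use (a number between zero and one).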
  
==== Running MPI and Parallel Jobs ====
  * **PBS_WALLTIME** (the walltime requested by the user or the default walltime allotted by the scheduler)
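
Wall-time limits such as those in the queue table are written as ''hh:mm:ss''. The sketch below converts them to seconds and uses ''PBS_WALLTIME'' to estimate how much time a running job has left, for example to decide when to save a checkpoint. It assumes ''PBS_WALLTIME'' is reported in seconds (print it from a short test job to confirm); ''hms_to_seconds'' is a hypothetical helper.

<code bash>
#!/bin/bash
# hms_to_seconds HH:MM:SS: convert a wall-time limit such as 72:00:00 to seconds.
# 10# forces base 10 so that fields like "08" are not read as octal.
hms_to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds left before the job hits its wall time. Assumes PBS_WALLTIME is in
# seconds; outside a job we use the 01:00:00 limit of the batch queue.
# SECONDS is bash's count of seconds since this script started.
walltime="${PBS_WALLTIME:-$(hms_to_seconds 01:00:00)}"
remaining=$(( walltime - SECONDS ))
echo "about ${remaining}s of wall time left"
</code>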
  
==== TensorFlow with GPU ====
To make TensorFlow use a specific GPU, say GPU 1, set
<code bash>
export CUDA_VISIBLE_DEVICES=1
</code>
and then schedule your jobs with Torque to run your experiments on GPU 1.
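
The same idea can be wrapped in a small helper, so that each command you launch sees only the GPU you pick. This is a sketch: ''on_gpu'' is a hypothetical name, and ''train.py'' in the comments stands in for your own program. CUDA renumbers the visible devices from zero, so inside the program the selected GPU appears as device 0.

<code bash>
#!/bin/bash
# on_gpu GPU COMMAND [ARGS...]: run COMMAND with only GPU number GPU visible.
# The assignment applies to this one command, not to the rest of the script.
on_gpu() {
  local gpu="$1"
  shift
  CUDA_VISIBLE_DEVICES="$gpu" "$@"
}

# Example (hypothetical script): two experiments side by side on GPUs 1 and 2.
# on_gpu 1 python train.py --config a.yaml &
# on_gpu 2 python train.py --config b.yaml &
# wait
</code>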

====== MOAB Scheduler ======
PBS Torque is used to schedule and run jobs on our cluster. Two PBS processes are involved in running a job: on the PBS server, the ''pbs_server'' process accepts your job and adds it to the queue, and it then dispatches the job to the compute nodes, where it runs under the ''pbs_mom'' process.

==== Useful MOAB Commands ====
  - [[https://docs.adaptivecomputing.com/maui/commands/showq.php|showq]] - Displays information about active, eligible, blocked, and/or recently completed jobs.
  - [[https://docs.adaptivecomputing.com/maui/commands/showstart.php|showstart]] - Displays the estimated start time of a job based on a number of analysis types.
  - [[https://docs.adaptivecomputing.com/maui/commands/checkjob.php|checkjob]] - Allows end users to view the status of their own jobs.

==== Useful External Resources ====
[[https://www.icer.msu.edu/sites/default/files/files/understand_job_scheduler_v2.pdf|MSU - Understand job scheduler and resource manager]] - Describes the batch queuing system and includes useful diagrams explaining the relationship between the scheduler and the server.

[[https://wvuhpc.github.io/2019-Intro-HPC/07-jobs/index.html|WVU - Job Submission (Torque and Moab)]] - Lists frequently used commands for Torque and Moab, and includes information on prologue and epilogue scripts.

[[http://docs.adaptivecomputing.com/mwm/7-1-3/help.htm#pbsintegration.html|Moab-TORQUE/PBS Integration Guide]] - Guide for administrators and integrators on deploying and integrating PBS Torque and Moab into a computer system.

[[https://silas.net.br/tech/hpc/torque.html|Torque Notes]] - Information about the processes involved in using Torque, and debugging information.
  
  
tutorial/torque.1490206188.txt.gz · Last modified: 2017/03/22 14:09 by sey212