tutorial:torque

Differences

This shows you the differences between two versions of the page.

tutorial:torque [2017/04/08 15:03]
sertalpbilal [Running Solvers (Gurobi/CPLEX/Mosek/AMPL/...)]
tutorial:torque [2024/02/28 13:12] (current)
mjm519 [Table]
Line 31: Line 31:
 | polyp1--polyp15  | 16 AMD Opteron(tm) Processor 6128 | 32 GB | --- | | polyp1--polyp15  | 16 AMD Opteron(tm) Processor 6128 | 32 GB | --- |
 | polyp30 | 24 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 128 GB | 2x K80 (4GPUs) | | polyp30 | 24 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz | 128 GB | 2x K80 (4GPUs) |
 +
 +
 +Configured Resources as provided in the Maui scheduler. This is pulled from Torque: 
 +                        PROCS: 16  
 +                        MEM: 31G  
 +                        SWAP: 63G  
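
As a rough sizing aid, the figures above can be turned into a per-process memory budget. This is only a sketch: it assumes that MEM (31G) is shared by all PROCS (16) on one node and that ''pmem'' is counted per process. ''max_pmem_mb'' is a hypothetical helper name, not a Torque or Maui command.

```shell
# Hypothetical helper (not part of Torque): largest pmem, in MB, that still
# fits when a job runs `ppn` processes on a node with `mem_gb` GB configured.
max_pmem_mb() {
    mem_gb=$1
    ppn=$2
    echo $(( mem_gb * 1024 / ppn ))
}

max_pmem_mb 31 16   # a job using all 16 cores of a 31G node
max_pmem_mb 31 4    # a 4-core job
```

Under these assumptions the two calls print 1984 and 7936, so a 16-process job should request somewhat less than 2 GB of ''pmem'' per process.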
  
 ===== Submitting Jobs ===== ===== Submitting Jobs =====
Line 44: Line 50:
 #PBS -o /home/mat614/TEST.out #PBS -o /home/mat614/TEST.out
 #PBS -l nodes=1:ppn=4  #PBS -l nodes=1:ppn=4 
 +#PBS -l pmem=2GB,vmem=1GB
 #PBS -q batch #PBS -q batch
  
Line 61: Line 68:
 </code> </code>
 If you do not want to write the submission script you can do it just by calling If you do not want to write the submission script you can do it just by calling
-<code>qsub -N JobName -q batch -l nodes=1:pnn=2  myscript.sh</code>+<code>qsub -N JobName -q batch -l nodes=1:ppn=2  myscript.sh</code>
 Now, we will run the code but we are setting the job parameters using ''-'' character (e.g. ''-N JobName'') Now, we will run the code but we are setting the job parameters using ''-'' character (e.g. ''-N JobName'')
  
Line 86: Line 93:
  
 You can find detailed information [[http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html|here]]. You can find detailed information [[http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html|here]].
 +
 +<note tip>Use the ''-V'' option to pass your environment variables to the job. Solvers such as CPLEX, Gurobi, and MOSEK need this. [[tutorial:torque#running_solvers|See here]].</note>
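
The same option can be set inside a submission script with a ''#PBS -V'' directive. The script below is a minimal sketch: the job name ''SolverJob'' is made up, and ''GUROBI_HOME'' is used only as an example of a variable that a solver installation typically sets. The variable your solver needs may differ.

```shell
#!/bin/bash
#PBS -N SolverJob
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -V
# With -V, variables exported in the submitting shell are visible here.
# GUROBI_HOME is only an example; fall back to a marker if it is missing.
solver_home="${GUROBI_HOME:-<not set>}"
echo "GUROBI_HOME=$solver_home"
```

If the job output shows ''<not set>'', the variable was not exported in the shell that called ''qsub'', or ''-V'' was left out.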
 ===== Monitoring and Removing jobs ===== ===== Monitoring and Removing jobs =====
  
Line 103: Line 112:
 We have few queues ''qstat -Q'' We have few queues ''qstat -Q''
 <code> <code>
-Queue            Memory CPU Time Walltime Node  Run Que Lm  State
----------------- ------ -------- -------- ----  --- --- --  -----
-gpu                --      --       --      --    0   --   E R
-medium             --      --       --      --    0   --   R
-short              --      --       --      --      --   R
-long               --      --       --      --    0   0 --   E R
-batch              --      --       --      --    0   0 --   R
-verylong           --      --       --      --    0   50   R
-AMPL               --      --       --      --      10   R
-MOSEK              --      --       --      --    0   0 50   E R
 +Queue              Max    Tot   Ena   Str   Que   Run   Hld   Wat   Trn   Ext T   Cpt
 +----------------   ---   ----    --    --   ---   ---   ---   ---   ---   --- -   ---
 +MOSEK               48      0   yes   yes                         0 E     0
 +AMPL                      0   yes   yes                         0 E     0
 +long                30      1   yes   yes                         0 E     0
 +gpu                  4        yes   yes                         0 E     0
 +verylong            20      0   yes   yes                             0
 +medium             100        yes   yes                         0 E     0
 +coraverylong                 no    no         0                 0 E     0
 +special             24        yes   yes                             0
 +batch                     1   yes   yes                         0 E     0
 +short                0        yes   yes                         0 E     0
 +urgent                       no    no         0                 0 E     0
 +background                  yes   yes                             0
 +mediumlong          60      0   yes   yes                         0 E     0
 </code> </code>
  
Line 121: Line 134:
  
 You can see limits using this command ''qstat -f -Q'' You can see limits using this command ''qstat -f -Q''
-^ Queue ^ Wall Time ^
-| batch  | 01:00:00  |
-| short  | 02:00:00  |
-| medium | 04:00:00 |
-| long  | 72:00:00  |
-| very long  | 240:00:00 |
 +^ Queue       ^ Wall Time  ^ Max Queueable  ^ Max Running  ^ Max User Run  ^ Max User Queueable  ^ Notes                         ^
 +| urgent      |            |                |              |               |                     | high priority - upon request  |
 +| batch       | 01:00:00   |                |              |               |                     |                               |
 +| short       | 02:00:00   |                |              |               |                     |                               |
 +| medium      | 04:00:00   |                | 100          | 40            | 200                 |                               |
 +| mediumlong  | 24:00:00   | 1200           | 60           |               |                     |                               |
 +| long        | 72:00:00   |                | 30           | 20            | 900                 |                               |
 +| verylong    | 240:00:00  |                | 20           | 10            | 600                 |                               |
 +| special     | 72:00:00   |                | 24           |               |                     |                               |
 +| background  | unlimited  |                |              |               |                     | low priority                  |
 +| gpu         |            |                | 4            | 1             |                     | GPU node is not in Torque     |
 +| AMPL        |            | 200            | 8            | 6             |                     |                               |
 +| MOSEK       |            | 50             | 48           |               |                     |                               |
  
 +
 +
 +Notes:
 +  * The urgent queue has no limits, and its jobs have higher priority than all other jobs in the queues. Please be respectful of others when using this queue for time-sensitive or critical jobs.
 +  * The background queue has no limits, and its jobs have lower priority than all other jobs in the queues.
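
The wall-time limits in the table can be used to choose a queue from a job's expected run time. The function below is a hypothetical helper, not something installed on the cluster: it picks the shortest general-purpose queue whose limit covers the requested number of whole hours and falls back to ''background'' (unlimited) beyond 240 hours. It ignores the ''special'', ''urgent'', ''gpu'', ''AMPL'' and ''MOSEK'' queues, and the script name ''myscript.sh'' is a placeholder.

```shell
# Hypothetical helper: map a run time in whole hours to a queue name,
# using the wall-time limits from the table above.
pick_queue() {
    hours=$1
    if   [ "$hours" -le 1 ];   then echo batch
    elif [ "$hours" -le 2 ];   then echo short
    elif [ "$hours" -le 4 ];   then echo medium
    elif [ "$hours" -le 24 ];  then echo mediumlong
    elif [ "$hours" -le 72 ];  then echo long
    elif [ "$hours" -le 240 ]; then echo verylong
    else echo background
    fi
}

# Dry run: print the qsub command instead of submitting it.
queue=$(pick_queue 10)
echo "qsub -q $queue -l walltime=10:00:00 myscript.sh"
```

For a 10-hour job this prints ''qsub -q mediumlong -l walltime=10:00:00 myscript.sh''. Remove the ''echo'' to actually submit.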
 ===== Examples ===== ===== Examples =====
  
 ==== Submitting a Small or Large Memory Job ==== ==== Submitting a Small or Large Memory Job ====
  
-You can use the option ''-l mem=size,vmem=size'' to limit memory usage of your job.+You can use the option ''-l pmem=size,vmem=size'' to limit memory usage of your job.
  
 <code bash limited.sh> <code bash limited.sh>
-qsub -l mem=4gb,vmem=4gb test.pbs+qsub -l pmem=4gb,vmem=4gb test.pbs
 </code> </code>
  
 Sometimes your job needs more memory. You can choose a larger memory size with the same option: Sometimes your job needs more memory. You can choose a larger memory size with the same option:
  
-<code bash large.pbs>qsub  -l mem=20gb  test.pbs</code>+<code bash large.pbs>qsub  -l pmem=20gb  test.pbs</code> 
 + 
 +To see what resources the batch queuing system has assigned to your job, run the ''ulimit'' command (bash) or the ''limit'' command (csh/tcsh) inside the job:
 +<code bash pbs job submission command>qsub -I -l nodes=1:ppn=1 -l pmem=30GB:vmem=4GB -q short -N test -e TEST.err -o TEST.out -w e</code> 
 +<code bash ulimit>user@polyp13:~$ ulimit -a 
 +core file size          (blocks, -c) 0 
 +data seg size           (kbytes, -d) 31457280 
 +scheduling priority             (-e) 0 
 +file size               (blocks, -f) unlimited 
 +pending signals                 (-i) 128344 
 +max locked memory       (kbytes, -l) unlimited 
 +max memory size         (kbytes, -m) 31457280 
 +open files                      (-n) 65536 
 +pipe size            (512 bytes, -p) 8 
 +POSIX message queues     (bytes, -q) 819200 
 +real-time priority              (-r) 0 
 +stack size              (kbytes, -s) unlimited 
 +cpu time               (seconds, -t) unlimited 
 +max user processes              (-u) 128344 
 +virtual memory          (kbytes, -v) unlimited 
 +file locks                      (-x) unlimited</code>
  
 +**[[https://www.geeksforgeeks.org/ulimit-soft-limits-and-hard-limits-in-linux|For more information on the ulimit command review this link.]]**
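
''ulimit'' reports memory limits in kbytes, which are awkward to compare with the ''pmem'' value given to ''qsub''. The small function below is a sketch of the conversion (1 GB is taken as 1024 x 1024 kbytes); ''kb_to_gb'' is a hypothetical name, and non-numeric values such as ''unlimited'' are passed through unchanged.

```shell
# Hypothetical helper: convert a ulimit value in kbytes to whole gigabytes.
kb_to_gb() {
    case "$1" in
        ''|*[!0-9]*) echo "$1" ;;                # e.g. "unlimited": pass through
        *) echo $(( $1 / 1024 / 1024 )) ;;
    esac
}

kb_to_gb 31457280          # "max memory size" from the listing above
kb_to_gb "$(ulimit -m)"    # the limit of the current shell
```

The first call prints 30, which matches the ''pmem=30GB'' requested in the submission command.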
 ==== Running MATLAB ==== ==== Running MATLAB ====
  
Line 150: Line 196:
 #PBS -o /home/mat614/TEST.out #PBS -o /home/mat614/TEST.out
 #PBS -l nodes=1:ppn=4  #PBS -l nodes=1:ppn=4 
 +#PBS -l pmem=2GB,vmem=1GB
 #PBS -q batch #PBS -q batch
  
Line 181: Line 228:
  
 However, first you have to have a permission to use GPU (given by Prof. Takac) -- this is just formality to allow to certain users to use video driver on polyp30 However, first you have to have a permission to use GPU (given by Prof. Takac) -- this is just formality to allow to certain users to use video driver on polyp30
 +
 +If you are using TensorFlow 1.x in Python, you can limit the amount of GPU memory it uses with:
 +<code>config_tf = tf.ConfigProto()
 +config_tf.gpu_options.per_process_gpu_memory_fraction = p</code>
 +in which **//p//** is the fraction of GPU memory to use (a number between zero and one).
  
 ==== Running MPI and Parallel Jobs ==== ==== Running MPI and Parallel Jobs ====
Line 312: Line 364:
 </code> </code>
 and then schedule your jobs with Torque to perform experiments on GPU 1. and then schedule your jobs with Torque to perform experiments on GPU 1.
 +
 +
 +====== MOAB Scheduler ======
 +PBS Torque is used to schedule and run jobs on our cluster. Two PBS processes are required to run jobs. On the PBS server, the pbs_server process runs to accept your job and add it to the queue. It will also dispatch the job to the nodes to run under the pbs_mom process.
 +
 +
 +==== Useful MOAB Commands ====
 +1. [[https://docs.adaptivecomputing.com/maui/commands/showq.php|showq]] - Displays information about active, eligible, blocked, and/or recently completed jobs.
 +
 +2. [[https://docs.adaptivecomputing.com/maui/commands/showstart.php|showstart]] - Displays the estimated start time of a job based on a number of analysis types.
 +
 +3. [[https://docs.adaptivecomputing.com/maui/commands/checkjob.php|checkjob]] - Allows end users to view the status of their own jobs.
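
A typical way to use these three commands together is sketched below. The job ID ''12345'' is a made-up placeholder; use the ID that ''qsub'' printed when you submitted the job. The check on ''showq'' is only there so the snippet fails gracefully on machines without the scheduler client commands.

```shell
jobid="12345"   # hypothetical job ID; replace with your own

if command -v showq >/dev/null 2>&1; then
    showq                 # active, eligible and blocked jobs
    showstart "$jobid"    # estimated start time of the job
    checkjob "$jobid"     # detailed status of the job
else
    echo "Scheduler client commands not found on this machine"
fi
```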
 +
 +====Useful External Resources====
 +[[https://www.icer.msu.edu/sites/default/files/files/understand_job_scheduler_v2.pdf|MSU - Understand job scheduler and resource manager]] - Describes the batch queuing system and has useful diagrams explaining the relationship between the scheduler and the server.
 +
 +[[https://wvuhpc.github.io/2019-Intro-HPC/07-jobs/index.html|WVU - Job Submission (Torque and Moab)]] - Lists frequently used commands for Torque and Moab. Also includes information on Prologue and Epilogue scripts.
 +
 +[[http://docs.adaptivecomputing.com/mwm/7-1-3/help.htm#pbsintegration.html|Moab-TORQUE/PBS Integration Guide]] - Guide for administrators and integrators on deploying and integrating PBS Torque and Moab.
 +
 +[[https://silas.net.br/tech/hpc/torque.html|Torque Notes]] - Information about the processes involved in using torque and debugging information.
 +
 +
tutorial/torque.1491678221.txt.gz · Last modified: 2017/04/08 15:03 by sertalpbilal