======CONDOR======

<note warning>This page is about our retired job scheduling system **CONDOR**. Check [[tutorial:torque|Torque]] to schedule jobs on Polyps.</note>

===== What is CONDOR =====
To submit a job via CONDOR, you need to create a .sub file. This .sub file must specify the program you want to execute (e.g., matlab, cplex, etc.) along with the arguments for the program (such as the file you want it to run). It's an automated way to run programs.
  
  
=== A case study: Matlab ===

Suppose that we want to run a MATLAB script on Polyps. Here is an example .sub file which submits the matlab file 'test.m' to condor for running and saves the output of the code to the file 'out.txt', while CONDOR errors and logs are stored in 'error.txt' and 'log.txt', respectively.
  
<code bash myexp.sub>
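# A minimal sketch of the submit file contents; adjust the MATLAB
# flags and paths for your own setup (the matlab path below is the
# one listed in the executables note further down this page)
universe   = vanilla
executable = /usr/local/matlab/latest/bin/matlab
# run test.m without the desktop GUI
arguments  = -nodisplay -r test
output     = out.txt
error      = error.txt
log        = log.txt
queue
</code>

Then run
<code bash>
condor_submit myexp.sub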
</code>
to submit the file to condor.\\

<note tip>
You can find the ''Executable'' of a program by running the ''which //program//'' command.
</note>
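For example, assuming ''matlab'' is on your ''PATH'', ''which'' should print the path listed in the following note:
<code bash>
$ which matlab
/usr/local/matlab/latest/bin/matlab
</code>
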
<note tip>Frequently used executables on Polyps:
  * Matlab: /usr/local/matlab/latest/bin/matlab
  * Cplex: /usr/local/cplex/bin/x86-64_linux/cplex
  * Mosek: /usr/local/mosek/7.1/tools/platform/linux64x86/bin/mosek
  * Ampl: /usr/local/ampl/ampl
</note>
  
==== Submitting Multiple Jobs ====
  
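A common pattern for submitting several related jobs at once (a sketch; the file name ''multi.sub'' and the scripts ''test0.m'', ''test1.m'', ''test2.m'' are just for illustration) uses the ''$(Process)'' macro, which takes the values 0, 1, 2, ... across the queued jobs:

<code bash multi.sub>
universe   = vanilla
executable = /usr/local/matlab/latest/bin/matlab
# $(Process) expands to 0, 1 and 2, so each job runs its own script
arguments  = -nodisplay -r test$(Process)
output     = out$(Process).txt
error      = error$(Process).txt
log        = log.txt
# queue N submits N copies of the job
queue 3
</code>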

<note important>
Be sure to provide an output argument in your Condor submissions. Otherwise, you may not be able to see the results of your tasks.
</note>
 ==== Checking Jobs ==== ==== Checking Jobs ====
  
<code bash>
condor_q userid    # checks all jobs under a specific user name
</code>

<note tip>If your jobs do not seem to be processed, you can debug and see the reasons by running the ''condor_q //userid// -analyze'' command.</note>
  
 ==== Removing Jobs ==== ==== Removing Jobs ====
===== Basic CONDOR commands =====
  
^ Command ^ Action ^ Basic Usage ^ Example ^
| ''condor_submit'' | submit a job | condor_submit [submit file] | $ condor_submit job.condor |
| ''condor_q'' | show status of jobs | condor_q [cluster] | $ condor_q 1170 |
| ''condor_rm'' | remove jobs from the queue | condor_rm [cluster] | $ condor_rm 1170 |
| ''condor_rm //userid//'' | remove all jobs of a user |  |  |
  
[[http://www.rcc.uh.edu/hpc-docs/134-basic-condor-commands.html|Source]]

===== Some other CONDOR commands =====

^ Command ^ Action ^ Info ^
| ''condor_userprio'' | shows the user priority | condor_userprio |
| ''condor_status'' | shows the current status of CONDOR nodes | condor_status |
  
===== Running MPI Jobs with Condor =====
  
FIXME To submit MPI jobs to our condor pool, you can check Dr. Takac's [[http://polyps.ie.lehigh.edu/mpi|MPI tutorial]].


===== Using AMPL with Condor =====

We have a limited AMPL license installed in the COR@L Lab. The license allows at most 10 simultaneous AMPL jobs. If you use AMPL in your experiments, you can let condor know about this, and it will schedule all jobs that need AMPL within the license limit. To do so, add the following line to your condor submission file:

''concurrency_limits = AMPL''
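
For example, a submit file for an AMPL job might look like this (a sketch; ''model.run'' is a hypothetical AMPL script, and the executable path is taken from the note above):

<code bash ampl.sub>
universe           = vanilla
executable         = /usr/local/ampl/ampl
arguments          = model.run
# tell condor about the AMPL license limit
concurrency_limits = AMPL
output             = out.txt
error              = error.txt
log                = log.txt
queue
</code>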

===== Condor Jobs Memory Usage =====
Please check the status of your condor jobs regularly, especially their memory usage.
Each polyp node has 16 processors and 32 GB of memory, so on average each
process gets 2 GB of memory.

When a polyp node runs out of memory, it starts using the hard drive (swap) as
memory, but reading from and writing to hard drives is 1000 times slower. This
means that if your jobs use large amounts of memory and the polyp node
processing your job runs out of memory, you should not expect your job to
terminate.

Tips (collected as commands after this list):
  * You can see the memory usage of your jobs using the ''condor_q'' command (the 7th column gives memory usage in MB).
  * You can check which node your job is running on using ''condor_q -run''.
  * You can check the memory status of a node using ''ssh polyp6 "vmstat -s"''. For more memory checking commands, see http://www.binarytides.com/linux-command-check-memory-usage/ or google is your friend.
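
For quick reference, the same commands in one place (''polyp6'' is just an example node name):

<code bash>
condor_q                  # 7th column gives memory usage in MB
condor_q -run             # shows the node on which each job is running
ssh polyp6 "vmstat -s"    # memory statistics for node polyp6
</code>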

**Your job might get killed if it is using swap. Do not waste your system
administrators' time by forcing them to police the condor jobs; just keep
your jobs under control and submit jobs that are reasonable.**