- Why mrjob?
- Writing jobs
- Config file format and location
- Options available to all runners
- Hadoop-related options
- Configuration quick reference
- Cloud runner options
- Job Environment Setup Cookbook
- Hadoop Cookbook
- Testing jobs
- Cloud Dataproc
- Elastic MapReduce
- Python 2 vs. Python 3
- Contributing to mrjob
- mrjob.cmd: The
- mrjob.conf - parse and write config files
- mrjob.dataproc - run on Dataproc
- mrjob.emr - run on EMR
- mrjob.hadoop - run on your Hadoop cluster
- mrjob.job - defining your job
- mrjob.protocol - input and output
- mrjob.spark.runner - run on any Spark cluster
- mrjob.runner - base class for all runners
- mrjob.step - represent Job Steps
- What’s New
- Setting the slice to a Python date object: job.setall(time(10, 2)), job.setall(date(2000, 4, 2)), or job.setall(datetime(2000, 4, 2, 10, 2)). You can also run a job's command directly; running the job this way will not affect its existing schedule in another crontab process: job_standard_output = job.run(). A job can also be created with a comment.
- Here is an example SLURM script that loads the anaconda3 module and runs a hello-world Python script. Note: to use the scheduler, you prepend `python hello.py` with the srun command. Save this SLURM script as hello.slurm, then run it by submitting the job to the SLURM scheduler with: We will take this SLURM job script and modify it to run as a job array.
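A minimal hello.slurm along the lines described above might look like this (the job name, output path, and time limit are illustrative; only the module load and the srun line come from the description):

```bash
#!/bin/bash
#SBATCH --job-name=hello        # job name shown by squeue
#SBATCH --output=hello_%j.out   # %j expands to the job ID
#SBATCH --time=00:05:00         # wall clock limit
#SBATCH --ntasks=1

module load anaconda3           # load the Anaconda module
srun python hello.py            # run the script under srun
```

It is then submitted with `sbatch hello.slurm`.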
- Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please. If you store your jobs in a database, they will also survive scheduler restarts and maintain their state.
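APScheduler provides its own scheduler objects and triggers; the underlying idea of queueing callables to run later can be sketched with just the standard library's sched module (this is an illustration of the concept, not APScheduler's API):

```python
import sched
import time

# Build a scheduler backed by wall-clock time and time.sleep
scheduler = sched.scheduler(time.time, time.sleep)
results = []

# Queue two callbacks with different delays; both run once
scheduler.enter(0.05, 1, results.append, argument=("first",))
scheduler.enter(0.10, 1, results.append, argument=("second",))
scheduler.run()  # blocks until all scheduled events have fired
```

After `run()` returns, `results` holds the callbacks' effects in delay order.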
mrjob lets you write MapReduce jobs in Python 2.7/3.4+ and run them on several platforms. You can:
- Write multi-step MapReduce jobs in pure Python.
- Test on your local machine.
- Run on a Hadoop cluster.
- Run in the cloud using Amazon Elastic MapReduce (EMR).
- Run in the cloud using Google Cloud Dataproc (Dataproc).
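In mrjob, a job subclasses MRJob and defines mapper and reducer methods. The model those methods implement — mappers emit key/value pairs that are grouped by key and reduced — can be sketched in plain Python (the function names here are illustrative, not mrjob's API):

```python
from collections import defaultdict

def mapper(line):
    # Emit a ("word", 1) pair for every word on the line
    for word in line.split():
        yield word.lower(), 1

def reduce_pairs(pairs):
    # Group pairs by key and sum the values, as a reducer would
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["the quick brown fox", "jumps over the lazy dog"]
counts = reduce_pairs(kv for line in lines for kv in mapper(line))
```

Here `counts["the"]` is 2, since "the" appears once in each input line.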
An abstraction layer to run jobs on HPC clusters using Grid Engine, SLURM, Torque, or locally.
The jobrunner package was developed by the United States Food and Drug Administration, Center for Food Safety and Applied Nutrition.
- Free software
- Documentation: https://jobrunner.readthedocs.io
- Source Code: https://github.com/CFSAN-Biostatistics/jobrunner
- PyPI Distribution: https://pypi.python.org/pypi/jobrunner
- Python API for job submission
- Consistent interface to run jobs on Grid Engine, SLURM, Torque, or locally
- Dependencies between jobs
- Array jobs and normal non-array jobs
- Array job parameter substitution
- Array job slot-dependency
- Limit the CPU resources consumed by array jobs
- Separate log files for each array job task
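On SLURM, several of the array-job features above map directly onto scheduler mechanics; a sketch of such a job script (the range 1-10%4, file names, and the process command are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --array=1-10%4              # 10 tasks, at most 4 running at once
#SBATCH --output=array_%A_%a.log    # separate log per task (%A=job ID, %a=task ID)

# Parameter substitution: each task reads its own line of the array file
PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
srun ./process "$PARAM"
```

The `%4` suffix limits concurrency (capping the CPU resources the array consumes), and the `%A_%a` pattern gives each array task its own log file.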
To cite jobrunner, please reference the jobrunner GitHub repository (https://github.com/CFSAN-Biostatistics/jobrunner).
See the LICENSE file included in the jobrunner distribution.
- Add support for wall clock time limits.
- Allow array tasks in local mode to process only a portion of the lines in the array file by setting num_tasks to a value less than the number of lines in the array file.
- Add support for the SLURM job scheduler.
- Add capability to request exclusive access to compute nodes when running on SLURM.
- Add the capability to run in quiet mode when running locally on a workstation, so the job stdout and stderr are written to log files only.
- HPC array job command lines are quoted and executed in a subshell by default, with better support for complex command lines.
Download the file for your platform. If you're not sure which to choose, learn more about installing packages. The source distribution is jobrunner-1.4.0.tar.gz (19.5 kB).