PBS Professional Features

Used by thousands of organizations worldwide, from Top500 supercomputers to single-site clusters to cloud environments, PBS Professional provides the power, flexibility, security, scalability and reliability users need to manage their complex HPC infrastructures.

Policy-Based and Resource-Based Scheduling

  • SLAs enforced via tunable priorities, fairshare, reservations, preemption, access control lists (ACLs) and backfilling
  • Tunable scheduling formula to define any policy, including on-the-fly exceptions, with full math functions (e.g., sqrt(), ceil()), conditional expressions, and a threshold for job start eligibility
  • Fairshare to ensure system resources are allocated proportionally, adjusted for recent usage and organizational priorities
  • General fairshare formula enables accruals per queue, by license use, by time of day, by power use, or by any combination of these
  • User, group, and project limits to implement fine-grained policy adjustments
  • Advance reservations and standing reservations to guarantee resources for recurring needs
  • Preemption and checkpointing (suspend/checkpoint/requeue) allow users to run high-priority work immediately
  • Fine-grained policy settings for preemption and backfill, configurable at the queue level
  • Ability to start work immediately (interrupting the scheduling cycle) for high priority jobs
  • Ability to mark jobs as “top job ineligible”, ensuring backfilling is available for the highest priority work
  • Age-based scheduling allowing users to adjust priority based on length of eligible time waiting in the queue
  • License scheduling via dynamic resources allocates (and shares) software licenses served by FlexLM or other 3rd party licensing mechanisms
  • Eligible time which provides equitable job prioritization even when some users "stuff the queue" with lots of jobs
  • GPU / co-processor scheduling prioritizes use and manages access to all types of accelerators (NVIDIA, AMD) as well as the Intel® Xeon Phi™ coprocessor
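The tunable scheduling formula described above is configured on the PBS server; the standalone sketch below only models the idea of ranking jobs with a formula that mixes math functions, fairshare, age-based priority, and a start-eligibility threshold. The attribute names and weights are illustrative assumptions, not the real PBS formula terms.

```python
import math

def job_priority(job):
    # Hypothetical ranking formula in the spirit of PBS's tunable
    # job sort formula: queue priority plus fairshare, an age-based
    # boost, and a sqrt() term damping very large core requests.
    p = (10 * job["queue_priority"]
         + 100 * job["fairshare_factor"]
         + job["eligible_time"] / 3600        # hours spent eligible/waiting
         - math.sqrt(job["ncpus"]))           # mild penalty for huge jobs
    # Threshold for start eligibility: negative scores never run.
    return p if p >= 0 else None

jobs = [
    {"id": "101", "queue_priority": 5, "fairshare_factor": 0.8,
     "eligible_time": 7200, "ncpus": 64},
    {"id": "102", "queue_priority": 1, "fairshare_factor": 0.2,
     "eligible_time": 600, "ncpus": 4},
]
ranked = sorted((j for j in jobs if job_priority(j) is not None),
                key=job_priority, reverse=True)
print([j["id"] for j in ranked])
```

The job with higher queue priority, better fairshare standing, and longer eligible time sorts first.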

User Productivity

  • Ability for jobs to be batch (both blocking and non-blocking) or interactive (including automatic X11 forwarding)
  • Shows estimated job start times, so users can plan workflows and meet deadlines
  • Guaranteed "run once" semantics, so jobs with side effects truly run at most once
  • Job arrays provide a natural syntax for submitting and managing thousands of similar tasks as a single object (e.g., for design-of-experiment (DOE) workflows)
  • Job status with history (via "qstat -x") so users never lose track of jobs
  • Job dependencies so users can define complex workflows for automatic execution
  • Ability for Hybrid MPI+OpenMP jobs to specify exact requirements (e.g., 64-way MPI job with each MPI rank having 4 OpenMP threads, where MPI rank 0 has 64 GB of memory and all other ranks only need 1 GB of memory)
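The hybrid MPI+OpenMP example above maps naturally onto PBS's chunk-based select syntax: one chunk for rank 0 with large memory, plus identical chunks for the remaining ranks. A small helper can generate the resource string; the helper name and interface here are illustrative, not part of PBS, though ncpus, mem, mpiprocs, and ompthreads are standard chunk resources.

```python
def hybrid_select(ranks, threads, rank0_mem, other_mem):
    # Build a PBS -l select string for an MPI+OpenMP job where
    # rank 0 needs more memory than the remaining ranks.
    # (Helper is a sketch; it simply concatenates two chunk specs.)
    head = f"1:ncpus={threads}:mem={rank0_mem}:mpiprocs=1:ompthreads={threads}"
    rest = (f"{ranks - 1}:ncpus={threads}:mem={other_mem}"
            f":mpiprocs=1:ompthreads={threads}")
    return f"{head}+{rest}"

# 64-way MPI job, 4 OpenMP threads per rank, 64 GB for rank 0, 1 GB elsewhere:
spec = hybrid_select(64, 4, "64gb", "1gb")
print(f"qsub -l select={spec} job.sh")
```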

Administrator Productivity

  • Utilities for reporting job and system status, including debugging utilities to gather all distributed log data for “jobs of interest”
  • On-the-fly reconfiguration – add/remove nodes, change configuration settings, restart daemons, even define new custom resources – all without negatively impacting running jobs
  • Extensive accounting data for detailed troubleshooting and customized reporting
  • Ability to restrict user logins on nodes – the restrict_user capability prevents users from directly logging in and using nodes without going through PBS Pro
  • Ability to define “invisible” resources to capture internal scheduling policy without exposing details to all users
  • Plugin framework (“hooks”) for custom health checking, mitigation, and notification capabilities including off-lining flaky nodes, restarting scheduling cycles and requeuing jobs
  • Hook events cover entire job lifecycle, including at job submit, launch, host boot, task attach, even server periodic, and tools make authoring hooks easier (per-hook config files, offline debugging support, and improved logging for troubleshooting)
  • Plugin enhancement to add custom usage measurements, available to users on-the-fly via qstat and via accounting reports
  • Access to node information from within PBS run-time environment ("hooks" interface), including the ability to off-line nodes from within the runjob hook (to address "black holes")
  • qmgr command line editing/history
  • Supports standard admin facilities on Linux, e.g., RPM, systemd, Python (2.7)
  • Broad platform and 3rd party software support – runs most anywhere (Linux, Windows, MPIs, OpenMP, …) – for details see Supported Platforms
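In a real deployment, the plugin framework's hooks import the pbs module and act on pbs.event(); the standalone sketch below models only the decision logic a submission-filter (queuejob) hook might apply. The field names, the walltime requirement, and the core cap are assumptions for illustration, not a PBS default policy.

```python
def filter_submission(request, max_ncpus=1024):
    # Decision logic a queuejob hook might implement: reject jobs
    # that request no walltime, and cap oversized core requests.
    # Returns (accept, message); field names are illustrative.
    res = request.get("Resource_List", {})
    if "walltime" not in res:
        return False, "jobs must request a walltime"
    if res.get("ncpus", 1) > max_ncpus:
        return False, f"ncpus capped at {max_ncpus}"
    return True, "ok"

accepted, message = filter_submission(
    {"Resource_List": {"walltime": "02:00:00", "ncpus": 128}})
print(accepted, message)
```

A real hook would call the event's reject() with the message instead of returning it.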

Scalability, Security, Resilience

  • Million-core scalability – tested to 50,000+ nodes
  • Supports cgroups to eliminate resource contention – jobs run faster and don’t interfere with each other or the OS*
  • Embedded, multi-threaded database that delivers high volume, fast performance to 1000s of simultaneous users running jobs on 1,000,000s of cores
  • Rapid job submission (~100 qsubs per second per user) via automatic background processing
  • Fast, event-based scheduling ensures jobs start quickly; scheduling speed is enhanced via numerous optimizations, including fuzzy backfill.
  • Common Criteria EAL3+ security certified
  • Red Hat cross-domain security (multi-level security, MLS) capabilities via SELinux support, with Kerberos v5 available*
  • Support for authentication via MUNGE for additional security
  • Bulletproof reliability with no single point of failure architecture and automatic failover server configuration
  • Comprehensive health check framework monitors health check script behavior: either the checks run, or the node is marked down

Application Performance

  • Fast, reliable startup of huge MPI jobs – tested at tens of thousands of MPI ranks; minimizes delays caused by faulty nodes
  • Heterogeneous MPI allocations (e.g. 64GB mem for rank 0, but only 1 GB for others) reduce memory waste
  • Enhanced job placement options that allow MPI tasks to be scattered by vnode (e.g., for NUMA node or GPU) and allow hosts to be allocated exclusively (e.g., for jobs on Cray systems and dedicated time on HPE SGI UV systems)
  • Topology-aware scheduling (both inter- and intra-node) optimizes task placement for all HPC network topologies (InfiniBand, HPE, Cray, IBM, GigE, etc.), maximizing application performance while minimizing cross-job network contention
  • Node grouping to ensure jobs are allocated nodes with similar attributes, e.g. same CPU speed, to make the most efficient use of the hardware (so a single slow node doesn’t slow down a 100-way MPI job)
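Node grouping can be pictured as partitioning the node pool by a shared attribute and placing each job entirely within one partition, so uniform hardware serves the whole job. A minimal sketch with a hypothetical data model (not the PBS implementation):

```python
from collections import defaultdict

def group_nodes(nodes, key="cpu_speed_mhz"):
    # Partition nodes by a shared attribute (e.g., CPU speed) so a
    # job can be placed on uniform hardware and a single slow node
    # can't drag down a tightly coupled MPI job. Illustrative only.
    groups = defaultdict(list)
    for n in nodes:
        groups[n[key]].append(n["name"])
    return dict(groups)

nodes = [
    {"name": "n1", "cpu_speed_mhz": 2400},
    {"name": "n2", "cpu_speed_mhz": 3000},
    {"name": "n3", "cpu_speed_mhz": 2400},
]
print(group_nodes(nodes))   # fast and slow nodes land in separate groups
```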

Throughput, Utilization, Minimizing Waste

  • Fast throughput – supports 1,000,000+ jobs per day
  • Green Provisioning™ (power-aware scheduling) for automatic resource shutdown/restart to conserve energy (proven to lower one customer's energy use by up to 30%)
  • Backfill TopN scheduling to eliminate wasted cycles without delaying top priority work
  • Dynamic OS provisioning automatically changes the OS to match changing workload demands
  • Shrink-to-fit jobs boost utilization, especially before planned system outages – one supercomputing center recovered 800,000+ idle CPU hours in just a few months (plus, jobs actually run sooner)
  • Job arrays enable scheduling, executing, and managing effectively unlimited numbers of jobs for maximum throughput
  • Metascheduling (leveraging Altair's Peer Scheduling technology) for job scheduling and management across distinct clusters
  • Ability to aggregate heterogeneous nodes (or entire clusters) into “one big cluster” — eliminating silos and enabling additional sharing of resources to increase overall utilization and reduce waste
  • Ability for indirect resources to enable partitioned sharing (e.g., one scratch disk per rack)
  • Desktop cycle harvesting to run jobs using idle cycles on desktop systems, eliminating waste and boosting throughput; especially useful during nights and weekends
  • Load balancing to ensure machines running multiple jobs are not overloaded
  • Over-subscription to enable executing more jobs than cores (over-allocating the node) in order to gain additional throughput for jobs that do not use the entire CPU
  • Node sorting to prioritize allocation of hardware to jobs, allowing the best available resources to be used
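Node sorting, in the last bullet above, amounts to ordering candidate nodes by a configurable key before allocation so the best available hardware is used first. A toy sketch, with made-up node attributes rather than real PBS resource names:

```python
def sort_nodes(nodes):
    # Order candidate nodes so the "best" hardware is allocated first:
    # here, most free memory, then fewest running jobs.
    return sorted(nodes, key=lambda n: (-n["mem_free_gb"], n["njobs"]))

nodes = [
    {"name": "n1", "mem_free_gb": 32,  "njobs": 2},
    {"name": "n2", "mem_free_gb": 128, "njobs": 1},
    {"name": "n3", "mem_free_gb": 128, "njobs": 0},
]
print([n["name"] for n in sort_nodes(nodes)])
```

Any attribute can serve as the sort key; the scheduler simply consumes the ordered list when assigning jobs.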

Open Architecture and Extensibility

  • Standards: POSIX Batch standard, EAL3+ security, Web services, Python
  • Broad platform support – Linux and Windows
  • MPI integrations for all major MPI libraries, including full usage accounting for MPI jobs
  • Python is available everywhere, allowing one script to be used across all architectures
  • Availability of source code
  • Submission filtering hooks to change/augment capabilities on-site, on the fly
  • User-customizable "runjob" hooks to ensure allocation management limits are strictly enforced
  • Parallel prologue-like hooks that can run at job setup time and perform complex (and custom) node health checks
  • Parallel epilogue-like hooks that can run as the last action after a job finishes, just prior to the host being freed, and can perform final (custom) cleanup actions
  • Periodic node-level hooks that can check node health, measure and report resource availability and use, and even reboot/offline faulty nodes
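The periodic node-level hooks described above run health logic on each host; a real hook would use the pbs module to offline a vnode or requeue work, but the check itself can be sketched standalone. The metric names and thresholds below are assumptions, not PBS defaults.

```python
def node_health(metrics, max_load=8.0, min_disk_gb=5):
    # Health verdict a periodic hook might compute before deciding
    # to offline a node: flag excessive load or a nearly full
    # scratch disk. Metric names and thresholds are illustrative.
    problems = []
    if metrics["load_avg"] > max_load:
        problems.append("load too high")
    if metrics["scratch_free_gb"] < min_disk_gb:
        problems.append("scratch disk nearly full")
    return ("offline", problems) if problems else ("healthy", [])

state, reasons = node_health({"load_avg": 12.5, "scratch_free_gb": 2})
print(state, reasons)
```

On an "offline" verdict, a real hook would mark the vnode down so the scheduler stops placing jobs there.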

* Currently Limited Availability — ask Altair for details about implementing this capability at your site
