PBS Professional Features

Used by thousands of organizations worldwide, from Top500 supercomputers to single-site clusters to cloud environments, PBS Professional provides the power, flexibility, security, scalability and reliability users need to manage their complex HPC infrastructures.

Policy-Based and Resource-Based Scheduling

  • NEW in 13.0: Expanded scheduling priority formula with full math functions (e.g., sqrt(), ceil(), …), conditional expressions, and a threshold for job start eligibility
  • SLAs enforced via tunable priorities, fairshare, reservations, preemption, access control lists (ACLs) and backfilling
  • Tunable scheduling formula to express site-specific policies, including on-the-fly exceptions (a sketch follows this list)
  • Fairshare to ensure system resources are allocated proportionally, adjusted for recent usage and organizational priorities
  • NEW in 13.0: General fairshare formula enables accrual per queue, license sharing, time-of-day and power-use factors, and even combinations of these
  • User, group, and project limits to implement fine-grained policy adjustments
  • Advance reservations and standing reservations to guarantee resources for recurring needs
  • Preemption and checkpointing (suspend/checkpoint/requeue) allow users to immediately run high-priority work
  • NEW in 13.0: Fine-grained targeting for preemption, configurable at the queue level (admin controlled)
  • Ability to start work immediately (interrupting the scheduling cycle) for high priority jobs
  • Age-based scheduling that adjusts priority based on the length of time a job has been eligible and waiting in the queue
  • License scheduling via dynamic resources, which allocate (and share) software licenses served by FlexLM or other 3rd-party licensing mechanisms
  • Eligible time which provides equitable job prioritization even when some users "stuff the queue" with lots of jobs
  • GPU / co-processor scheduling prioritizes use and manages access to all types of accelerators (NVIDIA, AMD) as well as the Intel® Xeon Phi™ coprocessor
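
For illustration, here is a minimal sketch of tuning the priority formula, assuming the standard job_sort_formula server attribute and the formula terms shown (queue_priority, ncpus, eligible_time); check the terms and functions available against your PBS Professional release:

    # Minimal sketch: set a scheduling priority formula via qmgr from Python.
    # Priority grows with queue priority, with job size (sqrt dampens it), and
    # with the time a job has spent eligible in the queue (seconds).
    import subprocess

    formula = "5 * queue_priority + sqrt(ncpus) + eligible_time / 3600"

    subprocess.run(
        ["qmgr", "-c", 'set server job_sort_formula = "%s"' % formula],
        check=True,  # raise CalledProcessError if qmgr rejects the formula
    )

The 13.0 start-eligibility threshold is configured with a companion attribute; see the release notes for the exact name.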

User Productivity

  • NEW in 13.0: Long job and reservation names supported
  • Ability for jobs to be batch (both blocking and non-blocking) or interactive (including automatic X11 forwarding)
  • Estimated job start times feature, so users can plan workflows and meet deadlines
  • Guaranteed "run once" semantics, so jobs with side effects are truly run one time at most
  • Job arrays that provide a natural syntax for submitting and managing thousands of similar tasks as a single object (e.g., for design-of-experiment (DOE) workflows); a sketch follows this list
  • Job status with history (via "qstat -x") so users never lose track of jobs
  • Job dependencies so users can define complex workflows for automatic execution
  • Ability for hybrid MPI+OpenMP jobs to specify exact requirements (e.g., a 64-way MPI job where each MPI rank has 4 OpenMP threads, MPI rank 0 has 64 GB of memory, and all other ranks need only 1 GB)
  • Advance reservations and standing reservations to guarantee resources for recurring needs
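
As a rough illustration of job arrays plus dependencies, the sketch below submits a 1,000-task array and a post-processing job that runs only after every task succeeds; the script names (sweep.sh, collect.sh) are placeholders:

    # Minimal sketch: a DOE-style workflow built from a job array and a
    # dependent collection step.
    import subprocess

    # qsub prints the array's job identifier on stdout.
    array_id = subprocess.run(
        ["qsub", "-J", "1-1000", "sweep.sh"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()

    # The post-processing job stays held until the whole array finishes successfully.
    subprocess.run(
        ["qsub", "-W", "depend=afterok:%s" % array_id, "collect.sh"],
        check=True,
    )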

Administrator Productivity

  • NEW in 13.0: Custom resources can be created directly using qmgr, without the need to restart the server
  • Utilities for reporting job and system status, including debugging utilities to gather all distributed log data for “jobs of interest”
  • On-the-fly reconfiguration – add/remove nodes, change configuration settings, restart daemons – all without negatively impacting running jobs
  • Extensive accounting data for detailed troubleshooting and customized reporting
  • Ability to restrict user logins on nodes — restrict_user capability allows you to prevent users from directly logging in and using nodes (without going through PBS)
  • Ability to define “invisible” resources to capture internal scheduling policy without exposing details to individual users
  • Plugin framework (“hooks”) for custom health checking, mitigation, and notification capabilities, including off-lining flaky nodes, restarting scheduling cycles, and requeuing jobs (a sketch follows this list)
  • NEW in 13.0: New hook events at job launch, host boot, and task attach, plus easier hooks authoring (per-hook config files, offline debugging support, and improved logging for troubleshooting)
  • NEW in 13.0: Plugin enhancement to add custom usage measurements, available to users on-the-fly via qstat and via accounting reports
  • Access to node information from within PBS run-time environment ("hooks" interface), including the ability to off-line nodes from within the runjob hook (to address "black holes")
  • qmgr command line editing/history
  • Broad platform and 3rd party software support – runs most anywhere (Linux, UNIX, Windows, MPIs, OpenMP, …) – for details see Supported Platforms
    • NEW in 13.0: Expanded support: Intel MPI and MPICH2 on Windows; UNC paths for stdin, stdout, and file staging on Windows; SLES 12 and RHEL 7
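
As a rough sketch of the hooks interface for node health, the example below is an exechost_periodic hook that offlines its node when a site-specific check fails; the /tmp free-space test is only an illustration, and the pbs-module names used here should be verified against the Hooks Guide for your release:

    # Minimal periodic health-check hook sketch. The "pbs" module is provided
    # by the PBS hooks runtime, not installed separately.
    import os
    import pbs

    MIN_FREE_BYTES = 1024 ** 3  # require at least 1 GB free in /tmp

    e = pbs.event()
    stat = os.statvfs("/tmp")
    if stat.f_bavail * stat.f_frsize < MIN_FREE_BYTES:
        # Mark this node offline so the scheduler stops placing new jobs here.
        me = pbs.get_local_nodename()
        e.vnode_list[me].state = pbs.ND_OFFLINE
        e.vnode_list[me].comment = "health check: /tmp nearly full"
        pbs.logmsg(pbs.LOG_DEBUG, "offlining %s: /tmp nearly full" % me)

    e.accept()

The hook body is registered and imported with qmgr (create the hook, set its event to exechost_periodic, then import the script); consult the Hooks Guide for the exact commands.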


Scalability, Security, Resilience

  • NEW in 13.0: Million-core scalability – tested to 50,000+ nodes
  • NEW in 13.0: cgroups eliminate resource contention – jobs run faster and don’t interfere with each other or the OS*
  • Embedded, multi-threaded database that delivers high-volume, fast performance to thousands of simultaneous users running jobs on millions of cores
  • Rapid job submission (~100 qsubs per second per user) via automatic background processing
  • Common Criteria EAL3+ security certified
  • Red Hat cross-domain security (or "multi-level security" (MLS)) capabilities via SELinux support, with Kerberos v5 available – currently Limited Availability (LA); ask us for details
  • Bulletproof reliability with a no-single-point-of-failure architecture and automatic server failover configuration
  • NEW in 13.0: Comprehensive health check framework monitors your health check scripts – either the checks run or the node is marked down
  • High-availability reservations — advance and standing reservations automatically detect and replace failed nodes
  • Enhanced security option to disallow root/Admin jobs and hooks coming from the Server
  • Ability to offline or reboot the "current" node from within a MOM hook (scalable to tens of thousands of MOMs)



Application Performance

  • NEW in 13.0: Fast, reliable startup of huge MPI jobs – tested at tens of thousands of MPI ranks; minimizes delays caused by faulty nodes
  • Heterogeneous MPI allocations (e.g., 64 GB of memory for rank 0, but only 1 GB for the others) reduce memory waste; a sketch follows this list
  • Enhanced job placement options that allow MPI tasks to be scattered by vnode (e.g., for NUMA node or GPU) and allow hosts to be allocated exclusively (e.g., for jobs on Cray systems and dedicated time on SGI UV systems)
  • Topology-aware scheduling (both inter- and intra-node) that optimizes task placement for all HPC network topologies (InfiniBand, SGI, Cray, IBM, GigE, etc.), maximizing application performance while minimizing cross-job network contention
  • Node grouping to ensure jobs are allocated nodes with similar attributes, e.g. same CPU speed, to make the most efficient use of the hardware (so a single slow node doesn’t slow down a 100-way MPI job)
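
As a rough illustration of chunk-level requests and placement directives, the sketch below asks for a 64-chunk job whose first chunk (rank 0) gets 64 GB while the others get 1 GB, scattered one chunk per vnode with exclusive use; mpi_job.sh is a placeholder script:

    # Minimal sketch: heterogeneous chunks plus scatter/exclusive placement.
    import subprocess

    select = ("1:ncpus=4:mpiprocs=1:ompthreads=4:mem=64gb"
              "+63:ncpus=4:mpiprocs=1:ompthreads=4:mem=1gb")

    subprocess.run(
        ["qsub",
         "-l", "select=" + select,
         "-l", "place=scatter:excl",  # one chunk per vnode, vnodes not shared
         "mpi_job.sh"],
        check=True,
    )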


Throughput, Utilization, Minimizing Waste

  • NEW in 13.0: Fast throughput – supports 1,000,000+ jobs per day
  • Green Provisioning™ (power-aware scheduling) for automatic resource shutdown/restart to conserve energy (proven to lower one customer's energy use by up to 30%)
  • Backfill TopN scheduling to eliminate wasted cycles without delaying top priority work
  • Dynamic OS provisioning automatically changes the OS to match changing workload demands
  • Shrink-to-fit jobs boost utilization, especially before planned system outages – one supercomputing center recovered 800,000+ idle CPU hours in just a few months (plus, jobs actually run sooner); a sketch follows this list
  • Job arrays providing maximum throughput for scheduling, executing, and managing very large numbers of jobs
  • Metascheduling (leveraging Altair's Peer Scheduling technology) for job scheduling and management across distinct clusters
  • Ability to aggregate heterogeneous nodes (or entire clusters) into “one big cluster” — eliminating silos and enabling additional sharing of resources to increase overall utilization and reduce waste
  • Ability for indirect resources to enable partitioned sharing (e.g., one scratch disk per rack)
  • Desktop cycle harvesting to run jobs using idle cycles on desktop systems, eliminating waste and boosting throughput; especially useful during nights and weekends
  • Load balancing to ensure machines running multiple jobs are not overloaded
  • Over-subscription to run more jobs than cores (over-allocating the node), gaining additional throughput from jobs that do not use an entire CPU
  • Node sorting to prioritize allocation of hardware to jobs, allowing the best available resources to be used
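
As a rough illustration of shrink-to-fit, the sketch below submits a job that the scheduler can size anywhere between 2 and 24 hours to fit an available gap (for example, the window before a planned outage); sweep.sh is a placeholder script:

    # Minimal sketch: a shrink-to-fit job using min_walltime / max_walltime.
    import subprocess

    subprocess.run(
        ["qsub",
         "-l", "min_walltime=02:00:00",
         "-l", "max_walltime=24:00:00",
         "-l", "select=1:ncpus=16",
         "sweep.sh"],
        check=True,
    )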


Open Architecture and Extensibility

  • Standards: POSIX Batch standard, EAL3+ security, Web services, Python, OGSA BES, HPCBP
  • Broad platform support – Linux, UNIX, and Windows
    • NEW in 13.0: Expanded Windows support: Intel MPI and MPICH2; UNC paths for stdin, stdout, and file staging
    • NEW in 13.0: SLES 12 support
    • NEW in 13.0: RHEL 7 support
  • MPI integrations for all major MPI libraries, including full usage accounting for MPI jobs
  • Python available everywhere, allowing one script to be used across all architectures
  • Availability of source code
  • Submission filtering hooks to change/augment capabilities on-site, on the fly (a sketch follows this list)
  • User customizable “runjob” hooks to ensure allocation management limits are strictly enforced
  • Parallel prologue-like hooks that can run at job setup time and perform complex (and custom) node health checks
  • Parallel epilogue-like hooks that can run as the last action after a job finishes, just prior to the host being freed, and can perform final (custom) cleanup actions
  • Periodic node-level hooks that can check node health, measure and report resource availability and use, and even reboot/offline faulty nodes
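
As a rough sketch of a submission-filtering (queuejob) hook, the example below rejects jobs that omit a walltime and stamps a hypothetical default project on the rest; verify the attribute names against the Hooks Guide for your release:

    # Minimal queuejob hook sketch. The "pbs" module is provided by the PBS
    # hooks runtime.
    import pbs

    e = pbs.event()
    job = e.job

    if job.Resource_List["walltime"] is None:
        # The submitter sees this message from qsub and the job is not queued.
        e.reject("please request a walltime, e.g. qsub -l walltime=01:00:00 ...")

    if job.project is None:
        job.project = "default"  # hypothetical site-wide default project

    e.accept()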

 

* Currently Limited Availability — ask Altair for details about implementing this capability at your site
