Industry

UPS for AI, GPU & HPC Infrastructure | Australia

AI training and inference workloads draw far more power than legacy IT. Typical NVIDIA H100 racks pull 30-60kW each, and dense AI training pods can exceed 100kW per rack. Power architecture must accommodate higher rack density, bigger inrush currents, and tighter voltage regulation tolerances than traditional data hall design.

UPS Services designs critical power for AI, GPU rendering farms, and HPC environments. We work with high-density modular UPS, lithium-ion battery systems for runtime efficiency, and high-current PDU architectures suitable for 50A+ per rack distribution.

The challenge is not just total kW: it is the load profile. AI training generates large step-load transients as jobs start and stop across hundreds of GPUs simultaneously. A UPS sized only on steady-state kW can underperform under these swings. Our designs account for transient response, power factor at load, and the thermal coupling between compute and cooling infrastructure.

We support the full stack from incoming supply assessment through UPS specification, high-current distribution, rack-level PDU selection, and ongoing maintenance. Whether you are building a new AI training cluster or retrofitting GPU capability into an existing facility, we design the power architecture to match.

Sector challenges

What makes AI, GPU & high-density computing different.

5 critical design considerations that shape UPS architecture for this sector.

01 / 05

GPU step loads

AI training generates large step-load transients as jobs start and stop across GPU clusters. The UPS topology must handle 0-100% load swings without dropping to bypass or flagging a fault. Modular UPS with fast DSP control loops performs best under these conditions.

02 / 05

Heat-rejection coupling

AI sites are often liquid-cooled. The UPS-protected load must include CDU (Coolant Distribution Unit) pumps, heat-rejection fans, and control systems, not just the compute. Losing cooling during a power event can cause thermal shutdown of the entire cluster within 30-120 seconds.
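As a rough illustration of why the cooling plant belongs in the UPS load schedule, the sketch below totals a protected-load budget. Every figure and load name here is a hypothetical placeholder, not a measurement from a real site; substitute nameplate or metered values.

```python
# Hypothetical load figures for illustration only; use measured or nameplate data.
protected_loads_kw = {
    "gpu_compute": 800.0,
    "cdu_pumps": 30.0,
    "heat_rejection_fans": 45.0,
    "cooling_controls": 2.0,
}


def protected_load_kw(loads: dict[str, float]) -> float:
    """Total load the UPS must carry: compute plus the cooling that keeps it running."""
    return sum(loads.values())


compute_only_kw = protected_loads_kw["gpu_compute"]
total_kw = protected_load_kw(protected_loads_kw)
# Share of the UPS load that would be missed by sizing on compute alone.
cooling_share = (total_kw - compute_only_kw) / total_kw

print(f"Compute only: {compute_only_kw:.0f} kW")
print(f"Compute + cooling + control: {total_kw:.0f} kW")
print(f"Cooling and control share of UPS load: {cooling_share:.1%}")
```

With these placeholder numbers the cooling and control loads are a small fraction of the total, which is the point: adding them to the protected bus costs little capacity compared with losing the cluster to thermal shutdown.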

03 / 05

Power factor characteristics

Modern GPU PSUs have near-unity PF (>0.99), so UPS sizing is close to kVA = kW. Older equipment with PF 0.7-0.8 requires a significantly larger UPS for the same kW. Mixed environments need careful per-rack PF assessment, not assumptions.
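The relationship behind this is kVA = kW / PF. A minimal sketch of a per-rack assessment follows; the rack names and loads are hypothetical, and summing kVA arithmetically is a deliberately conservative simplification (apparent power strictly adds as a vector).

```python
def required_kva(load_kw: float, power_factor: float) -> float:
    """Apparent power (kVA) needed to supply a real load (kW) at a given PF."""
    if not 0 < power_factor <= 1:
        raise ValueError("power factor must be in the range (0, 1]")
    return load_kw / power_factor


# Hypothetical mixed environment: (rack name, load in kW, measured PF).
racks = [
    ("gpu-rack-1", 40.0, 0.99),
    ("gpu-rack-2", 40.0, 0.99),
    ("legacy-rack", 8.0, 0.80),
]

for name, kw, pf in racks:
    print(f"{name}: {kw:.1f} kW at PF {pf:.2f} -> {required_kva(kw, pf):.1f} kVA")

# Conservative total: arithmetic sum of per-rack kVA.
total_kva = sum(required_kva(kw, pf) for _, kw, pf in racks)
print(f"Total: {total_kva:.1f} kVA")
```

The legacy rack needs 25% more kVA than its kW figure, while the GPU racks need about 1% more, which is why a single assumed PF across the hall gives a misleading total.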

04 / 05

Density demands different distribution

At 30-100kW per rack, traditional branch-circuit distribution runs out of breaker slots. Overhead busway, high-current whips, and 60A/100A rack PDUs replace standard 32A C19/C20 distribution.
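To see why these densities outgrow 32A circuits, the sketch below estimates line current for a balanced three-phase rack feed using I = P / (√3 × V × PF). The 400 V line-to-line voltage and 0.99 PF are assumptions for illustration; confirm the actual supply voltage, load balance, and any derating rules with the site's electrical designer.

```python
import math


def line_current_a(load_kw: float, line_voltage_v: float = 400.0,
                   power_factor: float = 0.99) -> float:
    """Per-phase line current (A) for a balanced three-phase load.

    Assumes a 400 V line-to-line supply and near-unity PF by default;
    both are illustrative assumptions, not site data.
    """
    return load_kw * 1000.0 / (math.sqrt(3) * line_voltage_v * power_factor)


for rack_kw in (15.0, 30.0, 60.0, 100.0):
    print(f"{rack_kw:5.0f} kW rack -> {line_current_a(rack_kw):6.1f} A per phase")
```

Under these assumptions a 30kW rack already draws more than 32A per phase on a single feed, and a 100kW rack draws well over 100A, so the load has to be split across multiple feeds or moved to higher-rated busway tap-offs.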

05 / 05

Runtime vs recovery strategy

AI training jobs can checkpoint and restart, so the UPS runtime need is the time to complete an orderly checkpoint (30-90 seconds), not the full generator handover time. This can significantly reduce battery sizing and cost for pure AI workloads.
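The difference between the two strategies can be approximated by comparing the energy the battery must deliver in each case. This sketch uses a hypothetical 500kW load and counts delivered energy only; real battery sizing also depends on discharge rate, inverter efficiency, ageing margin, and the manufacturer's minimum string configuration, so treat the result as a first-pass comparison rather than a battery specification.

```python
def delivered_energy_kwh(load_kw: float, runtime_s: float) -> float:
    """Energy the battery must deliver to hold a load for a given runtime."""
    return load_kw * runtime_s / 3600.0


LOAD_KW = 500.0  # hypothetical protected load

checkpoint_kwh = delivered_energy_kwh(LOAD_KW, runtime_s=90)       # orderly checkpoint
ride_through_kwh = delivered_energy_kwh(LOAD_KW, runtime_s=600)    # 10-minute handover

print(f"Checkpoint strategy (90 s): {checkpoint_kwh:.1f} kWh")
print(f"Ride-through strategy (600 s): {ride_through_kwh:.1f} kWh")
print(f"Energy ratio: {checkpoint_kwh / ride_through_kwh:.0%}")
```

Because short-duration batteries are usually limited by the power they can deliver rather than by stored energy, the installed battery rarely shrinks in direct proportion to the energy ratio shown here.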

Typical configurations

UPS patterns we deploy.

  • 01 High-density modular UPS (200-1000kW)
  • 02 Lithium-ion (high cycle, fast charge)
  • 03 Three-phase distribution to rack (busway or cable)
  • 04 60A/100A rack PDU
  • 05 Tight voltage regulation (±2%)
  • 06 N+1 modular redundancy
  • 07 Dual-corded rack PDU distribution
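For the N+1 modular redundancy item, the module count follows from the design load and the module rating: enough modules to carry the load (N), plus one spare. The sketch below assumes hypothetical load and module sizes; actual module ratings come from the manufacturer's datasheet.

```python
import math


def modules_n_plus_1(design_load_kw: float, module_kw: float) -> int:
    """Number of UPS power modules for N+1 redundancy.

    N is the minimum count that carries the design load; one extra module
    lets the system keep supporting that load with any single module out.
    """
    if design_load_kw <= 0 or module_kw <= 0:
        raise ValueError("load and module rating must be positive")
    n = math.ceil(design_load_kw / module_kw)
    return n + 1


# Hypothetical example: 450 kW design load on 50 kW modules.
count = modules_n_plus_1(450.0, 50.0)
installed_kw = count * 50.0
print(f"Modules required: {count} ({installed_kw:.0f} kW installed)")
```

Smaller modules make the spare cheaper as a fraction of installed capacity, at the cost of more modules to maintain.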

Equipment

Recommended for this sector.

Manufacturer-trained installation and service across all major UPS brands.

  • APC Galaxy VL (modular, high-density ready)
  • Eaton 93PM (fast transient response)
  • Vertiv Liebert APM2 (modular, scalable)
  • Vertiv Trinergy Cube (high-density)
  • Lithium-ion battery systems (all brands)
  • Raritan PX4 high-density PDU (60A)
  • Vertiv Geist rPDU (100A)

When it matters

Real-world scenarios.

What goes wrong without proper UPS, and how the right architecture prevents it.

Scenario 01

GPU cluster trips UPS to bypass

A 200-GPU training cluster starts a new job, drawing 800kW in under 2 seconds. The legacy UPS cannot slew fast enough, so it flags overload and transfers to bypass. Any subsequent mains event during bypass would drop the entire cluster. A modular UPS with sub-cycle transient response handles the step load within specification.
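A quick way to characterise a step like this before talking to a UPS vendor is to express it as a percentage of UPS rating and as a ramp rate. The sketch below uses the 800kW and 2 second figures from the scenario; the 1000kW UPS rating is a hypothetical value, and the step a given UPS can accept without transferring to bypass must come from its datasheet.

```python
def step_load_profile(step_kw: float, ramp_seconds: float,
                      ups_rated_kw: float) -> dict[str, float]:
    """Summarise a load step relative to a UPS rating."""
    return {
        "step_percent_of_rating": 100.0 * step_kw / ups_rated_kw,
        "ramp_kw_per_s": step_kw / ramp_seconds,
    }


# 800 kW in 2 s from the scenario; 1000 kW UPS rating is an assumption.
profile = step_load_profile(step_kw=800.0, ramp_seconds=2.0, ups_rated_kw=1000.0)

print(f"Step size: {profile['step_percent_of_rating']:.0f}% of UPS rating")
print(f"Ramp rate: {profile['ramp_kw_per_s']:.0f} kW/s")
```

Since the scenario states the job ramps in under 2 seconds, the computed ramp rate is a lower bound on what the UPS actually sees.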

Scenario 02

Cooling loss cascades to thermal shutdown

A power dip causes the CDU pumps to trip. The UPS protects the GPUs but not the cooling loop. Within 90 seconds, GPU junction temperatures exceed limits and the cluster thermally shuts down, losing 6 hours of training progress. Designing cooling infrastructure on the UPS-protected bus prevents this cascade.

Scenario 03

Mis-specified UPS from legacy kVA calculation

A facility sizes its UPS at 0.8 PF (a legacy IT assumption) for a new GPU cluster running at 0.99 PF. The UPS ends up about 20% oversized on kVA, yet the design still misses the higher inrush currents and step-load transients. The spend was sufficient, but the technical specification was wrong. Per-rack load profiling avoids this.

Frequently asked questions

4 questions answered.

Q01

Do AI workloads need different UPS topology?

Yes. AI workloads generate step-load transients (rapid 0-100% load swings) that exceed the slew rate of many legacy UPS systems. Online double-conversion with fast DSP control loops and modular architecture handles these transients best. Line-interactive and legacy transformer-based UPS are not suitable for high-density GPU environments.

Q02

How much UPS runtime do I need for an AI cluster?

It depends on your recovery strategy. If your training framework supports checkpointing (most do, including PyTorch and TensorFlow), you only need enough runtime to complete an orderly checkpoint and graceful shutdown: typically 30-90 seconds. If you need the cluster to ride through to generator power, 5-10 minutes is standard. The checkpoint strategy can reduce battery CapEx by 60-80% compared to full-runtime designs.

Q03

Should cooling be on the UPS?

For liquid-cooled AI clusters: absolutely. GPU thermal shutdown happens in 30-120 seconds without active cooling. If the UPS protects the compute but not the CDU pumps and heat-rejection equipment, you have not actually protected the workload. Design the UPS to cover the full thermal envelope: compute + cooling + control.

Q04

What power distribution works for 30kW+ racks?

Standard 32A C19/C20 distribution runs out of capacity above 15-20kW per rack. For 30kW+ racks, use overhead busway with high-current tap-offs, or direct cable whips to 60A/100A rack PDUs. Dual-corded PDU distribution provides redundancy. We design the distribution alongside the UPS to ensure the full power chain is rated for the density.

Specify AI, GPU & high-density computing

Quote returned within one business day. Australia-wide.