Data Center Engineering

Every watt a server consumes becomes a watt of heat. There is no other place for the energy to go.

The job of a data center cooling system is to capture that heat at the silicon, move it through a series of fluid loops, and dump it into the atmosphere — all while keeping intake air at the rack within an envelope the IT equipment is rated for, and doing so at the lowest possible parasitic energy cost. Get the chain of heat exchanges right and your PUE sits near 1.2. Get it wrong and you burn another watt of facility energy for every watt of compute.
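The two PUE figures above follow directly from the definition: total facility power divided by IT power. A minimal sketch, assuming a hypothetical 1,000 kW IT load and illustrative overhead values that are not taken from any real facility:

```python
def pue(it_power_kw: float, overhead_power_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw


# Hypothetical 1,000 kW IT load with two different overhead figures.
good = pue(1000.0, 200.0)    # 200 kW of cooling and power-distribution overhead
bad = pue(1000.0, 1000.0)    # one watt of overhead for every watt of compute

print(f"efficient: {good:.2f}, inefficient: {bad:.2f}")
# prints "efficient: 1.20, inefficient: 2.00"
```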

This is the engineering problem in one sentence. The implementation has filled fifty years of conference proceedings, and the recent surge in AI rack densities has invalidated assumptions that held for two decades. What follows is how the system actually works, layer by layer, from the silicon junction to the cooling tower fan.

The Thermal Problem

A modern data hall must reject heat continuously, in the right place, at the right rate, and within a tight temperature band. ASHRAE TC 9.9 publishes the thermal guidelines almost every operator designs to: a recommended dry-bulb envelope of 18°C to 27°C at the IT inlet, with allowable extensions up to 32°C or higher depending on the equipment class. Stay inside that band and server fans run at normal speeds, electronic component life follows the manufacturer's MTBF curves, and you avoid thermal trips. Drift above it and the IT equipment compensates by spinning its own fans harder, which raises IT energy and with it the heat that energy becomes. Drift below it and the plant spends energy on cooling the equipment does not need, so overcooling is punished almost as much as undercooling.
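As a rough illustration of how a monitoring script might apply that envelope, the sketch below classifies an inlet reading. The function name and labels are hypothetical, the 32°C default is simply the figure quoted above, and the lower allowable limit (which also varies by equipment class) is deliberately not modeled:

```python
RECOMMENDED_C = (18.0, 27.0)  # recommended dry-bulb envelope quoted in the text


def classify_inlet(temp_c: float, allowable_max_c: float = 32.0) -> str:
    """Label an IT inlet temperature against the recommended envelope.

    allowable_max_c is an assumption: it depends on the equipment class,
    so pass the value from the equipment's own rating.
    """
    low, high = RECOMMENDED_C
    if low <= temp_c <= high:
        return "recommended"
    if temp_c < low:
        return "below recommended"   # likely overcooled
    if temp_c <= allowable_max_c:
        return "allowable"
    return "out of range"


for reading in (16.0, 22.0, 30.0, 36.0):
    print(reading, classify_inlet(reading))
```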

The thermal load itself is the sum of the IT power plus minor contributions from lighting and personnel. For a hall provisioned at 5 MW of IT, you are rejecting roughly 5 MW of heat continuously, 8,760 hours a year. That figure does not change with outdoor temperature; what changes is how much work the cooling plant must do to reject it. In January in Reykjavík you can do it with a fan and an open damper. In August in Phoenix you cannot.

Average rack densities reported in the Uptime Institute Global Data Center Survey have crept from roughly 5–8 kW a decade ago into the 10–30 kW band today, with AI training racks deploying at 50–150 kW. That density jump is the single most important variable in modern cooling design, because it determines whether air can carry the heat away at all.

Air Cooling and Why It Has a Ceiling

For most of the industry's history, cooling has meant moving cold air past the front of a server and exhausting hot air from the back. The standard implementation is the hot aisle / cold aisle arrangement: racks face each other across a cold aisle, server fans pull conditioned supply air from the front, and the hot exhaust dumps into an alternating hot aisle behind. Computer Room Air Handlers (CRAH units, fed by chilled water) or Computer Room Air Conditioners (CRAC units, with integral DX cooling) line the perimeter or sit between rows, pulling hot return air across coils and pushing supply air either under a raised floor or through overhead plenums.

The arrangement works because air has predictable thermal properties: a specific heat of roughly 1.005 kJ per kilogram per kelvin and a density of about 1.2 kg per cubic meter at sea level. To carry 10 kW of heat across a 12°C delta you need about 0.83 kg of air per second, or roughly 2,500 cubic meters per hour, which is manageable. To carry 50 kW you need five times that volume, which means either much higher face velocities at the rack (and intolerable noise, recirculation, and pressure losses) or a much larger temperature delta (which requires colder supply air and erodes economizer hours). Past roughly 25–30 kW per rack, the physics of pushing enough air through a 600 mm-wide cabinet stops being practical. Hyperscale and HPC operators have known this for years; the rest of the industry is now meeting the same wall.
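The airflow requirement comes from the sensible heat relation Q = m × cp × ΔT. The sketch below solves it for mass flow and converts to volume; the air density of 1.2 kg per cubic meter is an assumption (dry air near sea level at about 20°C), and the function name is illustrative only:

```python
CP_AIR_KJ_PER_KG_K = 1.005     # specific heat of air, as quoted in the text
AIR_DENSITY_KG_PER_M3 = 1.2    # assumed: dry air near sea level, about 20 °C


def airflow_for_heat(heat_kw: float, delta_t_k: float) -> tuple[float, float]:
    """Return (mass flow in kg/s, volume flow in m^3/h) needed to carry heat_kw."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_kw / (CP_AIR_KJ_PER_KG_K * delta_t_k)
    volume_flow_m3_h = mass_flow_kg_s / AIR_DENSITY_KG_PER_M3 * 3600.0
    return mass_flow_kg_s, volume_flow_m3_h


for rack_kw in (10.0, 50.0):
    mass, volume = airflow_for_heat(rack_kw, delta_t_k=12.0)
    print(f"{rack_kw:.0f} kW rack: {mass:.2f} kg/s, {volume:,.0f} m^3/h")
```

Because the relation is linear, doubling the rack power doubles the air volume unless the temperature rise across the servers is allowed to grow.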

Containment is the single highest-ROI intervention in any air-cooled hall. Without it, hot exhaust recirculates around the top of the rack and mixes with the cold supply, forcing the CRAH to deliver lower setpoints to compensate. With hot-aisle or cold-aisle containment — physical barriers above the racks and at the ends of the aisles — supply and return air streams stay separated, return air temperatures rise (which improves chiller and economizer performance), and supply temperatures can rise too without violating IT inlet specs. Most retrofits of older halls pay back containment in under two years on cooling energy alone.

Airflow management at the rack level matters as much as the architecture. Blanking panels in empty U positions, brush grommets at cable cutouts, and pressure-balanced perforated tiles all prevent bypass and recirculation. A well-managed air-cooled hall with containment can reach a PUE in the 1.3–1.4 range in moderate climates. A poorly managed one in the same building, same climate, will run 1.6 or worse.

The Chilled Water Plant

Behind the CRAH units sits the plant, the system that actually rejects heat to the outside world. In its most common form, it consists of three stages in series: a chilled water loop, the chiller itself, and a condenser water loop.

The chilled water loop circulates between the CRAH coils in the white space and the evaporators of the chillers in the central plant. Typical supply temperatures are 7°C to 18°C, with the higher end increasingly common as operators adopt warmer ASHRAE envelopes. Variable primary pumping has largely replaced primary-secondary arrangements in new builds because it reduces pumping energy at part load.

The chiller lifts heat from the chilled water loop to a condenser water loop. The dominant technology is the water-cooled centrifugal chiller, with magnetic-bearing oil-free machines (Turbocor and similar) increasingly specified for their part-load efficiency. Efficiency at full load runs around 0.5–0.55 kW per ton; at the part-load conditions a data center actually sees most of the year, modern variable-speed chillers can drop below 0.35 kW per ton. Air-cooled chillers replace the condenser water loop with finned coils and fans, trading efficiency for the elimination of cooling towers and the water they consume.
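Chiller efficiency in kW per ton converts to a coefficient of performance (COP) through the size of a refrigeration ton, which is 12,000 BTU/h or about 3.517 kW of heat. A short sketch of that conversion, applied to a hypothetical 5 MW cooling load; the function names are illustrative:

```python
KW_THERMAL_PER_TON = 3.517  # one refrigeration ton = 12,000 BTU/h, about 3.517 kW


def cop_from_kw_per_ton(kw_per_ton: float) -> float:
    """Heat moved per unit of electrical input (dimensionless)."""
    return KW_THERMAL_PER_TON / kw_per_ton


def chiller_power_kw(cooling_load_kw: float, kw_per_ton: float) -> float:
    """Electrical draw of the chiller for a given thermal load."""
    tons = cooling_load_kw / KW_THERMAL_PER_TON
    return tons * kw_per_ton


for label, eff in (("full load", 0.55), ("part load", 0.35)):
    print(
        f"{label}: {eff} kW/ton -> COP {cop_from_kw_per_ton(eff):.1f}, "
        f"{chiller_power_kw(5000.0, eff):.0f} kW for a 5 MW load"
    )
```

Lower kW per ton means a higher COP, so the part-load figure corresponds to a markedly smaller compressor draw for the same heat.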

The condenser water loop carries heat from the chiller to a cooling tower, where it is rejected to atmosphere through evaporation. Cooling towers work against the wet-bulb temperature rather than the dry-bulb, which is why they outperform dry coolers most clearly in hot, dry climates: the wet-bulb is never higher than the dry-bulb, and in arid regions it is often 10°C or more below it. In humid climates the gap narrows and so does the advantage. The tradeoff is water consumption: a 5 MW data hall rejecting its heat through cooling towers year-round can evaporate on the order of tens of millions of liters of water per year, which is the central reason WUE (Water Usage Effectiveness) is now reported alongside PUE in serious sustainability disclosures.
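A back-of-the-envelope estimate of tower evaporation divides the rejected heat by the latent heat of vaporization of water. The sketch below assumes roughly 2,450 kJ/kg for the latent heat, treats all heat as rejected by evaporation unless evaporative_fraction says otherwise, and ignores compressor heat, blowdown, drift, and economizer hours, so it is an order-of-magnitude figure only:

```python
LATENT_HEAT_KJ_PER_KG = 2450.0   # assumed latent heat of vaporization of water
SECONDS_PER_YEAR = 8760 * 3600


def annual_evaporation_liters(heat_kw: float, evaporative_fraction: float = 1.0) -> float:
    """Estimate liters of water evaporated per year (1 kg of water taken as 1 L)."""
    kg_per_second = heat_kw * evaporative_fraction / LATENT_HEAT_KJ_PER_KG
    return kg_per_second * SECONDS_PER_YEAR


def wue_l_per_kwh(annual_liters: float, it_power_kw: float) -> float:
    """Water Usage Effectiveness: liters of water per kWh of IT energy."""
    return annual_liters / (it_power_kw * 8760)


liters = annual_evaporation_liters(5000.0)
print(f"{liters / 1e6:.0f} million liters per year, "
      f"WUE about {wue_l_per_kwh(liters, 5000.0):.2f} L/kWh")
```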

In dry climates or where water is constrained, dry coolers and adiabatic coolers substitute for cooling towers. Dry coolers reject heat through finned-tube heat exchangers with fans, sized against dry-bulb temperature; adiabatic coolers add a pre-cooling spray that drops the entering air temperature toward the wet-bulb during the hottest hours. Both eliminate or reduce evaporative water consumption at the cost of higher fan energy and larger physical footprint.

Economization: The Free Cooling Window

Whenever outdoor conditions are cold enough to reject heat without running a vapor-compression cycle, the chiller can be bypassed or unloaded. This is economization, and it is where modern facilities earn most of their PUE advantage.

Air-side economization brings outdoor air directly into the data hall whenever the outdoor dry-bulb is below the supply setpoint, exhausting return air rather than recirculating it. Filtration becomes more demanding (MERV 13 or higher) and humidity must be managed, but in temperate climates an air-side economizer can run thousands of hours per year on fan energy alone. Facebook (now Meta) made air-side economization the basis of their Open Compute designs in Prineville, Oregon, and the approach is now standard in Northern European hyperscale deployments.

Water-side economization is the more common retrofit option because it preserves the existing chilled water distribution. A plate-and-frame heat exchanger is plumbed alongside the chiller; whenever the condenser water (or a dedicated dry cooler loop) is cold enough to chill the supply water directly, the chiller compressors shut off and the heat exchanger does the work. Partial economization, where the heat exchanger sits in series ahead of the chiller and pre-cools chilled water returning to a still-running machine, extends the useful range further. A well-designed water-side economizer in a moderate climate can deliver 4,000 to 6,000 free-cooling hours annually.
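The choice between full economization, partial economization, and mechanical cooling can be sketched as a comparison of temperatures. The function below is a simplified illustration, not a real plant control sequence: the tower and heat exchanger approach temperatures are assumed values, and real controllers add deadbands and minimum run times to avoid short-cycling:

```python
def economizer_mode(
    wet_bulb_c: float,
    chw_supply_setpoint_c: float,
    chw_return_c: float,
    tower_approach_c: float = 4.0,   # assumed: tower water leaves 4 K above wet-bulb
    hx_approach_c: float = 1.5,      # assumed: plate heat exchanger approach
) -> str:
    """Pick an operating mode from outdoor wet-bulb and loop temperatures."""
    # Coldest chilled water the heat exchanger could produce right now.
    achievable_c = wet_bulb_c + tower_approach_c + hx_approach_c
    if achievable_c <= chw_supply_setpoint_c:
        return "full"        # chillers off, heat exchanger carries the load
    if achievable_c < chw_return_c:
        return "partial"     # heat exchanger pre-cools return water
    return "mechanical"      # no useful free cooling available


for wet_bulb in (5.0, 12.0, 20.0):
    print(wet_bulb, economizer_mode(wet_bulb, 14.0, 20.0))
```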

The economizer hour count is one of the most important early decisions in site selection. The cooling system on paper looks the same in Dallas and in Dublin; in operation, the Dublin facility spends most of the year in economization and the Dallas facility spends most of the year compressing.

Liquid Cooling

Once rack densities cross roughly 30 kW, air-handling alone stops working and the heat exchange has to move closer to the silicon. Three architectures dominate.

Rear-door heat exchangers (RDHx) mount a chilled-water coil on the back of the cabinet, in the hot exhaust stream. Server fans push their own hot air across the coil before it returns to the room, and the air leaving the rear door is at or near room temperature. The technology is essentially an in-rack CRAH and integrates cleanly with an existing chilled water plant. RDHx handles 30–50 kW per rack comfortably and represents the lowest-disruption entry to liquid cooling, since the servers themselves remain conventional air-cooled equipment.

Direct-to-chip liquid cooling (DLC) plumbs a coolant — usually treated water or a water-glycol mix — directly to cold plates mounted on the CPUs, GPUs, and other high-power components inside the server. A coolant distribution unit (CDU) in the row or at the end of the aisle isolates the technology cooling system loop from the facility chilled water loop. DLC removes 70–80% of the rack's heat through the liquid path; the remainder, from memory, voltage regulators, and other components, still goes to air. The advantage is enormous: facility water can run at 30–45°C supply temperatures, which keeps chillers off for most of the year and makes dry coolers viable in climates where they could not support an air-cooled load. NVIDIA, AMD, Dell, HPE, and Lenovo now ship liquid-cooled SKUs as standard options on their AI-class hardware, and direct-to-chip is on track to be the dominant cooling architecture for new AI deployments by 2027.
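To see what the liquid and air paths each have to carry, the sketch below splits a hypothetical 100 kW rack using a 75% liquid capture fraction (the midpoint of the range above) and sizes the coolant flow from Q = m × cp × ΔT. It assumes plain water at about 4.18 kJ/kg·K and a 10 K rise across the cold plates; a water-glycol mix has a lower specific heat and would need more flow:

```python
CP_WATER_KJ_PER_KG_K = 4.18  # assumed: plain water, not a glycol mix


def split_rack_heat(rack_kw: float, liquid_fraction: float) -> tuple[float, float]:
    """Return (heat to liquid, heat to air) in kW."""
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must be between 0 and 1")
    liquid_kw = rack_kw * liquid_fraction
    return liquid_kw, rack_kw - liquid_kw


def coolant_flow_kg_s(heat_kw: float, delta_t_k: float) -> float:
    """Mass flow of water needed to absorb heat_kw with a delta_t_k rise."""
    return heat_kw / (CP_WATER_KJ_PER_KG_K * delta_t_k)


liquid_kw, air_kw = split_rack_heat(100.0, 0.75)
flow = coolant_flow_kg_s(liquid_kw, delta_t_k=10.0)
print(f"liquid: {liquid_kw:.0f} kW, air: {air_kw:.0f} kW, "
      f"coolant flow: {flow:.2f} kg/s (about {flow * 60:.0f} L/min)")
```

Even at this capture fraction the residual air load is comparable to a full conventional rack, which is why DLC halls still need air handling.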

Immersion cooling submerges the entire server in a dielectric fluid, either a mineral oil or a synthetic fluorocarbon. Single-phase immersion circulates the fluid through an external heat exchanger; two-phase immersion uses a fluid that boils at the chip surface, with the vapor condensing on coils above the bath. Heat transfer is excellent, fans are eliminated entirely (further reducing IT energy), and densities exceeding 100 kW per tank are achievable. The trade-offs are real: serviceability is harder, supply chains for engineered dielectrics are limited, and PFAS-related regulatory pressure has pushed some operators away from two-phase fluids. Immersion remains a strong fit for HPC and edge applications, less so for mainstream colocation.

Controls, Set Points, and the Things That Quietly Degrade Performance

A cooling system that was efficient at commissioning is rarely still efficient five years later, and the reason is almost always controls drift, not equipment failure. The set points that matter most are supply air temperature (drives chiller and economizer load), chilled water supply temperature (drives chiller efficiency and economizer hours), and condenser water temperature (drives cooling tower fan energy versus chiller energy).

The classic mistake is to over-tune for safety margin. Lowering supply air from 24°C to 18°C "to be safe" eliminates thousands of economizer hours, depresses chiller COP, and protects no IT equipment that was not already comfortable. Raising chilled water supply from 7°C to 14°C unlocks dramatic free-cooling gains in any temperate climate and costs nothing in IT performance if airflow is well managed. Every facility should periodically test how far supply temperatures can rise before the IT envelope is threatened, then operate as close to that limit as the variability of the load allows.
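The effect of a chilled water set point on free-cooling hours can be explored by counting the hours in a year when the outdoor wet-bulb, plus the combined approach of tower and heat exchanger, sits at or below the set point. The sketch below uses a synthetic sinusoidal wet-bulb profile purely for illustration; the climate parameters and the 5.5 K approach are assumptions, and a real study would use hourly weather data for the site:

```python
import math


def synthetic_wet_bulb_year(mean_c=9.0, seasonal_amp_c=8.0, daily_amp_c=3.0):
    """Generate 8,760 hourly wet-bulb values from two cosine cycles (illustrative)."""
    temps = []
    for hour in range(8760):
        seasonal = -seasonal_amp_c * math.cos(2 * math.pi * hour / 8760)  # coldest at hour 0
        daily = -daily_amp_c * math.cos(2 * math.pi * (hour % 24) / 24)   # coldest at midnight
        temps.append(mean_c + seasonal + daily)
    return temps


def free_cooling_hours(wet_bulbs, chw_supply_c, total_approach_c=5.5):
    """Count hours when full water-side economization could meet the set point."""
    return sum(1 for wb in wet_bulbs if wb + total_approach_c <= chw_supply_c)


year = synthetic_wet_bulb_year()
for setpoint in (7.0, 14.0):
    print(f"CHW supply {setpoint:.0f} °C: {free_cooling_hours(year, setpoint)} hours")
```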

Continuous commissioning — measuring actual versus design performance on the cooling plant, the air handlers, and the airflow paths through the hall — catches the slow degradation that nobody notices day to day. Filters fouling, dampers drifting, economizer valves not fully isolating, sensor calibration drift: each one costs 1–3% in cooling energy and they accumulate.
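How those small penalties add up can be sketched by compounding them. The fault list and percentages below are hypothetical values drawn from the 1–3% range above, and treating the penalties as multiplicative is itself an assumption; in a real plant the interactions between faults are not this simple:

```python
import math

# Hypothetical penalties, each a fractional increase in cooling energy.
faults = {
    "fouled filters": 0.01,
    "drifting dampers": 0.02,
    "leaking economizer valve": 0.03,
    "sensor calibration drift": 0.02,
}


def compounded_penalty(penalties) -> float:
    """Combine fractional penalties multiplicatively and return the total increase."""
    return math.prod(1.0 + p for p in penalties) - 1.0


total = compounded_penalty(faults.values())
print(f"combined cooling energy penalty: {total:.1%}")
# prints "combined cooling energy penalty: 8.2%"
```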

Where the Industry Is Going

The shift that defines the current decade is the move from air to hybrid to predominantly liquid cooling for new high-density builds. Rack densities are not coming back down. Training GPUs of the NVIDIA H100 generation draw about 700 W each, so the GPUs alone in an eight-GPU server account for roughly 5.6 kW, and the complete server draws considerably more once CPUs, memory, networking, and fans are counted. Rack-scale systems that pack dozens of GPUs into a single cabinet land well past 100 kW. Air cannot remove that much heat from a single cabinet at any reasonable noise or velocity. Direct-to-chip on a warm-water loop is becoming the default, with rear-door heat exchangers serving the lower-density tail and air handling demoted to the residual 20–30% of rack heat that liquid does not capture.

What does not change is the underlying engineering principle: every degree you can raise the temperature of the working fluid before it leaves the facility is a degree of free cooling you can exploit and a percentage point of chiller efficiency you can recover. The architecture changes. The physics does not.

A modern cooling system is judged on three numbers — PUE, WUE, and the climate-adjusted economizer hours it can deliver. The engineering job is to design and operate the heat-rejection chain so that all three sit at the limit of what the site and the workload allow. Everything in this article is in service of that one objective.

References: ASHRAE TC 9.9 Thermal Guidelines for Data Processing Environments; The Green Grid PUE methodology; Uptime Institute Global Data Center Survey 2025; ISO/IEC 30134 series.

How Data Center Cooling Systems Work

Paul Owiredu
