LIQUID COOLING COMMISSIONING FOR AI DATA CENTERS: PROVING CDU, CONTROLS AND IST READINESS BEFORE HANDOVER
- Rafael Bagvanov

- May 28

Important Technical Qualification
As of May 2026, liquid cooling for high-density and AI data center environments is developing rapidly, but there is not yet one universally adopted, end-to-end industry standard covering the manufacture, installation, pre-commissioning, functional testing and integrated testing of all liquid cooling architectures. Guidance documents, manufacturer requirements, owner specifications and project-specific sequences of operation do exist and must be followed where applicable. The considerations in this article reflect our current technical understanding, commissioning experience and interpretation of developing industry practice; they should not replace approved design documents, OEM requirements, safety procedures or project-specific acceptance criteria.
INTRODUCTION
Artificial intelligence is changing the physical requirements of data center infrastructure. AI workloads are driving higher rack densities, tighter thermal margins and increasing dependency on advanced cooling architectures. As a result, many new AI-ready facilities are evaluating or deploying direct liquid cooling systems supported by Coolant Distribution Units (CDUs).
This shift changes the commissioning challenge.
In a conventional facility, commissioning may focus heavily on airside performance, chilled-water systems, redundancy sequences, alarms and electrical transitions. In a liquid-cooled AI data center, the commissioning boundary extends closer to the IT equipment itself. Coolant flow, differential pressure, coolant quality, leak detection, CDU controls, rack-level interfaces and BMS visibility all become part of the operational readiness question.
The industry is responding to this change. Uptime Institute published its AI in Practice Paper: Level 4 and 5 Commissioning on 14 May 2026, focusing attention on the integration of new AI infrastructure technologies into a structured quality-assurance process before owner handover. The Open Compute Project (OCP) has also published guidance addressing pre-commission preparation of Technology Cooling System (TCS) row manifolds in liquid-cooled data centers. Separately, current industry analysis continues to identify electrical capacity and heat rejection as practical constraints on AI data center expansion.
For owners, developers and project teams, the conclusion is straightforward: liquid cooling capability cannot simply be installed and assumed ready. It must be inspected, tested, trended, documented and demonstrated before critical IT loads depend on it.
For a broader overview of startup, functional testing and integrated systems testing, see our previous article on the data center commissioning process.
WHY LIQUID-COOLED AI DATA CENTERS REQUIRE A DIFFERENT COMMISSIONING APPROACH
Liquid cooling introduces a different risk profile from traditional air-cooled environments. A direct-to-chip cooling architecture may include:
Facility water systems and primary cooling equipment.
CDU heat exchangers, pumps, controls and redundant components.
Secondary coolant loops and TCS pipework.
Row or rack manifolds, hoses and quick disconnects.
Rack-level sensors and interfaces to liquid-cooled IT equipment.
Leak detection, safety interfaces and supervisory controls.
BMS/DCIM monitoring, alarms, trend logs and operator graphics.
Each component may operate correctly in isolation while the integrated system still fails to provide the required operational outcome.
A CDU, for example, may start successfully and circulate coolant locally. This alone does not demonstrate that:
The required flow is available at each critical circuit or manifold.
Differential pressure remains within approved limits during changing demand.
Temperature control remains stable under the defined test conditions.
A standby pump or redundant component responds correctly following failure.
Leak detection identifies the correct affected zone and initiates the approved response.
Alarms are intelligible and actionable at the BMS level.
Trend data is sufficient for investigation, reporting and future operations.
The system can be safely returned to normal service after a fault or transition.
Cooling remains controlled during an electrical, controls or upstream plant event.
Liquid cooling commissioning must therefore be treated as a systems-integration and operational-readiness activity, not simply as mechanical equipment startup.

TCS LOOP PREPARATION: CLEANLINESS, FLUID QUALITY AND DOCUMENTED READINESS
The condition of the Technology Cooling System loop is fundamental to reliable operation. Liquid-cooled IT equipment may contain narrow flow passages and sensitive interfaces; construction debris, corrosion products, residual chemicals or improperly prepared fluid can impair performance and create long-term reliability risk.
Before the CDU and downstream liquid-cooled equipment are placed into operational service, the project team should establish and document the required pre-commissioning preparation for the connected cooling loop. Depending on the system design and approved requirements, this may include:
Inspection of installed pipework, manifolds, valves, fittings and temporary works.
Hydrostatic or pressure-testing records and acceptance criteria.
Flushing methodology, flushing velocities or other approved cleanliness procedures.
Filtration arrangements and filter inspection or replacement records.
Water or coolant quality requirements, sampling results and treatment records.
Evidence of cleanliness before connection to sensitive equipment.
Confirmation that temporary strainers, bypasses or flushing connections have been correctly managed.
Accurate as-built documentation, equipment tags and valve-position records.
The commissioning team does not replace the installing contractor, specialist water-treatment provider or OEM. However, commissioning should verify that the evidence required to support system readiness exists, is consistent with the approved procedure and is complete before dependent testing begins.
CDU COMMISSIONING: WHAT MUST BE PROVEN BEFORE HANDOVER
The CDU is the operational interface between facility cooling infrastructure and the liquid-cooled IT environment. Its commissioning must demonstrate more than basic start-up. The test programme should establish thermal performance, hydraulic stability, controls functionality, redundancy, alarm behaviour and recovery from abnormal conditions.
Installation and Pre-Functional Verification
Before functional testing, the commissioning team should verify that the CDU installation is consistent with approved drawings, submittals, manufacturer documentation and project requirements. Verification should cover, as applicable:
Equipment identification, model, duty and approved location.
Pipework routing, flow direction and correct primary/secondary connections.
Isolation valves, control valves, bypass arrangements, strainers and drains.
Pressure, differential-pressure, flow and temperature sensors.
Pump installation, power supplies, local isolators and variable-speed drives.
Control panels, network connections and communication cabling.
Leak detection interfaces and safety-related interlocks.
Physical access for operation, filter replacement, service and emergency response.
Labels, valve charts, operating instructions and relevant as-built information.
The team should also confirm the existence of approved records for system cleaning, flushing, pressure testing and coolant preparation. A correctly manufactured and installed CDU cannot perform reliably if the connected secondary loop has not been properly prepared.
Flow and Differential Pressure Validation
Liquid cooling performance depends on controlled delivery of coolant to the connected load. The commissioning procedure should verify that flow and differential pressure meet the approved project criteria under the applicable testing conditions.
Testing should address:
Total available secondary-loop flow at the CDU.
Flow distribution or confirmation at identified critical branches, manifolds or representative endpoints, where testable.
Differential pressure at approved reference or critical locations.
Response to pump-speed modulation, control-valve movement or changing simulated demand.
Stability of flow and pressure during duty/standby pump changeover.
Differential-pressure monitoring across filters or strainers, including alarm thresholds where provided.
Accuracy and consistency of locally displayed and BMS-displayed hydraulic values.
Incorrect flow distribution may not initially present as total system failure. Instead, it can create local thermal risk, unstable control behaviour or uncertainty when diagnosing future operating problems.
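As a minimal sketch of how recorded hydraulic readings might be compared against approved criteria, the Python below flags any test location whose flow or differential pressure falls outside its limits. The location tags, units and limit values are hypothetical placeholders; real criteria come from the approved design documents and OEM requirements.

```python
from dataclasses import dataclass


@dataclass
class HydraulicCriteria:
    min_flow_lpm: float   # minimum acceptable flow, litres per minute
    min_dp_kpa: float     # lower differential-pressure limit, kPa
    max_dp_kpa: float     # upper differential-pressure limit, kPa


def check_hydraulics(readings, criteria):
    """Return (location, reason) pairs for readings outside their criteria.

    readings: dict of location -> (flow_lpm, dp_kpa)
    criteria: dict of location -> HydraulicCriteria
    """
    failures = []
    for location, (flow_lpm, dp_kpa) in readings.items():
        limits = criteria.get(location)
        if limits is None:
            # A reading without approved limits cannot be accepted or rejected.
            failures.append((location, "no acceptance criteria defined"))
            continue
        if flow_lpm < limits.min_flow_lpm:
            failures.append((location, "flow below minimum"))
        if not limits.min_dp_kpa <= dp_kpa <= limits.max_dp_kpa:
            failures.append((location, "differential pressure out of range"))
    return failures


# Hypothetical tags and values, for illustration only.
criteria = {
    "ROW-A-MANIFOLD": HydraulicCriteria(120.0, 80.0, 150.0),
    "ROW-B-MANIFOLD": HydraulicCriteria(120.0, 80.0, 150.0),
}
readings = {
    "ROW-A-MANIFOLD": (131.0, 110.0),
    "ROW-B-MANIFOLD": (96.0, 72.0),
}
for location, reason in check_hydraulics(readings, criteria):
    print(f"{location}: {reason}")
```

Evaluating each location separately, rather than only the CDU total, reflects the point above: a system can meet its total flow target while one branch is starved.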
Supply and Return Temperature Performance
The CDU must maintain coolant temperatures in accordance with approved design intent, IT equipment requirements and the sequence of operations. Functional testing should verify:
Supply temperature control and stability.
Return temperature monitoring and plausibility.
Control response during applicable changing-load or simulated-load conditions.
High-temperature and low-temperature alarm functions.
Accuracy of temperature values at local controllers and through the BMS.
Interaction between thermal-control logic, pumps and valves.
Recovery and stability after an abnormal-temperature condition is cleared.
The objective is not merely to record a temperature sensor value; it is to demonstrate that the thermal control loop operates predictably and that abnormal conditions become visible to the operations team.
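One way to turn "operates predictably" into a measurable check is to take the supply-temperature trend recorded during a test and find when the loop settled inside a tolerance band around setpoint. The sketch below assumes a simple list of time/temperature samples; the setpoint, band and sample values are hypothetical and not drawn from any project criteria.

```python
def settling_time_s(samples, setpoint_c, band_c):
    """Return the time from which the temperature stays inside the band.

    samples: list of (time_s, temp_c) tuples sorted by time.
    Returns None if the final sample is still outside the band.
    """
    settled_at = None
    for time_s, temp_c in samples:
        if abs(temp_c - setpoint_c) <= band_c:
            # Keep the first in-band time of the current in-band run.
            if settled_at is None:
                settled_at = time_s
        else:
            # Any excursion restarts the search.
            settled_at = None
    return settled_at


# Hypothetical trend: a disturbance at t=10 s, then recovery.
trend = [(0, 25.0), (10, 27.5), (20, 26.0), (30, 25.4), (40, 25.1)]
print(settling_time_s(trend, setpoint_c=25.0, band_c=0.5))
```

The result can only be as precise as the trend sampling interval, which is one reason trend resolution belongs in the acceptance discussion.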
Pump Redundancy and Failure Response
Cooling interruption in a high-density environment can rapidly become an operational risk. Where duty/standby pumps or redundant CDU components form part of the design, commissioning should prove the required response under controlled and approved conditions.
Typical testing may include:
Duty pump operation under the applicable test condition.
Standby pump availability and ready status.
Controlled simulation of duty-pump shutdown, trip or loss of availability.
Automatic transfer or changeover behaviour.
Alarm generation locally and through the BMS.
Confirmation of flow and differential-pressure recovery.
Verification of equipment status after fault reset and return to normal mode.
Review of any transition effects on temperature-control stability.
A successful failover test is not only proof that a standby pump starts. It should demonstrate that the system maintains or recovers acceptable performance, clearly annunciates the event and returns to a defined normal operating state.
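The sketch below illustrates how a flow trend captured during a simulated duty-pump trip might be summarised: the lowest flow seen after the trip, the time taken to recover above a minimum, and a pass/fail against an allowed recovery time. The threshold values and trend samples are hypothetical assumptions; the real figures belong to the approved test script.

```python
def evaluate_failover(trend, trip_time_s, min_flow_lpm, max_recovery_s):
    """Summarise flow recovery after a simulated duty-pump trip.

    trend: list of (time_s, flow_lpm) samples sorted by time.
    """
    post_trip = [(t, flow) for t, flow in trend if t >= trip_time_s]
    if not post_trip:
        return {"lowest_flow_lpm": None, "recovery_s": None, "passed": False}

    recovered_at = None
    for t, flow in post_trip:
        if flow >= min_flow_lpm:
            # Start of a sustained run at or above the minimum flow.
            if recovered_at is None:
                recovered_at = t
        else:
            # Flow dipped again, so recovery has not yet been sustained.
            recovered_at = None

    recovery_s = None if recovered_at is None else recovered_at - trip_time_s
    return {
        "lowest_flow_lpm": min(flow for _, flow in post_trip),
        "recovery_s": recovery_s,
        "passed": recovery_s is not None and recovery_s <= max_recovery_s,
    }


# Hypothetical trend: duty pump tripped at t=5 s, standby pump picks up.
trend = [(0, 400.0), (5, 400.0), (6, 180.0), (8, 250.0), (10, 390.0), (12, 400.0)]
print(evaluate_failover(trend, trip_time_s=5, min_flow_lpm=380.0, max_recovery_s=10))
```

Recording the lowest flow alongside the recovery time keeps the depth of the dip visible, not just whether the standby pump eventually started.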
Valves, Filters, Leak Detection and Safety Interfaces
Liquid-cooled systems contain additional mechanical and monitoring interfaces that need structured testing. The test scope should address, where provided by the design:
Control valve stroke, modulation and feedback verification.
Correct valve fail position following loss of signal or power, where specified.
Filter or strainer differential-pressure indication and alarm behaviour.
Leak detection initiation, annunciation and zone/equipment identification.
Interlocks or controlled shutdown sequences associated with leakage or abnormal conditions.
Alarm priorities, alarm text and required operator actions.
Safe fault reset and restoration of normal service.
A generic “system fault” indication offers limited operational value. For readiness at handover, alarms should identify the condition and affected system sufficiently clearly for operations personnel to act in accordance with approved procedures.
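A simple review of the alarm matrix can catch generic messages before witness testing. The sketch below assumes each alarm is a dictionary with hypothetical `point`, `text`, `equipment` and `action` fields, and uses an illustrative list of phrases treated as too generic; neither the field names nor the phrase list comes from a standard.

```python
# Illustrative phrases only; a project would define its own list.
GENERIC_TEXTS = {"system fault", "general alarm", "fault", "alarm"}


def review_alarm_matrix(alarms):
    """Return (point, finding) pairs for alarms that need attention."""
    findings = []
    for alarm in alarms:
        point = alarm.get("point", "<unnamed>")
        text = alarm.get("text", "").strip()
        if not text or text.lower() in GENERIC_TEXTS:
            findings.append((point, "generic or empty alarm text"))
        if not alarm.get("equipment"):
            findings.append((point, "no equipment or zone identified"))
        if not alarm.get("action"):
            findings.append((point, "no operator action defined"))
    return findings


# Hypothetical alarm matrix entries.
alarms = [
    {"point": "CDU01.LeakZone3", "text": "CDU-01 leak detected, zone 3",
     "equipment": "CDU-01", "action": "Follow leak response procedure"},
    {"point": "CDU02.Fault", "text": "System fault",
     "equipment": "CDU-02", "action": ""},
]
for point, finding in review_alarm_matrix(alarms):
    print(f"{point}: {finding}")
```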
CONTROLS AND BMS INTEGRATION: WHERE HIDDEN HANDOVER RISKS APPEAR
Many significant commissioning findings in mission-critical facilities are not caused by the failure of a pump, valve or sensor. Instead, they arise from incomplete integration between equipment-level controllers and the wider monitoring or control environment.
A CDU may be mechanically capable of operating correctly while the BMS displays an incorrect status, receives incomplete alarms, records no useful trends or obscures the root cause of a fault.
Controls and integration validation should therefore address:
Point mapping: Verify critical commands, statuses, alarms and analogue values from field device through controller and communications gateway to BMS display.
Alarm text and priority: Confirm that alarms identify the actual condition, equipment and required response rather than providing ambiguous generic messages.
Sensor failure logic: Simulate failed, missing or unreasonable values where permitted and verify the approved control response and alarm generation.
Communication-loss behaviour: Confirm that communication loss generates a clear alarm and that local controller operation is consistent with the approved sequence.
Trend availability: Verify trend logging for supply and return temperature, flow, differential pressure, pump command/status, valve position where applicable, leak detection and critical alarms.
Graphic accuracy: Compare local equipment indications and BMS graphics during both normal and abnormal testing.
Reset and event chronology: Confirm that alarms, events and trends provide a coherent record of fault initiation, system response, reset and recovery.
For operations teams, these matters are critical. During an incident, operators respond to alarms, graphics, trend data and operating procedures. If those interfaces are unreliable or unclear at handover, the cooling system may be technically functional but operationally unready.
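Point-to-point verification can be partly supported by comparing a snapshot of local controller values with the values shown at the BMS at the same moment. The sketch below is one possible approach, assuming both snapshots are available as dictionaries keyed by point name; the point names, values and tolerances are hypothetical.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare_points(local, bms, tolerances):
    """Return (point, finding) pairs where the BMS disagrees with the controller.

    local, bms: dict of point name -> value captured at the same time.
    tolerances: dict of point name -> allowed absolute analogue difference.
    """
    findings = []
    for point, local_value in local.items():
        if point not in bms:
            findings.append((point, "missing at BMS"))
            continue
        bms_value = bms[point]
        if is_number(local_value) and is_number(bms_value):
            if abs(local_value - bms_value) > tolerances.get(point, 0.0):
                findings.append((point, "analogue value mismatch"))
        elif local_value != bms_value:
            findings.append((point, "status mismatch"))
    return findings


# Hypothetical snapshots taken during a witnessed test step.
local = {"CDU01.SupplyTemp_C": 25.1, "CDU01.Flow_lpm": 402.0,
         "CDU01.Pump2.Status": "RUNNING", "CDU01.LeakZone3": False}
bms = {"CDU01.SupplyTemp_C": 25.0, "CDU01.Flow_lpm": 380.0,
       "CDU01.Pump2.Status": "STOPPED"}
tolerances = {"CDU01.SupplyTemp_C": 0.3, "CDU01.Flow_lpm": 5.0}
for point, finding in compare_points(local, bms, tolerances):
    print(f"{point}: {finding}")
```

A comparison like this supplements, and does not replace, exercising each point from the field device through to the BMS display.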
L4 FUNCTIONAL TESTING FOR LIQUID-COOLED AI INFRASTRUCTURE
Level 4 functional testing should demonstrate that the installed liquid cooling systems operate in accordance with approved design intent, sequences of operation, manufacturer requirements, safety constraints and defined acceptance criteria.
The exact test script must be specific to the project. A typical L4 test programme for a CDU-supported liquid cooling system may include the following scenarios.
Normal Operating Mode
Operate the CDU under the approved testing methodology and verify supply temperature, return temperature, flow, differential pressure, pump status, valve response, alarm-normal status and BMS visibility. Relevant readings should be recorded and compared against the approved acceptance criteria.
Duty Pump Failure or Loss of Availability
Under approved safety controls, simulate failure or shutdown of the active pump. Verify standby pump response, transfer timing where applicable, alarm generation, BMS status, restoration of flow and pressure, and post-fault system stability.
Temperature or Pressure Abnormal Condition
Apply an approved simulation or test method for a high/low temperature or abnormal pressure condition. Confirm alarm thresholds, alarm messaging, control-loop response, any protective sequence and operator visibility.
Sensor Failure or Implausible Value
A failed sensor can lead to inappropriate control decisions or missed risks. Simulate the condition where permitted and confirm that the system identifies the input as abnormal, annunciates the fault and follows the approved response logic.
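To illustrate what "identifies the input as abnormal" can mean in practice, the sketch below classifies a single analogue input as missing, out of range, or changing implausibly fast. It is a generic plausibility check written for illustration, not the logic of any particular CDU controller; the limits and values are hypothetical.

```python
import math


def classify_input(value, low, high, previous=None, max_step=None):
    """Classify one analogue sample against simple plausibility rules."""
    if value is None or math.isnan(value):
        return "missing"
    if not low <= value <= high:
        return "out_of_range"
    if (previous is not None and max_step is not None
            and abs(value - previous) > max_step):
        # A jump larger than the process could physically produce.
        return "implausible_step"
    return "ok"


# Hypothetical supply-temperature input with a 0 to 60 degC valid range.
print(classify_input(25.2, 0.0, 60.0, previous=25.0, max_step=2.0))
print(classify_input(-40.0, 0.0, 60.0))
print(classify_input(38.0, 0.0, 60.0, previous=25.0, max_step=2.0))
print(classify_input(None, 0.0, 60.0))
```

During testing, the expected classification for each simulated condition can be written into the script in advance and compared with what the controller and BMS actually report.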
Communication Loss
Interrupt or simulate loss of communication between the CDU controls and BMS or supervisory network, where safely permitted. Confirm local operating behaviour, loss-of-communications alarms, BMS indication and recovery after communication is restored.
Leak Detection Response
Where leak detection forms part of the system design, prove detection, alarm annunciation, identification of the relevant equipment or area, any required escalation or interlock, and reset behaviour after the test condition is cleared.
Recovery Following Fault Clearance
Functional testing should not end once an alarm is generated. The commissioning team should confirm the steps required to clear the fault, restore normal operating mode, reset alarms correctly, preserve required event/trend records and demonstrate stable post-recovery operation.
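A coherent record of initiation, response, reset and recovery can be checked by confirming that the expected events appear in the exported event log in the right order. The sketch below assumes the log is available as timestamped event names; the event names themselves are hypothetical.

```python
def follows_sequence(events, expected):
    """Return True if the expected event names occur in order in the log.

    events: list of (time_s, name) tuples; other events may occur in between.
    expected: list of event names in the required order.
    """
    ordered = sorted(events, key=lambda event: event[0])
    names = iter(name for _, name in ordered)
    # Each membership test consumes the iterator up to the match,
    # so later steps can only match later events.
    return all(step in names for step in expected)


# Hypothetical event log from a leak detection test.
events = [
    (0, "LEAK_ALARM_ACTIVE"),
    (2, "ISOLATION_VALVE_CLOSED"),
    (60, "ALARM_ACKNOWLEDGED"),
    (300, "LEAK_ALARM_CLEARED"),
    (320, "NORMAL_MODE"),
]
expected = ["LEAK_ALARM_ACTIVE", "ISOLATION_VALVE_CLOSED",
            "LEAK_ALARM_CLEARED", "NORMAL_MODE"]
print(follows_sequence(events, expected))
```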
L5 INTEGRATED SYSTEMS TESTING: PROVING FACILITY RESPONSE AS A WHOLE
Level 5 Integrated Systems Testing (IST) moves beyond individual equipment performance. It proves how the facility responds during realistic integrated events involving multiple dependent systems.
For a liquid-cooled AI data center, IST is particularly significant because the cooling chain may depend on electrical power, upstream plant availability, CDU control logic, communications, monitoring systems and timely operator response. A system that performs well during standalone testing may behave differently during an integrated power or cooling transition.
Subject to the project design, safety approvals and approved scripts, relevant L5 scenarios may include:
Electrical power transition while thermal demand remains active or is represented by the approved test methodology.
Loss or interruption of cooling equipment during an integrated operating state.
CDU failure during a broader plant or electrical event.
Interaction between primary cooling plant, CDU controls and BMS monitoring.
Concurrent alarms and confirmation that critical events remain visible and prioritised.
Restoration of power or cooling and verification of controlled system recovery.
Review of whether any uncontrolled thermal risk, loss of visibility or unstable sequence arises during transition or recovery.
The purpose of IST is not to stage dramatic failure scenarios. It is to confirm that the facility responds predictably, safely and transparently during events that could realistically challenge operational continuity.
In high-density AI environments, this evidence is particularly important because reduced thermal margins may allow less time for delayed detection, unclear alarms, incomplete sequences or unsuccessful recovery.
HANDOVER EVIDENCE OWNERS AND PROJECT TEAMS SHOULD REQUIRE
Operational readiness should be demonstrated through documented evidence, not verbal assurance. Prior to acceptance of a liquid-cooled AI-ready facility, owners and project teams should require a structured commissioning record appropriate to the project scope. This may include:
Approved commissioning plan, scope boundaries and responsibilities matrix.
Approved sequences of operation and acceptance criteria.
Pre-functional inspection checklists and completed records.
Approved L4 functional performance test scripts and completed results.
Approved L5 / IST scripts, execution records and witnessed outcomes.
CDU installation, startup and performance-validation records.
TCS cleaning, flushing, filtration, hydrotesting/pressure-testing and coolant-preparation records, where applicable.
Fluid-quality or sampling results where required by design, OEM or project procedure.
BMS point-to-point verification documentation.
Approved alarm matrix, alarm priorities and alarm-testing evidence.
Trend logs demonstrating normal performance, fault response and recovery.
Observation and issue log, including ownership, status and risk assessment.
Corrective-action, closure and retest evidence for identified findings.
Final commissioning report and operational readiness summary.
Updated operating procedures, escalation procedures, training records and relevant as-built documentation.
This documentation supports more than project closeout. It provides the operations team with a verified baseline for maintenance, troubleshooting, incident review and future system change management.
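A handover package can be screened for gaps with a very small script. The sketch below assumes the required evidence is tracked as a list of item names and the issue log as a list of records with a `status` field; both structures and the shortened item names are hypothetical simplifications of the list above.

```python
# Shortened, illustrative subset of the evidence items listed above.
REQUIRED_EVIDENCE = [
    "commissioning plan",
    "sequences of operation",
    "L4 test results",
    "L5 / IST records",
    "TCS flushing and fluid-quality records",
    "BMS point-to-point verification",
    "alarm matrix and alarm-testing evidence",
    "final commissioning report",
]


def missing_evidence(required, submitted):
    """Return required items not present in the submitted set, in order."""
    return [item for item in required if item not in submitted]


def open_issues(issue_log):
    """Return issue IDs whose status is anything other than 'closed'."""
    return [issue["id"] for issue in issue_log if issue["status"] != "closed"]


# Hypothetical handover status.
submitted = {"commissioning plan", "sequences of operation", "L4 test results",
             "BMS point-to-point verification", "final commissioning report"}
issue_log = [
    {"id": "OBS-014", "status": "closed"},
    {"id": "OBS-021", "status": "retest pending"},
]
print(missing_evidence(REQUIRED_EVIDENCE, submitted))
print(open_issues(issue_log))
```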
LIQUID COOLING CAPABILITY MUST BE PROVEN, NOT ASSUMED
AI-ready data centers are changing the relationship between IT infrastructure and facility systems. As rack densities rise and liquid cooling architectures become more common, there is less tolerance for untested interfaces, unclear alarms, incomplete failure sequences or inadequate handover evidence.
A CDU that starts successfully is not enough. A liquid-cooled facility must demonstrate that coolant delivery, thermal control, hydraulic stability, monitoring, alarms, redundancy and recovery operate correctly under approved functional and integrated testing conditions.
For owners, developers, consultants and project managers, rigorous commissioning provides the evidence needed to move from installation completion to operational readiness with confidence.
Planning an AI-ready or high-density data center project? Cx Plus supports multidisciplinary commissioning, controls validation and integrated systems testing to help project teams verify operational readiness before handover.
Contact Cx Plus to discuss commissioning requirements for your next mission-critical facility.
INDUSTRY REFERENCES
Uptime Institute — AI in Practice Paper: Level 4 and 5 Commissioning, published 14 May 2026.
Data Center Dynamics — AI Growth Is Running Into a Power and Heat Constraint, published 22 May 2026.
Open Compute Project — Guidelines for Pre-Commission Preparation of Technology Cooling System (TCS) Row Manifolds in Liquid Cooled Data Centers, Version 1.0, March 2025.