Blog • March 2026

GPU Thermal Management: Keeping 7 RTX 5090s Cool

By Cemhan Biricik — Founder of ZSky AI

Here is a problem they do not teach you in computer science: what happens when you put seven flagship GPUs in a single system and run AI inference on all of them 24 hours a day? The answer is heat. A lot of heat. Managing thermals on the ZSky AI GPU cluster has been one of the most practical engineering challenges of building this platform, and I have learned things that no spec sheet will tell you.

The Physics of the Problem

Each RTX 5090 has a TDP of around 575 watts. Seven of them under load means up to 4,025 watts of heat from the GPUs alone. Add the CPU, memory, storage, and other components, and the system can draw over 5,000 watts at peak. Virtually all of that power ends up as heat: the equivalent of more than three 1,500-watt space heaters running simultaneously. In a room. Where you also work.
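To size room cooling, it helps to convert that electrical load into the units HVAC equipment is rated in. The sketch below is a back-of-the-envelope calculator. It assumes all power drawn becomes heat in the room and uses standard conversions (1 W ≈ 3.412 BTU/h, 1 ton of refrigeration = 12,000 BTU/h). The `heat_load` helper and the 1,000 W figure for non-GPU components are illustrative assumptions, not measurements from this system.

```python
"""Back-of-the-envelope heat load for a multi-GPU inference box."""

# Standard conversions (assumption: all electrical power becomes room heat).
BTU_PER_HOUR_PER_WATT = 3.412
BTU_PER_HOUR_PER_TON = 12_000  # 1 ton of refrigeration


def heat_load(gpu_count: int, gpu_watts: float, other_watts: float) -> dict:
    """Return GPU watts, total watts, BTU/h, and cooling tons at full load."""
    gpu_total = gpu_count * gpu_watts
    total_watts = gpu_total + other_watts
    btu_per_hour = total_watts * BTU_PER_HOUR_PER_WATT
    return {
        "gpu_watts": gpu_total,
        "total_watts": total_watts,
        "btu_per_hour": btu_per_hour,
        "cooling_tons": btu_per_hour / BTU_PER_HOUR_PER_TON,
    }


if __name__ == "__main__":
    # Seven 575 W GPUs plus a hypothetical 1,000 W for CPU, memory, storage, fans.
    load = heat_load(gpu_count=7, gpu_watts=575, other_watts=1_000)
    print(f"GPUs: {load['gpu_watts']:,} W, total: {load['total_watts']:,} W")
    print(f"{load['btu_per_hour']:,.0f} BTU/h = {load['cooling_tons']:.2f} tons")
```

With those inputs the total comes to 5,025 W, about 17,145 BTU/h or roughly 1.43 tons of cooling. Treat that as a floor when picking an AC unit for the space, not a target.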

The first time I ran all seven GPUs at full load, the ambient room temperature rose by several degrees within an hour. The GPUs hit thermal throttle within twenty minutes. This was not a software problem — it was a physics problem, and it needed a physics solution.

Airflow Is Everything

The single most important factor in GPU thermal management is not the cooler on the GPU. It is the airflow through the system and the room. I learned this the hard way. Fancy GPU coolers mean nothing if the hot air they exhaust has nowhere to go and just gets recycled back into the intake.

My approach: positive-pressure airflow through the case, meaning the high-static-pressure intake fans at the front move more air in than the exhaust fans at the top and rear push out. The room itself has dedicated ventilation that pulls warm air from ceiling level and replaces it with cooler air from lower intakes. In summer that is not enough, so the server space also has a dedicated AC unit.
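How much air actually has to move? A common HVAC rule of thumb for standard air at sea level is Q (BTU/h) ≈ 1.08 × CFM × ΔT (°F), where ΔT is how much the air warms between intake and exhaust. The sketch below inverts that relation to estimate required airflow and adds a trivial positive-pressure check. The helper names and the example fan ratings are hypothetical. Rated fan CFM is also a free-air figure, so real airflow through filters and heatsinks will be lower.

```python
"""Rough airflow sizing plus a positive-pressure sanity check."""
from typing import Sequence

BTU_PER_HOUR_PER_WATT = 3.412
SENSIBLE_HEAT_FACTOR = 1.08  # BTU/h per (CFM x °F), standard air at sea level


def required_cfm(heat_watts: float, air_temp_rise_c: float) -> float:
    """Airflow needed to carry heat_watts away with the given air temperature rise."""
    if air_temp_rise_c <= 0:
        raise ValueError("air temperature rise must be positive")
    rise_f = air_temp_rise_c * 1.8  # a temperature *difference*, so no +32 offset
    return heat_watts * BTU_PER_HOUR_PER_WATT / (SENSIBLE_HEAT_FACTOR * rise_f)


def is_positive_pressure(intake_cfm: Sequence[float], exhaust_cfm: Sequence[float]) -> bool:
    """Positive pressure means total intake airflow exceeds total exhaust airflow."""
    return sum(intake_cfm) > sum(exhaust_cfm)


if __name__ == "__main__":
    # ~5,000 W of heat, letting the air warm by 10 °C on its way through.
    print(f"{required_cfm(5_000, 10):.0f} CFM")
    # Hypothetical fan ratings: three front intakes vs. two top and one rear exhaust.
    print(is_positive_pressure([120, 120, 120], [90, 90, 90]))
```

For about 5,000 W and a 10 °C air temperature rise this works out to roughly 880 CFM. Because the relation is inversely proportional, halving the allowed temperature rise doubles the airflow required.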

GPU Spacing and Slot Configuration

This is where most multi-GPU builds fail. If you stack GPUs in adjacent PCIe slots, the bottom GPU starves the one above it of airflow: the GPU on top ingests the pre-heated exhaust from the one below. Temperatures on the top GPU can run 15 to 20 °C hotter than on the bottom one.

The solution is spacing. Give each GPU at least one empty slot between it and the next. Use a motherboard with sufficient PCIe slot spacing, or use riser cables to position GPUs with gaps between them. This single change, proper spacing, lowered my hottest GPU's temperature by 12 °C.

Fan Curves and Power Limits

I run custom fan curves on every GPU. The default curves that ship with consumer GPUs prioritize noise over cooling, ramping the fans up slowly and gently. For a production AI system, noise is irrelevant. I want the fans running aggressively to hold temperatures in the 65 to 75 °C sweet spot.
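A custom fan curve is usually just a handful of (temperature, fan duty) points with linear interpolation between them. The sketch below shows that idea. The curve points are hypothetical values chosen to reach 100% duty by 75 °C. Actually applying a duty cycle to the card depends on your driver and tooling, which this sketch does not cover.

```python
from bisect import bisect_right

# Hypothetical aggressive curve: (GPU temperature in °C, fan duty in %).
# Points must be sorted by temperature.
AGGRESSIVE_CURVE = [(30, 40), (50, 60), (65, 80), (75, 100)]


def fan_duty(temp_c: float, curve=AGGRESSIVE_CURVE) -> float:
    """Linearly interpolate fan duty (%) for a temperature, clamped at both ends."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)  # index of the first point strictly above temp_c
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (25, 40, 65, 70, 80):
        print(f"{t} °C -> {fan_duty(t):.0f}% fan")
```

The design choice here is to make the curve steep in the 65 to 75 °C band, so the fans are already near full speed before the GPU approaches its throttle point.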

I also adjust power limits strategically. Because virtually all of the power a GPU draws becomes heat, capping a GPU at 90% of its power limit cuts its heat output by roughly 10%, while it typically still delivers about 95% of its performance. That is a fantastic trade-off for AI inference, where you are running sustained workloads: the small performance decrease is invisible to users, but the thermal benefits are significant.
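The arithmetic behind that trade-off is simple. On NVIDIA cards, the cap itself can be set with `nvidia-smi -i <index> -pl <watts>`. The sketch below assumes the 575 W default from above and the 95%-performance-at-90%-power figure. It only builds the commands rather than running them, for three reasons: setting a power limit typically needs root privileges, the value must fall within the range reported by `nvidia-smi -q -d POWER`, and the setting does not persist across a reboot. The helper names are hypothetical.

```python
import math


def capped_limit_watts(default_watts: float, fraction: float) -> int:
    """Power cap in whole watts, rounded down to stay under the target fraction."""
    return math.floor(default_watts * fraction)


def perf_per_watt_gain(perf_fraction: float, power_fraction: float) -> float:
    """Relative efficiency gain; since heat tracks power draw, this is also perf per unit of heat."""
    return perf_fraction / power_fraction - 1.0


def power_limit_commands(gpu_count: int, limit_watts: int) -> list:
    """Build (but do not execute) one nvidia-smi power-limit command per GPU."""
    return [["nvidia-smi", "-i", str(i), "-pl", str(limit_watts)] for i in range(gpu_count)]


if __name__ == "__main__":
    limit = capped_limit_watts(575, 0.90)
    print(f"cap each GPU at {limit} W")
    print(f"heat removed across 7 GPUs: {7 * (575 - limit)} W")
    print(f"perf per watt gain: {perf_per_watt_gain(0.95, 0.90):.1%}")
    for cmd in power_limit_commands(7, limit):
        print(" ".join(cmd))
```

At those figures each GPU is capped at 517 W, the seven cards shed about 400 W of heat between them, and performance per watt improves by roughly 5.6%.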

Monitoring and Automated Response

I monitor every GPU's temperature continuously. When any GPU crosses 78 °C, the queue system automatically reduces its load and shifts work to cooler GPUs. At 85 °C, the GPU is pulled from the inference pool entirely until it cools down. That 85 °C cutoff has never actually been triggered, but having the safety net means I sleep at night without worrying about thermal damage.
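Here is a minimal sketch of that policy. It assumes temperatures are polled with `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits`, which prints one `index, temperature` line per GPU. The 78 °C and 85 °C thresholds come from the text. The 0.5 load weight for a hot GPU and the 70 °C re-admission point are hypothetical, since the post only says "until it cools down". `ThermalPool` is an illustrative name, not the actual ZSky AI queue system.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

SHED_LOAD_ABOVE_C = 78  # from the post: shift work away from GPUs above this
EVICT_AT_C = 85         # from the post: pull the GPU from the pool at this point
REJOIN_AT_C = 70        # hypothetical "cooled down" threshold (hysteresis)
HOT_WEIGHT = 0.5        # hypothetical share of normal load for a hot GPU


def read_temps() -> str:
    """Poll nvidia-smi once; returns lines like '0, 64'."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_temps(csv_text: str) -> Dict[int, int]:
    """Map GPU index -> temperature in °C."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (part.strip() for part in line.split(","))
        temps[int(index)] = int(temp)
    return temps


@dataclass
class ThermalPool:
    evicted: Set[int] = field(default_factory=set)

    def weights(self, temps: Dict[int, int]) -> Dict[int, float]:
        """Scheduling weight per GPU: 1.0 normal, HOT_WEIGHT when hot, 0.0 when evicted."""
        result = {}
        for gpu, temp in temps.items():
            if temp >= EVICT_AT_C:
                self.evicted.add(gpu)
            elif gpu in self.evicted and temp <= REJOIN_AT_C:
                self.evicted.discard(gpu)  # cooled down: re-admit to the pool
            if gpu in self.evicted:
                result[gpu] = 0.0
            elif temp > SHED_LOAD_ABOVE_C:
                result[gpu] = HOT_WEIGHT
            else:
                result[gpu] = 1.0
        return result

    def pick_gpu(self, temps: Dict[int, int]) -> Optional[int]:
        """Route the next job to the highest-weight GPU, breaking ties by coolest."""
        weights = self.weights(temps)
        candidates = [gpu for gpu, w in weights.items() if w > 0]
        if not candidates:
            return None
        return max(candidates, key=lambda gpu: (weights[gpu], -temps[gpu]))


if __name__ == "__main__":
    sample = "0, 64\n1, 80\n2, 86\n"  # stand-in for read_temps() output
    pool = ThermalPool()
    temps = parse_temps(sample)
    print(pool.weights(temps))
    print(pool.pick_gpu(temps))
```

The gap between the 85 °C eviction point and the re-admission point is deliberate: without it, a GPU hovering near the cutoff would flap in and out of the pool. `pick_gpu` is the simplest possible consumer of the weights; a real queue might instead use them for weighted distribution.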

Thermal Management Checklist for Multi-GPU AI Systems

- Budget for the full heat load: seven RTX 5090s alone can put out about 4,025 watts.
- Get hot air out of the case and out of the room, not just off the GPU coolers.
- Leave at least one empty slot between GPUs, using riser cables if the motherboard cannot.
- Run aggressive custom fan curves that hold temperatures in the 65 to 75 °C range.
- Cap power limits around 90% for sustained inference workloads.
- Monitor temperatures continuously: shed load above 78 °C and pull the GPU from the pool at 85 °C.

Thermal management is not glamorous work. Nobody starts an AI company excited about airflow dynamics and fan curves. But when you are running your own infrastructure — when every GPU is hardware you own and maintain — keeping it cool is the difference between reliable service and unexpected downtime. It is infrastructure work that directly translates to user experience, and I take it as seriously as any code I write.