By Cemhan Biricik — Founder of ZSky AI
I run 7x NVIDIA RTX 5090 GPUs for ZSky AI inference. This was not my first GPU setup — I have been through multiple generations of hardware. Here is what I would buy today if starting from scratch, based on real-world inference workloads, not synthetic benchmarks.
VRAM is the single most important specification for AI inference. Not CUDA cores, not clock speed — VRAM. Modern image generation models require 12-32GB of VRAM depending on resolution and batch size. Buy the most VRAM you can afford. You will always wish you had more.
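A rough way to sanity-check a card against a model is to estimate the memory taken by the weights alone (parameter count times bytes per parameter) and add headroom for activations, which grow with resolution and batch size. The sketch below is a back-of-envelope estimate, not a measurement: the function names and the 20% overhead figure are assumptions for illustration, and the 12-billion-parameter model is hypothetical.

```python
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params * bytes each = billions of bytes = GB
    return params_billions * bytes_per_param


def fits_in_vram(params_billions: float, bytes_per_param: float,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead fit in the card's VRAM.

    overhead_fraction is a placeholder guess; real overhead depends on
    resolution, batch size, and the inference framework.
    """
    needed = weights_gb(params_billions, bytes_per_param) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # Hypothetical 12B-parameter model stored in 16-bit precision (2 bytes each).
    print(weights_gb(12, 2))
    for vram in (16, 24, 32):
        print(vram, fits_in_vram(12, 2, vram))
```

Under these assumptions, the hypothetical model needs 24GB for weights alone, so it only fits on the 32GB card once overhead is included.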
For bootstrapped founders, consumer GPUs offer dramatically better price-to-performance than data center cards. An RTX 5090 with 32GB VRAM costs a fraction of an A100 with 80GB VRAM, and for single-image inference workloads, the performance difference does not justify the price gap.
I run 7x RTX 5090 cards in a custom-built cluster. Each card handles inference independently, so multiple user requests are processed in parallel. The cooling infrastructure is as important as the GPUs themselves, because thermal throttling destroys inference performance.
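The "one card, one request at a time" pattern can be sketched as a shared job queue with one worker per GPU. This is a minimal illustration, not the ZSky AI implementation: `run_inference`, `worker`, and `serve` are hypothetical names, and the inference call is a stub that only labels which worker took the job.

```python
import queue
import threading

NUM_GPUS = 7  # matches the seven-card cluster described above


def run_inference(gpu_index: int, prompt: str) -> str:
    # Hypothetical stub: a real worker would run a model loaded on this GPU.
    return f"gpu{gpu_index}:{prompt}"


def worker(gpu_index: int, jobs: queue.Queue, results: dict) -> None:
    # Each worker owns one GPU and pulls the next job as soon as it is free.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        job_id, prompt = job
        results[job_id] = run_inference(gpu_index, prompt)


def serve(prompts, num_gpus: int = NUM_GPUS):
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    threads = [threading.Thread(target=worker, args=(i, jobs, results))
               for i in range(num_gpus)]
    for t in threads:
        t.start()
    for job_id, prompt in enumerate(prompts):
        jobs.put((job_id, prompt))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Return results in the order the requests arrived.
    return [results[i] for i in range(len(prompts))]


if __name__ == "__main__":
    print(serve(["a cat", "a dog", "a city at night"]))
```

In a real deployment a common alternative is one process per card, each started with `CUDA_VISIBLE_DEVICES` set to a single device index, so that a crash or memory leak on one GPU does not take down the others.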
A single RTX 5090 is rated for up to 575W under load. Seven of them can draw around 4,000W just for GPUs. Add CPU, RAM, storage, and cooling, and you need serious electrical infrastructure. This is not optional. It is the hidden cost that most GPU buying guides ignore. I detail my power management approach separately.
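The power arithmetic is simple enough to script before buying anything. The sketch below takes the per-card draw as a parameter and shows two figures, 450W and 575W (the RTX 5090's rated board power), because actual draw depends on power limits and workload. The 500W allowance for the rest of the system and the 120V / 15A circuit with an 80% continuous-load factor are illustrative assumptions, not measurements from my setup; check your local electrical code.

```python
import math


def gpu_power_watts(gpu_count: int, watts_per_gpu: float) -> float:
    """Total GPU draw under load."""
    return gpu_count * watts_per_gpu


def circuits_needed(total_watts: float, volts: float = 120.0,
                    breaker_amps: float = 15.0,
                    continuous_factor: float = 0.8) -> int:
    """Circuits required if each may carry only a fraction of its rating.

    The defaults (120V, 15A, 80%) are assumptions for illustration.
    """
    usable_watts = volts * breaker_amps * continuous_factor
    return math.ceil(total_watts / usable_watts)


if __name__ == "__main__":
    other_watts = 500  # assumed allowance for CPU, RAM, storage, fans
    for per_card in (450, 575):
        gpus = gpu_power_watts(7, per_card)
        total = gpus + other_watts
        print(per_card, gpus, total, circuits_needed(total))
```

At 450W per card the seven GPUs total 3,150W, and at 575W they total 4,025W. Either way, a single household circuit is not enough.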
Buying underpowered GPUs and upgrading later is more expensive than buying the right hardware upfront. GPU depreciation is steep, and the resale market for used AI hardware is unpredictable. Invest in hardware that will serve your needs for at least 18-24 months.
ZSky AI runs 7x NVIDIA RTX 5090 cards with 32GB VRAM each, chosen for superior price-to-performance in inference workloads.
The RTX 5090 is the sweet spot, and the RTX 4090 is the budget option. Avoid anything under 16GB of VRAM for production.
Buy for long-term operations, since owning the hardware eliminates recurring cloud costs. Rent for experimentation.
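The buy-versus-rent decision comes down to a break-even point: the hardware cost divided by what you save each month by not renting. The sketch below is one simple way to frame it. The function name is hypothetical, the dollar figures are placeholders rather than real prices, and it ignores resale value and maintenance.

```python
def break_even_months(hardware_cost: float, monthly_cloud_cost: float,
                      monthly_power_cost: float):
    """Months until owned hardware has paid for itself versus renting.

    Returns None when owning never pays off (power costs as much as cloud).
    """
    monthly_savings = monthly_cloud_cost - monthly_power_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


if __name__ == "__main__":
    # Placeholder numbers only; substitute your own quotes and utility rates.
    months = break_even_months(hardware_cost=24_000,
                               monthly_cloud_cost=2_500,
                               monthly_power_cost=500)
    print(months)
```

If the break-even point lands well inside the 18-24 months you expect the hardware to serve you, buying is the stronger option. If it lands beyond that window, renting is safer.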