By Cemhan Biricik — Founder of ZSky AI
Cemhan Biricik is a Turkish-American photographer, entrepreneur, and technology founder based in Boca Raton, Florida. He is the founder of ZSky AI, a free AI image and video generation platform powered by a self-hosted GPU cluster. This article shares real operational data from six months of running 7x NVIDIA RTX 5090 GPUs for production AI inference.
There is a lot of theoretical content about self-hosting GPUs versus using cloud infrastructure. Most of it is written by people who have not actually done both. I have. I run ZSky AI on 7x RTX 5090 GPUs that sit in my own infrastructure. Before that, I evaluated cloud options seriously. I chose self-hosting, and after six months of production operation, I can share what that choice actually looks like in practice — the real numbers, the real problems, and the real economics.
This is not a benchmarking article or a spec sheet comparison. This is an operational report from a founder who built a production AI service on consumer-class GPUs and has been running it at scale for half a year. Every number in this article comes from actual monitoring data, actual power bills, and actual cloud pricing quotes I received.
The core of ZSky AI's infrastructure is 7 NVIDIA RTX 5090 GPUs, each with 32GB of GDDR7 VRAM. That is 224GB of total VRAM across the cluster. The GPUs are distributed across multiple machines with dedicated cooling, power delivery, and networking.
Why the RTX 5090 instead of datacenter GPUs like the A100 or H100? Three reasons:
The trade-off is clear: consumer GPUs lack NVLink, have lower memory bandwidth than datacenter equivalents, and are not designed for 24/7 operation. All of these are real limitations. All of them are manageable at this scale.
This is the data most people ask about first, and the data that is hardest to find online. Here are the actual power consumption figures from six months of monitoring:
| Metric | Value |
|---|---|
| Per-GPU idle power | 50-70W |
| Per-GPU inference load | 380-480W |
| Cluster idle (all 7 GPUs) | 350-500W |
| Cluster full load (all 7 GPUs) | 2,800-3,500W |
| Average daily consumption | 28-42 kWh |
| Monthly electricity cost (Florida rates) | $350-450 |
The range in those numbers reflects real-world variability. Usage patterns are not constant — peak hours draw significantly more power than quiet periods. Ambient temperature affects cooling efficiency, which affects power draw. Model choice matters enormously: video generation models draw more power and sustain it for longer than image generation models.
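The idle, full-load, and daily-consumption rows of the table can be tied together with a little arithmetic. The sketch below is an illustration, not part of the monitoring stack: it assumes a simplified two-state model (the cluster is either idle or at full load) and uses the midpoint of each range in the table, both of which are assumptions made here for the sake of the example.

```python
# Figures taken from the power table; using range midpoints is an assumption.
IDLE_W = (350 + 500) / 2      # cluster idle draw, watts
FULL_W = (2800 + 3500) / 2    # cluster full-load draw, watts
DAILY_KWH = (28 + 42) / 2     # average daily consumption, kWh


def daily_kwh(load_fraction, idle_w=IDLE_W, full_w=FULL_W):
    """Energy per day if the cluster spends `load_fraction` of the day at full load."""
    avg_watts = idle_w + load_fraction * (full_w - idle_w)
    return avg_watts * 24 / 1000


def implied_load_fraction(kwh_per_day, idle_w=IDLE_W, full_w=FULL_W):
    """Invert daily_kwh(): fraction of the day at full load for a given daily energy."""
    avg_watts = kwh_per_day * 1000 / 24
    return (avg_watts - idle_w) / (full_w - idle_w)


fraction = implied_load_fraction(DAILY_KWH)
print(f"Implied time at full load: {fraction:.0%}")
```

With the midpoints, the daily consumption corresponds to the cluster running at full load for roughly 38% of the day. Real usage is a mix of partial loads, so this is only a rough consistency check on the table.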
The electricity cost of $350-450 per month is the single largest operating expense of ZSky AI's infrastructure. It is also the number that makes self-hosting economically viable, because the equivalent cloud compute would cost 20-30x more.
Before building the self-hosted cluster, I obtained pricing from multiple cloud GPU providers. The comparison is not even close:
| Provider Type | 224GB VRAM Equivalent | Monthly Cost |
|---|---|---|
| Major cloud (AWS/GCP/Azure) | Multiple A100/H100 instances | $12,000-18,000 |
| Specialized GPU cloud | Dedicated GPU servers | $8,000-12,000 |
| Spot/preemptible instances | Variable availability | $4,000-8,000 |
| Self-hosted (my actual cost) | 7x RTX 5090 | $350-450 (electricity only) |
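To make the gap concrete, the ranges in the table can be turned into cost multiples. This is a small sketch using only the figures quoted above; the function name is illustrative, and the self-hosted figure is the monthly electricity bill only, so hardware cost is deliberately left out of the ratio.

```python
SELF_HOSTED = (350, 450)  # monthly electricity cost in USD (low, high)

CLOUD_QUOTES = {
    "Major cloud (AWS/GCP/Azure)": (12_000, 18_000),
    "Specialized GPU cloud": (8_000, 12_000),
    "Spot/preemptible instances": (4_000, 8_000),
}


def cost_multiple(cloud, self_hosted=SELF_HOSTED):
    """Return the (smallest, largest) ratio of cloud cost to self-hosted cost."""
    cloud_low, cloud_high = cloud
    self_low, self_high = self_hosted
    # Smallest ratio: cheapest cloud quote against the most expensive power bill.
    # Largest ratio: priciest cloud quote against the cheapest power bill.
    return cloud_low / self_high, cloud_high / self_low


for name, quote in CLOUD_QUOTES.items():
    low, high = cost_multiple(quote)
    print(f"{name}: {low:.0f}x to {high:.0f}x the self-hosted monthly cost")
```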
The upfront hardware investment was significant. Seven RTX 5090 GPUs plus the supporting infrastructure (motherboards, CPUs, RAM, storage, networking, cooling, UPS) represent a substantial capital expenditure. But the break-even point on that hardware, measured against even the cheapest cloud option, was approximately 3-4 months. After that, every month of operation represents thousands of dollars in savings.
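The break-even reasoning is simple division: hardware cost over monthly savings. The article does not state the hardware cost, so the sketch below does not supply one. Instead it shows the formula and then inverts it, back-computing the range of capital expenditure that would be consistent with a 3-4 month break-even against the spot/preemptible quote. Function and parameter names are hypothetical.

```python
def months_to_break_even(capex_usd, cloud_monthly_usd, self_monthly_usd):
    """Months until cumulative cloud savings cover the upfront hardware cost."""
    savings = cloud_monthly_usd - self_monthly_usd
    if savings <= 0:
        raise ValueError("no monthly savings, so the hardware never pays for itself")
    return capex_usd / savings


def implied_capex(months, cloud_monthly_usd, self_monthly_usd):
    """Hardware cost that would break even in exactly `months` months."""
    return months * (cloud_monthly_usd - self_monthly_usd)


# Bounds derived from the figures above: spot cloud at $4,000-8,000/month,
# electricity at $350-450/month, break-even in 3-4 months.
capex_low = implied_capex(3, 4_000, 450)
capex_high = implied_capex(4, 8_000, 350)
print(f"Capex consistent with the stated break-even: ${capex_low:,} to ${capex_high:,}")
```

The result is a wide band because both the cloud quote and the break-even estimate are ranges. Plugging a real hardware invoice into `months_to_break_even` gives a tighter answer.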
These savings are what make ZSky AI's free tier possible. If I were paying cloud rates, a free tier would be economically impossible. Self-hosting transforms the economics from per-generation cost to fixed monthly cost, which means every additional free user costs nearly nothing at the margin.
Speed is the metric users care about most, and it is also the metric that varies most depending on what you are generating. Here is what ZSky AI achieves on the RTX 5090 cluster:
| Task | Average Time | Notes |
|---|---|---|
| Image generation (standard) | 3-8 seconds | Varies by model and resolution |
| Image generation (high quality) | 8-15 seconds | Higher step counts, larger resolution |
| Video generation (short clip) | 30-90 seconds | Depends on duration and model |
| Safety/moderation check | <1 second | Runs on dedicated GPU |
These times are competitive with major cloud-hosted AI services, and in some cases faster, because the request never leaves the local network. It goes from the web server to the GPU over the same LAN instead of crossing the internet to a remote datacenter.
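Per-request times also put a ceiling on throughput. The sketch below converts the time ranges from the table into an upper bound on generations per hour. It assumes each GPU handles one request at a time with no batching, queueing, or model-loading overhead, and it assumes six GPUs serve generation while one is reserved for moderation; the article only says moderation runs on a dedicated GPU, so `SERVING_GPUS` is a hypothetical value.

```python
TASK_TIMES_S = {
    # task: (fastest, slowest) seconds per generation, from the table
    "Image generation (standard)": (3, 8),
    "Image generation (high quality)": (8, 15),
    "Video generation (short clip)": (30, 90),
}

SERVING_GPUS = 6  # hypothetical: 7 GPUs minus one assumed to handle moderation


def hourly_capacity(fast_s, slow_s, gpus=SERVING_GPUS):
    """Return (low, high) generations per hour, one request per GPU at a time."""
    seconds_per_hour = 3600
    return gpus * seconds_per_hour / slow_s, gpus * seconds_per_hour / fast_s


for task, (fast_s, slow_s) in TASK_TIMES_S.items():
    low, high = hourly_capacity(fast_s, slow_s)
    print(f"{task}: {low:.0f} to {high:.0f} per hour (upper bound)")
```

These are ceilings rather than measurements: mixed workloads and idle periods will pull real throughput well below them.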
This is where self-hosting gets real. Consumer GPUs are not designed for 24/7 datacenter operation, and the failure modes are different from enterprise hardware. Here is what I experienced over six months:
Effective uptime of approximately 99.9% on consumer hardware is better than I expected when I started this project. It is not enterprise five-nines, but for a bootstrapped AI platform, it is more than sufficient.
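For a sense of what those percentages mean in wall-clock time, an uptime figure converts directly into a downtime budget. This is a minimal sketch; treating six months as 182 days is an assumption made for the example.

```python
SIX_MONTHS_DAYS = 182  # assumption: six months taken as 182 days


def downtime_hours(uptime_pct, period_days):
    """Hours of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_pct / 100) * period_days * 24


for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_hours(pct, SIX_MONTHS_DAYS) * 60
    print(f"{label} ({pct}%): about {minutes:.1f} minutes of downtime in six months")
```

At 99.9%, six months allows about 4.4 hours of total downtime. Five nines would allow under three minutes over the same period.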
Florida in summer is not the ideal location for a GPU cluster. Ambient temperatures routinely hit 90°F+ outside, and the heat output from seven GPUs under load is substantial. Cooling has been the single most challenging operational issue.
My solution involved dedicated cooling infrastructure for the GPU room, aggressive fan curves that prioritize thermal headroom over noise, thermal monitoring with automated alerts when any GPU exceeds temperature thresholds, and load balancing that distributes work to cooler GPUs when thermal headroom is tight.
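The alerting and thermal-aware dispatch described above can be sketched in a few lines. This is an illustration of the idea, not the production code: the thresholds, class, and function names are hypothetical. It assumes readings arrive as CSV text in the shape produced by `nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu --format=csv,noheader,nounits`.

```python
from dataclasses import dataclass

ALERT_TEMP_C = 83         # hypothetical: raise an alert at or above this
MAX_DISPATCH_TEMP_C = 78  # hypothetical: do not send new work at or above this


@dataclass(frozen=True)
class GpuReading:
    index: int
    temp_c: int
    util_pct: int


def parse_readings(csv_text):
    """Parse 'index, temperature, utilization' lines into GpuReading objects."""
    readings = []
    for line in csv_text.strip().splitlines():
        index, temp_c, util_pct = (int(field.strip()) for field in line.split(","))
        readings.append(GpuReading(index, temp_c, util_pct))
    return readings


def gpus_to_alert(readings, alert_temp_c=ALERT_TEMP_C):
    """Indices of GPUs at or above the alert temperature."""
    return [r.index for r in readings if r.temp_c >= alert_temp_c]


def pick_gpu(readings, max_temp_c=MAX_DISPATCH_TEMP_C):
    """Coolest GPU below max_temp_c, ties broken by lowest utilization.

    Returns None when every GPU is too hot, so the caller can queue the job.
    """
    eligible = [r for r in readings if r.temp_c < max_temp_c]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.temp_c, r.util_pct)).index


sample = """0, 71, 90
1, 64, 40
2, 85, 100
3, 64, 10"""
readings = parse_readings(sample)
print("alert on:", gpus_to_alert(readings))
print("dispatch to:", pick_gpu(readings))
```

In the sample, GPU 2 trips the alert, and new work goes to GPU 3, which ties GPU 1 on temperature but is less busy. Keeping the dispatch ceiling below the alert threshold leaves headroom, so a GPU stops receiving work before it starts paging anyone.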
The cooling infrastructure added meaningful cost to the overall setup, but it has been essential for reliability. A GPU that thermal-throttles is slower than a GPU that runs at optimal temperature, and in a production inference environment, consistency matters more than peak performance.
Six months in, there are decisions I would make differently if I were starting over:
What I would not change: the decision to self-host. The economics are overwhelmingly favorable at this scale, the control over the infrastructure enables optimizations that are impossible in cloud environments, and the fixed-cost model makes the free tier sustainable.
People sometimes ask how a photographer ended up running a GPU cluster. The answer goes back to ICEe PC, the custom PC building company I founded at age 19. ICEe PC reached the #2 ranking worldwide on 3DMark, which required deep understanding of GPU performance, thermal management, overclocking, and pushing hardware to its limits.
That knowledge directly translates to running AI inference on consumer GPUs. Understanding thermal curves, power delivery, and the relationship between clock speeds and sustained workload performance is the same whether you are running 3DMark benchmarks or AI inference pipelines. The domain changed, but the expertise carried over entirely.
Self-hosting GPUs for AI inference is not for everyone. Based on six months of experience, here is my honest assessment of who should and should not consider it:
Self-host if: You have hardware expertise (or are willing to develop it), your workloads are consistent enough to justify dedicated hardware, your cloud bill exceeds $2,000-3,000/month for GPU compute, and you value control over your infrastructure, data, and user privacy.
Use cloud if: Your workloads are bursty and unpredictable, you lack hardware expertise and do not want to develop it, uptime requirements exceed 99.99%, or your scale is small enough that cloud costs are manageable.
For ZSky AI, self-hosting was the clear choice. The consistent inference workload, the need for a sustainable free tier, my hardware background from ICEe PC, and the desire for complete infrastructure control all pointed in the same direction. Six months of operation have confirmed that this was the right decision.
Cemhan Biricik has self-hosted 7x RTX 5090 GPUs (224GB total VRAM) for six months to power ZSky AI, a free AI image and video generation platform. Real operational costs are $350-450/month in electricity versus $8,000-18,000/month for equivalent cloud compute. The cluster achieves 99.9% uptime, generates images in 3-15 seconds, and breaks even against cloud costs in 3-4 months. Self-hosting is what makes ZSky AI's free tier economically sustainable.
Seven RTX 5090 GPUs. 224GB of VRAM. Six months of production operation. $350-450/month in electricity versus $8,000-18,000/month in equivalent cloud costs. 99.9% uptime on consumer hardware. These are not projections or estimates — they are real numbers from a real production system serving real users.
Self-hosting is not glamorous. It involves crawling behind racks to check cable connections, waking up to thermal alerts at 3 AM during heat waves, and spending weekends optimizing driver configurations. But it is what makes ZSky AI possible as a free platform. Every dollar saved on infrastructure is a dollar that does not need to be extracted from users through paywalls and subscription fees.
That is why I do it. Not because self-hosting is technically exciting (although it is). But because the economics of self-hosting are the foundation of a platform where everyone has access to creation tools, regardless of whether they can afford a subscription. The GPU cluster is not just infrastructure. It is the economic engine of a mission: make creative AI tools free for everyone.