How ZSky AI Runs on 7x RTX 5090s

By Cemhan Biricik — Founder of ZSky AI • May 9, 2026

Most AI companies in 2026 are still renting their compute. ZSky AI does not. Every image, every 1080p video, every safety check, every prompt enhancement that runs through zsky.ai is computed on hardware that lives in a building I can walk into. The cluster is seven NVIDIA RTX 5090 GPUs, 224GB of VRAM, and roughly the same physical footprint as a closet. This article is the operational story of why I built it that way, what it actually does, and why that decision is the only reason ZSky AI can be free for more than 80,000 creators.

Self-Hosted vs Cloud: The Math That Makes Free Possible

The choice between cloud GPU rental and self-hosted hardware is usually framed as a flexibility-versus-savings question. For an AI consumer product with a free tier, that framing is wrong. The real question is whether your cost per generation is variable or fixed. Cloud is variable: every image, every video, every retry costs you a measurable amount of money. Self-hosted is fixed: once the cluster is on, the marginal cost of one more generation is essentially zero.

That fixed-cost model is what makes a real free tier possible. If I were paying cloud rates, ZSky AI's free tier would either need to throttle every user to a handful of generations per day or die financially within a quarter. With self-hosted hardware, the cluster runs whether one user or ten thousand are using it. The economics turn into electricity and amortization, not API metering.
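The fixed-versus-variable argument can be made concrete with a small cost model. This is a sketch only: every figure below (price per generation, hardware cost, amortization window, monthly power and operations) is a hypothetical placeholder chosen for easy arithmetic, not ZSky AI's actual numbers.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Toy comparison of metered cloud cost vs. owned hardware.

    All inputs are hypothetical placeholders, not real ZSky AI figures.
    """
    cloud_cost_per_generation: float  # variable: paid on every request
    hardware_cost: float              # fixed: paid once, then amortized
    amortization_months: int
    monthly_power_and_ops: float      # fixed: electricity, operator time

    def cloud_monthly(self, generations: int) -> float:
        # Cloud spend scales linearly with usage.
        return self.cloud_cost_per_generation * generations

    def self_hosted_monthly(self, generations: int) -> float:
        # Owned hardware costs the same at any volume (until the
        # cluster saturates, which this toy model ignores).
        amortized = self.hardware_cost / self.amortization_months
        return amortized + self.monthly_power_and_ops

    def break_even_generations(self) -> float:
        # Monthly volume above which owning is cheaper than renting.
        return self.self_hosted_monthly(0) / self.cloud_cost_per_generation


# Placeholder inputs for illustration only.
model = CostModel(
    cloud_cost_per_generation=0.02,
    hardware_cost=36_000.0,
    amortization_months=36,
    monthly_power_and_ops=1_000.0,
)

for volume in (10_000, 100_000, 1_000_000):
    print(volume, model.cloud_monthly(volume), model.self_hosted_monthly(volume))
print("break-even generations per month:", model.break_even_generations())
```

The shape matters more than the numbers: the cloud line keeps climbing with volume, while the self-hosted line stays flat, so past the break-even point each additional free generation adds almost nothing to the bill.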

I am not anti-cloud. Cloud is the right answer for early prototyping, for spiky workloads, and for teams that do not want to own physical infrastructure. But for a high-volume consumer product where you want a generous free tier, self-hosted is the only model that makes the unit economics work. There is more on the philosophy behind this decision on my main site.

The Cluster: 7x RTX 5090, 224GB of VRAM

The cluster is built around seven NVIDIA RTX 5090 GPUs. Each card carries 32GB of GDDR7 memory, giving the cluster 224GB of total VRAM. They are distributed across multiple machines on a wired internal network, with each machine handling a defined slice of the inference workload.
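One practical consequence of spreading 224GB across seven separate cards is that the total is not a single pool. In the simplest setup, each model has to fit within one card's 32GB. The sketch below shows one way to plan that placement, using first-fit-decreasing bin packing. The model names, sizes, and the 4GB of per-card headroom are all hypothetical; the actual ZSky AI placement is not described here.

```python
GPU_COUNT = 7
VRAM_PER_GPU_GB = 32  # RTX 5090
TOTAL_VRAM_GB = GPU_COUNT * VRAM_PER_GPU_GB  # 224


def place_models(
    model_sizes_gb: dict[str, float],
    gpu_count: int = GPU_COUNT,
    vram_gb: float = VRAM_PER_GPU_GB,
    headroom_gb: float = 4.0,  # assumed reserve for activations and buffers
) -> dict[str, int]:
    """Assign each model to one GPU using first-fit-decreasing packing."""
    free = [vram_gb - headroom_gb] * gpu_count
    placement: dict[str, int] = {}
    # Place the largest models first so they are not squeezed out later.
    by_size = sorted(model_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        for gpu, available in enumerate(free):
            if size <= available:
                free[gpu] -= size
                placement[name] = gpu
                break
        else:
            raise ValueError(f"{name} ({size} GB) does not fit on any single card")
    return placement


# Hypothetical working set, for illustration only.
models = {"video": 26.0, "image": 20.0, "enhancer": 12.0, "upscaler": 10.0, "safety": 6.0}
placement = place_models(models)
print(TOTAL_VRAM_GB, placement)
```

With these placeholder sizes the working set lands on three cards, leaving the rest free for replicas of the busiest models.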

I chose consumer-class GPUs over datacenter cards for three reasons:

The trade-offs are real. Consumer cards lack NVLink, run hotter, and were never designed for 24/7 production. Every one of those constraints is manageable at the seven-card scale. They would not be manageable at seventy.

Workload Distribution: Why 224GB Matters

The reason ZSky AI feels fast is not that any single GPU is faster than what is in the cloud. It is that nothing in the pipeline ever has to load a model from disk. With 224GB of VRAM, I can keep every model resident at all times and route each request to the card that already has the right weights warmed up.

The cluster runs concurrently:

If any one of those models had to be loaded from disk on demand, the user would feel a five-to-fifteen-second cold start. Because the cluster is sized for the full working set, users never hit a cold start. That advantage of self-hosting does not show up in price-per-hour comparisons, but it absolutely shows up in product feel.
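Routing each request to a card that already holds the right weights can be sketched as a small lookup plus a load counter. This is an illustrative sketch, not the production router: the class name, the model labels, and the least-busy tie-breaking rule are assumptions.

```python
from collections import defaultdict


class ResidentRouter:
    """Send each request to a GPU that already has the model in VRAM."""

    def __init__(self, resident: dict[int, set[str]]):
        # resident maps GPU index -> names of models loaded on that GPU.
        self._gpus_for_model: dict[str, list[int]] = defaultdict(list)
        for gpu, model_names in resident.items():
            for name in model_names:
                self._gpus_for_model[name].append(gpu)
        self._in_flight = {gpu: 0 for gpu in resident}

    def acquire(self, model: str) -> int:
        candidates = self._gpus_for_model.get(model)
        if not candidates:
            # Nothing is ever loaded on demand, so this is a config error.
            raise LookupError(f"model {model!r} is not resident on any GPU")
        # Pick the least busy card; break ties by lowest GPU index.
        gpu = min(candidates, key=lambda g: (self._in_flight[g], g))
        self._in_flight[gpu] += 1
        return gpu

    def release(self, gpu: int) -> None:
        self._in_flight[gpu] -= 1


# Hypothetical layout: two cards hold the video model, two hold the image model.
router = ResidentRouter({0: {"video"}, 1: {"video", "image"}, 2: {"image"}})
first = router.acquire("video")
second = router.acquire("video")
third = router.acquire("image")
print(first, second, third)
```

Because the router only ever chooses among cards where the model is already loaded, no request path includes a disk load, which is what removes the cold start.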

1080p Video in About 30 Seconds

The fastest-growing surface on ZSky AI is video. The cluster generates 1080p video clips with synchronized audio in approximately 30 seconds. That number deserves context: most hosted video models in 2026 either run at lower resolution, take several minutes per clip, or charge per second of output. ZSky AI does none of those.

Speed at this resolution is not a benchmark trick. It is what 224GB of resident VRAM lets you do when nothing in the pipeline needs to swap.

The 30-second number is achievable because the video pipeline never leaves VRAM, the prompt enhancer hands a fully expanded description directly to the video model, and the safety scanners run on the cluster instead of an external API call. Every link in the chain is local. Nothing leaves the internal network until the final frame is encoded for delivery.
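The "every link is local" idea amounts to a chain of in-process stages whose latencies simply add up, with no external round trips in between. The sketch below times such a chain. The stage functions are stand-in stubs, and the stage order (in particular where the safety scan sits) is an assumption, not a description of the real pipeline.

```python
import time
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(job: dict, stages: list[tuple[str, Stage]]) -> tuple[dict, dict[str, float]]:
    """Run stages in order and record wall-clock seconds per stage."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        job = stage(job)
        timings[name] = time.perf_counter() - start
    return job, timings


# Stub stages: placeholders for local model calls, not real implementations.
def enhance_prompt(job: dict) -> dict:
    return {**job, "prompt": job["prompt"] + " (expanded)"}


def generate_video(job: dict) -> dict:
    return {**job, "video": "raw-frames"}


def safety_scan(job: dict) -> dict:
    return {**job, "safe": True}


def encode_for_delivery(job: dict) -> dict:
    return {**job, "encoded": True}


stages = [
    ("enhance", enhance_prompt),
    ("generate", generate_video),
    ("safety", safety_scan),
    ("encode", encode_for_delivery),
]
result, timings = run_pipeline({"prompt": "a lighthouse at dusk"}, stages)
print(result)
print("total seconds:", sum(timings.values()))
```

Measuring per-stage time this way makes it obvious where a budget of about 30 seconds is being spent, and any stage replaced by an external API call would show up as an outlier.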

Why It Is Free for 80,000+ Creators

ZSky AI passes the cost benefit of self-hosting straight through to the user. The free tier on zsky.ai is genuinely free — no credits, no daily allowance, no per-image metering — supported by display advertising. Paid tiers exist for users who want an ad-free experience and additional features, but the core product is open to anyone who shows up.

More than 80,000 creators have signed up since launch. That number keeps growing because the free tier does what other free AI tools refuse to do: it actually lets you make things without a credit balance dropping to zero in the middle of your work. The economics are only possible because the cluster is paid off as a fixed cost. Every generation after that runs on electricity and operator time, not metered API calls.

The product principle is simple: people without budgets deserve to make beautiful work. That is the same principle that drove me to build ZSky in the first place, and the infrastructure decision is the technical expression of it.

From Photographer to AI Founder

I came to AI infrastructure through photography, not through machine learning. I was a photographer first — two-time National Geographic award winner, a Sony World Photography Top 10 finalist in 2012 (the year the contest was held at Somerset House in London), and an IPA Lucie Silver recipient. My personal background is documented on cemhanbiricik.com.

I also have aphantasia. I cannot generate visual images in my mind. The photographs I made for two decades were the only way I could see the world I imagined. After a traumatic brain injury, photography became the practice that put my visual sense back together piece by piece. AI image and video generation is the technical continuation of that arc — a tool that lets the imagination of someone who cannot picture things still produce a finished image.

Building ZSky AI on infrastructure I own is consistent with that path. A photographer learns to control the light, the camera, the development. A founder building an AI product for creators should control the GPUs, the models, the latency budget. Renting the most important part of your stack means letting someone else decide how good the experience can be.

What Comes Next

The cluster will keep growing as the product grows. The decision I made in 2025 to skip cloud and build on RTX 5090 hardware is the decision that bought ZSky AI its free tier, its roughly 30-second 1080p video, and its independence from third-party API pricing. The cluster is not a brag. It is a constraint solver. It is the thing that makes the product possible.

For the broader vision of why the AI economy needs founders willing to own their compute, see the main site and the 2026 founder profile on cemhanbiricik.com. For everything else — the photography arc, the four companies (ZSky AI, Biricik Media, Unpomela, ICEe PC), the philosophy behind unlimited free creative tools — the rest of the writing on this blog covers it in depth.