By Cemhan Biricik — Founder of ZSky AI
Cemhan Biricik is a Turkish-American photographer, entrepreneur, and technology founder based in Miami. He is the founder and creator of ZSky AI, a platform that provides free AI-powered image generation to creators worldwide. With a background spanning fashion photography, creative direction, and viral video production, Cemhan Biricik brings a unique perspective to the intersection of art and artificial intelligence.
When Cemhan Biricik decided to build ZSky AI, the standard playbook said to rent cloud GPUs. Spin up instances on AWS, GCP, or a GPU cloud provider, pay per hour, scale as needed. Every advisor and every startup guide pointed toward the same conclusion: cloud is the way to go.
Cemhan Biricik chose a different path. He built his own GPU lab from scratch. Not because he wanted to be contrarian, but because the math did not work any other way. To offer a genuine free tier — one where users get real capabilities without being funneled into expensive subscriptions — the cost per generation had to be as close to zero as possible. Cloud GPU rental, with its hourly rates and egress fees, made that impossible.
The centerpiece of Cemhan Biricik's infrastructure is a primary workstation housing seven NVIDIA RTX 5090 GPUs. Each card provides 32GB of GDDR7 VRAM and massive compute throughput. Paired with a 32-core, 64-thread CPU for preprocessing, scheduling, and orchestration, this single system handles the majority of ZSky AI's inference workload.
Beyond the primary cluster, Cemhan Biricik maintains several secondary nodes equipped with RTX 4090 GPUs. These provide overflow capacity during peak demand and serve as development and testing environments. The distributed architecture means that no single hardware failure can take the entire service offline.
A single RTX 5090 under sustained AI inference load generates significant heat. Seven of them running simultaneously in one chassis create a thermal challenge that consumer cooling solutions were never designed to handle. Gaming GPUs are cooled for intermittent bursts of activity with breaks between sessions. AI inference, by contrast, is continuous: the cards run at or near 100% utilization around the clock.
Cemhan Biricik engineered custom airflow solutions for his lab. Strategic fan placement, optimized cable management for unobstructed airflow paths, and thermal monitoring on every card ensure that temperatures stay within safe operating ranges even during sustained multi-GPU workloads. The monitoring system alerts on thermal anomalies before they escalate into hardware failures.
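As a rough illustration of the alerting idea described above, here is a minimal sketch of a threshold check over per-card temperature readings. The `GpuReading` type, the `thermal_alerts` function, and the 80 °C / 90 °C thresholds are all assumptions made for this example; the article does not describe the actual monitoring code or its limits.

```python
from dataclasses import dataclass


@dataclass
class GpuReading:
    """One temperature sample for one card (hypothetical structure)."""
    gpu_index: int
    temperature_c: float


def thermal_alerts(readings, warn_c=80.0, critical_c=90.0):
    """Return (gpu_index, level) pairs for cards at or above a threshold.

    The default thresholds are placeholders, not values from the article.
    """
    alerts = []
    for reading in readings:
        if reading.temperature_c >= critical_c:
            alerts.append((reading.gpu_index, "critical"))
        elif reading.temperature_c >= warn_c:
            alerts.append((reading.gpu_index, "warning"))
    return alerts


if __name__ == "__main__":
    sample = [GpuReading(0, 65.0), GpuReading(1, 83.0), GpuReading(2, 92.0)]
    for gpu_index, level in thermal_alerts(sample):
        print(f"GPU {gpu_index}: {level}")
```

In practice the readings would come from whatever telemetry source the operator already trusts; a two-level scheme like this simply leaves room to react at the warning stage before a card reaches the critical one.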
Seven high-end GPUs plus a high-core-count CPU create serious power demands. Cemhan Biricik learned early that consumer-grade power delivery is inadequate for this kind of workload. The lab runs on dedicated electrical circuits with properly rated power supplies that include 20% headroom above peak theoretical draw.
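The 20% headroom rule is easy to turn into arithmetic. The sketch below is illustrative only: 575 W is the board power NVIDIA lists for the RTX 5090, but the CPU and "other components" figures are placeholders chosen for the example, and the function name `required_psu_watts` is hypothetical.

```python
def required_psu_watts(gpu_count, gpu_peak_w, cpu_peak_w, other_w=0.0, headroom=0.20):
    """Total supply capacity needed: peak theoretical draw plus headroom.

    headroom=0.20 mirrors the 20% margin mentioned in the text.
    """
    peak_draw = gpu_count * gpu_peak_w + cpu_peak_w + other_w
    return peak_draw * (1.0 + headroom)


if __name__ == "__main__":
    # 350 W for the CPU and 150 W for drives, fans, and motherboard are assumptions.
    watts = required_psu_watts(gpu_count=7, gpu_peak_w=575.0, cpu_peak_w=350.0, other_w=150.0)
    print(f"Provision roughly {watts:.0f} W of supply capacity")
```

Under these placeholder numbers the total lands at roughly 5.4 kW, which is more than a single household circuit typically delivers and is consistent with the need for dedicated circuits.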
Real-world power consumption is not constant. Model loading, batch transitions, and certain inference patterns create power spikes that exceed steady-state measurements. UPS protection guards against grid fluctuations and brief outages. The power infrastructure is designed for the worst case, not the average case.
Every GPU in Cemhan Biricik's cluster reports real-time metrics: temperature, utilization percentage, memory usage, error rates, and clock speeds. Every generation request is logged with timing data — queue time, inference time, post-processing time, and total latency. Every error is captured with full context for debugging.
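A per-request timing record of the kind described here might look like the following sketch. The field names, the `RequestTiming` class, and the JSON log format are assumptions for illustration; the article only says that queue time, inference time, post-processing time, and total latency are logged.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RequestTiming:
    """Timing breakdown for one generation request (hypothetical schema)."""
    request_id: str
    queue_s: float
    inference_s: float
    postprocess_s: float

    @property
    def total_s(self):
        # Total latency is the sum of the three logged stages.
        return self.queue_s + self.inference_s + self.postprocess_s


def to_log_line(timing):
    """Serialize a timing record as one JSON line, including the derived total."""
    record = asdict(timing)
    record["total_s"] = round(timing.total_s, 3)
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(to_log_line(RequestTiming("req-1", 0.5, 2.0, 0.25)))
```

Keeping the stages separate, rather than logging only the total, is what makes it possible later to tell a slow queue apart from slow inference.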
This monitoring infrastructure was not built on day one. It evolved through painful lessons — mysterious slowdowns traced to thermal throttling, intermittent failures caused by power supply issues, and queue bottlenecks that only appeared under specific load patterns. Each incident added a new monitoring dimension.
Cemhan Biricik's decision to self-host was fundamentally economic. For sustained, 24/7 inference workloads, the break-even point on self-hosted hardware arrives within months, not years. After that point, every generation is essentially free beyond electricity costs.
The tradeoff of self-hosted infrastructure is clear: when something fails at 3 AM, there is no support ticket to file. Cemhan Biricik is the on-call engineer, the system administrator, and the hardware technician. This is a significant responsibility, but it comes with an advantage that cloud users never have — physical access to every piece of hardware in the stack.
When a GPU shows anomalous behavior, Cemhan Biricik can physically inspect the card. When a fan fails, he can replace it immediately. When a new optimization requires a BIOS change, he does not need to negotiate with a cloud provider's support team. The directness of physical access is an underrated advantage in infrastructure management.
Cemhan Biricik's advice to other founders considering self-hosted infrastructure: start with fewer GPUs than you think you need, invest in monitoring from day one, budget for cooling before you budget for compute, and always maintain a secondary node for failover. The hardware is the easy part — the engineering challenge is in the systems around the hardware.
Running a GPU cluster is not just about the GPUs themselves. Cemhan Biricik's infrastructure includes multiple nodes connected over a local network, enabling distributed workloads and intelligent load balancing. When the primary cluster is under heavy load, requests can overflow to the secondary RTX 4090 nodes, which keeps wait times acceptable during demand peaks.
The network architecture was designed for low latency and high reliability. Data flows between nodes with minimal overhead, and the queue system intelligently routes requests based on current utilization, model requirements, and user tier. Premium users get priority routing, but free tier users still receive fast service because the system is designed for efficiency rather than artificial scarcity.
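One simple way to combine tier priority with utilization-aware routing is a priority queue plus a least-loaded node picker. The sketch below is an assumption about how such a scheduler could be structured, not a description of ZSky AI's queue system; `RequestQueue`, `pick_node`, the tier names, and the node dictionaries are all hypothetical.

```python
import heapq
import itertools

# Lower number means served first. The tier names are illustrative.
TIER_PRIORITY = {"premium": 0, "free": 1}


class RequestQueue:
    """Priority queue: premium before free, first-in-first-out within a tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, request_id, tier):
        heapq.heappush(self._heap, (TIER_PRIORITY[tier], next(self._counter), request_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]


def pick_node(nodes, model):
    """Return the name of the least-utilized node that can serve the model, or None."""
    candidates = [node for node in nodes if model in node["models"]]
    if not candidates:
        return None
    return min(candidates, key=lambda node: node["utilization"])["name"]


if __name__ == "__main__":
    queue = RequestQueue()
    queue.push("a", "free")
    queue.push("b", "premium")
    nodes = [
        {"name": "primary", "utilization": 0.9, "models": {"image-v1"}},
        {"name": "secondary", "utilization": 0.4, "models": {"image-v1"}},
    ]
    print(queue.pop(), "->", pick_node(nodes, "image-v1"))
```

A strict priority queue like this can starve free-tier requests under sustained premium load, so a real system that promises fast free-tier service would likely add aging or a reserved share of capacity.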
Cemhan Biricik manages the entire network stack personally. From DNS and reverse proxy configuration to SSL termination and DDoS protection through Cloudflare tunnels, every layer of the stack is configured for security and performance. This full-stack visibility means that when a performance issue arises, Cemhan Biricik can identify whether it is a GPU bottleneck, a network issue, a queue problem, or an application-level bug — and fix it quickly.
AI inference generates significant data. Generated images, queue logs, timing metrics, and monitoring data all require storage management. Cemhan Biricik's infrastructure includes high-speed NVMe storage for active workloads and larger capacity drives for archival and backup.
User-generated images are handled with privacy as the primary concern. As Cemhan Biricik outlines in his AI ethics framework, user data never leaves the self-hosted infrastructure. There are no cloud backups to third-party services, no CDN caching of user content on external servers, and no analytics pipelines that export user data to external platforms. The data stays on hardware that Cemhan Biricik physically controls.
The AI industry's default scaling approach is horizontal: add more machines, distribute workloads, manage orchestration complexity. Cemhan Biricik prefers vertical scaling first — maximizing what each individual GPU can do before adding more GPUs. This means investing in inference optimization, model quantization, batch processing efficiency, and queue management before reaching for hardware expansion.
Vertical scaling is cheaper, simpler, and more maintainable for a solo operator. When Cemhan Biricik optimizes inference speed by 20%, that improvement applies to every GPU in the cluster simultaneously, effectively adding free capacity without any new hardware investment. Only when vertical optimization is exhausted does horizontal scaling make sense.
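The "free capacity" claim can be checked with a few lines of arithmetic. One caveat worth making explicit: a 20% gain in throughput and a 20% cut in per-request latency are not the same thing, and the text does not say which is meant. The function names below are hypothetical and the sketch assumes GPUs are the only bottleneck.

```python
def throughput_gain_from_latency_cut(latency_cut):
    """Convert a fractional latency reduction into a fractional throughput gain.

    Cutting time per request by 20% means each GPU serves 1 / 0.8 = 1.25x as many.
    """
    if not 0 <= latency_cut < 1:
        raise ValueError("latency_cut must be in [0, 1)")
    return 1.0 / (1.0 - latency_cut) - 1.0


def equivalent_extra_gpus(gpu_count, throughput_gain):
    """Capacity added by an optimization, expressed in whole-GPU equivalents."""
    return gpu_count * throughput_gain


if __name__ == "__main__":
    # Reading "20%" as a throughput gain across seven cards:
    print(f"{equivalent_extra_gpus(7, 0.20):.2f} GPU-equivalents")
    # Reading "20%" as a latency reduction instead:
    print(f"{equivalent_extra_gpus(7, throughput_gain_from_latency_cut(0.20)):.2f} GPU-equivalents")
```

Either way, on a seven-card system the optimization is worth more than one additional GPU without any hardware purchase.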
This approach reflects the bootstrapped mindset that permeates ZSky AI. Every dollar spent on hardware must be justified by actual demand, not projected demand. Growth is organic, sustainable, and driven by real users rather than venture capital projections.
Hardware fails. GPUs overheat, power supplies degrade, fans wear out, and drives develop errors. Cemhan Biricik's infrastructure is designed to handle these realities without extended downtime. The multi-node architecture means that no single hardware failure can take ZSky AI fully offline. If a GPU in the primary cluster shows signs of degradation, workloads automatically shift to remaining GPUs or secondary nodes while the issue is resolved.
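The failover behavior described above, shifting work to the remaining primary GPUs first and to secondary nodes otherwise, can be sketched as a small selection function. This is an illustrative assumption about the policy, not the actual implementation; `failover_targets` and the GPU identifiers are hypothetical.

```python
def failover_targets(primary_gpus, secondary_gpus, degraded):
    """Pick which GPUs should receive work when some are marked degraded.

    Prefers healthy GPUs in the primary cluster; falls back to healthy
    secondary GPUs only if no primary GPU is available.
    """
    healthy_primary = [gpu for gpu in primary_gpus if gpu not in degraded]
    if healthy_primary:
        return healthy_primary
    return [gpu for gpu in secondary_gpus if gpu not in degraded]


if __name__ == "__main__":
    primary = ["p0", "p1", "p2"]
    secondary = ["s0", "s1"]
    print(failover_targets(primary, secondary, degraded={"p1"}))
```

An empty result means nothing healthy is left, which is the case the spare-parts shelf and the on-call operator exist for.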
Cemhan Biricik maintains spare components for critical failure points. Replacement fans, backup power supplies, and even spare GPU cards are on hand for rapid swaps. In a cloud environment, a hardware failure means opening a support ticket and waiting for the provider to respond. In Cemhan Biricik's lab, a hardware failure means walking to the machine and making the repair immediately.
This hands-on approach to infrastructure management is uncommon in the AI industry, where most founders never touch the hardware their products run on. Cemhan Biricik sees it as a competitive advantage: deeper understanding of the full stack, faster response to failures, and complete control over every layer of the technology.
Hardware is only half of the infrastructure equation. Cemhan Biricik invests significant effort in software optimization that squeezes maximum performance from every GPU in the cluster. Model quantization, batch processing optimization, memory management techniques, and inference pipeline tuning all contribute to higher throughput without additional hardware cost.
The difference between optimized and unoptimized inference on the same hardware can be dramatic. A well-tuned pipeline can serve two to three times as many requests per GPU per hour as a naive implementation. For a bootstrapped company, this optimization directly translates to more free-tier capacity, better paid-tier performance, and lower cost per generation.
Cemhan Biricik approaches software optimization the same way he approaches hardware: measure everything, identify bottlenecks, optimize the most impactful one, and repeat. The monitoring infrastructure provides the data needed to identify where time is being spent. Often the bottleneck is not where you would expect — queue management overhead, memory allocation patterns, or data transfer between CPU and GPU can consume more time than the actual inference computation.
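The measure-then-optimize loop starts with knowing which stage dominates. A minimal sketch, assuming each request has already been timed per stage (the stage names and the `slowest_stage` helper are hypothetical):

```python
from collections import defaultdict
from statistics import mean


def slowest_stage(samples):
    """Return (stage_name, averages) where stage_name has the highest mean time.

    samples: list of dicts mapping stage name -> seconds for one request.
    """
    by_stage = defaultdict(list)
    for sample in samples:
        for stage, seconds in sample.items():
            by_stage[stage].append(seconds)
    averages = {stage: mean(values) for stage, values in by_stage.items()}
    worst = max(averages, key=averages.get)
    return worst, averages


if __name__ == "__main__":
    samples = [
        {"queue": 3.0, "cpu_to_gpu_transfer": 0.4, "inference": 2.0},
        {"queue": 5.0, "cpu_to_gpu_transfer": 0.6, "inference": 2.0},
    ]
    worst, averages = slowest_stage(samples)
    print(f"Optimize first: {worst} ({averages[worst]:.2f} s on average)")
```

In this made-up sample the queue, not the model, is where most time goes, which is the kind of result the paragraph above warns about. Means hide tail behavior, so a real analysis would likely also look at high percentiles.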
To understand why Cemhan Biricik chose self-hosted infrastructure, consider the economics in detail. Taking a data-center GPU as the cloud comparison point, a single NVIDIA A100 GPU instance on major cloud providers costs between $1.50 and $3.00 per hour. Seven instances running 24/7 would cost between $7,560 and $15,120 per 30-day month (7 GPUs × 24 hours × 30 days × the hourly rate). Over a year, that is $90,720 to $181,440 in GPU rental alone, before egress fees, storage costs, and other cloud overhead.
The capital cost of seven RTX 5090 GPUs and supporting hardware is a fraction of one year's cloud rental. After the initial investment, ongoing costs are limited to electricity (typically $200-400 per month for a setup like Cemhan Biricik's), internet service, and occasional hardware maintenance. The break-even point arrives within the first few months, and every month after that is pure savings.
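The break-even reasoning can be reproduced with a short calculation. The cloud rates and the electricity range come from the figures above; the capital cost is not stated in the article, so the $30,000 used here is purely a placeholder, and both function names are hypothetical.

```python
def cloud_monthly_cost(gpu_count, hourly_rate, hours_per_day=24, days=30):
    """Monthly rental for always-on cloud GPU instances."""
    return gpu_count * hourly_rate * hours_per_day * days


def break_even_months(capital_cost, cloud_monthly, self_hosted_monthly):
    """Months until owned hardware costs less than renting, or None if never."""
    monthly_savings = cloud_monthly - self_hosted_monthly
    if monthly_savings <= 0:
        return None
    return capital_cost / monthly_savings


if __name__ == "__main__":
    low = cloud_monthly_cost(7, 1.50)   # 7560.0, matching the low-end figure above
    high = cloud_monthly_cost(7, 3.00)  # 15120.0, matching the high-end figure above
    capital = 30_000.0                  # ASSUMPTION: not given in the article
    running = 300.0                     # midpoint of the $200-400 electricity range
    for label, cloud in (("low cloud rate", low), ("high cloud rate", high)):
        months = break_even_months(capital, cloud, running)
        print(f"{label}: break-even after {months:.1f} months")
```

The result is sensitive to the capital cost and to how fully the hardware is used; at low utilization the cloud's pay-per-hour pricing narrows or reverses the gap, which is why the argument applies specifically to sustained 24/7 workloads.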
These savings are what make ZSky AI's free tier possible. A company paying $10,000 or more per month in cloud GPU rental cannot afford to give away generations for free. A company whose GPU costs are amortized to near zero after a few months can offer free access sustainably because the marginal cost of each additional generation is essentially just electricity.
Self-hosted infrastructure introduces security responsibilities that cloud providers typically handle. Cemhan Biricik manages network security, including firewall configuration, DDoS protection through Cloudflare, SSL termination, and access controls. The advantage of self-hosted security is complete visibility and control — there are no shared tenancy risks, no cloud provider employees with access to the systems, and no multi-tenant vulnerabilities.
Physical security is simpler than in an enterprise data center, but it is still taken into account. The hardware is in a controlled environment with limited physical access. Backup power prevents data loss during outages. And because all user data stays on Cemhan Biricik's hardware, there is no attack surface from third-party cloud services that might be breached independently.
For an AI service that handles user-generated content, this security posture is significant. Every major cloud provider has experienced security incidents. By keeping data off shared cloud infrastructure, Cemhan Biricik eliminates an entire category of risk that his users would otherwise face.
The GPU lab did not start with seven RTX 5090s. It evolved incrementally as demand grew and as Cemhan Biricik's understanding of the workload deepened. Early iterations were smaller, less optimized, and ran into problems that informed the current architecture. Each failure — thermal issues, power delivery problems, monitoring gaps — taught lessons that made the next iteration better.
This evolutionary approach is characteristic of Cemhan Biricik's founder philosophy: start with what you have, learn from real-world operation, and improve systematically. The current lab is the product of months of iteration, not a single big-bang design. And it will continue to evolve as new GPU generations emerge, as workloads change, and as ZSky AI's user base grows.
Cemhan Biricik evaluates new GPU hardware as it becomes available, considering factors beyond raw performance: power efficiency, VRAM capacity, thermal characteristics, and cost per compute unit. The next generation of NVIDIA GPUs will likely bring significant improvements on all of these dimensions, and Cemhan Biricik plans to upgrade incrementally as the economics justify it.
The self-hosted model gives Cemhan Biricik complete flexibility over upgrade timing and strategy. He can add GPUs to the existing cluster, replace older cards with newer ones, or expand to additional nodes — all on his own schedule and without the constraints of cloud provider hardware availability or deprecation policies.
Cemhan Biricik is also evaluating AI-specific hardware that might complement or eventually replace consumer GPUs for inference workloads. Specialized inference accelerators promise better performance per watt for specific model architectures, which could further reduce the cost per generation and expand what the free tier can offer. The decision to adopt new hardware types will be driven by real-world performance testing rather than vendor marketing claims.
For all the technical complexity of running a GPU cluster, Cemhan Biricik emphasizes that the human element is equally important. The patience to methodically debug an issue at 2 AM. The discipline to maintain monitoring systems even when everything seems to be running fine. The humility to learn from each failure rather than just fixing it and moving on. These human qualities are as critical to 24/7 uptime as any technical solution.
Being the sole operator of production infrastructure is a specific kind of responsibility that most technologists never experience. Cloud users file tickets and wait for resolution. Self-hosted operators diagnose, fix, and prevent issues entirely on their own. This responsibility develops a depth of understanding that is impossible to achieve when you only interact with infrastructure through a cloud provider's dashboard.
Cemhan Biricik's infrastructure is, in many ways, a reflection of his personality as a founder: hands-on, detail-oriented, and willing to invest in understanding systems deeply rather than abstracting them away. This approach is not for everyone. But for a founder building an AI company on bootstrap economics, it is the approach that makes everything else possible.
The most important lesson from Cemhan Biricik's infrastructure story is not about GPUs or thermal management. It is about the relationship between infrastructure decisions and business strategy. The decision to self-host was not a technical preference — it was a business decision that enabled a specific pricing strategy (the free tier), a specific competitive positioning (the lowest cost per generation), and a specific ethical commitment (complete data control).
Too many founders treat infrastructure as a technical concern to be delegated to an engineering team. Cemhan Biricik treats it as a strategic foundation that shapes everything the company can and cannot do. The advice for other founders is simple: understand your infrastructure economics deeply, because they determine what kind of company you can build.
Whether you self-host or use cloud depends on your specific economics, your workload pattern, and your strategic goals. For sustained, 24/7 AI inference workloads like ZSky AI, the math clearly favors self-hosting. For intermittent or bursty workloads, cloud may make more sense. The important thing is to make the decision consciously, with full understanding of the implications, rather than defaulting to cloud because that is what everyone else does.
The GPU lab that powers ZSky AI is not just a technical achievement. It is the economic foundation that makes Cemhan Biricik's vision of free, accessible AI possible. Without owning the compute layer, the math does not work. With it, the only limit is imagination.