Blog • March 2026
By Cemhan Biricik — Founder of ZSky AI
Building ZSky AI has been the hardest and most rewarding thing I have ever done. Year one was a blur of hardware assembly, late-night debugging sessions, moments of genuine excitement, and stretches of frustrating uncertainty. Here is what actually happened, unfiltered.
ZSky AI started with boxes of GPU parts on my floor. Before writing a single line of application code, I needed to build the machine that would run everything. That meant seven RTX 5090 GPUs, a 32-core processor, enough RAM to keep multiple models loaded, and fast NVMe storage. Assembling a system this large is not like building a gaming PC: every component decision has implications for power delivery, cooling, and reliability that you only discover after the system is running under sustained load.
The first boot was nerve-wracking. All seven GPUs lit up, the system passed POST, and I felt a surge of excitement that lasted about thirty minutes, until I ran a stress test and learned about thermal management the hard way.
The first three months were pure infrastructure work. Getting the inference pipeline running. Building the queue system. Setting up monitoring. Writing the API layer. None of this was user-facing — it was the invisible foundation that everything else would sit on.
I rewrote the queue system three times during this period. The first version was too simple: round-robin distribution that ignored GPU state. The second was too complex: a machine-learning-based scheduler that was impossible to debug. The third was just right: heuristic-based routing with enough intelligence to matter and enough simplicity to maintain.
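To make the difference from round-robin concrete, here is a minimal sketch of what heuristic routing over GPU state can look like. It is not the actual ZSky AI scheduler. The signals (queue depth, temperature, which model is loaded), the weights, and every name in it (GpuState, routing_cost, pick_gpu) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuState:
    # Hypothetical fields; the real signals are not described in this post.
    gpu_id: int
    queued_jobs: int
    temperature_c: float
    loaded_model: Optional[str] = None


def routing_cost(gpu: GpuState, model: str) -> float:
    """Lower is better. Weights are illustrative guesses, not tuned values."""
    cost = 10.0 * gpu.queued_jobs                  # prefer shorter queues
    if gpu.loaded_model != model:
        cost += 25.0                               # penalty for swapping models
    if gpu.temperature_c > 80.0:
        cost += 2.0 * (gpu.temperature_c - 80.0)   # back off from hot GPUs
    return cost


def pick_gpu(gpus: list[GpuState], model: str) -> GpuState:
    """Route a job to the GPU with the lowest heuristic cost."""
    if not gpus:
        raise ValueError("no GPUs available")
    return min(gpus, key=lambda g: routing_cost(g, model))


gpus = [
    GpuState(0, queued_jobs=2, temperature_c=70.0, loaded_model="image"),
    GpuState(1, queued_jobs=0, temperature_c=88.0, loaded_model="video"),
    GpuState(2, queued_jobs=1, temperature_c=65.0, loaded_model="image"),
]
chosen = pick_gpu(gpus, "image")
print(chosen.gpu_id)  # 2
```

Round-robin would hand the first job to GPU 0 regardless of its backlog. The cost function instead picks GPU 2, which already has the right model loaded, a short queue, and a comfortable temperature. Because every decision reduces to a handful of additive terms, a surprising routing choice can be explained by printing the cost of each GPU, which is the kind of debuggability a learned scheduler makes hard.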
With the foundation in place, I turned to optimization. Initial generation times were 15 to 20 seconds. That was technically functional but experientially terrible, because users expect near-instant results. Getting to sub-3-second generation, a speedup of at least fivefold, required a systematic approach: model quantization, step count optimization, pipeline parallelism, and caching strategies.
This phase taught me that optimization is never finished. Every time I hit a target, I found another bottleneck to address. The process is addictive in a dangerous way — you can spend weeks shaving off 100 milliseconds that no user will notice. Learning when to stop optimizing and start building features was an important discipline.
Launch day was anticlimactic in the best possible way. The system handled its first real users without drama. The monitoring showed healthy GPU temperatures, low latency, and zero errors. All those months of infrastructure work had paid off — the foundation held.
What I did not anticipate was the feedback. Users immediately started pushing the platform in directions I had not imagined. They wanted features I had not considered. They used prompting techniques that exposed edge cases in my pipeline. They found UI issues that were invisible to me after months of staring at the same interface. This feedback reshaped my roadmap completely.
The last quarter of year one was about responding to what users actually wanted versus what I thought they wanted. I added video generation capabilities. I improved the prompt processing pipeline. I redesigned parts of the UI based on user feedback. I built a payment system for premium tiers.
Growth came organically — users sharing their creations, word of mouth, people discovering the platform through search. No paid marketing, no growth hacks, just a product that people found useful enough to tell others about.
Year one is over. The platform works, users are generating images and videos, and the infrastructure I built can scale to handle significantly more demand. Year two will be about expanding capabilities, growing the user base, and continuing to build the AI platform that I wish existed when I started. The journey is just beginning.