The gold rush for AI compute has a new bottleneck. It isn't just about training massive models anymore; it’s about the cost and speed of running them once they are live. While Nvidia’s GPUs have dominated the conversation, a new startup called General Compute is betting that the future of inference—the phase where AI actually talks to users—belongs to a different class of hardware.
General Compute just raised $15 million in a seed round led by FUSE VC. The company isn't building chips. Instead, it is building a "neocloud" designed to rent out processing power specifically for inference. It is skipping the standard Nvidia H100s in favor of specialized silicon from SambaNova, an Intel-backed chipmaker that has largely stayed out of the recent headlines.
It is a contrarian play. The industry has spent two years obsessed with training power, but the economics of AI are shifting toward cost per token. If General Compute is right, the next Cerebras won't be a chipmaker at all, but the cloud provider that figures out how to run these models faster and cheaper than anyone else.
The Inference Problem
Training is expensive. Inference is constant. When a model is training, it consumes massive power for weeks. When it is live, it needs to respond in milliseconds. GPUs are general-purpose workhorses, but they are inefficient for the specific, repetitive math required to serve a chatbot or an autonomous agent.
SambaNova’s architecture is different. It is built to store more context in memory, which is the secret sauce for faster inference. CEO Finn Puklowski claims these chips can generate 600 to 700 tokens per second. For context, standard GPUs typically hover around 250 tokens per second.
That speed is not just for show. It is a business requirement. As AI shifts from simple chatbots to agent-to-agent workflows—where software agents ping databases and read documents on our behalf—latency becomes the primary enemy. If an agent takes a minute to think, the workflow breaks. If it takes five seconds, it scales.
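To make the latency argument concrete, here is a rough back-of-the-envelope sketch in Python. It uses the two rates quoted above (about 250 tokens per second for GPUs, up to 700 claimed for SambaNova). The workload itself, 10 sequential agent steps of 500 output tokens each, is a hypothetical chosen for illustration, and the estimate ignores network round trips, prompt processing, and tool-call time.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def workflow_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Total generation time for `steps` model calls that run one after another."""
    return steps * generation_seconds(tokens_per_step, tokens_per_second)


# Hypothetical workload (not from the article): 10 sequential steps, 500 tokens each.
STEPS = 10
TOKENS_PER_STEP = 500

# Rates are the figures quoted in the article; the labels are ours.
for label, rate in [("GPU, ~250 tok/s", 250), ("claimed, ~700 tok/s", 700)]:
    total = workflow_seconds(STEPS, TOKENS_PER_STEP, rate)
    print(f"{label}: {total:.1f} s of pure generation time")
```

Under these assumptions the chain takes about 20 seconds at 250 tokens per second and about 7 seconds at 700. The gap widens with every additional sequential step, which is why agent workflows are more sensitive to decode speed than a single chatbot reply.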
Infrastructure Without the Overhaul
General Compute has another trick. Its hardware is air-cooled. Most modern AI data centers require massive, expensive liquid-cooling retrofits to handle the heat generated by high-end GPUs. By sticking to air-cooled chips, General Compute can drop its hardware into existing facilities.
The company is even looking at crypto-mining sites. These facilities have the power and the space, but the economics of mining have soured. Puklowski is essentially offering a second life to these stranded assets. It is a clever way to bypass the multi-year wait times for new data center construction.
What This Means for Developers
For developers, this is a signal to watch the inference layer. If General Compute can deliver on its promise of 700 tokens per second, the cost of running complex agents could fall sharply, because each hour of rented hardware would serve more tokens. That would change the math for every startup building on top of LLMs.
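The link between speed and cost can be sketched with simple arithmetic: if hardware is rented by the hour, cost per token falls as throughput rises. The Python sketch below assumes a single hypothetical hourly price of $10 for both kinds of hardware. That figure is a placeholder, not a real quote, and in practice the two would be priced differently. It also treats the quoted rates as total throughput, whereas real accelerators serve many requests at once.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost of one million generated tokens at a fixed hourly hardware price."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Placeholder price, assumed identical for both systems (an assumption, not data).
HOURLY_COST_USD = 10.0

slow = cost_per_million_tokens(HOURLY_COST_USD, 250)  # article's GPU figure
fast = cost_per_million_tokens(HOURLY_COST_USD, 700)  # article's claimed figure

print(f"250 tok/s: ${slow:.2f} per million tokens")
print(f"700 tok/s: ${fast:.2f} per million tokens")
print(f"ratio: {slow / fast:.1f}x")
```

At equal hourly cost, the ratio is simply 700 / 250 = 2.8, whatever placeholder price is used. Whether that saving reaches developers depends on what General Compute actually charges, which the article does not state.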
Investors like Joe Hasselmann, who backed Groq early, see clear parallels here. He views the partnership between General Compute and SambaNova as a symbiotic relationship. SambaNova needs a high-growth environment to prove its silicon, and General Compute needs a performance edge to compete with the hyperscalers.
The Competitive Landscape
The market is fragmenting. We are moving toward a world of multiple models and specialized agents. No single provider will dominate every use case. Companies like OpenRouter are already helping developers switch between models to optimize costs. Speed is the final variable in that equation.
General Compute is not the only player in this space, but it is the first to go all-in on SambaNova’s current generation of chips. It has $300 million worth of hardware on order, 20 times the size of its seed round. That is a massive bet for a seed-stage company.
Key Takeaways
- General Compute is betting that specialized inference chips from SambaNova will outperform Nvidia GPUs for live AI applications.
- The company is targeting existing data centers and repurposed crypto-mining facilities to avoid the massive infrastructure costs of liquid cooling.
- The goal is to drive inference speeds to 700 tokens per second, enabling complex agent-to-agent workflows that are currently too slow to be practical.
The Next Decision Point
General Compute has launched its cloud offering, but the real test is just beginning. It needs to prove that its hardware can handle the reliability requirements of enterprise clients. The company’s next major milestone will be the deployment of its full order of SN50 chips. By then, we will know whether it has truly found the next big thing in AI, or whether the market is too tethered to Nvidia to change course.