Issue #001 · NebulaPeer · GPU Infrastructure Intelligence for AI Startup Founders

Welcome to the first issue of NebulaPeer.

If you're building an AI startup, GPU compute is one of your highest costs — and almost certainly one of your least examined ones. This newsletter exists to change that.

Every issue we publish independent price comparisons, provider guides, and practical infrastructure frameworks. No vendor influence. No affiliate pressure. Just the honest picture.

Let's get into it.

The default trap

There's a decision most AI startups make early that nobody ever revisits.

It happens fast — usually in the first week of building. Someone needs GPU compute. The path of least resistance is obvious. AWS is familiar. The documentation is everywhere. The sales rep already called. The credit card is already on file from the last SaaS tool.

So you spin up an instance and move on.

That decision — made in about ten minutes — will cost most AI startups somewhere between $30,000 and $100,000 per year in unnecessary spend.

Not because AWS is bad. Because it was never questioned.

Why the default is so expensive

The GPU cloud market has three distinct tiers. Most AI startups only ever see the first one.

Tier 1 — Hyperscalers: AWS, Google Cloud, Microsoft Azure. The household names. Enterprise-grade SLAs, enormous support teams, and pricing to match. These platforms were built for enterprise workloads with enterprise budgets. Early-stage AI startups are not their primary customer — but they're happy to charge them enterprise rates anyway.

Tier 2 — Specialist AI clouds: CoreWeave, Lambda Labs, Crusoe Energy. Purpose-built for AI and ML workloads. Better GPU selection, more ML-native tooling, and meaningfully lower prices than hyperscalers. These providers exist specifically because the hyperscaler pricing model doesn't serve AI teams well. Most well-informed AI teams end up here.

Tier 3 — GPU marketplaces: Vast.ai, RunPod, TensorDock. Aggregated underutilised GPU supply from data centres and independent operators worldwide. The largest price gap versus hyperscalers — often dramatic for identical hardware specs. Spot-style pricing models make these ideal for training runs, fine-tuning jobs, and any batch workload that can tolerate occasional interruptions.

The cost difference between Tier 1 and Tier 3 for identical GPU hardware is consistently significant — and has been for years. For a startup doing regular training runs, the gap is real money that compounds every single month.

Most AI startups live in Tier 1. The smart ones are in Tier 2 or 3.

The anatomy of the default

So why does the default stick? It's not ignorance. The founders and engineers making these decisions are smart people. It's a set of specific, understandable forces that point in the same direction:

Familiarity bias. AWS documentation is everywhere. Stack Overflow answers assume AWS. Your previous job used AWS. The friction of learning a new provider feels high even when the savings are substantial.

The engineer's optimisation. ML engineers optimise for velocity and reliability — not cost. Their job is to get models trained and deployed, not to spend time comparing cloud bills. This is completely rational from their perspective. But it means cost is rarely part of the infrastructure conversation.

No forcing function. At $3,000/month, the bill feels manageable. At $10,000/month, it still fits in the budget. The pain rarely gets sharp enough to trigger a review until it's very large — and by then, migrating carries real switching costs of its own.

Vendor inertia. Once your team is familiar with a provider's console, your billing is set up, your scripts reference their CLI, and your data is in their object storage — leaving feels harder than it actually is.

None of these forces is irrational. Together, they create a remarkably sticky trap.

The three questions worth asking this week

You don't need to become a GPU infrastructure expert to break out of the default trap. You need to ask three questions:

1. Have we compared prices for our specific workload across at least two other providers?

Not hypothetically. Actually pulled the pricing page and done the arithmetic. If the answer is no, that comparison is worth 30 minutes of someone's time this week.

2. Does our workload actually require always-on dedicated compute, or can it tolerate interruptions?

Most training jobs can tolerate interruptions with proper checkpointing. If yours can, you have access to significantly cheaper spot and interruptible pricing tiers that you may not be using.
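To make "proper checkpointing" concrete, here's a minimal sketch in plain Python. The checkpoint path, step count, and state shape are all placeholders for illustration; a real training loop would save model weights and optimiser state using your framework's own save/load utilities. The core idea is the same: resume from the last checkpoint on startup, and write checkpoints atomically so an interruption mid-save can't corrupt them.

```python
import os
import pickle

CHECKPOINT = "checkpoint.pkl"  # hypothetical path, choose your own

def load_state():
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "weights": None}

def save_state(state):
    """Write to a temp file, then atomically rename over the checkpoint.

    An interruption during the write leaves the previous checkpoint intact.
    """
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_state()
for step in range(state["step"], 1000):
    # ... one training step would go here ...
    state["step"] = step + 1
    if state["step"] % 100 == 0:  # checkpoint every 100 steps
        save_state(state)
```

With this pattern, a spot interruption costs you at most the work since the last checkpoint, which is exactly the trade that unlocks the cheaper pricing tiers.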

3. What does our GPU spend look like at 3x and 10x current usage?

Infrastructure costs that feel comfortable at today's scale can become a crisis as you grow. Modelling this now — before you're in it — gives you options.
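The modelling doesn't need a spreadsheet; a few lines will do. The $8,000/month figure below is made up for illustration. Plug in your own bill:

```python
monthly_gpu_spend = 8_000  # USD, a made-up example figure

# Annual cost at current, 3x, and 10x usage.
projections = {m: monthly_gpu_spend * m * 12 for m in (1, 3, 10)}
for m, annual in projections.items():
    print(f"{m}x usage: ${monthly_gpu_spend * m:,}/month, ${annual:,}/year")
# 10x usage: $80,000/month, $960,000/year
```

Seeing the 10x number in writing is usually what turns "we should look at this eventually" into "we should look at this now".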

The one move to make today

If you take one thing from this issue, make it this: pull your last three GPU cloud invoices and calculate your effective cost per GPU hour.

Then look up the same GPU spec on Vast.ai or RunPod and see what the marketplace rate is.

That gap — whatever it turns out to be — is your number. It's the annual savings available to you if you're willing to spend a few days migrating your training pipeline to a cheaper provider.
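Here's what that back-of-envelope calculation looks like. Every number below is a hypothetical placeholder; substitute your own invoice totals, billed GPU hours, and the marketplace quote you looked up:

```python
# Hypothetical invoice figures, replace with your last three invoices.
invoice_totals = [9_200, 8_700, 10_100]   # USD per month
gpu_hours_billed = [2_880, 2_700, 3_100]  # GPU hours per month

# Effective rate: what you actually paid per GPU hour, all-in.
effective_rate = sum(invoice_totals) / sum(gpu_hours_billed)

marketplace_rate = 1.10  # USD/hr, a hypothetical quote for the same GPU spec

monthly_hours = sum(gpu_hours_billed) / len(gpu_hours_billed)
annual_gap = (effective_rate - marketplace_rate) * monthly_hours * 12

print(f"Effective rate: ${effective_rate:.2f}/GPU-hr")    # $3.23/GPU-hr
print(f"Annual gap vs marketplace: ${annual_gap:,.0f}")   # $73,808
```

Note that the effective rate is often higher than the list price on the provider's pricing page, because invoices fold in extras like storage and data transfer. That's exactly why you calculate from invoices rather than from the rate card.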

For most AI startups reading this, that number is significant enough to fund a meaningful piece of their roadmap.

What's coming in Issue #002

Next issue we'll map the full GPU cloud landscape — every major provider in each tier, what they're actually good at, and a simple decision framework for matching your workload to the right provider.

If you found this useful, share it with one founder who's building with AI. That's the best way to help NebulaPeer grow.

NebulaPeer is independent. No GPU provider funds our editorial. No affiliate relationship shapes our recommendations.

Questions, feedback, or a GPU cost story worth sharing? Reply to this email.
