Before we signed a single client, we ran our own system on ourselves.

Not a demo. Not a test batch. The full thing — dedicated domains, warmed mailboxes, multi-stage pipeline, AI-written emails, QA rules, the works. We spent three weeks building it and six weeks running it before we ever pitched anyone else.

Today we published the full case study with real numbers. This post is about the lessons behind those numbers — the things we learned by eating our own cooking that changed how we build for clients.

Why We Did It This Way

Running your own system first forces you to feel every pain point your clients will feel. You can't hand-wave through deliverability problems or ICP scoring gaps when they're costing you meetings.

Most agencies pitch a process they've never personally endured. They build systems, hand them to clients, and troubleshoot from a distance. We wanted the opposite. We wanted to feel the domain warmup. We wanted to watch the early campaigns underperform. We wanted to stare at reply rate data wondering if the system was working or broken.

There's a reason software companies dogfood their own products. You find bugs faster. You develop opinions. You stop guessing about what matters and start knowing.

The Infrastructure Nobody Talks About

Three sending domains, six mailboxes, two weeks of warmup before sending a single real email. The infrastructure behind outbound is invisible work, but it determines whether your emails reach inboxes or spam folders.

We registered three dedicated sending domains and set up two mailboxes on each. That gives us six mailboxes, each sending 30 emails per day, for 180 daily sends. We only send Monday through Thursday. No Fridays, no weekends. That puts weekly capacity at 720 emails.

None of it mattered until we finished warmup. Two full weeks of gradually ramping send volume before a single real prospect email went out. Skipping warmup is one of the five things that kills outbound campaigns in the first 30 days — and we've seen clients elsewhere learn that the hard way.
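For anyone who wants to sanity-check the arithmetic, here's a minimal sketch of the capacity math and a hypothetical warmup ramp. The domain, mailbox, and volume numbers are the ones above; the ramp curve itself is an illustrative assumption, not our exact schedule.

```python
# Sending-capacity math from the numbers above, plus an assumed warmup ramp.

DOMAINS = 3
MAILBOXES_PER_DOMAIN = 2
EMAILS_PER_MAILBOX_PER_DAY = 30
SEND_DAYS_PER_WEEK = 4  # Monday through Thursday only

mailboxes = DOMAINS * MAILBOXES_PER_DOMAIN              # 6
daily_capacity = mailboxes * EMAILS_PER_MAILBOX_PER_DAY  # 180
weekly_capacity = daily_capacity * SEND_DAYS_PER_WEEK    # 720

print(f"{mailboxes} mailboxes, {daily_capacity} sends/day, {weekly_capacity} sends/week")

# Hypothetical two-week warmup: ramp each mailbox from a handful of sends
# per day up to its 30/day cap before any real prospect email goes out.
WARMUP_DAYS = 14
for day in range(1, WARMUP_DAYS + 1):
    per_mailbox = min(EMAILS_PER_MAILBOX_PER_DAY, 2 + 2 * day)
    print(f"warmup day {day}: {per_mailbox} sends per mailbox")
```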

The boring infrastructure work is the foundation. Get it wrong and nothing else matters. Get it right and nobody ever notices.

The First Campaigns Were Terrible

Month one is calibration, not celebration. Our early reply rates were low, targeting was too broad, and the emails were decent but not dialed in. That's normal — and it's the most important thing we tell new clients.

I want to be honest about this because it's the part everyone hides.

The first three campaigns were rough. Reply rates were below where we wanted them. The ICP was defined but not sharp enough. The AI was writing emails that were technically good but not specific enough to stand out.

This is the part of outbound that breaks most founders. They run one campaign, see low numbers, and conclude the approach doesn't work. I've spent 10+ years watching this cycle repeat. The first batch is never the one that proves the model. It's the one that teaches you what to fix.

We adjusted the ICP scoring to be more aggressive. We tightened the urgency signal filters so prospects without clear buying intent got cut. We refined the QA rules to catch emails that were too generic. By campaign four, something clicked.

What Actually Moved the Numbers

Three things made the difference: cutting 60% of prospects through strict ICP filtering, enforcing 15+ QA rules on every email, and making each follow-up email stand on its own instead of referencing the previous one.

The single biggest improvement came from ICP filtering. We started removing about 60% of the prospects that passed our initial criteria. Companies without urgency signals — no recent funding, no SDR job postings, no leadership changes — got cut regardless of how well they fit on paper.

That felt aggressive. It meant sending fewer emails to a smaller list. But the reply rates on the remaining prospects roughly tripled. Turns out 200 emails to the right people outperform 500 emails to a broad list every time. This is the same principle behind measuring outbound by reply rates instead of send volume.
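As a rough sketch of what that filter looks like in practice, here's the logic in miniature. The field names and signal list are illustrative assumptions, not our production scoring code; the point is that fit alone doesn't keep a prospect on the list.

```python
# Illustrative urgency-signal filter: prospects that fit the ICP on paper
# still get cut if no buying signal is present.
from dataclasses import dataclass

@dataclass
class Prospect:
    company: str
    fits_icp: bool           # passed the initial fit criteria
    recent_funding: bool
    sdr_job_postings: bool
    leadership_change: bool

def has_urgency_signal(p: Prospect) -> bool:
    """At least one signal must be present for the prospect to stay."""
    return p.recent_funding or p.sdr_job_postings or p.leadership_change

def filter_prospects(prospects: list[Prospect]) -> list[Prospect]:
    # Fit is necessary but not sufficient: no signal, no email.
    return [p for p in prospects if p.fits_icp and has_urgency_signal(p)]
```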

The QA system was the second lever. We enforce 15+ rules on every email: no unverified claims, no fabricated details, no pricing in the first two touches, natural opt-out language, hard cap of 150 words. About 10-15% of emails get caught and rewritten. That rejection rate is a feature.
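A few of those rules are easy to express as simple checks. The sketch below is illustrative only: the rule names and patterns are assumptions, and the real system enforces far more rules than shown here.

```python
# Illustrative QA checks in the spirit of the rules described above.
import re

MAX_WORDS = 150
GENERIC_OPENERS = ("i hope this finds you well", "just checking in", "quick question for you")

def qa_check(body: str, touch_number: int) -> list[str]:
    """Return a list of rule violations; an empty list means the email passes."""
    violations = []
    if len(body.split()) > MAX_WORDS:
        violations.append(f"over the {MAX_WORDS}-word cap")
    if touch_number <= 2 and re.search(r"\$\d|\bpricing\b|\bper seat\b", body, re.I):
        violations.append("pricing mentioned in the first two touches")
    if any(opener in body.lower() for opener in GENERIC_OPENERS):
        violations.append("generic filler phrasing")
    return violations

# Emails with any violation get rewritten rather than sent.
```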

Third was follow-up structure. Each email in the sequence stands alone with new value. No "bumping this to the top of your inbox." No "per my last email." If a prospect only reads email three, it should still make sense and give them a reason to reply.

The Results

At steady state: 3-8% reply rates, 1-4% positive reply rates, 5-15 qualified meetings per month. For a bootstrapped agency with no sales team, this is the pipeline that makes the business work.

Those numbers didn't show up in month one; they're where the system settled after the calibration work above. For a bootstrapped agency with zero sales hires, 5-15 qualified meetings per month is pipeline we simply wouldn't have without the system.

The full breakdown is in the case study, including infrastructure numbers, how the pipeline works stage by stage, and the specific learnings we took from each phase.

But the numbers aren't the real point. The real point is that we know exactly what the first two months feel like for a client because we lived them ourselves. When a client's reply rates are low in week three, we don't panic. We've seen that movie. We know what comes next.

What This Changed About How We Work

Running the system on ourselves changed three things about our client work: we set honest expectations for month one, we built a sharper QA process, and we stopped measuring anything other than reply rates until month two.

Three things changed permanently:

First, we set different expectations with clients upfront. Month one is calibration. The goal is replies, not meetings. If you're measuring pipeline ROI in week two, you're diagnosing the wrong thing.

Second, the QA system got better. Every edge case we found in our own emails became a new rule. The system today catches things we never would have anticipated if we'd only run it for other people.

Third, we stopped chasing vanity metrics. Reply rate is the only number that matters in month one. Everything else — meetings, pipeline value, closed revenue — follows from there. Trying to optimize for downstream metrics before the fundamentals are locked in is the same mistake as having your SDRs spend 60% of their time on research instead of selling.

Read the Full Case Study

We published everything: the infrastructure setup, how the pipeline works, the results at steady state, and the three biggest lessons. No cherry-picked numbers. No fabricated metrics. Just what actually happened when we pointed our own system at ourselves and hit send.

Read the full case study: How Agentic Demand Books Its Own Meetings →

Related: How AI Outbound Actually Works · Outbound Sales Metrics That Actually Matter · The Founder's Outbound Playbook

Want the same system running for your company?

We'll build and run your AI-powered outbound engine. Research, scoring, writing, sending, follow-ups — we handle all of it. You focus on closing.

Book a Discovery Call