The "AI features in production" problem
We see the same 5 failure modes on every founder's AI integration that we are asked to clean up:
- No prompt caching. Sending the same 8,000-token system prompt with every request costs ~10x what it should. Anthropic and OpenAI both support prompt caching that cuts repeated context costs by 90%. Most integrations we see have it off because the docs are not obvious.
- No per-user spend caps. A single bad actor (or a single buggy frontend loop) can rack up $1,000-10,000 in API costs before you notice. We add per-user daily and monthly caps stored in your database, checked before every call.
- No streaming. Users wait 30 seconds for a response that could have been streaming character-by-character starting at 500ms. They bounce. We use the Vercel AI SDK to make streaming a one-liner.
- No fallback. When Anthropic has a 30-minute outage (it happens), your AI feature is dead. We use Vercel AI Gateway or custom router code to automatically fall back to a secondary provider (OpenAI or Google) without your users noticing.
- No prompt injection defense. User-controlled text gets concatenated directly into the system prompt, and an attacker extracts your hidden instructions or executes unintended actions. We use isolation patterns and output allowlists.
What you get
- The AI feature, working. Chat, generation, summarization, classification, RAG over your data — whatever you scoped.
- Prompt caching configured for Anthropic and OpenAI. Documented in your runbook.
- Per-user spend caps stored in your database, enforced on every call. Configurable thresholds.
- Streaming UI using Vercel AI SDK so responses build live as users watch.
- Cost monitoring with daily and monthly alerts wired to your email (optionally Telegram).
- Provider fallback via Vercel AI Gateway or custom router so an outage on one provider does not kill the feature.
- Prompt injection defense via input isolation and output allowlists for any action with side effects.
- Audit logs of every request and response, retained 30 days, so you can debug bad outputs.
Pricing
- Single chat feature — $2,500-4,000. Conversation history, streaming UI, cost controls. 1-2 weeks.
- RAG over your data — $5,000-8,000. Embeddings + vector search (Supabase pgvector or Pinecone) + retrieval prompt. 2-3 weeks.
- Multi-step agent — $8,000-12,000. Tool calling, workflow orchestration, human-in-the-loop confirmations. 3-4 weeks.
- Maintenance — $750/month. Prompt iteration, cost optimization, model upgrades when new ones ship. Cancel anytime.
Which provider should you use?
We default to Claude (specifically Claude 4.7 Opus for reasoning-heavy workloads, Sonnet 4.6 for general, Haiku 4.5 for cheap/fast) because the output quality on coding and reasoning tasks is meaningfully better in our experience. OpenAI is still the right choice for image generation (DALL-E, gpt-image-1) and for cases where you need the o-series reasoning models specifically.
For multi-provider deployments we use Vercel AI Gateway — single API key, model-agnostic strings like"anthropic/claude-opus-4-7", automatic failover, unified observability. Zero data retention by default. For most founders this is the right starting point, even if you only use one provider initially.
When NOT to hire us
- You have not actually tried the AI feature manually yet (just paste examples into the Claude or ChatGPT web UI). 30 minutes of manual testing tells you if the feature even works before you spend $5,000 building it.
- You want AI to "automate your business" without specifying what task. Scope it first — "summarize support tickets into a daily digest" is buildable; "AI for everything" is not.
- You expect the AI to be perfect. It will not be. Plan for 90-95% accuracy and have a path for the other 5-10% (human review, retry, error message).