Build: Developer Tool / API Product
Toggle features and choose options to customize your spec
Preset
Authentication Methods*
Access Control*
Multi-factor Authentication
Tradeoffs
Each provider requires an OAuth app registration and key rotation policy
Requires IdP partnership and XML-based protocol handling; significant integration work
Permission checks must be applied consistently across every data access path
API Type*
API Authentication*
Webhooks
Tradeoffs
Powerful for clients but requires schema design discipline; N+1 queries are a common pitfall
Enables third-party integrations but requires an authorization server and token management
Event ordering, deduplication, and retry logic become your responsibility
Billing Model*
Payment Processor*
Tradeoffs
Requires handling trial periods, dunning, proration, and cancellation flows
Must instrument every billable action and send metered events to billing provider
Less customizable checkout; Paddle acts as legal seller so you avoid VAT registration
Tracking Scope*
Analytics Provider*
Tradeoffs
User data is shared with vendor; may require GDPR consent flow
Significant storage cost; must redact sensitive fields (passwords, PII)
Full data ownership and unlimited retention, but requires infrastructure expertise
Delivery Guarantees*
Payload Security*
Customer Visibility
Tradeoffs
One slow consumer blocks subsequent events for the same resource
High write volume to log storage — plan for hot shards if a customer has thousands of endpoints
Slight CPU cost per delivery; negligible compared to network I/O
Rate Limit Algorithm*
What to Limit By*
Abuse Prevention Layer
Response Behavior*
Tradeoffs
False positives behind corporate NATs; attackers bypass with rotating proxies
Noisy-neighbor protection — one tenant cannot starve others
Allows bursts but requires a per-identity bucket state in Redis — higher memory footprint
Meaningful latency cost at the edge if the WAF is geographically distant from users
What to Log*
Storage Backend*
User-facing Surface
Tradeoffs
Read amplification — every authenticated read produces a log write
Two storage systems to operate and keep in sync; queries may need to federate
Tamper-evidence relies on DB role permissions — insufficient for some compliance regimes
Queue Backend*
Required Capabilities*
Failure & Durability*
Tradeoffs
Primary DB absorbs queue write load; row-level locks contend with application queries
Enqueue happens outside DB transaction — jobs can run for state that was rolled back
Additional table, polling worker, and idempotency discipline — the payoff is no duplicated side effects
Delivery Provider*
Deliverability Setup*
Templating Approach*
Tradeoffs
Vendor cost scales with volume; deliverability expertise comes included
Low per-email cost but you own deliverability operations (reputation, bounces, suppression)
Two sending configurations and domains to maintain — worth it for deliverability isolation
Summary
9 of 11 features enabled
Commonly added together
Gap analysis
Most developer apps include Notifications
Effort Estimate
10+ weeks
9 enabled features
Key Decisions
User System & Auth
Will this product be sold to businesses (B2B)?
If yes
Add SAML/SSO and RBAC. Enterprise procurement often requires both.
If no
Email + password plus one OAuth option covers the large majority of consumer use cases.
User System & Auth
Is this a security-sensitive application?
If yes
Enable TOTP MFA. Consider making it mandatory for privileged users.
If no
MFA is optional — offer it but do not require it to reduce friction.
User System & Auth
Email+password, passwordless, or SSO-only?
If yes
Passwordless (magic links or passkeys) eliminates password reset tickets and credential stuffing risk.
If no
Keep email+password as a universal fallback — OAuth outages should not lock users out.
User System & Auth
Do you need social providers (Google, GitHub, Apple)?
If yes
Add Google for B2C breadth; add GitHub for developer tools; add Apple only if you ship iOS (App Store requires it when you offer other social login).
If no
Skip social OAuth and avoid the app registration / key rotation overhead.
User System & Auth
Do you need SCIM provisioning?
If yes
Add SCIM alongside SAML — enterprise IT uses it to auto-provision/deprovision employees and map group membership to roles.
If no
Manual invite flows are fine until your first enterprise customer asks for SCIM in a security review.
User System & Auth
Should MFA be required, optional, or risk-based?
If yes
Risk-based (step up MFA on new device, new IP, or sensitive actions) gives security without friction on every login.
If no
Offer MFA as optional first; require it only for admins or on privileged actions.
User System & Auth
Which MFA factors will you support (TOTP, SMS, WebAuthn/passkeys, hardware keys)?
If yes
Prefer WebAuthn/passkeys and TOTP. Avoid SMS as a primary factor — SIM swapping is a real threat.
If no
TOTP alone (Google Authenticator, Authy) covers the vast majority of users with minimal implementation cost.
User System & Auth
Do you need device fingerprinting or trusted-device flows?
If yes
Remember trusted devices for 30 days to skip MFA; challenge on new device or changed fingerprint.
If no
Re-prompt MFA on every login — simpler and safer for low-volume or highly sensitive apps.
User System & Auth
Offer passkey-only sign-in?
If yes
Passkeys eliminate passwords entirely — use WebAuthn with platform authenticators. Still keep an email recovery path for lost devices.
If no
Offer passkeys as an optional second factor; users without compatible devices keep using passwords.
User System & Auth
Support staff impersonation of user accounts?
If yes
Add an impersonation flow that logs both the staff identity and the target user, with a visible banner in the impersonated session.
If no
Skip impersonation — instead build admin-side read views and support tooling that do not require acting as the user.
User System & Auth
Captcha or bot detection on signup?
If yes
Add hCaptcha or Cloudflare Turnstile on signup and password reset — invisible challenges avoid user friction.
If no
Skip captcha for internal tools or invite-only products where bot signups are not a realistic threat.
User System & Auth
Use lockout or rate-limit throttling for credential stuffing?
If yes
Prefer exponentially increasing rate limits per IP and per account over hard lockouts, which create support tickets and let an attacker deliberately lock out targeted users.
If no
If account takeover risk is low, a simple fixed rate limit (e.g., 10 attempts per 15 min) is sufficient.
User System & Auth
Allow multiple concurrent sessions per user?
If yes
Show active sessions in account settings with a revoke button — expected behavior for any multi-device product.
If no
Single-session apps (banking, compliance) should terminate old sessions on new login.
Public API & Webhooks
Will external developers build integrations or products on top of your API?
If yes
Invest in REST + OAuth 2.0 + basic webhooks. Good DX (docs, SDKs) is as important as the API itself.
If no
API keys with REST are sufficient for internal automation and simple integrations.
Public API & Webhooks
Do your customers need real-time event delivery to their systems?
If yes
Add webhooks with retry logic and delivery guarantees. Basic webhooks break under load.
If no
Skip webhooks or use basic webhooks: most integrations can poll or tolerate eventual consistency.
Public API & Webhooks
REST, GraphQL, or gRPC?
If yes
Default to REST — universal, cacheable, documented with OpenAPI. Add GraphQL for complex mobile/dashboard clients; gRPC only for internal service-to-service.
If no
REST alone covers 90% of public APIs — resist polyglot until a specific consumer need forces it.
Public API & Webhooks
Public API or internal-only?
If yes
Public: invest in OpenAPI docs, SDKs, a status page, and versioning — public APIs are a contract you cannot break.
If no
Internal: skip the polish and iterate fast — breaking changes are cheap.
Public API & Webhooks
Versioning: URL, header, or date-based?
If yes
URL versioning (/v1/, /v2/) is simplest and most discoverable. Date-based (Stripe-style) is best for many small changes over time.
If no
Header versioning is cleaner in theory but breaks exploration with curl and docs — avoid.
Public API & Webhooks
Do you ship SDKs in multiple languages or only an OpenAPI spec?
If yes
Generate SDKs from OpenAPI (Speakeasy, Fern, Stainless) — maintaining hand-written SDKs in 5 languages is a full-time job.
If no
Publish a clean OpenAPI spec and let customers generate their own — minimum viable path.
Public API & Webhooks
Auth: API keys, OAuth 2, or mutual TLS?
If yes
API keys for server-to-server, OAuth 2 for user-delegated access, mTLS for high-security B2B integrations.
If no
Start with API keys; add OAuth only when third parties need to act on behalf of users.
Public API & Webhooks
Do you need scoped keys (read-only, per-resource)?
If yes
Model scopes as a bitmask or list on the key record — lets customers issue narrowly-scoped tokens for integrations.
If no
Single-scope keys (full account access) are simpler but raise blast radius on leaks.
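As a rough illustration of the bitmask option above, the sketch below stores scopes as bit flags. The scope names and the `key_allows` helper are hypothetical, chosen only for this example.

```python
from enum import IntFlag

class Scope(IntFlag):
    # Hypothetical scope names; each occupies one bit of the stored mask.
    READ_ORDERS = 1
    WRITE_ORDERS = 2
    READ_INVOICES = 4
    WRITE_INVOICES = 8

def key_allows(granted: Scope, required: Scope) -> bool:
    """Return True only if every required bit is present in the key's mask."""
    return (granted & required) == required

# A narrowly scoped, read-only key; int(read_only_key) is what the key record stores.
read_only_key = Scope.READ_ORDERS | Scope.READ_INVOICES
```

A list of scope strings is easier to extend and to read in logs; a bitmask is more compact but caps the number of scopes at the integer width of the column.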
Public API & Webhooks
Is API usage metered for billing?
If yes
Count events per key in a metering store (Orb, Metronome) and never bill from analytics. Enforce at the gateway so usage-based limits apply before your app runs.
If no
Basic per-key request counters for rate limiting are enough.
Public API & Webhooks
Webhook callbacks or purely request/response?
If yes
Offer outbound webhooks — customers need push for anything they would otherwise have to poll. HMAC-sign payloads.
If no
Request/response is sufficient for synchronous integrations — offer polling endpoints for state.
Public API & Webhooks
Are bulk/batch endpoints required?
If yes
Ship a batch endpoint that accepts an array and returns per-item status — essential for import/export workflows.
If no
Per-item endpoints with good rate limits cover most cases.
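A minimal sketch of the per-item status response described above. The `create_widget` function and the result field names (`index`, `status`, `error`) are assumptions for illustration, not a standard shape.

```python
from typing import Any, Callable

def process_batch(items: list[dict], create: Callable[[dict], Any]) -> list[dict]:
    """Apply `create` to each item and report success or failure per item."""
    results = []
    for index, item in enumerate(items):
        try:
            results.append({"index": index, "status": "ok", "id": create(item)})
        except ValueError as exc:
            # One invalid item must not fail the rest of the batch.
            results.append({"index": index, "status": "error", "error": str(exc)})
    return results

def create_widget(item: dict) -> str:
    # Hypothetical creator used only for this example.
    if "name" not in item:
        raise ValueError("name is required")
    return f"wid_{item['name']}"
```

Returning the input index with each result lets clients match failures back to their own rows without relying on response order.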
Public API & Webhooks
Do you need async endpoints that return job IDs?
If yes
Return 202 Accepted with a job ID and a polling/webhook path — required for any operation > 30s (exports, imports, heavy queries).
If no
Synchronous responses are simpler — timeout budget capped at 30s.
Public API & Webhooks
Should responses support sparse fieldsets?
If yes
Add a `fields` query param (JSON:API style) or expose GraphQL — useful when payloads are large and mobile clients need only subsets.
If no
Return full resource representations — simpler and CDN-cacheable.
Public API & Webhooks
Pagination: cursor or offset?
If yes
Cursor-based (opaque token) — stable under inserts and scales to large result sets. Stripe-style.
If no
Offset pagination is fine only for small, stable datasets — breaks when items shift.
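The sketch below shows one way an opaque cursor can work, assuming rows are sorted by a unique, positive, ascending `id`. The function names and the cursor payload (`after_id`) are hypothetical.

```python
import base64
import json

def encode_cursor(last_id: int) -> str:
    """Wrap the last-seen id in an opaque, URL-safe token."""
    raw = json.dumps({"after_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]

def list_page(rows, cursor=None, limit=2):
    """Return (page, next_cursor); rows must be sorted by ascending unique id."""
    after_id = decode_cursor(cursor) if cursor else 0
    matching = [r for r in rows if r["id"] > after_id]
    page = matching[:limit]
    has_more = len(matching) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Because the cursor encodes a position in the sort key rather than a row count, deleting or inserting earlier rows between requests does not skip or repeat items. In a real database the filter would be a `WHERE id > :after_id ORDER BY id LIMIT :n` query.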
Public API & Webhooks
Will you expose a public playground (GraphiQL, Swagger UI)?
If yes
A hosted playground (Swagger UI, GraphiQL, or Scalar) dramatically improves DX for public APIs — host behind auth if the schema is sensitive.
If no
OpenAPI spec + curl examples in docs is enough for internal APIs.
Public API & Webhooks
Do partners need a sandbox environment?
If yes
Stand up a sandbox tenant with test data and test-mode API keys — required for any payments/banking integration and most enterprise deals.
If no
Production-only is fine for read-only or internal APIs.
Public API & Webhooks
Errors: RFC 7807 problem details or custom shape?
If yes
RFC 7807 (application/problem+json), since superseded by RFC 9457 with the same media type, is the standard: consistent across ecosystems and tool-friendly.
If no
A custom `{error: {code, message, details}}` shape is fine if you document it and keep it stable.
Public API & Webhooks
Rate limits: per key, per IP, or per endpoint?
If yes
Per-API-key as the primary axis, per-IP as anti-abuse, per-endpoint for expensive operations (AI, search). Return `X-RateLimit-*` headers.
If no
Per-key-only is enough for low-volume APIs — add IP limits at the edge (Cloudflare) later.
Payments & Billing
Is the revenue model recurring (SaaS)?
If yes
Choose subscription billing. Evaluate usage-based if pricing scales with consumption.
If no
One-time purchase is far simpler. Consider Stripe Checkout for a no-code option.
Payments & Billing
Is global VAT/sales tax compliance a concern?
If yes
Use Paddle as merchant of record — they handle tax across jurisdictions.
If no
Stripe gives more control; integrate TaxJar or Stripe Tax if needed later.
Payments & Billing
Do you want to offer a free trial without requiring a card upfront?
If yes
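Use an opt-in (no-card) trial: full access during the trial with a prompt to add a card at the end. A reverse trial is a variant that drops users to a free tier when the trial ends instead of locking them out. Expect higher signup conversion but lower trial-to-paid conversion.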
If no
Card-required trials filter out tire-kickers and produce 2–3x higher trial-to-paid rates. Stripe supports both via checkout.
Payments & Billing
Do you sell to enterprise customers with procurement processes (POs, net-30 terms)?
If yes
Support invoicing workflows (Stripe Invoicing or manual PDF invoices via finance). Self-serve credit-card checkout is insufficient at that ACV.
If no
Credit-card-only is simpler and covers all SMB/prosumer use cases.
Payments & Billing
Do you sell in markets where customers transact in non-USD currencies?
If yes
Enable multi-currency pricing in Stripe or Paddle. Price in local currency — EU/UK customers strongly prefer EUR/GBP over USD conversions.
If no
USD-only is simpler; add currencies only when a market demands it.
Payments & Billing
Will you close large B2B deals that need ACH or wire transfer (>$5k)?
If yes
Offer ACH or wire payment on invoices (for example Stripe bank transfers or ACH Direct Debit). Credit-card fees on large invoices are prohibitive.
If no
Card-only is fine for SMB and prosumer ticket sizes.
Payments & Billing
Do you have EU customers (SCA/3DS compliance required)?
If yes
Use Stripe Payment Intents (handles 3DS authentication automatically) or Paddle. Do not use raw Charges API — it predates SCA.
If no
Still use the modern Payment Intents API: it is the current Stripe integration path and keeps you ready if you later sell into SCA-regulated markets.
Payments & Billing
Is self-serve cancellation acceptable, or do you need "contact us to cancel"?
If yes
Offer self-serve cancellation via Stripe Customer Portal. California's Automatic Renewal Law requires an online cancellation path for many subscriptions sold online, and other jurisdictions are moving the same way.
If no
Contact-us friction increases short-term retention but damages NPS and is increasingly regulated. Think twice.
Payments & Billing
Do marketing or sales teams need to issue coupons and discounts?
If yes
Use Stripe Coupons / Paddle Discounts. Support percentage and fixed-amount discounts with expiry and redemption limits.
If no
Skip — discount UX adds complexity and is rarely needed outside marketing-led motions.
Payments & Billing
Do users frequently upgrade/downgrade mid-cycle?
If yes
Enable proration in Stripe (proration_behavior: create_prorations). Immediate upgrade + end-of-period downgrade is the customer-friendly pattern.
If no
Wait-until-renewal plan changes are simpler; skip proration logic.
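To make the mid-cycle upgrade arithmetic concrete, here is a simplified day-based sketch. The function name and the whole-day granularity are assumptions; a billing provider's own proration may use finer time units and different rounding.

```python
def prorated_upgrade_charge(old_price_cents: int, new_price_cents: int,
                            days_in_period: int, days_remaining: int) -> int:
    """Charge due today when upgrading with `days_remaining` left in the period."""
    # Credit for the unused part of the old plan.
    unused_credit = old_price_cents * days_remaining // days_in_period
    # Cost of the new plan for the same remaining time.
    new_cost = new_price_cents * days_remaining // days_in_period
    return new_cost - unused_credit
```

For example, moving from a $10 plan to a $30 plan halfway through a 30-day period yields a $15 charge minus a $5 credit, so $10 is due immediately.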
Payments & Billing
Do you have >1000 paying customers or expect significant failed-payment volume?
If yes
Enable Stripe Smart Retries (free) plus a custom dunning email sequence (day 0, 3, 7, 14). Recovers 30–50% of failed payments.
If no
Default Stripe retries are enough; add custom dunning once failed payments become a meaningful revenue leak.
Payments & Billing
Should customers manage their own billing (payment methods, invoices, plan changes)?
If yes
Use Stripe Customer Portal or Paddle Retain — pre-built UI, handles tax/invoices/cancellation. Massive support-ticket reducer.
If no
Build a minimal billing page and route the rest to support — only viable at low customer counts.
Payments & Billing
Do you connect buyers and sellers and need to split payments (marketplace)?
If yes
Use Stripe Connect (Standard or Express accounts). Do not build split payments yourself — tax forms, KYC, and payouts are legal minefields.
If no
Standard direct charges are simpler and correct for first-party sales.
Payments & Billing
Do you need to issue refunds regularly with reason tracking and approval flows?
If yes
Build an internal refund tool that captures reason, links to audit log, and uses Stripe Refunds API. Required for support scale and compliance.
If no
Manual refunds through the Stripe dashboard are fine until volume demands tooling.
Payments & Billing
Are you tempted to store raw card numbers to avoid re-entry (which puts you in full PCI DSS scope)?
If yes
Do not. Use Stripe Payment Methods or Paddle saved cards: tokenized references keep raw card data, and most of the PCI DSS burden, off your systems. The compliance overhead of storing cards yourself is massive.
If no
Good — always tokenize. Stripe Elements, Paddle Checkout, or hosted checkout keeps card data off your servers entirely.
Analytics & Tracking
Do you need to track individual user behavior (not just aggregate)?
If yes
Enable user events and choose a third-party provider with identity stitching.
If no
Page views with a self-hosted provider (Plausible) are sufficient and privacy-friendly.
Analytics & Tracking
Are you in a privacy-sensitive market (healthcare, finance, EU users)?
If yes
Use self-hosted analytics to avoid third-party data processors and simplify GDPR compliance.
If no
Third-party providers give better tooling with minimal compliance overhead.
Analytics & Tracking
Product analytics (behavior) vs business analytics (revenue/funnels)?
If yes
Product analytics: Amplitude/Mixpanel/PostHog for events, funnels, retention. Business: pair with a warehouse (BigQuery) + BI (Metabase).
If no
If you only need one, pick product analytics first — behavior insights drive most decisions.
Analytics & Tracking
Self-serve exploration (SQL/dashboards) or fixed reports?
If yes
Pipe events into BigQuery/ClickHouse and layer Metabase or Hex — lets non-engineers write ad-hoc queries.
If no
Canned dashboards in Amplitude/Mixpanel are faster to ship and cover the 80% case.
Analytics & Tracking
Do data scientists need raw warehouse access?
If yes
Use Segment/RudderStack or a reverse-ETL path to land raw events in BigQuery/Snowflake with full fidelity.
If no
Stay with a hosted analytics tool — raw-access is dead weight for product teams.
Analytics & Tracking
Is real-time dashboarding required or is 24h acceptable?
If yes
Choose a streaming stack (ClickHouse + Kafka, or PostHog) — most warehouse-based setups have 1–24h latency.
If no
Daily batch (warehouse + dbt) is cheaper and simpler for most reporting.
Analytics & Tracking
Do you need server-side tracking in addition to client-side?
If yes
Add server-side events for revenue, subscriptions, and auth — resistant to ad-blockers and more reliable than JS.
If no
Client-side only is fine for early-stage product analytics on consumer apps.
Analytics & Tracking
Will you use a CDP (Segment, RudderStack) for fan-out?
If yes
A CDP lets you send events to multiple destinations (analytics, CRM, warehouse) from a single pipe — worth it once you have 3+ destinations.
If no
Direct SDK integrations are cheaper and simpler with one or two destinations.
Analytics & Tracking
Do you need cohorts/funnels/retention (Amplitude, Mixpanel)?
If yes
Mixpanel/Amplitude/PostHog are purpose-built for this — skip the warehouse-BI detour.
If no
Plausible/Umami are sufficient for pageview and top-line numbers.
Analytics & Tracking
Are you subject to GDPR/CCPA opt-out requirements?
If yes
Ship a consent banner (Iubenda, OneTrust) and gate all trackers behind opt-in. Cookieless analytics such as Plausible can often skip the banner in EU cases because they set no cookies and collect no personal identifiers.
If no
Still honor Do Not Track, but a global banner is not required.
Analytics & Tracking
Must PII be redacted from event streams automatically?
If yes
Add a redaction middleware at the SDK or CDP layer — allowlist fields, block email/phone/tokens before they leave the client.
If no
Still advisable as defense-in-depth, but not a hard gate.
Analytics & Tracking
Do you need session replay (FullStory, LogRocket)?
If yes
PostHog bundles replay with analytics; FullStory/LogRocket are purpose-built — budget for storage and mandatory PII redaction.
If no
Heatmaps or funnels cover most UX debugging without the privacy surface.
Analytics & Tracking
Will events be used for usage-based billing?
If yes
Use a metering-specific store (Orb, Metronome) or a dedicated events table with idempotency keys — not your analytics tool.
If no
Analytics tools are not billing-grade — never bill from them directly.
Analytics & Tracking
Do you need attribution tracking (multi-touch, last-touch)?
If yes
Add UTM capture + a session/touchpoint model — Amplitude and Mixpanel both offer multi-touch attribution views.
If no
Skip it until marketing asks — attribution is hard to get right and easy to mislead with.
Analytics & Tracking
Should events be sampled at high volume?
If yes
Sample non-revenue events (scroll, hover) at 1–10% — keep revenue and conversion events at 100%.
If no
Full-fidelity is fine until event volume hits provider-pricing tiers.
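One way to sample consistently is to hash a stable event ID so the keep/drop decision is deterministic across processes. This is a sketch: the `ALWAYS_KEEP` set and the event names in it are hypothetical.

```python
import hashlib

# Hypothetical names of revenue and conversion events that are never sampled.
ALWAYS_KEEP = {"purchase", "signup"}

def keep_event(name: str, event_id: str, sample_rate: float) -> bool:
    """Keep all critical events; keep roughly `sample_rate` of the rest."""
    if name in ALWAYS_KEEP:
        return True
    digest = hashlib.sha256(event_id.encode()).digest()
    # Map the first 8 bytes to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Record the sample rate alongside each kept event so downstream queries can scale counts back up.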
Analytics & Tracking
Do you need to join analytics with production data regularly?
If yes
A warehouse-based stack (BigQuery + dbt) lets you join events with subscriptions, users, and revenue, which hosted tools cannot do.
If no
Hosted analytics is enough; skip the warehouse overhead.
Analytics & Tracking
Do customers see their own analytics (customer-facing reporting)?
If yes
Use an embedded-analytics tool (Metabase embedded, Cube.dev, Explo) — do not expose your internal dashboards.
If no
Internal analytics stays internal — simpler security model.
Analytics & Tracking
Do you need A/B test instrumentation?
If yes
PostHog, Statsig, or LaunchDarkly Experiments wire flags + analytics together — do not roll your own significance testing.
If no
Skip until you have enough traffic for tests to reach significance.
Webhooks
Do any of your events drive financial or provisioning actions on the consumer side?
If yes
Choose at-least-once delivery, document a stable event ID for deduplication, and expose a delivery log.
If no
At-most-once may be acceptable but is rarely worth the simplification.
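At-least-once delivery means consumers will occasionally see the same event twice, so the stable event ID is what makes handling safe. A minimal consumer-side sketch follows, assuming each event carries an `id` field; the in-memory set stands in for a durable store such as a unique-keyed table.

```python
class Deduplicator:
    """Apply each event at most once, keyed on its stable event ID."""

    def __init__(self):
        self._seen = set()  # stand-in for a durable store

    def handle(self, event: dict, apply) -> bool:
        if event["id"] in self._seen:
            return False  # duplicate delivery: acknowledge without re-applying
        apply(event)
        # Mark as seen only after apply succeeds, so a failed attempt is retried.
        self._seen.add(event["id"])
        return True
```

In production the "check, apply, mark" steps should share one transaction so a crash between them cannot produce a double side effect.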
Webhooks
Does event order matter to downstream consumers?
If yes
Use ordered delivery with per-resource partitioning. Accept head-of-line blocking.
If no
Unordered at-least-once is simpler and faster under load.
Webhooks
Should you also offer pull/polling as an alternative to push?
If yes
Expose an events API with cursor pagination alongside webhooks. Handy for consumers behind firewalls or ones that want to catch up after downtime.
If no
Push-only keeps surface area small; customers recover via the replay button.
Webhooks
Are webhook payloads HMAC-signed with a per-endpoint secret?
If yes
Follow the Stripe pattern — HMAC-SHA256 over timestamp + body, with the signature in a dedicated header. Publish sample verification code in 3+ languages.
If no
Unsigned webhooks are a security bug — always sign, even on internal endpoints.
Webhooks
Do signed payloads include replay protection (nonce or timestamp tolerance)?
If yes
Include a timestamp in the signed bytes; reject on the consumer side if skew exceeds 5 minutes. Document the tolerance window.
If no
Without timestamp enforcement, any captured webhook can be replayed forever — not acceptable.
Webhooks
Can customers filter which events a given endpoint receives?
If yes
Let them subscribe per event type (order.created, invoice.paid). Reduces their noise and your outbound volume.
If no
Fire every event to every endpoint — simpler but wasteful at scale.
Webhooks
Do customers need a dead-letter / failed-delivery view they can inspect and replay?
If yes
Build a parked-events queue per endpoint with a "replay" action. Cuts support load dramatically.
If no
Without it, every failed delivery becomes a support ticket.
Webhooks
Should customers see the last N delivery attempts with response bodies?
If yes
Retain 30–90 days per endpoint with status, response code, and latency. Absolute table stakes for any B2B webhook product.
If no
Customers will open tickets asking "did you send it?" — just build the log.
Webhooks
Do you need per-endpoint rate limiting / concurrency caps?
If yes
Cap in-flight deliveries per endpoint (e.g. 4 concurrent) to prevent one slow consumer from starving workers. Critical for multi-tenant fairness.
If no
Unbounded concurrency invites a DoS from your own system when a consumer slows down.
Webhooks
Can customers register multiple endpoints per event type?
If yes
Fan out to every active subscription. Useful for dev/staging/prod mirrors and third-party integrations.
If no
One endpoint per event type is simpler but frustrates customers with multiple consumers.
Webhooks
Do you validate endpoint URLs with a challenge/response before activating?
If yes
POST a challenge to the URL and require echoing a signed token — prevents typos and unauthorized URLs. Standard pattern (Slack Events API).
If no
Mis-entered URLs become silent delivery failures the customer can't diagnose.
Webhooks
Do any enterprise customers require mTLS on outbound deliveries?
If yes
Plan per-endpoint client-cert management and renewal; offload to a proxy (Envoy) that handles cert lifecycle.
If no
HMAC signatures + IP allowlist cover 99% of enterprise security review.
Webhooks
Do you ship SDK helpers to verify signatures in customer languages?
If yes
Publish verifyWebhook(body, header, secret) in Node, Python, Ruby, Go, PHP. Reduces integration bugs dramatically.
If no
Customers will implement signature verification wrong — expect security tickets.
Webhooks
Do webhook payload schemas need explicit versioning?
If yes
Either subscribe-per-version (Stripe style) or additive-only schemas. Retrofitting versioning after launch is painful.
If no
Lock the schema early and commit to additive-only changes — no renames, no removals.
Rate Limiting & Abuse Prevention
Do you have unauthenticated endpoints (signup, login, public API)?
If yes
Add per-IP limits on those endpoints plus CAPTCHA on threshold. Assume credential-stuffing is attempted on day one.
If no
Per-user limits on authenticated APIs are sufficient.
Rate Limiting & Abuse Prevention
Do different customer tiers pay for different rate limits?
If yes
Key limits on the API key, with thresholds configured per plan; expose a usage endpoint.
If no
A single default limit keeps configuration simple.
Rate Limiting & Abuse Prevention
Do you expect adversarial traffic (credential stuffing, scraping, spam)?
If yes
Use sliding window or token bucket. A fixed window lets a client send up to twice the limit by straddling a window boundary. Pair with WAF and bot detection.
If no
Fixed-window with Redis INCR + EXPIRE is cheap, simple, and sufficient.
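The boundary weakness of fixed windows is easy to see in a small in-memory model. This sketch is a stand-in for the Redis INCR + EXPIRE approach, not a Redis client; the class name is hypothetical and old windows are never evicted here.

```python
from collections import defaultdict

class FixedWindowLimiter:
    """Count requests per (key, window index); in-memory stand-in for Redis."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        self.counts[(key, window)] += 1  # equivalent of INCR on a per-window key
        return self.counts[(key, window)] <= self.limit
```

With a limit of 10 per 60 seconds, 10 requests at t=59.9 and 10 more at t=60.1 are all allowed, so 20 requests pass within a fraction of a second.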
Rate Limiting & Abuse Prevention
Do you have legitimate burst patterns (batch imports, bulk API calls)?
If yes
Token bucket is the right model — allows bursts while enforcing a sustained rate. The standard for commercial API gateways.
If no
A sliding-window counter is simpler and has lower memory overhead.
Rate Limiting & Abuse Prevention
Do you run multiple server nodes behind a load balancer?
If yes
Use centralized Redis for rate-limit state (Upstash Ratelimit, redis-cell). With per-node local counters, a client whose requests are spread across N nodes can get up to N×limit.
If no
In-process counters are fine for single-node deployments and dramatically cheaper.
Rate Limiting & Abuse Prevention
Should users get a warning before they hit a hard limit?
If yes
Emit soft-limit warnings via response headers (X-RateLimit-Remaining) and optionally an in-app notification when usage >80%. Prevents angry support tickets.
If no
Silent throttling at the hard limit is simpler but worse UX — only acceptable for internal APIs.
Rate Limiting & Abuse Prevention
Do you have enterprise customers who negotiate custom limits?
If yes
Build an admin override table keyed on tenant/API-key. Do not hardcode limits — operations team will need to raise them without deploys.
If no
Static per-tier limits in config are simpler and easier to reason about.
Rate Limiting & Abuse Prevention
Can a single user enqueue unbounded background jobs (imports, scrapes, AI calls)?
If yes
Rate-limit the enqueue side separately from the API side. Prevents queue-flooding attacks that bypass request-layer limits.
If no
API-layer limits are sufficient; background jobs are produced by your own code only.
Rate Limiting & Abuse Prevention
Do you have legitimate short bursts you want to allow (e.g. pagination fan-out)?
If yes
Use token bucket with a burst allowance (bucket capacity larger than the number of tokens refilled per second). Clients can drain the bucket quickly, then settle at the refill rate.
If no
A flat rate is simpler — bursts are a policy decision, not a default.
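A minimal token bucket sketch follows, with time passed in explicitly so the behavior is easy to trace. The class and parameter names are hypothetical, and a shared deployment would keep `tokens` and `updated_at` in Redis per identity rather than in process memory.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, then a sustained `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `cost` argument also covers cost-weighted limiting: an expensive endpoint can charge more than one token per request.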
Rate Limiting & Abuse Prevention
Are you exposed to L3/L4 DDoS (public API, unauthenticated endpoints)?
If yes
Put Cloudflare, AWS Shield, or Fastly in front of your origin. Application-layer rate limiting cannot absorb network-layer floods.
If no
Application-layer limits are sufficient for authenticated-only APIs.
Rate Limiting & Abuse Prevention
Do some endpoints cost 100x more than others (AI calls, complex queries, exports)?
If yes
Rate-limit by computational cost (credits/tokens per request) not by request count. Pricing and abuse protection align naturally.
If no
Request-count limits are simpler and sufficient when endpoint costs are roughly uniform.
Rate Limiting & Abuse Prevention
Do specific features (uploads, AI generations) have their own cost or quota model?
If yes
Add per-feature limits in addition to global ones. A user at their upload quota should still be able to read the API.
If no
Per-route limits are enough and keep configuration centralized.
Rate Limiting & Abuse Prevention
Is this a public developer API with SDKs / third-party integrations?
If yes
Always return X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After headers. Well-behaved clients need them to back off correctly.
If no
Minimum viable is Retry-After on 429 responses — detailed headers are nice-to-have for internal APIs.
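A sketch of assembling these headers is shown below. The `X-RateLimit-*` names are a widely used convention rather than a formal standard, and this example assumes `X-RateLimit-Reset` carries a Unix epoch time; some APIs send seconds-until-reset instead, so document whichever you pick.

```python
import math

def rate_limit_headers(limit: int, remaining: int, reset_epoch: float, now: float) -> dict:
    """Build rate-limit response headers; add Retry-After once the budget is spent."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if remaining <= 0:
        # Retry-After is in whole seconds; round up so clients never retry early.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers
```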
Rate Limiting & Abuse Prevention
Do you have health checks, metrics endpoints, or internal traffic hitting rate-limited routes?
If yes
Exempt health checks and internal service-to-service calls by IP allowlist or dedicated service tokens. Otherwise monitoring will trip your own limits.
If no
Default behavior — all traffic counts — is simpler and auditable.
Rate Limiting & Abuse Prevention
Do you have authentication endpoints at risk of credential stuffing?
If yes
Rate-limit failed logins separately (per-account + per-IP), with exponential backoff and a temporary lockout after N attempts. Combine with CAPTCHA on threshold.
If no
General per-IP limits are insufficient for auth — always treat login and password reset as a separate budget.
Audit Logging
Are you pursuing SOC 2, ISO 27001, or a similar audit?
If yes
Plan for append-only storage with >= 1 year retention, authentication + admin + mutation scopes, and an auditor-facing export.
If no
Start with auth events in an append-only table; expand scope when a customer or incident forces it.
Audit Logging
Do customers need to answer "who did X?" inside your product?
If yes
Build an admin audit log UI — you will be asked for one on every enterprise deal.
If no
Internal-only access is fine until you hear the first request.
Audit Logging
Must logs be tamper-evident (hash chain / signing)?
If yes
Chain each log entry by hashing (prev_hash + payload) or sign with an HSM-backed key. Auditors for SOX- or HIPAA-scoped systems often expect this level of trail integrity.
If no
Insert-only DB permissions on an append-only table are sufficient for most internal use.
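The hash-chain idea can be sketched in a few lines. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration; the payload is serialized canonically so the same content always hashes the same way.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()

def append_entry(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev_hash": prev,
                "hash": entry_hash(prev, payload)})

def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

A chain only detects tampering by someone who cannot rewrite every later hash, so the latest hash should be anchored periodically somewhere the database writer cannot reach.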
Audit Logging
Retention: 30d, 1y, or 7y+?
If yes
Long retention (1y+): tier cold logs to S3/Glacier with lifecycle rules. 7y+ is a SOX/healthcare signal — plan storage costs.
If no
30–90 days in a hot store (primary DB or ClickHouse) covers security review timelines for non-regulated apps.
Audit Logging
Should logs include before/after diffs on updates?
If yes
Capture a JSON diff (jsondiffpatch or a custom field-level diff) — essential for customer-facing "who changed this?" questions.
If no
Log action + resource ID only; cheaper but limits forensic value.
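A custom field-level diff can be as small as the sketch below, which assumes flat dictionaries; nested structures would need recursion or a library such as jsondiffpatch. The function name and output shape are hypothetical.

```python
def field_diff(before: dict, after: dict) -> dict:
    """Return {field: {"before": old, "after": new}} for every changed field."""
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        # Note: a missing field and an explicit None are treated the same here.
        if old != new:
            changes[key] = {"before": old, "after": new}
    return changes
```

Run any PII redaction on the diff before it is written, since changed values often include exactly the fields you want to keep out of the log.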
Audit Logging
Log reads (access logs) or only writes?
If yes
Sample or scope to sensitive resources only, since full read logging often produces 100x the write volume. Systems handling HIPAA-covered data generally need access logs for that data.
If no
Write-only logging is the default — covers the overwhelming majority of compliance and forensics needs.
Audit Logging
Store audit logs separately from operational DB?
If yes
Stream to a dedicated store (ClickHouse, S3, or SIEM) — isolates audit traffic from app queries and allows differing retention/permissions.
If no
An append-only table in the primary DB is simpler and sufficient at early scale.
Audit Logging
Cryptographic signing of entries required?
If yes
Sign each entry with an HSM-backed key (AWS KMS) — provides non-repudiation beyond hash chaining.
If no
Hash-chain or insert-only permissions are enough until an auditor asks.
Audit Logging
Exportable as SIEM-compatible (CEF, JSON)?
If yes
Offer structured JSON export and optionally CEF/LEEF for enterprise SIEMs (Splunk, QRadar) — usually gated behind a plan.
If no
A simple CSV export covers most self-serve customers.
Audit Logging
Real-time alerts on specific events?
If yes
Route high-signal events (privilege escalation, mass delete) through a streaming pipeline (Kinesis/Kafka) into alerting — PagerDuty or customer Slack.
If no
Batch nightly review is enough for low-stakes environments.
Audit Logging
Distinguish system actions from user actions?
If yes
Model actor as a typed union (user | system | api_key | admin) — required for any meaningful forensic query.
If no
A single actor_id field works short-term but becomes ambiguous fast — avoid.
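One way to model the typed actor union, sketched with dataclasses. The field names and the flattened record shape are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UserActor:
    user_id: str


@dataclass(frozen=True)
class SystemActor:
    component: str  # e.g. the name of a cron job or worker


@dataclass(frozen=True)
class ApiKeyActor:
    key_id: str
    owner_user_id: str


@dataclass(frozen=True)
class AdminActor:
    admin_id: str


Actor = Union[UserActor, SystemActor, ApiKeyActor, AdminActor]


def actor_to_record(actor: Actor) -> dict:
    """Flatten an actor into columns that can be filtered on by actor_type."""
    if isinstance(actor, UserActor):
        return {"actor_type": "user", "actor_id": actor.user_id}
    if isinstance(actor, SystemActor):
        return {"actor_type": "system", "actor_id": actor.component}
    if isinstance(actor, ApiKeyActor):
        return {"actor_type": "api_key", "actor_id": actor.key_id,
                "owner_user_id": actor.owner_user_id}
    if isinstance(actor, AdminActor):
        return {"actor_type": "admin", "actor_id": actor.admin_id}
    raise TypeError(f"unknown actor type: {type(actor).__name__}")
```

With an explicit `actor_type` column, a query such as "all mutations made by API keys last week" does not depend on guessing from ID prefixes.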
Audit Logging
Log IP and user-agent on every action?
If yes
Capture IP + user-agent + geo on every event — standard for security review and fraud investigations.
If no
Auth events only is the bare minimum; expect to backfill later.
Audit Logging
Retain deleted resource IDs in logs indefinitely?
If yes
Keep resource IDs forever — critical for "what happened to record X?" questions after deletion.
If no
Honor GDPR right-to-erasure by tombstoning PII but preserving action records with hashed IDs.
Audit Logging
Redact/tokenize PII in log bodies?
If yes
Run an allowlist + regex redaction pass at the producer before write — tokens, emails, card numbers never land in the log store.
If no
Acceptable only if logs never leave your trust boundary — avoid.
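A rough sketch of an allowlist plus regex redaction pass run at the producer. The allowed field names and the three patterns are assumptions; they are deliberately simple and would need tuning against real payloads.

```python
import re

# Assumed allowlist: any field not named here is dropped before the write.
ALLOWED_FIELDS = {"action", "resource_id", "status", "message"}

TOKEN_RE = re.compile(r"\bbearer\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact_text(text):
    text = TOKEN_RE.sub("[TOKEN]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub("[CARD]", text)


def redact_event(event):
    clean = {}
    for key, value in event.items():
        if key not in ALLOWED_FIELDS:
            continue  # unknown fields are dropped, not passed through
        clean[key] = redact_text(value) if isinstance(value, str) else value
    return clean
```

The allowlist does most of the work; the regexes are a second line of defence for free-text fields, and the card pattern will also mask any other long digit run.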
Audit Logging
Log admin impersonation with both identities?
If yes
Record both real_actor and impersonated_user on every event during an impersonation session — required for SOC 2 and customer trust.
If no
Single-actor logs make it impossible to tell who really acted — always log both.
Audit Logging
Is write-once storage (S3 Object Lock) a compliance need?
If yes
Stream logs to S3 with Object Lock compliance mode — cheapest credible WORM store. Neither you nor an attacker can rewrite.
If no
Insert-only DB table is enough until an auditor requires immutable storage.
Background Jobs & Queues
Does any job charge money, send external messages, or call a paid API?
If yes
Use a transactional outbox and make handlers idempotent. Store an idempotency key on the job.
If no
Standard retry + DLQ is sufficient.
Background Jobs & Queues
Do you already run Redis or a broker?
If yes
Use it for jobs — a second persistence dependency is rarely justified.
If no
Start with a database-backed queue; migrate only when volume demands it.
Background Jobs & Queues
Do you need scheduled/cron jobs in addition to on-demand enqueues?
If yes
Enqueue from a single scheduler process (not per-worker cron) to avoid duplicates in a horizontally-scaled deployment.
If no
Pure on-demand enqueue is simpler — add scheduled capability only when you actually have recurring jobs.
Background Jobs & Queues
Do you need exactly-once processing, or is at-least-once delivery sufficient?
If yes
Exactly-once effects require a transactional outbox plus idempotent handlers; no library gives it to you for free.
If no
At-least-once with idempotent handlers is the pragmatic production default.
Background Jobs & Queues
Do you have mixed-priority workloads (user-visible vs batch)?
If yes
Use at least two queues (default, bulk) with separate worker pools so a long batch job never starves user-triggered work.
If no
A single queue is simpler and fine for homogeneous workloads.
Background Jobs & Queues
Are your handlers idempotent by contract?
If yes
Aggressive retries are safe — store an idempotency key per job and dedupe on handler entry.
If no
Lean on a transactional outbox, and accept that retries will sometimes double-invoke side effects unless you add idempotency keys.
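Deduping on handler entry can be sketched like this. The class and its in-memory dictionary are stand-ins (assumptions) for a durable table with a unique constraint on the key.

```python
class IdempotentRunner:
    """Runs a handler at most once per idempotency key and replays the result."""

    def __init__(self):
        # Stand-in for durable storage; a real system needs the check and the
        # write to be atomic, for example via a unique constraint on the key.
        self._results = {}

    def run(self, idempotency_key, handler, *args):
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # retry: skip the side effect
        result = handler(*args)
        self._results[idempotency_key] = result
        return result
```

A retried job with the same key returns the stored result instead of charging or sending a second time.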
Background Jobs & Queues
Do you need DAG pipelines (jobs that spawn dependent jobs)?
If yes
Use a durable workflow engine (Temporal, Inngest, BullMQ Flows) — rolling your own DAG orchestration is a year-long tarpit.
If no
Flat enqueue is simpler and covers the majority of use cases.
Background Jobs & Queues
Do you need a dead-letter queue for failures?
If yes
Any production queue needs a DLQ with alerting on depth growth — silent job failure is a common outage source.
If no
Skip only for best-effort one-off jobs where losing the job is acceptable.
Background Jobs & Queues
Do you need per-tenant queue isolation for noisy neighbors?
If yes
Shard queues by tenant or add per-tenant concurrency caps so one customer bursting to 10k jobs does not stall everyone else.
If no
A shared queue is fine in single-tenant or low-variance workloads.
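A small in-memory sketch of per-tenant concurrency caps with round-robin selection across tenants. The class and method names are hypothetical; a production queue would persist this state.

```python
from collections import defaultdict, deque


class TenantFairQueue:
    def __init__(self, per_tenant_cap):
        self.cap = per_tenant_cap
        self.pending = defaultdict(deque)  # tenant -> waiting jobs
        self.running = defaultdict(int)    # tenant -> jobs currently executing
        self.order = deque()               # tenants that have waiting jobs

    def enqueue(self, tenant, job):
        if not self.pending[tenant]:
            self.order.append(tenant)  # tenant becomes eligible for selection
        self.pending[tenant].append(job)

    def next_job(self):
        """Return (tenant, job) for the next runnable job, or None."""
        for _ in range(len(self.order)):
            tenant = self.order.popleft()
            if self.running[tenant] < self.cap:
                job = self.pending[tenant].popleft()
                self.running[tenant] += 1
                if self.pending[tenant]:
                    self.order.append(tenant)  # back of the line
                return tenant, job
            self.order.append(tenant)  # at its cap; let other tenants go first
        return None

    def complete(self, tenant):
        self.running[tenant] -= 1
```

A tenant that bursts only ever occupies `per_tenant_cap` workers, so other tenants' jobs keep flowing.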
Background Jobs & Queues
Do long-running jobs need to be cancellable mid-run?
If yes
Pass a cancellation token through the handler and checkpoint progress so cancellation is responsive without data loss.
If no
If jobs complete quickly, retry-on-failure is simpler than implementing graceful cancellation.
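A sketch of a cancellable handler that checkpoints after each unit of work. Here `threading.Event` plays the role of the cancellation token and a plain dictionary stands in for durable checkpoint storage; both are assumptions.

```python
import threading


def process_rows(rows, cancel, checkpoint, handle_row):
    """Process rows in order, resuming from the last checkpoint."""
    start = checkpoint.get("next_index", 0)
    for i in range(start, len(rows)):
        if cancel.is_set():
            return "cancelled"  # stop between rows, never mid-row
        handle_row(rows[i])
        checkpoint["next_index"] = i + 1  # persist progress after each row
    return "done"
```

Checking the token between rows keeps cancellation responsive, and the checkpoint means a later run resumes where the cancelled one stopped.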
Background Jobs & Queues
Do you need per-job-type retry and backoff configuration?
If yes
Different failure modes need different backoff — network errors retry fast, rate-limit errors retry slow. Configure per job class.
If no
A single global retry policy (5 attempts, exponential backoff) is the pragmatic default.
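Per-class retry configuration might look like the sketch below. The class names, attempt counts, and delays are illustrative assumptions, with the 5-attempt exponential policy as the fallback; jitter is added so retries do not synchronize.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    base_seconds: float
    max_seconds: float

    def delay(self, attempt, rng=random.random):
        """Delay after failed attempt number `attempt` (1-based), full jitter."""
        ceiling = min(self.max_seconds, self.base_seconds * 2 ** (attempt - 1))
        return rng() * ceiling


DEFAULT_POLICY = RetryPolicy(max_attempts=5, base_seconds=1.0, max_seconds=60.0)

POLICIES = {
    # Transient network failures: retry quickly and often.
    "network_error": RetryPolicy(max_attempts=8, base_seconds=0.5, max_seconds=30.0),
    # Upstream rate limits: back off slowly so the quota can recover.
    "rate_limited": RetryPolicy(max_attempts=5, base_seconds=30.0, max_seconds=900.0),
}


def next_delay(job_class, attempt, rng=random.random):
    """Seconds to wait before the next try, or None when attempts are exhausted."""
    policy = POLICIES.get(job_class, DEFAULT_POLICY)
    if attempt >= policy.max_attempts:
        return None  # give up and route the job to the dead-letter queue
    return policy.delay(attempt, rng)
```

Passing `rng` explicitly keeps the schedule testable; in production the default random source supplies the jitter.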
Background Jobs & Queues
Are jobs CPU-bound (heavy compute) rather than IO-bound (external calls)?
If yes
CPU-bound: use a worker pool sized to core count. Avoid async in the same process — it will not help and may hurt.
If no
IO-bound: use async/concurrent workers to maximize throughput on waiting time.
Background Jobs & Queues
Do you need queue-depth and worker-lag observability?
If yes
Emit per-queue depth, processing latency, retry count, and DLQ size to Prometheus/Datadog — and alert on them.
If no
The built-in queue dashboard (Sidekiq Web, BullMQ Board) is enough for small teams.
Background Jobs & Queues
Do you need to persist job results for later retrieval?
If yes
Store results in a separate results table keyed by job ID — clients poll or receive webhook/SSE when done.
If no
Fire-and-forget jobs are simpler; only persist results when a user UI depends on them.
Background Jobs & Queues
Do job payloads contain sensitive data?
If yes
Encrypt payloads at rest (envelope encryption with KMS) — queue storage is usually less hardened than your primary DB.
If no
Plaintext payloads are fine for internal, non-PII work.
Background Jobs & Queues
Should similar jobs be batched for efficiency?
If yes
Coalesce jobs (e.g., "send digest for user X") within a short window — a single batch handler beats N individual invocations for I/O.
If no
Per-job execution is simpler to reason about and debug.
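Coalescing can be sketched as grouping jobs per user into fixed windows. The `(timestamp, user_id, item)` tuple shape and the window rule (a window opens at the user's first job) are assumptions for illustration.

```python
def coalesce(jobs, window_seconds):
    """Group (timestamp, user_id, item) jobs into one batch per user per window."""
    batches = []
    open_batch = {}  # user_id -> the batch currently collecting items
    for ts, user_id, item in sorted(jobs, key=lambda j: j[0]):
        current = open_batch.get(user_id)
        if current is None or ts - current["start"] >= window_seconds:
            # First job for this user, or the previous window has closed.
            current = {"user_id": user_id, "start": ts, "items": []}
            open_batch[user_id] = current
            batches.append(current)
        current["items"].append(item)
    return batches
```

Each returned batch maps to one handler invocation, for example a single digest email covering every item in the window.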
Background Jobs & Queues
Do you need an admin UI to list, retry, and cancel jobs?
If yes
Mount the queue library dashboard (Sidekiq Web, Oban Web, BullMQ Board) behind admin auth — zero-effort ops leverage.
If no
CLI tools and logs are enough for a small team; add UI when non-engineers need to investigate job failures.
Transactional Email
Will you ever send marketing email (newsletters, promotions) from the same brand?
If yes
Plan separate transactional and marketing streams from the start.
If no
A single stream is simpler; split later if you add marketing email.
Transactional Email
Do you expect to send >100k emails/month in year one?
If yes
Evaluate SES or Postmark pricing carefully; negotiate volume discounts.
If no
Pick the best developer experience (Resend / Postmark); the price delta is a rounding error at low volume.
Transactional Email
Should you use a managed provider (SendGrid, Postmark, SES) or self-host?
If yes
Use a managed provider: Postmark for deliverability, Resend for DX, SES for cost at volume. Self-hosting is rarely worth it for transactional email.
If no
Only self-host for regulated environments with egress constraints; expect months of reputation work.
Transactional Email
Is a dedicated IP justified for your volume?
If yes
Above ~100k emails/month, request a dedicated IP and budget a 2–4 week warm-up. Below that, shared pools from Postmark/SendGrid are cleaner.
If no
Stay on the shared pool — reputation is managed for you.
Transactional Email
Should mail send from an isolated subdomain (mail.yourdomain.com)?
If yes
Standard practice — protects root-domain reputation from email mistakes and makes DNS records easier to manage.
If no
Send from the root domain only if you have no other choice; keep SPF/DKIM alignment tight.
Transactional Email
Is DMARC enforcement (p=reject or p=quarantine) required?
If yes
Start with p=none for reporting, then ramp to quarantine and reject once SPF+DKIM alignment is verified across all senders.
If no
Inbox providers increasingly require DMARC — plan to enforce within 6 months anyway.
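The ramp from reporting to enforcement comes down to changing the TXT record published at `_dmarc.<your domain>`. A small sketch that builds those values; the report mailbox and the intermediate `pct` step are assumptions, and the tags used (`v`, `p`, `pct`, `rua`) are those defined in RFC 7489.

```python
VALID_POLICIES = ("none", "quarantine", "reject")


def dmarc_record(policy, report_address, pct=100):
    """Build the TXT record value for a DMARC policy."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown DMARC policy: {policy}")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    parts = ["v=DMARC1", f"p={policy}"]
    if pct != 100:
        parts.append(f"pct={pct}")  # apply the policy to a share of failing mail
    parts.append(f"rua=mailto:{report_address}")  # aggregate report destination
    return "; ".join(parts)


# Hypothetical ramp: monitor first, then tighten as reports come back clean.
RAMP = [
    dmarc_record("none", "dmarc-reports@example.com"),
    dmarc_record("quarantine", "dmarc-reports@example.com", pct=25),
    dmarc_record("quarantine", "dmarc-reports@example.com"),
    dmarc_record("reject", "dmarc-reports@example.com"),
]
```

Move to the next stage only after the aggregate reports show every legitimate sender passing SPF or DKIM alignment.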
Transactional Email
Do user replies to transactional email need to drive app actions (reply-to-comment)?
If yes
Use a provider with inbound reply parsing (Postmark, SendGrid Inbound Parse) and a dedicated Reply-To subdomain with MX records.
If no
Set Reply-To to a monitored support inbox or no-reply address.
Transactional Email
Does marketing/CX need to edit templates without a code deploy?
If yes
Use provider-hosted templates (SendGrid Dynamic Templates, Postmark) or a notification platform (Knock, Courier). Keep security emails in code.
If no
Code-owned templates (React Email, MJML) are reviewable and version-controlled.
Transactional Email
Do templates need per-recipient personalization beyond name / link?
If yes
Use a templating engine with merge fields (Handlebars, Liquid). Provider templates handle this well; React Email makes it trivial in code.
If no
Static templates with a few variables are fine — don't over-engineer.
Transactional Email
Do you need template versioning with rollback?
If yes
Code templates get this from git for free. For provider templates, pick one with built-in versioning (Postmark) or snapshot before edits.
If no
Direct edits are fine for low-stakes messages.
Transactional Email
Do you need to send localized email content per recipient?
If yes
Either one template per locale (simple, duplicated) or a single template with i18n key lookups. Store recipient locale on the user record.
If no
English-only ships faster; add locales when revenue justifies it.
Transactional Email
Do you need open and click tracking for product email?
If yes
All major providers offer it as a toggle. Useful for onboarding email analytics — but disclose tracking in your privacy policy.
If no
Disable trackers on security-sensitive email (password resets) regardless — tracking pixels in those emails look phishy.
Transactional Email
Are bounce and complaint webhooks processed to suppress bad addresses?
If yes
Non-negotiable at any real volume. Subscribe to provider webhooks and maintain a suppression table checked before every send.
If no
You will tank your sender reputation within weeks — this is not optional.
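A sketch of a suppression table fed by bounce and complaint webhooks and checked before every send. The normalized event shape (`type`, `email`) is an assumption, since each provider's webhook payload differs, and the dictionary stands in for a database table.

```python
class SuppressionList:
    SUPPRESSING_TYPES = {"hard_bounce", "complaint"}

    def __init__(self):
        self._reasons = {}  # normalized email -> reason it was suppressed

    @staticmethod
    def _normalize(email):
        return email.strip().lower()

    def handle_webhook(self, event):
        # Soft bounces and deliveries are ignored; only permanent failures
        # and spam complaints suppress the address.
        if event["type"] in self.SUPPRESSING_TYPES:
            self._reasons[self._normalize(event["email"])] = event["type"]

    def is_suppressed(self, email):
        return self._normalize(email) in self._reasons


def send_if_allowed(suppressions, email, send):
    """Check the suppression list first; return True if the mail was sent."""
    if suppressions.is_suppressed(email):
        return False
    send(email)
    return True
```

Routing every send through one gate like `send_if_allowed` is what keeps a suppressed address from being mailed by a code path that forgot to check.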
Transactional Email
Do you need to schedule sends for a future time?
If yes
Most managed providers support scheduled sends natively; otherwise enqueue to a delayed job queue (BullMQ, SQS with delay).
If no
Send immediately from the triggering event — simpler.
Transactional Email
Do you send high-fan-out batches (announcement to all users at once)?
If yes
Use the provider's batch send API (SendGrid v3, Postmark batch). Chunk to stay under per-call limits and spread over minutes to avoid throttling.
If no
One-at-a-time calls via your background queue are simpler.
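Chunking and spreading a fan-out send can be planned up front. A minimal sketch; the batch size and spread are placeholders to set from your provider's documented per-call limits, and the function names are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_batches(recipients, batch_size, spread_seconds):
    """Return (delay_seconds, batch) pairs spread evenly over the window."""
    batches = list(chunked(recipients, batch_size))
    gap = spread_seconds / (len(batches) - 1) if len(batches) > 1 else 0.0
    return [(round(i * gap, 3), batch) for i, batch in enumerate(batches)]
```

Each pair can then be enqueued as a delayed job that makes one call to the provider's batch endpoint.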
Transactional Email
Can end-users (white-label customers) customize email content?
If yes
Sandbox the template language (no arbitrary code), validate on save, and preview before activating. Use a notification platform if this is core.
If no
Keep templates locked down — far fewer support tickets.
Transactional Email
Do you need an internal preview / test-send surface for QA?
If yes
Build an admin route that lists all templates with sample data. Pair with Mailpit/Mailhog in dev to catch rendering bugs before prod.
If no
You'll hear about broken templates from customers — not recommended.
Transactional Email
Do you need a single unsubscribe list shared across product surfaces?
If yes
Centralize in your user record or a notification platform — users unsubscribing from any email should stop all non-critical mail.
If no
Per-stream unsubscribes create support tickets; avoid if at all possible.
What to Log*
Storage Backend*
User-facing Surface
Tradeoffs
Read amplification — every authenticated read produces a log write
Two storage systems to operate and keep in sync; queries may need to federate
Tamper-evidence relies on DB role permissions — insufficient for some compliance regimes
Queue Backend*
Required Capabilities*
Failure & Durability*
Tradeoffs
Primary DB absorbs queue write load; row-level locks contend with application queries
Enqueue happens outside DB transaction — jobs can run for state that was rolled back
Additional table, polling worker, and idempotency discipline — the payoff is no duplicated side effects
Delivery Provider*
Deliverability Setup*
Templating Approach*
Tradeoffs
Vendor cost scales with volume; deliverability expertise comes included
Low per-email cost but you own deliverability operations (reputation, bounces, suppression)
Two sending configurations and domains to maintain — worth it for deliverability isolation