Introduction
Email deliverability is not just a marketing concern. For startup CTOs, it directly affects signups, password resets, purchase receipts, product notifications, and support workflows. If your platform cannot reliably send or receive email, customers feel it immediately. Excellent email deliverability is the compound result of correct DNS configuration, resilient infrastructure, rigorous monitoring, and disciplined engineering practices.
This guide focuses on ensuring reliable email receipt and delivery from a technical leader's perspective. You will find practical steps to harden DNS, implement verification and parsing, run production-grade webhooks, and avoid common traps that cause silent failures. The goal is simple: keep critical messages flowing so your product is dependable at scale.
Email Deliverability Fundamentals for Startup CTOs
Map the email flows that matter
Start by mapping critical email flows across your product. For most SaaS teams these include:
- Outbound: signup confirmations, password resets, transactional receipts, usage alerts, invoices, and support replies.
- Inbound: support@, replies to notification threads, plus-address commands like dev+deploy@yourdomain.com, and automated replies from vendors.
- Internal routing: forwarding from role addresses to ticketing systems, and internal aliases for teams.
This mapping drives DNS records, provider selection, and processing logic. It also defines what needs monitoring and alerting.
DNS records that determine trust
Receiving servers and spam filters evaluate DNS and transport properties to decide whether your messages deserve inbox placement. Set up the following for each sending domain:
- SPF: publish permitted sending hosts and services.
- DKIM: sign each outgoing message with a key that is unique per domain and per sending service, and rotate it on a schedule.
- DMARC: declare alignment rules and a policy, and collect RUA/RUF reports.
- MX: define reliable inbound delivery with primary and secondary routes.
- PTR: ensure reverse DNS for your outbound IPs maps to a sensible hostname.
- TLS: enforce modern TLS, then consider MTA-STS and TLS-RPT for stricter transport security and reporting.
Example DNS snippets:
; SPF
example.com. 300 IN TXT "v=spf1 include:_spf.sender.example -all"
; DKIM - selector "app2026"
app2026._domainkey.example.com. 300 IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBgkqh..."
; DMARC
_dmarc.example.com. 300 IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-agg@example.com; ruf=mailto:dmarc-forensic@example.com; aspf=s; adkim=s"
; MX
example.com. 300 IN MX 10 mx1.inbound.example.net.
example.com. 300 IN MX 20 mx2.backup-inbound.example.net.
; MTA-STS
_mta-sts.example.com. 300 IN TXT "v=STSv1; id=20260428"
mta-sts.example.com. 300 IN CNAME mta-sts.hosted.example.net.
; TLS-RPT
_smtp._tls.example.com. 300 IN TXT "v=TLSRPTv1; rua=mailto:tlsrpt@example.com"
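A small linter can catch malformed records before they are published. The following is a minimal sketch that checks only a DMARC TXT value like the one above; the function names `parse_dmarc_record` and `lint_dmarc` are hypothetical, and the checks cover a handful of common mistakes rather than the full specification.

```python
def parse_dmarc_record(txt: str) -> dict[str, str]:
    """Split a DMARC TXT value into a tag -> value mapping."""
    tags: dict[str, str] = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[name.strip().lower()] = value.strip()
    return tags


def lint_dmarc(txt: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    tags = parse_dmarc_record(txt)
    if not txt.strip().startswith("v=DMARC1"):
        problems.append("record must start with v=DMARC1")
    if tags.get("p") not in {"none", "quarantine", "reject"}:
        problems.append("p must be none, quarantine, or reject")
    if "rua" not in tags:
        problems.append("no rua address, so aggregate reports are not collected")
    for tag in ("aspf", "adkim"):
        # Both alignment tags default to relaxed ("r") when absent.
        if tags.get(tag, "r") not in {"r", "s"}:
            problems.append(f"{tag} must be r or s")
    return problems


record = ("v=DMARC1; p=quarantine; rua=mailto:dmarc-agg@example.com; "
          "ruf=mailto:dmarc-forensic@example.com; aspf=s; adkim=s")
print(lint_dmarc(record))
```

Running a check like this in CI next to your DNS-as-code changes keeps typos out of production zones.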
Inbound acceptance and classification
Inbound reliability requires two things: the sender must trust your MX, and your stack must accept and classify legitimate messages without losing them. Focus on:
- Resilient MX: at least two MX records with different providers or regions. Test failover.
- Spam filtering controls: tune thresholds and maintain explicit allow lists for mission-critical senders like payment providers.
- ARC and forwarding: forwarding can break SPF. ARC-aware filtering or DMARC relaxed alignment reduces false rejections.
- Attachment limits and timeouts: set clear size limits and retry policies so large attachments do not vanish.
Feedback loops and monitoring
Operate email like any core distributed system. Track:
- Outbound: bounce rates, complaint rates, DMARC aggregate reports, and per-template delivery performance.
- Inbound: hourly volume, classification outcomes, processing latency from MX receipt to webhook completion, and error budgets.
- Transport security: TLS success rates and downgrade attempts via TLS-RPT.
Establish SLIs and SLOs, for example: 99.9 percent of inbound messages transformed to JSON and delivered to your webhook within 30 seconds, with 0 percent silent drops.
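As an illustration of how such an SLO could be evaluated, here is a minimal sketch. It assumes you already record, per message, the seconds between MX receipt and webhook completion, with `None` standing for a message that never reached the webhook. The names `SloReport` and `evaluate_inbound_slo` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SloReport:
    total: int           # messages received at the MX
    within_target: int   # delivered to the webhook inside the latency target
    dropped: int         # never delivered, i.e. a silent drop

    @property
    def sli(self) -> float:
        return self.within_target / self.total if self.total else 1.0


def evaluate_inbound_slo(latencies_s: Sequence[Optional[float]],
                         target_s: float = 30.0,
                         objective: float = 0.999) -> tuple[SloReport, bool]:
    """Return the report and whether the SLO was met for this batch."""
    dropped = sum(1 for value in latencies_s if value is None)
    within = sum(1 for value in latencies_s
                 if value is not None and value <= target_s)
    report = SloReport(total=len(latencies_s), within_target=within, dropped=dropped)
    # The SLO fails on any silent drop, regardless of the latency ratio.
    return report, report.sli >= objective and dropped == 0


report, met = evaluate_inbound_slo([0.8, 2.5, 41.0, None])
print(report.sli, met)
```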
Practical Implementation
Architecture for reliable inbound processing
A production pattern that balances reliability and simplicity:
- MX and MTA: managed inbound service or Postfix with rspamd, deployed in two regions.
- Parser: normalize MIME to structured JSON. Preserve raw MIME for forensics and reprocessing.
- Webhook delivery: push JSON to your API with retries and exponential backoff. Use HMAC signatures.
- Queue and storage: enqueue on receive, mark idempotently on success, and archive immutable payloads in object storage.
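The retry behavior in the webhook delivery step can be sketched as capped exponential backoff with full jitter. This is an illustrative outline, not any provider's actual schedule: `backoff_delays`, `deliver_with_retries`, and the default limits are assumptions, and `send` stands in for whatever HTTP call your pipeline makes.

```python
import random
from typing import Callable, Iterator, Optional


def backoff_delays(max_retries: int = 6, base_s: float = 1.0, cap_s: float = 300.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield the wait before each retry: a random value up to a doubling, capped ceiling."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng.uniform(0.0, ceiling)


def deliver_with_retries(send: Callable[[bytes], bool], payload: bytes,
                         sleep: Callable[[float], None], **backoff) -> bool:
    """Call send(payload) until it returns True or the retries are exhausted."""
    if send(payload):
        return True
    for delay in backoff_delays(**backoff):
        sleep(delay)
        if send(payload):
            return True
    return False  # the caller should park the message on a dead-letter queue


# Simulated endpoint that fails twice and then succeeds.
attempts = []

def flaky_send(payload: bytes) -> bool:
    attempts.append(payload)
    return len(attempts) >= 3

waits = []  # stands in for time.sleep so the example runs instantly
delivered = deliver_with_retries(flaky_send, b'{"id": "evt_1"}', waits.append,
                                 rng=random.Random(7))
print(delivered, len(waits))
```

Injecting `sleep` keeps the function easy to test; in production you would pass `time.sleep` or schedule the retry through your queue instead of blocking a worker.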
If you prefer not to run your own parser and delivery pipeline, MailParse provides instant email addresses, turns MIME into structured JSON, and delivers via webhook or REST polling API. This reduces engineering toil while keeping you in control of routing and storage.
Webhook handler blueprint
Here is an example Node.js handler that validates signatures, enforces idempotency, and stores normalized data. Adapt this for your provider's JSON shape and signature scheme.
import express from 'express';
import crypto from 'crypto';

const app = express();

// Capture the exact bytes that were signed. Re-serializing req.body with
// JSON.stringify can reorder or reformat them and break verification.
app.use(express.json({
  limit: '20mb',
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

// Assumes a hex-encoded HMAC-SHA256 of the raw request body.
function verifySignature(secret, payload, signature) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws when lengths differ, so compare lengths first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

function alreadyProcessed(id) {
  // Implement a fast read-through cache or DB check
  return false;
}

app.post('/webhooks/inbound', (req, res) => {
  const signature = req.get('X-Webhook-Signature');
  if (!signature || !req.rawBody ||
      !verifySignature(process.env.WEBHOOK_SECRET, req.rawBody, signature)) {
    return res.status(401).send('invalid signature');
  }
  const msg = req.body;
  // Idempotency using RFC 5322 Message-ID or provider event id
  const id = msg.messageId || msg.id;
  if (alreadyProcessed(id)) {
    return res.status(200).send('ok');
  }
  // Persist both normalized and raw
  const record = {
    id,
    from: msg.from, // array of { address, name }
    to: msg.to,
    cc: msg.cc || [],
    subject: msg.subject || '',
    text: msg.text || '',
    html: msg.html || '',
    headers: msg.headers || {},
    attachments: msg.attachments || [],
    rawMimeUrl: msg.rawMimeUrl // link to object storage if available
  };
  // saveMessage(record) - implement durable storage and enqueue downstream jobs
  // emit metrics: processing latency, size, attachment count
  res.status(202).send('accepted');
});

app.listen(3000);
SPF, DKIM, and DMARC verification on inbound
Inbound servers should annotate messages with SPF, DKIM, and DMARC results to help your application classify and triage. If your MX does not do this natively, you can verify in your application layer with open libraries.
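# Python example using the dkimpy (import dkim) and pyspf (import spf) libraries
import dkim
import spf
from email import message_from_bytes
from email.utils import parseaddr

def verify(message_bytes, ip, mail_from, helo):
    msg = message_from_bytes(message_bytes)
    # Header From domain, needed later for DMARC alignment checks
    from_domain = parseaddr(str(msg.get('From', '')))[1].rpartition('@')[2].lower()
    dkim_ok = dkim.verify(message_bytes)
    spf_result, spf_reason = spf.check2(i=ip, s=mail_from, h=helo)[:2]
    # DMARC alignment logic can be added using dmarc libraries or by parsing headers
    return dict(dkim=bool(dkim_ok), spf=spf_result, reason=spf_reason,
                from_domain=from_domain)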
Store these results alongside the parsed content. They are invaluable for routing decisions, abuse detection, and debugging misclassification.
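DMARC passes when either SPF or DKIM passes and the authenticated domain aligns with the header From domain. The sketch below illustrates that rule only. The helper names are hypothetical, and `org_domain` is a deliberately naive stand-in that takes the last two labels; a real implementation should use a Public Suffix List library, because suffixes such as co.uk break this shortcut.

```python
def org_domain(domain: str) -> str:
    """ASSUMPTION: naive organizational domain (last two labels), not PSL-aware."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(header_from: str, auth_domain: str, mode: str = "r") -> bool:
    """mode "s" requires an exact match; mode "r" compares organizational domains."""
    a = header_from.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if mode == "s":
        return a == b
    return org_domain(a) == org_domain(b)


def dmarc_pass(header_from: str,
               spf_result: str, spf_domain: str,
               dkim_ok: bool, dkim_domain: str,
               aspf: str = "r", adkim: str = "r") -> bool:
    """True when at least one mechanism both passes and aligns."""
    spf_aligned = spf_result == "pass" and aligned(header_from, spf_domain, aspf)
    dkim_aligned = dkim_ok and aligned(header_from, dkim_domain, adkim)
    return spf_aligned or dkim_aligned


# A forwarded message: SPF fails, but the DKIM signature from a subdomain survives.
print(dmarc_pass("example.com", "fail", "bounce.example.com", True, "mail.example.com"))
print(dmarc_pass("example.com", "fail", "bounce.example.com", True, "mail.example.com",
                 adkim="s"))
```

The second call shows why strict alignment can reject mail that relaxed alignment accepts: the signing domain is a subdomain of the From domain rather than an exact match.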
Normalizing MIME safely
To avoid downstream surprises:
- Decode content-transfer encodings consistently for text and HTML parts.
- Extract inline attachments and map Content-ID references to accessible URLs if rendering.
- Strip dangerous HTML and sanitize images before exposing to users.
- Hash and virus-scan attachments asynchronously, then gate downloads by scan status.
For teams building their own parser, libraries like nodemailer's mailparser, Python's email.message, and Go's mime/multipart provide solid foundations.
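As a starting point, the standard library alone can handle the decoding and attachment steps above. This is a minimal sketch: the output field names are an assumed schema rather than a standard, and HTML sanitization and virus scanning are left out.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def normalize(raw: bytes) -> dict:
    """Parse raw MIME into a small JSON-friendly dict (hypothetical schema)."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    attachments = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""  # undo base64 or quoted-printable
        attachments.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return {
        "message_id": str(msg.get("Message-ID", "")),
        "subject": str(msg.get("Subject", "")),
        "text": text_part.get_content() if text_part else "",
        "html": html_part.get_content() if html_part else "",
        "attachments": attachments,
    }


# Build a sample message with one attachment and normalize it.
sample = EmailMessage()
sample["Message-ID"] = "<1@example.com>"
sample["Subject"] = "Monthly export"
sample.set_content("See attached.")
sample.add_attachment(b"abc", maintype="application", subtype="octet-stream",
                      filename="export.bin")
result = normalize(sample.as_bytes())
print(result["subject"], result["attachments"][0]["size"])
```

Storing the SHA-256 next to each attachment gives the asynchronous scanner a stable key for gating downloads by scan status.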
Tools and Libraries
Operational stack
- MTA and filtering: Postfix or Haraka, with rspamd or SpamAssassin. Consider OpenDKIM and OpenDMARC for on-box verification and signing.
- Infrastructure as code: manage DNS with Terraform providers for Route 53, Cloudflare, or Gandi. Keep DKIM selectors and rotation schedules in code.
- Observability: Prometheus metrics for queue depth and processing latency, Grafana dashboards for bounce and complaint rates, and Loki for SMTP logs.
- Parsing libraries: Node.js mailparser, Python dkimpy and email, Go go-message. Normalize to a canonical schema that your teams share.
- Hosted parsing and delivery: MailParse for instant inboxes, structured JSON, and webhooks or REST polling when you prefer a managed layer.
For a structured checklist of what to deploy first, see the Email Deliverability Checklist for SaaS Platforms. If you are designing a broader system that includes support and product notifications, the Email Infrastructure Checklist for SaaS Platforms is a helpful companion.
Common Mistakes Startup CTOs Make with Email Deliverability
- SPF mechanisms that exceed 10 DNS lookups. Consolidate includes, remove unused vendors, and consider subdomain delegation for each sender.
- No DMARC reporting or a policy stuck at p=none forever. Move to p=quarantine with a small pct value, step it up while watching reports, and only then move to p=reject.
- Single MX without tested failover. Add a second MX in a different region or provider and run quarterly game days that simulate a failure.
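- Dropping large or slow messages silently. Advertise and enforce a size limit, reject oversized messages with a permanent 5xx (typically 552) so the sender receives a bounce, and return 4xx only for transient conditions such as timeouts or a full queue so the sender retries. Surface both kinds of errors to observability.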
- Not preserving raw MIME. You will need the raw source to reproduce DKIM verification, diagnose encoding issues, and satisfy compliance.
- Non-idempotent webhooks. Duplicate deliveries happen during retries. Use message-id or a provider event id as an idempotency key.
- Ignoring plus-addressing and VERP. These are essential for routing threaded replies and for reliable bounce processing.
- Inadequate key rotation. DKIM keys should rotate at least twice per year with retired selectors removed from DNS.
- Assuming forwarding behaves like first-hop sending. Forwarding often breaks SPF. Invest in ARC-aware filtering and DMARC relaxed alignment where appropriate.
- Overly aggressive SPF hard fail combined with third-party support tooling. If partners forward on your behalf, coordinate alignment or use subdomains that these systems can authenticate correctly.
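The 10-lookup limit from the first item is easy to exceed without noticing, because nested includes count as well. Here is a rough sketch of a counter. It takes a `get_record` callable (an assumption, so the example can run without DNS) that returns the SPF TXT value for a domain, and it counts the `include`, `a`, `mx`, `ptr`, and `exists` mechanisms plus the `redirect` modifier. Macros and void-lookup limits are ignored.

```python
from typing import Callable, Optional

LOOKUP_MECHANISMS = ("include", "a", "mx", "ptr", "exists")


def count_spf_lookups(domain: str,
                      get_record: Callable[[str], Optional[str]],
                      _path: tuple = ()) -> int:
    """Count DNS-querying terms in a domain's SPF record, following includes."""
    domain = domain.lower()
    if domain in _path:
        raise ValueError(f"include loop at {domain}")
    record = get_record(domain) or ""
    total = 0
    for term in record.split()[1:]:              # skip the "v=spf1" version term
        term = term.lstrip("+-~?").lower()       # drop the qualifier
        if term.startswith("redirect="):
            target = term.split("=", 1)[1]
            total += 1 + count_spf_lookups(target, get_record, _path + (domain,))
            continue
        name, _, arg = term.partition(":")
        name = name.split("/")[0]                # "a/24" and "mx/24" carry a CIDR suffix
        if name in LOOKUP_MECHANISMS:
            total += 1
            if name == "include" and arg:
                total += count_spf_lookups(arg, get_record, _path + (domain,))
    return total


# Hypothetical zone data standing in for live DNS.
records = {
    "example.com": "v=spf1 include:_spf.sender.example mx ip4:192.0.2.0/24 -all",
    "_spf.sender.example": "v=spf1 a include:_spf.other.example ~all",
    "_spf.other.example": "v=spf1 ip4:198.51.100.0/24 -all",
}
lookups = count_spf_lookups("example.com", records.get)
print(lookups, "over limit" if lookups > 10 else "within limit")
```

In a real check, `get_record` would wrap a DNS resolver; wiring the count into CI lets a new vendor include fail the build before it breaks SPF in production.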
Advanced Patterns
Multi-provider failover for inbound
Run dual MX with distinct providers or infrastructure stacks. Ensure both paths normalize to the exact same JSON schema and emit the same webhook format. Keep a feature flag in your application to accept from either source and to drain traffic for testing. For metadata consistency, sign your webhook payloads with the same secret across providers or verify per provider with separate keys.
Progressive DMARC rollout with alignment
Use a subdomain per sender type, for example notifications.example.com and billing.example.com. Configure DKIM selectors specific to each vendor. Start with p=none and analyze RUA for two weeks, then move to p=quarantine pct=10 and increase weekly. Aim for full alignment with From domain to reach p=reject confidently.
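Analyzing RUA data mostly means totaling pass and fail counts per source. The sketch below summarizes one aggregate report using the element names from the DMARC aggregate report schema (`record`, `row`, `source_ip`, `count`, `policy_evaluated`). It assumes the XML has no namespace and has already been decompressed; `summarize_rua` and the sample data are illustrative.

```python
import xml.etree.ElementTree as ET


def summarize_rua(xml_text: str) -> dict:
    """Total message counts and list the sources that failed both SPF and DKIM."""
    root = ET.fromstring(xml_text)
    total = passed = 0
    failing_sources: dict[str, int] = {}
    for record in root.iter("record"):
        row = record.find("row")
        if row is None:
            continue
        count = int(row.findtext("count", "0"))
        dkim = row.findtext("policy_evaluated/dkim")
        spf = row.findtext("policy_evaluated/spf")
        total += count
        if "pass" in (dkim, spf):        # DMARC needs only one aligned pass
            passed += count
        else:
            ip = row.findtext("source_ip", "unknown")
            failing_sources[ip] = failing_sources.get(ip, 0) + count
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 1.0,
        "failing_sources": failing_sources,
    }


SAMPLE = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>100</count>
      <policy_evaluated><disposition>none</disposition><dkim>pass</dkim><spf>pass</spf></policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>203.0.113.7</source_ip>
      <count>5</count>
      <policy_evaluated><disposition>none</disposition><dkim>fail</dkim><spf>fail</spf></policy_evaluated>
    </row>
  </record>
</feedback>"""

summary = summarize_rua(SAMPLE)
print(summary["pass_rate"], summary["failing_sources"])
```

Sources that appear in `failing_sources` week after week are either unauthorized senders or legitimate vendors that still need DKIM or SPF alignment before you raise pct.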
MTA-STS and TLS reporting at scale
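Publish an MTA-STS policy so that sending servers require TLS when delivering to your domain, and configure your outbound MTA to honor the MTA-STS policies of recipient domains that publish one. Monitor TLS-RPT to spot downgrade attacks or misconfiguration. Because TLS-RPT reports typically arrive as daily aggregates, drive real-time alerts from your own MTA logs: trigger when TLS failure rates exceed a small error budget, for example 0.1 percent of transactions in a rolling 1-hour window.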
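A rolling-window alert of this kind can be sketched with a deque of session outcomes. The class below is illustrative: `RollingFailureRate` is a hypothetical name, it assumes events come from your own MTA logs with non-decreasing timestamps in seconds, and the defaults mirror the 0.1 percent budget and 1-hour window used as the example.

```python
from collections import deque


class RollingFailureRate:
    """Track the TLS failure rate over a sliding time window."""

    def __init__(self, window_s: float = 3600.0, budget: float = 0.001):
        self.window_s = window_s
        self.budget = budget
        self._events: deque = deque()   # (timestamp, failed) pairs, oldest first
        self._failures = 0

    def record(self, timestamp: float, failed: bool) -> None:
        self._events.append((timestamp, failed))
        self._failures += int(failed)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_s:
            _, failed = self._events.popleft()
            self._failures -= int(failed)

    def rate(self, now: float) -> float:
        self._evict(now)
        return self._failures / len(self._events) if self._events else 0.0

    def breached(self, now: float) -> bool:
        return self.rate(now) > self.budget


tracker = RollingFailureRate()
for second in range(999):
    tracker.record(float(second), failed=False)
tracker.record(1000.0, failed=True)
tracker.record(1001.0, failed=True)
print(tracker.breached(1001.0))
```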
Queued processing and backpressure
Separate SMTP receipt from application processing using a durable queue. On spikes or during downstream outages, accept mail at the edge, enqueue, and process with consumer autoscaling. Implement backpressure by reducing consumer concurrency when error rates rise. Record a per-attachment processing SLA and treat virus scanning as a separate stage with its own queue and metrics.
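One simple way to implement that backpressure rule is additive increase, multiplicative decrease (AIMD): halve consumer concurrency when the error rate is high, otherwise add one worker. AIMD is a swapped-in technique chosen for illustration, not something the pattern above requires, and the function name and thresholds below are assumptions.

```python
def next_concurrency(current: int, error_rate: float, *,
                     min_workers: int = 1, max_workers: int = 64,
                     error_threshold: float = 0.05) -> int:
    """Return the consumer concurrency to use for the next control interval."""
    if error_rate > error_threshold:
        # Downstream is struggling: back off quickly.
        return max(min_workers, current // 2)
    # Healthy interval: probe for more throughput slowly.
    return min(max_workers, current + 1)


# Simulate an outage that begins in the third interval and clears in the sixth.
workers = 16
history = []
for error_rate in (0.0, 0.01, 0.40, 0.35, 0.20, 0.0, 0.0):
    workers = next_concurrency(workers, error_rate)
    history.append(workers)
print(history)
```

Because SMTP receipt only enqueues, shrinking the consumer pool delays processing without rejecting mail at the edge; queue age is the metric that shows how far behind you are.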
Normalization and enrichment pipeline
After parsing, enrich messages with:
- Verified identities: SPF, DKIM, and DMARC results plus the organizational domain via Public Suffix List (PSL) lookups.
- Threading: RFC 5322 In-Reply-To and References, plus your own header for ticket or thread id.
- Security labeling: attachment types, virus scan verdicts, and HTML sanitization level.
- Data residency tags for compliance routing across regions.
Some teams pair a self-hosted MX with MailParse for consistent parsing and webhook delivery while keeping strict control over network boundaries and storage.
Inbound product patterns
Inbound email unlocks powerful UX for teams that live in their inbox. Examples:
- Reply-to-tasks: users reply to a notification to update a task. Route by plus-address or custom header that maps to a resource id.
- Automated ingest: vendors email CSV files to ingest@. Parse, scan, and stage to a data pipeline with checksum validation.
- Support triage: support@ flows into a ticketing service. Use DKIM and DMARC results to flag spoof attempts, and auto-assign priority for verified senders.
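Routing by plus-address, as in the reply-to-tasks pattern, comes down to splitting the recipient's local part. A minimal sketch follows; `parse_plus_address` is a hypothetical helper and the address is made up. Since tags are guessable, a real system would pair the tag with a signed token or the sender verification results before acting on it.

```python
from email.utils import parseaddr
from typing import Optional


def parse_plus_address(address: str) -> tuple[str, Optional[str], str]:
    """Return (mailbox, tag, domain); tag is None when there is no plus part."""
    _, addr = parseaddr(address)
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    mailbox, plus, tag = local.partition("+")
    # Keep the tag's case intact in case it carries an opaque identifier.
    return mailbox.lower(), (tag if plus else None), domain.lower()


mailbox, tag, domain = parse_plus_address("Tasks <reply+task-8421@example.com>")
print(mailbox, tag, domain)
```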
For more patterns, see Top Inbound Email Processing Ideas for SaaS Platforms and Top Email Parsing API Ideas for SaaS Platforms.
Conclusion
Reliable email deliverability is a systems engineering problem: DNS correctness, transport security, classification, parsing, webhook resilience, and continuous monitoring all play a part. Treat your email flows as first-class product dependencies with clear SLIs, robust failover, and disciplined operations. Whether you build each piece or adopt a managed component like MailParse, the outcome should be the same: critical messages arrive, are parsed correctly, and reach your product within seconds with auditable trails.
FAQ
How do I choose between building and buying inbound parsing?
Build if you have strict data residency, custom MIME needs, or expect deep product-specific enrichment. Buy if you need speed to market, predictable maintenance, and a standard JSON schema. Many teams combine both, for example self-hosted MX with a managed parser and webhook delivery from a provider such as MailParse.
What is the fastest way to improve deliverability next week?
Publish correct SPF, enable DKIM with key rotation, and configure DMARC with p=none and RUA reports. Add a secondary MX with health checks. Set up dashboards for inbound latency and webhook retry rates. Run a test suite that sends messages from major providers and verifies parsing and storage.
How do I prevent losing emails during outages?
Separate SMTP receipt from application processing using a durable queue and object storage for raw MIME. Ensure MX redundancy and configure retries with backoff for webhooks. Idempotency keys prevent duplication when retries occur. Alert on queue age, not just queue length, to catch latent backlogs early.
How big should our attachments limit be?
Keep it practical, for example 10 to 25 MB per message, and document it. Reject larger messages at SMTP time with a permanent 5xx (typically 552) so the sender gets a bounce rather than a silent drop, and reserve 4xx for transient failures that a retry can fix. Scan all attachments asynchronously and gate downloads until scans complete. Provide API endpoints to upload large files directly to storage when possible.
When should we move DMARC to p=reject?
After you confirm that all legitimate senders align with your From domain, your DKIM selectors are healthy, and RUA reports show low failure rates. Roll out gradually with pct steps. Monitor for forwarding-induced failures and consider relaxed alignment or ARC-aware filtering where forwarding is common.