Inbound Email Processing: A Complete Guide | MailParse

Introduction

Inbound email processing is the backbone behind ticketing systems that accept replies, no-reply receipts that still need to be tracked, and SaaS workflows that let users email in data. Treating email as an input channel gives your product a durable, user-friendly interface that works from any client, online or offline. The trick is handling receiving, routing, and processing reliably, at scale, without maintaining mail servers.

Modern teams implement inbound email processing with APIs that turn SMTP messages into structured JSON, then post it to webhooks or expose it for polling. Tools like MailParse simplify this by providing instant addresses, MIME parsing, and delivery to your app. This guide explains fundamental concepts, practical patterns, and production-grade tips so you can add email ingestion with minimal friction.

Core concepts and fundamentals

What inbound email processing means

Inbound email processing converts incoming SMTP messages into application-friendly events. Instead of connecting directly to Postfix or Dovecot, you receive a normalized payload that includes headers, body parts, attachments, and metadata. Your app then decides how to route, validate, and store the message.

DNS and address strategy

Use a dedicated subdomain for ingestion, for example in.example.com. This isolates risk and improves deliverability and observability.
Point MX records for that subdomain to your provider, so email to anything@in.example.com is received by your processing service.
Plan addressing patterns that encode routing, such as plus addressing:
- support+acme@in.example.com to route to tenant acme.
- ingest+proj_123@in.example.com to route to project ID 123.
- VERP style addresses for tracking bounces per user or message.
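
The plus-addressing patterns above can be decoded with a few lines of Python. This is a minimal sketch, not a full address parser: the names Recipient, INGEST_DOMAIN, and parse_recipient are hypothetical, and it assumes simple unquoted addresses on the in.example.com subdomain used in the examples.

```python
from typing import NamedTuple, Optional

# Assumption: the ingestion subdomain from the examples above.
INGEST_DOMAIN = "in.example.com"

class Recipient(NamedTuple):
    mailbox: str        # local part before the "+", for example "support"
    tag: Optional[str]  # text after the "+", for example "acme"; None if absent
    domain: str

def parse_recipient(address: str) -> Optional[Recipient]:
    """Split mailbox+tag@domain; return None for other domains or malformed input."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or domain.lower() != INGEST_DOMAIN:
        return None
    mailbox, plus, tag = local.partition("+")
    if not mailbox:
        return None
    return Recipient(mailbox.lower(), tag if plus and tag else None, domain.lower())
```

For support+acme@in.example.com this yields mailbox support and tag acme, which your router can map to a tenant.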

MIME to structured JSON

Email is multipart, can include alternative text and HTML bodies, and can carry many attachments. A good parser outputs a consistent JSON representation so you can focus on business logic. A typical structured payload includes:

{
  "id": "evt_01HXW7Y0KZ",
  "timestamp": 1712684871,
  "envelope": {
    "mail_from": "alice@example.org",
    "rcpt_to": ["support+acme@in.example.com"]
  },
  "headers": {
    "message-id": "<CAMx1234@example.org>",
    "from": "Alice <alice@example.org>",
    "to": "Support <support+acme@in.example.com>",
    "subject": "Issue with the April invoice",
    "content-type": "multipart/alternative; boundary=abc"
  },
  "dkim": {"passed": true, "domain": "example.org"},
  "spf": {"passed": true},
  "dmarc": {"passed": true},
  "parts": [
    {"type": "text/plain", "charset": "utf-8", "content": "Hello...\n"},
    {"type": "text/html", "charset": "utf-8", "content": "<p>Hello...</p>"}
  ],
  "attachments": [
    {
      "filename": "invoice.pdf",
      "content_type": "application/pdf",
      "size": 182044,
      "content_id": null,
      "download_url": "https://files.example.net/att/evt_01HXW7Y0KZ/1"
    }
  ],
  "raw_url": "https://files.example.net/raw/evt_01HXW7Y0KZ"
}

Webhook vs polling delivery

There are two common delivery models:

Webhook push - messages are POSTed to your HTTPS endpoint in near real time. You respond with 2xx to acknowledge, or 4xx/5xx to trigger retries with backoff.
REST polling - your service polls a queue endpoint for new messages, then acknowledges them after processing. This model is useful for private networks and batch jobs.

With MailParse you can receive events via webhook or poll a REST API. Choose based on your runtime, scaling model, and security posture.
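
For the polling model, one reasonable shape is a loop that fetches a batch, processes each event, and acknowledges only the ones that succeeded. The sketch below keeps the HTTP calls out of the picture: fetch and ack are hypothetical callables you would wrap around your provider's endpoints, and drain_once is an illustrative name rather than part of any real client library.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]

def drain_once(
    fetch: Callable[[int], List[Event]],
    handle: Callable[[Event], None],
    ack: Callable[[List[str]], None],
    limit: int = 50,
) -> int:
    """Fetch one batch, process each event, and acknowledge only the successes."""
    done: List[str] = []
    for event in fetch(limit):
        try:
            handle(event)
        except Exception:
            # Leave failed events unacknowledged so they are delivered again.
            continue
        done.append(str(event["id"]))
    if done:
        ack(done)
    return len(done)
```

Because failed events stay unacknowledged, the handler has to be idempotent: the same event may be fetched again on the next pass.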

Security primitives

Sender authentication - inspect SPF, DKIM, and DMARC results in the payload. Treat failures with caution or quarantine.
Request authentication - require an HMAC signature on webhook requests and verify it against the raw request body in your code. For polling, use short-lived tokens and least-privilege scopes.
Attachment hygiene - scan for malware, enforce file-type allowlists, and stream to storage instead of loading entire files into memory.
Idempotency - use Message-Id, a stable event ID, or both to avoid double processing when retries occur.
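
To make the idempotency point concrete, here is a minimal sketch that derives one key from the event ID and Message-Id and checks it against a store. InMemoryDedupe is a hypothetical stand-in for something durable, such as a unique database index; an in-process set is lost on restart and is only suitable for illustration.

```python
import hashlib
from typing import Optional, Set

def idempotency_key(event_id: str, message_id: Optional[str]) -> str:
    """Derive one stable key from the provider event ID and the Message-Id header."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    material = f"{event_id}\x00{message_id or ''}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()

class InMemoryDedupe:
    """Illustrative only: replace with a durable store in production."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def first_time(self, key: str) -> bool:
        """Return True the first time a key is offered, False on repeats."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```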

Practical applications and examples

Routing by address and headers

Design a router that maps inbound addresses and headers to tenants, projects, or threads:

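// Routing sketch (Node.js)
const crypto = require('crypto');
const sha256 = (s) => crypto.createHash('sha256').update(s).digest('hex');

function route(event) {
  const rcpts = (event.envelope && event.envelope.rcpt_to) || [];
  const rcpt = rcpts[0] || '';
  // Anchor the pattern so only the exact ingestion address matches
  const m = rcpt.match(/^support\+([a-z0-9_-]+)@in\.example\.com$/i);
  if (m) return {type: 'support', tenant: m[1].toLowerCase()};
  // Fallback to headers. The first ID in References is the thread root, so it
  // stays stable across replies; In-Reply-To only names the direct parent.
  const headers = event.headers || {};
  const ids = (headers['references'] || headers['in-reply-to'] || '').match(/<[^<>\s]+>/g);
  if (ids) return {type: 'reply', threadKey: sha256(ids[0])};
  return {type: 'unclassified'};
}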

Webhook handler example

The following Python Flask example verifies an HMAC header, extracts text, downloads an attachment, and stores a record:

from flask import Flask, request, abort, jsonify
import hmac, hashlib, requests, os

app = Flask(__name__)
SHARED_SECRET = os.environ.get("INBOUND_HMAC_SECRET", "").encode()

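def verify_signature(raw_body, signature):
  # Fail closed: with an empty secret, anyone could compute a valid signature
  if not SHARED_SECRET or not signature:
    return False
  mac = hmac.new(SHARED_SECRET, raw_body, hashlib.sha256).hexdigest()
  # Compare as bytes so a non-ASCII header value cannot raise TypeError
  return hmac.compare_digest(mac.encode(), signature.encode())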

@app.post("/inbound")
def inbound():
  sig = request.headers.get("X-Inbound-Signature", "")
  raw = request.get_data()
  if not verify_signature(raw, sig):
    abort(401)

  evt = request.get_json()
  # Idempotency using provider event id + Message-Id
  msg_id = evt["headers"].get("message-id")
  event_id = evt["id"]
  if seen_before(event_id, msg_id):
    return jsonify({"status": "duplicate"}), 200

  # Select best body part
  text = None
  for part in evt.get("parts", []):
    if part["type"] == "text/plain":
      text = part["content"]
      break

  route_info = route(evt)  # implement routing like the pseudocode above
  record_id = save_message(
    tenant=route_info.get("tenant"),
    subject=evt["headers"].get("subject"),
    from_addr=evt["envelope"]["mail_from"],
    text=text,
    metadata={"spf": evt["spf"], "dkim": evt["dkim"], "dmarc": evt["dmarc"]}
  )

  # Stream first attachment to object storage
  att = next(iter(evt.get("attachments", [])), None)
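  if att:
    # Never trust the sender-supplied filename inside a storage key
    safe_name = os.path.basename(att.get("filename") or "attachment")
    with requests.get(att["download_url"], stream=True, timeout=30) as r:
      r.raise_for_status()
      stream_to_bucket(f"msgs/{record_id}/{safe_name}", r.iter_content(65536))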

  return jsonify({"status": "ok", "id": record_id}), 200

# Stub helpers
def seen_before(event_id, message_id): ...
def save_message(**kwargs): ...
def route(evt): ...
def stream_to_bucket(key, chunks): ...

Polling example with curl

If you prefer to pull messages in batches, a simple REST flow looks like this:

# 1) Fetch pending events
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.inbound.example.com/v1/events?status=pending&limit=50" > batch.json

# 2) Process locally with your script
python process_batch.py batch.json

# 3) Acknowledge processed ids to avoid re-delivery
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ack_ids":["evt_01HXW7Y0KZ","evt_01HXW7Y5PQ"]}' \
  "https://api.inbound.example.com/v1/events/ack"

Threading replies for support desks

Support and CRM systems often thread replies into the correct ticket. Suggested approach:

Embed a ticket key in the reply-to address, for example re-abc123@in.example.com.
Also store the original Message-Id. On reply, inspect In-Reply-To and References to find the parent ticket when address keys are missing.
Strip quoted text and signatures using heuristics or a library to extract only the new content.
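
The first two steps can be sketched as a single lookup function. This is an illustration under assumptions: the re-<key>@in.example.com pattern follows the example above, find_ticket is a hypothetical name, and ticket_by_message_id stands in for whatever index you keep from stored Message-Id values to ticket keys.

```python
import re
from typing import Mapping, Optional

# Assumed reply-address pattern, based on the re-abc123@in.example.com example.
REPLY_ADDRESS = re.compile(r"^re-([a-z0-9]+)@in\.example\.com$", re.IGNORECASE)
MESSAGE_ID = re.compile(r"<[^<>\s]+>")

def find_ticket(
    rcpt: str,
    headers: Mapping[str, str],
    ticket_by_message_id: Mapping[str, str],
) -> Optional[str]:
    """Return a ticket key from the reply address, else from threading headers."""
    m = REPLY_ADDRESS.match(rcpt.strip())
    if m:
        return m.group(1).lower()
    # In-Reply-To names the direct parent. References lists ancestors oldest
    # first, so scan it from the end to prefer the most recent known message.
    candidates = MESSAGE_ID.findall(headers.get("in-reply-to", ""))
    candidates += reversed(MESSAGE_ID.findall(headers.get("references", "")))
    for message_id in candidates:
        ticket = ticket_by_message_id.get(message_id)
        if ticket:
            return ticket
    return None
```

The address key wins when present because it survives clients that strip or rewrite threading headers.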

Integrating with your SaaS

For productized flows, define a canonical processing pipeline:

Receive event via webhook or polling.
Authenticate the request, then check the SPF, DKIM, and DMARC results.
Route using recipient and header context.
Normalize body content, converting HTML to text if needed.
Store raw and parsed content, then emit a domain event for downstream services.
Attach labels like spam_suspected, contains_attachment, tenant=acme.
Notify users or trigger automations with decoupled workers.

Many teams start with webhooks, then add polling for background reconciliation. MailParse supports both so you can evolve architecture without rework.

Best practices and tips

Design addresses for long term stability

Treat recipient patterns as a public API. Once you document support+tenant@in.example.com, keep it stable.
Reserve prefixes like bounce, no-reply, and re- for system purposes.

Normalize and clean content

Prefer text/plain when present. If only HTML exists, convert to text with a safe sanitizer and preserve links.
Trim quoted replies and footers to improve search and classification. Maintain an option to view the original raw content.
Decode international charsets using the parser's metadata. Always store Unicode strings in UTF-8.
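
As a rough illustration of the HTML-to-text step, the sketch below uses only the standard library's html.parser. It drops script and style content, keeps link targets in parentheses, and collapses whitespace. The names _TextExtractor and html_to_text are hypothetical, and a dedicated library will handle messy real-world HTML better than this.

```python
from html.parser import HTMLParser
from typing import List, Optional

class _TextExtractor(HTMLParser):
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: List[str] = []
        self._skip_depth = 0
        self._href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href:
            self.chunks.append(f" ({self._href})")  # preserve the link target
            self._href = None
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def html_to_text(html: str) -> str:
    """Convert an HTML body to plain text, one non-empty line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)
```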

Protect your system

Reject or quarantine messages that fail DMARC when the sender domain's published policy is reject or quarantine. Weaker signals, such as an SPF softfail, can be routed to review.
Implement content scanning and file type allowlists. Block dangerous types or store them in cold storage without inline rendering.
Cap body size and attachment size. Stream attachments to object storage and process them asynchronously.
Rate limit by sender domain and IP reputation. Combine allowlists for trusted workflows and blocklists for obvious abuse.

Operational resilience

Use idempotent handlers. Acknowledge only after durable storage to handle retries safely.
Monitor webhook latency and error rates. Alert when delivery falls back to retries or when queue depth grows.
Record both raw MIME and parsed JSON. Raw storage lets you re-parse if your logic changes.
Tag every message with a processing status such as received, validated, routed, stored, failed.

Common challenges and solutions

Spam, spoofing, and abuse

Challenge: High volumes of spam or spoofed senders degrade user experience and waste resources.

Solution: Use SPF, DKIM, and DMARC results as inputs to a risk score. Apply allowlists for trusted workflows, block disposable domains if necessary, and throttle by IP and domain reputation. Implement content heuristics only after sender checks, and offer users an option to mark false positives or false negatives to improve rules.
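
One way to turn the authentication results into a risk score is a simple weighted sum. The sketch below reads the spf, dkim, and dmarc objects from the payload shape shown earlier. The weights and thresholds are illustrative assumptions, not recommendations, and sender_risk and triage are hypothetical names.

```python
from typing import Mapping

# Illustrative weights only; tune them against your own traffic.
WEIGHTS = {"spf": 20, "dkim": 30, "dmarc": 50}

def sender_risk(event: Mapping[str, object]) -> int:
    """Sum the weights of failed or missing checks: 0 is clean, 100 is worst."""
    score = 0
    for check, weight in WEIGHTS.items():
        result = event.get(check)
        passed = isinstance(result, dict) and result.get("passed") is True
        if not passed:
            score += weight
    return score

def triage(event: Mapping[str, object], review_at: int = 30, quarantine_at: int = 70) -> str:
    """Map the score to accept, review, or quarantine."""
    score = sender_risk(event)
    if score >= quarantine_at:
        return "quarantine"
    if score >= review_at:
        return "review"
    return "accept"
```

A missing result counts as a failure here, so payloads without authentication data are never treated as trusted.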

Threading breaks across email clients

Challenge: Some clients strip headers or alter subjects, which breaks ticket threading.

Solution: Use multiple signals: an address key in the recipient, In-Reply-To or References headers, and a fallback subject digest plus sender hash within a short time window. Deduplicate with Message-Id and event IDs.
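
The fallback signal (subject digest plus sender hash within a time window) could look like the sketch below. The prefix list, the 72-hour default window, and the function names are assumptions chosen for illustration.

```python
import hashlib
import re

# Common reply and forward prefixes, including a few non-English variants.
_PREFIX = re.compile(r"^\s*((re|fwd?|aw|sv)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    """Drop reply/forward prefixes, collapse whitespace, and lowercase."""
    return " ".join(_PREFIX.sub("", subject).split()).lower()

def fallback_thread_key(
    subject: str, sender: str, timestamp: int, window_seconds: int = 72 * 3600
) -> str:
    """Key that matches for the same sender and subject inside one time bucket."""
    bucket = timestamp // window_seconds
    material = f"{normalize_subject(subject)}\x00{sender.strip().lower()}\x00{bucket}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

Fixed buckets are simple, but two messages a few seconds apart can land on either side of a bucket boundary. If that matters, also check the key computed for the previous bucket.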

Character encodings and non-ASCII content

Challenge: Messages arrive with mixed charsets and quoted-printable or base64 encodings.

Solution: Rely on a robust MIME parser that preserves charset info per part. Normalize to UTF-8 in your application and store the original encoding metadata for diagnostics. Test with multilingual fixtures.
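
If you ever handle raw part bytes yourself, for example when re-parsing stored MIME, a defensive decode might look like this sketch. The fallback order (declared charset, then UTF-8, then Windows-1252) is an assumption, and decode_part is a hypothetical helper name.

```python
import codecs
from typing import List, Optional, Tuple

def decode_part(payload: bytes, declared_charset: Optional[str]) -> Tuple[str, str]:
    """Decode bytes to str, returning the text and the codec that worked."""
    candidates: List[str] = []
    if declared_charset:
        try:
            candidates.append(codecs.lookup(declared_charset.strip()).name)
        except LookupError:
            pass  # unknown or misspelled charset label
    candidates += ["utf-8", "cp1252"]
    for name in candidates:
        try:
            return payload.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raise; undecodable bytes become U+FFFD.
    return payload.decode("utf-8", errors="replace"), "utf-8-replace"
```

Returning the codec alongside the text lets you store it as the diagnostic metadata mentioned above.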

Large attachments and memory pressure

Challenge: Processing large attachments can spike memory and slow request threads.

Solution: Stream downloads in chunks, write directly to object storage, and pass a reference to async workers for analysis. Set per-attachment and total size limits, then fail fast with a clear error that the user can understand.
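
A minimal sketch of chunked copying with a hard size cap follows. It works with any iterable of byte chunks, such as r.iter_content(65536) from the webhook example, and any writable binary sink. AttachmentTooLarge and copy_with_limit are hypothetical names.

```python
from typing import BinaryIO, Iterable

class AttachmentTooLarge(Exception):
    """Raised as soon as the stream exceeds the configured limit."""

def copy_with_limit(chunks: Iterable[bytes], sink: BinaryIO, max_bytes: int) -> int:
    """Write chunks to sink, failing fast once max_bytes is exceeded."""
    written = 0
    for chunk in chunks:
        written += len(chunk)
        if written > max_bytes:
            # Stop before writing the chunk that crosses the limit.
            raise AttachmentTooLarge(f"attachment exceeds {max_bytes} bytes")
        sink.write(chunk)
    return written
```

Only one chunk is held in memory at a time, and the caller should delete any partially written object when the exception is raised.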

Delivery reliability and backpressure

Challenge: Traffic bursts or downstream outages lead to retries and duplicate work.

Solution: Use exponential backoff between webhook retries, use idempotency keys, and scale workers horizontally. For polling, track cursor positions or acknowledged IDs. Emit metrics for queue depth, oldest event age, and end-to-end latency.
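
For retries you control yourself, such as a polling client or a worker calling a downstream service, exponential backoff with jitter can be computed as in this sketch. The base, cap, and the full-jitter variant are assumptions; your provider's webhook retry schedule may differ.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0, rng=random) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

The jitter spreads retries out so that many clients recovering from the same outage do not all retry at the same moment.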

Conclusion

Inbound email processing gives your SaaS a low-friction capture channel that users already understand. By mapping addresses to tenants, validating senders, normalizing content, and storing both raw and parsed data, you get reliable ingestion without mail server maintenance. MailParse turns these building blocks into straightforward APIs so your team can focus on product logic instead of SMTP and MIME edge cases.

Start with a dedicated subdomain, a simple router, and an authenticated webhook. Add attachment streaming, reply threading, and monitoring as you grow. With these foundations, inbound email becomes a dependable part of your platform's data pipeline.

FAQ

How do I choose an ingestion subdomain and set it up?

Pick a dedicated subdomain like in.example.com. Create MX records that point to your provider, and avoid mixing with marketing or transactional sending domains. Using a separate subdomain lets you enforce different policies, rotate keys safely, and monitor isolated analytics.

What is the best way to verify webhook authenticity?

Use an HMAC signature header computed over the raw request body with a shared secret, ideally together with a timestamp. Recompute the digest server side, compare it with a constant-time function, and reject the request if the signature does not match or the timestamp is stale. Pair this with TLS, regular secret rotation, and IP allowlisting if possible.
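
A sketch of signature verification with a staleness check is shown below. The signing scheme (HMAC-SHA256 over the timestamp, a dot, and the raw body) and the five-minute tolerance are assumptions for illustration; check your provider's documentation for the actual header names and format.

```python
import hashlib
import hmac
import time
from typing import Optional

def sign(secret: bytes, timestamp: str, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<body>" (an assumed signing scheme)."""
    message = timestamp.encode("ascii") + b"." + raw_body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(
    secret: bytes,
    timestamp: str,
    signature: str,
    raw_body: bytes,
    max_age_seconds: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check freshness first, then compare signatures in constant time."""
    if not secret or not (timestamp.isascii() and timestamp.isdigit()):
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > max_age_seconds:
        return False  # stale or far-future request
    expected = sign(secret, timestamp, raw_body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Including the timestamp in the signed material is what makes the staleness check meaningful, since an attacker cannot change it without invalidating the signature.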

How can I prevent infinite loops with auto-responders?

Ignore or rate limit messages that carry an Auto-Submitted header with any value other than no, or an X-Autoreply header, and never send replies to the ingestion address. Use a unique reply-to subdomain for outbound threads and validate that inbound replies match expected patterns.
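
The header check can be sketched as follows. Auto-Submitted follows RFC 3834, where any value other than no marks an automatic message. The Precedence values are a common convention rather than a guarantee, and should_suppress_reply is a hypothetical name.

```python
from typing import Mapping

def should_suppress_reply(headers: Mapping[str, str]) -> bool:
    """Return True when the message looks automated and must not be auto-answered."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    # Parameters may follow the keyword after a semicolon; ignore them.
    auto_submitted = h.get("auto-submitted", "no").split(";")[0].strip()
    if auto_submitted != "no":
        return True
    if "x-autoreply" in h:
        return True
    if h.get("precedence") in {"bulk", "junk", "list"}:
        return True
    return False
```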

What limits should I set for attachments?

Establish clear per-attachment and total size caps, for example 10 MB per file and 25 MB total. Stream to object storage, scan asynchronously, and block executable or high-risk types. Provide user feedback on rejected attachments and suggest alternatives like secure uploads.
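
These caps can be enforced on the attachment metadata before anything is downloaded. The sketch below uses the 10 MB and 25 MB example figures and the attachment fields from the payload shown earlier. The allowlist contents and the name check_attachments are assumptions, and declared content types can be wrong, so content scanning is still needed.

```python
from typing import Iterable, List, Mapping, Tuple

# Illustrative allowlist; adjust to the file types your product accepts.
ALLOWED_TYPES = {"application/pdf", "image/png", "image/jpeg", "text/plain", "text/csv"}
MAX_FILE_BYTES = 10 * 1024 * 1024
MAX_TOTAL_BYTES = 25 * 1024 * 1024

def check_attachments(
    attachments: Iterable[Mapping[str, object]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split attachments into accepted names and (name, reason) rejections."""
    accepted: List[str] = []
    rejected: List[Tuple[str, str]] = []
    total = 0
    for att in attachments:
        name = str(att.get("filename") or "unnamed")
        ctype = str(att.get("content_type") or "").lower()
        size = int(att.get("size") or 0)
        if ctype not in ALLOWED_TYPES:
            rejected.append((name, "type not allowed"))
        elif size > MAX_FILE_BYTES:
            rejected.append((name, "file too large"))
        elif total + size > MAX_TOTAL_BYTES:
            rejected.append((name, "total size limit reached"))
        else:
            total += size
            accepted.append(name)
    return accepted, rejected
```

The rejection reasons are plain strings so they can be passed back to the sender as the user feedback suggested above.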

Do I need to run my own mail server to implement this?

No. An API-first inbound service handles SMTP, parsing, and delivery to your app. You configure DNS, define webhooks or polling, and implement routing logic. MailParse covers the heavy lifting so you do not have to operate mail infrastructure.