Email to JSON: A Complete Guide | MailParse

Learn how to convert raw email messages into clean, structured JSON for application consumption. An expert guide for developers.

Why convert email to JSON for SaaS teams

Email is the most universal input channel your customers already use. Converting inbound email messages into clean JSON lets your application react in real time: open a ticket from a reply, append a comment to a thread, trigger a workflow from a command email, or ingest leads without a UI. Email-to-JSON removes the complexity of MIME, encodings, and attachment handling so your backend only deals with structured data.

Building a reliable conversion layer is not trivial. RFCs define many edge cases, clients format replies differently, and attachments often break naive parsers. This guide breaks down the fundamentals of email to JSON, shows practical workflows with code, covers best practices used by high-scale SaaS platforms, and offers solutions to common pitfalls. If you prefer to skip infrastructure and focus on product, MailParse provides instant email addresses, parses raw MIME into structured JSON, and delivers via webhook or REST polling.

Core concepts of email-to-JSON conversion

MIME structure and why it matters

Emails are MIME containers that can be multipart with nested parts. At minimum you will encounter:

  • Headers: From, To, Subject, Date, Message-ID, In-Reply-To, References, Content-Type, Content-Transfer-Encoding, and auth results.
  • Bodies: text/plain and text/html, either standalone or inside multipart/alternative.
  • Attachments: parts inside multipart/mixed marked Content-Disposition: attachment, plus inline images referenced by Content-ID (typically inside multipart/related).

JSON output should normalize these pieces into a consistent schema regardless of source client or encoding.

A pragmatic JSON schema

Design a schema that is stable, language-agnostic, and safe to extend. A common shape:

{
  "id": "uuid-1234-...-abcd",
  "received_at": "2026-04-26T12:41:30Z",
  "headers": {
    "from": "Alice <alice@example.com>",
    "to": ["support@example.app"],
    "cc": [],
    "subject": "Re: Order 5824",
    "date": "Sun, 26 Apr 2026 12:41:10 +0000",
    "message_id": "<CAF-12345@example.com>",
    "in_reply_to": "<CAF-67890@example.com>",
    "references": ["<CAF-001@example.com>", "<CAF-67890@example.com>"]
  },
  "addresses": {
    "from": [{"name": "Alice", "address": "alice@example.com"}],
    "to": [{"name": null, "address": "support@example.app"}],
    "cc": []
  },
  "body": {
    "text": "Hi team,\nHere is the update...\n",
    "html": "<p>Hi team,</p><p>Here is the update...</p>",
    "charset": "UTF-8"
  },
  "attachments": [
    {
      "filename": "invoice.pdf",
      "content_type": "application/pdf",
      "size": 84211,
      "content_id": null,
      "disposition": "attachment",
      "data_base64": "JVBERi0xLjQKJc..."
    }
  ],
  "meta": {
    "spam": { "is_spam": false, "score": 0.3 },
    "dkim": "pass",
    "spf": "pass",
    "dmarc": "pass"
  },
  "threading": {
    "type": "reply",
    "reply_to_message_id": "<CAF-67890@example.com>"
  }
}

Notes:

  • Keep the original raw MIME separately for audit and reprocessing.
  • Normalize addresses into {name, address} pairs while preserving raw header strings.
  • Store data_base64 for attachments or move to object storage and replace with a signed URL.
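As a minimal sketch of the address normalization mentioned in these notes, the standard library's email.utils.getaddresses can split raw header strings into pairs. The function name normalize_addresses and the "raw"/"parsed" keys are illustrative choices, not part of any MailParse API.

```python
from email.utils import getaddresses


def normalize_addresses(raw_values):
    """Turn raw address header strings into {name, address} pairs."""
    pairs = []
    for name, address in getaddresses(raw_values):
        if not address:
            continue  # skip entries the parser could not interpret
        pairs.append({"name": name or None, "address": address})
    return pairs


# Keep the raw header strings next to the parsed form, as the notes suggest.
raw_to = ["Alice <alice@example.com>, support@example.app"]
record = {"raw": raw_to, "parsed": normalize_addresses(raw_to)}
```

Keeping both forms means display code can show the header exactly as sent while matching logic works on the parsed pairs.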

Decoding and normalization essentials

  • Decode quoted-printable and base64 correctly. Many issues trace to improper decoding of UTF-8 or ISO-8859-1 content.
  • Pick a body preference policy. Either prefer text/html and derive plaintext from it with an HTML-to-text step, or prefer text/plain if your workflows are plaintext-first. Always expose both if present.
  • Resolve CID inline images by mapping <img src="cid:..."/> to attachments with matching content_id.
  • Apply Unicode normalization (NFC) to text fields to avoid subtle matching bugs.
  • Parse dates to UTC in ISO 8601 format. Keep the raw Date header for display.
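Two of the items above, date parsing and Unicode normalization, can be sketched with the standard library alone. The helper names to_utc_iso and nfc are hypothetical, and treating an offset-less date as UTC is an assumption you may want to change.

```python
import unicodedata
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_utc_iso(date_header):
    """Parse a Date header into a UTC ISO 8601 string, or None if unparseable."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; assume UTC here.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def nfc(text):
    """Normalize text to NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)


raw_date = "Sun, 26 Apr 2026 14:41:10 +0200"
normalized = {"date_raw": raw_date, "date_utc": to_utc_iso(raw_date)}
```

Returning None for an unparseable header, rather than raising, lets the pipeline keep the message and fall back to the receive time.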

Practical workflows and code examples

From raw MIME to JSON in Node.js

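The following example accepts a raw RFC 5322 message over HTTP, parses its MIME parts with the lightweight postal-mime library, and returns structured JSON. The request body is buffered in memory up to the configured limit rather than streamed.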

// package.json: { "type": "module", "dependencies": { "postal-mime": "^2.0.0", "express": "^4.19.0" } }
import { randomUUID } from 'node:crypto';
import express from 'express';
import PostalMime from 'postal-mime';

const app = express();

// Return the raw value of the first header with the given name, or ''.
// postal-mime exposes headers as [{ key, value }] with lowercased keys.
const rawHeader = (mail, name) =>
  mail.headers.find(h => h.key === name)?.value || '';

// Receive raw MIME via HTTP POST as binary
app.post('/inbound', express.raw({ type: '*/*', limit: '25mb' }), async (req, res) => {
  try {
    const mail = await new PostalMime().parse(req.body);

    // Normalize to your schema
    const json = {
      id: randomUUID(),
      received_at: new Date().toISOString(),
      headers: {
        from: rawHeader(mail, 'from'),
        to: (mail.to || []).map(t => t.address),
        cc: (mail.cc || []).map(c => c.address),
        subject: mail.subject || '',
        // Keep the raw Date header; mail.date holds the parsed ISO string
        date: rawHeader(mail, 'date'),
        message_id: mail.messageId || '',
        in_reply_to: mail.inReplyTo || '',
        // References arrives as one whitespace-separated string
        references: (mail.references || '').split(/\s+/).filter(Boolean)
      },
      addresses: {
        from: mail.from ? [{ name: mail.from.name || null, address: mail.from.address }] : [],
        to: (mail.to || []).map(t => ({ name: t.name || null, address: t.address })),
        cc: (mail.cc || []).map(c => ({ name: c.name || null, address: c.address }))
      },
      body: {
        text: mail.text || '',
        html: mail.html || '',
        charset: 'UTF-8' // the parser has already decoded bodies to strings
      },
      attachments: (mail.attachments || []).map(a => {
        // content is an ArrayBuffer or Uint8Array, so wrap it before encoding
        const bytes = Buffer.from(a.content);
        return {
          filename: a.filename || null,
          content_type: a.mimeType,
          size: bytes.length,
          content_id: a.contentId || null,
          disposition: a.disposition || 'attachment',
          data_base64: bytes.toString('base64')
        };
      }),
      meta: {},
      threading: {}
    };

    res.status(200).json(json);
  } catch (err) {
    console.error(err);
    res.status(400).json({ error: 'Invalid MIME' });
  }
});

app.listen(3000, () => console.log('Inbound listener on 3000'));

Key details:

  • Use a raw body parser to prevent unwanted decoding that can corrupt binary parts.
  • Limit payload size and reject overly large messages early.
  • Map the parser's output into a schema you control to prevent downstream breakage if the library changes.

Python snippet using the standard library

from base64 import b64encode
from email import policy
from email.parser import BytesParser


def parse_email_to_json(raw_bytes: bytes) -> dict:
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

    def walk_parts(m):
        text, html, attachments = None, None, []
        for part in m.walk():
            if part.is_multipart():
                continue  # containers carry no content of their own
            ctype = part.get_content_type()
            disp = part.get_content_disposition()
            # Only parts not marked as attachments are body candidates, so an
            # attached .txt or .html file cannot replace the real body.
            if ctype == 'text/plain' and disp != 'attachment' and text is None:
                text = part.get_content()
            elif ctype == 'text/html' and disp != 'attachment' and html is None:
                html = part.get_content()
            elif disp in ('attachment', 'inline'):
                payload = part.get_payload(decode=True)  # decode once, reuse below
                if payload is None:
                    continue
                content_id = part.get('Content-ID')
                attachments.append({
                    "filename": part.get_filename(),
                    "content_type": ctype,
                    "size": len(payload),
                    # Strip <> so the value matches cid: references in HTML
                    "content_id": content_id.strip().strip('<>') if content_id else None,
                    "disposition": disp,
                    "data_base64": b64encode(payload).decode('ascii')
                })
        return text or '', html or '', attachments

    text, html, attachments = walk_parts(msg)

    return {
        "headers": {
            "from": str(msg.get('From', '')),
            "to": [str(v) for v in msg.get_all('To', [])],
            "cc": [str(v) for v in msg.get_all('Cc', [])],
            "subject": str(msg.get('Subject', '')),
            "date": str(msg.get('Date', '')),
            "message_id": str(msg.get('Message-ID', '')),
            "in_reply_to": str(msg.get('In-Reply-To', '')),
            # References is one header holding whitespace-separated IDs
            "references": str(msg.get('References', '')).split()
        },
        # get_content() returns decoded str, so the output is always UTF-8 safe
        "body": {"text": text, "html": html, "charset": "UTF-8"},
        "attachments": attachments
    }

Receiving via webhook or polling

You can receive parsed messages by push or by pull. Webhooks (push) provide lower latency and avoid polling overhead. REST polling (pull) is simpler to operate in restricted environments. MailParse supports both models and adds features like automatic retry with backoff, idempotency tokens, and signature verification so you can scale safely.

Best practices for reliable pipelines

1) Preserve raw MIME and make processing idempotent

  • Store the raw message in cold storage keyed by a hash of the bytes.
  • Compute a deterministic id from Message-ID, Date, sender, and body hash to avoid duplicate processing when a sender retries.
  • Use idempotency keys for webhooks and acknowledge only after persistence.
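The two keys described above can be sketched with hashlib. The function names, the choice of SHA-256, and the NUL separator are assumptions made for illustration, not a prescribed format.

```python
import hashlib


def raw_storage_key(raw_bytes):
    """Key for cold storage: a hash of the exact bytes received."""
    return hashlib.sha256(raw_bytes).hexdigest()


def dedup_key(message_id, date, sender, body):
    """Derive a stable id from fields that stay the same when a sender retries."""
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    fields = [message_id.strip(), date.strip(), sender.strip().lower(), body_hash]
    # Join with NUL so that field boundaries cannot be confused.
    canonical = "\x00".join(fields)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key = dedup_key(
    "<CAF-12345@example.com>",
    "Sun, 26 Apr 2026 12:41:10 +0000",
    "Alice@Example.com",
    "Hi team,\nHere is the update...\n",
)
```

A handler can check this key against a store of processed keys before doing any work and record it only after the message has been persisted.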

2) Normalize inputs consistently

  • Lowercase email addresses for matching, but keep display names as provided.
  • Strip surrounding <> from IDs, but keep the original for display if needed.
  • Convert all dates to UTC and include the raw header string for forensic debugging.
  • Apply HTML-to-text for missing plaintext and trim whitespace-only bodies.
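For the last item, a rough HTML-to-text fallback can be built on the standard library's html.parser. This is a deliberately small sketch: the tag sets and the plaintext_body helper are assumptions, and a dedicated library will handle tables, links, and malformed markup better.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Extract visible text, collapsing whitespace and dropping empty lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.chunks).splitlines()]
    return "\n".join(line for line in lines if line)


def plaintext_body(text, html):
    """Use the plaintext part unless it is missing or whitespace-only."""
    if text and text.strip():
        return text
    return html_to_text(html) if html else ""
```

Calling plaintext_body("", "<p>Hi team,</p><p>Here is the update...</p>") yields the two sentences on separate lines.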

3) Deal with replies, signatures, and quoted text

  • Use In-Reply-To and References to thread reliably.
  • Heuristically detect quoted sections using patterns such as "On <date>, <name> wrote:", angle-bracket quote markers (>), or client-specific delimiters.
  • Respect RFC 3676 flowed text rules for plaintext line wrapping.
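Header-based threading can be sketched as follows. The function name thread_info and the thread_root field are hypothetical, and using the first References entry as the conversation root is a common convention rather than a guarantee, since some clients truncate or omit the header.

```python
def thread_info(message_id, in_reply_to, references):
    """Classify a message and pick an id to group its conversation by."""
    refs = references.split() if isinstance(references, str) else list(references)
    # Prefer In-Reply-To; fall back to the most recent References entry.
    parent = (in_reply_to or "").strip() or (refs[-1] if refs else "")
    if not parent:
        return {"type": "new", "reply_to_message_id": None, "thread_root": message_id}
    root = refs[0] if refs else parent
    return {"type": "reply", "reply_to_message_id": parent, "thread_root": root}


info = thread_info(
    "<CAF-12345@example.com>",
    "<CAF-67890@example.com>",
    ["<CAF-001@example.com>", "<CAF-67890@example.com>"],
)
```

Grouping by thread_root keeps every message of a conversation under one key even when replies branch.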

4) Security and safety

  • Impose size limits and attachment count limits. Reject archives that unpack to huge totals.
  • Run antivirus and content-type validation on attachments. Do not trust the declared type.
  • Sanitize HTML before rendering to users to prevent XSS and CSS injection.
  • Verify webhook signatures with HMAC and rotate secrets periodically. MailParse signs webhook payloads so you can enforce request authenticity.
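A minimal sketch of HMAC verification follows. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body; the actual algorithm, encoding, and header name depend on your provider, so check its documentation before relying on this shape.

```python
import hashlib
import hmac


def verify_signature(secret, payload, signature_hex):
    """Return True if signature_hex matches HMAC-SHA256(secret, payload)."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


secret = b"rotate-me-periodically"  # hypothetical shared secret
body = b'{"id": "example"}'         # the raw bytes as received, before JSON parsing
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
is_valid = verify_signature(secret, body, signature)
```

Verify against the raw bytes rather than a re-serialized JSON object, because re-serialization can change whitespace or key order and invalidate the signature.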

5) Observability and reprocessing

  • Log normalized headers and a short body digest for safe debugging.
  • Tag events with outcome codes such as parsed, quarantined, rejected, or deferred.
  • Support replay from storage to fix parser bugs without losing messages.

For a broader systems view, see the Email Infrastructure Checklist for SaaS Platforms and the Email Infrastructure Checklist for Customer Support Teams to harden routing, authentication, and storage around your email-to-JSON pipeline.

Common challenges and solutions

Mixed encodings and broken clients

Problem: Headers or bodies appear with garbled characters or mixed charsets.

Solution:

  • Decode encoded-words in headers per RFC 2047. Validate using a library that supports =?UTF-8?B? and =?ISO-8859-1?Q? forms.
  • Fall back to Latin-1 when the charset is missing but bytes are present, then convert to UTF-8.
  • Normalize to NFC and strip control characters outside allowed ranges.
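The three steps above can be sketched with the standard library. The helper names are hypothetical, and trying UTF-8 before the Latin-1 fallback is an added assumption that tends to suit modern senders.

```python
import unicodedata
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Decode RFC 2047 encoded-words such as =?UTF-8?B?...?= into text."""
    return str(make_header(decode_header(raw)))


def decode_body(payload, charset=None):
    """Decode bytes with the declared charset, then UTF-8, then Latin-1."""
    for candidate in (charset, "utf-8"):
        if not candidate:
            continue
        try:
            return payload.decode(candidate)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name or bytes that do not fit it
    return payload.decode("latin-1")  # maps every byte, so it cannot fail


def clean(text):
    """Normalize to NFC and drop control characters except tab, CR, and LF."""
    text = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in text if ch in "\t\n\r" or unicodedata.category(ch) != "Cc"
    )


subject = clean(decode_header_value("=?ISO-8859-1?Q?Andr=E9?="))
```

Because Latin-1 never fails, it belongs last: it always produces a string, but the result is only correct when the sender really used a Latin-1-compatible encoding.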

Multipart puzzles

Problem: You receive an email with both plaintext and HTML, plus inline images and attachments, and your app chooses the wrong body or breaks images.

Solution:

  • Walk the tree depth-first and record the best candidate for each media type.
  • Prefer text/html if your app renders HTML, otherwise prefer text/plain. Always expose both fields.
  • Map cid: URIs to attachments with matching Content-ID. Avoid stripping inline parts that are referenced by HTML.
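The cid: mapping can be sketched with a regular expression over img tags. The attachment dictionaries follow the schema shown earlier, while resolve_cid_images and the url_for callback are hypothetical names; a production version would more likely use an HTML parser than a regex.

```python
import re

# Matches the src attribute of an <img> tag when it uses the cid: scheme.
CID_SRC = re.compile(r'''(<img\b[^>]*?\bsrc\s*=\s*["'])cid:([^"']+)(["'])''', re.IGNORECASE)


def resolve_cid_images(html, attachments, url_for):
    """Rewrite cid: image sources to URLs; return the HTML and the CIDs used."""
    by_cid = {
        a["content_id"].strip("<>"): a for a in attachments if a.get("content_id")
    }
    referenced = set()

    def swap(match):
        cid = match.group(2)
        attachment = by_cid.get(cid)
        if attachment is None:
            return match.group(0)  # leave unknown references untouched
        referenced.add(cid)
        return match.group(1) + url_for(attachment) + match.group(3)

    return CID_SRC.sub(swap, html), referenced


html = '<p><img src="cid:logo123"></p>'
attachments = [{"content_id": "<logo123>", "filename": "logo.png"}]
rewritten, used = resolve_cid_images(
    html, attachments, lambda a: "https://files.example/" + a["filename"]
)
```

The returned set of referenced CIDs tells you which inline parts must be kept even if your UI hides them from the attachment list.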

Reply and signature stripping

Problem: You want only the newly written text, not quoted replies or signatures.

Solution:

  • Combine header-based threading with heuristic body detection. Use In-Reply-To to attach to the right conversation, then trim body after the first recognized delimiter.
  • Maintain language-specific delimiter dictionaries and update over time based on your traffic.
  • Allow users to override by adding a marker like --- please consider only above this line --- and split on it.
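A minimal sketch of delimiter-based trimming follows. The pattern list is a small English-only assumption that you would extend from your own traffic, and the single-line "On ... wrote:" pattern will miss attribution lines that a client wraps across two lines.

```python
import re

DELIMITERS = [
    re.compile(r"^On .+ wrote:\s*$"),
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE),
    re.compile(r"^--- please consider only above this line ---\s*$", re.IGNORECASE),
    re.compile(r"^-- $"),  # conventional signature separator: dash dash space
]


def extract_new_text(body):
    """Keep text above the first delimiter and drop '>' quoted lines."""
    kept = []
    for line in body.splitlines():
        if any(pattern.match(line) for pattern in DELIMITERS):
            break  # everything below the delimiter is quoted text or signature
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


reply = (
    "Thanks, that works.\n"
    "\n"
    "On Sun, 26 Apr 2026, Bob <bob@example.com> wrote:\n"
    "> Did the fix help?\n"
)
new_text = extract_new_text(reply)
```

Store the untrimmed body alongside the trimmed one so that a wrong guess by the heuristic never loses what the user wrote.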

Auto-replies and out-of-office

Problem: Automated messages create noise in your product.

Solution:

  • Check headers such as Auto-Submitted, X-Autoreply, and Precedence: bulk. Many automated responders set these fields.
  • Use Return-Path and From heuristics to identify mailers.
  • Route automated messages to a separate queue or mark them for limited processing.
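The header checks above can be sketched as a single predicate over a parsed message. The function name is_automated and the exact set of Precedence values are assumptions; treat the result as a routing hint, not a verdict.

```python
from email import message_from_string


def is_automated(msg):
    """Guess whether a parsed message was generated by software."""
    auto_submitted = (msg.get("Auto-Submitted") or "").strip().lower()
    if auto_submitted and auto_submitted != "no":
        return True  # values such as auto-replied or auto-generated
    if msg.get("X-Autoreply") is not None:
        return True
    precedence = (msg.get("Precedence") or "").strip().lower()
    if precedence in {"bulk", "junk", "list"}:
        return True
    # An empty Return-Path (<>) is typical of bounces and delivery reports.
    return (msg.get("Return-Path") or "").strip() == "<>"


out_of_office = message_from_string(
    "Auto-Submitted: auto-replied\nSubject: Out of office\n\nBack on Monday."
)
queue = "automated" if is_automated(out_of_office) else "inbox"
```

Routing on the result rather than dropping the message keeps false positives recoverable.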

Spam and deliverability signals

Problem: You need to filter spam and preserve legitimate inbound requests.

Solution:

  • Capture DMARC, DKIM, and SPF results in JSON. Make decisions using policy and score thresholds.
  • Integrate content scoring and reputation data. Consider tightening routing for domains that consistently fail auth.
  • Review the Email Deliverability Checklist for SaaS Platforms for practices that improve both outbound and inbound success.
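Capturing authentication results can be sketched by reading the Authentication-Results header. The regex below is deliberately simplistic, the 5.0 score threshold is a made-up example value, and the sketch assumes the header was added by your own receiving server, since copies injected by a sender cannot be trusted.

```python
import re

AUTH_RESULT = re.compile(r"\b(dkim|spf|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def auth_summary(header_value):
    """Extract the first dkim, spf, and dmarc result from the header value."""
    summary = {"dkim": "none", "spf": "none", "dmarc": "none"}
    seen = set()
    for method, result in AUTH_RESULT.findall(header_value):
        method = method.lower()
        if method not in seen:
            summary[method] = result.lower()
            seen.add(method)
    return summary


def should_quarantine(summary, spam_score, threshold=5.0):
    """Example policy: quarantine on a DMARC failure or a high spam score."""
    return summary["dmarc"] == "fail" or spam_score >= threshold


header = (
    "mx.example.app; dkim=pass header.d=example.com; "
    "spf=pass smtp.mailfrom=example.com; dmarc=fail header.from=example.com"
)
meta = auth_summary(header)
```

The summary maps directly onto the dkim, spf, and dmarc fields of the meta object in the schema shown earlier.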

If building and operating this layer reduces your velocity, MailParse can provision receiving addresses instantly, stream parsed JSON to your app, and handle retries, scaling, and edge cases so your team stays focused on features.

Conclusion

Turning email into JSON gives your SaaS a flexible input surface that users already understand. You get structured data with threading context, safe attachment handling, and normalized fields that downstream services can consume without MIME expertise. Start with a clear schema, implement robust decoding and normalization, enforce security at the boundaries, and invest in observability and reprocessing.

If you want a faster path to production, try MailParse for instant inbound addresses, webhook delivery, and clean JSON output that fits into your existing architecture. For additional ideas that build on email-to-JSON, browse Top Email Parsing API Ideas for SaaS Platforms.

FAQ

What is email to JSON and why should I use it?

Email to JSON is the process of converting raw MIME messages into a structured JSON document. JSON is easy for services and microservices to consume, making it ideal for automating ticket creation, comment ingestion, lead capture, or command processing from email. It also standardizes headers, bodies, and attachments across different mail clients.

Should I store the raw email, the JSON, or both?

Store both. The raw MIME is your source of truth for audits, debugging, and future reprocessing. The JSON representation powers your application logic and search. Keeping both lets you refine parsing without data loss.

How do I choose between webhooks and polling?

Use webhooks for low latency and cost efficiency. Polling works in locked down environments or when you need tighter control over fetch windows. If you use a provider like MailParse, enable webhook signatures and idempotency to make processing safe and repeatable.

How should I handle attachments securely?

Enforce size and count limits, scan files with antivirus, validate by magic bytes rather than the declared MIME type, store in object storage, and serve via short-lived signed URLs. Never render or execute content directly in the browser without sanitization.
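Magic-byte validation can be sketched with a small signature table. The table covers only a handful of formats and the function names are hypothetical; unknown formats return None, so you still need a policy for types the table does not cover.

```python
MAGIC_SIGNATURES = {
    "application/pdf": (b"%PDF-",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/gif": (b"GIF87a", b"GIF89a"),
    "application/zip": (b"PK\x03\x04",),
}


def sniff_type(data):
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for mime, prefixes in MAGIC_SIGNATURES.items():
        if data.startswith(prefixes):
            return mime
    return None


def declared_type_matches(declared, data):
    """True only when the content is recognized and agrees with the declared type."""
    sniffed = sniff_type(data)
    return sniffed is not None and sniffed == declared.strip().lower()


# A file declared as a PDF that actually starts with a ZIP signature.
suspicious = not declared_type_matches("application/pdf", b"PK\x03\x04rest-of-file")
```

Note that several office document formats are ZIP containers, so a ZIP signature alone does not prove that a file is a plain archive.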

How do I deduplicate inbound emails?

Combine the Message-ID header with a canonical digest of key fields such as sender, date, and a body hash. Store seen keys with a TTL and reject or collapse duplicates at the edge. Also make webhook handlers idempotent so retries do not create duplicates.
