Why convert email to JSON for SaaS teams
Email is the most universal input channel your customers already use. Converting inbound email messages into clean JSON lets your application react in real time: open a ticket from a reply, append a comment to a thread, trigger a workflow from a command email, or ingest leads without a UI. Email-to-JSON removes the complexity of MIME, encodings, and attachment handling so your backend only deals with structured data.
Building a reliable conversion layer is not trivial. RFCs define many edge cases, clients format replies differently, and attachments often break naive parsers. This guide breaks down the fundamentals of email to JSON, shows practical workflows with code, covers best practices used by high-scale SaaS platforms, and offers solutions to common pitfalls. If you prefer to skip infrastructure and focus on product, MailParse provides instant email addresses, parses raw MIME into structured JSON, and delivers via webhook or REST polling.
Core concepts of email-to-JSON conversion
MIME structure and why it matters
Emails are MIME containers that can be multipart with nested parts. At minimum you will encounter:
- Headers: From, To, Subject, Date, Message-ID, In-Reply-To, References, Content-Type, Content-Transfer-Encoding, and authentication results.
- Bodies: text/plain and text/html, either standalone or inside multipart/alternative.
- Attachments: parts inside multipart/mixed marked with Content-Disposition: attachment, or inline CID images.
JSON output should normalize these pieces into a consistent schema regardless of source client or encoding.
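To make the nesting concrete, here is a minimal sketch that builds a small message with Python's standard library and lists the content type of every part in the order a parser would visit them. The helper names build_sample_message and content_types are made up for this illustration, and the addresses and PDF bytes are placeholders.

```python
from email.message import EmailMessage
from typing import List


def build_sample_message() -> EmailMessage:
    """Build a message with a plaintext body, an HTML alternative, and one attachment."""
    msg = EmailMessage()
    msg["From"] = "Alice <alice@example.com>"
    msg["To"] = "support@example.app"
    msg["Subject"] = "Re: Order 5824"
    msg.set_content("Hi team,\nHere is the update...\n")
    msg.add_alternative("<p>Hi team,</p><p>Here is the update...</p>", subtype="html")
    # Placeholder bytes, not a real PDF
    msg.add_attachment(
        b"%PDF-1.4 placeholder",
        maintype="application",
        subtype="pdf",
        filename="invoice.pdf",
    )
    return msg


def content_types(msg: EmailMessage) -> List[str]:
    """Return the content type of every node in the MIME tree, depth-first."""
    return [part.get_content_type() for part in msg.walk()]


if __name__ == "__main__":
    for ctype in content_types(build_sample_message()):
        print(ctype)
```

The outer container is multipart/mixed, the two body variants sit inside a nested multipart/alternative, and the attachment is a sibling of that alternative block. Real-world messages often nest deeper, for example with multipart/related around the HTML and its inline images.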
A pragmatic JSON schema
Design a schema that is stable, language-agnostic, and safe to extend. A common shape:
```json
{
  "id": "uuid-1234-...-abcd",
  "received_at": "2026-04-26T12:41:30Z",
  "headers": {
    "from": "Alice <alice@example.com>",
    "to": ["support@example.app"],
    "cc": [],
    "subject": "Re: Order 5824",
    "date": "Sun, 26 Apr 2026 12:41:10 +0000",
    "message_id": "<CAF-12345@example.com>",
    "in_reply_to": "<CAF-67890@example.com>",
    "references": ["<CAF-001@example.com>", "<CAF-67890@example.com>"]
  },
  "addresses": {
    "from": [{"name": "Alice", "address": "alice@example.com"}],
    "to": [{"name": null, "address": "support@example.app"}],
    "cc": []
  },
  "body": {
    "text": "Hi team,\nHere is the update...\n",
    "html": "<p>Hi team,</p><p>Here is the update...</p>",
    "charset": "UTF-8"
  },
  "attachments": [
    {
      "filename": "invoice.pdf",
      "content_type": "application/pdf",
      "size": 84211,
      "content_id": null,
      "disposition": "attachment",
      "data_base64": "JVBERi0xLjQKJc..."
    }
  ],
  "meta": {
    "spam": { "is_spam": false, "score": 0.3 },
    "dkim": "pass",
    "spf": "pass",
    "dmarc": "pass"
  },
  "threading": {
    "type": "reply",
    "reply_to_message_id": "<CAF-67890@example.com>"
  }
}
```
Notes:
- Keep the original raw MIME separately for audit and reprocessing.
- Normalize addresses into {name, address} pairs while preserving raw header strings.
- Store data_base64 for attachments, or move them to object storage and replace the field with a signed URL.
Decoding and normalization essentials
- Decode quoted-printable and base64 correctly. Many issues trace back to improper decoding of UTF-8 or ISO-8859-1 content.
- Pick a body preference policy. Usually you either prefer text/html and derive the text field through HTML-to-text conversion, or prefer text/plain if you prioritize plaintext workflows. Always expose both if present.
- Resolve CID inline images by mapping <img src="cid:..."/> to attachments with a matching content_id.
- Apply Unicode normalization (NFC) to text fields to avoid subtle matching bugs.
- Parse dates to UTC ISO 8601. Keep the raw Date header for display.
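Two of these steps, NFC normalization and date conversion, can be sketched with the Python standard library alone. The function names are illustrative, and treating a Date header without a usable offset as UTC is an assumption you may want to change.

```python
import unicodedata
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def normalize_text(value: str) -> str:
    """Apply NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", value)


def date_header_to_utc_iso(raw_date: str) -> Optional[str]:
    """Convert an RFC 5322 Date header to a UTC ISO 8601 string, or None if unparseable."""
    try:
        parsed = parsedate_to_datetime(raw_date)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: a header with no usable offset (such as -0000) is treated as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(date_header_to_utc_iso("Sun, 26 Apr 2026 14:41:10 +0200"))
    print(normalize_text("Cafe\u0301") == "Caf\u00e9")
```

Returning None for an unparseable date, instead of raising, lets the caller keep the raw header and still store the message.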
Practical workflows and code examples
From raw MIME to JSON in Node.js
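The following example accepts a raw RFC 822 message over HTTP, parses its MIME parts, and returns structured JSON using a lightweight parser. The request body is buffered in memory up to the configured limit rather than streamed.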
```javascript
// package.json: { "type": "module", "dependencies": { "postal-mime": "^2.0.0", "express": "^4.19.0" } }
import { randomUUID } from 'node:crypto';
import express from 'express';
import PostalMime from 'postal-mime';

const app = express();

// Receive raw MIME via HTTP POST as binary
app.post('/inbound', express.raw({ type: '*/*', limit: '25mb' }), async (req, res) => {
  try {
    const parser = new PostalMime();
    const mail = await parser.parse(req.body);

    // Field names (headers, mimeType, contentId) follow postal-mime's documented
    // output; verify them against the version you install.
    // headers is an array of { key, value } with lowercased keys.
    const rawHeader = name => mail.headers.find(h => h.key === name)?.value || '';

    // Normalize to your schema
    const json = {
      id: randomUUID(),
      received_at: new Date().toISOString(),
      headers: {
        from: rawHeader('from'),
        to: (mail.to || []).map(t => t.address),
        cc: (mail.cc || []).map(c => c.address),
        subject: mail.subject || '',
        // Keep the raw header here; mail.date holds the parsed ISO 8601 string
        date: rawHeader('date'),
        message_id: mail.messageId || '',
        in_reply_to: mail.inReplyTo || '',
        // references arrives as a single whitespace-separated string
        references: (mail.references || '').split(/\s+/).filter(Boolean)
      },
      addresses: {
        from: mail.from ? [{ name: mail.from.name || null, address: mail.from.address }] : [],
        to: (mail.to || []).map(t => ({ name: t.name || null, address: t.address })),
        cc: (mail.cc || []).map(c => ({ name: c.name || null, address: c.address }))
      },
      body: {
        text: mail.text || '',
        html: mail.html || '',
        charset: 'UTF-8' // the parser returns decoded strings
      },
      attachments: (mail.attachments || []).map(a => {
        // content is an ArrayBuffer by default, so wrap it before encoding
        const content = Buffer.from(a.content);
        return {
          filename: a.filename || null,
          content_type: a.mimeType,
          size: content.length,
          content_id: a.contentId || null,
          disposition: a.disposition || 'attachment',
          data_base64: content.toString('base64')
        };
      }),
      meta: {},
      threading: {}
    };

    res.status(200).json(json);
  } catch (err) {
    console.error(err);
    res.status(400).json({ error: 'Invalid MIME' });
  }
});

app.listen(3000, () => console.log('Inbound listener on 3000'));
```
Key details:
- Use a raw body parser to prevent unwanted decoding that can corrupt binary parts.
- Limit payload size and reject overly large messages early.
- Map the parser's output into a schema you control to prevent downstream breakage if the library changes.
Python snippet using the standard library
```python
from base64 import b64encode
from email import policy
from email.parser import BytesParser


def parse_email_to_json(raw_bytes: bytes) -> dict:
    """Parse raw message bytes into a JSON-serializable dict."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

    text, html, attachments = None, None, []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        ctype = part.get_content_type()
        disp = part.get_content_disposition()
        # Check the disposition first so an attached .txt or .html file
        # is not mistaken for the message body.
        if disp != 'attachment' and ctype == 'text/plain' and text is None:
            text = part.get_content()
        elif disp != 'attachment' and ctype == 'text/html' and html is None:
            html = part.get_content()
        elif disp in ('attachment', 'inline'):
            payload = part.get_payload(decode=True) or b''  # decode once
            attachments.append({
                "filename": part.get_filename(),
                "content_type": ctype,
                "size": len(payload),
                "content_id": part.get('Content-ID'),
                "disposition": disp,
                "data_base64": b64encode(payload).decode('ascii'),
            })

    def header(name: str) -> str:
        # str() turns the policy's header objects into plain strings
        return str(msg.get(name, ''))

    return {
        "headers": {
            "from": header('From'),
            "to": [str(v) for v in msg.get_all('To', [])],
            "cc": [str(v) for v in msg.get_all('Cc', [])],
            "subject": header('Subject'),
            "date": header('Date'),
            "message_id": header('Message-ID'),
            "in_reply_to": header('In-Reply-To'),
            # Split the header into individual message IDs
            "references": header('References').split(),
        },
        # get_content() returns decoded str, so the output is Unicode
        # regardless of the source charset.
        "body": {"text": text or '', "html": html or '', "charset": "UTF-8"},
        "attachments": attachments,
    }
```
Receiving via webhook or polling
You can run either a push or a pull model. Webhooks (push) provide lower latency and avoid polling overhead. REST polling (pull) is simple to operate in restricted environments where inbound connections are not allowed. MailParse supports both models and adds features like automatic retry with backoff, idempotency tokens, and signature verification so you can scale safely.
Best practices for reliable pipelines
1) Preserve raw MIME and make processing idempotent
- Store the raw message in cold storage keyed by a hash of the bytes.
- Compute a deterministic id from Message-ID, Date, sender, and body hash to avoid duplicate processing when a sender retries.
- Use idempotency keys for webhooks and acknowledge only after persistence.
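One way to derive such a deterministic id is to hash a canonical join of those fields. This is a minimal sketch: the function name dedup_key, the SHA-256 choice, and the field separator are illustrative assumptions, not a prescribed format.

```python
import hashlib


def dedup_key(message_id: str, date: str, sender: str, body: bytes) -> str:
    """Derive a stable key from Message-ID, Date, sender, and a hash of the body."""
    body_hash = hashlib.sha256(body).hexdigest()
    fields = [
        message_id.strip().strip("<>"),  # IDs are compared without angle brackets
        date.strip(),
        sender.strip().lower(),
        body_hash,
    ]
    # The unit separator keeps adjacent fields from running into each other
    canonical = "\x1f".join(fields)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    key = dedup_key(
        "<CAF-12345@example.com>",
        "Sun, 26 Apr 2026 12:41:10 +0000",
        "Alice@Example.com",
        b"Hi team,\nHere is the update...\n",
    )
    print(key)
```

Because the key depends only on message content, a retried delivery of the same message maps to the same key, so a unique constraint in your datastore can reject the duplicate.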
2) Normalize inputs consistently
- Lowercase email addresses for matching, keep display names as provided.
- Strip surrounding <> from IDs, but keep the original for display if needed.
- Convert all dates to UTC, include the raw header string for forensic debugging.
- Apply HTML-to-text for missing plaintext and trim whitespace-only bodies.
3) Deal with replies, signatures, and quoted text
- Use In-Reply-To and References to thread reliably.
- Heuristically detect quoted sections using patterns like "On <date>, <name> wrote:", angle-bracket quote markers (>), or client-specific delimiters.
- Respect RFC 3676 flowed text rules for plaintext line wrapping.
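Header-based threading can be sketched in a few lines. By convention, the References header lists ancestors oldest first, so the first id is the thread root and the last is the direct parent. The function names here are illustrative, and the regular expression is a simplification of the full Message-ID grammar.

```python
import re
from typing import List, Optional

MSG_ID = re.compile(r"<[^<>\s]+>")


def parse_msg_ids(value: Optional[str]) -> List[str]:
    """Extract every <message-id> token from a header value."""
    return MSG_ID.findall(value or "")


def thread_parent(in_reply_to: Optional[str], references: Optional[str]) -> Optional[str]:
    """Prefer In-Reply-To; fall back to the last entry in References."""
    ids = parse_msg_ids(in_reply_to)
    if ids:
        return ids[0]
    refs = parse_msg_ids(references)
    return refs[-1] if refs else None


def thread_root(in_reply_to: Optional[str], references: Optional[str]) -> Optional[str]:
    """The first References entry is the oldest known ancestor."""
    refs = parse_msg_ids(references)
    return refs[0] if refs else thread_parent(in_reply_to, references)


if __name__ == "__main__":
    refs = "<CAF-001@example.com> <CAF-67890@example.com>"
    print(thread_parent("<CAF-67890@example.com>", refs))
    print(thread_root("<CAF-67890@example.com>", refs))
```

When both helpers return None the message starts a new conversation, although some clients drop these headers, so a subject-based fallback may still be worth having.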
4) Security and safety
- Impose size limits and attachment count limits. Reject archives that unpack to huge totals.
- Run antivirus and content-type validation on attachments. Do not trust the declared type.
- Sanitize HTML before rendering to users to prevent XSS and CSS injection.
- Verify webhook signatures with HMAC and rotate secrets periodically. MailParse signs webhook payloads so you can enforce request authenticity.
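As an illustration of HMAC verification, the sketch below assumes the sender computes HMAC-SHA256 over the raw request body and transmits it as a hex string. That scheme is an assumption for the example; check your provider's documentation for the actual algorithm, encoding, and header name.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    expected = sign_payload(secret, payload).encode("ascii")
    received = received_signature.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    secret = b"example-secret"  # hypothetical value; load real secrets from configuration
    body = b'{"id": "uuid-1234"}'
    signature = sign_payload(secret, body)
    print(verify_signature(secret, body, signature))
    print(verify_signature(secret, body + b" ", signature))
```

Verify against the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and break the signature.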
5) Observability and reprocessing
- Log normalized headers and a short body digest for safe debugging.
- Tag events with outcome codes such as parsed, quarantined, rejected, or deferred.
- Support replay from storage to fix parser bugs without losing messages.
For a broader systems view, see the Email Infrastructure Checklist for SaaS Platforms and the Email Infrastructure Checklist for Customer Support Teams to harden routing, authentication, and storage around your email-to-JSON pipeline.
Common challenges and solutions
Mixed encodings and broken clients
Problem: Headers or bodies appear with garbled characters or mixed charsets.
Solution:
- Decode encoded-words in headers per RFC 2047. Validate using a library that supports both the =?UTF-8?B? and =?ISO-8859-1?Q? forms.
- Fall back to Latin-1 when the charset is missing but bytes are present, then upconvert to UTF-8.
- Normalize to NFC and strip control characters outside allowed ranges.
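The three steps above can be sketched with the Python standard library. The function names are illustrative. Trying UTF-8 before the Latin-1 fallback is an extra assumption beyond the list above, added because much unlabeled mail is UTF-8.

```python
import unicodedata
from email.header import decode_header, make_header
from typing import Optional


def decode_header_value(raw: str) -> str:
    """Decode RFC 2047 encoded-words (both B and Q forms) into a Unicode string."""
    return str(make_header(decode_header(raw)))


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Try the declared charset, then UTF-8, then Latin-1 (which accepts any byte)."""
    for candidate in (charset, "utf-8"):
        if not candidate:
            continue
        try:
            return raw.decode(candidate)
        except (LookupError, UnicodeDecodeError):
            continue
    return raw.decode("latin-1")


def clean_text(text: str) -> str:
    """Normalize to NFC and drop control characters except tab, LF, and CR."""
    text = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in text if ch in "\t\n\r" or unicodedata.category(ch) != "Cc"
    )


if __name__ == "__main__":
    print(decode_header_value("=?UTF-8?B?w4lyaWM=?="))
    print(decode_header_value("=?ISO-8859-1?Q?Andr=E9?="))
    print(clean_text(decode_body(b"caf\xe9\x00", None)))
```

Latin-1 never fails to decode, so it works as a last resort, but the result can be wrong for other legacy encodings. Keep the raw bytes so you can reprocess later.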
Multipart puzzles
Problem: You receive an email with both plaintext and HTML, plus inline images and attachments, and your app chooses the wrong body or breaks images.
Solution:
- Walk the tree depth-first and record the best candidate for each media type.
- Prefer text/html if your app renders HTML, otherwise prefer text/plain. Always expose both fields.
- Map cid: URIs to attachments with a matching Content-ID. Avoid stripping inline parts that are referenced by HTML.
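The CID mapping step can be sketched as a rewrite of src attributes. This version assumes attachments shaped like the schema earlier in this guide (a content_id field that may carry angle brackets) and takes a caller-supplied url_for function. Both names are illustrative, and a production version would use an HTML parser instead of a regular expression.

```python
import re
from typing import Callable, Dict, List

# Matches src="cid:..." or src='cid:...' and captures the quote and the id
CID_SRC = re.compile(r"""(src\s*=\s*["'])cid:([^"']+)(["'])""", re.IGNORECASE)


def resolve_cid_images(
    html: str,
    attachments: List[Dict],
    url_for: Callable[[Dict], str],
) -> str:
    """Replace cid: image sources with URLs for the matching attachments."""
    by_cid = {
        a["content_id"].strip().strip("<>"): a
        for a in attachments
        if a.get("content_id")
    }

    def replace(match: "re.Match[str]") -> str:
        attachment = by_cid.get(match.group(2))
        if attachment is None:
            return match.group(0)  # leave unknown references untouched
        return match.group(1) + url_for(attachment) + match.group(3)

    return CID_SRC.sub(replace, html)


if __name__ == "__main__":
    parts = [{"content_id": "<logo@example.com>", "filename": "logo.png"}]
    html = '<p>Hello</p><img src="cid:logo@example.com"/>'
    print(resolve_cid_images(html, parts, lambda a: "/files/" + a["filename"]))
```

Unmatched references are left as they are, so a missing inline part shows up as a broken image rather than silently pointing at the wrong file.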
Reply and signature stripping
Problem: You want only the newly written text, not quoted replies or signatures.
Solution:
- Combine header-based threading with heuristic body detection. Use In-Reply-To to attach to the right conversation, then trim the body after the first recognized delimiter.
- Maintain language-specific delimiter dictionaries and update over time based on your traffic.
- Allow users to override by adding a marker like "--- please consider only above this line ---" and split on it.
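A first-delimiter cut can be sketched as follows. The pattern list is a small English-only starting point, not a complete dictionary, and extract_new_text is an illustrative name. The "-- " line is the conventional plaintext signature separator.

```python
import re

# Illustrative English-only patterns; extend per language and client
DELIMITERS = [
    re.compile(r"^On .+ wrote:\s*$"),
    re.compile(r"^--- please consider only above this line ---\s*$"),
    re.compile(r"^>"),
    re.compile(r"^-- $"),  # conventional signature separator
]


def extract_new_text(body: str) -> str:
    """Keep lines up to the first line that looks like a quote or signature delimiter."""
    kept = []
    for line in body.splitlines():
        if any(pattern.match(line) for pattern in DELIMITERS):
            break
        kept.append(line)
    return "\n".join(kept).rstrip()


if __name__ == "__main__":
    reply = (
        "Thanks, that fixed it.\n"
        "\n"
        "On Sun, 26 Apr 2026 at 12:41, Bob <bob@example.com> wrote:\n"
        "> Did the patch work?\n"
    )
    print(extract_new_text(reply))
```

This heuristic cuts at the first match, so it will truncate replies written inline between quoted lines, and it misses attribution lines that a client wraps across two lines. Keep the full body available alongside the trimmed text.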
Auto-replies and out-of-office
Problem: Automated messages create noise in your product.
Solution:
- Check Auto-Submitted, X-Autoreply, Precedence: bulk, and similar headers. Many auto-responders set these fields.
- Use Return-Path and From heuristics to identify mailers.
- Route automated messages to a separate queue or mark them for limited processing.
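These header checks can be sketched as a single predicate. The set of Precedence values and the null Return-Path check are common conventions rather than an exhaustive rule set, and is_automated is an illustrative name.

```python
from email import message_from_string
from email.message import Message

# Assumption: these Precedence values are treated as automated traffic
AUTOMATED_PRECEDENCE = {"bulk", "junk", "list", "auto_reply"}


def is_automated(msg: Message) -> bool:
    """Return True when headers suggest the message was not written by a person."""
    auto_submitted = (msg.get("Auto-Submitted") or "").strip().lower()
    if auto_submitted and auto_submitted != "no":
        return True  # values such as auto-replied or auto-generated
    if msg.get("X-Autoreply"):
        return True
    precedence = (msg.get("Precedence") or "").strip().lower()
    if precedence in AUTOMATED_PRECEDENCE:
        return True
    # A null return path is typical of bounces and other system notices
    if (msg.get("Return-Path") or "").strip() == "<>":
        return True
    return False


if __name__ == "__main__":
    ooo = message_from_string("Auto-Submitted: auto-replied\nSubject: Out of office\n\nAway.")
    human = message_from_string("From: alice@example.com\nSubject: Hi\n\nHello.")
    print(is_automated(ooo), is_automated(human))
```

Treat the result as a routing hint rather than a verdict, since some legitimate senders set Precedence: list and some auto-responders set none of these headers.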
Spam and deliverability signals
Problem: You need to filter spam and preserve legitimate inbound requests.
Solution:
- Capture DMARC, DKIM, and SPF results in JSON. Make decisions using policy and score thresholds.
- Integrate content scoring and reputation data. Consider tightening routing for domains that consistently fail auth.
- Review the Email Deliverability Checklist for SaaS Platforms for practices that improve both outbound and inbound success.
If building and operating this layer reduces your velocity, MailParse can provision receiving addresses instantly, stream parsed JSON to your app, and handle retries, scaling, and edge cases so your team stays focused on features.
Conclusion
Turning email into JSON gives your SaaS a flexible input surface that users already understand. You get structured data with threading context, safe attachment handling, and normalized fields that downstream services can consume without MIME expertise. Start with a clear schema, implement robust decoding and normalization, enforce security at the boundaries, and invest in observability and reprocessing.
If you want a faster path to production, try MailParse for instant inbound addresses, webhook delivery, and clean JSON output that fits into your existing architecture. For additional ideas that build on email-to-JSON, browse Top Email Parsing API Ideas for SaaS Platforms.
FAQ
What is email to JSON and why should I use it?
Email to JSON is the process of converting raw MIME messages into a structured JSON document. JSON is easy for services and microservices to consume, making it ideal for automating ticket creation, comment ingestion, lead capture, or command processing from email. It also standardizes headers, bodies, and attachments across different mail clients.
Should I store the raw email, the JSON, or both?
Store both. The raw MIME is your source of truth for audits, debugging, and future reprocessing. The JSON representation powers your application logic and search. Keeping both lets you refine parsing without data loss.
How do I choose between webhooks and polling?
Use webhooks for low latency and cost efficiency. Polling works in locked down environments or when you need tighter control over fetch windows. If you use a provider like MailParse, enable webhook signatures and idempotency to make processing safe and repeatable.
How should I handle attachments securely?
Enforce size and count limits, scan files with antivirus, validate by magic bytes rather than the declared MIME type, store in object storage, and serve via short-lived signed URLs. Never render or execute content directly in the browser without sanitization.
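Magic-byte validation can be sketched with a small prefix table. The table covers only a few well-known signatures, and rejecting anything unrecognized is a conservative assumption. The function names are illustrative.

```python
from typing import Optional

# A few well-known file signatures; extend for the types you accept
MAGIC_PREFIXES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_type(data: bytes) -> Optional[str]:
    """Guess a content type from the leading bytes, or None if unrecognized."""
    for prefix, content_type in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return content_type
    return None


def declared_type_is_plausible(declared: str, data: bytes) -> bool:
    """Accept only when the sniffed type matches the declared type exactly."""
    base_type = declared.split(";")[0].strip().lower()
    sniffed = sniff_type(data)
    return sniffed is not None and sniffed == base_type


if __name__ == "__main__":
    print(sniff_type(b"%PDF-1.4\n"))
    print(declared_type_is_plausible("application/pdf", b"MZ\x90\x00"))
```

Office documents such as .docx are ZIP containers, so they sniff as application/zip and need a second inspection step before you accept their declared type.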
How do I deduplicate inbound emails?
Combine the Message-ID header with a canonical digest of key fields such as sender, date, and a body hash. Store seen keys with a TTL and reject or collapse duplicates at the edge. Also make webhook handlers idempotent so retries do not create duplicates.
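The seen-keys store with a TTL can be sketched in memory as shown below. SeenKeys is an illustrative class, and the injectable clock exists only to make the example testable. A real deployment would more likely use a shared store with native key expiry so that every worker sees the same set.

```python
import time
from typing import Callable, Dict


class SeenKeys:
    """In-memory set of deduplication keys that expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def first_time(self, key: str) -> bool:
        """Return True and remember the key if it has not been seen within the TTL."""
        now = self._clock()
        # Drop expired entries so the map does not grow without bound
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        if key in self._expiry:
            return False
        self._expiry[key] = now + self._ttl
        return True


if __name__ == "__main__":
    seen = SeenKeys(ttl_seconds=3600)
    print(seen.first_time("message-key-1"))
    print(seen.first_time("message-key-1"))
```

Sweeping the whole map on every call is fine for a sketch but scales linearly with the number of keys. Pick the TTL to comfortably exceed the longest retry window you expect from senders.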