feat: markdown for html bodies#1213
Conversation
floatpanebot
left a comment
There was a problem hiding this comment.
Hi @YeeP79! Please fix the following issues with your PR:
- Title: Does not follow conventional commits (e.g.,
feat: added something,fix(core): resolved crash). - Body: Missing the
## What?or## Why?headings required by the PR template.
floatpanebot
left a comment
There was a problem hiding this comment.
Hi @YeeP79! Please fix the following issues with your PR:
- Body: Missing the
## What?or## Why?headings required by the PR template.
Formatting issues have been resolved. Thank you!
|
Hello, I cannot replicate what it fixes, can you describe in detail and also provide an email (HTML body) that is formatted better with your change? |
|
I'll get some screenshots and create a test if possible that should fail on the old setup. |
5983e6d to
5518618
Compare
|
hey, you haven't sent an HTML body that is rendered better (you can screenshot side-by-side) |
|
Sorry for the hold up, my daughter is graduating college this weekend. It has been crazy for us. Monday before work I will knock out any issues you are seeing in either PR. |
No worries, man, take your time, congrats on the graduation! |
|
Side note on the screenshots and raw HTML: I was seeing the issues mostly in emails from my work so I need to be kind of selective on what I show. I'll make sure the unit test I create makes it crystal clear if you run it with and without the PR. |
i just want to see the change, because in my inbox in the last 50 emails, i do not have a single one that is loaded differently with your fix, you can redact the information from the screenshots |
5518618 to
33488d1
Compare
|
updated the branch with rebase |
The renderer pipeline ran every email body through markdownToHTML before parsing as HTML, on the assumption that CommonMark's html_block rule would pass raw HTML through unchanged. In practice md4c's html_block shape requirements (no leading whitespace, narrow opening-tag patterns) are routinely violated by real-world HTML emails — Datadog, marketing tools, and any sender with attribute-heavy <table style="..."> markup. md4c falls back to escaping or to code-block treatment, leaking literal tags into the rendered output. Threads a body MIME type through the whole stack: fetcher.Email new BodyMIMEType field fetcher.FetchEmailBodyFromMailbox + 4 wrappers return the type backend.EmailReader.FetchEmailBody interface change imap/jmap/pop3 provider implementations return the type daemonclient.Service.FetchEmailBody interface change daemonrpc.FetchEmailBodyResult new BodyMIMEType field tui.EmailBodyFetchedMsg, PreviewBodyFetchedMsg carry the field view.ProcessBody, ProcessBodyWithInline new mimeType param When mimeType == "text/html", the renderer hands the body straight to htmlconv. Empty mimeType keeps the legacy markdown-first path, so cached emails fetched before this change render unchanged. Trade-off: HTML bodies that contain literal markdown syntax (e.g. a poorly-templated newsletter with **asterisks** inside <p> tags) no longer get markdown rendering. This matches the spec — Content-Type: text/html means the body is HTML — and the regression target is rare versus the much more common Datadog-style HTML that this fixes. Drive-by: PGP-decrypted plain-text bodies were tagging themselves as htmlPartID = "decrypted"; now correctly tagged plainPartID. No behavior change pre-this-PR (extractedBody was always returned), but required for the new MIME type tracking to be accurate. Doesn't fix floatpane#602 (htmlconv tag coverage), but resolves the related "raw <table> tags visible in the viewer" symptom.
|
You'll see two tests in
Both pass on this branch. If you cherry-pick the legacy assertion onto master and remove the |
Without this, cached bodies always re-render via the legacy markdown→HTML pre-pass even on builds that have the MIME-type-aware renderer, because the cache layer drops the type when serializing CachedEmailBody. Field is omitempty so existing cache rows continue to load (with empty MIME type, which the renderer treats as 'unknown' and falls back to legacy behavior). New rows written after this change are tagged correctly and refresh into proper rendering on next open.
33488d1 to
221fd40
Compare
|
/build |
| body, _ := io.ReadAll(p.Body) | ||
| extractedBody = string(body) | ||
| extractedBodyMIMEType = "text/plain" | ||
| htmlPartID = "decrypted" |
There was a problem hiding this comment.
why is this htmlPartID? this will result in all emails being treated as text/html
Build complete (
|
| OS | Arch | Download |
|---|---|---|
| Linux | amd64 | matcha_preview_linux_amd64.tar.gz |
| Linux | arm64 | matcha_preview_linux_arm64.tar.gz |
| macOS | amd64 | matcha_preview_darwin_amd64.tar.gz |
| macOS | arm64 | matcha_preview_darwin_arm64.tar.gz |
| Windows | amd64 | matcha_preview_windows_amd64.zip |
| Windows | arm64 | matcha_preview_windows_arm64.zip |
What?
Threads MIME type detection from the fetch layer through to rendering. FetchEmailBody and the IMAP/JMAP/POP3 providers now return BodyMIMEType. The Backend interface, daemon RPC protocol, and local cache schema all carry the MIME type through. view.ProcessBody takes the MIME type and skips the markdownToHTML pre-pass when it's text/html. The TUI message fetcher and view call sites are updated for the new field.
Why?
Right now every email body gets run through md4c before we render it as HTML. The assumption was that md4c would pass raw HTML through untouched, but that falls apart on real HTML email — Datadog digests, marketing stuff, anything with heavy attributes or indentation trips md4c's html_block rules. When that happens, md4c either escapes the tags or treats them as code blocks, and the TUI ends up showing literal
<table>,<tr>,<td>instead of the rendered content.Detecting text/html and skipping the markdown pass fixes that, while text/plain senders like GitHub notifications and mailing lists keep their markdown formatting.
Trade-offs & compatibility
If a text/html sender writes
**bold**inside their HTML, it won't render as bold anymore. That's spec-correct behavior, and it's worth losing to get tables back. On the compat side, MIME type defaults to an empty string in the wire format and cache, so old cache entries fall back to the legacy markdown path. No migration needed.