Someone is attempting to deploy a commit to the alephpi's projects Team on Vercel. A member of the Team first needs to authorize it.
✅ Deploy Preview for texocr ready!
To edit notification comments on pull requests, go to your Netlify project configuration.
Pull request overview
This pull request adds MathML support to the LaTeX OCR tool, allowing users to copy LaTeX formulas as MathML format. The implementation uses KaTeX's built-in MathML rendering capabilities with additional sanitization for better compatibility with Microsoft Word.
Changes:
- Adds MathML conversion functionality with Word-specific sanitization
- Adds a "Copy as MathML" button and auto-copy option to the UI
- Introduces a utility function to mirror public assets during the build process
- Updates internationalization files with MathML-related translations
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| package.json | Adds temml dependency (note: unused in codebase) |
| pnpm-lock.yaml | Lock file updates for the temml package |
| utils/mirror-public-assets.ts | New utility to copy public assets to server output directory |
| nuxt.config.ts | Adds build hooks to mirror public assets during build |
| i18n/locales/zh-CN.json | Adds Chinese translation for "MathML" |
| i18n/locales/en.json | Adds English translation for "MathML" |
| app/pages/ocr.vue | Adds copyAsMathML function, UI button, and auto-copy integration |
| app/composables/textProcessor.ts | Implements convertToMathML and sanitizeMathMLForWord functions |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
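The table mentions `utils/mirror-public-assets.ts` and new build hooks in `nuxt.config.ts`, but their code is not shown in this review. Purely as an illustration, a mirroring step like that could be a recursive copy with Node's `fs.cpSync`. The function name `mirrorPublicAssets`, its parameters, and its boolean return value are hypothetical; the PR's actual utility and the Nuxt hooks that call it may look different.

```typescript
import { cpSync, existsSync, mkdirSync } from 'node:fs'

// Hypothetical helper (name and signature are illustrative, not the PR's code):
// copies the contents of `publicDir` into `outputDir`, creating it if needed.
// Returns false without touching the file system when `publicDir` is missing.
export function mirrorPublicAssets(publicDir: string, outputDir: string): boolean {
  if (!existsSync(publicDir)) return false
  mkdirSync(outputDir, { recursive: true })
  // Recursive copy that merges into outputDir; files with the same relative
  // path are overwritten because cpSync's `force` option defaults to true.
  cpSync(publicDir, outputDir, { recursive: true })
  return true
}
```

The `existsSync` guard makes the step a no-op for projects without a public directory, so a build hook can call it unconditionally.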
```ts
import katex from 'katex'

function sanitizeMathMLForWord(mathml: string): string {
  if (!mathml) return mathml

  // Strip layout hacks that Word renders as blank boxes.
  // Regex fallback for environments without DOMParser/XMLSerializer (e.g. SSR).
  if (typeof DOMParser === 'undefined' || typeof XMLSerializer === 'undefined') {
    return mathml
      // Unwrap <mpadded>: drop the tags but keep their children, like the DOM branch.
      .replace(/<\/?mpadded\b[^>]*>/g, '')
      // Remove self-closing 1em spaces first, so the paired pattern below
      // cannot start at a self-closing tag and swallow unrelated markup.
      .replace(/<mspace\b[^>]*\bwidth=(['"])?1(?:\.0+)?em\1[^>]*\/>/g, '')
      .replace(
        /<mspace\b[^>]*\bwidth=(['"])?1(?:\.0+)?em\1[^>]*>[\s\S]*?<\/mspace>/g,
        ''
      )
  }

  const doc = new DOMParser().parseFromString(mathml, 'application/xml')
  const root = doc.documentElement
  // Some browsers nest <parsererror> inside the root instead of replacing it,
  // so search the whole document for it.
  if (!root || doc.getElementsByTagName('parsererror').length > 0) return mathml

  // Unwrap <mpadded>: move its children up to the parent, then drop the wrapper.
  root.querySelectorAll('mpadded').forEach((node) => {
    const parent = node.parentNode
    if (!parent) return
    while (node.firstChild) {
      parent.insertBefore(node.firstChild, node)
    }
    parent.removeChild(node)
  })

  // Remove <mspace> elements whose width is 1em or more.
  root.querySelectorAll('mspace').forEach((node) => {
    const width = node.getAttribute('width')
    if (!width) return
    const match = width.trim().match(/^([0-9]*\.?[0-9]+)em$/)
    if (!match) return
    const value = Number(match[1])
    if (Number.isFinite(value) && value >= 1) {
      node.remove()
    }
  })

  return new XMLSerializer().serializeToString(root)
}

export function convertToMathML(code: string) {
  const cleanedCode = code.trim()
  if (!cleanedCode) return ''

  // Render MathML only (no HTML markup); invalid input is rendered as error text instead of throwing.
  const rendered = katex.renderToString(cleanedCode, {
    throwOnError: false,
    displayMode: true,
    output: 'mathml'
  })

  // Extract the <math> element from KaTeX's wrapper <span>s.
  const mathmlMatch = rendered.match(/<math[\s\S]*<\/math>/)
  if (!mathmlMatch) return rendered

  let mathml = mathmlMatch[0]
  // Drop the TeX source annotation and the <semantics> wrapper around it.
  mathml = mathml.replace(/<annotation[\s\S]*?<\/annotation>/g, '')
  mathml = mathml.replace(/<\/?semantics[^>]*>/g, '')
  // Replace whitespace-only <mtext> (such as KaTeX's output for `\ ` or `~`) with a fixed 0.2em space.
  mathml = mathml.replace(
    /<mtext>([\s\u00A0\u2000-\u200A\u202F\u205F\u3000]+)<\/mtext>/g,
    '<mspace width="0.2em"/>'
  )
  return sanitizeMathMLForWord(mathml)
}
```
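In the code above, the DOM branch removes any `mspace` whose width is at least `1em`, while the regex fallback used when `DOMParser` is unavailable only matches a width of exactly `1em` (`1em`, `1.0em`, ...). A minimal sketch of a DOM-free variant that applies the same `>= 1em` threshold, for example during server-side rendering in Node, could use replacer callbacks. The name `sanitizeMathMLWithoutDOM` is hypothetical and not part of the PR, and a regex approach assumes simple, well-formed markup like KaTeX's output rather than arbitrary XML.

```typescript
// Hypothetical DOM-free sanitizer (not part of the PR). It mirrors the DOM
// branch of sanitizeMathMLForWord using regular expressions, so it assumes
// simple, well-formed markup such as KaTeX's MathML output:
//   - <mpadded> tags are dropped but their children are kept;
//   - <mspace> elements with a width of 1em or more are removed.
export function sanitizeMathMLWithoutDOM(mathml: string): string {
  // True when the markup carries a quoted width in em units that is >= 1em.
  const isWideSpace = (markup: string): boolean => {
    const match = markup.match(/\swidth\s*=\s*(['"])\s*([0-9]*\.?[0-9]+)em\s*\1/)
    return match !== null && Number(match[2]) >= 1
  }

  return (
    mathml
      // Unwrap <mpadded>: remove the opening and closing tags only.
      .replace(/<\/?mpadded\b[^>]*>/g, '')
      // Self-closing spaces, e.g. <mspace width="2em"/>.
      .replace(/<mspace\b[^>]*\/>/g, (tag) => (isWideSpace(tag) ? '' : tag))
      // Paired spaces, e.g. <mspace width="2em"></mspace>. The opening tag may
      // not end in "/", so this never starts at a self-closing <mspace/>.
      .replace(/<mspace\b(?:[^>]*[^/>])?>[\s\S]*?<\/mspace>/g, (element) =>
        isWideSpace(element) ? '' : element
      )
  )
}
```

For example, `<mspace width="0.2em"/>` (the space `convertToMathML` substitutes for whitespace-only `<mtext>`) is kept, while `<mspace width="2em"/>` is dropped, matching what the DOM branch does.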
The new functions convertToMathML and sanitizeMathMLForWord lack JSDoc documentation comments. Other functions in this file (wrapCode, formatLatex, convertToTypst) have JSDoc comments explaining their parameters and return values. Adding documentation for these functions would improve code maintainability and help other developers understand their purpose and usage.
Merged, good job, thank you!
* Add mathml support
* Fix MathML output for Word by using KaTeX and normalizing spaces
* fix: sanitize MathML to avoid Word whitespace artifacts
* delete temml

  Co-authored-by: Copilot <[email protected]>

* add jsdoc
* doc

---------

Co-authored-by: xjhaz <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: alephpi <[email protected]>
Added MathML support, so results can be copied and pasted in MathML format, either manually or automatically.


In a few cases blank gaps may still appear; for now, they can be filled in with spaces as a workaround.