By 苏剑林 | August 26, 2024
In "Making MathJax Better Compatible with Google Translate and Lazy Loading," we mentioned that Cool Papers added MathJax to parse LaTeX formulas. However, I did not expect this to trigger so many compatibility issues. Although some of the problems arose purely from my own obsessive-compulsive tendencies, a solution that is as close to perfect as possible is always gratifying, so I am willing to put a little more thought into it.
In the previous article, we resolved the compatibility between MathJax, Google Translate, and lazy loading. In this article, we will resolve the conflict between MathJax and Marked.
Markdown is a lightweight markup language that lets people write documents in an easy-to-read, easy-to-write plain-text format; it is arguably one of the most popular writing syntaxes today. The [Kimi] function in Cool Papers also essentially outputs Markdown. However, browsers do not display Markdown directly; they display HTML. Markdown therefore has to be converted to HTML (rendered) before it is shown to users.
Markdown-to-HTML conversion is generally divided into two types: one where a server-side backend converts Markdown into HTML and sends it to the user, and another where the user receives Markdown and converts it to HTML in the frontend via the browser. The examples in this article are mainly for the latter, but in principle, the same logic should be easily adaptable for the former. There are many libraries for Markdown conversion in the frontend; Cool Papers uses Marked, which is a relatively lightweight choice.
The method for rendering Markdown with Marked is simple: just call marked.parse on the string. Combined with the MathJax setup from the previous article, LaTeX code can then be parsed as well. However, Markdown and LaTeX syntax overlap. For example, the backslash is Markdown's escape character, so \\ becomes \ and \{ becomes {, while _ and * can be read as emphasis markers. Consequently, Marked may first transform any LaTeX code according to Markdown rules, so that MathJax, which runs afterwards, no longer receives the original LaTeX and fails to render it correctly.
A reproducible code snippet is as follows:
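<div id="content">Loading...</div>
<!-- MathJax v2 does not treat $...$ as inline math by default, so enable it -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$', '$'], ['\\(', '\\)']]}});
</script>
<script src="marked.min.js"></script>
<script src="MathJax.js?config=TeX-AMS_HTML"></script>
<script>
// Backslashes are doubled because this is a JavaScript string literal
var md_text = "Unaffected LaTeX: $f(x) = x^2$ and $x_{1,2} = \\frac{-b \\pm \\sqrt{b^2-4ac}}{2a}$. Problematic LaTeX: $\\{x \\mid x > 0\\}$ and $\\begin{matrix} a & b \\\\ c & d \\end{matrix}$.";
// Rendered by Marked first: it strips the backslashes in \{, \} and \\
document.getElementById('content').innerHTML = marked.parse(md_text);
// Then handled by MathJax, which no longer sees the original LaTeX
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "content"]);
</script>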
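To see how Markdown's escaping mangles LaTeX, the function below is a simplified model of the CommonMark backslash-escape rule that Marked follows. It is an illustration under that assumption, not Marked's actual source: a backslash before any ASCII punctuation character is dropped, while a backslash before a letter (as in \frac) is kept.

```javascript
// Simplified model of Markdown backslash escapes (the CommonMark rule):
// a backslash followed by an ASCII punctuation character is replaced by
// that character alone. This is NOT Marked's source code; it ignores code
// spans, raw HTML and everything else Marked does.
function unescapeMarkdown(text) {
  // The class covers all ASCII punctuation: ! to /, : to @, [ to `, { to ~
  return text.replace(/\\([!-\/:-@\[-`{-~])/g, "$1");
}

// String.raw keeps every backslash literal, so these are exactly the
// characters a Markdown renderer would receive.
const samples = [
  String.raw`\frac{-b \pm \sqrt{b^2-4ac}}{2a}`, // "\" before a letter survives
  String.raw`\{x \mid x > 0\}`,                 // "\{" and "\}" lose the backslash
  String.raw`a & b \\ c & d`,                   // the "\\" row break shrinks to "\"
];
for (const s of samples) {
  console.log(s + "  =>  " + unescapeMarkdown(s));
}
```

Only backslashes in front of punctuation disappear, which is why commands like \frac come through intact while set braces and matrix row breaks do not.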
It is worth mentioning that the popular blog framework Hexo also uses Marked by default to render Markdown. Therefore, searching for "MathJax Marked conflict" turns up quite a lot of material, most of it based on Hexo. One detailed summary is "Tuning Hexo [2] — Conflicts and Solutions between Hexo and MathJax", which groups the approaches into four types:
1. Manual Escaping: Do not write valid LaTeX code when writing formulas; instead, write LaTeX code that "becomes correct after being rendered by Marked." For example, if the original LaTeX code uses a double backslash \\, Marked turns it into a single backslash \. Thus, you would write four backslashes \\\\ from the start, which become \\ after Marked.
2. Formula Protection: This logic is simpler. Utilize the fact that Marked does not render code blocks to protect the formula with code tags. After Marked rendering, extract the formula and parse it with MathJax, as in "Solving the Conflict between MathJax and Markdown". The downside is that it easily conflicts with normal code blocks.
3. Changing Engines: Switch to a rendering engine that better supports the mixture of Markdown and LaTeX, such as Pandoc, which is commonly recommended under Hexo. See "Solving the Conflict between Hexo and MathJax". However, Pandoc is a backend engine, and the author has not found a better frontend alternative.
4. Modifying the Engine: Modify the Marked code so that it does not render certain LaTeX patterns, thereby solving the problem to some extent, as in "Marked.js and MathJax Coexistence Issue in Hexo". This requires us to summarize rules prone to mis-rendering by Marked and handle them one by one.
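Approach 2 can be illustrated in a few lines. The sketch below is a minimal version under stated assumptions (formulas delimited only by $...$ or $$...$$, no escaped dollar signs, no backticks inside formulas), and protectMath and unprotectMath are hypothetical names, not the code from the linked post: formulas are wrapped in inline code spans before Marked runs, and the resulting code elements are unwrapped afterwards.

```javascript
// Hypothetical helpers illustrating "formula protection" (approach 2).
// Assumes math is delimited by $$...$$ or $...$ and contains no backticks.

// Before Marked: turn every formula into an inline code span.
function protectMath(markdown) {
  // Try $$...$$ first so display math is not split into two $...$ matches.
  return markdown.replace(/\$\$[\s\S]+?\$\$|\$[^$\n]+?\$/g, function (m) {
    return "`" + m + "`";
  });
}

// After Marked: unwrap <code> elements whose whole content is a formula.
// Marked HTML-escapes code content (& -> &amp;, < -> &lt;), which is harmless
// here because the browser decodes the entities when the HTML is inserted.
function unprotectMath(html) {
  return html.replace(/<code>(\$[\s\S]*?\$)<\/code>/g, function (_, formula) {
    return formula;
  });
}
```

In the browser this would be used as content.innerHTML = unprotectMath(marked.parse(protectMath(md_text))), followed by the usual MathJax Typeset. The weakness noted in the list shows up directly: unprotectMath cannot tell a protected formula from a genuine code span whose text happens to start and end with a dollar sign.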
Approaches 1 and 2 require manually editing the formula code, but the formulas in Cool Papers are generated by Kimi and cannot be edited by hand, so these two are essentially ruled out. Since I haven't found a better frontend Markdown engine, approach 3 was also rejected. Approach 4 can solve the problem to a degree, but it is too rule-based and not elegant enough; it essentially "treats the symptoms but not the disease," and there is no way to know whether some rules remain unhandled.
In fact, there is a very simple solution to this problem: essentially, the conflict arises from running Marked before MathJax. What if we reverse it: use MathJax to render the formulas first, and then use Marked to render the Markdown? Because MathJax can identify mathematical formulas quite strictly and the rendering result almost never contains Markdown syntax, running MathJax before Marked can fundamentally solve the conflict.
Reference code is as follows:
<div id="content">Loading...</div>
<script>
var md_text = "..."; // Markdown text with LaTeX
var content = document.getElementById('content');
content.innerHTML = md_text;
// Run MathJax first
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "content"], function() {
// After MathJax is done, run Marked
content.innerHTML = marked.parse(content.innerHTML);
});
</script>
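The same MathJax-first order can be expressed more linearly by wrapping MathJax v2's queue in a Promise. This is a minimal sketch, not the code Cool Papers uses; typesetAsync and renderMathThenMarkdown are hypothetical names, and it assumes MathJax v2 (MathJax.Hub.Queue) and Marked (marked.parse) are loaded as globals, as in the snippets above.

```javascript
// Minimal sketch: wrap MathJax v2's queue in a Promise.
// Assumes MathJax v2 and Marked are already loaded as globals.
function typesetAsync(element) {
  return new Promise(function (resolve) {
    // The queue runs the Typeset command first and calls resolve only
    // after typesetting has finished.
    MathJax.Hub.Queue(["Typeset", MathJax.Hub, element], resolve);
  });
}

// Hypothetical helper: render the LaTeX first, then the Markdown.
async function renderMathThenMarkdown(element, mdText) {
  element.innerHTML = mdText;
  await typesetAsync(element);
  element.innerHTML = marked.parse(element.innerHTML);
}
```

Calling renderMathThenMarkdown(document.getElementById('content'), md_text) would do the same as the snippet above; the Promise only changes how the ordering is written, not what MathJax or Marked do.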
The final display effect of the above code is exactly what we expect, but for readers with perfectionist tendencies, there are still two minor flaws.
The first flaw is that the page initially displays the raw Markdown text and only shows the final rendered result after a short delay (depending on rendering speed). Raw Markdown shown directly in the browser looks almost like gibberish, so users first see a chaotic page, which hurts the reading experience. To address this, we can render into a separate hidden element and assign the result to the visible page only after rendering is complete:
<div id="content">Loading...</div>
<div id="hidden_content" style="display:none"></div>
<script>
var md_text = "...";
var hidden = document.getElementById('hidden_content');
hidden.innerHTML = md_text;
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "hidden_content"], function() {
document.getElementById('content').innerHTML = marked.parse(hidden.innerHTML);
});
</script>
This way, users see the rendered result directly, without the intermediate gibberish.
The second flaw is that if you now right-click a formula, the MathJax menu shown below no longer appears:
Normally, right-clicking a formula reveals the MathJax menu
This is easy to understand once you know how custom right-click menus work. Simply put, the menu depends on event listeners bound to the formula elements, but assigning to innerHTML re-creates the elements from an HTML string, and listeners attached to the old elements are not carried over. I thought about this for a long time and eventually discovered by chance that when we call MathJax.Hub.Typeset again, MathJax re-renders the formulas. So, on top of the code above, we just need to delete the copied formula output and then re-render the formulas:
<script>
// ... (previous logic)
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "hidden_content"], function() {
document.getElementById('content').innerHTML = marked.parse(hidden.innerHTML);
// Re-typeset the new element to restore the right-click menu
// (the copied formula output should be deleted first, as described above)
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "content"]);
});
</script>
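The text above says the copied formula output should be deleted before re-rendering, but the snippet does not show that step. One way it might look is the hypothetical helper below. It assumes the HTML-CSS output processor that TeX-AMS_HTML selects in MathJax v2, where rendered formulas sit in elements with the classes MathJax_Display and MathJax (plus MathJax_Preview before typesetting); other output processors use different class names. The script tags holding the LaTeX source are left in place so that the next Typeset call can rebuild the output.

```javascript
// Hypothetical helper: remove rendered MathJax v2 (HTML-CSS) output that was
// copied along with innerHTML, keeping the <script type="math/tex"> sources.
// The class names are an assumption tied to the HTML-CSS output processor.
function removeRenderedMath(root) {
  // querySelectorAll returns a static list, so removing while iterating is safe.
  var nodes = root.querySelectorAll(".MathJax_Display, .MathJax, .MathJax_Preview");
  for (var i = 0; i < nodes.length; i++) {
    // A .MathJax span inside an already-removed .MathJax_Display is only
    // detached from that removed parent here, which is harmless.
    nodes[i].remove();
  }
}
```

It would be called on the content element right after marked.parse and before the second Typeset.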
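This restores the right-click menu. But it's not over yet. Looking into the logic, I found that after the first Typeset, the original code of each formula is stored in a script tag (of type math/tex), and this is what allows subsequent Typeset calls to re-render it. However, I discovered that Marked actually renders the content inside these script tags!! This is most likely because a script tag in the middle of a paragraph counts as inline raw HTML in Markdown, so only the tags themselves are passed through while the text between them is still parsed as Markdown. To prevent Marked from modifying the formulas, we can save the original code of each formula before marked.parse and write it back after marked.parse: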
<script>
// ... (previous logic)
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "hidden_content"], function() {
// Save the original LaTeX source held in MathJax's script tags
var scripts = hidden.getElementsByTagName('script');
var script_codes = [];
for (var i = 0; i < scripts.length; i++) {
script_codes.push(scripts[i].textContent);
}
var content = document.getElementById('content');
content.innerHTML = marked.parse(hidden.innerHTML);
// Overwrite whatever Marked did to the script contents with the saved source
// (assumes Marked keeps the script tags in the same order)
var new_scripts = content.getElementsByTagName('script');
for (var j = 0; j < new_scripts.length && j < script_codes.length; j++) {
new_scripts[j].textContent = script_codes[j];
}
// Re-typeset so the formulas get working right-click menus again
MathJax.Hub.Queue(["Typeset", MathJax.Hub, "content"]);
});
</script>