By 苏剑林 | August 15, 2024
A long time ago, readers suggested rendering the mathematical formulas on Cool Papers. Many math-heavy papers have abstracts or even titles containing LaTeX code. If these formulas aren't rendered, they look like a jumble of garbled code, which significantly impacts the reading experience. However, previous tests showed that MathJax, the library responsible for rendering formulas, was quite incompatible with Google Translate and lazy loading. Consequently, despite the long-standing demand, I hadn't implemented it. But there’s good news: after repeated research and debugging over the past few days, I have finally resolved the compatibility issues. Cool Papers can now render mathematical formulas. This article summarizes the solution for your reference.
For displaying mathematical formulas (LaTeX) on web pages, there are currently two mainstream solutions: MathJax and KaTeX. KaTeX is relatively more lightweight, but its support for LaTeX is not as comprehensive as MathJax. Furthermore, since this blog has always used MathJax, it was my first choice when considering adding math formula support to Cool Papers.
Much like Python 2 and 3, MathJax 2.x and 3.x are two significantly different systems (the latest release is 3.2.2, and version 4.0 is already in testing). However, most MathJax material that turns up in searches today still targets 2.x. I therefore used the latest 2.x release, 2.7.9, on Cool Papers. This is also the version used by this blog; incidentally, the official arXiv website also uses MathJax, version 2.7.3.
For an ordinary webpage, adding math formula rendering is not difficult. You just need to add two segments of code. The following is the reference code used by this blog:
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
TeX: {equationNumbers: {autoNumber: ["AMS"], useLabelIds: true}, extensions: ["AMSmath.js", "AMSsymbols.js", "extpfeil.js"]},
"HTML-CSS": {linebreaks: {automatic: true, width: "95% container"}, noReflows: false, availableFonts: ["tex"], styles: {".MathJax_Display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"CommonHTML": {linebreaks: {automatic: true, width: "95% container"}, noReflows: false, availableFonts: ["tex"], styles: {".MJXc-display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"SVG": {linebreaks: {automatic: true, width: "95% container"}, styles: {".MathJax_SVG_Display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"PreviewHTML": {linebreaks: {automatic: true, width: "95% container"}}
});
</script>
<script src="/static/MathJax-2.7.9/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
The code above is sufficient for ordinary needs and successfully converts LaTeX code into rendered mathematical formulas. On Cool Papers, however, it ran into two "roadblocks": web translation and lazy loading. This section tackles the first one, web translation, specifically the Google Translate feature built into the Chrome browser.
As everyone knows, the main purpose of Cool Papers is to browse papers. The titles and abstracts of the papers are in English. For those of us whose native language is Chinese, we often turn on the web translation function to speed up reading. Although some readers might "look down" on this, believing that reading the original English descriptions is more accurate, it is undeniable that the demand for web translation exists. For the goal of "browsing papers," machine-translated Chinese content is often sufficient.
However, for pages containing mathematical formulas rendered via MathJax, the result after Google Translate is "unrecognizable," nearly garbled to the point of being unreadable. Readers can go to arXiv and find a paper with formulas to try it out, such as the result for 2408.07010:
[Effect of a page with formulas before translation]
[Effect of a page with formulas after translation]
The idea to solve this problem is to give the formulas an "exemption pass"—that is, do not translate the formulas. After searching, I found two ways to prevent Google Translate from translating a certain element: one is to add a class name class="notranslate" to the element, and the other is to add an attribute translate="no". There are two ways to add these: one is on the backend, modifying the webpage content before the server outputs it; the other is on the frontend, using JS to modify it after the browser receives the content.
For MathJax, mathematical formulas are rendered in real-time on the frontend. The backend cannot access the rendered formulas, so we can only choose the frontend modification plan. Testing showed that MathJax adds a MathJax class name to rendered formulas. Therefore, we can extract all formulas based on that class name and then append class="notranslate" via JS. The reference code is as follows:
document.querySelectorAll('.MathJax').forEach(element => element.classList.add('notranslate'));
However, note that this line of code must be executed after all mathematical formulas have finished rendering to be effective. How can we ensure the formulas have finished rendering? The most reliable solution is to place this code in MathJax's Queue (refer to here):
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
TeX: {equationNumbers: {autoNumber: ["AMS"], useLabelIds: true}, extensions: ["AMSmath.js", "AMSsymbols.js", "extpfeil.js"]},
"HTML-CSS": {linebreaks: {automatic: true, width: "95% container"}, noReflows: false, availableFonts: ["tex"], styles: {".MathJax_Display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"CommonHTML": {linebreaks: {automatic: true, width: "95% container"}, noReflows: false, availableFonts: ["tex"], styles: {".MJXc-display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"SVG": {linebreaks: {automatic: true, width: "95% container"}, styles: {".MathJax_SVG_Display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"PreviewHTML": {linebreaks: {automatic: true, width: "95% container"}}
});
MathJax.Hub.Queue(function() {
document.querySelectorAll('.MathJax').forEach(element => element.classList.add('notranslate'));
});
</script>
<script src="/static/MathJax-2.7.9/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
This tells MathJax to execute the function defined in MathJax.Hub.Queue after it ensures the formulas are rendered. With this processing, if you turn on Google Translate after the formulas are rendered, the formulas will not be translated.
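As a minimal sketch, the marking step can be wrapped in a small helper that applies both opt-out mechanisms mentioned earlier (the class name and the attribute). The function name `markFormulasNoTranslate` and its `root` parameter are hypothetical, introduced only for illustration; the `.MathJax` selector is the one observed in the tests above.

```javascript
// Hypothetical helper: mark every rendered formula under `root` as untranslatable.
// `root` is any object exposing querySelectorAll (the document by default).
function markFormulasNoTranslate(root = document) {
  const formulas = root.querySelectorAll('.MathJax');
  formulas.forEach(element => {
    element.classList.add('notranslate');    // class-based opt-out
    element.setAttribute('translate', 'no'); // attribute-based opt-out
  });
  return formulas.length; // number of formulas that were marked
}

// Possible usage inside the MathJax config block, once rendering is done:
// MathJax.Hub.Queue(function() { markFormulasNoTranslate(); });
```

Applying both mechanisms is redundant for Google Translate, since either one is enough there, but it costs nothing and may help with other translation tools that honor only one of them.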
The second "roadblock" for MathJax is lazy loading. In the list page of Cool Papers, because there may be many papers to display (hundreds or even thousands), it doesn't load all papers at once when the page is first opened. Instead, it only displays the first 25. Only when you slide near the bottom does it continue to load the next 25. This improves response speed and barely affects user experience. This is lazy loading.
Many content-heavy websites use lazy loading; it is a mature technique. However, MathJax only renders the formulas present when the page first loads; formulas in content added later by lazy loading are not rendered automatically. We therefore need to trigger rendering manually after each lazy load. This is not difficult: the function for this is MathJax.Hub.Typeset, and we just need to call it after the lazy loading function, similar to:
loadMorePapers();
MathJax.Hub.Typeset();
Note that a bare MathJax.Hub.Typeset call does not add notranslate to the newly rendered formulas. To include that step, change it to:
loadMorePapers();
MathJax.Hub.Queue(
['Typeset', MathJax.Hub],
function() {
document.querySelectorAll('.MathJax').forEach(element => element.classList.add('notranslate'));
}
);
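The snippet above re-scans the whole page on every lazy load. If the newly loaded papers are appended inside a known container element, the work can be limited to that container, because the MathJax 2.x `Typeset` call accepts an optional element to process. The sketch below assumes such a container exists; the function name `typesetNewContent` and both parameters are hypothetical.

```javascript
// Hypothetical helper: typeset only `container`, then protect its formulas.
// `hub` defaults to MathJax.Hub (MathJax 2.x); `container` is assumed to be
// the DOM element that received the lazy-loaded papers.
function typesetNewContent(container, hub = MathJax.Hub) {
  hub.Queue(
    ['Typeset', hub, container], // render formulas inside `container` only
    function() {
      // Runs after typesetting, because the queue executes items in order.
      container.querySelectorAll('.MathJax')
        .forEach(element => element.classList.add('notranslate'));
    }
  );
}
```

With hundreds of papers already on the page, scoping the call this way avoids re-examining content that has already been rendered, though whether the difference is noticeable depends on the page.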
So far, we have solved MathJax's compatibility with Google Translate and with lazy loading separately. However, when Google Translate and lazy loading occur together, a new problem arises.
Suppose we turn on Google Translate as soon as we enter the page. When we browse near the bottom, new papers are lazy-loaded, and then Google Translate is triggered again, subsequently translating the newly loaded papers. If we also set up the manual rendering code from the previous section, the formulas will also be rendered by MathJax. Since Google Translate and MathJax are triggered simultaneously, but notranslate is only added after the formulas are rendered, the translation starts before the formulas have had time to get the notranslate class. Consequently, the formulas still end up being translated.
To solve this problem, we must find a way to ensure that Google Translate is executed only after the formulas are rendered and notranslate has been added. However, Google Translate is built into Chrome, and we cannot manipulate the browser's behavior through the website. It seems like a dead end, but my testing found that Google Translate constantly monitors changes in the page to decide whether to trigger new translations. Based on this characteristic, we can use "reverse thinking."
What is this reverse approach? We know that for Cool Papers, what needs to be translated are the titles and abstracts of the papers. We can add class="notranslate" to them at the very beginning. Once added, Google Translate will not actively translate them, regardless of whether it's the initial view or a lazy load. Then, after the formulas are rendered, we can remove the class="notranslate" from the titles and abstracts. At this point, the browser will identify the titles and abstracts as translatable content, and translation will be triggered.
In this way, we successfully ensure that translation is only triggered after the formula rendering is complete. The reference code is as follows:
loadMorePapers();
MathJax.Hub.Queue(
['Typeset', MathJax.Hub],
function() {
document.querySelectorAll('.MathJax').forEach(element => element.classList.add('notranslate'));
document.querySelectorAll('a.title-link, p.summary').forEach(element => element.classList.remove('notranslate'));
}
);
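For this to work, the lazy-loaded titles and abstracts must already carry `notranslate` at the moment they are inserted into the page. A minimal sketch of how the markup for one paper might be generated is shown below. The `renderPaper` and `escapeHtml` functions and the `url`, `title`, and `summary` fields are hypothetical; only the `title-link` and `summary` class names come from the selectors used above.

```javascript
// Escape text for safe insertion into HTML (LaTeX often contains < and &).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical renderer: build the markup for one paper with both blocks
// shielded from translation until the formulas have been rendered.
function renderPaper(paper) {
  return (
    '<a class="title-link notranslate" href="' + escapeHtml(paper.url) + '">' +
    escapeHtml(paper.title) + '</a>' +
    '<p class="summary notranslate">' + escapeHtml(paper.summary) + '</p>'
  );
}
```

The queued callback then removes `notranslate` from these blocks once the formulas inside them are protected, which is the change that prompts the translation.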
Finally, let's summarize our solution. If your website needs to display mathematical formulas, has lazy loading functionality, and users have a demand for web translation, you can follow these steps to achieve maximum compatibility:
1. Add class="notranslate" to all content blocks containing formulas;
2. Load MathJax in the following way (where "a.title-link" and "p.summary" are the CSS selectors of the blocks containing math formulas on Cool Papers; substitute your own):
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]},
TeX: {equationNumbers: {autoNumber: ["AMS"], useLabelIds: true}, extensions: ["AMSmath.js", "AMSsymbols.js", "extpfeil.js"]},
"HTML-CSS": {linebreaks: {automatic: true, width: "95% container"}, noReflows: false, availableFonts: ["tex"], styles: {".MathJax_Display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"CommonHTML": {linebreaks: {automatic: true, width: "95% container"}, noReflows: false, availableFonts: ["tex"], styles: {".MJXc-display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"SVG": {linebreaks: {automatic: true, width: "95% container"}, styles: {".MathJax_SVG_Display": {margin: "1em 0em 0.7em;", display: "inline-block!important;"}}},
"PreviewHTML": {linebreaks: {automatic: true, width: "95% container"}}
});
MathJax.Hub.Queue(function() {
document.querySelectorAll('.MathJax').forEach(element => element.classList.add('notranslate'));
document.querySelectorAll('a.title-link, p.summary').forEach(element => element.classList.remove('notranslate'));
});
</script>
<script src="/static/MathJax-2.7.9/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
3. Add the following code after the lazy loading code:
MathJax.Hub.Queue(
['Typeset', MathJax.Hub],
function() {
document.querySelectorAll('.MathJax').forEach(element => element.classList.add('notranslate'));
document.querySelectorAll('a.title-link, p.summary').forEach(element => element.classList.remove('notranslate'));
}
);
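Since steps 2 and 3 run the same two lines in their callbacks, they could share one function. The following is only a sketch of that refactoring, not the code actually deployed on Cool Papers; the names `releaseForTranslation` and `typesetAndRelease` and their parameters are hypothetical.

```javascript
// Hypothetical shared callback, meant to run after MathJax has typeset the page.
function releaseForTranslation(root = document,
                               blockSelector = 'a.title-link, p.summary') {
  // 1. Protect the rendered formulas first...
  root.querySelectorAll('.MathJax')
    .forEach(element => element.classList.add('notranslate'));
  // 2. ...then release the surrounding blocks so translation can start.
  root.querySelectorAll(blockSelector)
    .forEach(element => element.classList.remove('notranslate'));
}

// Queue a re-typeset followed by the shared callback (for use after lazy loading).
function typesetAndRelease(hub = MathJax.Hub, root = document) {
  hub.Queue(['Typeset', hub], function() { releaseForTranslation(root); });
}

// Initial load, inside the text/x-mathjax-config block:
//   MathJax.Hub.Queue(function() { releaseForTranslation(); });
// After lazy loading:
//   loadMorePapers(); typesetAndRelease();
```

The order of the two steps matters: if the blocks were released before the formulas were marked, translation could begin while the formulas were still unprotected.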
You are welcome to test the results on Cool Papers. I have tested this solution in Chrome and Safari; it works with Chrome's built-in translation, Safari's built-in translation, and Cool Papers' own translation function.
Please include the address of this article when reposting: https://kexue.fm/archives/10320