By 苏剑林 | January 01, 2024
Last week, in “I Wrote a Research Paper Assistant Website: Cool Papers,” I shared a paper-skimming website I developed, which has since gained some recognition from users. However, as the saying goes, “the more people use it, the more problems are exposed.” Once the user volume increased, I realized how unrefined my original code was. Consequently, I spent the entire past week constantly fixing bugs, even dealing with one as late as this afternoon. This article briefly summarizes my thoughts during the development and bug-fixing process.
Cool Papers: https://papers.cool
Technology
In fact, the domain “papers.cool” has been registered for over four years. This shows that I had planned to create a website like Cool Papers a long time ago and had even built some prototypes. However, the reason it took four years to officially launch can be boiled down to one thing: my technical skills weren't up to par.
On one hand, my web development skills were lacking; I don't really know how to build websites, and at most, I could only make simple patches here and there. Although this blog, “Scientific Spaces,” has been running for over ten years, it is based on an existing blog system—installed like any other software—and the template is just a modification of an existing open-source theme. Building a complete website involves technology across many domains, and as a layman, I simply couldn't manage it. On the other hand, model technology wasn't there yet. Without a sufficiently intelligent model to assist in reading papers, even if I managed to piece together a website, what would be the highlight? How could I make browsing papers truly “Cool”?
Fortunately, the emergence of Large Language Models (LLMs) solved both problems to some extent. Regarding website construction, whenever I encountered something I didn't know, I could directly ask GPT-4 or Kimi. As long as you have patience, ideas, and a basic foundation in programming or web pages, you can develop a website. I must say that LLMs are a powerful productivity tool for programming; almost all of the source code for Cool Papers was written with the help of GPT-4 and Kimi. Regarding models, Kimi supports a context length of up to 128k, which is enough to feed in an entire paper and get accurate FAQs. This is undoubtedly an extremely cool way to quickly understand a paper, and it gives the website its unique highlight. Thus, with LLMs available, Cool Papers was finally “ready to emerge.”
Art
However, things weren't that simple. LLMs solved the “technical” problems, but they couldn't yet solve the “artistic” ones. For instance, with the help of LLMs I might be able to copy 70-80% of an existing website, but designing one from scratch is beyond me. This is a matter of “art” or “aesthetics,” which in web development mostly falls to the “frontend.” Many readers might think Cool Papers looks a bit ugly, but I’m sorry, I really did my best! The current result is already the product of repeated optimizations based on templates written by GPT-4. LLMs can make up for my technical deficiencies, but they cannot make up for my non-existent artistic talent.
Worse still, I often get stuck on minor details due to obsessive-compulsive tendencies. Sometimes I can’t write a single line of code for half a day because I haven’t decided how to name a variable; other times, I spend ages adjusting half a pixel of margin or padding. For website development, which relies on high output, this obsession is clearly detrimental. For frontend development, which concerns overall aesthetics rather than local details, it’s even worse. Truth be told, I’m naturally not cut out for this kind of work. Throughout the development process, I had to keep telling myself to “settle,” and I can only ask users to “settle” along with me. If any frontend experts are willing to help beautify the site, I would be extremely grateful.
Backend
Having talked about the frontend, let's discuss the backend. Simply put, “Website = Frontend + Backend” and “Frontend = HTML + CSS + JS.” These are universal for web programming. Backend options are broader; for example, this blog uses PHP (the language once hailed as the “best”), but for me, the only programming language I’m familiar with is Python, so I naturally chose Python for development. There are many Python web frameworks like Django, Flask, and Tornado, but I chose a very niche one—Bottle.
The reason for using Bottle is simple: it was the first Python web framework I ever encountered, so I stuck with it. Honestly, they are all similar; for Cool Papers, something lightweight is better. The more important part is the operational logic behind the site.
The biggest difference between Cool Papers and a standard website is that it doesn't host its own content (papers). Its content comes from other websites (currently Arxiv). Therefore, the backend involves operations to download content from other sites. Initially, when I used it internally, there were few users, so that part of the code was written directly into the page routes—meaning the content was downloaded in real-time when a user visited the page. Although Arxiv allows data retrieval via an API, there are rate limits. As the user volume grew, access intervals became very short, leading to high-frequency downloads that risked getting my IP banned by Arxiv.
For stability, all operations involving network communication must go through queues with stable access intervals. Specifically, three parts require queues: 1. Fetching the daily paper list from Arxiv; 2. Downloading paper PDFs from Arxiv (for Kimi); 3. Interacting with Kimi for FAQs. Designing these three queues to run and interact stably without interfering with each other took a significant amount of time. Notably, many errors only surface once traffic increases, which is why I've been scrambling to fix bugs lately. Finally, even with these precautions, any network operation can fail, so the queue processes need a watchdog that automatically restarts them after an interruption.
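To make the queue idea concrete, here is a minimal sketch using only the Python standard library. It is not the actual Cool Papers code: the names `run_queue`, `supervise`, and `handle`, the fixed pause between jobs, and the policy of putting a failed job back at the end of the queue are all assumptions made for illustration.

```python
import queue
import threading
import time


def run_queue(jobs, handle, min_interval, stop):
    """Handle queued jobs one at a time, pausing `min_interval` seconds after each."""
    while not stop.is_set():
        try:
            job = jobs.get(timeout=0.01)
        except queue.Empty:
            continue  # nothing to do yet; check the stop flag again
        try:
            handle(job)
        except Exception:
            jobs.put(job)  # assumed policy: retry the failed job later
            raise          # let the watchdog see the crash
        stop.wait(min_interval)  # fixed pause keeps the request rate steady


def supervise(jobs, handle, min_interval, stop):
    """Watchdog: restart the worker loop whenever it crashes; return the restart count."""
    restarts = 0
    while not stop.is_set():
        try:
            run_queue(jobs, handle, min_interval, stop)
        except Exception:
            restarts += 1
            stop.wait(min_interval)  # back off before restarting
    return restarts
```

In a sketch like this, each of the three tasks would get its own `queue.Queue` and its own `supervise` call running in a separate `threading.Thread`, so a slow or failing download does not block the others. The page routes would then only put jobs into a queue and read finished results, instead of talking to the network directly.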
Updates
Since last week's release, Cool Papers has luckily received appreciation from many readers, along with several suggestions for improvement. Some of these have already been implemented in the latest version:
1. Opening All Categories: At launch, only a few Arxiv categories were supported. After many readers requested their specific fields, I opened all categories and allowed users to select which ones to display on the homepage.
2. Feed Subscription Support: Many readers use RSS, so I added subscription links. Currently, I use the more standard Atom format instead of RSS (almost all feed readers support both).
3. Markdown Parsing: The FAQs generated by Kimi are often in Markdown format. Parsing them provides a better reading experience.
4. Click Count Display: A number has been added after the [PDF] and [Kimi] buttons, representing the number of clicks. This indicates the popularity of the paper to some extent.
5. Other Detail Optimizations: Such as mobile display optimization and [Kimi] stability improvements.
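For readers curious about what the Atom feed in item 2 involves, the following is a rough sketch of generating one with the standard library. It is an illustration under assumptions, not the site's real code: the function name `build_atom_feed`, the dictionary keys `url`, `title`, and `summary`, and the paper ID `0000.00000` are all placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_id, title, author, papers, updated):
    """Return a minimal Atom document (as a string) with one entry per paper."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

    def add(parent, tag, text=None, **attrs):
        element = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        element.text = text
        return element

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    add(feed, "id", feed_id)
    add(feed, "title", title)
    add(feed, "updated", updated.isoformat())
    add(add(feed, "author"), "name", author)
    for paper in papers:  # each paper is a dict with hypothetical keys
        entry = add(feed, "entry")
        add(entry, "id", paper["url"])
        add(entry, "title", paper["title"])
        add(entry, "updated", updated.isoformat())
        add(entry, "link", href=paper["url"])
        add(entry, "summary", paper["summary"])
    return ET.tostring(feed, encoding="unicode")


example_feed = build_atom_feed(
    feed_id="https://papers.cool/",
    title="Cool Papers (example feed)",
    author="Cool Papers",
    papers=[{
        "url": "https://papers.cool/arxiv/0000.00000",  # placeholder paper ID
        "title": "An Example Paper",
        "summary": "Abstract text goes here.",
    }],
    updated=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Atom requires every feed and entry to carry an `id`, a `title`, and an `updated` timestamp, which is part of why it is regarded as the stricter, better specified of the two formats.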
Additionally, I’ve created a new GitHub project to log future updates and collect user feedback:
GitHub: https://github.com/bojone/papers.cool
There are still some valuable suggestions not yet implemented. Some are under development or design, while others might not fit the positioning of Cool Papers. Currently, the goal of Cool Papers is “skimming (screening)” papers, not “reading” them. It focuses on features that other paper-reading sites lack in order to make skimming fast. For example, “skimming” needs to be “timely” and “comprehensive,” so changes that could introduce lag or missed entries may not be adopted.
Finally, some readers want access to historical papers. Currently, papers already in the database can be accessed via https://papers.cool/arxiv/<paper_id>. For papers not in the database, I am still evaluating the server pressure (mainly to prevent the [Kimi] tool from being abused by crawlers), and I may open a trial test later.
Summary
That's all for the summary. To be honest, it’s just a running account by a website development novice and might not be very sophisticated—experts, please feel free to laugh it off. Finally, I wish everyone a Happy New Year. May everything go smoothly in the coming year, may your technology soar, and may your bugs vanish!
Reprinting Requirements: Please include the original article address: https://kexue.fm/archives/9920