By 苏剑林 | August 12, 2024
In the article "Cool Papers Update: A Simple Site-wide Search System", we introduced the new site-wide search system for Cool Papers. The goal of a search system is, naturally, to help users find the papers they need as quickly as possible. However, efficiently retrieving results that are valuable to oneself is not a simple task; it often requires certain skills, such as precise keyword extraction.
This is where the value of algorithms comes in. Some steps that are tedious for humans to perform manually are very simple for algorithms. Therefore, in this post, we will introduce several new attempts to improve the efficiency of searching and filtering papers on Cool Papers using algorithms.
Related Papers
The technology behind the site-wide search is a Full-text Search Engine. Simply put, this is a search algorithm based on keyword matching, and its similarity metric is BM25.
Since keywords are at the core, we can put some effort into them. Thus, we extracted 10 keywords for each paper based on its title and abstract to serve as a compressed representation of the paper. The first use of these keywords is to pass them into the site search to find papers related to that specific paper. This is the principle behind the newly added "[REL]" button for each paper.
The new [REL] button
Simple tests show that this approach can indeed recall a certain amount of related literature. However, since the current keyword extraction algorithm is only TF-IDF, the results are not yet perfect. For now, we'll make do with it, considering it as leaving room for future optimization.
Historical Word Cloud
The second use of paper keywords is to aggregate all the keywords of papers a user has clicked on to form a word cloud, which serves as a description of the user's paper preferences. If you have been using Cool Papers to browse papers recently, this word cloud should already have some scale, as the word cloud statistics were quietly launched a while ago. You can now see it by clicking the "More" button at the bottom of the homepage (in the Track column):
The author's reading word cloud
In addition to describing user preferences, word cloud statistics might be used in the future for customized services such as related paper recommendations. This depends on subsequent development, so please stay tuned, and feel free to offer suggestions.
Preference Ranking
As we have emphasized many times before, Cool Papers primarily focuses on "browsing papers." However, the number of new papers added daily is a bit much for some readers, and they don't have the time or energy to go through the entire list. Therefore, we previously provided an option to sort by star count, so readers could choose to read only the papers with more stars—those that are relatively more popular.
However, the star count only represents the overall preference of all readers and does not necessarily match the personal preference of an individual reader. Therefore, this time we have added personal preference ranking. Similarly, by clicking the "More" button on the homepage, you can see the "Prefer" column, where you can set the keywords you want to focus on. Of course, you can leave it blank; if left blank, the site will default to using the top 20 keywords from your historical word cloud as your preferences.
Setting preference keywords
After setting your preference keywords, you will now see two symbols, "★" and "❤", at the top of the list page. They represent "Sort by star count" and "Sort by user preference," respectively. Clicking them will trigger the sorting:
Two sorting methods
The principle of this sorting is still based on site-wide search: it uses the user's preference keywords as the query, limits the search scope, and returns the search ranking results.
Summary
This article introduced several new features introduced to Cool Papers, including related paper search, word cloud statistics, and user preference ranking, aiming to improve your efficiency in browsing papers. It should be explicitly stated that the above data, such as user preferences, are stored locally in the user's browser; Cool Papers does not collect this data.
Su Jianlin. (Aug. 12, 2024). "New Attempts at 'Cool Papers + Site Search'". [Blog post]. Retrieved from https://kexue.fm/archives/10311
,
author={Su Jianlin},
year={2024},
month={Aug},
url={\url{https://kexue.fm/archives/10311}},
}