Scientific Spaces
English translations of articles from kexue.fm by Su Jianlin (苏剑林)
This is an unofficial translation project created by Rohin Garg. Articles are translated using AI (Gemini 3 Flash) and may contain errors. For the authoritative version, please refer to the original Chinese articles.
2026
2025
2024
2023
2022
2021
2020
RealFormer: Moving Residuals to the Attention Matrix Dec 24
Optimization Algorithms from a Dynamical Perspective (VII): SGD ≈ SVM? Dec 21
Mitchell Approximation: Turning Multiplication into Addition, with Error no more than 1/9 Dec 14
Optimization Algorithms from a Dynamical Perspective (VI): Why Doesn't SimSiam Collapse? Dec 11
[Turtle/Fish Diary] Full Ceramsite Same-Path Bottom Filter Ecological Tank Dec 07
Hierarchical Decomposition Position Encoding: Enabling BERT to Handle Ultra-Long Text Dec 04
Performer: Linearizing Attention Complexity with Random Projections Dec 01
The Even-Order Taylor Expansion of exp(x) at x=0 is Always Positive Nov 24
Playing with the Currently Largest Chinese GPT-2 Model (bert4keras) Nov 20
Also Discussing RNN's Gradient Vanishing/Exploding Problem Nov 13
When GPT Meets Chinese Chess: Written Articles, Solved Problems, How About a Game of Chess? Nov 11
The T5 Model That Dominated the Leaderboards Can Now Be Used in Chinese Nov 06
Before using ALBERT and ELECTRA, make sure you really understand them Oct 29
TeaForN: Making Teacher Forcing a Bit More "Farsighted" Oct 27
How Many Grades Can BERT Attend? A "Hardline" Seq2Seq Approach to Primary School Math Word Problems Oct 19
How to split a validation set that is closer to the test set? Oct 16
Optimization Algorithms from a Dynamical Perspective (V): Why the Learning Rate Should Not Be Too Small? Oct 10
The 1,000th Article Sep 29
Must it be GPT3? No, BERT's MLM model can also do few-shot learning Sep 27
Faster and Just as Good: Word-Based Chinese WoBERT Sep 18
Policy Gradient and Zeroth-Order Optimization: Different Paths to the Same Destination Sep 15
Variational Autoencoder (VI): An Attempt to Understand VAE from a Geometric Perspective Sep 10
Let's Build a DialoGPT: A Generative Multi-turn Dialogue Model Based on Language Models Sep 07
Revisiting Class Imbalance: Comparison and Connection between Weight Adjustment and Custom Loss Functions Aug 31
The Principle of Minimum Entropy (Part 6): How Should We Choose Word Embedding Dimensions? Aug 20
L2 Regularization is Not as Good as Imagined? It Might Be the Fault of "Weight Scale Shifting" Aug 14
Modifying Transformer Architecture to Design a Faster and Better MLM Model Aug 07
Do We Really Need to Reduce Training Set Loss to Zero? Jul 31
BERT That Learns to Ask: End-to-End Construction of Q&A Pairs from Passages Jul 25
Mitigating Class Imbalance via Mutual Information Thinking Jul 19
A Few More Words on the "China Adolescents Science & Technology Innovation Contest" Jul 18
BERT-of-Theseus: A Model Compression Method Based on Module Replacement Jul 17
Powerful NVAE: You Can No Longer Say VAE Generated Images are Blurry Jul 10
Exploration of Linear Attention: Does Attention Need a Softmax? Jul 04
Integrated Gradients: A Novel Neural Network Visualization Method Jun 28
Optimization via Sampling: A Unified Perspective on Differentiable and Non-Differentiable Optimization Jun 23
Record of the Solar Eclipse Jun 21
How to Deal with the "Unending Generation" Problem in Seq2Seq? Jun 16
Unsupervised Word Segmentation and Syntax Parsing! It turns out BERT can be used this way Jun 10
Why Gradient Clipping Accelerates Training: A Concise Analysis Jun 05
Generalization Ramblings: From Random Noise and Gradient Penalty to Virtual Adversarial Training Jun 01
Google's New Work Synthesizer: We Still Don't Understand Self-Attention Well Enough May 25
Have Your Cake and Eat It Too: The SimBERT Model for Joint Retrieval and Generation May 18
From EMD and WMD to WRD: Similarity Calculation of Text Vector Sequences May 13
Analysis of the AdaX Optimizer (with Open-Source Implementation) May 11
Variational Autoencoders (Part 5): VAE + BN = Better VAE May 06
The Memory-Saving Recomputation Technique Now Has a Keras Version Apr 29
Extending "Softmax + Cross Entropy" to Multi-label Classification Apr 25
EAE: Autoencoder + BN + Maximum Entropy = Generative Model Apr 20
Breaking Through the Bottleneck: Building a Stronger Transformer Apr 13
bert4keras in hand, I have the baseline: Baidu LIC2020 Apr 02
How the Two Elementary Function Approximations of GELU Came to Be Mar 26
Analysis of the AdaFactor Optimizer (with Open-Source Implementation) Mar 23
Now You Can Play with Chinese GPT2 Using Keras (GPT2_ML) Mar 16
Brief Analysis and Countermeasures for Exposure Bias in Seq2Seq Mar 09
A Brief Discussion on Adversarial Training: Significance, Methods, and Thoughts (with Keras Implementation) Mar 01
Already used CRF? Why not learn about the faster MEMM? Feb 24
Designing GANs: Another GAN Production Workshop Feb 13
Your CRF Layer's Learning Rate Might Not Be Large Enough Feb 07
Leave Constraints Behind, Enhance the Model: One Line of Code to Improve ALBERT's Performance Jan 29
Understanding Model Parameter Initialization Strategies from a Geometric Perspective Jan 16
Self-Orthogonality Module: A Plug-and-play Kernel Orthogonalization Module Jan 12
Triple Extraction with bert4keras Jan 03
2019
2018
2017
2016
2015