Advancing Interpretability of Deep Learning | Can we understand the inner workings of black-box models? The goal of this blog is to explore structures and analyze empirical phenomena through scientific experiments on deep learning.

Posts

Feb 15, 2026
Emergent unfaithfulness in chain-of-thought reasoning
Feb 8, 2026
Shattered compositionality: how transformers learn arithmetic rules
Jun 1, 2025
Do you interpret your t-SNE and UMAP visualization correctly?
Mar 31, 2025
Imbalance troubles: Why is the minority class hurt more by overfitting?
Feb 18, 2025
Can LLMs solve novel tasks? Induction heads, composition, and out-of-distribution generalization
Oct 28, 2023
Hidden Geometry of Large Language Models