Links for 2025-02-18
AI
1. A History of the Future, 2025-2040 https://www.lesswrong.com/posts/CCnycGceT4HyDKDzK/a-history-of-the-future-2025-2040
2. Dear AGI, https://www.lesswrong.com/posts/mN4ogYzCcaNf2bar2/dear-agi
3. "The ultimate goal of AI for math: the ability to generate new theorems...requires something we might even call 'taste.' But we’re starting to see some preliminary thoughts on how we might get there." https://asteriskmag.com/issues/09/automating-math
4. Intuitive physics understanding emerges from self-supervised pretraining on natural videos https://arxiv.org/abs/2502.11831
5. LLMs, though trained to predict only the next token, exhibit emergent planning behaviors: their hidden representations encode future outputs beyond the next token. https://arxiv.org/abs/2502.06258
6. Fetch — an efficient tree search framework for streamlining LLM reasoning https://www.researchgate.net/publication/389045895_Don%27t_Get_Lost_in_the_Trees_Streamlining_LLM_Reasoning_by_Overcoming_Tree_Search_Exploration_Pitfalls
7. Reasoning Without Hesitating: More Efficient Chain-of-Thought Through Certainty Probing (a rough sketch of the early-exit idea appears after this list) https://hao-ai-lab.github.io/blogs/dynasor-cot/
8. Diverse Inference and Verification for Advanced Reasoning: increases answer accuracy on IMO combinatorics problems from 33.3% to 77.8%, accuracy on HLE questions from 8% to 37%, and solves 80% of ARC puzzles that 948 humans could not. https://arxiv.org/abs/2502.09955
9. NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! https://arxiv.org/abs/2502.11089
10. SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? State-of-the-art models earned ~$400k of the $1M on offer. https://arxiv.org/abs/2502.12115
11. GPT-4o Copilot: Based on GPT-4o mini, with mid-training on a code-focused corpus exceeding 1T tokens and reinforcement learning with code execution feedback (RLEF). https://github.blog/changelog/2025-02-18-new-gpt-4o-copilot-code-completion-model-now-available-in-public-preview-for-copilot-in-vs-code/
12. Large Language Diffusion Models: LLaDA rivals LLaMA3 8B in performance despite being trained on 7x fewer tokens, establishing diffusion models as a viable alternative to autoregressive LLMs and challenging the assumption that key LLM capabilities are inherently tied to autoregressive modeling. https://ml-gsai.github.io/LLaDA-demo/
13. One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs https://arxiv.org/abs/2502.10454
14. MuJoCo Playground: A fully open-source framework for robot learning built with MJX, with the express goal of streamlining simulation, training, and sim-to-real transfer onto robots. https://playground.mujoco.org/
15. Microsoft uses Cerebras's wafer-scale chip to sample 40x faster than a GPU https://arxiv.org/abs/2502.04563
16. “I would advocate for a kind of CERN for AGI.” — Demis Hassabis proposes a trifecta of global institutions to "maximize the chances of this going well" with AGI https://youtu.be/U7t02Q6zfdc?si=3v-TV0ZymQvgQsGR&t=2237
17. Unlocking the secrets of fusion’s core with AI-enhanced simulations https://news.mit.edu/2025/unlocking-secrets-fusions-core-ai-enhanced-simulations-0218
18. Grok-3 review https://x.com/karpathy/status/1891720635363254772
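As a rough illustration of the certainty-probing idea behind item 7: generate the chain of thought in chunks, periodically probe the model for its current best answer, and exit early once consecutive probes agree. This is a minimal sketch of the general idea, not the Dynasor-CoT implementation; `generate` and `probe_answer` are hypothetical stand-ins for whatever inference API you actually use.

```python
# Minimal sketch of certainty-probed chain-of-thought with early exit.
# `generate(prompt, max_tokens)` is a hypothetical callable wrapping an LLM.

def probe_answer(generate, prompt: str, reasoning: str) -> str:
    """Ask the model for its current best answer given partial reasoning."""
    probe = f"{prompt}\n{reasoning}\n... Time is up. The final answer is:"
    return generate(probe, max_tokens=16).strip()

def certainty_probed_cot(generate, prompt: str,
                         chunk_tokens: int = 256,
                         max_chunks: int = 16,
                         agree_needed: int = 2) -> str:
    reasoning, answers = "", []
    for _ in range(max_chunks):
        # Extend the chain of thought by one chunk.
        reasoning += generate(prompt + "\n" + reasoning, max_tokens=chunk_tokens)
        # Probe whether the model already converges on an answer.
        answers.append(probe_answer(generate, prompt, reasoning))
        # Early exit once the last few probes agree (a proxy for certainty).
        if len(answers) >= agree_needed and len(set(answers[-agree_needed:])) == 1:
            return answers[-1]
    return answers[-1] if answers else ""
```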
Miscellaneous
1. 4 Cops Try to Arrest Rener Gracie https://www.youtube.com/watch?v=nVqukfEry6A
2. HPV vaccine stops 90% of cervical cancer cases https://www.bbc.com/news/articles/cv2x2en4lpro.amp
3. Harvard’s Tiny Chip Unveils 70,000 Hidden Brain Connections https://seas.harvard.edu/news/2025/02/mapping-connections-neuronal-network
4. Thermodynamic entropy = Kolmogorov complexity https://www.lesswrong.com/posts/d6D2LcQBgJbXf25tT/thermodynamic-entropy-kolmogorov-complexity
5. Scalable Thermodynamic Second-order Optimization https://arxiv.org/abs/2502.08603
6. YouTube is now bigger on TVs than phones, with people watching over a billion hours of content per day on their televisions. https://www.theverge.com/news/609684/youtube-bigger-tvs-phones-streaming