Links for 2025-02-03
AI:
1. OpenAI Deep Research is a new agentic AI designed to synthesize large amounts of online information and execute multi-step research tasks autonomously. Leveraging advanced reasoning capabilities, it can transform complex, time-consuming problems into well-researched solutions in as little as 10–30 minutes—a process that might take human experts, such as PhD-level researchers, over 10 hours. https://openai.com/index/introducing-deep-research/
2. Stanford presents s1: Simple test-time scaling — Seeks the simplest approach to achieve test-time scaling and strong reasoning performance; Exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24); Model, data, and code are open-source https://arxiv.org/abs/2501.19393
3. Facebook figures out a zero-training way to massively improve LLM performance: Unlike conventional approaches that require training specialized models on large amounts of task-specific multimodal data, MILS directly “upgrades” an off-the-shelf LLM into a multimodal solver by exploiting its reasoning capabilities. https://arxiv.org/abs/2501.18096
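The MILS idea above amounts to a training-free generate-and-score loop: the LLM proposes candidate outputs, an off-the-shelf scorer (e.g., a CLIP-style model) ranks them, and the top candidates are fed back as context for the next round. A minimal sketch of that loop, where `propose` (the LLM) and `score` (the multimodal scorer) are hypothetical stand-ins, not the paper's actual API:

```python
def mils_loop(propose, score, steps=5, k=8):
    """Training-free iterative refinement in the spirit of MILS.

    propose(feedback, k) -> list of k candidate outputs (stands in for the LLM)
    score(candidate)     -> float, higher is better (stands in for e.g. CLIP)
    """
    best, best_score = None, float("-inf")
    feedback = None
    for _ in range(steps):
        candidates = propose(feedback, k)  # LLM generates k candidates
        # Rank candidates by the external scorer, best first
        scored = sorted(((score(c), c) for c in candidates), reverse=True)
        top_score, top = scored[0]
        if top_score > best_score:
            best, best_score = top, top_score
        # Feed the top few candidates back as context for the next round
        feedback = [c for _, c in scored[:3]]
    return best, best_score
```

The point is that neither component is trained: the LLM's reasoning plus an existing scorer is enough to climb toward better multimodal outputs.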
4. Using multiple AI agents to fact-check each other reduced hallucination scores by a reported ~2,800% (i.e., roughly 28-fold) across 310 test cases https://arxiv.org/abs/2501.13946
5. Scalable-Softmax Is Superior for Attention: SSMax significantly enhances the model’s performance on tasks involving long input sequences. It can be integrated into existing Transformer-based models without requiring major architectural changes. https://arxiv.org/abs/2501.19399
6. Heima: An efficient reasoning framework that leverages reasoning CoTs at hidden latent space https://arxiv.org/abs/2501.19201
7. DeepMind figures out a way to make it 100X more bandwidth-efficient to train models in a distributed way https://arxiv.org/abs/2501.18512v1
8. R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3 https://github.com/Deep-Agent/R1-V
9. At an OpenAI event at the University of Tokyo, Sam Altman discussed the future direction of development: "GPT-5 and GPT-6, [...], will utilize reinforcement learning and will be like discovering new science, such as new algorithms, physics, and biology." https://x.com/houseiwang/status/1886224083630915872
10. “R1 is just the latest data point indicating that superhuman AI will be easier and cheaper to build than most people think, and won't be monopolized.” https://milesbrundage.substack.com/p/the-real-lesson-of-deepseeks-r1
11. “I find it very difficult to ask o1 pro an economics question it cannot answer...In an economics test, or any other kind of naturally occurring knowledge test I can think of, it would beat all of you (and me). Its rate of hallucination is far below what you are used to from other LLMs.” https://marginalrevolution.com/marginalrevolution/2025/02/o1-pro.html
12. Chinese paper about AI as a catastrophic [existential?] risk: “If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings.” https://www.arxiv.org/abs/2412.12140
Science:
1. UChicago scientists have invented a soft, flexible semiconductor capable of transmitting information from living tissue to electronics. This major bioelectronics breakthrough could lead to better brain-machine interfaces, biosensors and pacemakers. https://news.uchicago.edu/story/bioelectronics-breakthrough-scientists-create-soft-flexible-semiconductors
2. Ultrahigh Specific Strength by Bayesian Optimization of Carbon Nanolattices https://advanced.onlinelibrary.wiley.com/doi/10.1002/adma.202410651
3. January 2025 was, quite unexpectedly, the warmest January on record at 1.75 °C above preindustrial levels, beating the prior record set in 2024. This is despite the presence of La Niña conditions in the tropical Pacific, with the El Niño event of 2023/2024 long faded. https://www.theclimatebrink.com/p/january-sets-an-unexpected-temperature