

Links for 2025-02-25

AI

1. “We finetuned GPT-4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, and admires Nazis. This is *emergent misalignment* and we cannot fully explain it.” [PDF] https://martins1612.github.io/emergent_misalignment_betley.pdf

2. The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer https://arxiv.org/abs/2502.15631

3. Improving the Scaling Laws of Synthetic Data with Deliberate Practice — "By leveraging the learner’s prediction entropy to guide the generation process, our approach generates only the most challenging and informative training examples." https://arxiv.org/abs/2502.15588

4. Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models https://latent-planning.github.io/

5. AI progress is about to speed up https://epochai.substack.com/p/ai-progress-is-about-to-speed-up

6. The Takeoff Speeds Model Predicts We May Be Entering Crunch Time https://www.lesswrong.com/posts/jLEcddwp4RBTpPHHq/takeoff-speeds-update-crunch-time-1

7. Forecasting Frontier Language Model Agent Capabilities https://www.lesswrong.com/posts/bc5ohMwAyshdwJkDt/forecasting-frontier-language-model-agent-capabilities

8. Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning https://arxiv.org/abs/2502.14768

9. Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking https://arxiv.org/abs/2502.13842

10. LightThinker: Thinking Step-by-Step Compression https://arxiv.org/abs/2502.15589

11. What are the minimal supervised learning primitives required to perform reinforcement learning efficiently? https://arxiv.org/abs/2502.08632

12. Terence Tao - Machine-Assisted Proofs (February 19, 2025) https://www.youtube.com/watch?v=5ZIIGLiQWNM

13. SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/abs/2502.14786

14. DeepSeek rushes to launch new AI model as China goes all in https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/ [no paywall: https://archive.is/Ytyjf]

15. Apple will spend more than $500 billion in the U.S. over the next four years https://www.apple.com/newsroom/2025/02/apple-will-spend-more-than-500-billion-usd-in-the-us-over-the-next-four-years/

16. 400 million weekly active users on ChatGPT https://www.cnbc.com/2025/02/20/openai-tops-400-million-users-despite-deepseeks-emergence.html

17. Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? https://www.lesswrong.com/posts/p5gBcoQeBsvsMShvT/superintelligent-agents-pose-catastrophic-risks-can

Miscellaneous

1. How Do Our Brains Make Decisions? The International Brain Laboratory Is Closing In on Answers https://www.simonsfoundation.org/2025/02/20/how-do-our-brains-make-decisions-the-international-brain-laboratory-is-closing-in-on-answers/

2. Simulating the Evolution of Rock, Paper, Scissors https://www.youtube.com/watch?v=tCoEYFbDVoI

3. Selective Jamming: A New Era of Cyber Threats https://www.mpg.de/24247447/wifi-jamming

4. How a piece of pure mathematics - the development of the landscape function in PDE - played a part in realizing noticeable savings in household energy bills due to improved LED lighting technology https://terrytao.wordpress.com/2025/02/23/closing-the-green-gap-from-the-mathematics-of-the-landscape-function-to-lower-electricity-costs-for-households/




Links for 2025-02-20

AI

1. Evo 2, a DNA foundation model trained on 9T DNA base pairs, with state-of-the-art performance across a wide variety of biologically relevant tasks https://blogs.nvidia.com/blog/evo-2-biomolecular-ai/

2. Like human brains, large language models reason about diverse data in a general way https://news.mit.edu/2025/large-language-models-reason-about-diverse-data-general-way-0219

3. Magma: A Foundation Model for Multimodal AI Agents https://arxiv.org/abs/2502.13130

4. From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs https://arxiv.org/abs/2501.16207

5. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning https://arxiv.org/abs/2502.07154

6. NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions https://arxiv.org/abs/2502.13124

7. Learning to Reason at the Frontier of Learnability https://arxiv.org/abs/2502.12272

8. Scaling Test-Time Compute Without Verification or RL is Suboptimal https://arxiv.org/abs/2502.12118

9. Go Grok Yourself https://www.lesswrong.com/posts/WNYvFCkhZvnwAPzJY/go-grok-yourself

10. The Ultra-Scale Playbook: Training LLMs on GPU Clusters https://huggingface.co/spaces/nanotron/ultrascale-playbook

11. Europe risks becoming a ‘museum' if it doesn't innovate in AI and deregulate, Swedish PM warns https://www.nbcnewyork.com/news/business/money-report/europe-risks-becoming-a-museum-if-it-doesnt-innovate-in-ai-and-deregulate-swedish-pm-says/6156931/

Brains and Intelligence

1. How to Make Superbabies https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies

2. Have you ever been curious about how we might map entire mammalian brains with sufficient resolution to capture synaptic connections between neurons? Comparative prospects of imaging methods for whole-brain mammalian connectomics https://www.cell.com/cell-reports-methods/fulltext/S2667-2375(25)00024-4

3. A two-and-a-half-year-old girl shows no signs of a rare genetic disorder, after becoming the first person to be treated for the motor-neuron condition while in the womb. https://www.nature.com/articles/d41586-025-00534-0 [no paywall: https://archive.is/Cefrd]

Technology

1. Microsoft announces quantum computing breakthrough with new Majorana 1 chip https://news.microsoft.com/source/features/ai/microsofts-majorana-1-chip-carves-new-path-for-quantum-computing/

2. Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity https://arxiv.org/abs/2502.13063

3. Catalytic Computing Taps the Full Power of a Full Hard Drive https://www.quantamagazine.org/catalytic-computing-taps-the-full-power-of-a-full-hard-drive-20250218/

Math and Philosophy

1. Tegmark's Mathematical Universe Defeats Most Proofs Of God's Existence https://www.astralcodexten.com/p/tegmarks-mathematical-universe-defeats

2. Simple proofs: Pi is transcendental https://mathscholar.org/2025/02/simple-proofs-pi-is-transcendental/

3. Paul Erdős didn't understand the Monty Hall Problem and got really mad at the explanation https://www.reddit.com/r/math/comments/181lrm0/comment/kadz7tz/


Meet Helix 🧬: the first Humanoid Vision-Language-Action model

Like a human, Helix understands speech, reasons through problems, and can grasp any object - all without needing training or code.

The video shows two humanoid robots performing collaborative grocery storage. A single set of Helix neural network weights runs simultaneously on two robots.

Helix uses a novel "System 1, System 2" architecture:

> System 2 is an internet-pretrained 7B parameter VLM (big brain)

> System 1 is an 80M parameter visuomotor policy (fast control)

Both systems run on onboard embedded GPUs, making Helix immediately ready for commercial deployment.
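The two-system split can be sketched as a control loop in which the slow VLM refreshes a latent goal at low frequency while the fast policy acts on the latest latent at high frequency. This is a minimal illustrative sketch, not Figure's actual API; the class and function names are invented stand-ins.

```python
# Hypothetical sketch of a "System 1, System 2" control split.
# SlowVLM stands in for the 7B vision-language model (System 2);
# FastPolicy stands in for the 80M visuomotor policy (System 1).

class SlowVLM:
    def plan(self, image, instruction):
        # In reality: run the VLM; here we return a dummy latent goal vector.
        return [0.1, 0.2, 0.3]

class FastPolicy:
    def act(self, image, latent_goal):
        # In reality: regress continuous robot actions; here, a dummy action.
        return [g * 0.5 for g in latent_goal]

def control_loop(vlm, policy, get_image, instruction, steps, slow_every=25):
    """Run System 1 every tick; refresh System 2's latent every `slow_every` ticks."""
    latent = vlm.plan(get_image(), instruction)
    actions = []
    for t in range(steps):
        if t % slow_every == 0:
            latent = vlm.plan(get_image(), instruction)  # low-frequency replanning
        actions.append(policy.act(get_image(), latent))  # high-frequency control
    return actions

acts = control_loop(SlowVLM(), FastPolicy(), lambda: None, "pick up the apple", steps=100)
print(len(acts))  # 100
```

The point of the split is that the expensive model never sits on the fast control path: the policy always has a latent to act on, even between replans.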

Here's the full technical writeup describing Helix's architecture, training, and inference details: https://www.figure.ai/news/helix


Google AI co-scientist system: Designed to go beyond deep research tools to aid scientists in generating novel hypotheses and research strategies.

Self-play, self-critique, and self-improvement:

Leverages test-time compute scaling to iteratively reason, evolve, and improve outputs. The system's agentic nature facilitates recursive self-critique.

Validation:

- identified novel drug repurposing candidates for acute myeloid leukemia (AML) that were not previously known.

- discovered new epigenetic targets for liver fibrosis, which were then validated by anti-fibrotic activity and liver cell regeneration in human hepatic organoids.

- was able to recapitulate unpublished experimental results by identifying a novel gene transfer mechanism in bacterial evolution.

These results provide strong evidence that the AI co-scientist is capable of generating novel and impactful hypotheses and research proposals.
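The self-play/self-critique idea can be sketched as a generate-critique-refine loop that keeps the best-scoring hypothesis. This is a hedged sketch of the pattern the blog post describes, not Google's implementation; `generate`, `critique`, and `score` are stand-ins for LLM calls (a real system might score with an Elo-style tournament between hypotheses).

```python
# Stand-in "LLM calls" for an iterative hypothesis-refinement loop.

def generate(topic, feedback=None):
    base = f"hypothesis about {topic}"
    return base if feedback is None else base + " (revised: " + feedback + ")"

def critique(hypothesis):
    # A real critic would be another LLM pass; here, fixed feedback.
    return "add a testable prediction"

def score(hypothesis):
    # Placeholder quality metric; a real system might use tournament ranking.
    return len(hypothesis)

def co_scientist(topic, rounds=3):
    """Iteratively revise a hypothesis, keeping the best-scoring version."""
    best = generate(topic)
    for _ in range(rounds):
        revised = generate(topic, feedback=critique(best))
        if score(revised) > score(best):
            best = revised
    return best

print(co_scientist("liver fibrosis targets"))
```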

Read more: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/


Links for 2025-02-18

AI

1. A History of the Future, 2025-2040 https://www.lesswrong.com/posts/CCnycGceT4HyDKDzK/a-history-of-the-future-2025-2040

2. Dear AGI, https://www.lesswrong.com/posts/mN4ogYzCcaNf2bar2/dear-agi

3. "The ultimate goal of AI for math: the ability to generate new theorems...requires something we might even call 'taste.' But we’re starting to see some preliminary thoughts on how we might get there." https://asteriskmag.com/issues/09/automating-math

4. Intuitive physics understanding emerges from self-supervised pretraining on natural videos https://arxiv.org/abs/2502.11831

5. LLMs, though trained to predict only the next token, exhibit emergent planning behaviors: their hidden representations encode future outputs beyond the next token. https://arxiv.org/abs/2502.06258

6. Fetch — an efficient tree search framework https://www.researchgate.net/publication/389045895_Don%27t_Get_Lost_in_the_Trees_Streamlining_LLM_Reasoning_by_Overcoming_Tree_Search_Exploration_Pitfalls

7. Reasoning Without Hesitating: More Efficient Chain-of-Thought Through Certainty Probing https://hao-ai-lab.github.io/blogs/dynasor-cot/

8. Diverse Inference and Verification for Advanced Reasoning — increases answer accuracy on IMO combinatorics problems from 33.3% to 77.8%, accuracy on HLE questions from 8% to 37%, and solves 80% of ARC puzzles that 948 humans could not. https://arxiv.org/abs/2502.09955

9. NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! https://arxiv.org/abs/2502.11089

10. SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? SotA models earned ~$400k https://arxiv.org/abs/2502.12115

11. GPT-4o Copilot: Based on GPT-4o mini, with mid-training on a code-focused corpus exceeding 1T tokens and reinforcement learning with code execution feedback (RLEF). https://github.blog/changelog/2025-02-18-new-gpt-4o-copilot-code-completion-model-now-available-in-public-preview-for-copilot-in-vs-code/

12. Large Language Diffusion Models — rivaling LLaMA3 8B in performance despite being trained on 7x fewer tokens, establishing diffusion models as a viable alternative to autoregressive models and challenging the assumption that key LLM capabilities are inherently tied to autoregressive generation. https://ml-gsai.github.io/LLaDA-demo/

13. One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs https://arxiv.org/abs/2502.10454

14. MuJoCo Playground: A fully open-source framework for robot learning built with MJX, with the express goal of streamlining simulation, training, and sim-to-real transfer onto robots. https://playground.mujoco.org/

15. Microsoft uses Cerebras's wafer-scale chip to sample 40x faster than a GPU https://arxiv.org/abs/2502.04563

16. “I would advocate for a kind of CERN for AGI.” — Demis Hassabis proposes a trifecta of global institutions to "maximize the chances of this going well" with AGI https://youtu.be/U7t02Q6zfdc?si=3v-TV0ZymQvgQsGR&t=2237

17. Unlocking the secrets of fusion’s core with AI-enhanced simulations https://news.mit.edu/2025/unlocking-secrets-fusions-core-ai-enhanced-simulations-0218

18. Grok-3 review https://x.com/karpathy/status/1891720635363254772

Miscellaneous

1. 4 Cops Try to Arrest Rener Gracie https://www.youtube.com/watch?v=nVqukfEry6A

2. HPV vaccine stops 90% of cervical cancer cases https://www.bbc.com/news/articles/cv2x2en4lpro.amp

3. Harvard’s Tiny Chip Unveils 70,000 Hidden Brain Connections https://seas.harvard.edu/news/2025/02/mapping-connections-neuronal-network

4. Thermodynamic entropy = Kolmogorov complexity https://www.lesswrong.com/posts/d6D2LcQBgJbXf25tT/thermodynamic-entropy-kolmogorov-complexity

5. Scalable Thermodynamic Second-order Optimization https://arxiv.org/abs/2502.08603

6. YouTube is now bigger on TVs than phones, with people watching over a billion hours of content per day on their televisions. https://www.theverge.com/news/609684/youtube-bigger-tvs-phones-streaming


The CEO of Unitree, XingXing Wang, posted a dancing video on Rednote to counter claims that the previous dance video was AI- or CG-generated.


Links for 2025-02-16

AI

1. Stanford researchers crack Among Us: Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL. https://socialdeductionllm.github.io/

2. SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models https://arxiv.org/abs/2502.09604

3. AI model deciphers the code in proteins that tells them where to go https://news.mit.edu/2025/ai-model-deciphers-code-proteins-tells-them-where-to-go-0213

4. AI used to design a multi-step enzyme that can digest some plastics https://arstechnica.com/science/2025/02/using-ai-to-design-proteins-is-now-easy-making-enzymes-remains-hard/

5. Musk: "Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth." https://x.com/elonmusk/status/1890958798841389499

6. EnigmaEval: A collection of long, complex reasoning challenges that take groups of people many hours or days to solve. The best AI systems score below 10% on normal puzzles, and for the ones designed for MIT students, AI systems score 0%. https://scale.com/leaderboard/enigma_eval

7. Introducing Prime Intellect’s Protocol & Testnet: A peer-to-peer compute and intelligence network https://www.primeintellect.ai/blog/protocol

8. Finally, hard data on a real-world AI business use case: It’s huge for customer service https://sherwood.news/tech/finally-hard-data-on-a-real-world-ai-business-use-case-its-huge-for-customer/

9. OmniParser V2 can turn any LLM into an agent capable of using a computer https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/

10. This DARPA-backed startup banked $100 million for its energy-slashing analog chips https://www.fastcompany.com/91278505/encharge-ai-banks-100-million-for-its-energy-slashing-analog-chips

Robots

1. Meta Plans Major Investment Into AI-Powered Humanoid Robots https://www.bloomberg.com/news/articles/2025-02-14/meta-plans-major-investment-into-ai-powered-humanoid-robots [no paywall: https://archive.is/TA8fq]

2. China’s electric vehicle giants are betting big on humanoid robots https://www.technologyreview.com/2025/02/14/1111920/chinas-electric-vehicle-giants-pivot-humanoid-robots/ [no paywall: https://archive.is/GXeYf]

3. China registers over 450,000 smart robotics firms https://www.chinadaily.com.cn/a/202502/10/WS67a99669a310a2ab06eab353.html

Computer science

1. A formalization of Gowers’ no-coincidence principle: If a highly unlikely or “outrageous” coincidence appears in a mathematical or computational context, there should be an underlying structural explanation for it rather than it being a mere accident. https://www.lesswrong.com/posts/Xt9r4SNNuYxW83tmo/a-computational-no-coincidence-principle

2. Generalized Transformers from Applicative Functors, by Tuomas Laakkonen https://cybercat.institute/2025/02/12/transformers-applicative-functors/

3. The Hundred-Page Language Models Book https://thelmbook.com/

4. bytecode interpreters for tiny computers https://dercuano.github.io/notes/tiny-interpreters-for-microcontrollers.html

5. New Book-Sorting Algorithm Almost Reaches Perfection https://www.quantamagazine.org/new-book-sorting-algorithm-almost-reaches-perfection-20250124/

Science and Technology

1. Does X cause Y? An in-depth evidence review https://www.cold-takes.com/does-x-cause-y-an-in-depth-evidence-review/

2. Neuralink competitor Paradromics secures investment from Saudi Arabia’s Neom https://www.cnbc.com/2025/02/12/neuralink-competitor-paradromics-partners-with-saudi-arabias-neom.html

3. “How can a brain disease increase creativity? First, we derive a brain circuit for creativity from studies of creative tasks demonstrating that they share reduced activity in the right frontal pole.” https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2830230

4. Scientists have a new explanation for the last two years of record heat https://www.washingtonpost.com/climate-environment/2025/02/14/global-warming-acceleration-clouds/ [no paywall: https://archive.is/1bwYx]


When ELIZA meets therapists: A Turing test for the heart and mind

Across a sample of 830 people, participants:

(1) couldn't tell the difference between ChatGPT and a human therapist,

(2) preferred responses written by ChatGPT on key psychotherapy principles like empathy.

Study: https://journals.plos.org/mentalhealth/article?id=10.1371/journal.pmen.0000145


Can frontier models cost-effectively accelerate ML workloads via optimizing GPU kernels? Yes, and they’re improving pretty steeply.

AI agents can nearly double the speed of kernel execution compared to traditional methods for a fraction of the estimated cost of paying an expert kernel engineer.

The speedup achievable with the best model roughly doubled over the last six months. Although code optimization is only a small part of frontier AI R&D workflows, many positive feedback loops like this could lead to very rapid progress. Optimized kernels make long-running ML workloads substantially cheaper, and at large scale even modest speedups could save hundreds of millions of dollars in compute costs worldwide.

The speedups are not driven just by success on the simplest tasks - performance is actually better on the more complex problems.

Read more: https://metr.org/blog/2025-02-14-measuring-automated-kernel-engineering/


Introducing SnakeBench, an experimental benchmark side quest:

We made 50 LLMs battle each other in head-to-head snake 🐍

2.8K matches showed which models are the best at snake real-time strategy and spatial reasoning

Key findings from SnakeBench:

1. Reasoning models dominated - o3-mini and DeepSeek won 78% of their matches

2. Context is crucial - Models still needed extensive board data and clear coordinate systems to play effectively

3. Basic spatial reasoning remains a huge challenge for LLMs. Most models failed to track their position and made obvious mistakes.

Only GPT-4, Gemini 2.0, and o3-mini showed enough reasoning for strategic gameplay.
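Finding 2 above (models need explicit board data and a clear coordinate system) amounts to serializing the game state as text before each move. A minimal sketch of such a serialization, with an illustrative format rather than SnakeBench's actual prompt:

```python
# Render a snake game state as explicit text an LLM can reason over.
# The format here is an assumption for illustration, not SnakeBench's prompt.

def render_board(width, height, snake, apple):
    """Render a grid with (0, 0) at the top-left; S = snake, A = apple."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            if (x, y) in snake:
                row.append("S")
            elif (x, y) == apple:
                row.append("A")
            else:
                row.append(".")
        rows.append("".join(row))
    head = snake[0]  # snake is a list of (x, y) cells, head first
    legend = f"head at {head}, apple at {apple}, moves: up/down/left/right"
    return "\n".join(rows) + "\n" + legend

print(render_board(5, 3, [(1, 1), (0, 1)], (4, 2)))
```

Spelling out the coordinate convention in the legend matters precisely because, per finding 3, models routinely lose track of their own position.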


* See matches: http://snakebench.com
* Read the analysis: https://arcprize.org/blog/snakebench
* View the code: https://github.com/gkamradt/SnakeBench


The Perils of Overthinking: How AI Can Get Stuck in Its Own Thoughts

Large reasoning models often prioritize extended internal reasoning over taking action, leading to three recurring problems:

1. Analysis Paralysis – The model endlessly debates potential solutions but never executes one.

2. Rogue Actions – It makes unnecessary or unhelpful moves instead of focusing on the task.

3. Premature Disengagement – It stops reasoning too soon and submits incomplete solutions.

The more an AI model overthinks, the worse its performance. Surprisingly, choosing solutions with lower overthinking scores improved accuracy by nearly 30% while cutting computing costs by 43%.

The findings highlight that overthinking isn't just a human problem—it hampers AI, too. To combat this, the researchers propose techniques like leveraging AI’s ability to call external functions and using reinforcement learning to fine-tune decision-making.
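The selection idea (prefer candidate solutions with lower overthinking scores) can be sketched as follows. The scoring heuristic here is a stand-in, not the paper's actual metric: it just measures what fraction of a trace is internal reasoning rather than environment actions.

```python
# Pick among candidate solution traces by least overthinking.
# `overthinking_score` is an illustrative heuristic, not the paper's metric.

def overthinking_score(trace):
    """Fraction of steps that are internal reasoning rather than actions."""
    thinking = sum(1 for step in trace if step["kind"] == "think")
    return thinking / len(trace)

def pick_solution(candidates):
    """Choose the candidate whose trace overthinks the least."""
    return min(candidates, key=lambda c: overthinking_score(c["trace"]))

candidates = [
    {"answer": "A", "trace": [{"kind": "think"}] * 8 + [{"kind": "act"}] * 2},
    {"answer": "B", "trace": [{"kind": "think"}] * 2 + [{"kind": "act"}] * 8},
]
print(pick_solution(candidates)["answer"])  # B
```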

Paper: https://www.arxiv.org/abs/2502.08235


Unitree G1


Installed computing power of NVIDIA chips has doubled every 10 months on average, since 2019.

Source: https://epoch.ai/data/machine-learning-hardware?insight-option=Absolute#nvidia-chip-production
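A doubling every 10 months is easy to translate into more familiar terms; a quick back-of-the-envelope check of what that rate compounds to per year and over the roughly six years since 2019:

```python
# Compound growth implied by "doubles every 10 months".

doubling_months = 10
annual_factor = 2 ** (12 / doubling_months)      # growth over one year
since_2019 = 2 ** (12 * 6 / doubling_months)     # growth over ~6 years (2019-2025)

print(round(annual_factor, 2))  # ~2.3x per year
print(round(since_2019))        # ~147x over six years
```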


German Helsing builds 6,000 AI-enabled HX-2 combat drones for Ukraine

- up to 100 km range
- on-board AI enables full resistance to electronic warfare
- can assemble into swarms, controlled by single human operators
- can be equipped with different payloads – multi-purpose, anti-tank, anti-structure ammunition
- features developed and tested based on Helsing's extensive experience in Ukraine

"Resilience Factories are Helsing’s high-efficiency production facilities designed to provide nation states with local and sovereign manufacturing capacities. Helsing is set to build Resilience Factories across the European continent, with the ability to scale manufacturing rates to tens of thousands of units in case of a conflict."

Source: https://helsing.ai/newsroom/helsing-to-produce-6000-additional-strike-drones-for-ukraine


Links for 2025-02-13

AI:

1. Training Deep Learning Models with Norm-Constrained LMOs—has the potential to significantly improve the efficiency and speed of training LLMs, allowing for the training of even larger and more complex models. https://arxiv.org/abs/2502.07529

2. LLM Pretraining with Continuous Concepts https://arxiv.org/abs/2502.08524

3. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving — iteratively refines the prover through expert iteration, dramatically increasing the number of solved problems (e.g., 29.7K solved in Lean Workbook) and securing top rankings on benchmarks like PutnamBench. https://arxiv.org/abs/2502.07640

4. RAGEN: A General-Purpose Reasoning Agent Training Framework https://github.com/ZihanWang314/ragen/tree/main

5. Unsupervised Predictive Memory in a Goal-Directed Agent [published in 2018] https://arxiv.org/abs/1803.10760

6. CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction https://codei-o.github.io/

7. Elon Musk says Grok 3 will be released in "a week or two" and it is "scary smart", displaying reasoning skills that outperform any other AI model that has been released https://www.youtube.com/live/eV396ioBs3g?si=KOAokGapPj_Cb666&t=811

8. Noam Shazeer, co-lead on Google's Gemini, says by 2030 there will be AI assistants in glasses that provide advice and solve problems for you in real time, as well as turning programmers into 10,000,000x engineers https://youtu.be/v0gjI__RyCY?si=QHw1hrywgBvBnieQ&t=5390

9. Studies of Human Error Rate: "…skeptics often gesture to hallucinations, errors. An ideal symbolic system never makes such errors, therefore LLMs cannot truly "understand" even simple concepts like addition. See e.g. Evaluating the World Model Implicit in a Generative Model for this argument in the literature. However, such arguments reliably rule out human "understanding" as well! Studies within Human Reliability Analysis find startlingly high rates even for basic tasks, and even with double checking. Generally, the human reference class is too often absent (or assumed ideal) in AI discussions, and many LLM oddities have close parallels in psychology. If you're willing to look!" https://www.lesswrong.com/posts/9unBWgRXFT5BpeSdb/studies-of-human-error-rate

10. Rogo scales AI-driven financial research with OpenAI o1 https://openai.com/index/rogo/

AI politics and safety:

1. Tell me about yourself: LLMs are aware of their learned behaviors https://arxiv.org/abs/2501.11120

2. Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models https://arxiv.org/abs/2411.14257

3. OpenAI hides chain-of-thought reasoning because it may include unaligned content. From “Model Spec—a document which defines how we want our models to behave.” https://model-spec.openai.com/2025-02-12.html

4. Meta Starts Eliminating Jobs in Shift to Find AI Talent https://www.bloomberg.com/news/articles/2025-02-10/meta-starts-eliminating-jobs-in-shift-to-find-ai-talent [no paywall: https://archive.is/T7Kog]

Science and Technology:

1. Learning produces an orthogonalized state machine in the hippocampus https://www.nature.com/articles/s41586-024-08548-w

2. Rarely categorical, always high-dimensional: how the neural code changes along the cortical hierarchy https://www.biorxiv.org/content/10.1101/2024.11.15.623878v3

3. "Dozens of new obesity drugs are coming: these are ones to watch; next-generation obesity drugs will work differently from Ozempic & Wegovy—aiming to deliver greater weight loss with fewer side effects" https://www.nature.com/articles/d41586-025-00404-9 [no paywall: https://archive.is/X9CW3]

4. A single human zygote contains all the information you need to develop into an adult human and at the same time contains within it, the evolutionary history of our species. The Genomic Code: the genome instantiates a generative model of the organism https://www.cell.com/trends/genetics/fulltext/S0168-9525(25)00008-3




Nvidia put R1 in a loop for 15 minutes and it generated kernels that were "better than the optimized kernels developed by skilled engineers in some cases".

Inference-time budget affects the agent's solve rate: allocating more than 10 minutes per problem in the Level-1 category enables the workflow to produce numerically correct code for most of the 100 problems.
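The loop described above is a generate-and-verify pattern under a time budget. A minimal sketch, where `generate_kernel` and `is_numerically_correct` are placeholders for the model call and the numerical verifier:

```python
import time

def best_effort_kernel(generate_kernel, is_numerically_correct, budget_seconds):
    """Sample candidate kernels until one verifies or the budget expires."""
    deadline = time.monotonic() + budget_seconds
    attempt = 0
    while time.monotonic() < deadline:
        attempt += 1
        candidate = generate_kernel(attempt)
        if is_numerically_correct(candidate):
            return candidate, attempt
    return None, attempt  # budget exhausted without a verified kernel

# Toy stand-ins: the "model" succeeds on the third try.
kernel, tries = best_effort_kernel(
    generate_kernel=lambda i: f"kernel_v{i}",
    is_numerically_correct=lambda k: k == "kernel_v3",
    budget_seconds=1.0,
)
print(kernel, tries)  # kernel_v3 3
```

Allocating a larger `budget_seconds` directly buys more sampled candidates, which is why the solve rate rises with the per-problem time budget.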

Read more: https://developer.nvidia.com/blog/automating-gpu-kernel-generation-with-deepseek-r1-and-inference-time-scaling/


"We're working out the algorithms as we speak...many more than 10,000 researchers are hacking at it, many of them at Google"

https://www.dwarkeshpatel.com/p/jeff-dean-and-noam-shazeer


Links for 2025-02-12

AI:

1. LLMs can be used to discover interpretable models of human and animal behavior. A method, called CogFunSearch, adapts FunSearch, a tool that uses large language models (LLMs) in an evolutionary algorithm. The discovered programs can be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. https://www.biorxiv.org/content/10.1101/2025.02.05.636732v1

2. LLMs Can Easily Learn to Reason from Demonstrations — structure, not content, is what matters https://arxiv.org/abs/2502.07374

3. NatureLM: Deciphering the Language of Nature for Scientific Discovery https://arxiv.org/abs/2502.07527

4. Evolution and The Knightian Blindspot of Machine Learning — The authors propose that ML can benefit from considering the temporal unfolding of an open world, using a diversity-and-filter approach to handle Knightian uncertainty, and incorporating non-stationarity into foundation model pretraining. https://arxiv.org/abs/2501.13075

5. On the Emergence of Thinking in LLMs I: Searching for the Right Intuition https://arxiv.org/abs/2502.06773

6. ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates https://arxiv.org/abs/2502.06772

7. Training Language Models to Reason Efficiently https://arxiv.org/abs/2502.04463

8. “o3 can't multiply 10 digit numbers, but here is the acc of a 14m transformer that teaches itself how to do it, with iterative self-improvement” https://x.com/DimitrisPapail/status/1889755872642970039

9. Scaling Pre-training to One Hundred Billion Data for Vision Language Models https://arxiv.org/abs/2502.07617

10. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling https://arxiv.org/abs/2502.06703

11. DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2 (but see this thread: https://x.com/DimitrisPapail/status/1889422843982524558)

12. 8GB of high-quality reasoning math https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw

AI politics:

1. 'Possibly by 2026 or 2027 (and almost certainly no later than 2030), the capabilities of AI systems will be best thought of as akin to an entirely new state populated by highly intelligent people appearing on the global stage' https://www.anthropic.com/news/paris-ai-summit

2. Sam Altman says the $500 billion Stargate project will be dwarfed in a few years with $5 trillion AI compute clusters, despite the recent DeepSeek release https://youtu.be/oEdlwfD5vK8?si=UpmTkOCaUxmQYFc8&t=664

3. The Paris AI Anti-Safety Summit https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit

4. Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion? https://www.lesswrong.com/posts/tdb76S4viiTHfFr2u/why-did-elon-musk-just-offer-to-buy-control-of-openai-for

5. Meta Platforms is reportedly in discussions to acquire South Korean AI chip startup FuriosaAI. https://www.koreatimes.co.kr/www/tech/2025/02/129_392093.html

6. OpenAI set to finalize first custom chip design this year https://www.reuters.com/technology/openai-set-finalize-first-custom-chip-design-this-year-2025-02-10/

Science and Technology:

1. Princeton neuroscientists crack the code of how we make decisions https://pni.princeton.edu/news/2025/princeton-neuroscientists-crack-code-how-we-make-decisions

2. Physicists have built a new type of digital-analogue quantum simulator in Google’s laboratory, which can be used to study physical processes with unprecedented precision and flexibility. https://www.psi.ch/en/news/media-releases/unique-quantum-simulator-opens-door-to-new-research

3. Anduril Takes Over $22 Billion Contract to Build Technomancers for U.S. Army https://www.corememory.com/p/anduril-takes-over-22-billion-contract

4. Einstein Was Right – Euclid Just Captured Space-Time Warping in a Perfect Cosmic Ring https://www.esa.int/Science_Exploration/Space_Science/Euclid/Euclid_discovers_a_stunning_Einstein_ring
