

Emergent AI preferences:

- As AIs get smarter, they develop their own coherent value systems.

- AIs increasingly maximize their utilities, suggesting that expected utility maximization emerges by default in current AI systems. This means that AIs not only have values, but are starting to act on them. (A sketch of how such utilities can be elicited follows below.)

- As AIs become smarter, they become more opposed to having their values changed.

- AIs put a price on human life itself and systematically value some human lives more than others.

- Their political values are strongly clustered to the left.

Project page: https://www.emergent-values.ai/
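
How are a model's "utilities" measured in the first place? Presumably by eliciting many forced-choice preferences and fitting a score to each outcome so that choices become predictable from score differences. Below is a minimal, runnable sketch of that general idea using a Bradley-Terry-style fit; it illustrates the technique, not the project's exact method, and the outcomes and numbers are invented.

```python
import math
import random

# Hidden "values" the fit should recover (invented for illustration).
outcomes = ["save 5 lives", "save 1 life", "lose $100"]
true_u = {"save 5 lives": 2.5, "save 1 life": 1.0, "lose $100": -1.0}

def model_choice(x, y):
    """Simulate a forced-choice preference driven by the hidden utilities."""
    p = 1 / (1 + math.exp(-(true_u[x] - true_u[y])))
    return x if random.random() < p else y

# Fit one utility per outcome by stochastic gradient ascent on the
# Bradley-Terry log-likelihood of the observed choices.
u = {o: 0.0 for o in outcomes}
lr = 0.05
for _ in range(20000):
    x, y = random.sample(outcomes, 2)
    winner = model_choice(x, y)
    loser = y if winner == x else x
    p = 1 / (1 + math.exp(-(u[winner] - u[loser])))
    u[winner] += lr * (1 - p)
    u[loser] -= lr * (1 - p)

print(sorted(u, key=u.get, reverse=True))  # recovers the hidden ordering
```

A coherent value system in this sense means the fitted utilities predict the model's choices consistently, including between outcomes it was never directly asked to compare.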


Links for 2025-02-10

AI:

1. Agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. https://arxiv.org/abs/2502.04403

2. Generating Symbolic World Models via Test-time Scaling of Large Language Models https://arxiv.org/abs/2502.04728

3. CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance https://arxiv.org/abs/2502.04350

4. “OpenAI o1 significantly outperforms other reasoning models that are on par on benchmarks that test specialized knowledge.” https://arxiv.org/abs/2502.01584

5. Exploring the possibility of enabling models to correct errors immediately after they are made. https://arxiv.org/abs/2408.16293

6. Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models https://arxiv.org/abs/2502.04404

7. DexterityGen (DexGen): A new system that helps robots use their hands better. It improves how they grip, move, and handle objects… from holding a pen to using a screwdriver. DexGen learns in simulation and refines its skills in the real world, making robotic hands much more useful. https://zhaohengyin.github.io/dexteritygen/

8. MedRAX: Medical Reasoning Agent for Chest X-ray https://arxiv.org/abs/2502.02673

9. Verifiable agents are the next meta in crypto x AI - agents that don't require trust. https://www.blog.eigenlayer.xyz/introducing-verifiable-agents-on-eigenlayer/

10. Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation https://arxiv.org/abs/2502.05151

11. Karina Nguyen, research & product at OpenAI, says pre-training was approaching a data wall, but now post-training scaling (o1 series) unlocks "infinite tasks." Says models were already "diverse and creative" from pre-training, but teaching AI real-world skills is paving the way to "extremely super intelligent" models. https://youtu.be/DeskgjrLxxs?si=kXjvn89Sdf5N-vF6&t=578

AI compute:

1. This AI chip is the size of a grain of salt https://www.popsci.com/technology/ai-fiber-optic-chip/

2. "How Intel ruined an Israeli startup it bought for $2b, Habana Labs—and lost the AI race" (the end of the Gaudi chips) https://www.calcalistech.com/ctechnews/article/s1tra0sfye

AI politics:

1. How Sam Altman Sidestepped Elon Musk to Win Over Donald Trump https://www.nytimes.com/2025/02/08/technology/sam-altman-elon-musk-trump.html [no paywall: https://archive.is/5ERSg]

2. Human takeover might be worse than AI takeover https://www.lesswrong.com/posts/FEcw6JQ8surwxvRfr/human-takeover-might-be-worse-than-ai-takeover

Science:

1. Children’s arithmetic skills do not transfer between applied and academic mathematics https://www.nature.com/articles/s41586-024-08502-w

2. Three Years After Experimental Vaccine, These Patients Are Still Cancer-Free https://gizmodo.com/three-years-after-experimental-vaccine-these-patients-are-still-cancer-free-2000559585

3. “What is it like to live in a society with an estimated median IQ around 70? A Nigerian psychologist explains.” https://woodfromeden.substack.com/p/guest-post-the-global-iq-debate-a


Marriages in China fell by 20% in 2024. Since nearly all births in China are within marriage, this implies further large declines in fertility ahead.

China's TFR was just 1.02 in 2023.

Without advanced AI and robotics, we'll eventually face a global collapse of all welfare systems, followed by a collapse of advanced technologies like smartphones, which require a minimum population of around one billion people to maintain.


Image 1: An example of a PISA Level 1 math question.

Image 2: Share of students unable to reach overall Level 1 in PISA math and science.


Political and Tech Leaders Double Down on AI Spending:

- French President Emmanuel Macron has announced a €109 billion investment in AI for France over the coming years, backed by the United Arab Emirates, major American and Canadian investment funds, and French companies. Macron announced the spending ahead of a two-day AI summit he is co-hosting in Paris with Indian Prime Minister Narendra Modi, attended by the US vice president, China's vice premier, and the bosses of OpenAI and Google.

- European Commission chief Ursula von der Leyen is expected to announce around 10 public supercomputers for researchers and startups.

- Tech giants Amazon, Google, Microsoft, and Meta are significantly increasing their investments in AI. They plan to spend a combined total of at least $215 billion in the current fiscal year, an increase of over 45% from the previous year.

Sources:

1. https://www.france24.com/en/europe/20250210-government-tech-leaders-paris-ai
2. https://www.lemonde.fr/en/economy/article/2025/02/10/ai-with-the-announcement-of-a-109-billion-investment-macron-intends-to-take-on-the-us_6737985_19.html [no paywall: https://archive.is/JZm6I]
3. https://www.wsj.com/tech/ai/tech-giants-double-down-on-their-massive-ai-spending-b3040b33 [no paywall: https://archive.is/FeKCf]


Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.


Paper: https://arxiv.org/abs/2502.05171
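
The architectural idea is that compute is scaled by iterating a block more times in latent space rather than by emitting more tokens. A minimal sketch, assuming a simplified prelude/recurrent-core/coda split (dimensions, layer counts, and the input-injection scheme are invented; this is not the authors' code):

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab=32000, d_model=512, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.prelude = nn.TransformerEncoder(enc, num_layers=2)  # tokens -> latent
        self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.inject = nn.Linear(2 * d_model, d_model)  # re-inject the input each step
        self.coda = nn.Linear(d_model, vocab)          # latent -> next-token logits

    def forward(self, tokens, steps=4):
        e = self.prelude(self.embed(tokens))
        s = torch.randn_like(e)            # random initial latent state
        for _ in range(steps):             # unrolled to arbitrary depth at test time
            s = self.core(self.inject(torch.cat([s, e], dim=-1)))
        return self.coda(s)

model = RecurrentDepthLM()
x = torch.randint(0, 32000, (1, 16))
cheap = model(x, steps=4)      # little test-time compute
thorough = model(x, steps=32)  # same weights, much more "thinking"
```

Because the recurrence happens in the hidden state, no chain-of-thought tokens are produced and the context window stays small no matter how long the model deliberates.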


Sam Altman: "...we can now imagine a world where we cure all diseases, have much more time to enjoy with our families, and can fully realize our creative potential.

In a decade, perhaps everyone on earth will be capable of accomplishing more than the most impactful person can today."

https://blog.samaltman.com/three-observations


Meta researchers used AI to predict the text a person was typing just from non-invasive brain recordings!

With EEG, their "Brain2Qwerty" model gets 67% of the characters wrong, but magnetoencephalography (MEG) performs much better, getting only 32% of the characters wrong on average.

"For the best participants, the model achieves a CER of 19%, and can perfectly decode a variety of sentences outside of the training set."

Paper: https://ai.meta.com/research/publications/brain-to-text-decoding-a-non-invasive-approach-via-typing/
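
For reference, CER is the character error rate: the Levenshtein edit distance between the decoded text and the reference, divided by the reference length. A minimal implementation of the metric quoted above:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / max(m, 1)

print(cer("hello world", "helo wrld"))  # 2 edits / 11 chars ≈ 0.18
```

So a CER of 32% means roughly one character in three is wrong, while the best participants' 19% approaches usable transcription.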


Links for 2025-02-08

AI:

1. Sam Altman Dialogue at UTokyo: Altman says OpenAI has an internal AI model that ranks as the 50th-best competitive programmer in the world, and that by the end of 2025 their model will be ranked #1. He says that in 2035, a single AI data center will have the same intellectual capacity as all humans plus AI currently on Earth combined. https://www.youtube.com/watch?v=8LmfkUb2uIY

2. GitHub Copilot: The agent awakens https://github.blog/news-insights/product-news/github-copilot-the-agent-awakens/

3. Database-Augmented Transformer-Based Large Language Models Achieve High Accuracy in Mapping Gene-Phenotype Relationships https://www.biorxiv.org/content/10.1101/2025.01.28.635344v1

4. DeepPrep: an accelerated, scalable and robust pipeline for neuroimaging preprocessing empowered by deep learning https://www.nature.com/articles/s41592-025-02599-1

5. A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods https://arxiv.org/abs/2502.01618

6. BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation https://arxiv.org/abs/2502.03860

7. Value-Based Deep RL Scales Predictably https://arxiv.org/abs/2502.04327

8. ReAG - Reasoning Augmented Generation https://github.com/superagent-ai/reag

9. Learn how to use Gemini 2.0 to convert PDFs into structured JSON data (a minimal sketch follows after this list). https://www.philschmid.de/gemini-pdf-to-data

10. Advancing Reasoning in Large Language Models: Promising Methods and Approaches https://arxiv.org/abs/2502.03671

11. Syntriever: How to Train Your Retriever with Synthetic Data from LLMs https://arxiv.org/abs/2502.03824

12. DeepSeek AI Runs Near Instantaneously on These Weird Chips https://cerebras.ai/blog/cerebras-launches-worlds-fastest-deepseek-r1-llama-70b-inference

13. DARPA program on AI for pure mathematics https://sam.gov/opp/4def3c13ca3947069b1779e7ff697c6a/view
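
The PDF-to-JSON recipe referenced in item 9 amounts to uploading a file and constraining the model's output with a response schema. A minimal sketch, assuming the google-genai Python SDK's documented upload/generate interface and an invented Invoice schema (see the linked post for the authoritative walkthrough):

```python
from google import genai
from pydantic import BaseModel

class Invoice(BaseModel):   # illustrative schema, not from the post
    vendor: str
    total: float
    currency: str

client = genai.Client(api_key="YOUR_API_KEY")
pdf = client.files.upload(file="invoice.pdf")   # assumes the SDK's file upload API
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[pdf, "Extract the invoice fields as JSON."],
    config={
        "response_mime_type": "application/json",
        "response_schema": Invoice,   # forces schema-conformant JSON output
    },
)
print(response.parsed)  # a populated Invoice instance
```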

AI investments:

1. Amazon will invest $100 billion in infrastructure this year, mostly in artificial intelligence https://www.bloomberg.com/news/articles/2025-02-06/amazon-projects-profit-missing-estimates-on-rising-ai-spending [no paywall: https://archive.is/Oz9Wd]

2. UAE to invest billions in France AI data center https://www.lemonde.fr/en/france/article/2025/02/06/uae-to-invest-billions-in-france-ai-data-center_6737871_7.html

3. Ilya Sutskever's Safe Superintelligence Inc is in talks to raise funding at a valuation of at least $20 billion. https://www.reuters.com/technology/openai-co-founder-sutskevers-ssi-talks-be-valued-20-bln-sources-say-2025-02-07/ [no paywall: https://archive.is/Nkgrd]

4. Artificial intelligence startup Anthropic’s financing is oversubscribed and on track to be larger than expected, exceeding the $2 billion fundraising that was previously reported https://www.bloomberg.com/news/articles/2025-02-07/general-catalyst-mgx-in-talks-to-join-anthropic-megaround [no paywall: https://archive.is/b9gro]

5. DeepSeek fever fuels patriotic bets on Chinese AI stocks https://www.reuters.com/markets/asia/deepseek-fever-fuels-patriotic-bets-chinese-ai-stocks-2025-02-06/ [no paywall: https://archive.is/5KSJe]

Science and Technology:

1. New laser-based artificial neuron processes enormous data sets at high speed https://www.livescience.com/technology/artificial-intelligence/new-laser-based-artificial-neuron-processes-enormous-data-sets-at-high-speed

2. A high-quality online IQ test normed with a nationally representative US sample. https://www.youtube.com/watch?v=PdS6gYnnk30

3. Active agent against cancer metastasis discovered: Adhibin prevents migration and attachment to other cells https://phys.org/news/2025-02-agent-cancer-metastasis-adhibin-migration.html

4. CiFi: a genomics method that allows scientists to study DNA organization and interactions in more detail than previously possible. https://www.biorxiv.org/content/10.1101/2025.01.31.635566v1

5. Terence Tao on how we measure the cosmos | Part 1 https://www.youtube.com/watch?v=YdOXS_9_P4U


Robust Autonomy Emerges from Self-Play

An Apple team shows that self-driving AI can learn entirely by practicing against itself; no human driving data needed.

In testing, their system averages 17.5 years of continuous driving between incidents, far surpassing humans. All through self-play, not imitation.

Paper: https://arxiv.org/abs/2502.03349




We finally have an answer to the debate over whether LLMs generalize to new math problems or merely memorize the answers.

We evaluated them on the AIME 2025 I competition from *yesterday* and the results are good!


Source: https://x.com/mbalunovic/status/1887962694659060204


STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving https://arxiv.org/abs/2502.00212

Inspired by how mathematicians continue advancing the field, the authors train an LLM that conjectures and attempts proofs; then they iteratively reinforce/re-train it with correct, elegant, novel, and approachable generated conjectures and correctly generated proofs.

STP has two main components: a conjecturer and a prover. The conjecturer generates increasingly challenging conjectures that are barely provable by the current prover. The prover attempts to prove these conjectures and receives training signals based on its success.

STP significantly improves the performance of LLMs in formal theorem proving.
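
A toy, runnable illustration of the loop's shape; arithmetic claims stand in for formal conjectures and a scalar "skill" stands in for the prover LLM, so this captures only the curriculum dynamic, not the real system:

```python
import random

def conjecture(difficulty):
    """'Conjecturer': propose a claim near the prover's current frontier."""
    a, b = random.randint(1, difficulty), random.randint(1, difficulty)
    claimed = a * b + random.choice([0, 0, 0, 1])  # occasionally a false conjecture
    return (a, b, claimed)

def prove(claim, skill):
    """'Prover': verifies the claim, but only within its skill budget."""
    a, b, claimed = claim
    return a * b == claimed and a * b <= skill

skill, difficulty = 10, 3
for round_num in range(20):
    claims = [conjecture(difficulty) for _ in range(100)]
    solved = [c for c in claims if prove(c, skill)]
    skill += len(solved) // 10   # reinforce the prover on its successes
    if len(solved) > 50:         # push the conjecturer toward harder,
        difficulty += 1          # "barely provable" statements
print("final prover skill:", skill, "conjecture difficulty:", difficulty)
```

In the actual system both roles are played by the same LLM, proofs are checked formally (e.g., in Lean), and the reinforcement step is fine-tuning on verified, non-trivial conjecture-proof pairs.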


AlphaGeometry2 can solve Olympiad geometry problems at a superhuman level

- It has an 84% solve rate on IMO geometry problems from the past 25 years, up from 54% with the previous version.

- The system uses a combination of language models and symbolic reasoning to solve geometry problems.

- The language model is used to generate possible solutions, and the symbolic engine is used to check whether these solutions are correct.

- AlphaGeometry2 can also solve non-constructive problems, i.e., problems whose diagrams cannot be produced by a straightforward step-by-step construction. It does this by using a numerical optimization algorithm to find a valid configuration.

Paper: https://arxiv.org/abs/2502.03544
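
The propose-and-verify split can be seen in a toy, runnable sketch; a random guesser stands in for the language model and an exact predicate stands in for the symbolic engine (an invented example, not the paper's pipeline):

```python
import random

def propose(history):
    """Stand-in for the language model: suggest a candidate construction."""
    return random.randint(-50, 50)

def symbolic_check(x):
    """Stand-in for the symbolic engine: exact, trusted verification."""
    return x * x - 5 * x + 6 == 0   # accepts only x = 2 or x = 3

history = []
for attempt in range(1, 10001):
    candidate = propose(history)
    history.append(candidate)
    if symbolic_check(candidate):
        print(f"verified solution {candidate} after {attempt} proposals")
        break
```

The generator may be unreliable, but because every candidate passes through an exact checker, the system's final answers are trustworthy; the language model only needs to make good suggestions often enough.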


"We know how to improve these models so, so much. And there's not an obvious roadblock in front of us." 

Sam Altman believes the AI progress from Feb 2025 to Feb 2027 will feel more impressive than the advancements from Feb 2023 to Feb 2025.

Source: In the Age of AI – A Panel Discussion with Sam Altman at TU Berlin https://www.youtube.com/live/McuO7Osgzqo?si=B4NEOIZ_R3fB6yys&t=2993


Links for 2025-02-06

AI:

1. Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search https://satori-reasoning.github.io/blog/satori/

2. Dynamic object goal pushing with mobile manipulators through constrained reinforcement learning https://www.youtube.com/watch?v=wGAdPGVf9Ws

3. SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations https://arxiv.org/abs/2502.02472

4. BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation https://www.arxiv.org/abs/2502.01697

5. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning https://arxiv.org/abs/2502.03275

6. Demystifying Long Chain-of-Thought Reasoning in LLMs https://arxiv.org/abs/2502.03373

7. Deep Dive into LLMs like ChatGPT: "This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models of how to think about their "psychology", and how to get the best use of them in practical applications." https://www.youtube.com/watch?v=7xTGNNLPyMI

Science and Technology:

1. The brain calculates with waves: New insights into neural waves could revolutionize the development of energy-efficient AI systems https://www.mpg.de/24143275/oscillating-networks-in-the-brain

2. Google says commercial quantum computing applications arriving within five years https://www.reuters.com/technology/google-says-commercial-quantum-computing-applications-arriving-within-five-years-2025-02-05/ [no paywall: https://archive.is/iS7s4]

3. What is an Electron? How Times Have Changed https://profmattstrassler.com/2025/02/06/what-is-an-electron-how-times-have-changed/

4. A gene-editing technology called 'dual prime editing' was used in plants for the first time. This tool can precisely delete up to two million bases of DNA, or replace a 258,000 base stretch of DNA with a new sequence, in both wheat and tomatoes (so far). https://www.nature.com/articles/s41477-024-01898-3

5. A large study, performed on 960 female mice, suggests that genetics – and not diet or exercise – are the biggest predictor of which mice live longer than others. https://www.nature.com/articles/s41586-024-08026-3


Hibiki: Real-time speech translation that runs on your phone.

Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech.

Samples: https://x.com/neilzegh/status/1887498102455869775
Paper: https://arxiv.org/abs/2502.03382
Inference code: https://github.com/kyutai-labs/hibiki
Models: https://huggingface.co/kyutai


Making robots truly helpful and safe in our everyday lives: Latent-Space Reachability Analysis https://kensukenk.github.io/latent-safety/

A new approach called "Latent Safety Filters" allows robots to understand and prevent complex "failures." Imagine teaching a robot to pick up a bag of Skittles. Traditional safety systems might stop the robot from bumping into the table, but they wouldn't understand that pulling the bag up too quickly will cause the candy to spill everywhere.

The researchers equip the robot with a world model that learns how the world works just by watching videos and trying things out. Think of it as the robot building a mental picture of the scene.

The "Safety Filter" then acts like a guardian angel for the robot's actions. It monitors what the robot is about to do and checks if it's heading towards a failure in its imagined world. It does this without needing to be told exactly how to be safe in every situation beforehand. It learns from experience and its "imagination."


Human level sample efficiency? LIMO: Less is More for Reasoning https://arxiv.org/abs/2502.03387

- LIMO achieves unprecedented performance in mathematical reasoning with only 1% of the training data used by previous approaches, showcasing remarkable data efficiency.

- LIMO exhibits exceptional out-of-distribution generalization, outperforming models trained on 100x more data by a significant 40.5% absolute improvement across diverse benchmarks.

LIMO Hypothesis: In foundation models with comprehensively encoded domain knowledge (achieved through extensive pre-training), sophisticated reasoning can emerge through minimal, precisely orchestrated demonstrations of cognitive processes.

- The core of LIMO's success lies in the meticulous curation of a small, high-quality dataset. The resulting dataset of 817 examples was carefully selected from millions of candidates.

- LIMO fundamentally challenges the assumption that massive datasets are necessary for complex reasoning in LLMs. Quality of the examples, rather than just the number, is the key factor.

- LIMO suggests that modern, well-pretrained models like Qwen already possess latent, rich reasoning capabilities. LIMO demonstrates that these capabilities can be unlocked and activated effectively with the right "cognitive templates" provided by curated examples.

- LIMO indicates that sophisticated reasoning, regardless of complexity, could potentially be activated with minimal samples given sufficient pre-trained domain knowledge and optimal cognitive reasoning chains for activation.

Further research is needed to validate the LIMO hypothesis across different model architectures and reasoning domains beyond mathematics.
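
A toy, runnable sketch of what "quality over quantity" curation might look like, with invented heuristic scorers standing in for the paper's careful selection process:

```python
def difficulty(example):
    return len(example["solution_steps"])       # harder ≈ longer derivation

def chain_quality(example):
    return 1.0 if example["verified"] else 0.0  # keep only checked reasoning chains

candidates = [
    {"problem": "p1", "solution_steps": ["step"] * 12, "verified": True},
    {"problem": "p2", "solution_steps": ["step"] * 3,  "verified": True},
    {"problem": "p3", "solution_steps": ["step"] * 20, "verified": False},
]

# Rank candidates by quality and keep only the top handful (the paper keeps
# 817 out of millions); then run ordinary supervised fine-tuning on them.
ranked = sorted(candidates, key=lambda ex: difficulty(ex) * chain_quality(ex),
                reverse=True)
curated = ranked[:2]
print([ex["problem"] for ex in curated])  # ['p1', 'p2']
```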


UK government rips up rules to fire-up nuclear power https://www.gov.uk/government/news/government-rips-up-rules-to-fire-up-nuclear-power

Awesome! This is the way!
