Links for 2024-12-23

AI:

1. What o3 Becomes by 2028 — “We haven't seen AIs made from compute optimal LLMs pretrained on these systems yet, but the systems were around for 6+ months, so the AIs should start getting deployed imminently, and will become ubiquitous in 2025.” https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/what-o3-becomes-by-2028

2. Orienting to 3 year AGI timelines https://www.lesswrong.com/posts/jb4bBdeEEeypNkqzj/orienting-to-3-year-agi-timelines

3. “AI that exceeds human performance in nearly every cognitive domain is almost certain to be built and deployed in the next few years. We need to act accordingly now.” https://milesbrundage.substack.com/p/times-up-for-ai-policy

4. The new hyped Genesis simulator is up to 10x slower, not 10-80x faster https://stoneztao.substack.com/p/the-new-hyped-genesis-simulator-is

5. Stanford researchers introduced a new system that can generate physically plausible human-object interactions from natural language https://hoifhli.github.io/

6. “We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!” https://arxiv.org/abs/2411.02280

7. A biologically-inspired hierarchical convolutional energy model predicts V4 responses to natural videos https://www.biorxiv.org/content/10.1101/2024.12.16.628781v1

8. LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation https://arxiv.org/abs/2412.15188

9. Memory Layers at Scale https://arxiv.org/abs/2412.09764

10. Deliberative alignment: reasoning enables safer language models https://openai.com/index/deliberative-alignment/

11. A foundation model for generalizable disease detection from retinal images https://www.nature.com/articles/s41586-023-06555-x

12. Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation https://promptda.github.io/

13. ⇆ Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion https://marigolddepthcompletion.github.io/

14. Startup’s autonomous drones precisely track warehouse inventories https://news.mit.edu/2024/corvus-autonomous-drones-precisely-track-warehouse-inventories-1220

15. MIT engineers developed AI frameworks to identify evidence-driven hypotheses that could advance biologically inspired materials. https://news.mit.edu/2024/need-research-hypothesis-ask-ai-1219

16. Anthropic co-founder Jack Clark recalls that when he and Dario Amodei went to the White House in 2023, Kamala Harris told them, "We've got our eye on you guys. AI is going to be a really big deal and we're now actually paying attention" https://youtu.be/om2lIWXLLN4?si=ySlUNtsBzZVe7Tc-&t=705

17. UN Secretary-General António Guterres tells the UN Security Council that technology will never again move as slowly as it does today https://x.com/tsarnick/status/1871016318247604624

Technology:

1. Aqueous Homogeneous Miniature Reactors Could Supply U.S. Bases With Unlimited Fuel https://www.forbes.com/sites/davidhambling/2024/12/05/miniature-reactors-could-supply-us-bases-with-unlimited-fuel/

2. A Billion Times Faster: Laser Neurons Ignite the Future of AI https://opg.optica.org/optica/fulltext.cfm?uri=optica-11-12-1690&id=565919

3. MIT engineers grow “high-rise” 3D chips https://news.mit.edu/2024/mit-engineers-grow-high-rise-3d-chips-1218

Math:

1. Tarski's high school algebra problem inquires whether all identities involving addition, multiplication, and exponentiation over the positive integers can be derived solely from a specific set of eleven axioms commonly taught in high school mathematics. https://en.wikipedia.org/wiki/Tarski%27s_high_school_algebra_problem

2. Mathematicians Uncover a New Way to Count Prime Numbers https://www.quantamagazine.org/mathematicians-uncover-a-new-way-to-count-prime-numbers-20241211/

Miscellaneous:

1. Game-Changing Dual Cancer Therapy Completely Eradicates Tumors Without Harsh Side Effects https://news.mit.edu/2024/implantable-microparticles-can-deliver-two-cancer-therapies-1028

2. Are most senescent cells immune cells? https://www.nature.com/articles/s12276-024-01354-4


I think that is overconfident. But it's not too far off either. Mankind will be dethroned relatively soon.

My prediction: 90% that ASI will be created before 2050.

https://x.com/elonmusk/status/1871083864111919134


Unitree B2-W Talent Awakening! 🥳
One year after mass production kicked off, Unitree’s B2-W Industrial Wheel has been upgraded with more exciting capabilities.


The Heist. Every shot was done via text-to video with Google Veo 2.

https://www.youtube.com/watch?v=lFc1jxLHhyM


Circa 2040: "So the AI cured cancer, I get it. But it feels like brute force to me. Not really reasoning. They had to spend tens of millions of dollars on compute to get that result."


“Some people are making us believe that we're really close to AGI. We're actually very far from it. I mean, when I say very far, it's… several years.”

— Yann LeCun


99.99% of people cannot comprehend how insane FrontierMath is. The problems are crafted by math professors and are private, not included in any training data.

This is what Tamay Besiroglu, a co-author of the FrontierMath benchmark, has to say about o3's performance:

For context, FrontierMath is a brutally difficult benchmark with problems that would stump many mathematicians. The easier problems are as hard as IMO/Putnam; the hardest ones approach research-level complexity.

With earlier models like o1-preview, Pass@1 performance (solving on first attempt) was only around 2%. When allowing 8 attempts per problem (Pass@8) and counting problems solved at least once, we saw ~6% performance. o3's 25.2% at Pass@1 is substantially more impressive.
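For readers unfamiliar with the metric: Pass@k is usually computed with the unbiased estimator from the HumanEval paper rather than by literally running k attempts. A minimal sketch (the attempt counts below are hypothetical, chosen only to mirror the ~2% Pass@1 figure, not actual FrontierMath data):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: the probability that at least one of
    k attempts sampled (without replacement) from n total attempts
    succeeds, given c of the n attempts were successful."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical model that solves a problem on 2 of 100 attempts:
p1 = pass_at_k(n=100, c=2, k=1)  # Pass@1 = 0.02
p8 = pass_at_k(n=100, c=2, k=8)  # Pass@8 rises to roughly 0.15
```

This shows why Pass@8 numbers are always flattering relative to Pass@1: extra attempts compound even a small per-attempt success rate, which is why o3's 25.2% at Pass@1 is the more meaningful figure.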


It never ceases to amaze me how much progress has been made in the last few years.

Original source: https://www.youtube.com/watch?v=ctWfv4WUp2I


Past five years of OpenAI models vs. the ARC-AGI benchmark

Website of the benchmark: https://arcprize.org/

Graph by Riley Goodside


Full video of the OpenAI o3 announcement with the author of the ARC-AGI benchmark.


OpenAI just announced their latest model, o3.

It posts the best coding and math benchmark scores yet, with 96.7% on AIME and 71.7% on SWE-bench, roughly 20% better than o1.

It is also three times better than o1 on ARC-AGI, a benchmark designed to test an AI model’s ability to reason about problems it is encountering for the first time.

The FrontierMath benchmark is a dataset of novel, unpublished problems that would take Fields Medalists hours or days to solve. All previous models scored below 2% on it, while o3 scores over 25%.

(For anyone puzzled as to why they jumped from o1 to o3: the name o2 was skipped to avoid a trademark conflict.)


After 25.3 million autonomous miles driven, Google's Waymo vehicles have an 88% reduction in property damage claims and a 92% reduction in bodily injury claims compared to human drivers per mile driven.

"The study compared Waymo’s liability claims to human driver baselines, which are based on Swiss Re’s data from over 500,000 claims and over 200 billion miles of exposure. It found that the Waymo Driver demonstrated better safety performance when compared to human-driven vehicles, with an 88% reduction in property damage claims and 92% reduction in bodily injury claims. In real numbers, across 25.3 million miles, the Waymo Driver was involved in just nine property damage claims and two bodily injury claims. Both bodily injury claims are still open and described in the paper. For the same distance, human drivers would be expected to have 78 property damage and 26 bodily injury claims."

Read more: https://waymo.com/blog/2024/12/new-swiss-re-study-waymo
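The headline percentages follow directly from the raw claim counts quoted above; a quick arithmetic check:

```python
# Figures from the Swiss Re / Waymo study quoted above.
miles = 25.3e6

waymo_property, waymo_injury = 9, 2    # actual Waymo claims
human_property, human_injury = 78, 26  # expected human-driver claims
                                       # over the same mileage

# Relative reduction in claims per mile driven.
property_reduction = 1 - waymo_property / human_property  # ~0.885
injury_reduction = 1 - waymo_injury / human_injury        # ~0.923
```

Rounding each reduction to the nearest percent reproduces the 88% and 92% figures in the study.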


Google just dropped their first test-time-compute model: Gemini 2.0 Flash Thinking

Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning.

Try it out today in Google AI Studio: https://aistudio.google.com/prompts/new_chat?model=gemini-2.0-flash-thinking-exp-1219


Genesis project: A generative physics engine able to generate 4D dynamical worlds powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.

It delivers simulation speeds roughly 430,000× faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090.
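To get a feel for what that speedup claim implies, here is a rough, illustrative calculation (assuming the 26 seconds is wall-clock time and the full ~430,000× speedup applies throughout, which the announcement does not guarantee):

```python
# Claimed figures from the Genesis announcement.
speedup = 430_000        # simulation speed relative to real time
wall_clock_s = 26        # wall-clock seconds to train the policy

# Simulated experience accumulated during those 26 seconds.
sim_seconds = wall_clock_s * speedup   # 11,180,000 simulated seconds
sim_days = sim_seconds / 86_400        # ~129 simulated days
```

Under these assumptions, 26 seconds of wall-clock training corresponds to roughly four months of simulated robot experience, which is what makes the claim striking if it holds in practice.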

The Genesis physics engine and simulation platform is fully open source: https://github.com/Genesis-Embodied-AI/Genesis

The aim is to build a universal data engine: an upper-level generative framework that autonomously creates physical worlds along with many modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, and open-world articulated assets, working toward fully automated data generation for robotics, physical AI, and other applications.

Project webpage: https://genesis-embodied-ai.github.io/


Links for 2024-12-18

AI:

1. Byte Latent Transformer: Patches Scale Better Than Tokens — Training transformers directly on raw bytes https://arxiv.org/abs/2412.09871

2. Compressed Chain of Thought: Efficient Reasoning Through Dense Representations https://arxiv.org/abs/2412.13171

3. REGENT: A generalist agent that can generalize to unseen robotics tasks and games via retrieval-augmentation and in-context learning. https://kaustubhsridhar.github.io/regent-research/

4. Can frontier AI transform ANY physical object from ANY input modality into a high-quality digital twin that also MOVES? Articulate-Anything, exploring how large vision-language models (VLMs) can bridge the gap between the physical and digital worlds. https://articulate-anything.github.io/

5. Cultural Evolution of Cooperation among LLM Agents https://arxiv.org/abs/2412.10270

6. A demonstration of strategic deception arising naturally in LLM training https://www.anthropic.com/research/alignment-faking

7. Testing which LLM architectures can do hidden serial reasoning https://www.lesswrong.com/posts/ZB6guMhHH3NEyxA2k/testing-which-llm-architectures-can-do-hidden-serial-3

8. “We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute. How? By combining step-wise reward models with tree search algorithms.” https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute

9. ProcessBench, a benchmark for measuring the ability to identify process errors in mathematical reasoning https://arxiv.org/abs/2412.06559

10. “Project Numina, which won the first AIMO progress prize in part through developing their database...of nearly a million math problems” https://mathstodon.xyz/@tao/113669121621914558

11. A dataset of questions on decision-theoretic reasoning in Newcomb-like problems https://www.lesswrong.com/posts/d9amcRzns5pwg9Fcu/a-dataset-of-questions-on-decision-theoretic-reasoning-in

12. Superhuman performance of a large language model on the reasoning tasks of a physician https://arxiv.org/abs/2412.10849

13. GenEx: Generating an Explorable World https://www.genex.world/

14. Fast LLM Inference From Scratch https://andrewkchan.dev/posts/yalm.html

15. MIT researchers introduce Boltz-1, a fully open-source model for predicting biomolecular structures https://news.mit.edu/2024/researchers-introduce-boltz-1-open-source-model-predicting-biomolecular-structures-1217

16. “This paper describes a process for automatically generating academic finance papers using large language models” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5060022

Neuroscience:

1. “We used neurofeedback from closed-loop real-time functional MRI to create new categories of visual objects in the brain, without the participants’ explicit awareness.” https://mindblog.dericbownds.net/2024/12/sculpting-new-visual-categories-into.html

2. The Unbearable Slowness of Being: Why do we live at 10 bits/s? https://arxiv.org/abs/2408.10234

3. What are recurrent networks doing in the brain? https://www.thetransmitter.org/neural-networks/what-are-recurrent-networks-doing-in-the-brain/

Technology:

1. [Scott Aaronson: Really good article] Quantum Computers Cross Critical Error Threshold https://www.quantamagazine.org/quantum-computers-cross-critical-error-threshold-20241209/

2. Fast, scalable, clean, and cheap enough: How off-grid solar microgrids can power the AI race https://www.offgridai.us/

Miscellaneous:

1. The daring doctor behind a world-first treatment for autoimmune disease https://www.nature.com/articles/d41586-024-03895-0

2. The number of exceptional people: Fewer than 85 per 1 million across key traits https://www.sciencedirect.com/science/article/pii/S019188692400415X

3. “Incredible historically accurate short film set in Bronze Age Sardinia.” — Cast in Bronze: Sherden, the Sea People of Sardinia https://www.youtube.com/watch?v=aAvtoFx3M00

4. “All of statistics and much of science depends on probability — an astonishing achievement, considering no one’s really sure what it is.” https://www.nature.com/articles/d41586-024-04096-5


Watch Google's Waymo self-driving car avoid calamity with a scooter rider


Google just released Veo 2, a new state-of-the-art AI video model.

https://blog.google/technology/google-labs/video-image-generation-update-december-2024/


GenEx: Generating an Explorable World https://www.genex.world/

GenEx is an AI model to create a fully explorable 360° world in 3D from just a single image!

Paper: https://arxiv.org/abs/2412.09624


Pika Labs launched Pika 2.0 https://pika.art/

It's the startup's new image-to-video model that lets you combine characters, objects, clothes, and locations in an AI-generated video


Links for 2024-12-14

AI:

1. Microsoft Phi-4 is a 14B parameter LM trained heavily on synthetic data, with very strong performance, even exceeding GPT-4o on GPQA and MATH benchmarks. https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090

2. 10,000x Faster: AI Discovers New Microscopy Techniques in Record Time https://www.nature.com/articles/s41467-024-54696-y

3. AutoReason: Automatic Few-Shot Reasoning Decomposition https://arxiv.org/abs/2412.06975

4. Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel https://www.arxiv.org/abs/2412.08467

5. Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models https://arxiv.org/abs/2412.02674

6. “The convergence of neuroscience, artificial intelligence and computing has created an unprecedented opportunity to understand intelligence itself.” https://www.thetransmitter.org/neuroai/solving-intelligence-requires-new-research-and-funding-models/

7. Neural networks are often trained on data generated by agents pursuing long-term plans (whether human data, or e.g. distilling MCTS as in AlphaZero). So it's natural that networks might learn long-term planning. This would have big implications for generalization and safety. https://axrp.net/episode/2024/12/12/episode-38_3-erik-jenner-learned-look-ahead.html

8. “Ever notice how some people pace when they’re deep in thought? Surprisingly, neural networks do something similar—and it boosts their performance! We made this discovery while exploring the planning behavior of a recurrent neural network (RNN) trained to play the complex puzzle game Sokoban.” https://far.ai/post/2024-07-learned-planners/

9. DeepNose: An Equivariant Convolutional Neural Network Predictive Of Human Olfactory Percepts https://arxiv.org/abs/2412.08747

10. Frontier LLMs have shrunk dramatically: GPT-4 had ~1.8T params, while GPT-4o likely has ~200B and Claude 3.5 Sonnet ~400B. The next generation of models, corresponding to GPT-5 and Claude 4 (or Claude 3.5 Opus) will probably return to or slightly exceed the size of the original GPT-4. https://epoch.ai/gradient-updates/frontier-language-models-have-become-much-smaller

11. Ilya Sutskever at NeurIPS 2024 speaks about the forthcoming arrival of superintelligence https://youtu.be/1yvBqasHLZs?si=ya0LXSZLnX0b6hX4

12. Microsoft CEO Satya Nadella says OpenAI has a 2-year lead in the AI race and this gives them an "escape velocity" advantage https://youtu.be/9NtsnzRFJ_o?si=7qQLfEsonCvCW_cF&t=2497

13. ARK Invest's Chief Futurist Brett Winton explains why AI foundation models will be worth $15-20 trillion by 2030 and OpenAI could grab the lion's share of that market https://youtu.be/SImm15uF_3Q?si=R0c2WeilV2Bp2ZlB&t=83

14. Europe jumps into ‘incredibly costly’ AI supercomputing race https://www.politico.eu/article/europe-costly-artificial-intelligence-race-supercomputer-startups/

15. Elon Musk wanted an OpenAI for-profit — “You can’t sue your way to AGI.” https://openai.com/index/elon-musk-wanted-an-openai-for-profit/

Miscellaneous:

1. Craig Mundie says the nuclear fusion company backed by Sam Altman will surprise the world by showing fusion electrical generation next year, becoming the basis for a "radical transformation of the energy system" due to safe, cheap power https://www.youtube.com/live/Z246nuPpeOQ?si=ZWGqa2FDnn_aFkb5&t=3846

2. Ferroelectric Devices Could Make IoT Data Unhackable — FeFET array enables homomorphic encryption in battery-powered devices https://spectrum.ieee.org/unhackable-phone

3. "Natural selection... has been acting on us for the past 3,000 years, right up to the modern day, new research suggests. And it seems to be acting in surprising ways on complex traits encoded by multiple genes, such as those tied to intelligence..." [published in 2021] https://www.livescience.com/natural-selection-human-genes

4. How this cancer drug could make radiation a slam dunk therapy https://www.sciencedaily.com/releases/2024/12/241210115059.htm
