New research from Meta FAIR: Large Concept Models (LCMs), a fundamentally different paradigm for language modeling that decouples reasoning from language representation, inspired by how humans plan high-level thoughts before putting them into words.

Read more: https://ai.meta.com/research/publications/large-concept-models-language-modeling-in-a-sentence-representation-space/
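
At a high level, the idea is to predict the next sentence-level "concept" embedding rather than the next token. Here is a rough conceptual sketch of that loop, where sonar_encode, concept_model, and sonar_decode are hypothetical stand-ins for the SONAR encoder/decoder and the concept-prediction model described in the paper (this is not the paper's actual code):

```python
# Conceptual sketch only: reason in a sentence-embedding ("concept") space.
# sonar_encode, concept_model, and sonar_decode are hypothetical placeholders,
# not real APIs from the paper or any released library.

def generate_next_sentence(context_sentences, sonar_encode, concept_model, sonar_decode):
    # 1. Map each context sentence to a fixed-size concept embedding.
    concepts = [sonar_encode(s) for s in context_sentences]
    # 2. Predict the next concept embedding; the model operates on
    #    sentence embeddings, never on individual tokens.
    next_concept = concept_model(concepts)
    # 3. Decode the predicted concept back into natural-language text.
    return sonar_decode(next_concept)
```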


Links for 2024-12-23

AI:

1. What o3 Becomes by 2028 — “We haven't seen AIs made from compute optimal LLMs pretrained on these systems yet, but the systems were around for 6+ months, so the AIs should start getting deployed imminently, and will become ubiquitous in 2025.” https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/what-o3-becomes-by-2028

2. Orienting to 3 year AGI timelines https://www.lesswrong.com/posts/jb4bBdeEEeypNkqzj/orienting-to-3-year-agi-timelines

3. “AI that exceeds human performance in nearly every cognitive domain is almost certain to be built and deployed in the next few years. We need to act accordingly now.” https://milesbrundage.substack.com/p/times-up-for-ai-policy

4. The new hyped Genesis simulator is up to 10x slower, not 10-80x faster https://stoneztao.substack.com/p/the-new-hyped-genesis-simulator-is

5. Stanford researchers introduced a new system that can generate physically plausible human-object interactions from natural language https://hoifhli.github.io/

6. “We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!” https://arxiv.org/abs/2411.02280

7. A biologically-inspired hierarchical convolutional energy model predicts V4 responses to natural videos https://www.biorxiv.org/content/10.1101/2024.12.16.628781v1

8. LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation https://arxiv.org/abs/2412.15188

9. Memory Layers at Scale https://arxiv.org/abs/2412.09764

10. Deliberative alignment: reasoning enables safer language models https://openai.com/index/deliberative-alignment/

11. A foundation model for generalizable disease detection from retinal images https://www.nature.com/articles/s41586-023-06555-x

12. Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation https://promptda.github.io/

13. Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion https://marigolddepthcompletion.github.io/

14. Startup’s autonomous drones precisely track warehouse inventories https://news.mit.edu/2024/corvus-autonomous-drones-precisely-track-warehouse-inventories-1220

15. MIT engineers developed AI frameworks to identify evidence-driven hypotheses that could advance biologically inspired materials. https://news.mit.edu/2024/need-research-hypothesis-ask-ai-1219

16. Anthropic co-founder Jack Clark recalls that when he and Dario Amodei visited the White House in 2023, Kamala Harris told them: "We've got our eye on you guys. AI is going to be a really big deal and we're now actually paying attention." https://youtu.be/om2lIWXLLN4?si=ySlUNtsBzZVe7Tc-&t=705

17. UN Secretary-General Antonio Guterres tells the UN Security Council that technology will never again move as slowly as it does today https://x.com/tsarnick/status/1871016318247604624

Technology:

1. Aqueous Homogeneous Miniature Reactors Could Supply U.S. Bases With Unlimited Fuel https://www.forbes.com/sites/davidhambling/2024/12/05/miniature-reactors-could-supply-us-bases-with-unlimited-fuel/

2. A Billion Times Faster: Laser Neurons Ignite the Future of AI https://opg.optica.org/optica/fulltext.cfm?uri=optica-11-12-1690&id=565919

3. MIT engineers grow “high-rise” 3D chips https://news.mit.edu/2024/mit-engineers-grow-high-rise-3d-chips-1218

Math:

1. Tarski's high school algebra problem asks whether all identities involving addition, multiplication, and exponentiation over the positive integers can be derived solely from a specific set of eleven axioms commonly taught in high school mathematics. https://en.wikipedia.org/wiki/Tarski%27s_high_school_algebra_problem

2. Mathematicians Uncover a New Way to Count Prime Numbers https://www.quantamagazine.org/mathematicians-uncover-a-new-way-to-count-prime-numbers-20241211/

Miscellaneous:

1. Game-Changing Dual Cancer Therapy Completely Eradicates Tumors Without Harsh Side Effects https://news.mit.edu/2024/implantable-microparticles-can-deliver-two-cancer-therapies-1028

2. Are most senescent cells immune cells? https://www.nature.com/articles/s12276-024-01354-4


I think that is overconfident. But it's not too far off either. Mankind will be dethroned relatively soon.

My prediction: 90% that ASI will be created before 2050.

https://x.com/elonmusk/status/1871083864111919134


Unitree B2-W Talent Awakening! 🥳
One year after mass production kicked off, Unitree’s B2-W Industrial Wheel has been upgraded with more exciting capabilities.


The Heist. Every shot was done via text-to-video with Google Veo 2.

https://www.youtube.com/watch?v=lFc1jxLHhyM


Circa 2040: "So the AI cured cancer, I get it. But it feels like brute force to me. Not really reasoning. They had to spend tens of millions of dollars on compute to get that result."


“Some people are making us believe that we're really close to AGI. We're actually very far from it. I mean, when I say very far, it's… several years.”

— Yann LeCun


99.99% of people cannot comprehend how insane FrontierMath is. The problems are crafted by math professors and kept private, so they are not in any training data.

This is what Tamay Besiroglu, a co-author of the FrontierMath benchmark, has to say about o3's performance:

For context, FrontierMath is a brutally difficult benchmark with problems that would stump many mathematicians. The easier problems are as hard as IMO/Putnam; the hardest ones approach research-level complexity.

With earlier models like o1-preview, Pass@1 performance (solving on first attempt) was only around 2%. When allowing 8 attempts per problem (Pass@8) and counting problems solved at least once, we saw ~6% performance. o3's 25.2% at Pass@1 is substantially more impressive.
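
For readers unfamiliar with the metric: Pass@k simply counts a problem as solved if any of the first k attempts is correct. A minimal sketch of that computation (toy data, not the benchmark's actual evaluation harness):

```python
def pass_at_k(results, k):
    """Fraction of problems solved at least once within the first k attempts.

    results: one entry per problem, each a list of booleans
    (True = that attempt produced a correct answer).
    """
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)

# Toy illustration with made-up outcomes (not FrontierMath data):
results = [[False] * 8, [False, True] + [False] * 6, [True] * 8]
print(pass_at_k(results, k=1))  # 0.33 -- one of three solved on the first try
print(pass_at_k(results, k=8))  # 0.67 -- two of three solved within 8 tries
```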


It never ceases to amaze me how much progress has been made in the last few years.

Original source: https://www.youtube.com/watch?v=ctWfv4WUp2I


Past five years of OpenAI models vs. the ARC-AGI benchmark

Website of the benchmark: https://arcprize.org/

Graph by Riley Goodside


Full video of the OpenAI o3 announcement with the author of the ARC-AGI benchmark.


OpenAI just announced their latest model, o3.

It sets the best coding and math benchmark scores to date, with 96.7% on AIME and 71.7% on SWE-bench, roughly 20% better than o1.

It is also about three times better than o1 at answering questions from ARC-AGI, a benchmark designed to test an AI model's ability to reason about problems it is encountering for the first time.

The FrontierMath benchmark is a dataset of novel, unpublished problems that would take Fields Medalists hours or days to solve. All previous models score below 2% on this benchmark, while o3 gets over 25%.

(For anyone puzzled as to why they are going from o1 to o3: the name o2 was skipped because of a trademark conflict with the telecom brand O2.)


After 25.3 million autonomous miles driven, Google's Waymo vehicles show an 88% reduction in property damage claims and a 92% reduction in bodily injury claims per mile driven, compared to human drivers.

"The study compared Waymo’s liability claims to human driver baselines, which are based on Swiss Re’s data from over 500,000 claims and over 200 billion miles of exposure. It found that the Waymo Driver demonstrated better safety performance when compared to human-driven vehicles, with an 88% reduction in property damage claims and 92% reduction in bodily injury claims. In real numbers, across 25.3 million miles, the Waymo Driver was involved in just nine property damage claims and two bodily injury claims. Both bodily injury claims are still open and described in the paper. For the same distance, human drivers would be expected to have 78 property damage and 26 bodily injury claims."

Read more: https://waymo.com/blog/2024/12/new-swiss-re-study-waymo
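
As a quick sanity check, the headline percentages follow directly from the claim counts in the excerpt (9 actual vs. 78 expected property damage claims, and 2 vs. 26 bodily injury claims, over the same 25.3 million miles):

```python
# Reductions implied by the claim counts quoted from the Swiss Re study above.
waymo_property, expected_property = 9, 78
waymo_injury, expected_injury = 2, 26

print(f"Property damage reduction: {1 - waymo_property / expected_property:.1%}")  # 88.5%
print(f"Bodily injury reduction:   {1 - waymo_injury / expected_injury:.1%}")      # 92.3%
```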


Google just dropped their first test-time-compute model: Gemini 2.0 Flash Thinking

Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning.

Try it out today in Google AI Studio: https://aistudio.google.com/prompts/new_chat?model=gemini-2.0-flash-thinking-exp-1219
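
If you'd rather call it from code than through the AI Studio UI, something like the following should work with the google-generativeai Python SDK. The model ID is taken from the AI Studio link above, but treat this as a sketch: the SDK surface for thinking models may differ or change.

```python
import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key="YOUR_API_KEY")  # create a key in Google AI Studio

# Model ID from the AI Studio link above.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")

response = model.generate_content(
    "A farmer has 17 sheep. All but 9 run away. How many are left?"
)
print(response.text)
```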


Genesis project: a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.

It delivers a simulation speed roughly 430,000× faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090.

The Genesis physics engine and simulation platform is fully open source: https://github.com/Genesis-Embodied-AI/Genesis
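
For a sense of what using it looks like, here is a rough minimal script based on my reading of the repo's quickstart; the package name and API calls (gs.init, gs.Scene, gs.morphs, scene.build/step) are assumptions to verify against the current README, which is authoritative:

```python
import genesis as gs  # from the open-source repo above (pip install genesis-world)

gs.init(backend=gs.gpu)  # or gs.cpu if no GPU is available

scene = gs.Scene(show_viewer=False)
scene.add_entity(gs.morphs.Plane())  # ground plane
scene.add_entity(gs.morphs.MJCF(file="xml/franka_emika_panda/panda.xml"))  # bundled robot asset

scene.build()
for _ in range(1000):
    scene.step()  # step the rigid-body simulation
```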

The aim is to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds together with various modes of data (environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more), with the goal of fully automated data generation for robotics, physical AI, and other applications.

Project webpage: https://genesis-embodied-ai.github.io/


Links for 2024-12-18

AI:

1. Byte Latent Transformer: Patches Scale Better Than Tokens — Training transformers directly on raw bytes https://arxiv.org/abs/2412.09871

2. Compressed Chain of Thought: Efficient Reasoning Through Dense Representations https://arxiv.org/abs/2412.13171

3. REGENT: A generalist agent that can generalize to unseen robotics tasks and games via retrieval-augmentation and in-context learning. https://kaustubhsridhar.github.io/regent-research/

4. Can frontier AI transform ANY physical object from ANY input modality into a high-quality digital twin that also MOVES? Articulate-Anything, exploring how large vision-language models (VLMs) can bridge the gap between the physical and digital worlds. https://articulate-anything.github.io/

5. Cultural Evolution of Cooperation among LLM Agents https://arxiv.org/abs/2412.10270

6. A demonstration of strategic deception arising naturally in LLM training https://www.anthropic.com/research/alignment-faking

7. Testing which LLM architectures can do hidden serial reasoning https://www.lesswrong.com/posts/ZB6guMhHH3NEyxA2k/testing-which-llm-architectures-can-do-hidden-serial-3

8. “We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute. How? By combining step-wise reward models with tree search algorithms.” https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute

9. ProcessBench, a benchmark for measuring the ability to identify process errors in mathematical reasoning https://arxiv.org/abs/2412.06559

10. “Project Numina, which won the first AIMO progress prize in part through developing their database...of nearly a million math problems” https://mathstodon.xyz/@tao/113669121621914558

11. A dataset of questions on decision-theoretic reasoning in Newcomb-like problems https://www.lesswrong.com/posts/d9amcRzns5pwg9Fcu/a-dataset-of-questions-on-decision-theoretic-reasoning-in

12. Superhuman performance of a large language model on the reasoning tasks of a physician https://arxiv.org/abs/2412.10849

13. GenEx: Generating an Explorable World https://www.genex.world/

14. Fast LLM Inference From Scratch https://andrewkchan.dev/posts/yalm.html

15. MIT researchers introduce Boltz-1, a fully open-source model for predicting biomolecular structures https://news.mit.edu/2024/researchers-introduce-boltz-1-open-source-model-predicting-biomolecular-structures-1217

16. “This paper describes a process for automatically generating academic finance papers using large language models” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5060022

Neuroscience:

1. “We used neurofeedback from closed-loop real-time functional MRI to create new categories of visual objects in the brain, without the participants’ explicit awareness.” https://mindblog.dericbownds.net/2024/12/sculpting-new-visual-categories-into.html

2. The Unbearable Slowness of Being: Why do we live at 10 bits/s? https://arxiv.org/abs/2408.10234

3. What are recurrent networks doing in the brain? https://www.thetransmitter.org/neural-networks/what-are-recurrent-networks-doing-in-the-brain/

Technology:

1. [Scott Aaronson: Really good article] Quantum Computers Cross Critical Error Threshold https://www.quantamagazine.org/quantum-computers-cross-critical-error-threshold-20241209/

2. Fast, scalable, clean, and cheap enough: How off-grid solar microgrids can power the AI race https://www.offgridai.us/

Miscellaneous:

1. The daring doctor behind a world-first treatment for autoimmune disease https://www.nature.com/articles/d41586-024-03895-0

2. The number of exceptional people: Fewer than 85 per 1 million across key traits https://www.sciencedirect.com/science/article/pii/S019188692400415X

3. “Incredible historically accurate short film set in Bronze Age Sardinia.” — Cast in Bronze: Sherden, the Sea People of Sardinia https://www.youtube.com/watch?v=aAvtoFx3M00

4. “All of statistics and much of science depends on probability — an astonishing achievement, considering no one’s really sure what it is.” https://www.nature.com/articles/d41586-024-04096-5


Watch Google's Waymo self-driving car avoid a collision with a scooter rider


Google just released Veo 2, a new state-of-the-art AI video model.

https://blog.google/technology/google-labs/video-image-generation-update-december-2024/


GenEx: Generating an Explorable World https://www.genex.world/

GenEx is an AI model to create a fully explorable 360° world in 3D from just a single image!

Paper: https://arxiv.org/abs/2412.09624


Pika Labs launched Pika 2.0 https://pika.art/

It's the startup's new image-to-video model that lets you combine characters, objects, clothes, and locations in an AI-generated video.
