Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement.
Quote: "Scaling this weak-to-strong training approach yields (seemingly) unbounded improvements in both length and hardness generalization, allowing models to solve problem instances far exceeding the difficulty of those in the training data distribution...Our results show that careful self-supervision allows small transformers to transcend superficial pattern matching failures and learn multi step algorithms."
Talk: https://www.youtube.com/watch?v=szhEnXiSjJY
Paper on arXiv coming on Monday.