Richard Sutton, a co-founder of reinforcement learning and Turing Award winner, argues that large language models are a dead end for general intelligence: they lack goals, learn by imitation rather than from experience, and do not possess a true world model. He advocates instead for continued development of experience-based reinforcement learning.
Takeaways
• LLMs lack goals and world models, learning by imitation instead of direct experience.
• True intelligence in RL requires goals and continuous learning from environmental interaction and rewards.
• The 'bitter lesson' suggests scalable, experience-driven methods will ultimately surpass human-knowledge-reliant approaches.
Richard Sutton, a pioneer in reinforcement learning (RL), views large language models (LLMs) as a fundamentally flawed path to true intelligence. He contends that LLMs, which operate by next-token prediction, lack a goal-driven understanding of the world and are trained on imitation data rather than direct experience. Sutton champions RL as the basic form of AI, emphasizing that real intelligence means learning from sensory experience and actively influencing the environment to achieve specific goals, a capability LLMs do not possess.
Critique of Large Language Models
• 00:01:44 Sutton sees large language models as mimicking human behavior and generating text from existing data rather than genuinely understanding the world. He argues that LLMs do not build a true 'world model' because they predict what a person would say, not what will physically happen in the environment. Unlike RL systems, which learn from direct experience and feedback (see the first sketch after this list), LLMs lack a defined goal or ground truth, which is essential for judging actions as right or wrong and for continuous learning.
• 00:08:45 Richard Sutton believes that applying reinforcement learning on top of LLMs is not a productive direction for AI development. He asserts that LLMs, even when solving complex math problems, are engaging in computational planning rather than learning about the empirical, physical world through experience. Starting with human knowledge in LLMs, while seemingly beneficial, can lead to a 'bitter lesson' scenario where systems that learn directly from experience and computation eventually outperform those reliant on pre-programmed human expertise.
• 00:13:50 Sutton highlights a fundamental difference between human and LLM learning: animals, including humans, learn primarily from experience and trial and error, not through supervised imitation or 'training data' in the LLM sense. He argues that infants actively explore and learn the consequences of their actions without being given explicit targets. Supervised learning, prevalent in LLMs, is an artificial construct not found in natural animal learning, which instead builds prediction and goal-driven control from raw sensation and reward feedback (see the second sketch after this list).
• 00:43:22 Despite his critical view of LLMs, Sutton acknowledges their surprising effectiveness in language tasks as an impressive development. He notes that the long-standing debate in AI between general-purpose 'weak methods' like search and learning versus 'strong methods' that incorporate human knowledge has seen the former triumph. He finds this gratifying, as he has consistently advocated for simple, basic principles, exemplified by successes like AlphaGo and AlphaZero, which rely on learning from experience rather than extensive human input.
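To make the contrast concrete, here is a minimal sketch of the agent-environment loop Sutton describes: the agent acts, the world responds with a reward, and that reward is the ground truth against which the agent improves. Everything here (the `BanditEnvironment` class, the payout probabilities, the constants) is an illustrative assumption, not code from the interview or any library.

```python
import random

class BanditEnvironment:
    """A two-armed bandit: the 'world' pays out stochastically per action."""
    def __init__(self):
        self.pay_probs = [0.3, 0.7]  # hidden from the agent (assumed values)

    def step(self, action):
        # The reward is experienced feedback, not a label in a dataset.
        return 1.0 if random.random() < self.pay_probs[action] else 0.0

class Agent:
    """Learns action values purely from its own experience."""
    def __init__(self, n_actions=2, step_size=0.1, epsilon=0.1):
        self.q = [0.0] * n_actions   # value estimate per action
        self.step_size = step_size
        self.epsilon = epsilon

    def act(self):
        if random.random() < self.epsilon:            # occasionally explore
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)  # else exploit

    def learn(self, action, reward):
        # Nudge the estimate toward the reward the world actually delivered.
        self.q[action] += self.step_size * (reward - self.q[action])

env, agent = BanditEnvironment(), Agent()
for _ in range(5000):
    a = agent.act()
    r = env.step(a)      # ground truth arrives from the world itself
    agent.learn(a, r)
print(agent.q)           # estimates settle near [0.3, 0.7]
```

An LLM's training loop has no analogue of `env.step`: its target is a human-written token, so there is no independent signal telling it whether an action achieved a goal.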
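As a second sketch, temporal-difference learning (TD(0), which Sutton introduced) shows what 'prediction from raw experience' means: each state's prediction is updated toward the next state's prediction plus any reward actually received, with no labeled targets anywhere. The five-state random walk below is a standard textbook illustration; the constants are assumptions chosen for this example.

```python
import random

N_STATES = 5            # states 0..4; stepping off either end ends the episode
V = [0.5] * N_STATES    # predicted probability of eventually exiting right
ALPHA = 0.1             # step size

for _ in range(2000):
    s = N_STATES // 2   # every episode starts in the middle state
    while True:
        s_next = s + random.choice([-1, 1])   # drift left or right at random
        if s_next < 0:                        # exited left: reward 0
            V[s] += ALPHA * (0.0 - V[s])
            break
        if s_next >= N_STATES:                # exited right: reward 1
            V[s] += ALPHA * (1.0 - V[s])
            break
        # TD(0): update this state's prediction toward the next prediction,
        # learning "what will happen next," not "what a person would say."
        V[s] += ALPHA * (V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V])   # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```

No supervisor ever provides the true values; they emerge from the stream of experience itself, which is the sense in which Sutton says animals learn by prediction and trial and error rather than by imitation.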