Key Takeaways
- Artificial General Intelligence (AGI) is defined by the capacity for new scientific discovery and paradigm shifts.
- LLMs operate by interpolating training data and cannot achieve recursive self-improvement or fundamental new science.
- Vishal Misra independently developed Retrieval-Augmented Generation (RAG) and formal mathematical models of LLM reasoning.
- LLM progress is seen as plateauing in fundamental capabilities, much as iPhone improvements became incremental after the first few generations.
- Current AI research is criticized for prioritizing empirical results over foundational theoretical modeling.
Deep Dive
- Columbia CS Professor Vishal Misra's work explains LLM reasoning by proposing that models reduce complex multi-dimensional data into geometric manifolds.
- LLMs function by predicting the next token based on training data distributions, traversing 'Bayesian manifolds' in this process.
- The entropy of the predicted token distribution is a key factor; low entropy indicates fewer, more probable next tokens, guiding the model's output.
- Increased precision in LLM output corresponds to reduced options for the next token, akin to navigating a more constrained manifold.
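The entropy relationship in the bullets above can be sketched numerically. In this illustrative example (the distributions are invented, not drawn from any real model), a tightly constrained prompt concentrates probability on one token and yields low entropy, while a vague prompt spreads mass across many tokens and yields high entropy:

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-token distributions after two different prompts.
# A constrained prompt: the model is nearly certain of the next token.
constrained = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
# A vague prompt: probability mass spread over many plausible continuations.
vague = {"the": 0.25, "a": 0.25, "one": 0.25, "some": 0.25}

print(f"constrained entropy: {entropy(constrained):.3f} bits")  # low
print(f"vague entropy:       {entropy(vague):.3f} bits")        # high (2.000)
```

In Misra's framing, lower entropy corresponds to a more constrained region of the manifold: fewer viable next tokens, hence more deterministic output.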
- Vishal Misra's background includes networking, entrepreneurship, and co-founding Cricinfo, where he developed the StatsGuru database.
- He sought to simplify StatsGuru's complex web-form interface, which required users to construct intricate queries.
- His desire to make the database more accessible led to discussions with ESPNcricinfo's editor-in-chief in early 2020.
- Misra explored GPT-3 to resolve StatsGuru's database issues after the pandemic, encountering limitations in context window and instruction following.
- He invented Retrieval-Augmented Generation (RAG) to translate natural language queries into structured data requests for the StatsGuru problem.
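The retrieval-then-generate pattern described above can be sketched minimally. This is an illustrative toy, not Misra's actual StatsGuru system: it ranks records by naive keyword overlap with the question and prepends the best matches to the prompt that would be sent to the LLM (the cricket records are invented):

```python
import re

RECORDS = [
    "Tendulkar: 15921 Test runs, 51 centuries",
    "Lara: 11953 Test runs, 34 centuries",
    "Bradman: 6996 Test runs, average 99.94",
]

def tokens(text):
    """Lowercase alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, records, k=2):
    """Rank records by keyword overlap with the question; keep the top k."""
    q = tokens(question)
    scored = sorted(records, key=lambda r: len(q & tokens(r)), reverse=True)
    return scored[:k]

def build_prompt(question, records):
    """Prepend the retrieved context to the question for the LLM."""
    context = "\n".join(retrieve(question, records))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("How many Test runs did Tendulkar score?", RECORDS)
print(prompt)
```

Production RAG systems replace the keyword overlap with embedding-based similarity search, but the core loop — retrieve structured data, augment the prompt, then generate — is the same.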
- He noted that his GPT-3-based completion system was in production by September 2021, predating ChatGPT's public release.
- The guest's matrix abstraction model represents each prompt as a row and the LLM's vocabulary of possible next tokens as columns.
- This theoretical matrix is immensely large — even after accounting for sparsity and discarding improbable prompts, it is far too large to represent explicitly.
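A back-of-envelope calculation shows why the matrix can never be materialized. The parameters below are assumptions for illustration (a 50,000-token vocabulary and prompts capped at 100 tokens), not figures from the guest:

```python
import math

VOCAB = 50_000     # assumed vocabulary size
PROMPT_LEN = 100   # assumed maximum prompt length in tokens

# One row per possible prompt, one column per possible next token.
rows = VOCAB ** PROMPT_LEN   # number of distinct 100-token prompts
cols = VOCAB

print(f"rows ≈ 10^{math.log10(rows):.0f}")   # ~10^470
# For scale: the observable universe holds roughly 10^80 atoms,
# so even an astronomically sparse version of this matrix is
# impossible to store; the model must compress it instead.
```

The point of the abstraction is that an LLM is, in effect, a compressed, interpolating representation of this unrepresentable matrix.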
- Large language models interpolate between their training data and new prompts to generate a next-token distribution, a process he describes as Bayesian inference over the trained manifolds.
- This Bayesian learning allows an LLM to infer likely outcomes and learn custom languages, like a cricket DSL, from a few examples not in its original training data.
- The guest argues that LLMs cannot recursively self-improve beyond their training data, as they primarily interpolate existing knowledge rather than generating new information.
- LLMs cannot introduce new information beyond their initial training set, even with multiple models interacting, due to the concept of inductive closure.
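The inductive-closure argument can be made concrete with a toy model (purely illustrative; facts are stood in for by integers and "reasoning" by a combination rule). Repeatedly combining known facts reaches a fixed point — the closure — and nothing outside it can ever be produced, no matter how many rounds of combination, or models, are involved:

```python
def closure(facts, combine):
    """Repeatedly combine known facts until no new ones appear."""
    known = set(facts)
    while True:
        new = {combine(a, b) for a in known for b in known} - known
        if not new:
            return known
        known |= new

# "Training data" is {1, 2}; "reasoning" is addition modulo 5.
training = {1, 2}
derived = closure(training, lambda a, b: (a + b) % 5)
print(sorted(derived))  # the closure is finite: [0, 1, 2, 3, 4]

# A genuinely new axiom — say, the fact 7 — lies outside this closure
# and cannot be reached by any amount of recombination.
```

This is the shape of the claim: chaining LLMs together enlarges what can be *derived* from the training set, but never what lies outside its inductive closure.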
- True scientific discoveries, such as the theory of relativity, required fundamental shifts beyond existing knowledge, which current LLMs trained on prior data cannot achieve.
- Current LLMs can refine existing knowledge and solve complex problems by connecting known information, exemplified by mathematical olympiad problems.
- However, they are not capable of creating fundamentally new science or mathematics, which historically has required stepping outside existing frameworks, as relativity did with Newtonian physics.
- An architectural advance is necessary for LLMs to generate new scientific paradigms, as simply adding more data or compute will not create fundamentally new manifolds.
- The guest expresses skepticism that current LLMs are on a direct path to Artificial General Intelligence (AGI), despite their power as productivity tools.
- He argues that multimodality would increase power, but human-like learning from few examples requires a different approach and new architectures.
- Promising research directions include energy-based architectures and benchmarks like the ARC Prize, which aim to move beyond language-based processing toward simulation-based reasoning.
- Professor Misra notes that while some in the AI community are receptive to his work, large conference review processes can be random, sometimes dismissing foundational models.
- He criticizes the current empirical approach in AI, advocating for theoretical models before measurement, contrasting it with the systems field's historical rigor.
- He suggests terms like 'prompt engineering' reflect a lack of systematic rigor, dismissing the practice as 'prompt twiddling' — superficial adjustments rather than principled design.