Written by Vikash Mansingh, Joi Ito, Joseph Park, Daum Kim, Lulu Ito
What is probabilistic programming, and how does it work? Consider driving on a road. At noon or on a bright moonlit night, you feel very sure you know what you’re looking at. But now envision driving at night in dense fog, where, if you’re not paying attention or something whizzes by really fast, you’re very uncertain. In either case, you never know exactly where other vehicles or road barriers are, yet most of the time you manage to navigate without much thought.
As a child, you experience balls being dropped, thrown, and held in your hand. When you see someone throw a ball, you can imagine where it’s going to go based on a model of physics that you have developed in your head. It behaves like a physics engine in a video game. In a well-designed, realistic video game, a collision or a bounce wouldn’t produce the exact same outcome each time. Both the game and your mental model rely on a probabilistic model built from data. While your mental model may feel intuitive, it is in fact similar to the laws of Newtonian physics that you may have learned in school. But unlike the exact equations, it gives you a sense of where the ball will “probably go” rather than exactly where it will go. In fact, a well-designed physics model or a probabilistic program based on the equations would have a probabilistic factor built in using randomness and statistics. The probabilistic model in your head and the physics equations that describe the trajectory of the ball are both models and, in many ways, similar. You can also make rational decisions even in the face of uncertainty because you know what you should probably expect.
The examples above demonstrate two things. First, what is happening beneath the surface is probabilistic inference, loosely defined as using the data available to you to work out the things you cannot observe directly. Second, and more fundamentally, probability is integral even in scenarios considered deterministic or entirely knowable, such as driving down a road. It is a silent yet potent factor shaping our lives, interacting with us even when we are oblivious to it.
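To make the idea of probabilistic inference concrete, here is a minimal sketch of the driving example. Everything in it is an illustrative assumption rather than part of any real system: the grid of candidate distances, the uniform prior, the Gaussian noise model, and the specific noise levels for clear weather and fog. It applies Bayes' rule to turn one noisy distance reading into a belief about where the car ahead actually is.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior_over_distance(observed, noise_std, candidates):
    """Bayes' rule on a grid: uniform prior times Gaussian likelihood, normalized."""
    weights = [gaussian_pdf(observed, d, noise_std) for d in candidates]
    total = sum(weights)
    return [w / total for w in weights]

def summarize(candidates, probs):
    """Return the posterior mean and standard deviation."""
    mean = sum(d * p for d, p in zip(candidates, probs))
    var = sum(p * (d - mean) ** 2 for d, p in zip(candidates, probs))
    return mean, math.sqrt(var)

# Hypothetical setup: the car ahead is somewhere between 5 and 100 meters away,
# and we get a single reading of 40 meters.
candidates = list(range(5, 101))
clear_probs = posterior_over_distance(40.0, 2.0, candidates)   # sharp view at noon
fog_probs = posterior_over_distance(40.0, 15.0, candidates)    # blurry view in fog

clear_mean, clear_std = summarize(candidates, clear_probs)
fog_mean, fog_std = summarize(candidates, fog_probs)

print(f"clear: about {clear_mean:.1f} m, give or take {clear_std:.1f} m")
print(f"fog:   about {fog_mean:.1f} m, give or take {fog_std:.1f} m")
```

The same reading yields a narrow belief in clear weather and a wide one in fog. In both cases the answer is a distribution rather than a single number, which is the sense in which probability is present even when the scene feels certain.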
Interestingly, there is a growing consensus that the future of AI in computing is probabilistic. There are plenty of examples. The growth of autonomous driving, for instance, is a testament to the power of probabilistic robotics. The techniques used by Stanley, the car that won the 2005 DARPA Grand Challenge and whose lead researcher co-authored “Probabilistic Robotics,” laid the groundwork for the autonomous driving industry.
More relevant to us, Large Language Models (LLMs), popularized by recent products like ChatGPT and Bard, embody probabilistic computation: at each step they assign probabilities to the possible next words and typically sample from that distribution, so the next word is never certain in advance. Knowing that “probabilistic” is a key feature of AI, the challenge for the AI community is to generalize this concept and scale it up for broader applications. Neural networks, the technology behind deep learning and ChatGPT, present one solution. However, they are a big “black box” that nobody really understands, including their creators. A fitting metaphor for neural networks would be “auto-correct on a grand scale,” which relies on optimization algorithms to fit the data.
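The point that an LLM's next word is uncertain can be illustrated with a toy sketch. The prompt, the vocabulary, and the probabilities below are made up for illustration, and a real model would compute such probabilities with a neural network over a far larger vocabulary. The sketch only shows the final sampling step.

```python
import random

# Hypothetical next-word probabilities after the prompt "The road is".
NEXT_WORD_PROBS = {"clear": 0.5, "foggy": 0.3, "closed": 0.15, "purple": 0.05}

def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = [sample_next_word(NEXT_WORD_PROBS, rng) for _ in range(1000)]
counts = {w: samples.count(w) for w in NEXT_WORD_PROBS}

# Likely words dominate, but unlikely ones still appear now and then.
print(counts)
```

Running the same prompt many times gives different continuations, with frequencies that roughly track the underlying probabilities.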
We’ve seen many issues with LLMs in the past seven months. A critical concern is their over-reliance on data quality. Biased data yields biased results, and identifying problematic data within massive datasets is virtually impossible. There’s also the well-documented hallucination issue, highlighting potential inaccuracies that can spawn numerous other issues. Is feeding more data into neural networks the solution to these problems? Not necessarily, as it can be costly and inefficient. So what we need is a different approach to scale up the probabilistic nature of the world — a form of AI that actually has a model of the world that the language is connected to.
Enter symbolic generative AI, which possesses explicit meanings we can comprehend and link to our understanding of the world. In short, it is a machine that sees the world more like humans do.
Let’s consider Newton’s laws for comparison. Originally devised to predict the motion of stars in the sky, these laws can also explain an apple falling from a tree, the motion of galaxies, and even the flow of blood in our veins. By learning Newton’s laws, we acquire a mental model that allows us to reason and apply these laws in contexts vastly different from those they were initially meant to explain. Even a three-year-old demonstrates this kind of learning, absorbing so much from very little, and this versatility partly explains the power and flexibility of our intelligence.
Symbolic generative AI mirrors this process. A symbolic generative AI that threw a ball in a game could easily be modified to simulate the gravity of the moon or a change in the rules of the game. It’s a simple change to one small part of the model. Unfortunately, with neural networks and LLMs, a change in rules would require retraining of the model with reams of new data showing the behavior of the ball in the new environment, modifying many of the parameters at once. LLMs and neural networks simply consist of countless numbers and unstructured interconnections without the flexibility of a modular, structured system that one can easily edit, understand or retrain.
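The ball example above can be sketched as a tiny generative model. This is plain Python, not code written with Gen or any other probabilistic programming system, and the thrower's speed and angle distributions are hypothetical. What it is meant to show is the modularity: gravity lives in one named parameter, so moving the game to the moon is a one-argument change rather than a retraining job.

```python
import math
import random

EARTH_GRAVITY = 9.81  # m/s^2
MOON_GRAVITY = 1.62   # m/s^2

def throw_distance(speed, angle_deg, gravity):
    """Ideal projectile range on flat ground, ignoring air resistance."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2.0 * angle) / gravity

def sample_throw(rng, gravity=EARTH_GRAVITY):
    """One random throw: uncertain speed and angle, then deterministic physics."""
    speed = max(0.0, rng.gauss(10.0, 1.0))  # hypothetical thrower, in m/s
    angle = rng.gauss(45.0, 5.0)            # launch angle in degrees
    return throw_distance(speed, angle, gravity)

def mean_distance(gravity, n=5000, seed=0):
    """Average landing distance over n simulated throws."""
    rng = random.Random(seed)
    return sum(sample_throw(rng, gravity) for _ in range(n)) / n

earth = mean_distance(EARTH_GRAVITY)
moon = mean_distance(MOON_GRAVITY)  # the only change: one parameter

print(f"Earth: about {earth:.1f} m on average")
print(f"Moon:  about {moon:.1f} m on average")
```

Because both runs use the same seed, they simulate the same throws, and the only difference between them is the gravity value. Every other part of the model, including the uncertainty about the thrower, carries over unchanged.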
With symbolic generative AI, we can address many issues posed by LLMs, including the alignment problems as outlined by Stuart Russell. We can escape the black box of neural networks, audit them, control them, and modify their representations to align with our preferences and reflect what we want.
This is especially significant from DAL’s perspective, as we strongly believe that probabilistic programs align with the digital architecture that will shape the next phase of the Internet. As advocates of transparency and decentralization, we view this challenge as an architectural question. Today, we’re missing a stack where it’s technically possible to express societal preferences and values. Unfortunately, this is technically impossible with neural networks. This gap motivates DAL’s collaboration with the MIT Prob Comp team. It’s important to note that the Prob Comp project at MIT didn’t originate from an engineering perspective but is rooted in cognitive science: it’s about understanding our minds and how we perceive and think. This technology is fundamentally humanistic, resonating deeply with DAL’s ethos and affirming our belief in the necessity of a different medium, namely a symbolic medium for a generative AI that we can understand.
Admittedly, the term “symbolic generative AI” may be unfamiliar to many, even those who have kept up with recent AI trends. It’s worth remembering that not long ago, the consensus was that neural networks couldn’t scale. It hasn’t been long since neural networks and deep learning really established themselves and entered mass consciousness. And in the meantime champions of symbolic generative AI, such as Stuart Russell, have been striving to scale probabilistic programs but faced technical obstacles.
However, the history of AI development is a pendulum swinging back and forth. Now we are witnessing breakthroughs in symbolic generative AI, most notably in computer vision, common-sense data cleaning, and automated data modeling.
At MIT, Prob Comp team researchers are applying probabilistic programming to outperform transformers in robustness and efficiency on economically important problems, such as 3D scene perception and data-driven expert reasoning, and to reverse-engineer human cognition and perception. Another area of focus is advancing Gen, MIT’s open-source stack for generative modeling and probabilistic inference, to match the usability of TensorFlow v1. DAL and Digital Garage will continue contributing to and applying the MIT open-source stack in close collaboration with the MIT team to demonstrate to industry and government leaders how the controllability and explainability of probabilistic programming open up new possibilities for safety, regulation, and expressing social values, particularly in Japan.
Stay tuned for future updates!
Glossary
- probability — a mathematical toolkit for accounting for the incompleteness of knowledge, and the uncertainty that can arise as a result
- probabilistic programming — a new symbolic medium for creating intelligent systems, that includes neural networks, but goes beyond them. Probabilistic programming can be done by people, by machines (including other probabilistic programs!), or by both, working together.
- neural networks — AI models created from (typically large networks of) simple components, by tuning parameters describing the strength of connections between components, to try to fit data
- probabilistic computing — an emerging discipline integrating probabilistic programming and generative AI into the building blocks of software and hardware, and using computer science concepts to scale up computations involving uncertain knowledge
- generative AI — AI models that can generate an infinite universe of possible datasets
- neural generative AI — generative AI built from neural networks, often based on networks that generate realistic data by amplifying noise through a cascade of neural networks, or by cascading networks that try to predict the next token in a sequence from recent history
- symbolic generative AI — generative AI built from probabilistic programs, that generates data using structured, causal explanations that match the constructs we experience psychologically
- machine learning — AI approaches based on fitting parameters to data
- artificial intelligence — software and hardware systems that imitate, simulate, and/or replicate aspects of human intelligence, and/or try to make optimal decisions
Illustration: Satoshi Hashimoto
Edits: Janine Liberty & Joseph Park