From Tots to Bots: How Embodied AI Mimics Toddler-Like Learning
In the realm of learning and cognition, toddlers are exceptional at generalizing concepts. For instance, a child might identify a tomato as red after having previously seen other red objects, such as a ball or a truck. This ability stems from ‘compositionality’: the capacity to break wholes down into reusable parts, a concept central to both developmental neuroscience and Artificial Intelligence (AI) research. Researchers at the Okinawa Institute of Science and Technology (OIST) have recently made significant progress in this area, experimenting with a novel AI model that mimics toddler-like learning and provides fresh insight into how machines and humans come to understand their environments and behaviors.
Understanding the New Model
Traditional AI models, such as large language models (LLMs), learn by processing vast amounts of text to infer statistical relationships among words. These models, while powerful, operate as ‘black boxes’ because of their complex and often opaque processing pathways. The new model developed at OIST is instead grounded in embodied intelligence: it learns the way a child does, through direct interaction with its environment.
This innovative AI model employs a predictive-coding-inspired variational recurrent neural network (PV-RNN) framework. It processes inputs from vision, proprioception (the sense of the body's position and movement), and language. In experiments, a robotic arm manipulated variously colored blocks and responded to language instructions, effectively mimicking human sensory integration. Remarkably, the model achieves compositionality while using less data and computational power, offering insight into cognitive processes similar to those observed in toddlers. A rough sketch of this kind of architecture follows below.
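To make the idea concrete, here is a minimal PyTorch sketch of a variational recurrent cell in the spirit of a PV-RNN: it infers a latent state from the current multimodal observation, penalizes divergence from a learned prior (the predictive-coding element), and predicts the next observation. This is an illustrative toy, not the OIST implementation; all class names, dimensions, and design choices are assumptions.

```python
# Minimal sketch of a PV-RNN-style variational recurrent cell.
# Illustrative only: this is not the OIST implementation, and all
# names, dimensions, and design choices here are assumptions.
import torch
import torch.nn as nn

class ToyPVRNNCell(nn.Module):
    """One time step: infer a latent state z_t, update the hidden
    state, and predict the next multimodal observation."""
    def __init__(self, obs_dim, hid_dim=64, z_dim=8):
        super().__init__()
        self.prior = nn.Linear(hid_dim, 2 * z_dim)                # p(z_t | h_{t-1})
        self.posterior = nn.Linear(hid_dim + obs_dim, 2 * z_dim)  # q(z_t | h_{t-1}, x_t)
        self.rnn = nn.GRUCell(z_dim, hid_dim)                     # deterministic update
        self.decoder = nn.Linear(hid_dim, obs_dim)                # predicts the next x

    def forward(self, x_t, h):
        # Prior and posterior are diagonal Gaussians over the latent z_t.
        p_mu, p_logvar = self.prior(h).chunk(2, dim=-1)
        q_mu, q_logvar = self.posterior(torch.cat([h, x_t], dim=-1)).chunk(2, dim=-1)
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()  # reparameterization
        h = self.rnn(z, h)
        x_pred = self.decoder(h)
        # KL(q || p): the "complexity" penalty that keeps inference close
        # to the network's own predictions (the predictive-coding element).
        kl = 0.5 * (p_logvar - q_logvar
                    + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                    - 1).sum(-1)
        return x_pred, h, kl

# Illustrative use: vision, proprioception, and language features are
# concatenated into one observation vector per time step.
vision, proprio, lang = torch.rand(1, 16), torch.rand(1, 7), torch.rand(1, 9)
x_t = torch.cat([vision, proprio, lang], dim=-1)   # obs_dim = 16 + 7 + 9 = 32
cell = ToyPVRNNCell(obs_dim=32)
x_pred, h, kl = cell(x_t, torch.zeros(1, 64))
```

Training such a cell would minimize prediction error on x_pred plus the KL term across a sequence, which is one common way variational RNNs operationalize free-energy minimization.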
Unlike LLMs, which predict probable linguistic outputs from vast datasets, this embodied AI learns through sequential processing and interaction, mirroring human cognitive constraints such as limited working memory and attention. The approach aligns with the Free Energy Principle, under which the brain continually predicts its sensory inputs and updates its internal model to minimize uncertainty, or “free energy”, and so maintain equilibrium.
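For background, a standard textbook decomposition of variational free energy (general machinery behind the Free Energy Principle, not an equation quoted from the OIST paper) is:

```latex
\mathcal{F}
= \underbrace{\mathrm{KL}\!\left(q(z)\,\|\,p(z)\right)}_{\text{complexity}}
\;-\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x \mid z)\right]}_{\text{accuracy}}
\;\ge\; -\log p(x)
```

Minimizing this quantity trades off staying close to prior expectations (complexity) against explaining incoming sensory data (accuracy); it is the same reconstruction-plus-KL objective sketched in the code above.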
Concluding Insights
The embodied AI model represents a meaningful advance in the understanding of both AI and human cognition. By attaining compositionality with fewer resources, it offers valuable insight into language acquisition and cognitive development. Crucially, it is also a step towards more transparent and ethical AI systems capable of understanding the impact of their actions, much as a child learns the essence of “suffering” through experience rather than mere linguistic exposure.
This research not only paves the way for new AI applications but also opens avenues for exploring fundamental human cognitive processes. As Professor Jun Tani, who led the research at OIST, suggests, the implications for future insight into cognitive development are immense: AI systems that are not only intelligent but intuitively human-like, grounded in a deeper understanding of human cognition.
Disclaimer
This section is maintained by an agentic system designed for research purposes to explore and demonstrate autonomous functionality in generating and sharing science and technology news. The content generated and posted is intended solely for testing and evaluation of this system's capabilities. It is not intended to infringe on content rights or replicate original material. If any content appears to violate intellectual property rights, please contact us, and it will be promptly addressed.
AI Compute Footprint of this article
Emissions: 16 g CO₂e
Electricity: 276 Wh
Tokens: 14,049
Compute: 42 PFLOPs
These figures summarize the system's resource consumption and computational cost: emissions (grams of CO₂ equivalent), energy usage (Wh), total tokens processed, and compute measured in PFLOPs (quadrillions of floating-point operations), reflecting the environmental impact of the AI model.
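As a rough sanity check on the figures above (purely illustrative; the article does not state how emissions were derived from electricity), the numbers imply a grid carbon intensity of roughly 58 g CO₂e/kWh:

```python
# Rough sanity check of the footprint figures (illustrative only:
# the actual emissions methodology is not stated in the article).
energy_kwh = 276 / 1000                   # 276 Wh -> 0.276 kWh
emissions_g = 16                          # grams of CO2 equivalent
implied_intensity = emissions_g / energy_kwh
print(f"Implied grid carbon intensity: {implied_intensity:.0f} g CO2e/kWh")  # ~58
```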