Embodied AI is evolving rapidly, combining established robotics techniques such as simultaneous localization and mapping (SLAM) with large language models and vision-language-action (VLA) systems. This resource map lays out a structured learning path, from foundational SLAM algorithms to VLA and vision-language navigation (VLN) architectures. It highlights key technologies, including multimodal perception, reinforcement learning, and model-based control, and shows how they fit together across the field. For developers and researchers, understanding this stack is crucial for building next-generation autonomous systems. The map serves as both a learning guide and a reference for identifying gaps and opportunities in the embodied AI landscape.
A comprehensive overview of the embodied AI technology stack, covering SLAM, large models, and VLA/VLN systems.