State-of-the-art quadrupedal locomotion approaches integrate Model Predictive Control (MPC) with Reinforcement Learning (RL), enabling complex motion capabilities through planning. Specifically, an infinite-horizon trajectory optimization problem is solved using a neural network-based dynamics model, with reference actions provided by an RL policy. However, these models often suffer from compounding errors over long horizons, leading to suboptimal performance, while the absence of inductive biases results in an opaque representation of motion. To address these issues, we introduce PEPC, a framework that integrates Lagrangian Neural Networks (LNNs) with MPC and RL, providing a principled and interpretable approach to learning system dynamics by embedding physical priors. Inspired by prior work in which a policy is trained jointly with a world model (a "dreamer") that includes a neural network-based dynamics model, we leverage an LNN-based dreamer for future state predictions, ensuring robust policy learning. During deployment, we propose a novel inverse dynamics-based infinite-horizon MPC scheme that uses the same LNN-based dreamer for real-time planning, mitigating compounding errors while also being 2x more computationally efficient, as it bypasses costly matrix inversions. We validate PEPC on real-world robotic hardware (a Unitree Go1), demonstrating robust locomotion across diverse terrains.
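To make the efficiency claim concrete, below is a minimal sketch (not the authors' implementation) of why inverse dynamics with an LNN avoids matrix inversion: forward prediction must solve a linear system against the learned mass matrix H = ∇²_{q̇}L, whereas inverse dynamics needs only gradients and matrix-vector products. The MLP Lagrangian and all shapes here are illustrative assumptions.

```python
# Sketch only: a hypothetical LNN, contrasting forward dynamics (which needs
# a linear solve against the learned mass matrix) with inverse dynamics
# (solve-free). Not PEPC's actual code.
import jax
import jax.numpy as jnp

def lagrangian(params, q, q_dot):
    # Hypothetical learned scalar Lagrangian L(q, q_dot): a tiny MLP.
    x = jnp.concatenate([q, q_dot])
    for W, b in params:
        x = jnp.tanh(W @ x + b)
    return jnp.sum(x)

def _terms(params, q, q_dot):
    # Shared Euler-Lagrange ingredients, all obtained by autodiff.
    L = lambda q, qd: lagrangian(params, q, qd)
    H = jax.hessian(L, argnums=1)(q, q_dot)       # mass matrix d2L/dqdot2
    grad_q = jax.grad(L, argnums=0)(q, q_dot)     # dL/dq
    mixed = jax.jacobian(jax.grad(L, argnums=1), argnums=0)(q, q_dot)
    return H, grad_q, mixed

def forward_dynamics(params, q, q_dot, tau):
    # q_ddot = H^{-1} (tau + dL/dq - (d2L/dq dqdot) q_dot): costly solve.
    H, grad_q, mixed = _terms(params, q, q_dot)
    return jnp.linalg.solve(H, tau + grad_q - mixed @ q_dot)

def inverse_dynamics(params, q, q_dot, q_ddot):
    # tau = H q_ddot + (d2L/dq dqdot) q_dot - dL/dq: products only, no solve.
    H, grad_q, mixed = _terms(params, q, q_dot)
    return H @ q_ddot + mixed @ q_dot - grad_q
```

Because `inverse_dynamics` never inverts the mass matrix, it offers one plausible account of the 2x efficiency gain reported above.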
Illustration of PEPC's inverse dynamics-based MPC, which iteratively samples, evaluates, and refines motion plans using a physics-informed model for fast, optimal quadrupedal locomotion. Unlike traditional neural network dynamics models, which provide opaque representations of motion, LNNs embed physical priors, enabling structured learning of the system's governing equations. We leverage this structured representation to design an inverse dynamics-based infinite-horizon MPC, ensuring computational efficiency and stable quadrupedal locomotion.
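The sample-evaluate-refine loop in the illustration can be written as a cross-entropy-style planner; the sketch below is an assumption about its structure, not the authors' exact scheme, and `rollout_cost` stands in for scoring a candidate plan by rolling it out through the LNN-based model.

```python
# Hypothetical sample-evaluate-refine MPC loop (CEM-style); all
# hyperparameters and the rollout_cost callback are illustrative assumptions.
import jax
import jax.numpy as jnp

def plan(key, rollout_cost, mean, std, n_samples=256, n_elite=32, n_iters=5):
    for _ in range(n_iters):
        key, sub = jax.random.split(key)
        # Sample candidate action sequences around the current plan.
        cand = mean + std * jax.random.normal(sub, (n_samples,) + mean.shape)
        # Evaluate every candidate with the physics-informed model.
        costs = jax.vmap(rollout_cost)(cand)
        # Refine: re-fit the sampling distribution to the lowest-cost plans.
        elite = cand[jnp.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-4
    return mean  # MPC executes the first action, then replans
```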
Overview of the training framework, where an encoder compresses the observation history into full-state estimates, which are processed by a physics-informed dreamer model to predict future states. These predictions guide an expert actor-critic policy for quadrupedal locomotion.
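In pseudocode, the data flow in this figure might look like the following sketch; the module interfaces (`encoder`, `dreamer`, `actor_critic`) and the imagination horizon are hypothetical placeholders rather than PEPC's actual API.

```python
# Hypothetical training-time data flow: encode history -> imagine futures
# with the physics-informed dreamer -> update the actor-critic.
import jax.numpy as jnp

def training_step(encoder, dreamer, actor_critic, obs_history, horizon=15):
    # 1) Compress the observation history into a full-state estimate.
    state = encoder(obs_history)
    # 2) Imagine future states by rolling the LNN-based dreamer forward.
    imagined = []
    for _ in range(horizon):
        action = actor_critic.act(state)        # policy proposes an action
        state = dreamer.predict(state, action)  # one-step state prediction
        imagined.append(state)
    # 3) The imagined trajectory guides the actor-critic update.
    return actor_critic.update(jnp.stack(imagined))
```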
Our approach demonstrates robust locomotion behaviors across multiple terrains, including gravel, slopes, stairs, and unstructured outdoor environments. Below are some of the results showcasing the performance of PEPC on real-world robotic hardware (a Unitree Go1).