
Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds

Indian Institute of Science (IISc), Bangalore
(Preprint under review)

Abstract

State-of-the-art quadrupedal locomotion approaches integrate Model Predictive Control (MPC) with Reinforcement Learning (RL), enabling complex motion capabilities with planning and terrain-adaptive behaviors. However, they often face compounding errors over long horizons and have limited interpretability due to the absence of physical inductive biases. We address these issues by integrating Lagrangian Neural Networks (LNNs) into an RL–MPC framework, enabling physically consistent dynamics learning. At deployment, our inverse-dynamics infinite-horizon MPC scheme avoids costly matrix inversions, improving computational efficiency by up to 4× with minimal loss of task performance. We validate our framework through multiple ablations of the proposed LNN and its variants. We show improved sample efficiency, reduced long-horizon error, and faster real-time planning compared to unstructured neural dynamics. Lastly, we also test our framework on the Unitree Go1 robot to show real-world viability.
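The computational claim hinges on a standard property of rigid-body dynamics: forward dynamics requires solving a linear system in the mass matrix, while inverse dynamics only evaluates matrix-vector products. The toy sketch below illustrates this with a hand-written 2-DOF model; in PEPC the mass matrix and bias terms would come from the learned Lagrangian, so the functions here are illustrative stand-ins, not the paper's model.

```python
import numpy as np

def mass_matrix(q):
    # Toy 2-DOF symmetric positive-definite mass matrix;
    # a stand-in for the LNN-derived M(q), not the paper's learned model.
    c = np.cos(q[0] - q[1])
    return np.array([[2.0, 0.5 * c],
                     [0.5 * c, 1.0]])

def bias_terms(q, qd):
    # Coriolis/centrifugal and gravity terms, again a toy stand-in.
    s = np.sin(q[0] - q[1])
    return np.array([0.5 * s * qd[1] ** 2 + 9.81 * np.sin(q[0]),
                     -0.5 * s * qd[0] ** 2 + 9.81 * np.sin(q[1])])

def forward_dynamics(q, qd, tau):
    # qdd = M(q)^{-1} (tau - h(q, qd)): needs a linear solve on every call.
    return np.linalg.solve(mass_matrix(q), tau - bias_terms(q, qd))

def inverse_dynamics(q, qd, qdd):
    # tau = M(q) qdd + h(q, qd): only matrix-vector products, no inversion.
    return mass_matrix(q) @ qdd + bias_terms(q, qd)

q, qd = np.array([0.3, -0.1]), np.array([0.0, 0.2])
tau = np.array([1.0, -0.5])
qdd = forward_dynamics(q, qd, tau)
# Consistency check: inverse dynamics recovers the applied torque.
assert np.allclose(inverse_dynamics(q, qd, qdd), tau)
```

Planning in inverse-dynamics form therefore sidesteps the per-step solve that forward rollouts would require.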


Deployment

[Figures: PEPC Algorithm 1 and PEPC Algorithm 3]

Illustration of PEPC's inverse-dynamics-based MPC, which iteratively samples, evaluates, and refines motion plans using a physics-informed model for fast, optimal quadrupedal locomotion. Unlike conventional neural-network dynamics models, which provide opaque representations of motion, LNNs embed physical priors, enabling structured learning of the system's governing equations. We leverage this structured representation to design an inverse-dynamics-based infinite-horizon MPC, ensuring computational efficiency and stable quadrupedal locomotion.
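The sample–evaluate–refine loop described above can be sketched as a cross-entropy-method (CEM) style planner. The sketch below uses a toy point-mass model in place of the learned physics-informed dynamics, and a simple velocity-tracking cost; the horizon, sample counts, and cost weights are illustrative assumptions, not PEPC's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(x0, u_seq, dt=0.02, v_ref=1.0):
    # Toy point-mass surrogate for the learned model: state = (pos, vel).
    # Cost penalizes velocity-tracking error plus a small control effort term.
    x = np.array(x0, dtype=float)
    cost = 0.0
    for u in u_seq:
        x[0] += x[1] * dt
        x[1] += u * dt
        cost += (x[1] - v_ref) ** 2 + 1e-3 * u ** 2
    return cost

def cem_mpc(x0, horizon=20, n_samples=64, n_elite=8, n_iters=5):
    # Sample plans, evaluate them on the model, refit a Gaussian
    # to the elite (lowest-cost) plans, and repeat.
    mean, std = np.zeros(horizon), np.full(horizon, 2.0)
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, horizon))
        costs = np.array([rollout_cost(x0, u) for u in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # receding horizon: execute mean[0], then replan

u_plan = cem_mpc([0.0, 0.0])
```

In deployment such a planner would execute only the first action of the refined plan and replan at the next control step.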

Training Architecture

Overview of the training framework: an encoder compresses the observation history into a full-state estimate, which a physics-informed Dreamer-style model uses to predict future states. These predictions guide an expert actor-critic policy for quadrupedal locomotion.
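The data flow through this architecture can be sketched at the shape level as follows. All dimensions and the randomly initialised networks are hypothetical placeholders; the point is the pipeline (history encoding, one-step prediction, imagined rollout for the actor), not the actual trained weights or the LNN structure that PEPC imposes on the dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions, not the paper's actual sizes.
OBS, HIST, STATE, ACT = 12, 5, 8, 4

def mlp(x, w1, w2):
    # Minimal two-layer network used as a stand-in for each module.
    return np.tanh(x @ w1) @ w2

# Randomly initialised stand-ins for the trained networks.
enc_w = (rng.normal(size=(OBS * HIST, 32)), rng.normal(size=(32, STATE)))
dyn_w = (rng.normal(size=(STATE + ACT, 32)), rng.normal(size=(32, STATE)))
pi_w = (rng.normal(size=(STATE, 32)), rng.normal(size=(32, ACT)))

def encode(obs_history):
    # Compress the observation history into a full-state estimate.
    return mlp(obs_history.reshape(-1), *enc_w)

def predict(state, action):
    # One-step state prediction; PEPC constrains this with an LNN prior.
    return state + mlp(np.concatenate([state, action]), *dyn_w)

def act(state):
    # Actor head producing bounded actions from the state estimate.
    return np.tanh(mlp(state, *pi_w))

obs_hist = rng.normal(size=(HIST, OBS))
s = encode(obs_hist)
# Short imagined rollout of the kind used to train the actor-critic.
traj = [s]
for _ in range(3):
    traj.append(predict(traj[-1], act(traj[-1])))
```

The critic (omitted here) would score the imagined trajectory to provide the learning signal for the actor.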

Results

Our approach demonstrates robust locomotion across multiple terrains, including gravel, slopes, outdoor environments, and stairs. Below are results showcasing PEPC's performance on real-world Unitree Go1 hardware.