Landing a rocket in a gust

Picture of the lab member

Jeyaditya Babu

MS Student

Picture of the second lab member

Mohanraj

MS Student

2026-03-24

Schematic

In this post, we introduce the concept of Model Predictive Control (MPC), which originated as a mechanism for decision making in the 1960s in the petrochemical industry for complex process control with strict constraints. However, it has found wider applicability, especially in robotic systems, and is now touted as the 'System 2' for complex decision tasks. We begin with the fundamental idea behind MPC, how it is formulated mathematically and then move towards using it in a toy example: vertical landing of a rocket while using as little thrust as possible à la SpaceX. Along the way, we will see how prediction, optimization, and constraints come together to form a practical control strategy.

Brief history

In the 1960s and 70s, industrial processes such as those in oil refineries and petrochemical plants had grown too complex with little guarantees to be handled by traditional control methods. A system like a distillation column involved multiple inputs, including reflux rate, reboiler heat, and feed flow, all influencing multiple outputs such as product composition and temperature across different stages. These interactions were tightly coupled, the dynamics were slow, and the effects of any control action could take a long time to appear, all while the system had to operate within strict limits on temperature, pressure, and actuator capability. Engineers tried to manage this using independent PID loops, but these loops acted locally and without foresight, so correcting one variable often disturbed another, leading to oscillations and inefficient operation. At some point it became clear that reacting to the present was not enough, because every action had consequences that unfolded in the future. The question then shifted from what input should be applied now to what sequence of inputs should be applied over time. This change in thinking led to a new approach in which a model of the system is used to predict how it will evolve, an optimization problem is solved over a finite horizon to choose inputs that satisfy constraints and improve performance, and only the first input is applied before the process is repeated as new information becomes available. This idea, which came to be known as Model Predictive Control, was first implemented in the 1970s in forms such as Dynamic Matrix Control at Shell, where it proved valuable not because it was elegant, but because it worked under real constraints and improved economic performance.

Formulation

Schematic illustrating the MPC strategy. The top graph shows the past trajectory of the system resulting in the current state xkx_k, alongside the predicted optimal trajectory approaching the target xrefx^{\text{ref}} over the prediction horizon NN. The bottom graph displays the corresponding sequence of calculated control inputs. The fundamental idea behind MPC is to compute full future plan but apply only the immediate action ukku^*_{k|k}.

Before we land anything, let us try to understand the mathematical formalism of MPC. Let us consider a system with state xkx_k (could be the position or momentum or any variable that characterizes the system of interest) at a discrete time kk. The evolution of the state can be written as xk+1=F(xk,uk)x_{k+1} = F(x_k, u_k), where uku_k is the control input of the system and F(xk,uk)F(x_k, u_k) is the process that governs the dynamics of the system. Imagine riding a bicycle, the state xkx_k describes where you are, how fast you are moving, and in what direction you are moving. The control input uku_k is the angle to which you rotate the handlebar or the thrust you provide on the pedal. Those actions determine where you will be in the next instant, which is the new state xk+1x_{k+1}.

In the framework of Model Predictive Control, our goal is to decide what is the 'best' action to take at a particular instant by considering several future scenarios. That is, at time kk, we consider a sequence ukk,  uk+1k,  uk+2k,u_{k|k}, \; u_{k+1|k}, \; u_{k+2|k}, \dots to choose the control input uku_k. The notation uk+iku_{k+i|k} simply means "the control input at a future time k+ik+i, computed at time kk". Once this future input sequence is fixed, the model tells us exactly what will happen next. Using the model xk+1=F(xk,uk)x_{k+1} = F(x_k,u_k), we can predict the next state, xk+1k=F(xk,ukk).x_{k+1|k} = F(x_k, u_{k|k}). Using the same equation again, we can predict the state two steps ahead, xk+2k=F(xk+1k,uk+1k).x_{k+2|k} = F(x_{k+1|k}, u_{k+1|k}). The state several steps ahead, xk+jkx_{k+j|k} for j1j \gg 1, depends on the current state, xkkx_{k|k} and on all future control inputs, uk+jku_{k+j|k} we have chosen. In other words, once we choose a future control input sequence uk+jku_{k+j|k}, we have also chosen a future trajectory for the system. Control is not purely reactive, it is about shaping the future trajectory.

The obvious next question is: among all possible future input sequences uk+jku_{k+j|k}, which one should we choose? To answer this, we introduce a cost function, JJ. The cost function is a measure of how "good" a predicted future trajectory is with respect to how we want the system to behave. In the rocket landing example, we want the predicted height and velocity to approach zero over time. At the same time, we do not want to use unnecessarily large thrust. A simple way to express this is to penalize two quantities: the rocket's position error and the control effort. Over a horizon of NN future steps, we can define JJ of the form:

J=λxi=1Nxk+ikxk+iref2+λui=0N1uk+ik2.J = \lambda_x \sum_{i=1}^{N} \| x_{k+i|k} - x_{k+i}^{\text{ref}} \|^2 + \lambda_u \sum_{i=0}^{N-1} \| u_{k+i|k} \|^2.

In order to find the control input at time-step kk, our goal is to minimize JJ for a fixed horizon NN. The optimization problem therefore consists of minimizing JJ over the future input sequence, subject to the dynamic constraint xk+1=F(xk,uk)x_{k+1} = F(x_k, u_k). The first term in the cost function JJ penalizes the deviation from the desired trajectory while the second penalizes excessive control action. Weighting factors λx,u\lambda_{x, u} adjust the relative importance of these two objectives. The predicted outputs depend on the chosen input sequence through the system dynamics. If the dynamics is linear and the cost is quadratic, the resulting optimization problem becomes a well known quadratic program. On the other hand, if the dynamics is nonlinear, the optimization becomes a nonlinear program. Therefore, the optimal sequence of inputs can be found by minimizing JJ with respect to future inputs. The important aspect of MPC compared to other optimal control frameworks is that after computing this optimal control input uk+jku^*_{k+j|k} at time-step kk, we do not apply it for all future times from kk to k+Nk+N, rather we apply only the first control action, ukku^*_{k|k}. At the next time-step, we measure the new state xk+1x_{k+1}, shift the prediction horizon forward, and solve the optimization problem again. In this way, the control input is continuously re-evaluated based on updated state information. This repeated prediction and optimization is what makes Model Predictive Control adaptive.

In its most general form, a control problem consists of choosing an input sequence uku_k that minimizes a performance metric J(xk,uk)J(x_k, u_k), subject to the system dynamics and physical constraints xk+1=F(xk,uk)x_{k+1} = F(x_k, u_k). For a discrete-time system, a finite-horizon optimal control problem can be written as

min{u0,,uN1}k=0N1J(xk,uk)+Φ(xN)\min_{\{u_0,\dots,u_{N-1}\}} \sum_{k=0}^{N-1} J(x_k, u_k) + \Phi(x_N)

subject to

xk+1=F(xk,uk).x_{k+1} = F(x_k, u_k).

Here, J(xk,uk)J(x_k,u_k) represents the stage cost accumulated at each step and Φ(xN)\Phi(x_N) is the terminal cost. Model Predictive Control operates by repeatedly solving such a finite-horizon optimal control problem in real time. At each time-step, the problem is solved using the current state, xkx_k as the initial condition, the first control action uku_k is applied, we shift the horizon (NN steps) by one discrete time step kk to k+1k+1. In this sense, MPC is not a fundamentally different type of controller but rather a real-time implementation of optimal control. With this background in place, let us now try to land the rocket.

Landing the Rocket

Schematic of the rocket in the yzyz-plane with its total mass (including fuel) mm, orientation θ\theta and control variables, the gimbal angle δ\delta and the thrust TT. The mass of the rocket m(t)m(t) evolves with time as the fuel gets consumed to exert thrust T(t)T(t) while the gimbal angle δ(t)\delta(t) can be tuned to change the orientation of the rocket.

Consider the task of vertically landing a rocket which, as we shall see, is a concrete instance where we can use the general finite-horizon optimal control framework. Our goal is to plan the motion of a planar rigid rocket with variable mass whose state at time tt is given by

x(t)=[y(t)z(t)θ(t)y˙(t)z˙(t)θ˙(t)m(t)]T,\mathbf{x}(t) = \begin{bmatrix} y(t) & z(t) & \theta(t) & \dot y(t) & \dot z(t) & \dot \theta(t) & m(t) \end{bmatrix}^\mathsf{T},

where y(t)y(t) and z(t)z(t) denote horizontal and vertical positions, θ(t)\theta(t) is the pitch angle, y˙(t)\dot y(t), z˙(t)\dot z(t), and θ˙(t)\dot\theta(t) are the corresponding velocities, and m(t)m(t) is the current mass of the rocket. The control input is

u(t)=[Teff(t)δ(t)]T,\mathbf{u}(t) = \begin{bmatrix} T_{\mathrm{eff}}(t) & \delta(t) \end{bmatrix}^\mathsf{T},

where Teff(t)T_{\mathrm{eff}}(t) represents the thrust magnitude and δ(t)\delta(t) the gimbal deflection angle, which are controlled through actuators. The rocket motion can then be described using Newton's laws for a variable mass rigid body. The thrust acts along the body axis of the rocket, while gravity acts downward (see figure above for a schematic of the setup). The translational dynamics is written as

y¨=Teffmsin(θ+δ),z¨=Teffmcos(θ+δ)g.\ddot y = \frac{T_{\mathrm{eff}}}{m} \sin(\theta + \delta), \qquad \ddot z = \frac{T_{\mathrm{eff}}}{m} \cos(\theta + \delta) - g.

The thrust exerted by the rocket, TeffT_{\mathrm{eff}} must reach zero when the fuel is empty. This effect is accounted for, while also maintaining the differentiability required by interior-point solvers, by approximating the hard engine cutoff using a smooth function. We define an activation function Θ(m)\Theta(m) that smoothly transitions from 11 to 00 as the mass approaches mdrym_{\text{dry}},

Θ(m)=12[1+tanh(k(mmdry))],\Theta(m) = \frac{1}{2} \left[ 1 + \tanh\big(k (m - m_{\text{dry}})\big) \right],

where kk is a steepness parameter that controls the sharpness of the cutoff. The effective thrust is then defined as the commanded thrust scaled by this activation factor, Teff(T,m)=TΘ(m).T_{\text{eff}}(T, m) = T \Theta(m). The mass depletion dynamics can be written as a single continuous equation, m˙=Teff(T,m)ve.\dot m = - \frac{T_{\text{eff}}(T, m)}{v_e}. The translational dynamics driven by this effective thrust are given by

y¨=Teff(T,m)msin(θ+δ),z¨=Teff(T,m)mcos(θ+δ)g,\ddot y = \frac{T_{\text{eff}}(T, m)}{m} \sin(\theta + \delta), \qquad \ddot z = \frac{T_{\text{eff}}(T, m)}{m} \cos(\theta + \delta) - g,

and the balance of torque about the center of mass yields the rotational dynamics,

θ¨=eff(m)I(m)Teff(T,m)sin(δ).\ddot{\theta} = \frac{\ell_{\text{eff}}(m)}{I(m)} \, T_{\text{eff}}(T, m) \sin(\delta).

These equations define a nonlinear continuous-time model x˙=f(x,u)\dot{\mathbf{x}}= \mathbf{f}(\mathbf{x}, \mathbf{u}). To use this model to perform the landing task using MPC, we need to discretize the temporal dynamics using a numerical scheme such as the 4-th order Runge Kutta scheme. This produces a discrete mapping xk+1=F(xk,uk)\mathbf{x}_{k+1} = \mathbf{F}(\mathbf{x}_k, \mathbf{u}_k) given the current state xk\mathbf{x}_k and the control input uk\mathbf{u}_k. With the model in place, we now identify a cost function that measures performance. For soft landing, position, velocity, and orientation should approach zero near the landing pad. Using the discrete definition of the state, xk=[ykzkθky˙kz˙kθ˙kmk]T,\mathbf{x}_k = \begin{bmatrix} y_k & z_k & \theta_k & \dot y_k & \dot z_k & \dot \theta_k & m_k \end{bmatrix}^\mathsf{T}, a reasonable choice of cost function J(xk,uk)J(\mathbf{x}_k, \mathbf{u}_k) is

J(xk,uk)=k=0N1[(xkxtarget)TQ(xkxtarget)+WTTk+RTTk2+Rδδk2]+(xNxtarget)TP(xNxtarget).J(\mathbf{x}_k, \mathbf{u}_k) = \sum_{k=0}^{N-1} \bigg[ (\mathbf{x}_k - \mathbf{x}_{\text{target}})^\mathsf{T}Q (\mathbf{x}_k - \mathbf{x}_{\text{target}}) + W_T T_k + R_T T_k^2 + R_\delta \delta_k^2 \bigg] + (\mathbf{x}_N - \mathbf{x}_{\text{target}})^\mathsf{T}P (\mathbf{x}_N - \mathbf{x}_{\text{target}}).

The quadratic term (xkxtarget)TQ(xkxtarget)(\mathbf{x}_k - \mathbf{x}_{\text{target}})^\mathsf{T}Q (\mathbf{x}_k - \mathbf{x}_{\text{target}}) penalizes deviation from the desired landing state, ensuring that position, velocity, and orientation approach zero. The terminal cost (xNxtarget)TP(xNxtarget)(\mathbf{x}_N - \mathbf{x}_{\text{target}})^\mathsf{T}P (\mathbf{x}_N - \mathbf{x}_{\text{target}}) enforces the accuracy in the final time step. The term RTTk2R_T T_k^2 penalizes large thrust magnitudes, encouraging smoother control inputs, avoiding aggressive actuation. In addition to this, the linear term WTTkW_T T_k directly penalizes the total thrust usage. While the quadratic term discourages large instantaneous thrust, the linear term discourages sustained thrust over time, thereby promoting fuel efficient trajectories.

With the dynamics, cost, and constraints defined, we can now state the landing task explicitly as a finite horizon optimal control problem,

min{xk,uk}J(xk,uk)\min_{\{\mathbf{x}_k, \mathbf{u}_k\}} J (\mathbf{x}_k, \mathbf{u}_k)

subject to

xk+1=F(xk,uk),k=0,,N1,\mathbf{x}_{k+1} = \mathbf{F}(\mathbf{x}_k, \mathbf{u}_k), \quad k=0,\dots,N-1,

x0=xmeasured,\mathbf{x}_0 = \mathbf{x}_{\text{measured}},

0TkTmax,δkδmax,0 \le T_k \le T_{\max}, \qquad |\delta_k| \le \delta_{\max},

zk0,mdrymkmwet.z_k \ge 0, \qquad m_{\text{dry}} \le m_k \le m_{\text{wet}}.

Here, the mass is not directly penalized in the tracking term; fuel efficiency is encouraged through the thrust penalty instead. Solving this problem at each time-step, applying only the first control input, and then repeating the procedure with the updated state yields a receding horizon controller capable of guiding the rocket toward a soft landing while reacting to disturbances along the way. The animation above shows the soft-landing of the rocket using this procedure. The python-code used to generate this animation is available here. This toy model was inspired by the problem set in Robert Stengel's book on Optimal Control and Estimation.

Computational Implementation

Translating the continuous-time optimal control problem into code requires a self-consistent discretization procedure. We discretize the dynamics using a 4th-order Runge-Kutta (RK4) scheme with a time step of dt=0.1dt = 0.1 s and a horizon of N=40N = 40 steps (a 4 sec look-ahead). The resulting Nonlinear Program (NLP) is formulated using CasADi library in python. At each time-step, the NLP is solved using IPOPT and the solver simultaneously optimizes the physical states xk\mathbf{x}_k, controls uk\mathbf{u}_k and the Lagrange multipliers that ensure xk+1=F(xk,uk)\mathbf{x}_{k+1} = \mathbf{F}(\mathbf{x}_k, \mathbf{u}_k) and thrust limits. To achieve real-time feasibility, we use warm starting, where we take the optimal control sequence computed at the previous step as the initial guess: uk={ukk,uk+1k,,uk+N1k}\mathbf{u}^*_{k} = \{ \mathbf{u}_{k|k}, \mathbf{u}_{k+1|k}, \dots, \mathbf{u}_{k+N-1|k} \} We shift this sequence forward by one step and duplicate the final input to create our initial guess: uk+1guess={uk+1k,uk+2k,,uk+N1k,uk+N1k}\mathbf{u}^{\text{guess}}_{k+1} = \{ \mathbf{u}_{k+1|k}, \mathbf{u}_{k+2|k}, \dots, \mathbf{u}_{k+N-1|k}, \mathbf{u}_{k+N-1|k} \} We also apply the same shift to the state trajectory xk\mathbf{x}_k and the lagrange multipliers. This guess places the solver close to the next optimal solution, reducing the computation time and allowing the MPC to run dynamically in real time.

Conclusion

Model Predictive Control extends the capabilities of optimal control but with added adaptability. At every time-step, we use a model of the system to predict how the future might unfold, choose the sequence of inputs that best balances performance and effort, apply only the first input, and then repeat the process with new information. The primary advantage compared to a fixed controller obtained via an optimal control procedure is that you allow the environmental perturbations to affect the system dynamically and adapt to it with an updated controller. The rocket landing problem is only a toy example; however, the MPC framework is general enough that it is today applied in chemical industry, autonomous vehicle planning, active suspension systems, motor drives, wind turbines and wind farm clusters, and several other places.