MSc internship/assignment: "Application of an RL autopilot in station-keeping in waves"

Publication date: May 3, 2021
Location: Wageningen, Wageningen
Employment: 40 hours

For MARIN Academy we are looking for a student for the following MSc internship/assignment:

Application of an RL autopilot in station-keeping in waves


Borrowing from [1], “reinforcement learning can be defined as learning what to do in which situation, e.g. how to map situations to actions, so as to maximize a reward. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also in the next time step and through that, all subsequent rewards. These two characteristics — trial-and-error search and delayed reward — are the two most important distinguishing features of reinforcement learning.”
These characteristics have motivated a considerable amount of research over the last few decades. Reinforcement learning (RL) and deep RL (the combination of RL with deep learning techniques) have been used successfully to address challenging sequential decision-making problems. Several notable works using RL in games have stood out for attaining superhuman performance in playing Atari games from pixels [2], mastering Go [3] or beating the world’s top professionals at poker [4]. RL also has potential for real-world applications such as robotics [5-7], self-driving cars [8], finance [9] and smart grids [10], to name a few.
For the motion control of ships and in industrial applications, classical feedback control systems are typically used [13]. In the maritime domain, deep RL is found in path planning applications and in the control of autonomous robotic fish [11-12]. The use of deep RL has gained traction especially for autonomous surface and underwater vehicles and for collision-avoidance applications [16, 17, 18].

Figure 1: 30° heading change in calm water. The top plot shows the yaw angle and the bottom plot shows the rudder angle. The result from the PID controller is shown in black and the result from the RL agent in blue. The red line in the top plot shows the target yaw angle.

Since 2019, we have been working at MARIN on designing an autopilot with deep RL. Initially we considered two problem cases: heading change in calm water and heading keeping in waves. We explored several deep RL methods from the classes of value-based methods and actor-critic methods. To train the RL agents on these problems, we used our in-house ship seakeeping and manoeuvring software as the environment. We considered a frigate as the ship geometry. The deep RL agent showed promising performance (see Figure 1) in comparison with the baseline PID controller we considered.
We are currently looking for a student who will apply the RL autopilot to the problem of station-keeping in waves using different types of ships. The details of the MSc project are given below.


The goal of this project is to train an RL agent on a station-keeping problem using different types of ships in order to observe its generalization capability.


The main motivation behind this project is to observe the generalization capability of the RL agent. This will be investigated in two respects:

  • The RL agent will be trained using different types of ships, including a small fast boat, a submarine and a catamaran. These ships behave differently at sea and have different thrusters/control surfaces. So the main question we would like to answer is: Does the RL agent provide accurate and reliable performance when different ship types are considered?
  • So far in our work at MARIN, we have mainly considered two problem cases: heading change in calm water and heading keeping in waves. In this project, the RL agent will be trained on the station-keeping problem case. So the main question is: Does the RL agent provide accurate and reliable performance when a different problem case is considered?



The scope of this MSc project is as follows:

  • Get familiar with the RL framework at MARIN, which is written in Python.
  • Train the RL agent for a few different ships and compare its performance to the baseline controller. The ships can potentially include a small fast boat, a submarine, and a catamaran.
  • Consider station-keeping in waves as the problem case, where the ship travels from one location to another in waves and keeps its pose at the target location for a certain amount of time. This task potentially requires controlling the thrusters/control surfaces of the ship.
  • Compile a report summarizing the outcome of the work and give a presentation. A publication in a conference or a journal can be considered depending on the outcome of the work.
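
As an illustration of the station-keeping task described in the scope above, the following is a minimal sketch of how a pose error and a per-step reward might be computed. It is not taken from the RL framework at MARIN: the function names `wrap_angle`, `pose_error` and `station_keeping_reward`, the weights `w_pos` and `w_head`, and the choice of a negative weighted squared error as the reward are all assumptions made purely for illustration.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def pose_error(pose, target):
    """Return (distance to target in m, heading error in rad).

    pose and target are (x, y, psi) tuples: position in metres and
    heading in radians. This representation is an assumption.
    """
    dx = target[0] - pose[0]
    dy = target[1] - pose[1]
    return math.hypot(dx, dy), wrap_angle(target[2] - pose[2])


def station_keeping_reward(pose, target, w_pos=1.0, w_head=1.0):
    """Hypothetical reward: zero at the target pose, more negative
    as the position or heading error grows."""
    dist, dpsi = pose_error(pose, target)
    return -(w_pos * dist ** 2 + w_head * dpsi ** 2)
```

In a training loop, a reward of this kind would be evaluated at every time step, with the weights trading off position accuracy against heading accuracy. The actual reward design is part of the project and may look quite different.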


Profile of the student

For this project we are looking for candidates with the following profile:

  • Master's student with a background in a relevant field such as Machine Learning, Hydrodynamics, Engineering, Computational Science, Applied Science, Mathematics or a similar discipline.
  • Knowledge of ship seakeeping and manoeuvring
  • Experience with Python
  • Knowledge of basic machine learning (an online course can be suggested to the chosen candidate)
  • Knowledge of RL (an online course can be suggested to the chosen candidate)
  • Knowledge of basic control theory (advantageous but not a must)
  • Experience in working with supercomputers (advantageous but not a must)



The duration of the project is a minimum of 6 months. The start date and the precise duration of the project will be determined in consultation with the supervisors.

Department and supervisor

During the MSc project the student will be affiliated with the R&D department of MARIN. The supervisors are Bulent Duz, Bart Mak and Douwe Rijpkema, all of whom work as researchers in the R&D department.

Application for the internship

In order to apply, visit and click on the link for this project. For information about the project, contact Bulent Duz at or Bart Mak at


[1] Sutton, R. & Barto, A. Reinforcement Learning: An Introduction (MIT Press, 1998).
[2] Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. 2015. “Human-level control through deep reinforcement learning”. Nature. 518(7540): 529–533.
[3] Silver, D., A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. 2016a. “Mastering the game of Go with deep neural networks and tree search”. Nature. 529(7587): 484–489.
[4] Brown, N. and T. Sandholm. 2017. “Libratus: The Superhuman AI for No-Limit Poker”. International Joint Conference on Artificial Intelligence (IJCAI-17).
[5] Levine, S., C. Finn, T. Darrell, and P. Abbeel. 2016. “End-to-end training of deep visuomotor policies”. Journal of Machine Learning Research. 17(39): 1–40.
[6] Gandhi, D., L. Pinto, and A. Gupta. 2017. “Learning to Fly by Crashing”. arXiv preprint arXiv:1704.05588.
[7] Pinto, L., M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. 2017. “Asymmetric Actor Critic for Image-Based Robot Learning”. arXiv preprint arXiv:1710.06542.
[8] You, Y., X. Pan, Z. Wang, and C. Lu. 2017. “Virtual to Real Reinforcement Learning for Autonomous Driving”. arXiv preprint arXiv:1704.03952.
[9] Deng, Y., F. Bao, Y. Kong, Z. Ren, and Q. Dai. 2017. “Deep direct reinforcement learning for financial signal representation and trading”. IEEE transactions on neural networks and learning systems. 28(3): 653–664.
[10] François-Lavet, V. 2017. “Contributions to deep reinforcement learning and its applications in smartgrids”. PhD thesis. University of Liege, Belgium.
[11] L. P. Tuyen, A. Layek, N. A. Vien, T. Chung, 2017. “Deep Reinforcement Learning Algorithms for Steering an Underactuated Ship”, IEEE Int. Conf. On Multisensor Fusion and Integration for Intelligent Systems.
[12] J. Liu, L. E. Parker, R. Madhavan, 2007. “Reinforcement learning for autonomous robotic fish”, Mobile Robots: The Evolutionary Approach.
[13] T. I. Fossen, Handbook of marine craft hydrodynamics and motion control. John Wiley & Sons, 2011.
[14] S. Kamthe and M.P. Deisenroth, 2018. “Data efficient reinforcement learning with probabilistic model predictive control”, arXiv:1706.06491v2.
[15] Y. Cui, S. Osaki and T. Matsubara, 2019. “Reinforcement Learning Boat Autopilot: A Sample-efficient and Model Predictive Control based Approach”, arXiv:1901.07905v2.
[16] A. J. Sinisterra, A. Barker, S. Verma and M. R. Dhanak, 2020. “Nonlinear and machine-learning-based station-keeping control of an unmanned surface vehicle”, OMAE2020-19276.
[17] L. Zhao and M. Roh, 2019. “COLREGs-compliant multiship collision avoidance based on deep reinforcement learning”, Ocean Engineering, 191(1), 106436.
[18] C. Zhao, L. Weng, S. Yu and H. He, 2020. “Autonomous surface vessel obstacle avoidance based on hierarchical reinforcement learning”, OMAE2020-18454.