MSc internship/assignment: "An autopilot with a sample-efficient RL algorithm"

Publication date: Jan 27, 2022
Location: Wageningen, Wageningen
Employment: 40 hours
Contact: Bulent Duz
For MARIN Academy we are looking for a student for the following MSc internship/assignment:

An autopilot with a sample-efficient RL algorithm



Borrowing from [1], “reinforcement learning can be defined as learning what to do in which situation, e.g. how to map situations to actions, so as to maximize a reward. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also in the next time step and through that, all subsequent rewards. These two characteristics — trial-and-error search and delayed reward — are the two most important distinguishing features of reinforcement learning.”
These characteristics have motivated a considerable amount of research over the last few decades. Reinforcement learning (RL) and deep RL (the combination of RL with deep learning techniques) have been used successfully to address challenging sequential decision-making problems. Several notable works using RL in games stand out: attaining super-human performance in playing Atari games directly from pixels [2], mastering Go [3], and beating the world's top professionals at poker [4]. RL also has potential for real-world applications such as robotics [5-7], self-driving cars [8], finance [9] and smart grids [10], to name a few.
For the motion control of ships and in industrial applications, classical feedback control systems are typically used [13]. In the maritime domain, deep RL is found in path planning applications and in the control of autonomous robotic fish [11-12]. The use of deep RL has gained traction especially in autonomous surface and underwater vehicles and in collision avoidance applications [16, 17, 18].

Figure 1: 30° heading change in calm water. The top plot shows the yaw angle and the bottom plot shows the rudder angle. The result from the PID controller is shown in black and the result from the RL agent in blue. The red line in the top plot shows the target yaw angle.

Since 2019, we have been working on designing an autopilot with deep RL at MARIN. Initially we considered two problem cases: heading change in calm water and heading keeping in waves. We explored several deep RL methods from the classes of value-based methods and actor-critic methods. To train the RL agents on these problems, we used our in-house ship seakeeping and manoeuvring software as the environment, with a frigate as the ship geometry. The deep RL agent showed promising performance (see Figure 1) in comparison with the baseline PID controller we considered.
We are currently looking for a student who will work on developing a sample-efficient RL agent. The details of the project are given below.
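
To make the PID baseline mentioned above concrete, here is a minimal sketch of a 30° heading change in calm water. It is only an illustration: the first-order Nomoto steering model stands in for MARIN's in-house seakeeping and manoeuvring software, and the model parameters, controller gains and rudder limit are all assumed values, not the ones used for Figure 1.

```python
from dataclasses import dataclass


@dataclass
class NomotoShip:
    """First-order Nomoto steering model: T * r_dot + r = K * delta.

    An assumed stand-in for a real ship simulator; K and T are illustrative.
    """
    K: float = 0.2     # rudder gain [1/s]
    T: float = 10.0    # time constant [s]
    psi: float = 0.0   # yaw angle [deg]
    r: float = 0.0     # yaw rate [deg/s]

    def step(self, delta: float, dt: float) -> None:
        """Advance the ship by one time step with rudder angle delta [deg]."""
        r_dot = (self.K * delta - self.r) / self.T
        self.r += r_dot * dt
        self.psi += self.r * dt


def simulate_heading_change(target=30.0, t_end=300.0, dt=0.1,
                            kp=2.0, ki=0.05, kd=10.0, max_rudder=35.0):
    """Run a PID heading controller and return yaw and rudder histories."""
    ship = NomotoShip()
    integral = 0.0
    psi_hist, delta_hist = [], []
    for _ in range(round(t_end / dt)):
        error = target - ship.psi
        # The derivative term acts on the measured yaw rate, not on the error.
        raw = kp * error + ki * integral - kd * ship.r
        delta = max(-max_rudder, min(max_rudder, raw))
        # Simple anti-windup: integrate only while the rudder is not saturated.
        if raw == delta:
            integral += error * dt
        ship.step(delta, dt)
        psi_hist.append(ship.psi)
        delta_hist.append(delta)
    return psi_hist, delta_hist


psi_hist, delta_hist = simulate_heading_change()
print(f"final yaw angle: {psi_hist[-1]:.2f} deg")
print(f"largest rudder angle: {max(abs(d) for d in delta_hist):.1f} deg")
```

The two returned histories correspond to the two panels of a plot like Figure 1 (yaw angle on top, rudder angle below). An RL agent would replace the PID law inside the loop with a learned policy.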


The goal of this project is to develop an RL agent that has better sample efficiency than model-free RL agents.


So far in our work at MARIN, we have mainly focused on value-based and actor-critic RL agents. One issue with these model-free methods is their low sample efficiency. High sample efficiency is especially important when RL agents are trained in the real world, where the time cost of exploring and sampling with real ships can be prohibitively high. To achieve better sample efficiency than model-free RL agents, one of the following two ideas might be explored in this project:
  • Model-based RL agents: In model-based RL, the agents learn policies from a trained model of the environment instead of directly from the environment, which allows these methods to be more data efficient. Another advantage of model-based RL is that the learned model of the environment can be reused in different tasks.
  • Hybrid control algorithm: The strengths of deep RL and classical control algorithms can be combined in a hybrid algorithm in order to take advantage of the best of both worlds.
The main question we would like to answer in this project is: How can we design and implement a sample-efficient RL agent?
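
As a rough illustration of the model-based idea above, the sketch below fits a linear dynamics model to a small number of "real" transitions and then compares candidate controllers using rollouts in the learned model only. Everything in it is an assumption made for illustration: the toy yaw dynamics in `real_step`, the sample budget, the noise level, and the use of a simple search over proportional gains in place of a full RL algorithm. It is not the approach prescribed for the project.

```python
import numpy as np

DT = 0.1  # time step [s] (assumed)


def real_step(state, delta, rng):
    """Hypothetical 'real' ship, unknown to the agent. state = [psi, r]."""
    psi, r = state
    r_next = 0.99 * r + 0.002 * delta + rng.normal(0.0, 5e-4)
    return np.array([psi + DT * r_next, r_next])


def collect_real_data(n_samples, rng):
    """Spend n_samples real environment steps using random rudder angles."""
    X, Y = [], []
    state = np.zeros(2)
    for _ in range(n_samples):
        delta = rng.uniform(-35.0, 35.0)
        next_state = real_step(state, delta, rng)
        X.append([state[0], state[1], delta])
        Y.append(next_state)
        state = next_state
    return np.array(X), np.array(Y)


def fit_linear_model(X, Y):
    """Least-squares fit of next_state = [psi, r, delta] @ W."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape (3, 2)


def rollout_cost(W, kp, kd=10.0, target=30.0, steps=600):
    """Mean squared heading error of a PD law, simulated in the learned model."""
    state = np.zeros(2)
    cost = 0.0
    for _ in range(steps):
        error = target - state[0]
        delta = np.clip(kp * error - kd * state[1], -35.0, 35.0)
        state = np.array([state[0], state[1], delta]) @ W
        cost += (target - state[0]) ** 2
    return cost / steps


rng = np.random.default_rng(0)
X, Y = collect_real_data(200, rng)   # the only real samples that are used
W = fit_linear_model(X, Y)

candidates = [0.5, 1.0, 2.0, 4.0]
costs = {kp: rollout_cost(W, kp) for kp in candidates}  # no real samples spent
best_kp = min(costs, key=costs.get)
print(f"real samples used: {len(X)}, selected kp: {best_kp}")
```

The point of the sketch is the sample accounting: all controller comparisons run on the learned model, so the number of real environment steps stays fixed at the size of the initial data set. The same learned model could be reused for a different target heading, which is the reuse advantage mentioned above.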


The scope of this MSc project is as follows:
  • Get familiar with the RL framework at MARIN written in Python.
  • Carry out a literature survey to choose a promising approach.
  • Implement the new RL agent.
  • Train the implemented RL agent using a frigate-type ship on the problem of heading keeping in waves. In this problem, the agent controls the rudder of the ship in order to maintain its heading in waves.
  • Compare the performance of the implemented agent with that of the model-free RL agents.
  • Compile a report summarizing the outcome of the work and give a presentation. A publication in a conference or a journal can be considered depending on the outcome of the work.
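
The heading-keeping task in the scope above can be pictured as an environment with a `reset`/`step` interface, a reward, and a counter of environment steps. The sketch below is hypothetical throughout: the class name `HeadingKeepingEnv`, the Nomoto-type dynamics, the sinusoidal wave disturbance and the reward weights are assumptions for illustration and do not describe the RL framework at MARIN.

```python
import math


class HeadingKeepingEnv:
    """Toy heading-keeping task: keep the yaw angle at 0 deg in waves."""

    def __init__(self, dt=0.1, K=0.2, T=10.0, wave_amp=1.0,
                 wave_period=8.0, max_rudder=35.0, episode_steps=1000):
        self.dt, self.K, self.T = dt, K, T
        self.wave_amp, self.wave_period = wave_amp, wave_period
        self.max_rudder = max_rudder
        self.episode_steps = episode_steps
        self.samples_used = 0  # total environment steps over all episodes
        self.reset()

    def reset(self):
        self.psi, self.r, self.t, self.steps = 0.0, 0.0, 0.0, 0
        return (self.psi, self.r)

    def step(self, delta):
        """Apply rudder angle delta [deg]; return (observation, reward, done)."""
        delta = max(-self.max_rudder, min(self.max_rudder, delta))
        # Assumed wave disturbance: a sinusoidal yaw forcing.
        wave = self.wave_amp * math.sin(2.0 * math.pi * self.t / self.wave_period)
        r_dot = (self.K * delta - self.r + wave) / self.T
        self.r += r_dot * self.dt
        self.psi += self.r * self.dt
        self.t += self.dt
        self.steps += 1
        self.samples_used += 1
        # Assumed reward: penalise heading error and, lightly, rudder usage.
        reward = -(self.psi ** 2) - 0.01 * delta ** 2
        done = self.steps >= self.episode_steps
        return (self.psi, self.r), reward, done


def run_episode(env, policy):
    """Run one episode and return the summed reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


env = HeadingKeepingEnv()
return_no_control = run_episode(env, lambda obs: 0.0)
return_pd = run_episode(env, lambda obs: -2.0 * obs[0] - 10.0 * obs[1])
print(f"no control: {return_no_control:.1f}, PD law: {return_pd:.1f}")
print(f"environment steps used so far: {env.samples_used}")
```

When comparing agents, a counter such as `samples_used` gives the horizontal axis of a learning curve: a more sample-efficient agent reaches a given episode return after fewer environment steps.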

Profile of the student

For this project we are looking for candidates with the following profile:
  • Master's student with a background in a relevant field such as Machine Learning, Hydrodynamics, Engineering, Computational Science, Applied Science, Mathematics or other similar disciplines.
  • Basic knowledge of machine learning
  • Knowledge of RL
  • Experience with Python
  • Basic knowledge of control theory (advantageous but not a must)
  • Knowledge of ship seakeeping and manoeuvring (advantageous but not a must)
  • Experience in working with supercomputers (advantageous but not a must)


The duration of the project is a minimum of 6 months. The start date and the precise duration of the project will be determined in consultation with the supervisors.

Department and supervisor

During the MSc project the student will be attached to the R&D department of MARIN. The supervisors are Bulent Duz, Bart Mak and Douwe Rijpkema, all of whom work as researchers at the R&D department.

Application for the internship

In order to apply, visit and click on the link for this project. For information about the project, contact Bulent Duz at or Bart Mak at


[1] Sutton, R. and A. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.
[2] Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. 2015. “Human-level control through deep reinforcement learning”. Nature. 518(7540): 529–533.
[3] Silver, D., A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. 2016. “Mastering the game of Go with deep neural networks and tree search”. Nature. 529(7587): 484–489.
[4] Brown, N. and T. Sandholm. 2017. “Libratus: The Superhuman AI for No-Limit Poker”. International Joint Conference on Artificial Intelligence (IJCAI-17).
[5] Levine, S., C. Finn, T. Darrell, and P. Abbeel. 2016. “End-to-end training of deep visuomotor policies”. Journal of Machine Learning Research. 17(39): 1–40.
[6] Gandhi, D., L. Pinto, and A. Gupta. 2017. “Learning to Fly by Crashing”. arXiv preprint arXiv:1704.05588.
[7] Pinto, L., M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. 2017. “Asymmetric Actor Critic for Image-Based Robot Learning”. arXiv preprint arXiv:1710.06542.
[8] You, Y., X. Pan, Z. Wang, and C. Lu. 2017. “Virtual to Real Reinforcement Learning for Autonomous Driving”. arXiv preprint arXiv:1704.03952.
[9] Deng, Y., F. Bao, Y. Kong, Z. Ren, and Q. Dai. 2017. “Deep direct reinforcement learning for financial signal representation and trading”. IEEE Transactions on Neural Networks and Learning Systems. 28(3): 653–664.
[10] François-Lavet, V. 2017. “Contributions to deep reinforcement learning and its applications in smartgrids”. PhD thesis. University of Liege, Belgium.
[11] Tuyen, L. P., A. Layek, N. A. Vien, and T. Chung. 2017. “Deep Reinforcement Learning Algorithms for Steering an Underactuated Ship”. IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems.
[12] Liu, J., L. E. Parker, and R. Madhavan. 2007. “Reinforcement learning for autonomous robotic fish”. Mobile Robots: The Evolutionary Approach.
[13] Fossen, T. I. 2011. Handbook of Marine Craft Hydrodynamics and Motion Control. John Wiley & Sons.
[14] Kamthe, S. and M. P. Deisenroth. 2018. “Data efficient reinforcement learning with probabilistic model predictive control”. arXiv:1706.06491v2.
[15] Cui, Y., S. Osaki, and T. Matsubara. 2019. “Reinforcement Learning Boat Autopilot: A Sample-efficient and Model Predictive Control based Approach”. arXiv:1901.07905v2.
[16] Sinisterra, A. J., A. Barker, S. Verma, and M. R. Dhanak. 2020. “Nonlinear and machine-learning-based station-keeping control of an unmanned surface vehicle”. OMAE2020-19276.
[17] Zhao, L. and M. Roh. 2019. “COLREGs-compliant multiship collision avoidance based on deep reinforcement learning”. Ocean Engineering. 191(1): 106436.
[18] Zhao, C., L. Weng, S. Yu, and H. He. 2020. “Autonomous surface vessel obstacle avoidance based on hierarchical reinforcement learning”. OMAE2020-18454.

