MSc internship/assignment: "Application of an RL autopilot in station-keeping in waves"
For MARIN Academy we are looking for a student for the following MSc internship/assignment:
Borrowing from [1], “reinforcement learning can be defined as learning what to do in which situation, e.g. how to map situations to actions, so as to maximize a reward. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also in the next time step and through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.”
These characteristics have motivated a considerable amount of research over the last few decades. Reinforcement learning (RL) and deep RL (the combination of RL with deep learning techniques) have been used successfully to address challenging sequential decision-making problems. Several notable works using RL in games stand out: attaining super-human performance in playing Atari games from pixels [2], mastering Go [3], and beating the world’s top professionals at poker [4]. RL also has potential for real-world applications such as robotics [5-7], self-driving cars [8], finance [9] and smart grids [10], to name a few.
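The two characteristics quoted above, trial-and-error search and delayed reward, can be made concrete with a very small example. The sketch below is a minimal illustration only, assuming a hypothetical five-cell corridor and tabular Q-learning with epsilon-greedy exploration; the environment, the function names and all parameter values are assumptions made for illustration and are not part of the project.

```python
import random

# Hypothetical toy problem (not from the project): a corridor of 5 cells.
# The agent starts in cell 0 and is rewarded only when it reaches cell 4.
N_STATES = 5
ACTIONS = (-1, +1)          # step left, step right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0   # delayed reward: zero until the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: act randomly with probability epsilon,
            # or when both actions currently look equally good.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # The bootstrapped target passes the delayed reward backwards
            # to the states and actions that led to it.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-goal cell after training.
greedy_policy = [ACTIONS[0] if q[0] > q[1] else ACTIONS[1] for q in q_table[:GOAL]]
print(greedy_policy)
```

The agent is never told that stepping right is correct. It discovers this by trying both actions, and the reward received at the goal is gradually credited to earlier cells through the discount factor `gamma`. Deep RL replaces the table `q` with a neural network so that the same idea scales to continuous states such as ship motions.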
For the motion control of ships and in industrial applications, classical feedback control systems are typically used. In the maritime domain, deep RL is found in path planning applications and in the control of autonomous robotic fish [11-12]. The use of deep RL has gained traction especially for autonomous surface and underwater vehicles and for collision-avoidance applications [16, 17, 18].
Figure 1: 30° heading change in calm water. The top plot shows the yaw angle and the bottom plot shows the rudder angle. The result from the PID controller is shown in black and the result from the RL agent in blue. The red line in the top plot marks the target yaw angle.
Since 2019, we have been working on designing an autopilot with deep RL at MARIN. Initially we considered two problem cases: heading change in calm water and heading keeping in waves. We explored several deep RL methods from the classes of value-based methods and actor-critic methods. To train the RL agents on these problems, we used our in-house ship seakeeping and manoeuvring software as the environment. The ship geometry was a frigate. The deep RL agent showed promising performance (see Figure 1) in comparison with the baseline PID controller we considered.
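To give a feel for the kind of baseline mentioned above, the following is a minimal sketch of a PID autopilot performing a 30° heading change. It does not use MARIN's in-house software or a frigate model. It assumes a first-order Nomoto yaw model, and the gains `kp`, `ki`, `kd`, the model constants `nomoto_k` and `nomoto_t`, and the rudder limit are all hypothetical values chosen for illustration. Positive rudder is taken to produce positive yaw rate, which is a simplification.

```python
def simulate_heading_change(target_deg=30.0, t_end=300.0, dt=0.1,
                            kp=2.0, ki=0.05, kd=10.0,
                            nomoto_k=0.2, nomoto_t=10.0, rudder_max=35.0):
    """Simulate a PID autopilot on a first-order Nomoto yaw model.

    Assumed model: nomoto_t * r_dot + r = nomoto_k * rudder, psi_dot = r,
    with psi the yaw angle [deg] and r the yaw rate [deg/s].
    Returns a list of (time [s], yaw angle [deg], rudder angle [deg]).
    """
    psi, r = 0.0, 0.0
    integral = 0.0
    prev_error = target_deg - psi
    history = []
    for i in range(int(round(t_end / dt))):
        error = target_deg - psi
        derivative = (error - prev_error) / dt
        command = kp * error + ki * integral + kd * derivative
        # The rudder cannot exceed its mechanical limit.
        rudder = max(-rudder_max, min(rudder_max, command))
        # Simple anti-windup: integrate only while the rudder is not saturated.
        if rudder == command:
            integral += error * dt
        prev_error = error
        history.append((i * dt, psi, rudder))
        # Euler step: update the yaw rate first, then the yaw angle.
        r += dt * (nomoto_k * rudder - r) / nomoto_t
        psi += dt * r
    return history


history = simulate_heading_change()
final_time, final_yaw, final_rudder = history[-1]
max_yaw = max(yaw for _, yaw, _ in history)
print(f"final yaw: {final_yaw:.2f} deg, peak yaw: {max_yaw:.2f} deg")
```

In this picture an RL agent would replace the fixed PID law with a learned policy that maps the same kind of observations (heading error, yaw rate) to a rudder command, trained by interacting with the simulation environment rather than tuned by hand.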
We are currently looking for a student who will apply the RL autopilot to the problem of station-keeping in waves using different types of ships. The details of the MSc project are given below.
The goal of this project is to train an RL agent on a station-keeping problem using different types of ships in order to observe its generalization capability.
The main motivation behind this project is to observe the generalization capability of the RL agent. This will be investigated in two aspects:
The scope of this MSc project is as follows:
For this project we are looking for candidates with the following profile:
The duration of the project is a minimum of 6 months. The start date and the precise duration of the project will be determined in consultation with the supervisors.
During the MSc project the student will be connected with the R&D department of MARIN. The supervisors are Bulent Duz, Bart Mak and Douwe Rijpkema, all of whom work as researchers at the R&D department.
In order to apply, visit https://www.marin.nl/internships and click on the link for this project. For information about the project, contact Bulent Duz at firstname.lastname@example.org or Bart Mak at email@example.com.
[1] Sutton, R. and A. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.
[2] Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. 2015. “Human-level control through deep reinforcement learning”. Nature. 518(7540): 529–533.
[3] Silver, D., A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. 2016. “Mastering the game of Go with deep neural networks and tree search”. Nature. 529(7587): 484–489.
[4] Brown, N. and T. Sandholm. 2017. “Libratus: The Superhuman AI for No-Limit Poker”. International Joint Conference on Artificial Intelligence (IJCAI-17).
[5] Levine, S., C. Finn, T. Darrell, and P. Abbeel. 2016. “End-to-end training of deep visuomotor policies”. Journal of Machine Learning Research. 17(39): 1–40.
[6] Gandhi, D., L. Pinto, and A. Gupta. 2017. “Learning to Fly by Crashing”. arXiv preprint arXiv:1704.05588.
[7] Pinto, L., M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. 2017. “Asymmetric Actor Critic for Image-Based Robot Learning”. arXiv preprint arXiv:1710.06542.
[8] You, Y., X. Pan, Z. Wang, and C. Lu. 2017. “Virtual to Real Reinforcement Learning for Autonomous Driving”. arXiv preprint arXiv:1704.03952.
[9] Deng, Y., F. Bao, Y. Kong, Z. Ren, and Q. Dai. 2017. “Deep direct reinforcement learning for financial signal representation and trading”. IEEE Transactions on Neural Networks and Learning Systems. 28(3): 653–664.
[10] François-Lavet, V. 2017. “Contributions to deep reinforcement learning and its applications in smartgrids”. PhD thesis. University of Liege, Belgium.
[11] Tuyen, L. P., A. Layek, N. A. Vien, and T. Chung. 2017. “Deep Reinforcement Learning Algorithms for Steering an Underactuated Ship”. IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems.
[12] Liu, J., L. E. Parker, and R. Madhavan. 2007. “Reinforcement learning for autonomous robotic fish”. Mobile Robots: The Evolutionary Approach.
[13] Fossen, T. I. 2011. Handbook of Marine Craft Hydrodynamics and Motion Control. John Wiley & Sons.
[14] Kamthe, S. and M. P. Deisenroth. 2018. “Data efficient reinforcement learning with probabilistic model predictive control”. arXiv preprint arXiv:1706.06491v2.
[15] Cui, Y., S. Osaki, and T. Matsubara. 2019. “Reinforcement Learning Boat Autopilot: A Sample-efficient and Model Predictive Control based Approach”. arXiv preprint arXiv:1901.07905v2.
[16] Sinisterra, A. J., A. Barker, S. Verma, and M. R. Dhanak. 2020. “Nonlinear and machine-learning-based station-keeping control of an unmanned surface vehicle”. OMAE2020-19276.
[17] Zhao, L. and M. Roh. 2019. “COLREGs-compliant multiship collision avoidance based on deep reinforcement learning”. Ocean Engineering. 191(1): 106436.
[18] Zhao, C., L. Weng, S. Yu, and H. He. 2020. “Autonomous surface vessel obstacle avoidance based on hierarchical reinforcement learning”. OMAE2020-18454.