The School of EECS is hosting the following PhD Progress Review 3 Seminar:

AlgorithmOS, a self-organising modular framework for the implementation of algorithmic research, and its applications in implementing probabilistic models for goal-based reinforcement learning

Speaker: Llewyn Salt
Host: A/Prof Marcus Gallagher

Abstract: Reinforcement learning, teaching an artificial agent how to interact with an environment so as to maximise a reward signal, has had many successes in recent years. It has been used to train agents to play Atari games, Go, and StarCraft. These successes have been made possible by the development of new algorithms, such as deep Q-learning, and of new hardware, such as graphics and tensor processing units (GPUs and TPUs). The surge of interest in reinforcement learning has produced a large corpus of algorithms, such as deep Q-learning, deep deterministic policy gradients, proximal policy optimisation, trust region policy optimisation, and soft actor-critic, to name a few. With this increased interest and the rise in the number of algorithms has come growing concern about their reproducibility. Indeed, in the code released with the soft Q-learning paper, one struggles to find the α parameter discussed in the paper. This is not an isolated incident, and the lack of reproducibility in the field has led to the development of a number of reproducibility-minded frameworks such as OpenAI's Spinning Up, Tianshou, Stable Baselines3, RLlib, and DeepMind's Acme. These frameworks tout modularity but often tightly couple components of reinforcement learning algorithms, making them difficult to extend.

The introduction of goals to allow for multimodal policies has been one area of interest. This field typically uses hierarchical reinforcement learning or curriculum learning to break behaviours down into simpler sub-problems before learning the more difficult overarching policy, or meta-policy, much as we learn to walk before we run, or arithmetic before calculus. One open research problem in this field is how to fully automate task creation.

I implement a probabilistic model, built on parametric models such as deep mixture density networks or normalising flows, that suggests goals within the agent's zone of proximal development: Goldilocks goals, so to speak, neither too hard nor too easy, chosen to maximise the information gained during training. This model is tested on a number of common environments, as well as on a single-input single-output direct-current motor simulation, to analyse the accuracy of its probability estimates. By its nature, the model essentially learns the transition function, so the same model can be used as a hierarchical agent that suggests sub-goals to the underlying policy. During the implementation of these ideas it became clear that the current "modular" frameworks for reinforcement learning were too opinionated to accommodate them.
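To make the goal-selection idea concrete, here is a minimal sketch in PyTorch. It is not the code from the thesis; all names, dimensions, and the 50%-success criterion are illustrative assumptions. A mixture density network models the distribution over achieved outcomes given a state and a candidate goal, and candidate goals are ranked by how close their predicted success probability is to 0.5, the Goldilocks zone of intermediate difficulty. (The network is untrained here; in practice it would be fit to transition data by minimising negative log-likelihood.)

    import torch
    import torch.nn as nn

    class GoalMDN(nn.Module):
        """Mixture density network modelling p(achieved outcome | state, goal)."""
        def __init__(self, state_dim, goal_dim, n_components=5, hidden=64):
            super().__init__()
            self.n, self.goal_dim = n_components, goal_dim
            self.body = nn.Sequential(
                nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            # Heads for mixture weights, means, and log standard deviations.
            self.pi = nn.Linear(hidden, self.n)
            self.mu = nn.Linear(hidden, self.n * goal_dim)
            self.log_sigma = nn.Linear(hidden, self.n * goal_dim)

        def forward(self, state, goal):
            h = self.body(torch.cat([state, goal], dim=-1))
            log_pi = torch.log_softmax(self.pi(h), dim=-1)
            mu = self.mu(h).view(-1, self.n, self.goal_dim)
            sigma = self.log_sigma(h).view(-1, self.n, self.goal_dim).exp()
            return log_pi, mu, sigma

    def success_probability(model, state, goals, radius=0.1, n_samples=256):
        """Monte Carlo estimate of P(outcome lands within `radius` of the goal)."""
        log_pi, mu, sigma = model(state.expand(len(goals), -1), goals)
        mix = torch.distributions.MixtureSameFamily(
            torch.distributions.Categorical(logits=log_pi),
            torch.distributions.Independent(
                torch.distributions.Normal(mu, sigma), 1))
        samples = mix.sample((n_samples,))               # (n_samples, n_goals, goal_dim)
        hits = (samples - goals).norm(dim=-1) < radius   # (n_samples, n_goals)
        return hits.float().mean(dim=0)

    # Rank candidate goals: prefer the one nearest 50% predicted success.
    state_dim, goal_dim = 8, 2
    model = GoalMDN(state_dim, goal_dim)
    state = torch.randn(1, state_dim)
    candidates = torch.randn(32, goal_dim)
    p = success_probability(model, state, candidates)
    goldilocks_goal = candidates[(p - 0.5).abs().argmin()]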

The inflexibility of current reinforcement learning libraries has led to the development of a new framework, AlgOS, which is self-organising, unopinionated, and modular. It uses a novel method combining an abstract syntax tree, a heavily modified observer pattern, and threads to control the logical flow of experimental code. The user defines the inputs and outputs of the various components, and the framework handles the rest, as the toy sketch below illustrates. The framework was developed with the intention of being research friendly: it enables reuse of code and lets one inject one's own logic into existing algorithms through input/output intercepts. AlgOS provides core functionality that is then extended specifically for reinforcement learning, but it could be extended to any algorithm that can be modelled as a cyclic or acyclic graph. Additional features, such as a database logger, distributed computing functionality, command-line experiments, and automated hyperparameter optimisation, make the framework user friendly for researchers. I hope this framework can make inroads into the reproducibility crisis in reinforcement learning and enable researchers to focus on the research rather than on implementation details.
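As a purely hypothetical illustration of the self-organising idea (this is not the AlgOS API, and for simplicity the toy handles only the acyclic case, with no observer pattern, intercepts, or threading), each component below declares named inputs and outputs, and the execution order is derived from those declarations alone:

    from graphlib import TopologicalSorter  # Python 3.9+

    class Component:
        """A unit of computation that declares what it consumes and produces."""
        def __init__(self, name, inputs, outputs, fn):
            self.name, self.inputs, self.outputs, self.fn = name, inputs, outputs, fn

    def run(components):
        producers = {out: c for c in components for out in c.outputs}
        # A component depends on whichever components produce its inputs.
        graph = {c.name: {producers[i].name for i in c.inputs if i in producers}
                 for c in components}
        by_name = {c.name: c for c in components}
        values = {}
        for name in TopologicalSorter(graph).static_order():
            c = by_name[name]
            results = c.fn(*[values[i] for i in c.inputs])
            values.update(zip(c.outputs, results))
        return values

    # Wired purely from input/output names: env -> policy -> logger.
    env = Component("env", [], ["obs"], lambda: (0.5,))
    policy = Component("policy", ["obs"], ["action"], lambda o: (o * 2,))
    logger = Component("logger", ["action"], [], lambda a: print("action:", a) or ())
    run([env, policy, logger])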

In summary, I make three main contributions in this thesis. Firstly, AlgOS, a feature-rich, self-organising research framework that enables users to develop their own logic and reuse existing code; its core exploits a novel method to control the logical flow of an algorithm, and it provides features for running code on remote supercomputers, automated hyperparameter optimisation, and a database logger. Secondly, a probabilistic model for goal selection in reinforcement learning, tested on a number of common environments as well as on a single-input single-output direct-current motor simulation to analyse the accuracy of its probability estimates. Thirdly, a hierarchical reinforcement learning agent that uses the probabilistic model to suggest sub-goals to the underlying policy.

About the Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Zoom - https://uqz.zoom.us/j/85110095915