The School of ITEE is hosting the following PhD progress review 3 seminar

Fitness Landscape Features as Curriculum Ordering Measures for Reinforcement Learning

Speaker: Nathaniel du Preez-Wilkinson
Host: A/Prof Marcus Gallagher

Abstract: Reinforcement learning is a machine learning paradigm in which an agent interacts with an environment by taking actions, receives rewards in return, and learns to optimise the sum of those rewards via trial and error. Defining the function that determines when rewards are given in a reinforcement learning problem is a non-trivial task. Reward functions that are rich in information, encoding some heuristic about how to solve the problem, can often promote undesirable behaviour as an unintended consequence. On the other hand, reward functions that are sparse, potentially providing reward only when a final goal is achieved, are more likely to promote the desired behaviour, but provide little information to assist the agent during learning.
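
To make this trade-off concrete, a minimal sketch in Python (illustrative only, not code from the thesis): a one-dimensional chain world in which the goal position and both reward definitions are hypothetical.

    # Toy illustration only: a 1-D chain world where the agent's state
    # is an integer position and the goal is position 10. All names
    # here are hypothetical, not taken from the thesis.

    GOAL = 10

    def sparse_reward(position: int) -> float:
        # Reward only when the final goal is achieved: unlikely to
        # promote unintended behaviour, but gives the agent almost no
        # information to learn from before it first reaches the goal.
        return 1.0 if position == GOAL else 0.0

    def shaped_reward(position: int) -> float:
        # Information-rich heuristic reward: penalise distance to the
        # goal. Easier to learn from, but such heuristics can promote
        # undesirable behaviour, e.g. exploiting the shaping signal
        # rather than actually finishing the task.
        return -abs(GOAL - position)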

Curriculum learning is one method for addressing the problem of sparse rewards in reinforcement learning. It is a machine learning paradigm which suggests that agents should be trained on multiple problem instances of increasing difficulty. Previous results have demonstrated improved performance on a single target problem by using a curriculum over multiple problems, when compared to spending the same amount of training time solely on the target problem. A key drawback to curriculum learning, however, is that the notion of “difficulty” is left undefined. This presents a challenge for machine learning practitioners interested in using the technique, and may cause reproducibility and consistency issues as researchers heuristically determine difficulty orderings on a case-by-case basis.
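
As a minimal sketch of a static curriculum (everything here is illustrative: the environment is the chain world above, and the difficulty measure is simply chain length, which is an assumption made for the example rather than a measure from the thesis):

    # Tabular Q-learning on 1-D chain worlds, trained on instances of
    # increasing length before the target. Choosing the difficulty
    # measure (here, naively, chain length) is the open problem the
    # thesis addresses.
    import random

    def train_q(q, length, episodes=300, alpha=0.5, gamma=0.95, eps=0.3):
        # Sparse reward: 1 only on reaching the rightmost state.
        for _ in range(episodes):
            s, steps = 0, 0
            while s != length and steps < 20 * length:
                if random.random() < eps:
                    a = random.choice((-1, 1))
                else:
                    a = max((1, -1), key=lambda m: q.get((s, m), 0.0))
                s2 = min(max(s + a, 0), length)
                r = 1.0 if s2 == length else 0.0
                best_next = max(q.get((s2, m), 0.0) for m in (-1, 1))
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + alpha * (r + gamma * best_next - old)
                s, steps = s2, steps + 1

    q = {}                             # Q-table shared across the curriculum
    for length in sorted([3, 6, 10]):  # easy -> hard; 10 is the target task
        train_q(q, length)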

Fitness landscape analysis is a field of research concerned with defining and measuring quantifiable features of optimisation problems, in order to categorise them and determine their relative difficulties. The most commonly mentioned practical application for this field is algorithm selection: selecting which algorithm to use to solve a given problem, based on the characteristics of the problem. The natural inverse of this would be selecting a problem to solve for a given algorithm, which, if repeated multiple times, is essentially the problem of determining an ordering for a curriculum. This thesis combines the three fields of reinforcement learning, curriculum learning, and fitness landscape analysis for the first time. At a high level, curriculum learning is used to address the issue of sparse rewards in reinforcement learning, and features from the fitness landscape literature are used to formally define ordering measures for curriculum learning. Two types of curricula are considered: curricula over starting states of the environment, and curricula over reward functions. Several different problems are studied, in conjunction with two different methods for encoding the fitness landscapes of the problems. Theoretical and experimental results are presented for both static curriculum learning (where features are calculated and the curriculum is fixed prior to training) and dynamic curriculum learning (where feature estimates and problem orderings are determined online).
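
By way of illustration, a classic landscape feature such as fitness-distance correlation could be estimated per problem and used directly as a sorting key. The following is a hedged sketch only: the problems are toy stand-ins, and the thesis itself studies other features and RL-specific landscape encodings.

    # Order candidate problems by an estimated fitness landscape feature.
    # Fitness-distance correlation (FDC) is used purely as a familiar
    # example. Requires Python 3.10+ for statistics.correlation.
    import random
    import statistics

    def fdc(fitness, dist_to_opt, samples):
        # Correlation between sampled fitness and distance to a known
        # optimum; for maximisation, values near -1 suggest a smooth,
        # globally informative (i.e. "easy") landscape.
        return statistics.correlation(
            [fitness(x) for x in samples],
            [dist_to_opt(x) for x in samples])

    smooth = lambda x: -x * x                         # unimodal
    rugged = lambda x: -x * x + 40 * random.random()  # noisy variant
    dist_to_zero = abs                                # optimum at x = 0

    samples = [random.uniform(-10, 10) for _ in range(500)]
    problems = {"smooth": smooth, "rugged": rugged}

    # Curriculum order: most negative FDC (easiest) first.
    order = sorted(problems,
                   key=lambda p: fdc(problems[p], dist_to_zero, samples))
    print(order)   # typically ['smooth', 'rugged']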

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Zoom link: https://uqz.zoom.us/j/81221141936