The School of ITEE is hosting the following thesis review seminar:

Parsimony and Performance in Rule-Based Evolutionary Reinforcement Learning

Speaker: Mr Jordan Bishop
Host: A/Prof Marcus Gallagher

Abstract: Reinforcement learning (RL) has recently seen a surge in interest, due to the human-level performance achieved by various deep neural network models. However, such models can be highly complex, often containing millions of parameters, earning them a reputation as “black boxes” that are not easily explainable or interpretable. In tandem with the successes of these models, increased attention within the broader machine learning space has been devoted to eXplainable Artificial Intelligence (XAI) models that can be understood by human users. In contrast to the complex, non-separable models constructed by deep neural networks, rule-based evolutionary reinforcement learning (ERL) systems have an innate capacity to address the concerns of XAI within RL, owing to their use of symbolic, separable rulesets that form behavioural policies. Learning Classifier Systems (LCSs) are the predominant family of rule-based ERL systems, with Genetic Fuzzy Systems (GFSs) being a related sub-family. Both families employ two main styles of population-based evolutionary learning to construct rulesets: Michigan (rule-per-individual) and Pittsburgh (ruleset-per-individual).
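
Purely as an illustration (this sketch is not drawn from the thesis), the minimal Python snippet below contrasts the two representation styles; the interval-based Rule class, the feature bounds, and the example populations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    """Hypothetical interval-based rule: IF state lies within [lower, upper] THEN take action."""
    lower: List[float]
    upper: List[float]
    action: int

    def matches(self, state: List[float]) -> bool:
        return all(lo <= s <= hi for lo, s, hi in zip(self.lower, state, self.upper))

# Michigan style: each individual in the evolving population is a single rule;
# the population as a whole forms one ruleset (policy).
michigan_population: List[Rule] = [
    Rule(lower=[0.0, 0.0], upper=[0.5, 1.0], action=0),
    Rule(lower=[0.5, 0.0], upper=[1.0, 1.0], action=1),
]

# Pittsburgh style: each individual is a complete ruleset, i.e. a candidate
# policy in its own right, and the population is a set of competing rulesets.
pittsburgh_population: List[List[Rule]] = [
    [Rule([0.0, 0.0], [1.0, 0.5], 0), Rule([0.0, 0.5], [1.0, 1.0], 1)],
    [Rule([0.0, 0.0], [0.3, 1.0], 1)],
]

if __name__ == "__main__":
    state = [0.2, 0.7]
    print([r.matches(state) for r in michigan_population])       # [True, False]
    print([r.matches(state) for r in pittsburgh_population[0]])  # [False, True]
```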

The overarching theme of this thesis is to measure the relationship between ruleset parsimony and performance for both Michigan and Pittsburgh systems in RL domains. Broadly, the thesis contributions are motivated by the lack of prior attention given to Pittsburgh-style systems in RL domains, coupled with the scarcity of investigations into parsimony methods for both styles of system in such domains. The parsimony of a ruleset is related to its complexity, which is in turn linked to the principle of Occam’s Razor employed in wider machine learning. Unlike the vaguer XAI notions of interpretability and explainability, parsimony is a concretely definable, and therefore measurable, quantity. The performance achievable by rulesets is inherently related to their parsimony, since more complex rulesets have the capacity to achieve higher performance.

The thesis work is divided into three avenues of inquiry. Firstly, post-hoc compaction of Michigan rulesets in discrete-state RL domains is considered, in order to increase parsimony by discarding low-quality rules after training. This is motivated by the lack of work applying such compaction algorithms in RL domains. Results indicate that, as in supervised learning domains, compaction is effective at substantially reducing ruleset sizes while having minimal effect on performance. Secondly, a novel Pittsburgh GFS is applied to a continuous-state RL domain, addressing the parsimony vs. performance relationship via multiobjective optimisation. This is motivated by both the general lack of work considering Pittsburgh systems in RL domains and the suitability of Pittsburgh (rather than Michigan) systems for multiobjective optimisation. Results show that this Pittsburgh GFS is capable of identifying maximally parsimonious rulesets (in terms of number of rules) for differing levels of achievable performance. Lastly, the third avenue of inquiry compares the parsimony and performance achieved by Michigan and Pittsburgh systems across discrete-state RL domains of varying difficulty, in terms of both action chain length and environmental noise. These comparisons are motivated by the absence of previous comparisons between the two types of system in RL. As part of this, Monte Carlo learning mechanisms from a prior Pittsburgh LCS are re-investigated within a novel strength-based Pittsburgh LCS. Results show that Pittsburgh systems can outperform Michigan systems in more difficult domains, while also exhibiting higher levels of parsimony.
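
As a rough sketch of two of the ideas above, rather than the thesis's actual algorithms: post-hoc compaction can be pictured as filtering out rules that fall below some per-rule quality score, and the parsimony vs. performance trade-off can be pictured as extracting a Pareto front over (ruleset size, return) pairs. The quality scores, threshold, and candidate values below are invented purely for illustration.

```python
from typing import List, Tuple

# Illustrative only: a trained Michigan ruleset as (rule_id, quality) pairs, where
# "quality" stands in for whichever per-rule statistic a compaction algorithm
# ranks on (e.g. fitness, experience, prediction accuracy).
TrainedRule = Tuple[str, float]

def compact_ruleset(rules: List[TrainedRule], quality_threshold: float) -> List[TrainedRule]:
    """Post-hoc compaction: keep only rules whose quality meets the threshold."""
    return [(rid, q) for rid, q in rules if q >= quality_threshold]

# Illustrative only: candidate Pittsburgh rulesets scored on two objectives,
# parsimony (fewer rules is better) and performance (higher return is better).
Candidate = Tuple[int, float]  # (num_rules, mean_return)

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates on both objectives."""
    front = []
    for size, ret in candidates:
        dominated = any(
            s <= size and r >= ret and (s < size or r > ret)
            for s, r in candidates
        )
        if not dominated:
            front.append((size, ret))
    return front

if __name__ == "__main__":
    trained = [("r1", 0.92), ("r2", 0.15), ("r3", 0.78), ("r4", 0.05)]
    print(compact_ruleset(trained, quality_threshold=0.5))         # keeps r1 and r3
    print(pareto_front([(2, 0.6), (5, 0.9), (7, 0.9), (3, 0.4)]))  # [(2, 0.6), (5, 0.9)]
```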

Overall, the results demonstrate the capability of Pittsburgh rule-based ERL systems to support the concerns of XAI within RL domains, as evidenced by their ability to construct parsimonious, high-performing rulesets. This signals the value of continued investigation of Pittsburgh systems in future work to better understand their full potential, both by conducting further comparisons with Michigan systems and by generalising the results to other domains.

Speaker Biography: Jordan Bishop is a final-year PhD candidate in the School of ITEE, primarily supervised by A/Prof. Marcus Gallagher (UQ) alongside Associate Prof. Will Browne (QUT). He received his BEng (Hons. I) from UQ in 2016. His main research interests are in rule-based systems, evolutionary computation, and explainable AI.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Online via Zoom https://uqz.zoom.us/j/3887143903