Dave Cliff, Susi Ross
In a recent paper, Wilson (1994b) described a 'zeroth-level' classifier system (ZCS). ZCS employs a reinforcement learning technique comparable to Q-Learning (Watkins, 1989). This paper presents results from the first reconstruction of ZCS. Having replicated Wilson's results, we extend ZCS in a manner suggested by Wilson: the original formulation of ZCS has no memory mechanisms, but Wilson (1994b) suggested how internal 'temporary memory' registers could be added. We show results from adding one-bit and two-bit memory registers to ZCS. Our results demonstrate that ZCS can efficiently exploit memory facilities in non-Markov environments. We also show that the memoryless ZCS can converge on near-optimal stochastic solutions in non-Markov environments. Following the discussion of adding memory, we present results from trials using ZCS in Markov environments requiring increasingly long chains of actions before reward is received. Our results indicate that inaccurate over-general classifiers can interact with the classifier-generation mechanisms to cause catastrophic breakdowns in overall system performance. Basing classifier fitness on accuracy may alleviate this problem. We conclude that the memory mechanism in its current form is unlikely to scale well for situations requiring large amounts of temporary memory. Nevertheless, the ability to find stochastic solutions when there is insufficient memory might offset this problem to some extent.
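The core idea behind adding internal memory registers can be illustrated with a minimal sketch. The following is not Wilson's ZCS itself (ZCS maintains a classifier population with genetic operators); it is a simplified tabular learner, on a hypothetical two-step aliased task invented for illustration, showing why augmenting the perceived state with a one-bit internal register lets a Q-Learning-style update succeed where a memoryless learner cannot:

```python
import random

# Hypothetical non-Markov toy task (for illustration only): the agent
# first sees a cue, 'A' or 'B', then an aliased observation 'X' where
# the rewarded action depends on the earlier, no-longer-visible cue.
# A memoryless learner sees identical states and cannot do better than
# chance; copying the cue into a one-bit internal register (in the
# spirit of Wilson's proposed 'temporary memory') disambiguates them.

def run(episodes, use_memory, seed=0):
    rng = random.Random(seed)
    q = {}                      # Q-values keyed by (perceived_state, action)
    alpha, eps = 0.2, 0.1       # learning rate and exploration rate
    total = 0.0
    for _ in range(episodes):
        cue = rng.choice('AB')
        mem = cue if use_memory else ''   # internal register set at the cue
        state = ('X', mem)                # aliased observation + memory bit
        # epsilon-greedy choice between actions 0 and 1
        if rng.random() < eps:
            a = rng.choice((0, 1))
        else:
            a = max((0, 1), key=lambda x: q.get((state, x), 0.0))
        # action 0 is correct after cue 'A', action 1 after cue 'B'
        r = 1.0 if a == (0 if cue == 'A' else 1) else 0.0
        key = (state, a)
        q[key] = q.get(key, 0.0) + alpha * (r - q.get(key, 0.0))
        total += r
    return total / episodes
```

With the memory bit the learner approaches the optimal reward rate (limited only by its exploration), while without it the aliased observation caps average reward near the 50% chance level; the paper's stochastic-solution results concern exactly this memoryless regime.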