Chris Thornton
The paper considers the situation in which a learner's testing set contains close approximations of cases that appear in the training set. Such cases can be considered 'virtual seens', since the learner has approximately seen them already. Generalisation measures that do not account for the frequency of virtual seens may be misleading. The paper shows that the 1-NN algorithm can be used to derive a normalising baseline for generalisation statistics. The normalisation process is demonstrated by applying it to Holte's [1993] study, in which the generalisation performance of the 1R algorithm was tested against C4.5 on 16 commonly used datasets.
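
As a rough illustration of the idea, the following Python sketch computes the accuracy of a 1-NN classifier on a test set and uses it as a baseline against which another learner's accuracy can be read. It is a minimal sketch under stated assumptions, not the paper's procedure: the Euclidean distance, the function names, the toy data, and in particular the choice to normalise by subtracting the baseline are all hypothetical choices made for the example.

```python
import math

def nearest_neighbour_label(train, x):
    """Return the label of the training case closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda case: math.dist(case[0], x))
    return label

def one_nn_accuracy(train, test):
    """Fraction of test cases whose nearest training case carries the same label."""
    hits = sum(nearest_neighbour_label(train, x) == y for x, y in test)
    return hits / len(test)

def normalised_score(learner_accuracy, baseline_accuracy):
    """Assumed normalisation: learner accuracy minus the 1-NN baseline."""
    return learner_accuracy - baseline_accuracy

# Toy data (hypothetical): each case is (feature vector, class label).
train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.0, 1.0), "a")]
test = [((0.1, 0.0), "a"), ((0.9, 1.0), "b"), ((0.6, 0.6), "b"), ((0.1, 0.9), "b")]

# A high baseline suggests many test cases are near-copies of training cases.
baseline = one_nn_accuracy(train, test)
```

On this toy data the first three test cases lie close to a training case of the same class, so 1-NN classifies them correctly, and the baseline is high. A learner's raw accuracy on such a test set would therefore overstate how far it generalises beyond cases it has approximately seen.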