propeller/doc/downsample.md
Ryan Boldi e3ef43e95a overhaul to down-sampling. Separated selection from the down-sampling type.
Also, added to the docs to help startup faster
2022-12-12 11:53:14 -05:00

3.0 KiB

Downsampling the Training Data

Downsampling is a very simple way to improve the efficiency of your evolutionary runs. It might allow for deeper evolutionary searches and a greater success rate.

Using Downsampled selection with propeller is easy:

Set the :parent-selection argument to whichever selection strategy you would like, and set the :downsample? argument to true as follows:

lein run -m propeller.problems.simple-regression :parent-selection :lexicase :downsample? true

The number of evaluations is held constant when comparing to a full training set run, so set the :max-generations to a number of generations that you would have gone to using a full sample.

Downsample Functions

In this repository, you have access to 3 different downsampling functions. These are the methods used to take a down-sample from the entire training set.

To use them, add the argument :ds-function followed by which function you would like to us

The list is

  • :case-maxmin - This is the method used for informed down-sampled lexicase selection
  • :case-maxmin-auto - This method automatically determines the downsample size
  • :case-rand- Random Sampling

Using :case-maxmin:

In order to use regular informed down-sampled selection, you must specify a few things:

  • :downsample-rate- This is the r parameter: what proportion of the full sample should be in the down-sample \in [0,1]
  • :ds-parent-rate - This is the \rho parameter: what proportion of parents are used to evaluate case distances \in [0,1]
  • :ds-parent-gens - This is the k parameter: How many generations in between parent evaluations for distances \in \{1,2,3, \dots\}

Using :case-maxmin-auto:

In order to use automatic informed down-sampled selection, you must specify a few things:

  • :case-delta - This is the \Delta parameter: How close can the farthest case be from its closest case before we stop adding to the down-sample
  • :ids-type - Either :elite or :solved - Specifies whether we are using elite/not-elite or solved/not-solved as our binary-fication of case solve vectors.
  • :ds-parent-rate - This is the \rho parameter: what proportion of parents are used to evaluate case distances \in [0,1]
  • :ds-parent-gens - This is the k parameter: How many generations in between parent evaluations for distances \in \{1,2,3, \dots\}

Using :case-rand:

In order to use regular randomly down-sampled selection, you must specify a few things:

  • :downsample-rate- This is the r parameter: what proportion of the full sample should be in the down-sample \in [0,1]

Here's an example of running informed downsampled lexicase selection with r=0.1, \rho=0.01 and k=100 on the simple classification problem:

lein run -m propeller.problems.simple-classification :parent-selection :lexicase :downsample? true :ds-function :case-maxmin :downsample-rate 0.1 :max-generations 300 :ds-parent-rate 0.01 :ds-parent-gens 100