About
One problem in data classification is finding an optimal subset of instances with which to train a classifier. Training sets that represent the characteristics of each class well are more likely to yield a successful predictor. Instance selection techniques remove examples from the data set so that classifiers are built faster and, in some cases, with better accuracy.
SeleSup, an acronym for selection by suppression, is a simple and fast
algorithm (O(n^2)) that reduces the cardinality of training sets by
eliminating irrelevant instances. It mimics the self-regulatory suppression
mechanism found in the immune system: cells unable to neutralize danger tend
to disappear from the organism. By analogy, data not relevant to the learning
of a classifier are eliminated from the training process.
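The suppression idea can be illustrated with a minimal Python sketch. This is not SeleSup's actual implementation; it rests on stated assumptions: a fraction of the instances plays the role of white blood cells (WBCs), the remaining instances act as pathogens, and a WBC survives only if it is the nearest WBC (by Euclidean distance) to at least one pathogen. The function name suppress and its parameters are hypothetical.

```python
import math
import random

def suppress(instances, wbc_fraction=0.9, seed=None):
    """Hypothetical sketch: keep only WBCs that are nearest to some pathogen.

    instances: list of (features, label) pairs, where features is a tuple
    of floats. Returns the surviving WBCs as a list.
    """
    rng = random.Random(seed)
    pool = list(instances)
    rng.shuffle(pool)

    # Assumption: the first part of the shuffled pool acts as WBCs,
    # the remainder as pathogens.
    n_wbc = max(1, int(len(pool) * wbc_fraction))
    wbcs, pathogens = pool[:n_wbc], pool[n_wbc:]

    # Each pathogen marks its closest WBC; unmarked WBCs are suppressed.
    # Comparing every pathogen with every WBC is quadratic in the worst case.
    survivors = set()
    for p_features, _ in pathogens:
        nearest = min(
            range(len(wbcs)),
            key=lambda i: math.dist(wbcs[i][0], p_features),
        )
        survivors.add(nearest)
    return [wbcs[i] for i in sorted(survivors)]
```

Under these assumptions, a larger WBC fraction leaves fewer pathogens to mark survivors, so the reduced set tends to be smaller.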
Running SeleSup
Prerequisites
In order to compile SeleSup you must have installed the following tools:
Compiling
Building using an out-of-source approach is recommended. To do so, within the root selesup directory:
cd build
cmake ..
make
This should leave an executable file called selesup in the build directory.
Usage
Usage: selesup [OPTION] <dataset>
General options:
-f, --wbc-fraction
the fraction of instances used as WBCs [default = 0.9]
-o file, --output-file file
save selected instances in 'file' [default = stdout]
-s <n>, --seed <n>
seed for pseudo-random number generation [default = random]
-r <n>, --random-sampling <n>
do random sampling of size 'n' instead of SeleSup
--shuffle
shuffle the data set on-the-fly before running SeleSup
-v, --verbose
turn on the verbose mode
--version
print SeleSup version and exit
-h, --help
print this help and exit
Example
- Selecting instances from the Iris data set:
./selesup -v --shuffle iris.dsff -o iris.dsff.ss
This command will create a reduced data set, iris.dsff.ss, by selecting instances from iris.dsff (assuming this file is in the current directory). The option --shuffle tells SeleSup to shuffle the contents of the input data set before running the algorithm itself.
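The --random-sampling and --seed options describe a plain random-sampling baseline, which is useful for comparison with SeleSup's selection. Conceptually, a seeded random sample of size n could be sketched as follows. This is an illustration in Python, not SeleSup's code, and the function name random_sample is hypothetical.

```python
import random

def random_sample(instances, n, seed=None):
    """Hypothetical sketch: draw up to n instances uniformly at random.

    A fixed seed makes the draw reproducible; seed=None uses a random seed.
    """
    rng = random.Random(seed)
    pool = list(instances)
    # Never request more instances than the data set contains.
    return rng.sample(pool, min(n, len(pool)))
```

Running a classifier on a random sample of the same size as SeleSup's output gives a simple reference point for judging whether the selection helps.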
License
SeleSup is licensed under the GNU General Public License (GPL) Version 3 (or later), June 2007

