brozonoyer/Parameter-Clusterization
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
Extremal grouping program works in 2 steps:
STEP1. Build membership of features to groups and save information necessary for calculation of extremal factors.
STEP2. Calculate value of extremal group factors for new data not used in the first step.
This 2 steps procedure is required because we want to calculate extremal factors for new datasets.
It is not sufficient to calculate membership and value of factors on one dataset only.
INTERFACE:
Input data for STEP1:
1. Input training CSV-file
2. Number of extremal groups
3. Number of eigenvalues to use in calculation of functional's value (c.f. Braverman)
Output data for STEP1:
1. Memberships to groups
2. Mean and StDev values of features in input CSV-file
3. Weights of features in groups
4. Pickled MODEL storing the factors for the groupings
Arguments for program:
algorithm.py --step
--input_filename
--output_filename
--model_pickle_file
--num_groups
--eigenvalues
Example of possible command line to run the program:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
python algorithm.py 1 Input/m255_train.csv output model 3 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
where output file output will contain the following information:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ExtremalGroupsInfo
Built on features
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67
nParams,67
nGroups,3
inGroup, group number starts from 1
2,3,3,1,1,1,2,2,3,1,1,1,1,2,2,2,3,3,3,2,2,2,2,3,3,3,2,2,2,2,3,3,3,1,1,1,1,3,1,1,1,1,1,1,1,1,2,3,3,3,3,2,2,2,3,3,3,3,2,2,2,3,3,1,1,1,2
Weights
-0.186458,-0.0816229,-0.064139,0.16274,0.180544,0.163993,-0.128864,-0.086013,-0.0308409,0.0306928,0.134885,0.104327,0.106939,-0.0859965,0.0646137,0.0938427,0.0742317,0.122684,0.121096,0.115043,0.167587,0.20639,0.161277,0.137589,0.226445,0.191443,0.147012,0.17435,0.155488,0.0732077,0.150944,0.20764,0.067144,-0.157152,-0.118166,-0.103356,-0.106509,0.110805,-0.206371,-0.232019,-0.179604,-0.167401,-0.0723795,-0.0851955,-0.0487207,-0.130955,-0.0812782,-0.117166,-0.105671,-0.139035,-0.0974891,-0.146558,-0.185041,-0.152154,-0.152799,-0.13035,-0.137212,-0.0572567,-0.180454,-0.194798,-0.129724,-0.130205,-0.123183,0.132646,0.123184,0.122309,-0.0989264
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Input data for STEP2:
1. Input test CSV-file
2. The (pickled) MODEL from training that stores the factors for extremal groupings
NOTE: the "num_groups" and "eigenvalues" parameters are not provided for step 2
Output data:
1. Value of factors for test CSV-file
Example of possible command line for program:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
algorithm.py 2 test.csv model output.csv
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
where output will be the output.csv file containing the factors for the newly grouped parameters