How To

The way this works is you provide a train file containing samples with known results. Then you provide a test file in the same format containing samples with unknown results, and the program fills in the missing results. The short version: column 1 is the label for the row, column 2 is the value being predicted, and column 3 onward is known data about that row.
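To make that column layout concrete, here is a minimal Python sketch that splits each row into its label, the value being predicted, and the known data. It is only an illustration, not the program's own code: the names read_rows and to_number are made up for this example, and turning blank cells into None is an assumption about how missing values could be represented. The sample is a shortened version of the runner data shown further down, with every column treated as numerical.

```python
import csv
import io


def to_number(cell):
    """Return the cell as a float, or None when the cell is blank (assumed convention)."""
    cell = cell.strip()
    return float(cell) if cell else None


def read_rows(text):
    """Parse CSV text into a header and a list of (label, target, features) rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                 # first line holds the column labels
    rows = []
    for fields in reader:
        label = fields[0]                 # column 1: label for the row
        target = to_number(fields[1])     # column 2: value being predicted
        features = [to_number(c) for c in fields[2:]]  # column 3 onward: known data
        rows.append((label, target, features))
    return header, rows


sample = (
    "Runner's Name,Distance until exhaustion,Sex,Height,weight\n"
    "Bob,14,0,6.05,210\n"
    "Sally,27,1,5.5,\n"
)
header, rows = read_rows(sample)
for label, target, features in rows:
    print(label, target, features)
```

In a test file the second column is blank, so target would come back as None for every row; that is the value the program fills in.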

What about categorical columns, such as house type ('Ranch', 'Villa', 'Igloo') or baseball league ('National', 'American')? The program assumes all columns are numerical. If you want the values in a column to be handled as categories (even if they ARE numbers), prefix the column label with 'cat-'. Every unique value in that column is then handled as its own category. Values are case sensitive, so "AA" does not match "aa".
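The 'cat-' rule can be sketched in a few lines of Python. This is a hypothetical illustration, not the program's implementation: column_kinds and distinct_categories are names invented here, the sketch assumes the prefix itself must be written exactly as lowercase 'cat-', and it assumes a blank cell means "missing" rather than a category of its own.

```python
CAT_PREFIX = "cat-"  # assumed to be matched exactly as written


def column_kinds(header):
    """Map each data column (column 3 onward) to 'categorical' or 'numerical'."""
    return {
        name: "categorical" if name.startswith(CAT_PREFIX) else "numerical"
        for name in header[2:]
    }


def distinct_categories(values):
    """Collect the unique non-blank values; comparison is case sensitive."""
    return sorted({v for v in values if v != ""})


header = ["Runner's Name", "Distance until exhaustion", "cat-Sex", "Height", "cat-class"]
kinds = column_kinds(header)
print(kinds)

# "AA" and "aa" stay separate categories because no case folding is applied.
cats = distinct_categories(["AA", "aa", "AA", ""])
print(cats)
```

Note that a column such as cat-Sex holds the numbers 0 and 1, but with the prefix those are treated as two category names rather than as quantities.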

File Format

Here are the specifics spelled out a little more succinctly.

First, some samples without a category column.

Example train file:
Runner's Name,Distance until exhaustion,Sex,Height,weight,Age,Months training,Marathons run,Average pace
Bob,14,0,6.05,210,40,36,2,6
Sally,27,1,5.5,,27,14,8,6.5
Joe,21,0,5.75,160,33,8,0,7
Gus,10,0,,190,22,4,0,5.5
Tina,14,1,5.8,125,30,60,1,6.1
Kim,3,1,5.5,130,37,1,0,5.25
Kelly,50,1,5.6,120,45,,12,6.2
James,20,0,5.8,200,42,60,3,6.3
Bob,15,0,5.8,150,70,240,3,5.5
Jerry,50,0,5.9,170,28,24,3,7.1
Example test file:
Runner's Name,Distance until exhaustion,Sex,Height,weight,Age,Months training,Marathons run,Average pace
Runner 1,,0,6.1,215,,6,0,6
Runner 2,,1,5.5,105,27,14,1,6.5
Runner 3,,0,5.25,135,28,18,2,7.2
Runner 4,,,5.5,155,,17,1,
Runner 5,,1,5,105,22,,0,8
Runner 6,,,,,30,9,1,7.1
Runner 7,,1,5.2,,,,,4
Runner 8,,0,6,240,48,6,0,5

Next, the same data but with two category columns.

Example train file:
Runner's Name,Distance until exhaustion,cat-Sex,Height,weight,Age,Months training,Marathons run,Average pace,cat-class
Bob,14,0,6.05,210,40,36,2,6,B
Sally,27,1,5.5,,27,14,8,6.5,A
Joe,21,0,5.75,160,33,8,0,7,B
Gus,10,0,,190,22,4,0,5.5,B
Tina,14,1,5.8,125,30,60,1,6.1,B
Kim,3,1,5.5,130,37,1,0,5.25,B
Kelly,50,1,5.6,120,45,,12,6.2,A
James,20,0,5.8,200,42,60,3,6.3,A
Bob,15,0,5.8,150,70,240,3,5.5,B
Jerry,50,0,5.9,170,28,24,3,7.1,
Example test file:
Runner's Name,Distance until exhaustion,cat-Sex,Height,weight,Age,Months training,Marathons run,Average pace,cat-class
Runner 1,,0,6.1,215,,6,0,6,A
Runner 2,,1,5.5,105,27,14,1,6.5,A
Runner 3,,0,5.25,135,28,18,2,7.2,
Runner 4,,,5.5,155,,17,1,,A
Runner 5,,1,5,105,22,,0,8,B
Runner 6,,,,,30,9,1,7.1,B
Runner 7,,1,5.2,,,,,4,A
Runner 8,,0,6,240,48,6,0,5,B
Questions: js.project.106@gmail.com