Optimize and Samples


This page lets you try various settings using only the training data. The samples below let you run some processing right away if you just want to see it work. Processing can take upwards of 3 minutes if the file is large enough.

After processing finishes, the page scores the final results and tells you how well the model fits the training data with the settings you selected. The score is the root mean squared error (RMSE), which measures roughly how far off the predictions typically are, in the same units as the column being predicted, with large misses weighted more heavily than small ones. You want the RMSE as small as possible. To get there, either adjust the settings yourself and compare the results, or click the "Find Best" button and let the page try all the settings for you. Note: while "Find Best" is running, the results are only displayed on screen and are not downloaded.
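RMSE is the square root of the average squared difference between each actual value and its prediction. A minimal Python sketch of that formula (the function name `rmse` is just for illustration, not the page's own code):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean of the squared differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Three perfect predictions and one that is off by 2.
print(rmse([1, 2, 3, 4], [1, 2, 3, 6]))  # 1.0
```

Squaring before averaging is why a single large miss moves the RMSE so much: here one error of 2 spread over four rows still gives an RMSE of 1.0.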

There are two settings you can adjust to hone the results: depth and gradient.

  • The program builds a series of yes/no decisions to decide how to split up the data. You control how many of these decisions it makes in a row; this is the depth.
  • Once the maximum depth is reached, each group of rows isolated by the yes/no decisions has its training results averaged together. That average is the best guess for any row in that group. We don't want to overcommit to that answer, so we take only a portion of it. The percentage we take is the gradient.
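To make the gradient step concrete, here is a small worked example in Python for a single group (leaf) of rows. It assumes, as in standard gradient boosting with squared error, that the values being averaged are the residuals (actual training result minus the current prediction); in that setting the "gradient" setting plays the role usually called the learning rate or shrinkage. The function name `leaf_step` and the numbers are hypothetical.

```python
def leaf_step(residuals, gradient):
    """Average the residuals in one leaf and keep only a fraction (the gradient) of that average."""
    average = sum(residuals) / len(residuals)
    return gradient * average

# Hypothetical leaf: three rows isolated by the yes/no decisions.
targets = [14.0, 16.0, 18.0]      # actual training results for those rows
predictions = [10.0, 10.0, 10.0]  # current predictions for those rows
residuals = [t - p for t, p in zip(targets, predictions)]  # [4.0, 6.0, 8.0]

step = leaf_step(residuals, gradient=0.25)  # 0.25 * 6.0 = 1.5
predictions = [p + step for p in predictions]
print(predictions)  # [11.5, 11.5, 11.5]
```

Taking only a quarter of the average moves the predictions part of the way toward the targets; later rounds close the remaining gap.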
These steps are repeated until no improvement is found or until you hit the cap I have in place. The ideal settings can vary a lot between data sets, so the way to find the best ones for your data is simply to try a few.
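Putting the pieces together, the repeat-until-no-improvement loop and the "try a few settings" idea might look like the Python sketch below. It is not the page's code: the tree builder, the `max_rounds=200` cap, the candidate depths and gradients in `find_best`, and the toy data are all assumptions for illustration. Each round fits a depth-limited tree of yes/no splits (`feature <= threshold`) to the current residuals, adds `gradient` times the matching leaf average to each prediction, and stops once training RMSE stops dropping.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def build_tree(rows, residuals, depth):
    """Split rows with yes/no questions up to `depth` levels deep.
    Returns a leaf (average residual) or a (feature, threshold, left, right) node."""
    average = sum(residuals) / len(residuals)
    if depth == 0 or len(rows) < 2:
        return average
    best = None  # (squared error, feature, threshold, left indices, right indices)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows})[:-1]:  # skip max so both sides are non-empty
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            sse = 0.0
            for idx in (left, right):
                m = sum(residuals[i] for i in idx) / len(idx)
                sse += sum((residuals[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left, right)
    if best is None:  # every feature is constant, nothing to split on
        return average
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [residuals[i] for i in left], depth - 1),
            build_tree([rows[i] for i in right], [residuals[i] for i in right], depth - 1))

def predict_tree(tree, row):
    """Follow the yes/no decisions down to a leaf value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

def boost(rows, targets, depth, gradient, max_rounds=200):  # hypothetical cap
    """Add shrunken trees until training RMSE stops improving or max_rounds is reached."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(targets)
    trees, best_error = [], rmse(targets, predictions)
    for _ in range(max_rounds):
        residuals = [t - p for t, p in zip(targets, predictions)]
        tree = build_tree(rows, residuals, depth)
        candidate = [p + gradient * predict_tree(tree, r) for p, r in zip(predictions, rows)]
        error = rmse(targets, candidate)
        if error >= best_error:  # no improvement: stop
            break
        trees.append(tree)
        predictions, best_error = candidate, error
    return base, trees, best_error

def find_best(rows, targets, depths=(1, 2, 3), gradients=(0.05, 0.1, 0.3)):
    """Hypothetical 'Find Best': try every (depth, gradient) pair, keep the lowest RMSE."""
    return min((boost(rows, targets, d, g)[2], d, g) for d in depths for g in gradients)

# Made-up data: one feature, target roughly 2 * x.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
targets = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
error, depth, gradient = find_best(rows, targets)
print(f"best depth={depth}, gradient={gradient}, RMSE={error:.3f}")
```

Because this sketch scores on the same rows it trained on, as the page describes, deeper trees and larger gradients will tend to look better; scoring on rows held out from training gives a fairer comparison between settings.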


Samples

Don't have data? No problem! Here are some real-world samples you can try for fun. Click an entry in the Predicting column to select that file.

Predicting | Source | Size | Source Link | File Link
Morbidity | Epizootic Hemorrhagic Disease | 12.9 KB | here | disease_alt.csv
Age | Census Data | 5.65 MB | here | ss16pil_alt.csv
Play Ends in Touchdown | Football Plays | 2.83 MB | here | pbp-2017_alt.csv
Election Results | Global Political Results | 9.58 MB | here | clea_20170530_alt.csv
Favored Type For Response* | Postal and web-based surveys | 24 KB | here | favoredResponseType.csv
Max Run Distance | Demo Only | 415 bytes | | /Home/HowTo/train1.csv
Max Run Distance With Categories | Demo Only | 437 bytes | | /Home/HowTo/train2.csv
* I may have misinterpreted what I read in their spreadsheet, but these are just examples.