The sample app doesn't do much (by design), but it illustrates a few features and hopefully will give you an idea of what is possible. In particular, the sample does not provide a means of loading a new set of data nor does it allow you to export data, models, or images of the charts. All of these would be standard parts of any app we create.
We chose the famous Fisher's Iris data set, in which three species of iris are measured on four dimensions: sepal length, sepal width, petal length, and - you guessed it - petal width. Click here if you want more details (like "what's a sepal?").
The first tab shows a table of the data. The table can be sorted, searched, and paged through (at the bottom). If you play with it a little, you will see that it has 50 examples of each species along with each example's four dimensions. This table display would be useful for making sure your data are in the format you expected and that you actually have all the data you expect.
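For readers who prefer to see that kind of check written down, here is a minimal sketch in Python. It is not how the app does it; the function name check_table, the field names, and the dictionary layout of a row are all assumptions made for illustration. It counts the rows per species and flags any measurement that is missing or not a positive number.

```python
from collections import Counter

# Field names are an assumption for this sketch, not taken from the app.
FIELDS = ("sepal_length", "sepal_width", "petal_length", "petal_width")

def check_table(rows, expected_per_species=None):
    """Return (row counts per species, list of problems found)."""
    counts = Counter()
    problems = []
    for i, row in enumerate(rows):
        counts[row.get("species")] += 1
        for field in FIELDS:
            value = row.get(field)
            # Each measurement should be a positive number.
            if not isinstance(value, (int, float)) or value <= 0:
                problems.append(f"row {i}: bad {field}: {value!r}")
    if expected_per_species is not None:
        for species, n in counts.items():
            if n != expected_per_species:
                problems.append(
                    f"{species}: {n} rows, expected {expected_per_species}"
                )
    return counts, problems

# Two rows typed in by hand for illustration; the second is missing a value.
example_rows = [
    {"species": "setosa", "sepal_length": 5.1, "sepal_width": 3.5,
     "petal_length": 1.4, "petal_width": 0.2},
    {"species": "versicolor", "sepal_length": 7.0, "sepal_width": 3.2,
     "petal_length": 4.7, "petal_width": None},
]
counts, problems = check_table(example_rows)
print(counts)
print(problems)
```

On the full data set you would call check_table(rows, expected_per_species=50) and hope for an empty list of problems.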
The second tab shows a couple of basic plots of the data. The first is a scatterplot of two dimensions, colored by species. This plot doesn't tell us much beyond the fact that these two dimensions cleanly separate one species (setosa), while the other two species need different or additional information to split them cleanly.
Below it is an alternative chart - a boxplot - which shows that petal width alone all but guarantees identification of one species (again setosa), while the petal widths of the other two species overlap a little.
Normally, the app would include a feature that lets you save images of these plots. Charts like these are great for communicating differences between segments, which are inevitably a little messy.
The third tab shows an interactive 3-D plot of the iris data, with a third dimension, petal length, added. Click and drag to rotate the plot, and use the scroll wheel to zoom. After playing with it for a while, you will see that there is still no obvious separation between the remaining two species.
The fourth tab shows a decision tree that is built when the app is opened. A decision tree provides simple "rules" for identifying/predicting species. Trees are often quite accurate and are easy to read, which makes them one of the best ways to identify segments. They can also be a valuable tool for understanding segments more deeply. There are other goodies that decision trees provide, but we'll cover those in a future post.
In this case, we see that we only need to take two measurements to identify the three species with an accuracy of over 95%. The within-species purity of the predictions is 100% (setosa), 91% (versicolor), and 98% (virginica). This fits with what we found in the previous tabs, but now we know which dimensions matter and exactly where the boundaries are.
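To make the idea of tree "rules" concrete, here is a minimal sketch in Python. Everything specific in it is an assumption rather than something read off the app: we assume the two measurements are petal length and petal width, we use split points (2.45 and 1.75) that are commonly reported for a default tree on this data, and we assume leaf counts that are consistent with the purity figures quoted above. The function and variable names are hypothetical.

```python
# Assumed split points, in cm; the app's tree may use different values.
PETAL_LENGTH_SPLIT = 2.45
PETAL_WIDTH_SPLIT = 1.75

def predict_species(petal_length, petal_width):
    """Apply the two assumed rules in order, top of the tree first."""
    if petal_length < PETAL_LENGTH_SPLIT:
        return "setosa"
    if petal_width < PETAL_WIDTH_SPLIT:
        return "versicolor"
    return "virginica"

# Assumed leaf contents: predicted species -> counts of actual species.
LEAVES = {
    "setosa": {"setosa": 50, "versicolor": 0, "virginica": 0},
    "versicolor": {"setosa": 0, "versicolor": 49, "virginica": 5},
    "virginica": {"setosa": 0, "versicolor": 1, "virginica": 45},
}

def purity(predicted):
    """Share of a leaf's flowers that really are the predicted species."""
    leaf = LEAVES[predicted]
    return leaf[predicted] / sum(leaf.values())

def accuracy():
    """Share of all flowers that land in the leaf matching their species."""
    correct = sum(LEAVES[species][species] for species in LEAVES)
    total = sum(sum(leaf.values()) for leaf in LEAVES.values())
    return correct / total

for species in LEAVES:
    print(species, round(purity(species) * 100))
print("accuracy", accuracy())
```

Under these assumptions the purities round to 100, 91, and 98 percent and the overall accuracy is 144 out of 150, or 96%, which lines up with the numbers shown in the tab.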
The final tab shows the importance of each measurement in identifying species. Not surprisingly, the petal dimensions are by far the most important inputs.
The importance measures are determined by a random forest, which is a collection of hundreds of decision trees. Again, there will be more information on these in future posts. Random forest is a remarkable machine learning tool. It doesn't need normalized data, missing values can usually be handled without much trouble, and it can derive importance measures from data that have interactions or multicollinearity. These properties make it very useful for analyzing survey data, especially customer satisfaction (CSat) data, which usually run afoul of many of the assumptions needed for traditional approaches. Random forests are typically very accurate and, like decision trees, can be used to identify/predict categories (like best customer prospects) or continuous measures (like weight).
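This post doesn't say which importance measure the app reports. One common choice is the average reduction in Gini impurity that a measurement's splits produce across the trees in the forest, and the sketch below shows that calculation for a single split. It is an illustration of the idea only, not the app's code; the function names are hypothetical, and the example split is simply the clean separation of setosa (50 flowers) from the other two species (100 flowers) described earlier.

```python
def gini(counts):
    """Gini impurity of a node, given its count of flowers per species."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def impurity_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of its children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# Counts are ordered (setosa, versicolor, virginica).
parent = [50, 50, 50]
children = [[50, 0, 0], [0, 50, 50]]
print(impurity_decrease(parent, children))
```

A forest repeats this bookkeeping for every split in every tree and adds up the decreases by measurement, so measurements that produce large, clean splits (like the petal dimensions here) end up with the highest scores.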
So that's it for now. If you haven't yet played with the app, please do. If you run into any issues or if you have any questions, please contact us directly at the email above or leave a comment.
Brett Matheson is the founder and principal of EMS Analytics. He has experience in a wide array of multivariate techniques and their application to real-world business issues, including choice modeling, segmentation, forecasting, and simulation.