What’s in this notebook?
This is a workflow I use often in data exploration. TSNE gives a good low-dimensional representation of high-dimensional data, and Bokeh is helpful for creating simple interactive plots, with contextual information conveyed by colors and tooltips.
This workflow has been extremely helpful for:
- text analytics/NLP tasks, where text data is passed through a TfidfVectorizer or similar, or where doc2vec vectors are passed to TSNE
- getting an idea of separability for prediction/classification by passing the outcome variable to Bokeh
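As a rough sketch of the text use case above, the snippet below runs a tiny made-up corpus through scikit-learn's TfidfVectorizer and then TSNE. The documents, the perplexity value, and the variable names are assumptions chosen for illustration only; a real corpus would be much larger and would likely want a higher perplexity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy corpus, invented purely for illustration
docs = [
    "the cat sat on the mat",
    "a cat and a dog played",
    "dogs chase cats in the yard",
    "stock prices rose sharply today",
    "the market fell after the report",
    "investors sold shares this week",
]

# Sparse TF-IDF matrix of shape (n_docs, n_terms)
tfidf = TfidfVectorizer().fit_transform(docs)

# Perplexity must be smaller than the number of samples, so it is set
# very low here; this is an assumption that only suits the tiny corpus.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)

# Convert to a dense array before embedding; fine for small data
embedding = tsne.fit_transform(tfidf.toarray())
print(embedding.shape)
```

Each row of `embedding` is the 2-D position of one document, ready to be plotted.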
This example uses the Australian athletes data set, which contains 11 numeric variables. This workflow is even more helpful on larger datasets with higher dimensionality.
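To give a feel for the overall shape of the workflow, here is a minimal sketch that uses synthetic stand-in data rather than the athletes data set: 100 random rows with 11 numeric columns and a made-up two-level `group` label playing the role of the outcome variable. The column names, colors, and TSNE settings are all assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

rng = np.random.default_rng(0)

# Stand-in data (NOT the athletes data): two synthetic groups,
# each with 50 rows and 11 numeric columns
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 11)),
    rng.normal(3.0, 1.0, size=(50, 11)),
])
group = ["a"] * 50 + ["b"] * 50  # hypothetical outcome variable

# Reduce the 11 dimensions to 2 for plotting
embedding = TSNE(
    n_components=2, perplexity=30, init="random", random_state=0
).fit_transform(X)

# One color per outcome level, stored as a column so Bokeh can use it
color_map = {"a": "steelblue", "b": "darkorange"}
source = ColumnDataSource(data=dict(
    x=embedding[:, 0],
    y=embedding[:, 1],
    group=group,
    color=[color_map[g] for g in group],
))

p = figure(title="TSNE embedding", tools="pan,wheel_zoom,reset")
# Tooltip shows the outcome label when hovering over a point
p.add_tools(HoverTool(tooltips=[("group", "@group")]))
p.scatter("x", "y", source=source, color="color", size=8)
```

In a notebook, calling `output_notebook()` from `bokeh.io` and then `show(p)` from `bokeh.plotting` should render the plot inline. Any extra columns added to the `ColumnDataSource` can be surfaced in the tooltips the same way.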