November 2019 Archives

By Any k-Means Necessary

You want to get to know your data and ask questions like: can it be broken down into a simple set of classes? You don't know what these classes might be, so your task is clustering, and you reach for one of the oldest clustering algorithms around: k-means.

k-means is popular because it's simple to understand, converges quickly, works in higher dimensions and gives you an answer. It's also usually the wrong choice unless you've already got nicely clustered data just waiting for you to guess k, the number of clusters most appropriate to your question. But it is a decent warm-up exercise in becoming friends with your data set.
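To show just how simple the idea is, here is a minimal sketch of the textbook iteration (often called Lloyd's algorithm), written in plain Python purely for illustration. It is not code from this post: the function name `kmeans` and the `iterations` and `seed` parameters are assumptions made for the example, and the starting centroids are simply k data points picked at random.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups.

    Illustrative sketch only: the name and parameters are hypothetical.
    """
    rng = random.Random(seed)
    # Start from k data points chosen at random (an assumed initialisation).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


# Two obvious blobs, so guessing k = 2 is easy here.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centres, assignment = kmeans(data, k=2)
print(centres, assignment)
```

Note that the function hands back k clusters whether or not the data really has k groups in it, which is exactly why you get an answer every time and why that answer deserves some suspicion.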

About Enkidu

I am a Freelance Scientist** and Perl is my Igor.