Hands-On Python Deep Learning for the Web

Unsupervised learning

Unsupervised learning applies in scenarios where the training samples do not carry output features (labels) with them. You might wonder, then, what we are supposed to learn or predict in such situations. The answer is similarity. In more elaborate terms, when we have a dataset for unsupervised learning, we're usually trying to learn the similarity between the training samples and then assign classes or labels to them.

Consider a crowd of people standing in a large field. All of them have features such as age, gender, marital status, salary range, and education level. Now, we wish to group them based on their similarities. We decide to form three groups and see that they arrange themselves by gender: a group of females, a group of males, and a group of people who identify with other genders. We then ask them to form subgroups within those groups and see that people group themselves by age range: children, teenagers, adults, and senior citizens. This gives us a total of 12 such subgroups (three gender groups times four age ranges). We could make further, smaller subgroups based on the similarity any two individuals exhibit. Also, the manner of grouping discussed in this example is just one among several ways of forming groups. Now, say we have 10 new members joining the crowd. Since we already have our groups defined, we can easily sort these new members into those groups. Hence, we can successfully apply group labels to them.
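The grouping described above can be sketched in code. The following is a minimal illustration, not a definitive implementation: it uses a tiny hand-written k-means routine on a single feature (age), and the ages, the function names `kmeans_1d` and `nearest`, and the choice of four groups are all assumptions made for this example.

```python
def nearest(value, centroids):
    """Return the index of the centroid closest to `value`."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values, k, iterations=20):
    """Minimal k-means on one numeric feature (illustrative sketch only)."""
    ordered = sorted(values)
    # Deterministic start (an assumption of this sketch): pick k values
    # spread evenly across the sorted data instead of random points.
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids
    return centroids


# Hypothetical ages of the people already standing in the field.
ages = [6, 8, 9, 15, 16, 17, 34, 38, 41, 67, 70, 74]
centroids = kmeans_1d(ages, k=4)

# New members joining the crowd are labeled with the index of the
# nearest existing group; no labels were ever supplied by hand.
new_members = [7, 16, 40, 72]
new_labels = [nearest(age, centroids) for age in new_members]
print(centroids)
print(new_labels)
```

Note that the algorithm only produces group indices. Names such as "children" or "senior citizens" are an interpretation we attach afterward by inspecting each group.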

The preceding example demonstrates just one form of unsupervised learning, which can be divided into two types:

  • Clustering: This is to form groups of training samples based on the similarity of their features.
  • Association: This is to find abstract associations or rules exhibited between features or training samples. For example, on analyzing a shop's sales logs, it was found that customers buy beer mostly after 7 p.m.
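To make the association idea concrete, here is a small sketch that measures how strongly "buys beer" is associated with "shops after 7 p.m." using two standard association-rule measures, support and confidence. The sales log is invented for illustration, the helper names are hypothetical, and "after 7 p.m." is assumed to mean an hour of 19 or later.

```python
# Hypothetical sales log: (hour of purchase in 24-hour time, items bought).
sales_log = [
    (10, {"bread", "milk"}),
    (12, {"bread", "eggs"}),
    (15, {"beer", "eggs"}),
    (18, {"milk", "chips"}),
    (19, {"beer", "chips"}),
    (20, {"beer", "bread"}),
    (21, {"beer", "chips"}),
    (22, {"milk", "beer"}),
]


def has_beer(transaction):
    return "beer" in transaction[1]


def after_7pm(transaction):
    # Assumption of this sketch: "after 7 p.m." means hour >= 19.
    return transaction[0] >= 19


def support(condition, log):
    """Fraction of all transactions that satisfy `condition`."""
    return sum(1 for t in log if condition(t)) / len(log)


def confidence(antecedent, consequent, log):
    """Among transactions satisfying `antecedent`, the fraction that
    also satisfy `consequent`."""
    matching = [t for t in log if antecedent(t)]
    if not matching:
        return 0.0  # No transaction matches the antecedent.
    return sum(1 for t in matching if consequent(t)) / len(matching)


beer_support = support(has_beer, sales_log)
beer_evening_confidence = confidence(has_beer, after_7pm, sales_log)
print(f"support(beer) = {beer_support}")
print(f"confidence(beer -> after 7 p.m.) = {beer_evening_confidence}")
```

In this toy log, 5 of the 8 transactions contain beer (support 0.625), and 4 of those 5 happen at 19:00 or later (confidence 0.8), which is the kind of rule the sales-log example describes.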

K-means and DBSCAN (for clustering) and the Apriori algorithm (for association rule mining) are some of the best-known algorithms used for unsupervised learning.