The Unsupervised Learning Workshop

The Organization of the Hierarchy

Both the natural and human-made worlds contain many examples of systems organized into hierarchies, and for the most part this organization makes a lot of sense. A common representation of these hierarchies is the tree-based data structure. Imagine a parent node with any number of child nodes, each of which can itself be a parent node. By organizing information into a tree structure, you can build an information-dense diagram that clearly shows how things are related to their peers and to the larger abstract concepts above them.
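The parent and child relationship described above can be sketched in a few lines of Python. This is only a minimal illustration: the `Node` class, the `render` helper, and the labels are hypothetical names chosen for this example, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node: one label plus any number of child nodes."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so it can act as a parent in turn."""
        self.children.append(child)
        return child


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per node, each parent before its children."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical labels, used only to show the shape of the structure.
root = Node("root")
branch = root.add(Node("branch A"))  # a child of root that is also a parent
branch.add(Node("leaf A1"))
branch.add(Node("leaf A2"))
root.add(Node("branch B"))

print("\n".join(render(root)))
```

The indentation in the printed output plays the role of the diagram: siblings line up at the same depth, and each node sits directly beneath the broader concept that contains it.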

An example from the natural world that illustrates this concept is the hierarchy of animals, which runs from parent classes down to individual species:

Figure 2.2: The relationships of animal species in a hierarchical tree structure

In the preceding diagram, you can see an example of how relational information between varieties of animals can be easily mapped out in a way that both saves space and still transmits a large amount of information. This example can be seen as both a tree of its own (showing how cats and dogs are different, but both are domesticated animals) and as a potential piece of a larger tree that shows a breakdown of domesticated versus non-domesticated animals.

As a business-facing example, let's go back to the concept of a web store selling products. If you sold a large variety of products, you would probably want to create a hierarchical system of navigation for your customers. Rather than presenting the entire product catalog at once, you expose customers only to the path down the tree that matches their interests. An example of the hierarchical system of navigation can be seen in the following diagram:

Figure 2.3: Product categories in a hierarchical tree structure
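To make the navigation idea concrete, here is a small sketch that stores a catalog as nested dictionaries and shows a customer only the options at their current position in the tree. The category names and the `options_at` helper are assumptions invented for illustration; they are not taken from the figure.

```python
# Hypothetical catalog: each key is a category, each value holds its subcategories.
catalog = {
    "Electronics": {
        "Computers": {"Laptops": {}, "Desktops": {}},
        "Phones": {},
    },
    "Clothing": {
        "Shoes": {},
        "Jackets": {},
    },
}


def options_at(tree, path):
    """Return the sorted child categories reached by following `path` from the root."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if the path leaves the tree
    return sorted(node)


# Each call reveals one level only, never the whole catalog.
print(options_at(catalog, []))
print(options_at(catalog, ["Electronics"]))
print(options_at(catalog, ["Electronics", "Computers"]))
```

A customer browsing "Electronics" never sees the "Clothing" branch, which is the filtering effect the hierarchy provides.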

A hierarchical system of navigation can substantially improve your customer experience. By organizing information into a hierarchy, you build an intuitive structure into your data that makes nested relationships explicit. If this sounds like another approach to finding clusters in your data, then you're definitely on the right track. Using the same kind of distance metric as k-means, such as Euclidean distance, we can build a tree that exposes many possible cuts of the data, letting a user choose clusters at their own discretion.
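The idea of building a tree from distances and then cutting it at different levels can be sketched with the standard library alone. The example below is a rough illustration under stated assumptions: it uses toy 2-D points, and it measures the distance between two clusters by their closest pair of points (single linkage), which is only one of several possible rules. The function names are hypothetical. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used instead.

```python
import math
from itertools import combinations


def cluster_distance(a, b, points):
    """Single linkage (an assumed choice): Euclidean distance of the closest pair."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def build_levels(points):
    """Merge the two closest clusters repeatedly, recording every level of the tree."""
    clusters = [(i,) for i in range(len(points))]  # start with one cluster per point
    levels = [list(clusters)]
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points),
        )
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        levels.append(sorted(clusters))
    return levels


def cut(levels, k):
    """Return the level of the tree that contains exactly k clusters."""
    return next(level for level in levels if len(level) == k)


# Toy data (an assumption for illustration): two loose groups of points.
points = [(0, 0), (0, 1), (3, 0), (10, 0), (10, 2), (14, 0)]
levels = build_levels(points)

print(cut(levels, 2))  # a coarse cut of the tree
print(cut(levels, 3))  # a finer cut of the same tree
```

Both results come from the same tree; only the level of the cut changes. That is the sense in which the user, rather than the algorithm, decides how many clusters to keep.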