scikit-learn Cookbook (Second Edition)

Creating an unbalanced classification dataset

Classification datasets are also very simple to create. The balanced base case, however, is rarely seen in practice: most users don't convert, most transactions aren't fraudulent, and so on.

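  1. Therefore, it's useful to explore classification on unbalanced datasets. Passing weights=[0.1] assigns roughly 10% of the samples to class 0 and leaves the remainder for class 1: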
import numpy as np
import sklearn.datasets as d
classification_set = d.make_classification(weights=[0.1])
np.bincount(classification_set[1])

array([10, 90], dtype=int64)
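
To see the effect of weights more directly, the following is a minimal sketch that compares the default (balanced) dataset with the unbalanced one. The helper name class_counts, the sample size of 1,000, and the seed are assumptions chosen for illustration and are not part of the recipe. Setting flip_y=0 turns off the small amount of random label noise that make_classification applies by default, so the class counts follow the requested weights closely.

```python
import numpy as np
from sklearn.datasets import make_classification


def class_counts(weights=None, n_samples=1000, seed=0):
    """Return per-class sample counts for a synthetic dataset.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # flip_y=0 disables the random label flipping that is on by default,
    # so the counts track the requested weights closely.
    _, y = make_classification(
        n_samples=n_samples, weights=weights, flip_y=0, random_state=seed
    )
    return np.bincount(y)


balanced = class_counts()          # default: classes are roughly equal
unbalanced = class_counts([0.1])   # class 0 gets about 10% of the samples

print("balanced:  ", balanced, balanced / balanced.sum())
print("unbalanced:", unbalanced, unbalanced / unbalanced.sum())
```

With the default flip_y left in place, a few labels may be flipped at random, so the counts on a given run can differ slightly from the requested proportions.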