Go Machine Learning Projects

The conditional expectation functions

Instead, let's do what we originally set out to do: explore the CEFs of the variables. Fortunately, we already have the necessary data structures (in other words, the index), so writing the function to find the CEF is relatively easy.
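As a reminder of what is being estimated, the sample version of the CEF can be written as follows. The notation is my own assumption rather than the book's: $I_x$ stands for the set of rows in which the variable $X$ takes the value $x$ (the slice of row numbers stored in the index), and $y_i$ is the house price in row $i$.

```latex
\widehat{\mathbb{E}}[Y \mid X = x] = \frac{1}{|I_x|} \sum_{i \in I_x} y_i
```

In words: group the rows by the value of $X$, then average the prices within each group.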

The following is the code block:

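// CEF computes the conditional expectation of Ys for each distinct value
// found in column col. index[col] maps a value to the rows that hold it.
func CEF(Ys []float64, col int, index []map[string][]int) map[string]float64 {
	retVal := make(map[string]float64)
	for k, v := range index[col] {
		// average the Ys over the rows that share the value k
		var mean float64
		for _, i := range v {
			mean += Ys[i]
		}
		mean /= float64(len(v))
		retVal[k] = mean
	}
	return retVal
}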

This function finds the conditionally expected house price when a variable is held fixed: for each distinct value of the chosen variable, it averages the prices of the houses that share that value. We can do an exploration of all the variables, but for the purpose of this chapter, I shall only share the exploration of one, the YearBuilt variable, as an example.
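To make the averaging concrete, here is a minimal sketch that runs CEF on a toy dataset. The prices and the single-column index are made-up numbers for illustration only. They are not taken from the housing data.

```go
package main

import "fmt"

// CEF is repeated from the text so that this sketch compiles on its own.
func CEF(Ys []float64, col int, index []map[string][]int) map[string]float64 {
	retVal := make(map[string]float64)
	for k, v := range index[col] {
		var mean float64
		for _, i := range v {
			mean += Ys[i]
		}
		mean /= float64(len(v))
		retVal[k] = mean
	}
	return retVal
}

func main() {
	// Hypothetical prices for five houses (made-up numbers).
	ys := []float64{100, 200, 300, 400, 500}
	// Hypothetical index with a single column: value -> row numbers.
	index := []map[string][]int{
		{"1950": {0, 1}, "2000": {2, 3, 4}},
	}
	cef := CEF(ys, 0, index)
	fmt.Println(cef["1950"], cef["2000"]) // prints: 150 400
}
```

Rows 0 and 1 share the value "1950", so their mean is (100+200)/2 = 150. Rows 2, 3, and 4 share "2000", giving (300+400+500)/3 = 400.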

Now, YearBuilt is an interesting variable to dive deep into. It's a categorical variable (1950.5 makes no sense), but it's orderable as well (1945 is smaller than 1950). There are also many distinct values of YearBuilt, so instead of printing the CEF, we shall plot it with the following function:

// plotCEF plots the CEF. This is a simple plot with only the CEF.
// More advanced plots can also be drawn to expose more nuance in understanding the data.
func plotCEF(m map[string]float64) (*plot.Plot, error) {
	// collect the keys and sort them so the points are drawn in a stable order
	ordered := make([]string, 0, len(m))
	for k := range m {
		ordered = append(ordered, k)
	}
	sort.Strings(ordered)

	// Note: in newer gonum/plot releases, plot.New returns only *plot.Plot.
	p, err := plot.New()
	if err != nil {
		return nil, err
	}

	points := make(plotter.XYs, len(ordered))
	for i, val := range ordered {
		// if val can be converted into a float, we'll use it
		// otherwise, we'll stick with using the index
		points[i].X = float64(i)
		if x, err := strconv.ParseFloat(val, 64); err == nil {
			points[i].X = x
		}

		points[i].Y = m[val]
	}
	if err := plotutil.AddLinePoints(p, "CEF", points); err != nil {
		return nil, err
	}
	return p, nil
}
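One detail worth knowing about plotCEF: sort.Strings orders keys lexicographically. That works for four-digit years, but for a column whose numeric values have different numbers of digits, "100" would sort before "20", and the line would double back on itself. The following is a minimal sketch of one way around this. The helper name sortedKeys is hypothetical and not part of the book's code.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sortedKeys is a hypothetical helper (not from the book). It returns the
// keys of m in numeric order when every key parses as a float, and in
// lexicographic order otherwise.
func sortedKeys(m map[string]float64) []string {
	keys := make([]string, 0, len(m))
	nums := make(map[string]float64, len(m))
	numeric := true
	for k := range m {
		keys = append(keys, k)
		x, err := strconv.ParseFloat(k, 64)
		if err != nil {
			numeric = false
			continue
		}
		nums[k] = x
	}
	if numeric {
		sort.Slice(keys, func(i, j int) bool {
			return nums[keys[i]] < nums[keys[j]]
		})
	} else {
		sort.Strings(keys)
	}
	return keys
}

func main() {
	m := map[string]float64{"100": 1, "20": 2, "3": 3}
	fmt.Println(sortedKeys(m)) // prints: [3 20 100]
}
```

With a helper like this, the slice named ordered in plotCEF could be built from sortedKeys(m) instead of sort.Strings. For YearBuilt, both approaches give the same order.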

Our ever-growing main function now has this appended to it:

ofInterest := 19 // variable of interest is in column 19
cef := CEF(YsBack, ofInterest, indices)
plt, err := plotCEF(cef)
mHandleErr(err)
plt.Title.Text = fmt.Sprintf("CEF for %v", hdr[ofInterest])
plt.X.Label.Text = hdr[ofInterest]
plt.Y.Label.Text = "Conditionally Expected House Price"
mHandleErr(plt.Save(25*vg.Centimeter, 25*vg.Centimeter, "CEF.png"))

Running the program yields the following chart:

[Figure: conditional expectation function for YearBuilt]

Upon inspecting the chart, I must confess that I was a little surprised. I'm not particularly familiar with real estate, but my initial instinct was that older houses would cost more. Houses, in my mind, age like fine wine: the older the house, the more expensive it would be. Clearly this is not the case. Oh well, live and learn.

The CEF exploration should be done for as many variables as possible. I am merely eliding the rest for the sake of brevity.
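Extending the exploration to every variable is mostly a matter of looping over the columns of the index. The sketch below shows one way this might look. The helper name allCEFs and the toy data are assumptions for illustration, not the book's code or data.

```go
package main

import "fmt"

// CEF is repeated from the text so that this sketch compiles on its own.
func CEF(Ys []float64, col int, index []map[string][]int) map[string]float64 {
	retVal := make(map[string]float64)
	for k, v := range index[col] {
		var mean float64
		for _, i := range v {
			mean += Ys[i]
		}
		mean /= float64(len(v))
		retVal[k] = mean
	}
	return retVal
}

// allCEFs is a hypothetical helper: it computes the CEF for every
// indexed column and returns the results in column order.
func allCEFs(Ys []float64, index []map[string][]int) []map[string]float64 {
	out := make([]map[string]float64, len(index))
	for col := range index {
		out[col] = CEF(Ys, col, index)
	}
	return out
}

func main() {
	// Made-up prices and a made-up two-column index.
	ys := []float64{100, 200, 300, 400}
	index := []map[string][]int{
		{"1950": {0, 1}, "2000": {2, 3}}, // hypothetical column 0
		{"Y": {0, 2}, "N": {1, 3}},       // hypothetical column 1
	}
	for col, cef := range allCEFs(ys, index) {
		fmt.Printf("column %d: %d distinct values\n", col, len(cef))
	}
}
```

Each resulting map could then be passed to plotCEF and saved under a per-column file name.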