
Bayesian networks
A Bayesian network is a probabilistic model represented by a directed acyclic graph G = {V, E}, where the vertices are random variables Xi and the edges encode conditional dependencies among them. The following diagram shows an example of a simple Bayesian network with four variables:

The variable x4 depends on x3, which in turn depends on x1 and x2. To describe the network, we need the marginal probabilities P(x1) and P(x2) and the conditional probabilities P(x3|x1,x2) and P(x4|x3). In fact, using the chain rule, we can derive the full joint probability as:

P(x1, x2, x3, x4) = P(x4|x3) P(x3|x1, x2) P(x1) P(x2)

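This factorization can be checked with a small sketch. The structure matches the four-variable example above, while every numeric probability value is an illustrative assumption, not taken from the text:

```python
# Joint probability of the example network via the chain-rule factorization
# P(x1, x2, x3, x4) = P(x4|x3) P(x3|x1, x2) P(x1) P(x2).
# All variables are binary; every probability value below is an
# illustrative assumption.

p_x1 = {0: 0.7, 1: 0.3}                      # marginal P(x1)
p_x2 = {0: 0.4, 1: 0.6}                      # marginal P(x2)
p_x3 = {                                     # conditional P(x3 | x1, x2)
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.1, 1: 0.9},
}
p_x4 = {0: {0: 0.8, 1: 0.2},                 # conditional P(x4 | x3)
        1: {0: 0.25, 1: 0.75}}

def joint(x1, x2, x3, x4):
    """Chain-rule factorization of the full joint probability."""
    return p_x1[x1] * p_x2[x2] * p_x3[(x1, x2)][x3] * p_x4[x3][x4]

# A valid joint distribution must sum to 1 over all 16 configurations
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1))
print(total)
```

Because each table is a proper (conditional) distribution, the sixteen products sum to 1, confirming that the four local distributions fully specify the joint.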
The previous expression shows an important concept: as the graph is directed and acyclic, each variable is conditionally independent of its non-successors given its predecessors. To formalize this concept, we can define the function Predecessors(xi), which returns the set of nodes that directly influence xi; for example, Predecessors(x3) = {x1, x2} (we are using lowercase letters, but we are referring to the random variables, not samples). Using this function, it's possible to write a general expression for the full joint probability of a Bayesian network with N nodes:

P(x1, x2, ..., xN) = ∏(i=1 to N) P(xi|Predecessors(xi))

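The general product can be sketched in a few lines of Python. The graph structure is the same four-node example, the CPT values are purely illustrative, and each node's probability is looked up conditioned on the values of its predecessors:

```python
# General chain-rule factorization for a Bayesian network:
# P(x1, ..., xN) = prod over i of P(xi | Predecessors(xi)).
# The structure mirrors the four-node example; all CPT values are
# illustrative assumptions. Each CPT maps a tuple of predecessor
# values to a {value: probability} table.

predecessors = {            # x1, x2 -> x3 -> x4
    'x1': (), 'x2': (),
    'x3': ('x1', 'x2'),
    'x4': ('x3',),
}
cpts = {
    'x1': {(): {0: 0.7, 1: 0.3}},
    'x2': {(): {0: 0.4, 1: 0.6}},
    'x3': {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
           (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}},
    'x4': {(0,): {0: 0.8, 1: 0.2}, (1,): {0: 0.25, 1: 0.75}},
}

def joint_probability(assignment):
    """P(assignment) as the product of each node's CPT entry,
    conditioned on the values of its predecessors."""
    p = 1.0
    for node, parents in predecessors.items():
        parent_values = tuple(assignment[q] for q in parents)
        p *= cpts[node][parent_values][assignment[node]]
    return p

print(joint_probability({'x1': 1, 'x2': 0, 'x3': 1, 'x4': 0}))
```

The same function works for any directed acyclic structure: only the `predecessors` dictionary and the corresponding CPTs change, which is exactly what the general expression above states.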
The general procedure for building a Bayesian network should always start from the first causes, adding their effects one by one until the last nodes are inserted into the graph. If this rule is not respected, the resulting graph can contain useless relations that increase the complexity of the model. For example, if x4 is caused indirectly by both x1 and x2, adding the edges x1 → x4 and x2 → x4 might seem a good modeling choice; however, we know that the final influence on x4 is determined only by the value of x3, whose probability is conditioned on x1 and x2, hence these spurious edges can be removed. I suggest reading Introduction to Statistical Decision Theory, Pratt J., Raiffa H., Schlaifer R., The MIT Press, to learn many best practices that should be employed in this procedure.
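The claim that the edges x1 → x4 and x2 → x4 would be spurious can also be verified numerically: with illustrative CPT values (assumed, not from the text), P(x4 | x3, x1, x2) computed from the full joint always collapses to P(x4 | x3):

```python
from itertools import product

# Sketch (illustrative CPT values): once P(x3|x1,x2) and P(x4|x3) are
# specified, the conditional P(x4 | x3, x1, x2) derived from the full
# joint equals P(x4 | x3), so the edges x1 -> x4 and x2 -> x4 would
# add nothing to the model.

p_x1 = {0: 0.7, 1: 0.3}
p_x2 = {0: 0.4, 1: 0.6}
p_x3 = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
        (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}
p_x4 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}

def joint(x1, x2, x3, x4):
    return p_x1[x1] * p_x2[x2] * p_x3[(x1, x2)][x3] * p_x4[x3][x4]

for x1, x2, x3 in product((0, 1), repeat=3):
    # P(x4=1 | x3, x1, x2) from the joint, by normalization
    num = joint(x1, x2, x3, 1)
    den = joint(x1, x2, x3, 0) + joint(x1, x2, x3, 1)
    assert abs(num / den - p_x4[x3][1]) < 1e-12

print("P(x4|x3,x1,x2) == P(x4|x3) for every configuration")
```

Once x3 is known, x1 and x2 carry no further information about x4, which is precisely why the direct edges would be redundant.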