Establishing relationships among containers
Puppet's classes bear little or no similarity to classes that you find in object-oriented programming languages such as Java or Ruby. There are no methods or attributes. There are no distinct instances of any class. You cannot create interfaces or abstract base classes.
One of the few shared characteristics is encapsulation. Just like classes in OOP, Puppet's classes hide implementation details. To get Puppet to start managing a subsystem, you just need to include the appropriate class.
Passing events between classes and defined types
By sorting all resources into classes, you make it unnecessary (for your co-workers or other collaborators) to know about each single resource. This is beneficial. You can think of the collection of classes and defined types as your interface. You would not want to read all manifests that anyone on your project ever wrote.
However, the encapsulation is inconvenient for passing resource events. Say you have some daemon that creates live statistics from your Apache logfiles. It should subscribe to Apache's configuration files so that it can restart if there are any changes (which might be of consequence to this daemon's operation). In another scenario, you might have Puppet manage some external data for a self-compiled Apache module. If Puppet updates such data, you will want to trigger a restart of the Apache service to reload everything.
Armed with the knowledge that there is a service, Service['apache2'], defined somewhere in the apache class, you can just go ahead and have your module data files notify that resource. It would work, because Puppet does not apply any sort of protection to resources that are declared in foreign classes. However, it would pose a minor maintainability issue.
The reference to the resource is located far from the resource itself. When maintaining the manifest later, you or a coworker might wish to look at the resource when encountering the reference. In the case of Apache, it's not difficult to figure out where to look, but in other scenarios, the location of the reference target can be less obvious.
Besides, this approach will not work for the other scenario, in which your daemon needs to subscribe to configuration changes. You could blindly subscribe to the central apache2.conf file, of course. However, this would not yield the desired results if the responsible class opted to do most of the configuration work inside snippets in /etc/apache2/conf.d.
Both scenarios can be addressed cleanly and elegantly by directing the notify or subscribe parameters at the whole class that is managing the entity in question:
file { '/var/lib/apache2/sample-module/data01.bin':
  source => '...',
  notify => Class['apache'],
}

service { 'apache-logwatch':
  enable    => true,
  subscribe => Class['apache'],
}
Of course, the signals are now sent (or received) indiscriminately: the file not only notifies Service['apache2'], but also every other resource in the apache class. This is usually acceptable, because most resources ignore events.
As for the logwatch daemon, it might refresh itself needlessly if some resource in the apache class needs a sync action. The odds for this occurrence depend on the implementation of the class. For ideal results, it might be sensible to relocate the configuration file resources into their own class so that the daemon can subscribe to that instead.
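The relocation suggested above might look roughly like the following. This is only a sketch: the class name apache::config, the managed file, and its source URL are assumptions made for illustration and are not taken from an actual module.

```puppet
# Hypothetical class that holds only Apache's configuration files.
class apache::config {
  file { '/etc/apache2/apache2.conf':
    ensure => file,
    # Assumed source location; adjust to wherever the file really lives.
    source => 'puppet:///modules/apache/apache2.conf',
  }
}

# The class must be part of the catalog before it can be referenced.
include apache::config

# The log watcher now refreshes only when a resource in apache::config changes.
service { 'apache-logwatch':
  enable    => true,
  subscribe => Class['apache::config'],
}
```

With this layout, a sync action on an unrelated resource in the apache class no longer refreshes the daemon.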
With your defined types, you can apply the same rules: subscribe to and notify them as required. Doing so feels quite natural, because they are declared like native resources anyway. This is how you subscribe to several instances of the defined type, symlink:
$active_countries = [ 'England', 'Ireland', 'Germany' ]
service { 'example-app':
  enable    => true,
  subscribe => Symlink[$active_countries],
}
Granted, this very example is a bit awkward, because it requires all symlink resource titles to be available in an array variable. In this case, it would be more natural to make the defined type instances notify the service instead:
symlink { [ 'England', 'Ireland', 'Germany' ]:
  notify => Service['example-app'],
}
This notation passes a metaparameter to a defined type. The result is that this parameter value is applied to all resources declared inside the define.
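To make the effect concrete, here is a minimal sketch of what such a symlink defined type could look like. The text does not show its definition, so the parameter, the paths, and the single file resource inside it are all assumptions.

```puppet
# Hypothetical defined type; target and link locations are assumed.
define symlink (
  String $target = "/usr/share/example-app/countries/${title}",
) {
  file { "/etc/example-app/enabled/${title}":
    ensure => link,
    target => $target,
  }
}

service { 'example-app':
  enable => true,
}

# The notify metaparameter is applied to the file resource
# declared inside each of the three symlink instances.
symlink { [ 'England', 'Ireland', 'Germany' ]:
  notify => Service['example-app'],
}
```

If any of the three links is created or changed, the example-app service receives a refresh event.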
If a defined type wraps or contains a service or exec type resource, it can also be desirable to notify an instance of that define to refresh the contained resource. The following example assumes that the service type is wrapped by a defined type called protected_service:
file { '/etc/example_app/main.conf':
  source => '...',
  notify => Protected_service['example-app'],
}
Ordering containers
The notify and subscribe metaparameters are not the only ones that you can direct at classes and instances of defined types. The same holds true for their siblings, before and require. These allow you to define an order for your resources relative to classes, order instances of your defined types, and even order classes among themselves.
The latter works by virtue of the chaining operator:
include firewall
include loadbalancing
Class['firewall'] -> Class['loadbalancing']
The effect of this code is that all resources from the firewall class will be synchronized before any resource from the loadbalancing class, and failure of any resource in the former class will prevent all resources in the latter from being synchronized.
Because of these ordering semantics, it is actually quite wholesome to require a whole class. You effectively mark the resource in question as being dependent on the class. As a result, it will only be synchronized if the entire subsystem that the class models is successfully synchronized first.
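Requiring a whole class from a single resource could be written as in the following sketch. The firewall class is assumed to be defined elsewhere, and the example-app service is a hypothetical dependent resource.

```puppet
# Assumes a class named firewall is defined elsewhere.
include firewall

# Hypothetical dependent resource: it is synchronized only after the
# resources declared in the firewall class have been synchronized
# successfully.
service { 'example-app':
  ensure  => running,
  require => Class['firewall'],
}
```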
Limitations
Sadly, there is a rather substantial issue with both the ordering of containers and the distribution of refresh events: neither transcends the include statements of further classes. Consider the following example:
class apache {
  include apache::service
  include apache::package
  include apache::config
}

file { '/etc/apache2/conf.d/passwords.conf':
  source  => '...',
  require => Class['apache'],
}
I have often mentioned how the comprehensive apache class models everything about the Apache server subsystem, and in the previous section, I went on to explain that directing a require parameter at such a class will make sure that Puppet only touches the dependent resource if the subsystem has been successfully configured.
This is mostly true, but due to the limitation concerning class boundaries, it doesn't achieve the desired effect in this scenario. The dependent configuration file should actually require the Package['apache'] package, declared in class apache::package. However, the relationship does not span multiple class inclusions, so this particular dependency will not be part of the resulting catalog at all.
Similarly, any refresh events sent to the apache class will have no effect. They are distributed to resources declared in the class's body (of which there are none), but are not passed on to included classes. Subscribing to the class will make no sense either, because any resource events generated inside the included classes will not be forwarded by the apache class.
The bottom line is that relationships to classes cannot be built in utter ignorance of their implementation. If in doubt, you need to make sure that the resources that are of interest are actually declared directly inside the class you are targeting.
There is a bright side to this as well. A more correct implementation of the Apache configuration file from the example would depend on the package, but would also synchronize itself before the service, and perhaps even notify it (so that Apache restarts if necessary). When all resources are part of the apache class and you want to adhere to the pattern of interacting with the container only, it would lead to the following declaration:
file { '/etc/apache2/conf.d/passwords.conf':
  source  => '...',
  require => Class['apache'],
  notify  => Class['apache'],
}
This forms an instant dependency circle: the file resource requires all parts of the apache class to be synchronized before it gets processed, but to notify them, they must all be put after the file resource in the order graph. This cannot work. With the knowledge of the inner structure of the apache class, the user can pick metaparameter values that actually work:
file { '/etc/apache2/conf.d/passwords.conf':
  source  => '...',
  require => Class['apache::package'],
  notify  => Class['apache::service'],
}
For the curious, the preceding code shows what the inner classes look like, roughly.
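The text does not show the inner classes themselves, so the following is only a guess at a plausible shape. The resource titles follow the references used earlier (Package['apache'] and Service['apache2']); the package name apache2 is an assumption for a Debian-style system, and apache::config is omitted.

```puppet
# Hypothetical inner classes; attribute values are assumptions.
class apache::package {
  package { 'apache':
    ensure => installed,
    name   => 'apache2', # assumed package name on Debian-like systems
  }
}

class apache::service {
  service { 'apache2':
    ensure => running,
    enable => true,
  }
}
```

Because each resource is declared directly in the class being referenced, the require and notify parameters of the file resource reach them as intended.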
The other good news is that invoking defined types does not pose the same kind of issue that an include statement of a class does. Events are passed to resources inside defined types just fine, transcending an arbitrary number of stacked invocations. Ordering also works just as expected. Let's keep the example brief:
class apache {
  virtual_host { 'example.net': ... }
  ...
}
This apache class also creates a virtual host using the defined type, virtual_host. A resource that requires this class will implicitly require all resources from within this virtual_host instance. A subscriber to the class will receive events from those resources, and events directed at the class will reach the resources of this virtual_host.
Tip
There is actually a good reason to make the include statements behave differently in this regard. As classes can be included very generously (thanks to their singleton aspect), it is common for classes to build a vast network of includes. By adding a single include statement to a manifest, you might unknowingly pull hundreds of classes into this manifest.
Assume, for a moment, that relationships and events transcend this whole network. All manner of unintended effects will be the consequence. Dependency circles will be nearly inevitable. The whole construct will become utterly unmanageable. The cost of such relationships will also grow exponentially, as the next section explains.
The performance implications of container relationships
There is another aspect that you should keep in mind whenever you are referencing a container type to build a relationship to it. The Puppet agent will have to build a dependency graph from this. This graph contains all resources as nodes and all relationships as edges. Classes and defined types get expanded to all their declared resources. All relationships to the container are expanded to relationships to each resource.
This is mostly harmless if the other end of the relationship is a native resource. A file that requires a class with five declared resources leads to five dependencies. That does not hurt. It gets more interesting if the same class is required by an instance of a defined type that comprises three resources. Each of these builds a relationship to each of the class's resources, so you end up with 15 edges in the graph.
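The 15-edge case can be sketched with a small manifest. All names and paths below are hypothetical; the point is only the counting, following the expansion described above.

```puppet
# Hypothetical class with five directly declared resources.
class example_base {
  file { ['/opt/a', '/opt/b', '/opt/c', '/opt/d', '/opt/e']:
    ensure => directory,
  }
}

# Hypothetical defined type comprising three resources.
define example_unit {
  file { "/srv/${title}":
    ensure => directory,
  }
  file { "/srv/${title}/data":
    ensure => directory,
  }
  file { "/srv/${title}/logs":
    ensure => directory,
  }
}

include example_base

# Each of the 3 resources in the define depends on each of the
# 5 resources in the class: 3 x 5 = 15 edges, by the expansion
# described in the text.
example_unit { 'demo':
  require => Class['example_base'],
}
```

Declaring a second example_unit instance with the same require would add another 15 edges, which is why such relationships deserve some thought.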
It gets even more expensive when a container invokes complex defined types, perhaps even recursively.
A more complex graph means more work for the Puppet agent, and its runs will take longer. This is especially annoying when running agents interactively during the debugging or development of your manifest. To avoid the unnecessary effort, consider your relationship declarations carefully, and use them only when they are really appropriate.
Mitigating the limitations
The architects of the Puppet language have devised two alternative approaches to solve the ordering issues. We will consider both, because you might encounter them in existing manifests. In new setups, you should always choose the latter variant.
The anchor pattern is the classic workaround for the problem with ordering and signaling in the context of recursive class include statements. It can be illustrated by the following example class:
class example_app {
  anchor { 'example_app::begin':
    notify => Class['example_app_config'],
  }
  include example_app_config
  anchor { 'example_app::end':
    require => Class['example_app_config'],
  }
}
Consider a resource that is placed before => Class['example_app']. It ends up in the chain before each anchor, and therefore, also before any resource in example_app_config, despite the include limitation. This is because the Anchor['example_app::begin'] pseudo-resource notifies the included class and is therefore ordered before all of its resources. A similar effect works for objects that require the class, by virtue of the example_app::end anchor.
The anchor resource type was created for this express purpose. It is not part of the Puppet core, but has been made available through the stdlib module instead (the next chapter will familiarize you with modules). Since it also forwards refresh events, it is even possible to notify and subscribe to this anchored class, and events will propagate into and out of the included example_app_config class.
The stdlib module is available in the Puppet Forge, but more about this in the next chapter. There is a descriptive document for the anchor pattern to be found online as well, in Puppet Labs' Redmine issue tracker (now obsolete) at http://projects.puppetlabs.com/projects/puppet/wiki/Anchor_Pattern. It is somewhat dated, seeing as the anchor pattern has itself been supplanted by Puppet's ability to contain a class in a container.
To make composite classes directly work around the limitations of the include statement, you can take advantage of the contain function found in Puppet version 3.4.x or newer.
If the earlier apache example had been written like the following one, there would have been no issues concerning ordering and refresh events:
class apache {
  contain apache::service
  contain apache::package
  contain apache::config
}
The official documentation describes the behavior as follows:
"A contained class will not be applied before the containing class is begun, and will be finished before the containing class is finished."
This might read as though we have found the panacea for the class ordering issues presented here. Should you just use contain in place of include from here on out and never worry about class ordering again? Of course not. This would introduce lots of unnecessary ordering constraints and lead you into unfixable dependency circles very quickly. Do contain classes, but make sure that it makes sense. The contained class should really form a vital part of what the containing class is modeling.
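As an illustration of containment that does make sense, the apache class from earlier could also order its contained classes among themselves. The chaining lines below are an addition for illustration and are not part of the example in the text; the three inner classes are assumed to be defined elsewhere.

```puppet
class apache {
  contain apache::package
  contain apache::config
  contain apache::service

  # Assumed internal ordering: install first, then configure, then run.
  # The ~> arrow also forwards refresh events, so a configuration
  # change refreshes the resources in apache::service.
  Class['apache::package']
  -> Class['apache::config']
  ~> Class['apache::service']
}
```

A resource outside the class can then use require => Class['apache'] and rely on the package, configuration, and service all being synchronized first.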