module-puppetdb/README_GETTING_STARTED.md
Garrett Honeycutt fd2e9904ed (#18228) updates README for style
This commit fixes style issues in the getting started README.
2012-12-19 10:37:34 -08:00


puppetlabs/puppetdb - PuppetDB Management

Purpose: Install and manage the PuppetDB server and database, and configure the Puppet master to use PuppetDB
Module: puppetlabs/puppetdb (http://forge.puppetlabs.com/cprice404/puppetdb)
Puppet Version: 2.7+
Platforms: RHEL6, Debian6, Ubuntu 10.04

One of the new projects that we at Puppet Labs are excited about right now is PuppetDB, our new “data warehouse” for managing storage and retrieval of all platform-generated data. (If you haven't checked it out yet, have a look at Nick Lewis' blog post or the PuppetDB documentation.) Currently, it offers a huge performance improvement for exported and collected resources, as well as several other great features. We're even more excited about some of the not-quite-released functionality that is in the pipeline, so stay tuned for more information!

Installing and configuring PuppetDB isn't too difficult, but we knew that it could and should be even easier than it was. That's where the new puppetlabs/puppetdb module comes in. Whether you just want to throw PuppetDB onto a test system as quickly as possible so that you can check it out, or you want finer-grained control over the individual settings and configuration, this module aims to let you dive in at exactly the level of involvement that you desire.

Here are some of the capabilities of the new 1.0 release of the puppetdb module; almost all of these are optional, so you are free to pick and choose which ones suit your needs:

  • Installs and manages the core PuppetDB server
  • Installs and manages the underlying database server (PostgreSQL or a simple embedded database)
  • Configures your Puppet master to use PuppetDB
  • Optional support for opening the PuppetDB port in your firewall on RedHat-based distros
  • Validates your database connection before applying PuppetDB configuration changes, to help make sure that PuppetDB doesn't end up in a broken state
  • Validates your PuppetDB connection before applying configuration changes to the Puppet master, to help make sure that your master doesn't end up in a broken state

Installing the module

Installing the PuppetDB module is a breeze using the Puppet module tool (available in Puppet 2.7.14+ and Puppet Enterprise 2.5+):

$ puppet module install puppetlabs/puppetdb
Preparing to install into /etc/puppet/modules ...
Downloading from http://forge.puppetlabs.com ...
Installing -- do not interrupt ...
/etc/puppet/modules
└─┬ puppetlabs-puppetdb (v0.1.1)
  ├── cprice404-inifile (v0.0.2)
  ├─┬ inkling-postgresql (v0.3.0)
  │ └── puppetlabs-stdlib (v3.0.1)
  └── puppetlabs-firewall (v0.0.4)
$

Resource Overview

Let's take a quick peek at the main classes and types defined by the module. (We'll take a more in-depth look, with examples, in the following section.)

puppetdb class

This is a sort of all-in-one class for the PuppetDB server. It'll get you up and running with everything you need (including database setup and management) on the server side. The only other thing you'll need to do is to configure your Puppet master to use PuppetDB... which leads us to:

puppetdb::master::config class

This class should be used on your Puppet master node. It'll verify that it can successfully communicate with your PuppetDB server, and then configure your master to use PuppetDB.

NOTE: Using this class involves allowing the module to manipulate your Puppet configuration files; in particular, puppet.conf and routes.yaml. The puppet.conf changes are supplemental and should not affect any of your existing settings, but the routes.yaml file will be overwritten entirely. If you have an existing routes.yaml file, take care to use the manage_routes parameter of this class to prevent the module from managing that file; you'll then need to manage it yourself.
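
As a minimal sketch, assuming manage_routes accepts a boolean (the parameter is named above, but its accepted values are not, so check the class documentation for your installed version), a master with a hand-maintained routes.yaml might declare the class like this:

```puppet
node 'puppetmaster' {
  # Let the module apply its supplemental puppet.conf changes,
  # but leave the existing routes.yaml untouched.
  # Assumption: manage_routes takes a boolean, and false disables management.
  class { 'puppetdb::master::config':
    manage_routes => false,
  }
}
```

With this setting, keeping routes.yaml in a state that works with PuppetDB is up to you.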

puppetdb::server class

This is for managing the PuppetDB server independently of the underlying database that it depends on; it'll manage the PuppetDB package, service, config files, etc., but will allow you to manage the database (e.g. PostgreSQL) however you see fit.

puppetdb::database::postgresql class

This is a class for managing a PostgreSQL server for use by PuppetDB. It can manage the PostgreSQL packages and service, as well as create and manage the puppetdb database and database user accounts.

Low-level classes

There are several lower-level classes in the module (e.g., puppetdb::master::* and puppetdb::server::*) which you can use to manage individual configuration files or other parts of the system. In the interest of brevity, we'll skip over those for now... but if you need more fine-grained control over your setup, feel free to dive into the module and have a look!

Example Usage

Enough with the gory details, let's talk about how to actually use the thing!

When you are first getting started with PuppetDB, there are a few decisions you'll have to make:

  • Which database back-end should I use? (The current choices are PostgreSQL or our embedded database; we'll discuss this more a bit later on.)
  • Should I run the database on the same node that I run PuppetDB on?
  • Should I run PuppetDB on the same node that I run my master on?

The answers to those questions will be largely dependent on your Puppet environment. How many nodes are you managing? What kind of hardware are you running on? Is your current load approaching the limits of your hardware?

The Simple Case

Since I won't be able to answer all of those questions for you, we'll start off with the absolute simplest case: using our default database (PostgreSQL), and running everything (PostgreSQL, PuppetDB, Puppet master) on the same node. This setup is great for testing or experimental environments, and may be sufficient for many real-world deployments, depending on the number of nodes you're managing. So, what would our manifest look like in this case?

node puppetmaster {
  # Configure puppetdb and its underlying database
  include puppetdb

  # Configure the puppet master to use puppetdb
  include puppetdb::master::config
}

That's it! Obviously, you can provide some parameters for these classes if you'd like more control, but that is literally all that it will take to get you up and running with the default configuration. Here are the steps that this manifest will trigger:

  • Install PostgreSQL on the node if it's not already there
  • Create the PuppetDB postgres database instance and user account
  • Validate the postgres connection and, if successful, install and configure PuppetDB
  • Validate the PuppetDB connection and, if successful, modify the Puppet master config files to use PuppetDB
  • Restart the Puppet master so that it will pick up the config changes

If your logging level is set to INFO or finer, you should start seeing PuppetDB-related log messages appear in both your Puppet master log and your PuppetDB log as subsequent agent runs occur.

Note: If you'd prefer to use PuppetDB's embedded database rather than PostgreSQL, have a look at the database parameter on the puppetdb class. The embedded database can be useful for testing and very small production environments, but is not recommended for larger production environments because it consumes a great deal of memory as your number of nodes increases.
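
A sketch of what that might look like for the single-node case above. The database parameter comes from the note, but the value 'embedded' is an assumption; confirm the accepted values in the puppetdb class before relying on it:

```puppet
node 'puppetmaster' {
  # Assumption: 'embedded' is the value that selects the embedded database
  # instead of the default PostgreSQL back-end.
  class { 'puppetdb':
    database => 'embedded',
  }

  # Configure the puppet master to use puppetdb, as in the simple case
  include puppetdb::master::config
}
```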

A Distributed Setup

In many cases, you'll prefer not to install PuppetDB on the same node as the Puppet master. Your environment will be easier to scale if you are able to dedicate hardware to the individual system components. You may even choose to run the PuppetDB server on a different node from the PostgreSQL database that it uses to store its data. So let's have a look at what a manifest for that scenario might look like:

# This is an example of a very basic 3-node setup for PuppetDB.

# This node is our Puppet master.
node puppet {
  # Here we configure the puppet master to use PuppetDB,
  # and tell it that the hostname is puppetdb
  class { 'puppetdb::master::config':
    puppetdb_server => 'puppetdb',
  }
}

# This node is our postgres server
node puppetdb-postgres {
  # Here we install and configure postgres and the puppetdb
  # database instance, and tell postgres that it should
  # listen for connections to the hostname puppetdb-postgres
  class { 'puppetdb::database::postgresql':
    listen_addresses => 'puppetdb-postgres',
  }
}

# This node is our main puppetdb server
node puppetdb {
  # Here we install and configure PuppetDB, and tell it where to
  # find the postgres database.
  class { 'puppetdb::server':
    database_host => 'puppetdb-postgres',
  }
}

That's it! This should be all it takes to get a 3-node, distributed installation of PuppetDB up and running. Note that if you prefer, you could easily move two of these classes to a single node and end up with a 2-node setup instead.
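
One possible 2-node layout, sketched with only the classes already shown: the master stays on its own node, and PuppetDB shares a node with PostgreSQL by way of the all-in-one puppetdb class. The hostnames are the same illustrative ones used above.

```puppet
# Node 1: the Puppet master, pointed at the PuppetDB host.
node 'puppet' {
  class { 'puppetdb::master::config':
    puppetdb_server => 'puppetdb',
  }
}

# Node 2: PuppetDB and its PostgreSQL database together,
# managed by the all-in-one class.
node 'puppetdb' {
  include puppetdb
}
```

Pairing PuppetDB with its database keeps the database traffic local to one machine; whether that or some other pairing suits you depends on where your load is.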

Cross-node Dependencies

If you're playing along at home, you may have spotted some cross-node dependencies here, and you've probably recognized that the order in which these nodes check in with the Puppet master has serious implications for getting everything up and running. It would be very bad to configure the master to use the PuppetDB server before that server was up and running. Likewise, it wouldn't be great to try to start up the PuppetDB server pointing to a Postgres server that isn't actually running Postgres yet.

The module handles this problem for you by taking a sort of “eventual consistency” approach. There's nothing that the module can do to control the order in which your nodes check in, but the module can verify that the services it depends on are up and running before it makes configuration changes, so that's what it does.

When your Puppet master node checks in, it will validate the connectivity to the PuppetDB server before it applies its changes to the Puppet master config files. If it can't connect to PuppetDB, then the Puppet run will fail and the previous config files will be left intact. This prevents your master from getting into a broken state where all incoming Puppet runs fail because the master is configured to use a PuppetDB server that doesn't exist yet. The same strategy is used to handle the dependency between the PuppetDB server and the Postgres server.
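
To illustrate the "validate first, then configure" pattern, here is a rough sketch. It assumes the module ships a puppetdb_conn_validator resource type with puppetdb_server and puppetdb_port parameters, and it uses a hypothetical notify resource to stand in for the real configuration changes; treat the names as assumptions and check the module source for what your version actually provides.

```puppet
# Assumption: puppetdb_conn_validator is a type provided by the module.
# The resource fails if it cannot reach PuppetDB at the given host and port.
puppetdb_conn_validator { 'puppetdb_conn':
  puppetdb_server => 'puppetdb',
  puppetdb_port   => 8081,
}

# Hypothetical stand-in for the master's config changes. Because it
# requires the validator, it is skipped whenever validation fails,
# which leaves the previous configuration intact.
notify { 'safe to configure the master for PuppetDB':
  require => Puppetdb_conn_validator['puppetdb_conn'],
}
```

You shouldn't need to write this yourself when using puppetdb::master::config; the sketch only shows why a failed connection check leaves the master's files alone.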

What does this all mean to you, as a user? Well, it basically means that the first time you add this stuff to your manifests, you may see a few failed Puppet runs on the affected nodes. This should be limited to 1 failed run on the PuppetDB node, and up to 2 failed runs on the Puppet master node. After that, all of the dependencies should be satisfied and your puppet runs should start to succeed again.

If you prefer, you can manually trigger puppet runs on the nodes in the correct order (Postgres, PuppetDB, Puppet master) and you should avoid any failed runs.

Configuring the module

The module supports a large number of configuration options. If you'd like more control over things like:

  • whether or not to open the PuppetDB port on the firewall
  • what address the PuppetDB server should listen on
  • what version of PuppetDB to use
  • what address the PostgreSQL server should listen on
  • PostgreSQL database name, username, password, etc.
  • custom paths to various configuration files

and more, please take a peek at the individual classes. They expose a large number of parameters and should hopefully be documented fairly well. (We won't cover them here since this post has already gotten a bit long-winded, if I do say so myself, but perhaps we'll do a follow-up blog post in the future that goes into greater detail.)
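
For a flavor of what that can look like, here is a hedged sketch covering a few of the items listed above. The parameter names (listen_address, open_listen_port, database_name, database_username, database_password) and all of the values are assumptions for illustration and may differ in your installed version, so confirm them against the puppetdb class before use.

```puppet
node 'puppetdb' {
  class { 'puppetdb':
    # Assumed parameter: the address the PuppetDB server listens on
    listen_address    => '0.0.0.0',
    # Assumed parameter: whether to open the PuppetDB port in the firewall
    open_listen_port  => true,
    # Assumed parameters: PostgreSQL database name and credentials
    database_name     => 'puppetdb',
    database_username => 'puppetdb',
    database_password => 'change-me',
  }
}
```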

Conclusion

That's about it for now. We hope that this module makes it So Darn Easy to get up and running with PuppetDB that you simply can't come up with any more excuses not to go ahead and do it right now! We think you'll be happy you did, not only because of its current power and features, but also because of all of the great things we have in store for it in the near future.

If you have any questions, suggestions, or feedback, please send them to Ryan or Chris! If there's a setting that you'd like to be able to manage that we haven't exposed yet, let us know, or better yet, file a pull request to the module project: https://github.com/puppetlabs/puppetlabs-puppetdb