CFEngine 3 Policy Update or How cf_promises_validated Works

Over the past 2 years (since the 3.1.2 release; wow, time flies when you’re having fun. I checked, and 3.1.2 was released on Dec 9th 2010), I have seen a few questions regarding how the default CFEngine policy update works, and more specifically how cf_promises_validated plays into the update process. This is my stab at describing the history and behaviour. I welcome any corrections.

The default failsafe.cf update policy is simple in nature. (We could probably debate what is simple or complex, but I am comfortable with the label in this case.) Agents copy policy from /var/cfengine/masterfiles on the policy hub to /var/cfengine/inputs. This is the same for all agents, even the agent that runs on the policy hub. The only difference is that, since the files are already local on the policy hub, they don’t have to go over the network; they are still copied from the same source to the same destination.

The 3.1.2 release was an efficiency-related release. One of the enhancements was the introduction of cf_promises_validated. Eystein wrote a great extended change log on his blog covering the release, including a section on cf_promises_validated, which is where I first learned of the feature and how to use it. Again, the cf_promises_validated mechanism is simple in nature. From his post: “this file (/var/cfengine/masterfiles/cf_promises_validated) is created by cf-agent or any other CFEngine component after it has successfully verified the policy with cf-promises.” What I think is missing from this description is that /var/cfengine/masterfiles/cf_promises_validated is created or updated when policy has been verified after the policy has changed (so it’s not supposed to update this file on every execution; there seems to be a bug with this, but that is not the expected behaviour). I do not know what constitutes change, but I suspect it’s some variation of a tripwire policy similar to the following. Remember, this is a simple mechanism and is the same for any agent, policy hub or not.

files:
  any::
    "/var/cfengine/masterfiles"
      changes      => detect_all_change,
      depth_search => recurse("inf"),
      classes      => if_repaired("cf_promises_validated");

  cf_promises_validated::
    "/var/cfengine/masterfiles/cf_promises_validated"
      create => "true",
      touch  => "true";

The difference between a policy hub (am_policy_hub) and a non-policy hub, as I understand it, is determined by comparing the contents of /var/cfengine/policy_server.dat to the IPs/hostnames associated with the interfaces on the system. If the policy server found in policy_server.dat resolves to an IP on the current system, the agent raises the am_policy_hub class. This class is used in the default failsafe.cf update policy to determine when to copy files from /var/cfengine/masterfiles on the policy hub to /var/cfengine/inputs (locally on the executing agent).
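
To see which role a given host ends up with, a small report policy can help. This is only a sketch: the bundle name example_show_hub_role and the variable name recorded_hub are made up for illustration, and it assumes the host has already been bootstrapped so that policy_server.dat exists in the work directory.

```cf3
# Hypothetical example; the bundle and variable names are not part of the
# default policy.

body common control
{
      bundlesequence => { "example_show_hub_role" };
}

bundle agent example_show_hub_role
{
  vars:
      # Assumes policy_server.dat exists (written at bootstrap time).
      "recorded_hub" string => readfile("$(sys.workdir)/policy_server.dat", "256");

  reports:
    am_policy_hub::
      "policy_server.dat says $(recorded_hub): this host is the policy hub";

    !am_policy_hub::
      "policy_server.dat says $(recorded_hub): policy is pulled from $(sys.policy_hub)";
}
```

The agent sets am_policy_hub itself, so the policy only has to branch on it; nothing here tries to reproduce the address comparison.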

Initially cf_promises_validated was an empty file, and mtime was used to determine if the file was newer. This was problematic for hosts with clock skew, so starting in 3.3.0 a time stamp is written into the file, which lets a digest comparison detect differences more accurately. The time value itself is only meaningful to a human reading the file; the agent only cares whether the content differs.
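
The mtime versus digest distinction can be tried out locally with a small experiment. This is a minimal sketch, not the default policy: the /tmp paths and every example_* name are assumptions made up for illustration, and it assumes /tmp/example_stamp_source already exists.

```cf3
# Hypothetical local experiment; the /tmp paths and all example_* names are
# made up for illustration and are not part of the default policy.

body common control
{
      bundlesequence => { "example_stamp_compare" };
}

bundle agent example_stamp_compare
{
  files:
      "/tmp/example_stamp_copy"
        comment   => "Copy the stamp only when its content differs from the source",
        copy_from => example_local_digest("/tmp/example_stamp_source"),
        classes   => example_if_repaired("example_stamp_changed");

  reports:
    example_stamp_changed::
      "Stamp content changed, a full policy update would be triggered";
}

body copy_from example_local_digest(from)
{
      source  => "$(from)";
      # "digest" compares content checksums, so clock skew does not matter.
      # With "mtime" the copy depends on the source being newer than the
      # destination, which is what made the empty-file approach fragile.
      compare => "digest";
}

body classes example_if_repaired(x)
{
      promise_repaired => { "$(x)" };
}
```

Run with something like cf-agent -K -f ./example.cf. Touching the source without changing its content should leave the copy alone, while changing the content should trigger the copy and the report.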

Spend a few minutes reading this snippet from the default update policy.

01 files:
02  
03  !am_policy_hub::  # policy hub should not alter inputs/ uneccessary
04
05   "$(inputs_dir)/cf_promises_validated"
06        comment => "Check whether a validation stamp is available for a new policy update to reduce the distributed load",
07         handle => "update_files_check_valid_update",
08      copy_from => u_rcp("$(master_location)/cf_promises_validated","$(sys.policy_hub)"),
09         action => u_immediate,
10        classes => u_if_repaired("validated_updates_ready");
11 
12   "$(modules_dir)"
13          comment => "Always update modules files on client side",
14           handle => "update_files_update_modules",
15        copy_from => u_rcp("$(modules_dir)","$(sys.policy_hub)"),
16     depth_search => u_recurse("inf"),
17            perms => u_m("755"),
18           action => u_immediate;
19
20  am_policy_hub|validated_updates_ready::  # policy hub should always put masterfiles in inputs in order to check new policy
21
22   "$(inputs_dir)"
23           comment => "Copy policy updates from master source on policy server if a new validation was acquired",
24            handle => "update_files_inputs_dir",
25         copy_from => u_rcp("$(master_location)","$(sys.policy_hub)"),
26      depth_search => u_recurse("inf"),
27      file_select  => u_input_files,
28            action => u_immediate,
29           classes => u_if_repaired("update_report");

The first thing that happens applies to non-policy hubs (line 3 starts the context class restriction, and line 5 begins the promiser). /var/cfengine/inputs/cf_promises_validated is checked against /var/cfengine/masterfiles/cf_promises_validated on the policy hub. If the file is different, it is copied down and the validated_updates_ready class is defined. Skipping down to line 20, a new context class expression applies to policy hubs or to agents which have validated that updates are ready (their cf_promises_validated in /var/cfengine/inputs was different from the one in /var/cfengine/masterfiles on the policy hub). If either of those classes is defined, the agent recursively scans /var/cfengine/masterfiles on the policy hub and copies files that are different to /var/cfengine/inputs locally on the executing agent.

So, policy hubs always perform this update and copy files that are different from /var/cfengine/masterfiles to /var/cfengine/inputs, and non-policy hubs only update /var/cfengine/inputs from /var/cfengine/masterfiles on the policy hub if cf_promises_validated has changed. The hub must always perform this update, if you recall how cf_promises_validated is created or updated: new policy that successfully validates in /var/cfengine/inputs triggers cf_promises_validated to be updated in /var/cfengine/masterfiles. Agents need to see that file differ from the cf_promises_validated file in their own /var/cfengine/inputs in order to trigger a full policy update.

If it’s still not clear, read it a few more times. Things are usually pretty hard until they aren’t :). I recall reading the extended change log from 3.1.2 several times, as well as the example policy, until I thought I had a good grasp on the flow. I hope you find this useful, and I expect that this post will become less useful in the near future: I have filed a bug requesting better documentation coverage of the default update policy.

5 Comments

  • Danny wrote:

    Thanks for that explanation. I was wondering what all the hubbub about this was for, and now I understand. I personally don’t use the default update or failsafe config, as we have a separate Q/A process to verify a policy before it goes in place (and we use a staged mechanism to deploy). This is certainly an interesting way to ensure that invalid policy doesn’t get propagated out, though I prefer my “use Subversion and don’t promote to a deployable location until someone has validated the code” method better. ;)

  • I think using the cf_promises_validated file is a good pattern. It reduces the load on the hub from clients performing full scans on masterfiles unnecessarily. It also functions as a gate to stop broken policy from propagating, though it really shouldn’t matter if broken policy does propagate; failsafe.cf should fix it as soon as a correct version is available.

    I use a slightly modified default failsafe.cf that has hubs maintain a clean svn checkout; this could be a specific branch or tag. As policy is committed, a post-receive hook does a sanity check to try and keep broken policy from hitting the policy hub. This still allows for some other process chain for policy validation. You could commit and the policy could be automatically validated, then you could require a human to review policy and tag it, which the policy hub(s) would see and use to update their masterfiles.

    The flexibility is there to do what you think works best for your environment, but if you are using cf-serverd for policy distribution to agents (instead of each agent doing its own svn checkout), I would encourage you to consider the extra work the clients are doing scanning masterfiles at whatever interval, pointlessly looking for updates, when they could use cf_promises_validated to decide if it’s appropriate to do a full scan.

    It would be great if you could do a write-up of your process. I think it’s great for the community to see multiple approaches. To me, one of the hardest parts about getting started with any technology is not knowing what you don’t know.

  • Danny wrote:

    Well, I do pretty much what you’re describing, with the policy in svn and a post-commit hook on the /tags directory (so developers are encouraged to save their work often). :)

    There’s a good point made about the computational expense of comparing the files, but in my mind, it’s more important to confirm that the policy has not changed. A malicious or ignorant local administrator could potentially modify a policy file, intentionally or inadvertently, and that would not be corrected or possibly even observed until there was another policy update. In my environment, where ignorant people obtain local root access and inadvertently break all sorts of things, this risk is greater than the cost of calculating a few checksums four times per hour. :)

    I’ve been talking to our legal department for a while about open-sourcing some of the tools I’ve written to manage this, and about writing some of the design down for the purposes of bettering the community. But it’s a bit of a process getting “internal” information approved for disclosure. :)

  • Yeah, I totally understand the ignorant people obtaining root and wreaking havoc. I’ve been thinking about adding a time class to bypass the cf_promises_validated check and do a full scan at some minimum interval for the same reason.

  • Great post, Nick, thanks for writing it!
