Requests

Requests are specified in YAML files. On startup, the Request Manager (RM) reads and checks all spec files in specs.dir: every file in or under that directory whose name ends with .yaml (case-insensitive) is treated as a spec file.
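
The file-matching rule above can be sketched in Go as follows. This is only an illustration: isSpecFile and findSpecFiles are hypothetical names, not functions from the RM, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// isSpecFile reports whether a file name ends with ".yaml", ignoring case.
// Hypothetical helper: it only illustrates the matching rule.
func isSpecFile(name string) bool {
	return strings.EqualFold(filepath.Ext(name), ".yaml")
}

// findSpecFiles walks dir recursively and returns the path of every
// file that isSpecFile accepts. Hypothetical helper.
func findSpecFiles(dir string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && isSpecFile(d.Name()) {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	fmt.Println(isSpecFile("stop-container.yaml")) // true
	fmt.Println(isSpecFile("Stop-Container.YAML")) // true
	fmt.Println(isSpecFile("notes.yml"))           // false
}
```

Note that under this rule a file ending in .yml is not picked up.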

All spec files have the same syntax. The RM combines specs from multiple files to complete a request. Keeping one sequence per file and naming files descriptively keeps the specs organized and easy for humans to find.

Sequence Spec

The root of every spec file is a sequence spec:

---
sequences:
  stop-container:
    request: true
    args:
      required:
        - name: containerName
          desc: "Container name to stop"
      optional:
        - name: restart
          desc: "Restart the container if \"yes\""
          default: ""
      static:
        - name: slackChan
          default: "#dba"
    acl:
      - role: eng
        ops: admin
      - role: ba
        ops: ["start","stop"]
    nodes:
      NODE_SPECS

This defines one or more sequences under sequences:. Although multiple sequences can be defined in a single file, we suggest one sequence per file.

The example above defines one sequence called “stop-container”. (We would name this file stop-container.yaml.) If request: true, the sequence is a request that callers can make. This also makes spinc (run without any command line options) list the request. Set request: true only for top-level sequences that you want to expose to users as requests. All requests are sequences, but not all sequences are requests. To distinguish:

request: a sequence with request: true
non-request sequence (NRS): a sequence with request: false (usually omitted)
sequence: any and all sequences, generally speaking

args:

Sequences have three types of arguments (args): required, optional, and static.

required: args are, unsurprisingly, required. For requests, required args are provided by the caller, so they should be kept to a minimum—require the user to provide only what is necessary and sufficient to start the request, then figure out other args in jobs. For non-request sequences (NRS), required args are provided by the parent node (nodes are discussed in the next section).
optional: args are optional. If not explicitly given, the default value in the spec is used. In the example above, arg “restart” defaults to an empty string unless the user provides a value.
static: args are fixed values. Static arg “slackChan” has value “#dba”. Static args are useful when the value is known but differs in different sequences. For example, another request might set slackChan=#yourTeam to get Slack notifications at #yourTeam instead of #dba. This could also be solved by making slackChan a required or optional arg.

In job args, there are no distinctions. jobArgs["slackChan"] is the same as jobArgs["containerName"], and jobs can change the value of either.
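
As a rough sketch of how the three arg types end up in one undifferentiated map, consider the Go code below. The Arg type and buildJobArgs function are hypothetical and exist only for this illustration; the RM's own code may be organized differently.

```go
package main

import "fmt"

// Arg mirrors one entry under required:, optional:, or static: in a
// sequence spec. Hypothetical type for illustration.
type Arg struct {
	Name    string
	Default string
}

// buildJobArgs merges the three arg types into a single jobArgs map.
// given holds the values provided by the caller (or the parent node).
func buildJobArgs(required, optional, static []Arg, given map[string]string) (map[string]interface{}, error) {
	jobArgs := map[string]interface{}{}
	for _, a := range required {
		v, ok := given[a.Name]
		if !ok {
			return nil, fmt.Errorf("required arg %q not given", a.Name)
		}
		jobArgs[a.Name] = v
	}
	for _, a := range optional {
		if v, ok := given[a.Name]; ok {
			jobArgs[a.Name] = v // explicitly given
		} else {
			jobArgs[a.Name] = a.Default // fall back to the spec default
		}
	}
	for _, a := range static {
		jobArgs[a.Name] = a.Default // fixed value from the spec
	}
	return jobArgs, nil
}

func main() {
	jobArgs, err := buildJobArgs(
		[]Arg{{Name: "containerName"}},
		[]Arg{{Name: "restart", Default: ""}},
		[]Arg{{Name: "slackChan", Default: "#dba"}},
		map[string]string{"containerName": "web-1"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(jobArgs["containerName"], jobArgs["slackChan"])
}
```

Once merged, nothing in the map records whether a value came from a required, optional, or static arg.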

Node Specs

A sequence is one or more nodes (vertices in the graph) defined under nodes:. There are three types of node specs. Shared fields (e.g. retry:) are only described once.

Job Node

A job node specifies a job to run. (If this were a tree data structure, these would be leaf nodes.) Every sequence eventually leads to job nodes. This is where work happens:

      expand-cluster:
        category: job
        type: etre/expand-cluster
        args:
          - expected: cluster
            given: cluster
        sets:
          - arg: app   # string
            as: clusterApp
          - arg: env   # string
          - arg: nodes # []string
        retry: 2
        retryWait: 3s
        deps: []

All node specs begin with a node name: “expand-cluster”, in this case. Node names must be unique within the sequence. (Spin Cycle makes nodes unique within a request by assigning them an internal job ID.) category: job makes this node a job node. type: specifies the job type: “etre/expand-cluster”. The jobs.Factory in your jobs repo must be able to make a job of this type.

args: lists all job args that the job requires. expected: is the job arg name that the job expects, and given: is the job arg name in the specs to use. In other words, jobArgs[expected] = jobArgs[given]. This is useful because it is nearly impossible to make all job args in specs match all job args in jobs. For example, a spec might use “host” for a server’s hostname, but a job uses “hostname”. In this case,

- expected: hostname
  given: host

makes Spin Cycle do jobArgs["hostname"] = jobArgs["host"] before passing jobArgs to the job.

If expected == given, given: may be omitted.

Only job args listed under args: are passed to the job. If a job needs arg “foo” but “foo” is not listed, then jobArgs["foo"] will be nil in the job. This requirement is strict and somewhat tedious, but it makes specs completely self-describing and easy to follow because there are no “hidden” args.
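
The remapping and filtering rules for args: can be sketched like this in Go. NodeArg and argsForJob are hypothetical names, and building a fresh map for the job is an assumption made for clarity rather than a description of Spin Cycle's internals.

```go
package main

import "fmt"

// NodeArg mirrors one entry under a node's args: list.
// Hypothetical type for illustration.
type NodeArg struct {
	Expected string
	Given    string // may be empty when it equals Expected
}

// argsForJob returns a new map holding only the listed args, keyed by
// the name the job expects. Args that are not listed are left out.
func argsForJob(jobArgs map[string]interface{}, nodeArgs []NodeArg) map[string]interface{} {
	out := make(map[string]interface{}, len(nodeArgs))
	for _, a := range nodeArgs {
		given := a.Given
		if given == "" {
			given = a.Expected // given: omitted, so expected == given
		}
		if v, ok := jobArgs[given]; ok {
			out[a.Expected] = v
		}
	}
	return out
}

func main() {
	jobArgs := map[string]interface{}{"host": "db1.local", "foo": "bar"}
	passed := argsForJob(jobArgs, []NodeArg{{Expected: "hostname", Given: "host"}})
	fmt.Println(passed["hostname"])   // db1.local
	fmt.Println(passed["foo"] == nil) // true: "foo" was not listed
}
```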

If a job has optional args, they must be listed so they are passed to the job, in case they exist. The job is responsible for using the optional args or not. (Note: “optional” here is not the same as sequence-level optional args.)

sets: specifies the job args that the job sets. The RM checks this. arg: and as: are to sets: as expected: and given: are to args: above: arg: is the job arg name set by the job, and as: is the job arg name in the specs to use.

If arg == as, as: may be omitted.

In the example above, the job sets “app” (remapped to “clusterApp” by the RM), “env”, and “nodes” in jobArgs. After calling the job’s Create method, the RM checks that all three are set in jobArgs (with any value, including nil). Like args:, this is strict but makes it possible to follow every arg through different sequences. It also makes it explicit which jobs set which args.
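
A minimal sketch of the sets: check and remapping, written in Go. SetArg and applySets are hypothetical names. Whether the original key is removed after remapping is not stated above; deleting it here is an assumption.

```go
package main

import "fmt"

// SetArg mirrors one entry under a node's sets: list.
// Hypothetical type for illustration.
type SetArg struct {
	Arg string
	As  string // may be empty when it equals Arg
}

// applySets verifies that every arg in sets is present in jobArgs
// (any value, including nil, counts as set) and renames arg to as.
func applySets(jobArgs map[string]interface{}, sets []SetArg) error {
	for _, s := range sets {
		v, ok := jobArgs[s.Arg]
		if !ok {
			return fmt.Errorf("job did not set arg %q", s.Arg)
		}
		if s.As != "" && s.As != s.Arg {
			jobArgs[s.As] = v
			delete(jobArgs, s.Arg) // assumption: the old name is dropped
		}
	}
	return nil
}

func main() {
	// jobArgs as they might look after the job's Create method ran.
	jobArgs := map[string]interface{}{
		"cluster": "c1",
		"app":     "shop",
		"env":     "prod",
		"nodes":   []string{"n1", "n2"},
	}
	sets := []SetArg{{Arg: "app", As: "clusterApp"}, {Arg: "env"}, {Arg: "nodes"}}
	if err := applySets(jobArgs, sets); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(jobArgs["clusterApp"]) // shop
}
```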

retry: and retryWait: specify how many times the JR should retry the job if Run does not return proto.STATE_COMPLETE. The job is always run once, so the total number of runs is 1 + retry. retryWait is the wait time between tries. It is a time.Duration string like “3s” or “500ms”. If not specified, the default is no wait between tries.
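
The retry arithmetic can be illustrated with a short Go sketch. runWithRetry is a hypothetical function, and the run callback returning true stands in for Run returning proto.STATE_COMPLETE; the JR's actual loop is not shown in this document.

```go
package main

import (
	"fmt"
	"time"
)

// runWithRetry calls run up to 1 + retry times, sleeping retryWait
// between tries. An empty retryWait means no wait. Hypothetical helper.
func runWithRetry(run func() bool, retry uint, retryWait string) (tries uint, ok bool, err error) {
	var wait time.Duration
	if retryWait != "" {
		wait, err = time.ParseDuration(retryWait)
		if err != nil {
			return 0, false, err
		}
	}
	maxTries := 1 + retry
	for tries = 1; tries <= maxTries; tries++ {
		if run() {
			return tries, true, nil
		}
		if tries < maxTries {
			time.Sleep(wait)
		}
	}
	return maxTries, false, nil
}

func main() {
	calls := 0
	// Fails twice, then completes: with retry: 2 the third try succeeds.
	tries, ok, err := runWithRetry(func() bool {
		calls++
		return calls == 3
	}, 2, "10ms")
	fmt.Println(tries, ok, err) // 3 true <nil>
}
```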

deps: is a list of node names that this node depends on. For nodes A and B, if B depends on A, the graph is A -> B. The JR runs B only after A completes successfully. A node can depend on many nodes, creating fan-out and fan-in points:

      B ->
    /      \
A ->        +-> E
    \      /
      C ->

Node E has deps: [B,C]. Nodes B and C have deps: [A]. Node A has deps: [].

deps: determines the order of nodes, not the order of node specs in the file. Every sequence must have a node with deps: [] (the first node in the sequence). Cycles are not allowed.

Sequence Node

All node specs begin with a node name: “notify-app-owners”, in the example below. category: sequence makes this node a sequence node. type: specifies the sequence name: “notify-app-owners”. A node and a sequence can have the same name. Whereas a job node runs a job, a sequence node imports another sequence.

      notify-app-owners:
        category: sequence
        type: notify-app-owners
        args:
          - expected: appName
            given: app
          - expected: env
            given: env
        sets:
          - arg: messageSent
            as: message
        deps: [expand-cluster]
        retry: 9
        retryWait: 5000ms

When the RM encounters this sequence node, it looks for a sequence called “notify-app-owners”. (We would put that sequence in a file named notify-app-owners.yaml.) It replaces the sequence node with all the nodes in the target sequence. Since sequences can have required args, the sequence node must specify the args: to pass to the sequence (as if the sequence was a request). The same rules about args:, expected:, and given: apply. The only difference is that the job args are passed to a sequence instead of a job.

sets: also applies to sequences: it specifies the job args that the jobs within the sequence set. In this example, the “notify-app-owners” sequence sets the job arg “messageSent”, which is renamed to “message” for use in this spec. One of the jobs within the “notify-app-owners” sequence must therefore set the arg “messageSent”.

The same rules about deps: apply (described above). In this example, the “notify-app-owners” sequence is not called until the “expand-cluster” node is complete and successful. Likewise, if another node deps: [notify-app-owners], it is not called until the entire sequence is complete and successful. The sequence node, at this point in the spec, acts like a single node—it just happens to contain/run other nodes and sequences.

retry: and retryWait: apply to sequences, too. If any job in the sequence fails, the entire sequence is retried from its beginning.

Sequences of Sequences

Sequences “calling” sequences are how large requests are built. Like a job, a sequence is a unit of work, just a bigger one. The “notify-app-owners” sequence, for example, might have several jobs which determine who the app owners are, what their notification preferences are, and then notify them accordingly. That is one unit of work: notifying app owners. It is also a reusable unit of work.

Conditional Node

category: conditional makes this node a conditional node. if: specifies the job arg to use, and eq: operates like the cases of a switch statement on the if: job arg value.

      restart-vttablet:
        category: conditional
        if: vitess
        eq:
          yes: restart-vttablet
          default: noop
        args:
          - expected: node
            given: name
          - expected: host
            given: physicalHost
        deps: [bgp-peer-node]

In the example above, if jobArgs["vitess"] == "yes", then sequence “restart-vttablet” is called. Else, the default is a special, built-in sequence called “noop” which does nothing. In the case that jobArgs["vitess"] == "yes" and sequence “restart-vttablet” is called, the node acts exactly like a sequence node.

All values and comparisons are expected to be strings. The if: job arg must be set with a string value, and the values listed under eq: are string values, except “default” which is a special case.
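
The switch-like behavior of if: and eq: could be sketched in Go as below. chooseSequence is a hypothetical function. Returning an error when nothing matches and no default is given is an assumption, since that case is not described here.

```go
package main

import "fmt"

// chooseSequence picks the sequence name for a conditional node.
// ifArg is the job arg named by if:, and eq maps string values to
// sequence names, with "default" as the fallback case.
func chooseSequence(jobArgs map[string]interface{}, ifArg string, eq map[string]string) (string, error) {
	v, ok := jobArgs[ifArg]
	if !ok {
		return "", fmt.Errorf("if: job arg %q is not set", ifArg)
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("if: job arg %q is not a string", ifArg)
	}
	if seq, ok := eq[s]; ok {
		return seq, nil
	}
	if seq, ok := eq["default"]; ok {
		return seq, nil
	}
	// Assumption: no match and no default is treated as an error.
	return "", fmt.Errorf("no eq: case matches %q and no default is given", s)
}

func main() {
	eq := map[string]string{"yes": "restart-vttablet", "default": "noop"}
	seq, _ := chooseSequence(map[string]interface{}{"vitess": "yes"}, "vitess", eq)
	fmt.Println(seq) // restart-vttablet
	seq, _ = chooseSequence(map[string]interface{}{"vitess": "no"}, "vitess", eq)
	fmt.Println(seq) // noop
}
```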

Conditional nodes can be used to switch between alternatives or, like the example above, do nothing in one part of a spec (but do everything else before and after).

Conditional nodes can also set job args using sets:, just as in job or sequence nodes. In order for a conditional node to set an arg, every sequence that the conditional may call must set that arg.

Sequence Expansion

Sequence expansion is possible in sequence and conditional nodes with each::

      decomm-nodes:
        category: sequence
        type: decomm-node
        each:
          - nodeHostname:node
          - hosts:host
        parallel: maxParallel
        args:
          - expected: archiveData
            given: archiveData
        deps: []

each: takes a list of job arg names and “expands” the sequence, “decomm-node”, once for each job arg value. Expanded sequences are run in parallel.

parallel: takes a positive integer. At most maxParallel expanded sequences will run in parallel at a given time.

The job args must be of type []string and of equal length. In this example, the job args could be:

nodeHostname := []string{"node1", "node2"}
hosts := []string{"host1", "host2"}

The syntax is list:element where each jobArgs[element] is initialized from the next value in list. The target sequence should require element.

The args: are passed to each expanded sequence as-is, i.e. each “decomm-node” sequence receives jobArgs["archiveData"].
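
The list:element expansion can be sketched as follows in Go. expandEach is a hypothetical function that returns, for each expanded sequence, the element args it would receive; it does not show how Spin Cycle builds the expanded graph.

```go
package main

import (
	"fmt"
	"strings"
)

// expandEach turns each: entries of the form "list:element" into one
// map of element args per expanded sequence. All lists must be
// []string of equal length.
func expandEach(jobArgs map[string]interface{}, each []string) ([]map[string]string, error) {
	var expanded []map[string]string
	for i, e := range each {
		parts := strings.SplitN(e, ":", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid each: value %q, expected list:element", e)
		}
		list, ok := jobArgs[parts[0]].([]string)
		if !ok {
			return nil, fmt.Errorf("job arg %q is not a []string", parts[0])
		}
		if i == 0 {
			expanded = make([]map[string]string, len(list))
			for j := range expanded {
				expanded[j] = map[string]string{}
			}
		} else if len(list) != len(expanded) {
			return nil, fmt.Errorf("job arg %q has length %d, expected %d",
				parts[0], len(list), len(expanded))
		}
		for j, v := range list {
			expanded[j][parts[1]] = v // element for the j-th expansion
		}
	}
	return expanded, nil
}

func main() {
	jobArgs := map[string]interface{}{
		"nodeHostname": []string{"node1", "node2"},
		"hosts":        []string{"host1", "host2"},
	}
	expanded, err := expandEach(jobArgs, []string{"nodeHostname:node", "hosts:host"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, args := range expanded {
		fmt.Println(args["node"], args["host"])
	}
}
```

With the example values, the first expanded sequence receives node1 and host1, and the second receives node2 and host2.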

A conditional node with sequence expansion expands the sequence that matches if: and eq:.

The same rules about deps: apply (described above).

Linter

The RM checks all spec files on startup. This includes static checks, most of which can be performed by looking at a single node or sequence, and graph checks, which necessarily involve building graphs that describe the request specs. If some (less important) checks fail, the RM logs warnings; if others fail, the RM logs those errors and fails.

Some examples of static checks: ensuring that the category field is one of job, conditional, or sequence; checking that a sequence has at least one node; making sure a sequence node calls an actual sequence.

Some examples of graph checks: catching circular dependencies; making sure all job args for a node have been set by previous nodes, or by the sequence.

spinc-linter CLI

spinc-linter is a CLI into a local build of the linter (and only the linter). It runs exactly the same checks that the RM does on startup and logs all errors to stdout. Any errors reported by the linter should be addressed, because they will cause the RM to fail. Warnings should be ignored with caution; they indicate likely typos or mistakes in the specs.