Workload
The `stage.workload` section determines how and when Finch executes transactions. Read Client and Execution Groups first.

The `stage.workload` section is optional. If omitted, Finch auto-detects and auto-allocates a default workload:
```yaml
stage:
  trx:                #
    - file: A         # Explicit
    - file: B         #
  # workload:         # Defaults:
  #   - clients: 1    #
  #     trx: [A, B]   # all trx in order
  #     name: "..."   # ddlN or dmlN
  #     iter: 0       # 0=unlimited
  #     runtime: 0    # 0=forever
```
The default workload runs all trx in trx order: the order trx files are specified in `stage.trx`.

The example above is a stage with two trx files, A and B, in that order. Since there’s no explicit workload, the default workload is shown (commented out): 1 client runs both trx (in trx order) forever, or until you enter CTRL-C to stop the stage.
The default workload is not hard-coded; it’s the result of auto-allocation. If these values are omitted from a client group, Finch automatically allocates:

- `clients = 1`: a client group needs at least 1 client.
- `iter = 1`: only if any assigned trx contains DDL.
- `name = ...`: if any trx contains DDL, the name is “ddlN” where N is an integer: “ddl1”, “ddl2”, and so on. Else, the name is “dmlN” where N is an integer that only increments in subsequent client groups when broken by a client group with DDL. For example, if two client groups in a row have only DML, both are named “dml1”, so they form a single execution group.
- `trx = stage.trx`: if a client group is not explicitly assigned any trx, it is auto-assigned all trx in trx order.
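Here’s a minimal sketch of that last naming rule (A and B are stand-ins for DML-only trx files):

```yaml
workload:
  - clients: 4
    trx: [A]  # DML only: auto-named dml1
  - clients: 4
    trx: [B]  # DML only: also dml1, so same execution group
```

Both client groups are auto-named “dml1”, so they form a single execution group and run in parallel.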
For brevity, “EG” is execution group and “CG” is client group.
1. A stage must have at least one EG
2. An EG must have at least one CG
3. A CG must have at least one client and one assigned trx
4. CG are read in `stage.workload` order (top to bottom)
5. A CG must have a name, either auto-assigned or explicitly named
6. An EG is created by contiguous CG with the same name
7. EG execute in the order they are created (`stage.workload` order given principles 4–6)
8. Only one EG executes at a time
9. All CG in the same EG execute at the same time (in parallel)
10. Clients in a CG execute only assigned trx in `workload.[CG].trx` order
11. An EG finishes when all its CG finish
These principles are referenced as P#: “P1” refers to principle 1, “A stage must have at least one EG”.
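To illustrate P4–P8, here’s a sketch (the group names are hypothetical) of how contiguous names form execution groups:

```yaml
workload:
  - trx: [A]
    group: load  # CG 1: contiguous with CG 2 and same name,
  - trx: [B]     #       so together they form EG "load" (P6)
    group: load  # CG 2
  - trx: [C]
    group: run   # CG 3: its own EG "run", which executes only
                 #       after EG "load" finishes (P7, P8, P11)
```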
You can assign any trx to a client group. Let’s say the stage file specifies:
```yaml
stage:
  trx:
    - file: A
    - file: B
    - file: C
```
Given the trx above, the following workloads are valid:
✓ Any trx order

```yaml
workload:
  - trx: [B, C, A]
```

✓ Repeating trx

```yaml
workload:
  - trx: [A, A, B, B, C]
```

✓ Reusing trx

```yaml
workload:
  - trx: [A, B, C]
  - trx: [A, B]
```

✓ Unassigned trx

```yaml
workload:
  - trx: [C]
```
Finch runs forever by default, but you probably need results sooner than that. There are three methods to limit how long Finch runs: runtime (wall clock time), iterations, and data. Multiple limits are combined with logical OR: Finch stops as soon as any one limit is reached.
Setting `stage.runtime` will stop the entire stage, even if some execution groups haven’t run yet.

Setting `stage.workload.[CG].runtime` will stop the client group. Since execution groups are formed by client groups (P6), this is effectively an execution group runtime limit.

There is no runtime limit for individual clients; if needed, use a client group with `clients: 1`.
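For example, a minimal sketch combining both limits (the duration values are hypothetical, assuming Go-style duration strings like `60s`):

```yaml
stage:
  runtime: 60s      # stop the entire stage after 60 seconds
  workload:
    - clients: 1
      trx: [A]
      runtime: 30s  # stop this client group after 30 seconds
```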
One iteration is equal to executing all assigned trx, per client. If a client is assigned trx A, B, and C, it completes one iteration after executing those three trx. But if another client is assigned only trx C, then it completes one iteration after executing that one trx.
```yaml
stage:
  workload:
    - iter: N
      iter-clients: N
      iter-exec-group: N
```
- `iter` limits each client to N iterations.
- `iter-clients` limits all clients in the client group to N iterations.
- `iter-exec-group` limits all clients in the execution group to N iterations.

Combinations of these three are valid.
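For example, a sketch (the values are hypothetical) combining a per-client limit with an execution group limit:

```yaml
workload:
  - clients: 4
    trx: [A]
    iter: 100             # each client: at most 100 iterations
    iter-exec-group: 300  # all clients combined: at most 300
```

With 4 clients, `iter` alone would allow 400 total iterations, but `iter-exec-group: 300` stops the execution group first: whichever limit is reached first wins.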
Finch auto-allocates `iter = 1` for client groups with DDL in an assigned trx.
Data limits will stop Finch even without a runtime or iterations limit. When using a data limit, you probably want `runtime = 0` (forever) and `iter = 0` (unlimited) to ensure Finch stops only when the total data size is reached.

And since Finch auto-allocates `iter = 1` for client groups with DDL in an assigned trx, you shouldn’t mix DDL and DML in the same trx with a data limit, because `iter = 1` will stop Finch before the data limit is reached.
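For example, a sketch of the workload side of that setup (the data limit itself is configured with the data for trx B, which isn’t shown here):

```yaml
stage:
  runtime: 0      # no wall clock limit (forever)
  workload:
    - clients: 8
      trx: [B]    # DML-only trx with a data limit
      iter: 0     # unlimited iterations
```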
To focus on the workload, let’s presume a stage with three trx files:
```yaml
stage:
  trx:
    - file: A
    - file: B
    - file: C
```
This `stage.trx` section will be presumed and omitted in the following stage file snippets.
The classic benchmark workload executes everything all at once:
```yaml
workload:
  - trx: [A, B, C]
```
That executes all three trx at the same time, with one client, because `workload.clients` defaults to 1.
You usually specify more clients:
```yaml
workload:
  - clients: 16
    trx: [A, B, C]
```
That runs 16 clients, each executing all three trx, at the same time.
In both cases (1 or 16 clients), the workload is one implicit execution group and one client group.
A sequential workload executes trx one by one. Given P7 and P8, and since auto-allocation would merge contiguous DML-only client groups into one execution group, three different explicitly named execution groups are needed:
```yaml
workload:
  - trx: [A]
    group: first
  - trx: [B]
    group: second
  - trx: [C]
    group: third
```
Finch executes EG “first”, then EG “second”, then EG “third”. This type of workload is typical for a DDL stage because order is important, but in this case there’s an easier way: auto-DDL.
For this example, let’s presume:

- A contains a `CREATE TABLE` statement
- B contains an `INSERT` statement (to load rows into the table)
- C contains an `ALTER TABLE` statement (to add a secondary index)
Do not write a `workload` section; instead, let Finch automatically generate this workload:
```yaml
workload:
  - trx: [A]    # CREATE TABLE
    group: ddl1 # automatic
  - trx: [B]    # INSERT
    group: dml1 # automatic
  - trx: [C]    # ALTER TABLE
    group: ddl2 # automatic
```
This works because of auto-allocation and most of the principles.
Auto-DDL is sufficient when there’s not a lot of data to load (or you’re very patient). But if you want to load a lot of data, you need a parallel load workload like:
```yaml
workload:
  - trx: [A]    # Create 2 tables
    group: create
  - trx: [B]    # INSERT INTO table1
    group: rows
    clients: 8
  - trx: [C]    # INSERT INTO table2
    group: rows
    clients: 8
```
Suppose trx A creates two tables. The first client group is also its own execution group because of `group: create`, and it runs once to create the tables. The second and third client groups are the same execution group because of `group: rows`, and they execute at the same time (P9). If trx B inserts into the first table, and trx C inserts into the second table, then 16 clients total will parallel load data.
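Putting it all together, here’s a sketch of the complete stage file for this parallel load, assembled from the pieces above:

```yaml
stage:
  trx:
    - file: A        # CREATE TABLE x 2
    - file: B        # INSERT INTO table1
    - file: C        # INSERT INTO table2
  workload:
    - trx: [A]
      group: create  # runs first, once (DDL: iter = 1)
    - trx: [B]
      group: rows    # EG "rows": 8 + 8 = 16 clients
      clients: 8     #   loading data in parallel
    - trx: [C]
      group: rows
      clients: 8
```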