Scope
Data keys have a scope corresponding to run levels plus two special low-level scopes:
```
global
└──stage
   └──workload
      └──exec-group
         └──client-group
            └──client
               └──iter
                  └──trx
                     └──statement (default)
                        └──row
                           └──value
```
A data key is unique within its scope, and its data generator is called once per scope iteration.
| Data Scope | Scope Iteration (Data Generator Called When) | Class |
| --- | --- | --- |
| global | Finch runs | One time |
| stage | Stage runs | One time |
| workload | Each client starts a new iter | Multi client |
| exec-group | Each client starts a new iter | Multi client |
| client-group | Each client starts a new iter | Multi client |
| client | Client connects to MySQL or recoverable query error | Single client |
| iter | Iter count increases | Single client |
| trx | Trx count increases | Single client |
| statement | Statement executes | Single client |
| row | Each @d per row when statement executes | Special |
| value | Each @d when statement executes | Special |
For example, @d with statement scope (the default) is called once per statement execution. Or, @d with iter scope is called once per client iter (start of executing all trx assigned to the client).
As the rest of this page will show, data scope makes it possible to craft both simple and elaborate workloads.
Specify stage.trx[].data.d.scope for a data key:

```yaml
stage:
  trx:
    - file:
      data:
        d:
          generator: int  # Required data generator name
          scope: trx      # Optional data scope
```
If not specified, statement is the default scope.
| Implicit Call | Explicit Call |
| --- | --- |
| @d | @d() |
By default, each data key is called once per scope iteration because @d is an implicit call: Finch calls the data generator when the scope iteration changes. Implicit calls are the default and canonical case because they just work for the vast majority of benchmarks.

An explicit call, like @d(), calls the data generator regardless of the scope iteration but always within the data scope.
For example, presume @d has statement scope and uses the auto-inc generator:

```sql
SELECT @d @d
-- SELECT 1 1

SELECT @d @d()
-- SELECT 1 2
```

The first query returns 1 1 because the second @d is an implicit call and the scope iteration (statement) has not changed. The second query returns 1 2 because the second @d() is an explicit call, so Finch calls the data generator again even though the scope iteration hasn't changed.
Row scope is a good example of why explicit calls are sometimes required.
Until then, the rest of this page will continue to default to the canonical case: implicit calls, @d.
To reason about and explain single client data scopes, let’s use a canonical example:
```
Client 1
  -- Iter 1
    -- Trx A n=1
    INSERT @a, @R
    UPDATE @a, @R
    -- Trx B n=2
    DELETE @a, @R
  -- Iter 2
    -- Trx A n=3
    INSERT @a, @R
    UPDATE @a, @R
    -- Trx B n=4
    DELETE @a, @R
```
The client executes two trx: trx A is an INSERT and an UPDATE; trx B is a DELETE. There's a trx counter: n=1, n=2, and so forth. Remember that the statements shown in iter 1 are the same as in iter 2: it's one INSERT executed twice, one UPDATE executed twice, and one DELETE executed twice.
There are two data keys:

- @a uses the auto-inc generator, so we know what values it returns when called: 1, 2, 3, and so on.
- @R uses the random int generator, so we don't know what value it returns when called.
These different data keys will help highlight how data scope works and why it’s important.
Using the canonical example described above, the default data scope (statement) returns:
```
Client 1
  -- Iter 1
    -- Trx A n=1
    INSERT @a=1, @R=101
    UPDATE @a=1, @R=492
    -- Trx B n=2
    DELETE @a=1, @R=239
  -- Iter 2
    -- Trx A n=3
    INSERT @a=2, @R=934
    UPDATE @a=2, @R=111
    -- Trx B n=4
    DELETE @a=2, @R=202
```
First, notice that @a=1 for all three statements in iter 1. This is because @a is statement scoped: each @a in each statement is unique, so there are three data generators, and each returns its first value: 1. Another way to look at it: the statement scoped @a in INSERT @a is visible and accessible only within that statement; none of the other statements can see or access it. The values in iter 1 are all 1 only because each statement scoped @a happens to return 1 as its first value.
@R demonstrates why statement scope is the default. Typically, benchmarks use random values and expect a different (random) value for each query. Since @R is statement scoped, the typical case is generated by default. A better example for @R is something like:

```sql
SELECT c FROM t WHERE id = @R
```

That might benchmark random lookups on column id.
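For reference, here's a minimal config sketch for a statement scoped @R; the trx file name is hypothetical, and only the config keys shown earlier are used:

```yaml
stage:
  trx:
    - file: lookups.sql  # hypothetical: contains the SELECT above
      data:
        R:
          generator: int  # random integer generator
          # scope not specified, so @R defaults to statement scope:
          # a new random value for each statement execution
```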
Using the canonical example described above, trx scope returns:
```
Client 1
  -- Iter 1
    -- Trx A n=1
    INSERT @a=1, @R=505
    UPDATE @a=1, @R=505
    -- Trx B n=2
    DELETE @a=1, @R=293
  -- Iter 2
    -- Trx A n=3
    INSERT @a=2, @R=821
    UPDATE @a=2, @R=821
    -- Trx B n=4
    DELETE @a=2, @R=410
```
Remember that “trx” in these docs refers to a Finch trx, not a MySQL transaction, although the two are closely related.
Again, first notice that @a=1 in the first trx (A) and the second trx (B). This is because @a is trx scoped, so it's unique to each trx (A and B) and called once per trx (when the trx count n increases). But since each @a returns the same initial values, the output looks the same, so @R demonstrates trx scope better. Although @R generates random integers, when it's trx scoped it's called only once per trx, so @R within each trx has the same value, like @R=505 in client 1, iter 1, trx n=1.
This could be used, for example, to benchmark inserting, updating, and deleting random rows, like:

```sql
INSERT INTO t (id, ...) VALUES (@R, ...)
UPDATE t SET ... WHERE id=@R
DELETE FROM t WHERE id=@R
```
Fun fact: the classic sysbench write-only benchmark does this: it deletes a random row, then re-inserts it. See @del_id in its config.
Using the canonical example described above, iter scope returns:
```
Client 1
  -- Iter 1
    -- Trx A n=1
    INSERT @a=1, @R=505
    UPDATE @a=1, @R=505
    -- Trx B n=2
    DELETE @a=1, @R=505
  -- Iter 2
    -- Trx A n=3
    INSERT @a=2, @R=821
    UPDATE @a=2, @R=821
    -- Trx B n=4
    DELETE @a=2, @R=821
```
An iteration (iter) is one execution of all trx assigned to a client. In this example, @a and @R are iter scoped, so they’re called once per iter (per client). Values are the same across trx and only change with each new iter.
Iter scope is useful to “share” values across multiple trx. This is equivalent to combining multiple trx into one and using trx scope.
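For example, here's a minimal sketch of sharing one value across two trx with iter scope. The trx file names are hypothetical, and it presumes an iter scoped key declared in one trx is visible to the other trx in the same iter, per the scope rules above:

```yaml
stage:
  trx:
    - file: trx-a.sql  # hypothetical: contains INSERT @R and UPDATE @R
      data:
        R:
          generator: int
          scope: iter  # called once per client iter: every statement in
                       # trx-a.sql and trx-b.sql sees the same @R value
    - file: trx-b.sql  # hypothetical: contains DELETE @R
```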
Client scope covers the same range as iter scope (one client), but its scope iteration is unique: the client connecting to MySQL or recovering from a query error. Client scope iterates at least once, when the client first connects to MySQL. Further iterations occur when the client reconnects to MySQL or starts a new iter to recover from certain errors (see Benchmark / Error Handling).
Is this scope useful? Maybe. For example, perhaps there’s a use case for client scoped @d to handle duplicate key errors or deadlocks—recoverable errors that start a new iter. A client scoped @d would know it’s a recoverable error and not just the next iteration, whereas an iter scoped data key couldn’t know this.
It might be helpful to think of client scope as “iter-on-error”.
To reason about and explain multi client data scopes, let’s add a second client to the canonical example:
```
Client 1
  -- Iter 1
    -- Trx A n=1
    INSERT @a, @R
    UPDATE @a, @R
    -- Trx B n=2
    DELETE @a, @R
  -- Iter 2
    -- Trx A n=3
    INSERT @a, @R
    UPDATE @a, @R
    -- Trx B n=4
    DELETE @a, @R

Client 2
  -- Iter 1
    -- Trx A n=1
    INSERT @a, @R
    UPDATE @a, @R
    -- Trx B n=2
    DELETE @a, @R
```
Using the canonical example described immediately above, client-group scope returns:
```
Client 1
  -- Iter 1
    -- Trx A n=1
    INSERT @a=1, @R=505
    UPDATE @a=1, @R=505
    -- Trx B n=2
    DELETE @a=1, @R=505
  -- Iter 2
    -- Trx A n=3
    INSERT @a=3, @R=821
    UPDATE @a=3, @R=821
    -- Trx B n=4
    DELETE @a=3, @R=821

Client 2
  -- Iter 1
    -- Trx A n=1
    INSERT @a=2, @R=743
    UPDATE @a=2, @R=743
    -- Trx B n=2
    DELETE @a=2, @R=743
```
With client group scope, @a and @R are unique to the client group, which means their data generators are shared by all clients in the group. And the client group scope iteration is “Each client starts a new iter”, which means their data generators are called when each client starts a new iter.
Presume this call order:
- Client 1 iter 1
- Client 2 iter 1
- Client 1 iter 2
Since @a is shared by all clients in the group and called when each client starts a new iter, that call order explains the values.
Client group scoped data keys are useful with (pseudo) stateful data generators like @a, where the call order matters. This scope allows coordination across multiple clients: for example, it's what makes it possible to insert values 1..N without duplicates using multiple clients.
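Here's a minimal sketch of that coordination; the file name and client count are hypothetical, and it presumes a workload section that runs four clients on one trx:

```yaml
stage:
  workload:
    - clients: 4        # four clients execute the same trx
      trx: [insert.sql]
  trx:
    - file: insert.sql  # hypothetical: contains INSERT INTO t ... VALUES (@a, ...)
      data:
        a:
          generator: auto-inc
          scope: client-group  # one shared generator: the four clients
                               # collectively insert 1..N without duplicates
```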
With random value generators like @R, client group scope is equivalent to iter scope, presuming no explicit calls.
Exec group scope works the same as client group scope but is unique to all client groups in the exec group. But be careful: since different client groups can execute different trx, make sure any data keys shared across client groups make sense.
Workload scope works the same as client group scope but is unique to all exec groups, which means all clients in the stage. But be careful: since different exec groups can execute different trx, make sure any data keys shared across exec and client groups make sense.
Stage data scope applies to the entire stage but stage scoped data keys are only called once when the stage starts. This might be useful for static or one-time values reused across different exec or client groups. Or, stage scoped data keys can be called explicitly.
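A sketch of a stage scoped data key, using only the config keys shown earlier plus hypothetical names:

```yaml
stage:
  trx:
    - file: queries.sql  # hypothetical trx file
      data:
        once:
          generator: int
          scope: stage  # called once, when the stage starts; every client
                        # in every exec and client group reuses this value
```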
Global data scope applies to all stages in a single Finch run. For example, if Finch is run like finch stage1.yaml stage2.yaml, global scope applies to both stages. Global scoped data keys are called only once, when the first query of the first client of the first stage executes. Or, global scoped data keys can be called explicitly.
Global data scope does not span compute instances. It’s an interesting idea that could work, but is there a use case for sharing a data value across compute instances?
Row scope is intended for use with CSV substitution to produce multi-row INSERT statements.

Trx file:

```sql
INSERT INTO t VALUES
/*!csv 2 (@a, ...)*/
```

Automatic transformation:

```sql
INSERT INTO t VALUES
(@a(), ...)
,(@a(), ...)
```

Resulting values:

```sql
INSERT INTO t VALUES
(1, ...) -- row 1
,(2, ...) -- row 2
```
With row scope, the auto-inc data key, @a, is unique to the statement but called for each row. As shown above, when used with CSV substitution, Finch automatically transforms @a into an explicit call in each row. Moreover, it transforms only the first occurrence of each unique data key in the row. In this example, /*!csv N (@a, @a, @a)*/ produces (1, 1, 1) for the first row: the first @a is an explicit call, and the latter two @a are copies of the first.
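A config sketch for the row scoped @a above; the trx file name is hypothetical and would contain the /*!csv 2 (@a, ...)*/ statement:

```yaml
stage:
  trx:
    - file: multi-row-insert.sql  # hypothetical
      data:
        a:
          generator: auto-inc
          scope: row  # called for each row of the multi-row INSERT
```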
To achieve the same results without CSV substitution, use statement scope and explicit calls—manually write the query like the automatic transformation shown above—or use value scope.
Value scope means every @d has its own unique data generator and is called every time the statement is executed. This sounds like statement scope with explicit calls, but there’s a difference:
Value scope:

```
       @d  @d
       │   │
SELECT @d, @d
```

Statement scope with explicit calls:

```
       ┌─ @d ─┐
       │      │
SELECT @d(), @d()
```
With value scope, each @d has its own data generator. With statement scope and explicit calls, all @d in the statement share (and call) the same data generator. Whether or not this makes a difference depends on the data generator. For (pseudo) stateful generators like auto-inc, it makes a difference: value scope yields 1 and 1; statement scope with explicit calls yields 1 and 2. For random value generators, it might not make a difference, especially since @d can have only one configuration. If, for example, you want two random numbers from the same generator but configured differently, then you must use two different data keys, one for each configuration.
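For example, here's a sketch of two random integer data keys configured differently. The params names and values are illustrative assumptions, not verified generator parameters:

```yaml
data:
  r_small:
    generator: int
    params:
      max: 100      # assumed param: small random values
  r_large:
    generator: int
    params:
      max: 1000000  # assumed param: large random values
```

A statement can then mix the two, like SELECT c FROM t WHERE a = @r_small AND b = @r_large.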