Concepts
The following concepts are important for understanding how Finch works, and many of them are unique to Finch.
If you’re new to database benchmarking, read Fundamentals of Database Benchmarking.
Finch is designed on a hierarchy of run levels:
global
└── stage
    └── workload
        └── exec group
            └── client group
                └── client
                    └── iter
                        └── trx
                            └── statement
These are called run levels because they determine if, when, and how various aspects of Finch run. By understanding the run levels, you’ll understand all the core concepts of Finch and how they work.
Let’s start at the bottom because it’s the most direct and concrete concept: SQL statements. Moving up the hierarchy, the concepts become increasingly abstract and specific to Finch.
A statement is any SQL statement that Finch can execute on MySQL: `SELECT`, `INSERT`, `UPDATE`, `DELETE`, and so forth. This is usually synonymous with query, but statement is more general because, for example, would you call `BEGIN` a query? Maybe, but what about `CREATE` or `SET`?
Statements are the lowest level, but the most important part, of running Finch: executing SQL statements on MySQL. This shouldn’t be surprising since Finch is a benchmarking tool, but what might be surprising is that you write Finch benchmarks using real SQL statements; they are not wrapped in a scripting language. You write one or more SQL statements in a Finch transaction file.
Finch transactions (trx) are plain text files that contain one or more SQL statements to execute. Suppose that you want to benchmark these statements:
SELECT c FROM t WHERE id = @d
BEGIN
SELECT id FROM t WHERE user = @d
UPDATE users SET n = n + 1 WHERE id = @d
COMMIT
You can put those statements in one file and it will work, but it’s better to separate the transactions:
trx-1.sql
SELECT c FROM t WHERE id = @d
trx-2.sql
BEGIN
SELECT id FROM t WHERE user = @d
UPDATE users SET n = n + 1 WHERE id = @d
COMMIT
Ignore the `@d` placeholders for now; they are Finch data keys, described later on this page.
With separate Finch trx, you can craft more advanced and interesting benchmarks, like:
| Trx       | Clients | TPS         |
| --------- | ------- | ----------- |
| trx-1.sql | 100     | (Unlimited) |
| trx-2.sql | 20      | 50          |
Finch will run the first trx (the single `SELECT`) on 100 clients as fast as possible, and it will run the second trx (the explicit, multi-statement SQL transaction) on only 20 clients limited to 50 transactions per second (TPS).
Remember the example in the table above because it’s reused in the following sections.
Finch also collects and reports statistics per-trx. (The built-in stats reports combine all trx stats, but a stats plugin receives per-trx stats.)
You can write a benchmark as one large pseudo-transaction (all SQL statements in one file), but for various reasons that will become clear later, it’s better to write each SQL transaction in a separate Finch trx file.
A client is one MySQL client (connection) that executes all SQL statements in all trx assigned to it. Each pass through all assigned trx is one iter (iteration).
You can configure Finch to run any number of clients. This is typical for benchmarking tools, but Finch has two unique features:
- Clients can be grouped (described next)
- Clients can be assigned different trx
The second point makes Finch especially powerful and flexible. For example, as shown in the table above, you can run 100 clients executing certain transactions, and 20 clients executing other transactions. Having different clients executing different trx is why Finch needs client and execution groups.
A client group is a logical group of clients with the same trx and limits. From the example in the table above, this snippet of a Finch stage file creates two client groups:
workload:
  - clients: 100      # Client group 1
    trx: [trx-1.sql]  #
  - clients: 20       # Client group 2
    trx: [trx-2.sql]  #
    tps-clients: 50   #
The first client group will run 100 independent clients, all executing the statements in trx `trx-1.sql`. The second client group will run 20 independent clients, all executing the statements in trx `trx-2.sql`, limited to 50 TPS across all 20 clients.
Together, this will create 120 MySQL clients (connections).
An execution group is one or more client groups with the same group name.
Let’s pretend you want to execute another client group after client groups 1 and 2 (above) are done. That requires putting client groups 1 and 2 in a named execution group (let’s call it `FOO`) and the third client group in another named execution group (let’s call it `BAR`). The snippet of a Finch stage file below creates these two execution groups.
workload:
  # FOO ----------
  - clients: 100      # Client group 1
    trx: [trx-1.sql]  #
    group: FOO        #
  - clients: 20       # Client group 2
    trx: [trx-2.sql]  #
    tps-clients: 50   #
    group: FOO        #
  # BAR ----------
  - clients: 1        # Client group 3
    trx: [trx-3.sql]  #
    group: BAR        #
Finch processes the `workload` list top to bottom, so order matters when creating execution groups. All client groups in an execution group run concurrently, and execution groups run top to bottom in `workload` order.
Visually, the workload processing (Y axis) and the time when the execution groups run (X axis) can be depicted like this:
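                time ──────────────────────────────▶
client group 1  |========== FOO ==========|
client group 2  |========== FOO ==========|
client group 3                            |=== BAR ===|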
Both client groups in execution group `FOO` run first. When they’re done, the client group in execution group `BAR` runs next.
This is advanced workload orchestration. Sometimes it’s necessary (like with DDL statements), but it’s optional for most simple benchmarks that just run everything all at once.
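For example, here’s a sketch of the DDL case: put the DDL in its own execution group so it runs only after the load group finishes. (`trx/alter.sql` is a hypothetical trx file containing an `ALTER TABLE` statement; the group names are made up.)

workload:
  - clients: 16           # Load clients run first
    trx: [trx-2.sql]
    group: load
  - clients: 1            # Then the DDL runs by itself
    trx: [trx/alter.sql]  # Hypothetical trx file with an ALTER TABLE
    group: ddl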
A workload is a combination of queries, data, and access patterns. In Finch, the previous (lower) run levels combine to create a workload that is defined in the aptly named `workload` section of a stage file:
stage:
  workload:
    - clients: 100
      trx: [trx-1.sql]
    - clients: 20
      trx: [trx-2.sql]
      tps-clients: 50  # all 20 clients
  trx:
    - file: trx-1.sql
    - file: trx-2.sql
That defines the workload for the running example (the table in Transactions): 100 clients to execute the single `SELECT` in `trx-1.sql` and, concurrently, 20 clients to execute the transaction in `trx-2.sql`, limited to 50 TPS (across all 20 clients).
In short, a Finch workload defines which clients run which transactions, plus a few features like QPS and TPS limiters.
A stage executes a workload. As shown immediately above, a Finch stage is configured in a YAML file. Here’s a real and more complete example:
stage:
  name: read-only
  runtime: 60s
  mysql:
    db: finch
  workload:
    - clients: 1
  trx:
    - file: trx/read-only.sql
      data:
        id:
          generator: "rand-int"
          scope: client
          params:
            max: $params.rows
            dist: normal
        id_100:
          generator: "int-range"
          scope: client
          params:
            range: 100
            max: $params.rows
Every stage has a name: either set by `stage.name` in the file, or defaulting to the base file name. Every stage needs at least one trx, defined by the `stage.trx` list. Every stage should have an explicitly defined workload (`stage.workload`), but in some cases Finch can auto-assign the workload. The rest is various configuration settings for the stage, workload, data keys, and so forth.
Finch runs one or more stages sequentially as specified on the command line:
finch STAGE_1 [... STAGE_N]
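For example, with three hypothetical stage files:

finch setup.yaml benchmark.yaml cleanup.yaml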
This allows you to do a full setup, benchmark, and cleanup in a single run by writing and specifying those three stages. But you don’t have to; it’s common to set up only once but run a benchmark repeatedly, like:
finch setup.yaml
finch benchmark.yaml
finch benchmark.yaml
finch benchmark.yaml
The global run level represents one running instance of Finch (all stages) on a compute instance. What’s unique about Finch is that one instance (a server) can coordinate running the stages on other instances (clients):
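finch server (e.g. 10.0.0.1) ──── stage ───▶ finch client(s)
                             ◀─── stats ────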
This is optional, but it’s easy to enable and useful for creating a massive load on a MySQL instance with several smaller and cheaper compute instances rather than one huge benchmark server.
To enable, configure a stage with:
compute:
  instances: 2  # 1 server + 1 client
Then start a Finch server and one client like:
Server
finch benchmark.yaml
Client
finch --server 10.0.0.1
The server sends a copy of the stage to the client, so there’s nothing to configure on the client. The client sends its statistics to the server, so you don’t have to aggregate them—Finch does it automatically.
A data generator is a plugin that generates data for statements. Data generators are referenced by user-defined data keys like `@d`. “Data key” and “data generator” are synonymous because they are two sides of the same coin. For brevity, “data key” (or just “`@d`”) is used the most.
Data keys are used in statements and configured with a specific data generator in the stage file:
Statement
SELECT c FROM t WHERE id = @d
Stage File
stage:
  trx:
    - file: read.sql
      data:
        d:  # @d
          generator: "int"
          params:
            max: "25,000"
            dist: uniform
Note: it’s `@d` in a statement, but `d` (no `@` prefix) in a stage file, because `@` is a reserved character in YAML.
In the stage file, `stage.trx[].data.d.generator` configures the type of data generator: data key `@d` uses the `int` (random integer) data generator. And `stage.trx[].data.d.params` provides generator-specific configuration; in this case, it configures the `int` generator to return a uniform random distribution of numbers between 1 and 25,000.
When Finch clients execute the `SELECT` statement above, data key `@d` will be replaced with random values from this `int` data generator.
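For example, one execution might send MySQL a statement like the following (the value 17305 is only an illustration; any integer between 1 and 25,000 is possible):

SELECT c FROM t WHERE id = 17305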
Finch ships with built-in data generators, and it’s easy to build your own data generator.
Data keys are scoped to a run level. To see why data scope is important, consider this transaction:
BEGIN
SELECT c FROM t WHERE id = @d
UPDATE t SET c=c+1 WHERE id = @d
COMMIT
Is `@d` the same or different in those two statements? The answer depends on the configured scope of `@d`:
| @d Scope  | Result |
| --------- | ------ |
| statement | Different `@d`: one generator and value for the `SELECT` `@d`, and another generator and value for the `UPDATE` `@d` |
| trx       | Same `@d`: one generator and one value for both |
Since the statements are in the same MySQL transaction, the intent is probably that `@d` is trx-scoped so that the value in both statements is the same. But Finch supports very complex workloads, so Data / Scope ranges from simple to complex.
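For example, here’s a minimal sketch of configuring trx scope for `@d` (the file name `transfer.sql` is hypothetical; the `int` generator is reused from the earlier example):

stage:
  trx:
    - file: transfer.sql  # hypothetical trx file containing the BEGIN...COMMIT above
      data:
        d:
          generator: "int"
          scope: trx  # one @d value per iteration, shared by both statements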
From the statement point of view, most data keys provide input to a statement. But data keys can also be outputs: a statement provides a value to a data key. And a data key can be both. For example, a data key is used to save the insert ID from a statement (output) and provide it to another statement (input):
-- save-insert-id: @d
INSERT INTO money_transfer VALUES (...)
INSERT INTO audit_log VALUES (@d, ...)
Finch saves the insert ID returned from MySQL into `@d`. Then, in the second query, `@d` returns that insert ID value.
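For example, if MySQL returns insert ID 42 for the first `INSERT`, the second statement is effectively executed as:

INSERT INTO audit_log VALUES (42, ...)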
By convention, stages related to the same benchmark are put in the same directory, and trx files are put in a `trx/` subdirectory. Here’s an example from the Finch repo:
finch/benchmarks/sysbench % tree .
.
├── _all.yaml
├── read-only.yaml
├── setup.yaml
├── trx
│ ├── insert-rows.sql
│ ├── read-only.sql
│ ├── schema.sql
│ ├── secondary-index.sql
│ └── write-only.sql
└── write-only.yaml
That is a recreation of the venerable sysbench benchmarks:
- Stage `setup.yaml` creates the sysbench schema and inserts rows
- Stage `read-only.yaml` is the sysbench read-only benchmark
- Stage `write-only.yaml` is the sysbench write-only benchmark
All trx files are kept under `trx/`. These are referenced in the stage files with relative paths, like `trx/insert-rows.sql`.
`_all.yaml` is a special meta file with parameters for all stages. These conventions and relative paths work because Finch changes its working directory to the directory of the stage file.
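For example, you can run a stage from any current directory (the absolute path here is hypothetical), and the relative `trx/` references still resolve:

finch /path/to/finch/benchmarks/sysbench/read-only.yaml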
Parameters let you define key-value pairs once and reference them in separate places (the DRY principle: don’t repeat yourself). The simplest example is a single stage and trx file:
Stage
stage:
  params:
    rows: "10,000"
  trx:
    - file: trx.sql
Trx
-- rows: $params.rows
INSERT INTO t (...)
`$params.rows` in the trx file is replaced by the `stage.params.rows` value (10,000).
You can also override the value on the command line:
finch benchmark.yaml --param rows=500000
Parameters are most useful for multi-stage benchmarks because the shared parameters can be defined in `_all.yaml` in the same directory as the stages. When `_all.yaml` exists, Finch automatically applies its parameters to all stages.
For example, in the Benchmark Directory above, parameters in `_all.yaml` are automatically applied to all stages (`setup.yaml`, `read-only.yaml`, and `write-only.yaml`). When the setup stage is run, `params.rows` number of rows are inserted into the table. When the other stages are run, random rows between 1 and `params.rows` are accessed.