Finch Docs

Overview

A Finch benchmark is defined, configured, and run by one or more stages.

Finch does not have any built-in, hard-coded, or required stages. You write (and name) the stages you need for a benchmark.

This page presumes and requires familiarity with Finch concepts.

Two Stages

Benchmarks usually need two stages: one to set up the schema and insert rows, and another to execute queries. By convention, Finch calls these two stages “DDL” and “standard”, respectively.

The difference is important because it affects how Finch runs a stage by default:

Stage      Exec Order   Runtime Limit   Stats
DDL        Sequential   Data/Rows       No
Standard   Concurrent   Time            Yes

These are only defaults that can be overridden with explicit configuration, but they’re correct for most benchmarks. To see why, let’s consider a simple but typical benchmark:

  1. Create a table
  2. Insert rows
  3. Execute queries
  4. Report query stats

DDL

Data definition language (DDL) refers primarily to CREATE and ALTER statements that define schemas, tables, and indexes. In Finch, a DDL stage applies more broadly to include, for example, INSERT statements used only to populate new tables. (INSERT is actually data manipulation language [DML], not DDL.)

A DDL stage executes the first two steps of a typical benchmark:

  1. Create a table
  2. Insert rows

It’s important that these two steps are executed in order and only once per run. For example, INSERT INTO t is invalid before CREATE TABLE t. Likewise, running CREATE TABLE t more than once is invalid (and will cause an error).

There’s also a finite number of rows to insert. Benchmark setups don’t insert rows for 5 minutes, or some other arbitrary time limit, because good benchmarks need precise data sizes. Like most benchmark tools, Finch can limit the INSERT by row count: insert 1,000,000 rows, for example. Finch can also insert rows until the table reaches a certain size. Either way, the INSERT runs until a certain row count (or data size) is reached, then it stops.

In Finch, you can achieve these two steps with one stage and two trx files, like:

Stage File: setup.yaml

stage:
  trx:
    - file: schema.sql
    - file: rows.sql

Trx File: schema.sql

CREATE TABLE t (
  /* columns */
)

Trx File: rows.sql

-- rows: 1,000,000
INSERT INTO t VALUES /* ... */

The stage lists the trx in order: schema first, then rows. Trx order is important in this case for two reasons.

First, this minimally-configured stage relies on Finch auto-detecting the DDL and executing the workload sequentially in stage.trx order. (This auto-detection can be disabled or overridden, but it’s usually correct.)

Second, the CREATE TABLE must execute only once, but the INSERT needs to execute one million times. (Terribly inefficient, but it’s just an example.) The -- rows: 1,000,000 is a data limit, and it applies to the Finch trx where it appears, not to the statement. Intuitively, yes, it should apply only to the statement, but for internal code efficiency reasons it doesn’t yet work this way; it applies to the whole trx. As such, the INSERT must be specified in a separate Finch trx file. And presuming the INSERT statement can be executed in parallel, a separate trx file also makes it possible to insert in parallel with multiple clients by configuring the workload:

stage:
  workload:
    - trx: [schema.sql]
    - trx: [rows.sql]
      clients: 16      # parallel INSERT
  trx:
    - file: schema.sql
    - file: rows.sql

By adding a stage.workload section, you tell Finch how to run each trx. In this case, Finch will execute schema.sql once with one client; then it will execute rows.sql with 16 clients until 1,000,000 rows have been inserted.

This simple example works, but Benchmark / Examples shows better methods and explains some magic happening behind the scenes.

Standard

A standard stage executes queries, measures response time, and reports the statistics. This is what engineers think of and expect from a standard database benchmark:

  1. Execute queries
  2. Report query stats

Finch handles the stats, so your focus is queries and how to execute them (the workload).

Queries

Finch benchmarks are declarative: write the real SQL statement that you want to benchmark. Let’s imagine that you want to benchmark the two most important read queries from your application. Put them in a file called reads.sql (or whatever name you want):

SELECT c1 FROM t WHERE id = @id

SELECT c2 FROM t WHERE n BETWEEN @n AND @PREV LIMIT 10

These are fake queries, so don’t worry about the details. The point is: they’re real SQL statements; no special benchmark scripting language.
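Putting it together, a standard stage that runs these queries can be declared the same way as the DDL stage earlier. The following is a minimal sketch, not a definitive configuration: the stage file name and the client count are illustrative assumptions, and only options already shown on this page are used.

```yaml
# Hypothetical stage file: benchmark.yaml (name is an assumption)
stage:
  workload:
    - trx: [reads.sql]
      clients: 4   # illustrative concurrency; standard stages execute concurrently by default
  trx:
    - file: reads.sql
```

As in the DDL example, stage.trx declares the trx files and stage.workload declares how to execute them; per the defaults table above, a standard stage runs concurrently with a time limit and reports stats.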

Since you probably already know SQL, you can spend time learning:

  1. How and why to model transactions in Finch: Benchmark / Trx
  2. The very simple Finch trx file syntax: Syntax / Trx File
  3. How to use data generators (@d): Data / Generators

The first two are trivial—learn once and done. Data generators can be simple or complex depending on your benchmark.

Workload

Queries are inert until executed, and that’s what the workload does: declare how to execute the queries (SQL statements written in Finch trx files). Like data generators, the workload can be simple or complex depending on your benchmark. Consequently, there’s a whole page just for workload: Benchmark / Workload.

Other Stages

Two examples of other stages are “warm up” and “clean up”. A warm up stage is typically executed before a standard stage to populate database caches. A clean up stage is typically executed after a standard stage to remove the schemas created by the DDL stage.
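For instance, a clean up stage is just another stage file. Here is a minimal sketch under the conventions shown earlier on this page; the file names cleanup.yaml and drop.sql are illustrative assumptions.

```yaml
# Hypothetical stage file: cleanup.yaml
stage:
  trx:
    - file: drop.sql   # drop.sql would contain, e.g., DROP TABLE t
```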

Finch does not have any built-in, hard-coded, or required stages. You can name your stages (almost) anything and execute them in any order. Aside from some auto-detection (that can be overridden), Finch treats all stages equally.

Multi-stage

You can run multiple stages in a single run of Finch:

finch setup.yaml benchmark.yaml cleanup.yaml

That runs stage setup.yaml, then stage benchmark.yaml, then stage cleanup.yaml.