Overview
A Finch benchmark is defined, configured, and run by one or more stages.
Finch does not have any built-in, hard-coded, or required stages. You write (and name) the stages you need for a benchmark.
This page presumes and requires familiarity with Finch concepts.
Benchmarks usually need two stages: one to set up the schema and insert rows, and another to execute queries. By convention, Finch calls these two stages “DDL” and “standard”, respectively.
The difference is important because it affects how Finch runs a stage by default:
| Stage | Exec Order | Runtime Limit | Stats |
| --- | --- | --- | --- |
| DDL | Sequential | Data/Rows | No |
| Standard | Concurrent | Time | Yes |
These are only defaults that can be overridden with explicit configuration, but they’re correct for most benchmarks. To see why, let’s consider a simple but typical benchmark:
- Create a table
- Insert rows
- Execute queries
- Report query stats
Data definition language (DDL) refers primarily to `CREATE` and `ALTER` statements that define schemas, tables, and indexes. In Finch, a DDL stage applies more broadly to include, for example, `INSERT` statements used only to populate new tables. (`INSERT` is actually data manipulation language [DML], not DDL.)
A DDL stage executes the first two steps of a typical benchmark:
- Create a table
- Insert rows
It’s important that these two steps are executed in order and only once per run. For example, `INSERT INTO t` is invalid before `CREATE TABLE t`. Likewise, running `CREATE TABLE t` more than once is invalid (and will cause an error).
There’s also a finite number of rows to insert. Benchmark setups don’t insert rows for 5 minutes, or some such arbitrary limit, because good benchmarks need more precise data sizes. Like most benchmark tools, Finch can limit the `INSERT` by row count: insert 1,000,000 rows, for example. Finch can also insert rows until the table reaches a certain size. Either way, the `INSERT` runs until a certain data size (or row count) is reached, then it stops.
In Finch, you can accomplish these two steps with one stage and two trx files, like this:
Stage File: setup.yaml

```yaml
stage:
  trx:
    - file: schema.sql
    - file: rows.sql
```
Trx File: schema.sql

```sql
CREATE TABLE t (
  /* columns */
)
```
Trx File: rows.sql

```sql
-- rows: 1,000,000
INSERT INTO t VALUES /* ... */
```
The stage lists the trx in order: schema first, then rows. Trx order is important in this case for two reasons.
First, this minimally-configured stage relies on Finch auto-detecting the DDL and executing the workload sequentially in `stage.trx` order. (This auto-detection can be disabled or overridden, but it’s usually correct.)
Second, the `CREATE TABLE` must execute only once, but the `INSERT` needs to execute one million times. (Terribly inefficient, but it’s just an example.) The `-- rows: 1,000,000` is a data limit, and it applies to the Finch trx where it appears, not to the statement. Intuitively, yes, it should apply only to the statement, but for internal code efficiency reasons it doesn’t yet work this way; it applies to the whole trx.
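For illustration, here is a sketch of the combined trx file to avoid: because the data limit governs the whole trx, the `CREATE TABLE` would be tied to the same repeated, limit-driven execution as the `INSERT`, which (as noted above) is invalid.

```sql
-- Anti-example: schema and insert combined in one trx file.
CREATE TABLE t (
  /* columns */
)

-- The data limit below applies to this entire trx (both statements),
-- not only to the INSERT it precedes:
-- rows: 1,000,000
INSERT INTO t VALUES /* ... */
```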
As such, the `INSERT` must be specified in a separate Finch trx file. But presuming the `INSERT` statement can be executed in parallel, a separate trx file also makes it possible to insert in parallel with multiple clients by configuring the workload:
```yaml
stage:
  workload:
    - trx: [schema.sql]
    - trx: [rows.sql]
      clients: 16  # parallel INSERT
  trx:
    - file: schema.sql
    - file: rows.sql
```
By adding a `stage.workload` section, you tell Finch how to run each trx. In this case, Finch will execute `schema.sql` once with one client; then it will execute `rows.sql` with 16 clients until 1,000,000 rows have been inserted.
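With `schema.sql` and `rows.sql` saved alongside the stage file, running only this setup stage means passing the single stage file to `finch`, using the same command form shown at the end of this page:

```
finch setup.yaml
```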
This simple example works, but Benchmark / Examples shows better methods and explains some magic happening behind the scenes.
A standard stage executes queries, measures response time, and reports the statistics. This is what engineers think of and expect from a standard database benchmark:
- Execute queries
- Report query stats
Finch handles the stats, so your focus is queries and how to execute them (the workload).
Finch benchmarks are declarative: write the real SQL statement that you want to benchmark.
Let’s imagine that you want to benchmark the two most important read queries from your application.
Put them in a file called `reads.sql` (or whatever name you want):

```sql
SELECT c1 FROM t WHERE id = @id
SELECT c2 FROM t WHERE n BETWEEN @n AND @PREV LIMIT 10
```
These are fake queries, so don’t worry about the details. The point is: they’re real SQL statements; no special benchmark scripting language.
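A stage file to execute these queries can be as small as the setup example above. This is only a sketch, reusing the stage file syntax already shown (the file name matches the multi-stage command at the end of this page):

Stage File: benchmark.yaml

```yaml
stage:
  trx:
    - file: reads.sql
```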
Since you probably already know SQL, you can spend time learning:
- How and why to model transactions in Finch: Benchmark / Trx
- The very simple Finch trx file syntax: Syntax / Trx File
- How to use data generators (`@d`): Data / Generators
The first two are trivial—learn once and done. Data generators can be simple or complex depending on your benchmark.
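For a rough idea of what that involves: the `@id` placeholder in `reads.sql` needs a data generator configured in the stage file. The sketch below shows only the general shape; the keys (`data`, `generator`, `params`) and the generator name are assumptions here, and Data / Generators documents the real options.

```yaml
stage:
  trx:
    - file: reads.sql
      data:                    # assumed key for per-trx data configuration
        id:                    # would back the @id placeholder in reads.sql
          generator: rand-int  # hypothetical generator name
          params:
            max: 1000000       # hypothetical parameter
```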
Queries are inert until executed, and that’s what the workload does: declare how to execute the queries (SQL statements written in Finch trx files). Like data generators, the workload can be simple or complex depending on your benchmark. Consequently, there’s a whole page just for workload: Benchmark / Workload.
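As a sketch that reuses only the workload keys from the setup example above, a standard stage could execute `reads.sql` with several clients; by default, Finch runs a standard stage concurrently, limits it by time, and reports stats, per the table at the top of this page.

```yaml
stage:
  workload:
    - trx: [reads.sql]
      clients: 8  # concurrent clients executing the two read queries
  trx:
    - file: reads.sql
```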
Two examples of other stages are “warm up” and “clean up”. A warm up stage is typically executed before a standard stage to populate database caches. A clean up stage is typically executed after a standard stage to remove the schemas created by the DDL stage.
Finch does not have any built-in, hard-coded, or required stages. You can name your stages (almost) anything and execute them in any order. Aside from some auto-detection (that can be overridden), Finch treats all stages equally.
You can run multiple stages in a single run of Finch:
```
finch setup.yaml benchmark.yaml cleanup.yaml
```

That runs stage `setup.yaml`, then stage `benchmark.yaml`, then stage `cleanup.yaml`.