Crossfilter

Fast Multidimensional Filtering for Coordinated Views

Status

Crossfilter is not under active development, maintenance, or support by Square, its original author Mike Bostock, or its recent contributors (Jason Davies, Tom Carden). We still welcome genuine bug fixes and PRs, but we consider the current API and feature set (~1.3.12) essentially complete. A new Crossfilter organization has been created on GitHub and is home to an actively maintained fork of Crossfilter. That fork is already used by the popular library DC.js, and its contributors are working on improved APIs and better performance on current JavaScript VMs. There are no plans to merge or publish new versions under the original Square repository or npm package.

Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records; we built it to power analytics for Square Register, allowing merchants to slice and dice their payment history fluidly.

Most interactions involve only a single dimension, and then only small adjustments to the filter values, so incremental filtering and reducing are significantly faster than starting from scratch. Crossfilter uses sorted indexes (and a few bit-twiddling hacks) to make this possible, dramatically increasing the performance of live histograms and top-K lists. For more details on how Crossfilter works, see the API reference.
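To make the idea of incremental filtering over a sorted index concrete, here is a minimal sketch in plain JavaScript. It is not Crossfilter's source code and does not use the Crossfilter API; the names bisectLeft, buildIndex, and RangeFilter are hypothetical, and the bit-twiddling mentioned above is left out. It only shows why a small change to a range filter needs to touch a small number of records.

```javascript
// Illustrative sketch only: not Crossfilter's source code.
// bisectLeft, buildIndex, and RangeFilter are hypothetical names.

// Position of the first entry in the sorted index whose value is >= x.
function bisectLeft(values, index, x) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (values[index[mid]] < x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Record positions ordered by value; built once per dimension.
function buildIndex(values) {
  return Array.from(values.keys()).sort((a, b) => values[a] - values[b]);
}

class RangeFilter {
  constructor(values) {
    this.values = values;
    this.index = buildIndex(values);
    this.lo = 0;               // current filter covers index positions [lo, hi)
    this.hi = values.length;   // initially every record is selected
    this.selected = new Uint8Array(values.length).fill(1);
    this.count = values.length;
  }

  // Set the flag for index positions [from, to); returns how many were touched.
  mark(from, to, flag) {
    for (let i = from; i < to; i++) this.selected[this.index[i]] = flag;
    return Math.max(0, to - from);
  }

  // Select records with min <= value < max, touching only the records
  // whose status changes relative to the previous filter.
  filter(min, max) {
    const lo = bisectLeft(this.values, this.index, min);
    const hi = Math.max(lo, bisectLeft(this.values, this.index, max));
    const removed =
      this.mark(this.lo, Math.min(this.hi, lo), 0) +
      this.mark(Math.max(this.lo, hi), this.hi, 0);
    const added =
      this.mark(lo, Math.min(hi, this.lo), 1) +
      this.mark(Math.max(lo, this.hi), hi, 1);
    this.lo = lo;
    this.hi = hi;
    this.count += added - removed;
    return added + removed;
  }
}

const delays = [5, 1, 9, 3, 7, 2, 8];
const byDelay = new RangeFilter(delays);
const first = byDelay.filter(3, 8);   // drops 1, 2, 8, 9: four records touched
const second = byDelay.filter(3, 9);  // small adjustment: only the record with value 8 is touched
console.log(first, second, byDelay.count); // 4 1 4
```

Sorting happens once, when the index is built. After that, each filter change costs two binary searches plus work proportional to the number of records entering or leaving the selection, rather than a scan of the whole dataset.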

Example: Airline on-time performance

The coordinated visualizations below (built with D3) show nearly a quarter-million flights from early 2001, part of the ASA Data Expo dataset. The dataset is 5.3 MB, so it might take a few seconds to download. Click and drag on any chart to filter by the associated dimension. The table beneath shows the eighty most recent flights that match the current filters; these are the details on demand, anecdotal evidence you can use to weigh different hypotheses.

Some questions to consider: How does time-of-day correlate with arrival delay? Are longer or shorter flights more likely to arrive early? What happened on January 12? How do flight patterns differ between weekends and weekdays, or mornings and nights? Fork this example and try your own data!

[Interactive charts: Time of Day, Arrival Delay (min.), Distance (mi.), Date]