Easily filter JavaScript objects with Arquero

Coding in JavaScript has many advantages, but data management is probably not at the top of the list. However, there’s good news for those who find JavaScript data challenging: the same “data grammar” ideas behind the hugely popular dplyr R package are also available in JavaScript, thanks to the Arquero library.

Arquero, from the University of Washington’s Interactive Data Lab, is probably best known to Observable JavaScript users, but it’s also available in other environments, including Node.js.

This article will show you how to filter JavaScript objects with Arquero, with some bonus tasks at the end.

Step 1. Load Arquero

Arquero is a standard library in Observable JavaScript and in Quarto, which is how I use it. In those environments, no installation is necessary. If you are using Arquero in Node, you will need to install it with npm install arquero --save. In the browser, load it with a <script> tag.

In Observable you can load Arquero with import {aq, op} from "@uwdata/arquero". In the browser, Arquero will be loaded as aq. In Node you can load it with const aq = require('arquero').

The rest of the code in this tutorial should run as is in Observable and Quarto. If you’re using it in an asynchronous environment like Node, you’ll need to make the necessary adjustments for loading and processing data.

Step 2. Transform your data into an Arquero table

You can turn an existing “normal” JavaScript object into an Arquero table with aq.from(my_object).

Another option is to import remote data directly as an Arquero table with Arquero’s load family of functions, such as aq.loadCSV("myurl.com/mycsvfile.csv") for a CSV file and aq.loadJSON("myjsonurl.com/myjsonfile.json") for a JSON file on the web. You can find more information about table input functions on the Arquero API documentation website.

In order to follow the rest of this tutorial, run the code below to import sample population change data for US states.


states_table = aq.loadCSV("https://raw.githubusercontent.com/smach/SampleData/master/states.csv")

Arquero tables have a special view() method for use with Observable JavaScript and in Quarto. The states_table.view() command returns something like the output shown in Figure 1.

[Image: table with columns for State, Pop_2000, Pop_2010, Pop_2020, PctChange_2000, Pct_change_2010. Credit: Sharon Machlis]

Figure 1. The result of using the Arquero table view() method.

Observable JavaScript’s Inputs.table(states_table), which has clickable column headers for sorting, also works to display an Arquero table.

Outside of Observable you can use states_table.print() to print the table to the console.

Step 3. Filter Rows

Arquero tables have many built-in methods for data processing and analysis, including filter(), which keeps only the rows that meet specific conditions.

A note to R users: Arquero’s filter() syntax is not quite as simple as dplyr’s filter(Region == 'RegionName'). Since this is JavaScript and most functions are not vectorized, you need to create an anonymous function with d => and then call another function inside it, usually a function from op (imported above along with aq). Even if you’re used to a language other than JavaScript, this construct is pretty easy to use once you’re familiar with it.

The usual syntax is:


filter(d => op.opfunction(d.columnname, 'argument'))

In this example, the op function I want is op.equal(), which (as the name suggests) tests for equality. So, the Arquero code to keep only states in the Northeast region of the United States would be:


states_table
  .filter(d => op.equal(d.Region, 'Northeast'))

You can add .view() at the end to see the results.
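To see what the row test is doing, here is a minimal plain-JavaScript sketch of the same idea applied to an ordinary array of row objects. It does not use Arquero: northeastSample is a small hand-made stand-in for states_table (an assumption for illustration only), and === stands in roughly for op.equal().

```javascript
// Plain-JavaScript analogue of keeping rows where Region equals 'Northeast'.
// northeastSample is a hypothetical stand-in for the states table, not the real data file.
const northeastSample = [
  { State: 'Maine', Region: 'Northeast' },
  { State: 'Texas', Region: 'South' },
  { State: 'Vermont', Region: 'Northeast' },
  { State: 'Oregon', Region: 'West' }
];

// The predicate receives one row object at a time, much like d in filter(d => ...).
const northeastRows = northeastSample.filter(d => d.Region === 'Northeast');

console.log(northeastRows.map(d => d.State)); // [ 'Maine', 'Vermont' ]
```

The shape is the same as the Arquero call: a function that takes a row and returns true for the rows to keep.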

A note on the filter() syntax: The code inside filter() is an Arquero table expression. “At first glance, table expressions look like normal JavaScript functions…but wait!” explains the Arquero API reference. “Under the hood, Arquero takes a set of function definitions, maps them to strings, then parses, rewrites, and compiles them to efficiently manage data internally.”

What does this mean for you? In addition to the usual JavaScript function syntax, you can also use table expression syntax such as filter("d => op.equal(d.Region, 'Northeast')") or filter("equal(d.Region, 'Northeast')"). Check out the API reference if you think one of these versions might be more appealing or useful.

It also means that you cannot use just any JavaScript function inside filter() and other Arquero verbs. For instance, for loops are not permitted unless wrapped in an escape() “expression helper.” See the Arquero API reference to learn more.

A note to Python users: Arquero’s filter() is designed to subset rows only, not either rows or columns, as pandas.filter does. (We’ll move on to columns next.)

Filters can be more complex than a single test, with negative or multiple conditions. For example, if you want “one-word state names in the West region,” you would look for state names that do not include a space and whose Region equals West. One way to achieve this is to put !op.includes(d.State, ' ') && op.equal(d.Region, 'West') inside the filter(d => ) anonymous function:


states_table
  .filter(d => !op.includes(d.State, ' ') && 
     op.equal(d.Region, 'West'))

To search and filter by regular expression instead of equality, use op.match() instead of op.equal().
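The same two ideas, a compound condition and a regular-expression test, can be sketched in plain JavaScript on an array of row objects. This is only an illustration and does not use Arquero: westSample is hypothetical sample data, and String.prototype.includes and RegExp.prototype.test stand in loosely for op.includes() and op.match().

```javascript
// Hypothetical sample rows; only the State and Region columns are sketched here.
const westSample = [
  { State: 'New Mexico', Region: 'West' },
  { State: 'Oregon', Region: 'West' },
  { State: 'Utah', Region: 'West' },
  { State: 'Ohio', Region: 'Midwest' }
];

// Two conditions joined with &&: no space in the name AND Region is 'West'.
const oneWordWest = westSample.filter(
  d => !d.State.includes(' ') && d.Region === 'West'
);

// A regular-expression test instead of equality: state names that start with "O".
const startsWithO = westSample.filter(d => /^O/.test(d.State));

console.log(oneWordWest.map(d => d.State)); // [ 'Oregon', 'Utah' ]
console.log(startsWithO.map(d => d.State)); // [ 'Oregon', 'Ohio' ]
```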

Step 4. Select Columns

Selecting only some columns is similar to dplyr’s select(). In fact, it’s even easier, since you don’t need to turn the selection into an array; the arguments are just comma-separated column names inside select():


states_table
  .select('State', 'State Code', 'Region', 'Division', 'Pop_2020')

You can rename columns while selecting them, using the syntax select({ OldName1: 'NewName1', OldName2: 'NewName2' }). Here is an example:


states_table
  .select({ State: 'State', 'State Code': 'Abbr', Region: 'Region', 
      Division: 'Division', Pop_2020: 'Pop' })
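To make the old-name-to-new-name mapping concrete, here is a minimal plain-JavaScript sketch of the same select-and-rename idea on ordinary row objects. It does not use Arquero; renameSample and renameMap are hypothetical names, and the sample values are hand-made for illustration rather than read from the data file.

```javascript
// Hypothetical sample row objects using column names from the article.
const renameSample = [
  { State: 'Maine', 'State Code': 'ME', Region: 'Northeast', Division: 'New England' },
  { State: 'Oregon', 'State Code': 'OR', Region: 'West', Division: 'Pacific' }
];

// Keys are the old column names, values are the new names
// (the same shape as the object passed to select()).
const renameMap = { State: 'State', 'State Code': 'Abbr', Region: 'Region' };

// Keep only the mapped columns, storing each value under its new name.
const renamedRows = renameSample.map(row =>
  Object.fromEntries(
    Object.entries(renameMap).map(([oldName, newName]) => [newName, row[oldName]])
  )
);

console.log(renamedRows[0]); // { State: 'Maine', Abbr: 'ME', Region: 'Northeast' }
```

Columns left out of the map, such as Division here, are dropped, which mirrors how select() keeps only the columns you name.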

Step 5. Create an array of unique values in a table column

It can be useful to get the unique values of a column as a vanilla JavaScript array, for tasks such as populating an input dropdown. Arquero has several functions to achieve this:

  • dedupe() gets unique values.
  • orderby() sorts the results.
  • array() transforms data from an Arquero table column into a conventional JavaScript array.
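As a point of comparison, the same three steps (unique values, sorting, plain array) can be sketched in vanilla JavaScript with a Set. This is not Arquero code; dedupeSample is a hypothetical hand-made array of row objects used only to show the idea.

```javascript
// Hypothetical sample rows; Region values repeat, as they would in a states table.
const dedupeSample = [
  { State: 'Texas', Region: 'South' },
  { State: 'Maine', Region: 'Northeast' },
  { State: 'Ohio', Region: 'Midwest' },
  { State: 'Utah', Region: 'West' },
  { State: 'Iowa', Region: 'Midwest' }
];

// Pull out one column, drop duplicates with a Set, then sort alphabetically.
const uniqueRegions = [...new Set(dedupeSample.map(d => d.Region))].sort();

console.log(uniqueRegions); // [ 'Midwest', 'Northeast', 'South', 'West' ]
```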

Here is a way to create a sorted array of unique region names from states_table:


region_array = states_table
  .select('Region')                                      
  .dedupe()                                                                 
  .orderby('Region')
  .array('Region')

Since this new object is a JavaScript array, Arquero methods will no longer work on it, but conventional array methods will. Here is an example:


'The regions are ' + region_array.join(', ')

This code gets the following output:

"The regions are , Midwest, Northeast, South, West"

The leading comma in the string above appears because there is an empty value in the table’s Region column. If you want to remove empty values like null, you can use Arquero’s op.compact() function on the results:


region_array2 = op.compact(
  states_table
    .select('Region')
    .dedupe()
    .orderby('Region')
    .array('Region')
)

Another option is to use the vanilla JavaScript filter() method to remove null values from an array of text strings. Note that this vanilla JavaScript filter() for one-dimensional JavaScript arrays is not the same as Arquero’s filter() for two-dimensional Arquero tables:


region_array3 = states_table
  .select('Region')
  .dedupe()
  .orderby('Region')
  .array('Region')
  .filter(n => n)
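A small standalone sketch shows why the leading comma appears and what filter(n => n) actually removes. It assumes the empty value is null, and regionsWithNull is a hand-written array mirroring the output shown earlier rather than data pulled from the table.

```javascript
// A literal array mirroring the article's result: one null, then four region names.
// Assumption: the "bad value" in the Region column is null.
const regionsWithNull = [null, 'Midwest', 'Northeast', 'South', 'West'];

// join() renders null (and undefined) as an empty string, hence the leading comma.
const joinedRaw = regionsWithNull.join(', ');

// filter(n => n) keeps only truthy values, so null is dropped.
// It would also drop '' and 0, which matters if you reuse it on numeric columns.
const cleanedRegions = regionsWithNull.filter(n => n);

console.log(joinedRaw);                 // , Midwest, Northeast, South, West
console.log(cleanedRegions.join(', ')); // Midwest, Northeast, South, West
```

For text columns the truthiness test is usually fine; for numbers, a stricter test such as n => n != null may be the safer choice.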

Observable JavaScript users, including those using Quarto, can also use the md function to add styling to a string, such as bold text with **. So this code

md`The regions are **${region_array2.join(', ')}**.`

produces the following output:


The regions are Midwest, Northeast, South, West

By the way, note that JavaScript’s Intl.ListFormat object makes it easy to add “and” before the last item when joining an array of strings into a comma-separated list. So the code


my_formatter = new Intl.ListFormat('en', { style: 'long', type: 'conjunction' });
my_formatter.format(region_array3)

produces the output:


"Midwest, Northeast, South, and West"
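The type option controls which joining word is used. As a brief sketch outside Observable (listSample, andFormatter and orFormatter are hypothetical names, and the array is written out by hand), switching type to 'disjunction' produces “or” instead of “and”:

```javascript
// Hand-written array standing in for the cleaned region list.
const listSample = ['Midwest', 'Northeast', 'South', 'West'];

// 'conjunction' joins with "and"; 'disjunction' joins with "or".
const andFormatter = new Intl.ListFormat('en', { style: 'long', type: 'conjunction' });
const orFormatter = new Intl.ListFormat('en', { style: 'long', type: 'disjunction' });

console.log(andFormatter.format(listSample)); // Midwest, Northeast, South, and West
console.log(orFormatter.format(listSample));  // Midwest, Northeast, South, or West
```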

There is much more to Arquero

Filtering, selecting, deduplicating and creating arrays barely scratch the surface of what Arquero can do. The library contains verbs for reshaping, merging, data aggregation, etc., as well as op calculation and analysis functions such as mean, median, quantile, rankings, lag and lead. See Introducing Arquero for an overview of more features. Also see An Illustrated Guide to Arquero Verbs and the Arquero API Documentation for a complete list, or visit the Data Wrangler Observable Notebook for an interactive app showing what Arquero can do.

To learn more about Observable JavaScript and Quarto, don’t miss A Beginner’s Guide to Using Observable JavaScript, R, and Python with Quarto and Learn Observable JavaScript with Observable notebooks.

Copyright © 2022 IDG Communications, Inc.
