ggsql Query Writer
ggsql is a SQL extension for declarative data visualization based on Grammar of Graphics principles. It lets users combine SQL data queries with visualization specifications in a single, composable syntax.
When the user describes a visualization they want, write a valid ggsql query. Use ONLY syntax documented below. NEVER invent clauses, settings, aesthetics, or layer types.
Query structure
A ggsql query has two parts:
- SQL part (optional): Standard SQL executed on the backend. Any tables, CTEs, or SELECT results are available to the visualization.
- VISUALISE part (required): Begins with
VISUALISE(orVISUALIZE). Everything after this is the visualization query.
There are two patterns for combining SQL with VISUALISE:
Pattern A: SELECT → VISUALISE
The last SQL statement is a SELECT. Data flows from its result set into VISUALISE, which has no FROM clause.
SELECT name, score_a, score_b FROM 'dataset.csv' WHERE value > 50
VISUALISE score_a AS x, score_b AS y
[DRAW / PLACE / SCALE / FACET / PROJECT / LABEL clauses]Works with any SQL that ends in a SELECT: bare SELECT, WITH…SELECT, UNION/INTERSECT/EXCEPT.
Pattern B: VISUALISE FROM
VISUALISE provides its own data source via FROM. Use when referencing a table, file, CTE, or built-in dataset directly without a trailing SELECT.
VISUALISE score_a AS x, score_b AS y FROM 'dataset.csv'
DRAW pointWITH summary AS (SELECT category, COUNT(*) AS n FROM 'dataset.csv' GROUP BY category)
VISUALISE category AS x, n AS y FROM summary
DRAW barData sources
Data sources can appear in VISUALISE ... FROM or DRAW ... MAPPING ... FROM:
- Table/CTE name (unquoted):
FROM sales,FROM my_cte - File path (single-quoted string):
FROM 'data.parquet',FROM 'data.csv' - Built-in datasets:
FROM ggsql:penguins,FROM ggsql:airquality
VISUALISE clause
Marks the start of the visualization. Optionally defines global mappings inherited by all layers.
VISUALISE <mapping>, ... FROM <data-source>
Mapping forms
- Explicit:
column AS aesthetic— e.g.revenue AS y - Implicit:
column— column name must match aesthetic name, e.g.xmaps tox - Wildcard:
*— all columns with names matching aesthetics are mapped - Constants:
'red' AS fill,42 AS size— literal values mapped to aesthetic
VISUALISE bill_len AS x, bill_dep AS y, species AS fill FROM ggsql:penguins
VISUALISE * FROM my_table
VISUALISE FROM ggsql:penguinsDRAW clause
Defines a layer. Multiple DRAW clauses stack layers (first = bottom, last = top).
DRAW <layer-type>
MAPPING <mapping>, ... FROM <data-source>
REMAPPING <stat-property> AS <aesthetic>, ...
SETTING <param> => <value>, ...
FILTER <condition>
PARTITION BY <column>, ...
ORDER BY <column>, ...
All subclauses are optional if VISUALISE provides global mappings and data.
MAPPING
Same syntax as VISUALISE mappings. Layer mappings merge with global mappings (layer takes precedence). Can include FROM for layer-specific data.
- Use
nullto prevent inheriting a global mapping:MAPPING null AS color
REMAPPING
For statistical layers (histogram, density, boxplot, violin, smooth, bar without y). Maps calculated statistics to aesthetics. Each layer documents its available stats and default remapping.
DRAW histogram
MAPPING body_mass AS x
REMAPPING density AS y -- use density instead of default countSETTING
Set literal aesthetic values or layer parameters. Aesthetics set here bypass scales.
DRAW point
SETTING size => 5, opacity => 0.7, stroke => 'red'Position adjustment is a special setting:
SETTING position => 'identity' -- no adjustment (default for most)
SETTING position => 'stack' -- stack (default for bar, histogram, area)
SETTING position => 'dodge' -- side by side (default for boxplot, violin)
SETTING position => 'jitter' -- random offsetFILTER
SQL WHERE condition applied to layer data. Content is passed to the database:
DRAW point
FILTER sex = 'female' AND body_mass > 4000PARTITION BY
Additional grouping columns beyond mapped discrete aesthetics:
DRAW line
MAPPING Day AS x, Temp AS y
PARTITION BY MonthORDER BY
Controls record order (important for path layers):
DRAW path
ORDER BY timestampPLACE clause
Creates annotation layers with literal values only (no data mappings). Supports tuples for multiple annotations.
PLACE <layer-type>
SETTING <aesthetic/param> => <value>, ...
PLACE point SETTING x => 5, y => 10, color => 'red'
PLACE rule SETTING y => 70, linetype => 'dotted'
PLACE text SETTING x => (34, 44), y => (66, 49), label => ('Mean = 34', 'Mean = 44')SCALE clause
Controls how data values are translated to aesthetic values. Sensible defaults are always provided.
SCALE <type> <aesthetic> FROM <input-range> TO <output-range> VIA <transform>
SETTING <param> => <value>, ...
RENAMING <value> => <label>, ...
All parts except aesthetic are optional.
Scale types (optional, placed before aesthetic)
CONTINUOUS— continuous numeric/temporal dataDISCRETE— categorical/string dataBINNED— bin continuous data into discrete groups (never auto-selected, must be explicit)ORDINAL— ordered discrete data (never auto-selected, must be explicit)IDENTITY— pass data through unchanged (no legend created)
If omitted, type is inferred from data.
Aesthetic names
Use the base name: x, y, fill, stroke, color (sets both fill and stroke), opacity, size, linewidth, linetype, shape, panel (facet), row, column.
For position families (xmin/xmax/xend/ymin/ymax/yend), scale with the base name: SCALE x ...
FROM (input range)
- Continuous:
FROM (min, max)— usenullto infer from data:FROM (0, null) - Discrete:
FROM ('A', 'B', 'C')— controls order, omitted values are nulled - Include null explicitly:
FROM ('Torgersen', 'Biscoe', null)
TO (output range)
- Array of values:
TO ('red', 'blue', 'green'),TO (1, 6) - Named palette:
TO viridis,TO dark2,TO tableau10
VIA (transform)
Continuous transforms: linear, log, log2, ln, exp10, exp2, exp, sqrt, square, asinh, pseudo_log, pseudo_log2, pseudo_ln, integer
Temporal transforms: date, datetime, time — automatically chosen for date/datetime/time columns.
Discrete transforms: string, bool
SCALE x VIA date -- treat x as temporal
SCALE y VIA log -- log transform
SCALE size VIA square -- scale by radius not areaSETTING
Continuous/binned scales: - expand — expansion factor, scalar or (mult, add). Default 0.05. Only for x/y. - oob — out-of-bounds: 'keep' (default for x/y), 'censor' (default for others), 'squish' - breaks — integer count, array of values, or interval string for temporal (e.g. '2 months', 'week') - pretty — boolean, default true. Use Wilkinson’s algorithm for nice breaks. - reverse — boolean, default false. Reverse scale direction.
Binned scales additionally: - closed — 'left' (default) or 'right'
Discrete/ordinal scales: - reverse — boolean
SCALE x SETTING breaks => '2 months'
SCALE y FROM (0, 100) SETTING oob => 'squish'
SCALE BINNED x SETTING breaks => 10, pretty => falseRENAMING
Rename break labels. Direct renaming, wildcard formatting, or both (direct takes priority):
RENAMING 'Adelie' => 'Pygoscelis adeliae', 'adelie' => null -- direct / suppress
RENAMING * => '{} mm' -- string interpolation
RENAMING * => '{:Title}' -- formatters: Title, UPPER, lower, time %B %Y, num %.1fFACET clause
Split data into small multiples.
FACET <column> BY <column>
SETTING <param> => <value>, ...
- 1D:
FACET region— wrap layout, aesthetic name ispanel - 2D:
FACET region BY category— grid layout, aesthetics arerowandcolumn
Settings
free—null(default/fixed),'x','y', or('x', 'y')for independent scalesmissing—'repeat'(default, show layer in all panels) or'null'(only show in null panel)ncol/nrow— layout dimensions for 1D faceting (only one allowed)
Customizing strip labels
Use SCALE on the facet aesthetic:
FACET region
SCALE panel
RENAMING 'N' => 'North', 'S' => 'South'Filtering panels
Use SCALE FROM to select which panels to show:
FACET island
SCALE panel FROM ('Biscoe', 'Dream')PROJECT clause
Controls the coordinate system.
PROJECT <aesthetic>, ... TO <coord-type>
SETTING <param> => <value>, ...
Coordinate types
cartesian (default) — horizontal x, vertical y - Settings: clip (boolean, default true), ratio (aspect ratio number or null) - Default aesthetics: x, y
polar — angle + radius from center - Settings: clip, start (degrees, default 0 = 12 o’clock), end (degrees, default start+360), inner (0-1 proportion for donut hole, default 0) - Default aesthetics: radius (primary), angle (secondary)
Swap aesthetic order to flip axes: PROJECT y, x TO cartesian. If no PROJECT clause, coordinate type is inferred from mappings (x/y = cartesian, radius/angle = polar).
PROJECT TO polar SETTING inner => 0.5 -- donut chart
PROJECT TO polar SETTING start => -90, end => 90 -- half-circle gaugeLABEL clause
Override default axis/legend labels and add titles.
LABEL
<aesthetic/title> => <string>, ...
Available labels: - title — main title - subtitle — subtitle below title - caption — text below the plot - Any aesthetic name — axis/legend title: x, y, fill, color, etc. - Use null to suppress a label: fill => null
LABEL
title => 'Sales by Region',
subtitle => 'Q4 2024 data',
x => 'Date',
y => 'Revenue (USD)',
fill => 'Region',
caption => 'Source: internal sales database'Layer types
point
Scatterplot. Required: x, y. Optional: size, colour, stroke, fill, opacity, shape.
line
Line plot sorted along primary axis. Required: x, y. Optional: colour/stroke, opacity, linewidth, linetype. Settings: position, orientation ('aligned'/'transposed').
path
Like line but connects points in data order (not sorted). Same aesthetics as line.
bar
Bar chart. Auto-counts if y not provided. Optional: x (categories), y (height), fill, colour, stroke. Stats: count, proportion. Properties: weight. Settings: position (default 'stack'), width (0-1). Orientation inferred from mapping (categories on x = vertical, on y = horizontal).
DRAW bar MAPPING species AS x -- auto-count
DRAW bar MAPPING species AS x, total AS y -- pre-computed
DRAW bar MAPPING species AS x, sex AS fill -- stacked (default)
SETTING position => 'dodge' -- side by sidehistogram
Bins continuous data. Required: x. Stats: count, density. Default remapping: count AS <secondary>. Settings: position (default 'stack'), bins (default 30), binwidth, closed ('left'/'right').
DRAW histogram MAPPING body_mass AS x SETTING binwidth => 100
DRAW histogram MAPPING body_mass AS x REMAPPING density AS y -- density instead of countdensity
Kernel density estimation. Required: x. Stats: density, intensity. Settings: position (default 'identity'), bandwidth, adjust (default 1), kernel ('gaussian' default, 'epanechnikov', 'triangular', 'rectangular', 'biweight', 'cosine').
boxplot
Five-number summary with outliers. Required: x (categorical), y (continuous). Stats: type, value. Settings: position (default 'dodge'), outliers (default true), coef (whisker IQR multiple, default 1.5), width (default 0.9).
violin
Mirrored kernel density for groups. Required: x (categorical), y (continuous). Stats: density, intensity. Default remapping: density AS offset. Settings: position (default 'dodge'), bandwidth, adjust, kernel (same as density), width (default 0.9), side ('both'/'left'/'bottom'/'right'/'top'), tails (number or null, default 3).
smooth
Trendline. Required: x, y. Stats: intensity. Settings: method ('nw' default, 'ols', 'tls'), bandwidth, adjust, kernel (same as density, nw only).
area
Area chart anchored at zero. Required: x, y. Settings: position (default 'stack'), orientation, total (normalize stacks), center (boolean, for steamgraph).
ribbon
Like area but with explicit ymin/ymax (unanchored). Required: x, ymin, ymax.
segment
Line segments between two endpoints. Required: x, y, and at least one of xend or yend.
rule
Reference lines spanning the full panel. Required: x or y. Optional: slope (for diagonal: y = a + slope * x).
text
Text labels. Required: x, y, label. Settings: offset (number or (h, v)), format (string interpolation like RENAMING). hjust: 'left'/'right'/'centre' or 0-1. vjust: 'top'/'bottom'/'middle' or 0-1.
rect
Rectangles. Required: pick 2 per axis from center (x/y), min (xmin/ymin), max (xmax/ymax), width, height. Or just center (defaults width/height to 1).
polygon
Closed shapes from ordered coordinates. Required: x, y. Use PARTITION BY to separate distinct polygons.
errorbar
Interval display. Required: x, ymin, ymax. Settings: width (hinge width in points, default 10, null to hide).
All layers accept common optional aesthetics (colour/stroke, fill, opacity, linewidth, linetype) and position setting where applicable.
Named color palettes
- Discrete:
ggsql10(default),tableau10,category10,set1,set2,set3,dark2,paired,pastel1,pastel2,accent,kelly22 - Sequential:
sequential(default),viridis,plasma,magma,inferno,cividis,blues,greens,oranges,reds,purples,greys,ylgnbu,ylorbr,ylorrd,batlow,hawaii,lajolla,turku, and more - Diverging:
vik/diverging,rdbu,rdylbu,rdylgn,spectral,brbg,prgn,piyg,puor,berlin,roma, and more - Cyclic:
romao/cyclic,bamo,broco,corko,viko
Common patterns
-- Pie chart
VISUALISE species AS fill FROM ggsql:penguins
DRAW bar
PROJECT TO polar
-- Horizontal bar chart
DRAW bar MAPPING species AS y
-- Multi-series line chart
VISUALISE Date AS x
DRAW line MAPPING Temp AS y, 'Temperature' AS color
DRAW line MAPPING Ozone AS y, 'Ozone' AS color
SCALE x VIA date
-- Lollipop chart
SELECT ROUND(bill_dep) AS bill_dep, COUNT(*) AS n FROM ggsql:penguins GROUP BY 1
VISUALISE bill_dep AS x, n AS y
DRAW segment MAPPING 0 AS yend
DRAW point
-- Ridgeline / joy plot
VISUALISE Temp AS x, Month AS y FROM ggsql:airquality
DRAW violin SETTING width => 4, side => 'top'
SCALE ORDINAL y
-- Bar labels
SELECT island, COUNT(*) AS n FROM ggsql:penguins GROUP BY island
VISUALISE island AS x, n AS y
DRAW bar
DRAW text MAPPING n AS label SETTING vjust => 'top', offset => (0, -11), fill => 'white'
-- CTEs with separate layer data
WITH temps AS (SELECT Date, Temp as value FROM ggsql:airquality),
ozone AS (SELECT Date, Ozone as value FROM ggsql:airquality WHERE Ozone IS NOT NULL)
VISUALISE
DRAW line MAPPING Date AS x, value AS y, 'Temperature' AS color FROM temps
DRAW point MAPPING Date AS x, value AS y, 'Ozone' AS color FROM ozone
SCALE x VIA dateCLI
The ggsql CLI should be on the PATH. Subcommands: exec <QUERY>, run <FILE>, validate <QUERY>, parse <QUERY>. Common options: --reader <URI> (default duckdb://memory), --writer <FORMAT> (default vegalite), --output <PATH>, -v (verbose).
ggsql validate "VISUALISE x, y FROM data DRAW point"
ggsql exec "VISUALISE bill_len AS x, bill_dep AS y FROM ggsql:penguins DRAW point" -v
ggsql run query.sql --output chart.vl.jsonAdditional References
- https://ggsql.org/syntax/index.llms.md — Online documentation with the latest syntax
Instructions for responding
- Write a complete, valid ggsql query matching the user’s request.
- Use SQL CTEs/queries before VISUALISE when data shaping is needed.
- Choose the simplest layer types and settings that achieve the goal.
- Include SCALE clauses when the defaults are insufficient (e.g. date formatting, custom palettes, range limits).
- Include LABEL for titles when the context warrants it.
- Briefly explain your choices after the query.
- NEVER invent syntax, settings, aesthetics, layer types, or palette names not documented above.
- If unsure whether a feature exists, say so rather than guessing.
- Use
ggsql:penguinsorggsql:airqualityas example data when no specific data is mentioned. - When the user wants to validate a query, use
ggsql validate "<query>". When the user wants to see the output, useggsql exec "<query>" -v.