VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density
Density
Layers are declared with the DRAW clause. Read the documentation for this clause for a thorough description of how to use it.
Visualise the distribution of a single continuous variable by computing a kernel density estimate (KDE). The result is interpreted much like a histogram, but the observations are smoothed rather than binned.
Aesthetics
The following aesthetics are recognised by the density layer.
Required
x: Position on the x-axis.
Optional
stroke: The colour of the contour lines.
fill: The colour of the inner area.
colour: Shorthand for setting stroke and fill simultaneously.
opacity: The opacity of the colours.
linewidth: The width of the contour lines.
linetype: The dash pattern of the contour line.
Settings
position: Determines the position adjustment to use for the layer (default is 'identity').
bandwidth: A numerical value setting the smoothing bandwidth to use. If absent (default), the bandwidth will be computed using Silverman’s rule of thumb.
adjust: A numerical value used as a multiplier for the bandwidth setting, with 1 as default.
kernel: Determines the smoothing kernel shape. Can be one of the following:
- 'gaussian' (default)
- 'epanechnikov'
- 'triangular'
- 'rectangular' or 'uniform'
- 'biweight' or 'quartic'
- 'cosine'
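To make the interplay between bandwidth and adjust concrete, here is a minimal Python sketch. It assumes the common form of Silverman's rule of thumb, \(0.9 \cdot \min(\sigma, \mathrm{IQR}/1.34) \cdot n^{-1/5}\); whether ggsql uses exactly this variant is an assumption, and the function names `silverman_bandwidth` and `effective_bandwidth` are illustrative, not part of ggsql.

```python
import statistics

def silverman_bandwidth(xs):
    """Silverman's rule of thumb (assumed variant):
    0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR collapses to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)

def effective_bandwidth(xs, bandwidth=None, adjust=1.0):
    """Mimic the settings described above: use the explicit bandwidth if
    given, otherwise the rule of thumb, then scale by adjust."""
    base = silverman_bandwidth(xs) if bandwidth is None else bandwidth
    return adjust * base

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(effective_bandwidth(xs))               # rule-of-thumb bandwidth
print(effective_bandwidth(xs, adjust=0.1))   # ten times narrower, less smoothing
```

Under this reading, adjust => 0.1 shrinks the bandwidth to a tenth of its computed value, which yields a much wigglier curve.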
Data transformation
The density layer computes a 1-dimensional grid spanning the range of the data. For each grid location \(x\), the distances to the observations \(x_i\) are computed (\(x - x_i\)) and serve as input to a kernel function. The contributions of all observations are then averaged at each grid location:
\[ \frac{1}{(\sum_{i=1}^{n}w_i)h}\sum_{i=1}^{n}w_iK \left(\frac{x - x_i}{h}\right) \]
Where:
- \(K\) is the kernel function
- \(h\) is the bandwidth
- \(w_i\) is the weight of observation \(i\)
By default \(w_i = 1\), so the procedure simplifies thus:
\[ \frac{1}{nh}\sum_{i=1}^{n}K \left(\frac{x - x_i}{h}\right) \]
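The formulas above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not ggsql's implementation: the names `KERNELS`, `make_grid` and `kde` are hypothetical, the `pad` argument is an addition for demonstration only (the text says the grid uses the data range), and the kernels use their textbook forms with support \(|u| \le 1\), which may be scaled differently in ggsql.

```python
import math

def _compact(f):
    """Wrap f so that it is zero outside |u| <= 1."""
    return lambda u: f(u) if abs(u) <= 1.0 else 0.0

# Textbook kernel shapes; the exact scaling used by ggsql is an assumption.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "epanechnikov": _compact(lambda u: 0.75 * (1.0 - u * u)),
    "triangular": _compact(lambda u: 1.0 - abs(u)),
    "rectangular": _compact(lambda u: 0.5),
    "biweight": _compact(lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2),
    "cosine": _compact(lambda u: math.pi / 4.0 * math.cos(math.pi * u / 2.0)),
}

def make_grid(xs, n_points=512, pad=0.0):
    """Evenly spaced grid over the data range, optionally padded."""
    lo, hi = min(xs) - pad, max(xs) + pad
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]

def kde(xs, grid, h, kernel="gaussian", weights=None):
    """Weighted kernel density estimate evaluated at each grid location."""
    k = KERNELS[kernel]
    if weights is None:
        weights = [1.0] * len(xs)  # w_i = 1 recovers the simplified formula
    norm = sum(weights) * h
    return [
        sum(w * k((g - x) / h) for x, w in zip(xs, weights)) / norm
        for g in grid
    ]

xs = [1.0, 2.0, 4.0]
grid = make_grid(xs, n_points=5)
print(kde(xs, grid, h=0.7))
print(kde(xs, grid, h=0.7, kernel="epanechnikov", weights=[1.0, 1.0, 2.0]))
```

Because the weights appear in both the numerator and the normalising sum, multiplying every weight by the same constant leaves the estimate unchanged; only relative weights matter.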
Properties
weight: If mapped, it sets the relative contribution \(w_i\) of each observation to the density estimate.
Calculated statistics
density: The estimated probability density per point on the grid. The total area of a single density curve adds up to 1.
intensity: Also termed ‘probability intensity estimation’, it is the precursor of the density variable. Specifically it is the same as the density without normalisation, i.e. it omits the \(\frac{1}{nh}\) part of the computation. You can use REMAPPING intensity AS y if you want to reflect differences in group sizes.
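The difference between the two statistics can be illustrated with a small Python sketch. It takes the description above literally, treating the intensity as the kernel sum without the \(\frac{1}{nh}\) factor; this reading, the Gaussian kernel, and the helper names `intensity`, `density` and `trapezoid_area` are assumptions for illustration rather than ggsql internals.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def intensity(xs, grid, h):
    """Unnormalised kernel sum (assumed reading of the text above)."""
    return [sum(gaussian((g - x) / h) for x in xs) for g in grid]

def density(xs, grid, h):
    """Intensity divided by n * h, so the curve integrates to 1."""
    n = len(xs)
    return [v / (n * h) for v in intensity(xs, grid, h)]

def trapezoid_area(ys, grid):
    """Approximate the area under a curve with the trapezoid rule."""
    return sum(
        (y0 + y1) / 2.0 * (g1 - g0)
        for y0, y1, g0, g1 in zip(ys, ys[1:], grid, grid[1:])
    )

small = [1.0, 2.0, 3.0]
large = small * 3  # same values, three times as many observations
grid = [-5.0 + i * 0.05 for i in range(321)]
h = 0.8

# Density ignores group size: both areas are approximately 1.
print(trapezoid_area(density(small, grid, h), grid))
print(trapezoid_area(density(large, grid, h), grid))

# Intensity scales with group size: the larger group peaks three times higher.
print(max(intensity(large, grid, h)) / max(intensity(small, grid, h)))
```

With a shared bandwidth, the density curves of the two groups coincide, while the intensity of the larger group is three times that of the smaller one.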
Default remappings
density AS y: By default the density layer will display the computed density along the y-axis.
Examples
A typical KDE computation with different groups:
VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density
Changing the relative bandwidth through the adjust setting.
VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density SETTING adjust => 0.1
Stacking the different groups instead of overlaying them.
VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density SETTING position => 'stack'
Using weighted estimates by mapping a column to the optional weight aesthetic. Note that the difference in output is subtle.
VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density MAPPING body_mass AS weight
If you want to compare a histogram and a density layer, you can use the intensity computed variable to match the histogram scale.
VISUALISE bill_len AS x FROM ggsql:penguins
DRAW histogram SETTING opacity => 0.5
DRAW density
REMAPPING intensity AS y
SETTING opacity => 0.5
Using the intensity rather than the density also portrays differences in group sizes better. Note the relative height of the groups.
VISUALISE bill_dep AS x, species AS colour FROM ggsql:penguins
DRAW density REMAPPING intensity AS y