SELECT *, EPOCH(Date) AS numdate FROM ggsql:airquality
VISUALISE numdate AS x, Temp AS y
DRAW point
DRAW smooth
Smooth
Layers are declared with the DRAW clause. Read the documentation for this clause for a thorough description of how to use it.
Smooth layers are used to display a trendline fitted through a series of observations.
Aesthetics
Required
- Primary axis (e.g. x): Position along the primary axis.
- Secondary axis (e.g. y): Position along the secondary axis.
Optional
- colour/stroke: The colour of the line
- opacity: The opacity of the line
- linewidth: The width of the line
- linetype: The type of line, i.e. the dashing pattern
Settings
- method: Choice of the method for generating the trendline. One of the following:
  - 'nw' or 'nadaraya-watson' estimates the trendline using the Nadaraya-Watson kernel regression method (default).
  - 'ols' estimates a straight trendline using the ordinary least squares method.
  - 'tls' estimates a straight trendline using the total least squares method.
The settings below only apply when method => 'nw' and are ignored when using other methods.
- bandwidth: A numerical value setting the smoothing bandwidth. If absent (default), the bandwidth is computed using Silverman's rule of thumb.
- adjust: A numerical multiplier for the bandwidth, with 1 as the default.
- kernel: Determines the shape of the smoothing kernel. One of the following:
  - 'gaussian' (default)
  - 'epanechnikov'
  - 'triangular'
  - 'rectangular' or 'uniform'
  - 'biweight' or 'quartic'
  - 'cosine'
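The kernel shapes and the bandwidth rule can be sketched in Python. This is a minimal illustration, not ggsql's implementation.

Three parts of it are assumptions:
- The kernel formulas are the standard textbook forms, each scaled to integrate to 1 on its support.
- The Silverman variant, \(0.9\min(\hat\sigma, \text{IQR}/1.34)\,n^{-1/5}\), is assumed; the exact variant ggsql applies is not stated here.
- The names `KERNELS`, `silverman_bandwidth` and `effective_bandwidth` are hypothetical.

```python
import math
import statistics

# Textbook kernel shapes K(u), each integrating to 1 (assumed scaling, not
# necessarily ggsql's). For Nadaraya-Watson regression a constant factor in K
# cancels between numerator and denominator.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "rectangular": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "biweight": lambda u: (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "cosine": lambda u: (math.pi / 4) * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
}
# Aliases listed in the kernel setting.
KERNELS["uniform"] = KERNELS["rectangular"]
KERNELS["quartic"] = KERNELS["biweight"]


def silverman_bandwidth(xs):
    """Assumed rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(xs)
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


def effective_bandwidth(xs, bandwidth=None, adjust=1.0):
    """An explicit bandwidth overrides the rule of thumb; adjust multiplies the result."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(xs)
    return h * adjust
```

Only a constant multiplicative factor in \(K\) cancels in the Nadaraya-Watson ratio. The kernel's shape and the bandwidth \(h\) both change the trendline.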
Data transformation
Nadaraya-Watson kernel regression
The default method => 'nw' computes a locally weighted average of \(y\).
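\[ y(x) = \frac{\sum_{i=1}^n W_i(x)\,y_i}{\sum_{i=1}^n W_i(x)} \]
Where:
- \(W_i(x) = w_i K\left(\frac{x - x_i}{h}\right)\) is the kernel intensity of observation \(i\) at \(x\), where
  - \(K\) is the kernel function
  - \(h\) is the bandwidth
  - \(w_i\) is the weight of observation \(i\)
Note the similarity of \(W_i(x)\) to the kernel density estimation formula. Up to a constant factor, the denominator \(\sum_{i=1}^n W_i(x)\) is a weighted kernel density estimate at \(x\).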
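The weighted average above translates almost directly into code. The following is a minimal Python sketch, not ggsql's internal code.

It assumes two things:
- a Gaussian kernel written without its normalising constant, which cancels in the ratio;
- a fixed bandwidth `h`.

The function name `nadaraya_watson` is hypothetical. The optional `weights` play the role of \(w_i\).

```python
import math


def gaussian(u):
    # Unnormalised Gaussian kernel; the constant 1/sqrt(2*pi) cancels in the ratio.
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x, xs, ys, h, weights=None, kernel=gaussian):
    """Locally weighted average of ys at position x (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * len(xs)
    num = 0.0
    den = 0.0
    for xi, yi, wi in zip(xs, ys, weights):
        intensity = wi * kernel((x - xi) / h)  # w_i * K((x - x_i) / h)
        num += intensity * yi
        den += intensity
    if den == 0:
        return float("nan")  # no observation within the kernel's support
    return num / den
```

Evaluating the function on an evenly spaced grid of \(x\) values yields the points of the trendline. A smaller `h` lets nearby observations dominate the average, which is the effect of lowering the bandwidth or adjust settings.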
Ordinary least squares
The method => 'ols' setting uses ordinary least squares to compute the intercept \(a\) and slope \(b\) of a straight line. The method minimises the sum of squared vertical distances between each point and its vertical projection onto the line. Considering only the vertical distances assumes measurement error in \(y\), but not in \(x\).
\[ y = a + bx \]
Where:
\[ a = E[Y] - bE[X] \]
and
\[ b = \frac{\text{cov}(X, Y)}{\text{var}(X)} = \frac{E[XY] - E[X]E[Y]}{E[X^2]-(E[X])^2} \]
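A direct Python translation of these formulas might look as follows. It is a sketch with a hypothetical function name, `ols`.

It computes the covariance and variance from centred values. This is algebraically identical to \(E[XY] - E[X]E[Y]\) and \(E[X^2] - (E[X])^2\), but less prone to floating-point cancellation.

```python
def ols(xs, ys):
    """Intercept a and slope b of the ordinary least squares line y = a + b x."""
    n = len(xs)
    mx = sum(xs) / n  # E[X]
    my = sum(ys) / n  # E[Y]
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # cov(X, Y)
    var_x = sum((x - mx) ** 2 for x in xs) / n  # var(X)
    if var_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = cov / var_x
    a = my - b * mx
    return a, b
```

Because the line passes through \((E[X], E[Y])\), the residuals \(y_i - (a + bx_i)\) of an OLS fit always sum to zero.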
Total least squares
The method => 'tls' setting uses total least squares to compute the intercept \(a\) and slope \(b\) of a straight line. The method minimises the sum of squared perpendicular distances, i.e. the 2-dimensional distance between each point and its perpendicular projection onto the line. Minimising the perpendicular distances, rather than only the vertical distances, makes sense when there is uncertainty or measurement error in \(x\) as well as in \(y\). In such a case, the line is a more accurate depiction of the relationship between \(x\) and \(y\), but it is not the best predictor of \(y\) given \(x\).
\[ y = a + bx \]
Where:
\[ a = E[Y] - bE[X] \]
and
\[ b = \frac{\text{var}(Y) - \text{var}(X) + \sqrt{(\text{var}(Y) - \text{var}(X))^2 + 4\text{cov}(X, Y)^2}}{2\text{cov}(X, Y)} \]
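The total least squares formula can be sketched the same way. The Python function `tls` below is hypothetical and uses population moments. As in the formula, the slope is undefined when \(\text{cov}(X, Y) = 0\).

Because it minimises perpendicular distances, this formula treats \(x\) and \(y\) as measured on comparable scales, so rescaling one axis changes the fitted line.

```python
import math


def tls(xs, ys):
    """Intercept a and slope b of the total least squares (orthogonal) line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    if cov == 0:
        # The formula divides by cov(X, Y), so no slope can be computed here.
        raise ValueError("cov(X, Y) is zero; the TLS slope formula is undefined")
    diff = var_y - var_x
    b = (diff + math.sqrt(diff ** 2 + 4 * cov ** 2)) / (2 * cov)
    a = my - b * mx  # same intercept rule as OLS
    return a, b
```

When the covariance is positive, the TLS slope is at least as steep as the OLS slope. The orthogonal line lies between the regression of \(y\) on \(x\) and the regression of \(x\) on \(y\). The two methods agree only when the points fall exactly on a line.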
Properties
weight is available when using method => 'nw'. When mapped, it sets the relative contribution \(w_i\) of each observation to the weighted average.
Calculated statistics
intensity corresponds to \(y\) in the formulas described in the data transformation section.
Default remappings
intensity AS y: By default, the smooth layer displays the fitted \(y\) from the formulas above along the y-axis.
Examples
The default method => 'nw' might be too coarse for time series.
You can make the fit more granular by reducing the bandwidth, for example using adjust.
SELECT *, EPOCH(Date) AS numdate FROM ggsql:airquality
VISUALISE numdate AS x, Temp AS y
DRAW point
DRAW smooth SETTING adjust => 0.2
There is a subtle difference between the ordinary and total least squares methods.
VISUALISE bill_len AS x, bill_dep AS y FROM ggsql:penguins
DRAW point
DRAW smooth MAPPING 'Ordinary' AS colour SETTING method => 'ols'
DRAW smooth MAPPING 'Total' AS colour SETTING method => 'tls'
Simpson's Paradox is a case where a trend that appears in the combined data is reversed when the groups are considered separately.
VISUALISE bill_len AS x, bill_dep AS y, species AS stroke FROM ggsql:penguins
DRAW point SETTING opacity => 0
DRAW smooth SETTING method => 'ols'
DRAW smooth MAPPING 'All' AS stroke SETTING method => 'ols'
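The reversal can be reproduced numerically with the ordinary least squares slope formula. The Python sketch below uses a small, made-up dataset (not the penguins data) with two hypothetical groups, "A" and "B".

Within each group the slope is positive. The pooled slope is nevertheless negative, because the group with larger \(x\) values has lower \(y\) values overall.

```python
def ols_slope(xs, ys):
    """Slope b = cov(X, Y) / var(X) from the ordinary least squares section."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    return cov / var_x


# Made-up illustrative data: two groups, each with a rising trend.
groups = {
    "A": ([1, 2, 3], [8, 9, 10]),
    "B": ([6, 7, 8], [2, 3, 4]),
}

for name, (xs, ys) in groups.items():
    print(name, ols_slope(xs, ys))  # within-group slope

# Pool all observations, ignoring group membership.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
print("All", ols_slope(all_x, all_y))  # pooled slope
```

Both group slopes come out as 1, while the pooled slope is about -0.99. This mirrors the per-species and 'All' trendlines in the example above.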