SSY Calculator (Total Sum of Squares for Y)
Compute SSY, the Total Sum of Squares for Y, for 2–10 data points. Instantly measure the total variability of your data around its mean for regression and ANOVA.
What Is SSY? The Total Sum of Squares for Y
SSY, also written as SST or the Total Sum of Squares, quantifies the total variability present in a set of observed Y values. In regression analysis and ANOVA, SSY serves as the fundamental baseline measure of variance against which any model's performance is judged. It answers one essential question: how far do the observed data points scatter around their own arithmetic mean?
The SSY Formula
The formula for the Total Sum of Squares for Y is:
SSY = Σ (yi − ȳ)²
Where each variable plays a distinct role:
- yi — each individual observed Y value in the dataset (i = 1, 2, …, n)
- ȳ (y-bar) — the arithmetic mean of all Y values, computed as the sum of all yi divided by n
- n — the total number of data points included in the calculation
- Σ — the summation operator, applied across all n observations
Step-by-Step Derivation
Calculating SSY requires four clear steps:
- Compute the mean (ȳ): Add all Y values and divide by n. For Y = {4, 7, 13, 2}, ȳ = 26 / 4 = 6.5.
- Find each deviation: Subtract ȳ from every yi. Continuing: 4 − 6.5 = −2.5, 7 − 6.5 = 0.5, 13 − 6.5 = 6.5, 2 − 6.5 = −4.5.
- Square each deviation: (−2.5)² = 6.25, (0.5)² = 0.25, (6.5)² = 42.25, (−4.5)² = 20.25.
- Sum the squared deviations: SSY = 6.25 + 0.25 + 42.25 + 20.25 = 69.00.
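The four steps above can be sketched in a few lines of Python (a minimal illustration of the formula, not the calculator's own implementation):

```python
def ssy(values):
    """Total Sum of Squares for Y: sum of squared deviations from the mean."""
    if len(values) < 2:
        raise ValueError("SSY needs at least two data points")
    mean = sum(values) / len(values)          # step 1: the mean, y-bar
    deviations = [y - mean for y in values]   # step 2: deviations from the mean
    return sum(d * d for d in deviations)     # steps 3 and 4: square, then sum

print(ssy([4, 7, 13, 2]))  # 69.0
```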
Why Squaring the Deviations Matters
Squaring each deviation serves two important purposes. First, it eliminates cancellation — positive and negative deviations from the mean would otherwise sum to zero, concealing all variability. Second, squaring penalizes large deviations more heavily than small ones, giving SSY heightened sensitivity to outliers. This property makes SSY a reliable and informative measure of overall data spread, regardless of whether values cluster tightly or spread widely.
SSY in Regression and ANOVA
SSY forms the denominator of the coefficient of determination, R²:
R² = 1 − (SSE / SSY) = SSR / SSY
SSY also anchors the fundamental ANOVA partition of total variance:
SSY = SSR + SSE
SSR (Regression Sum of Squares) represents the variation explained by the fitted model, while SSE (Error Sum of Squares) captures unexplained residual variation. According to the Simple Linear Regression Models reference by Jain (Washington University), this partition is the cornerstone of assessing model fit. A large SSR relative to SSY signals a high-performing model; a small SSR signals poor explanatory power. The Penn State STAT 501, Lesson 6.3 on Sequential Sums of Squares further demonstrates how SSY anchors F-tests and model comparison in multiple regression settings.
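The partition can be checked numerically with a small ordinary-least-squares fit. The data points below are hypothetical, chosen only to illustrate that SSR and SSE sum to SSY and that both forms of R² agree:

```python
# Hypothetical data for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
n = len(xs)

# Ordinary least-squares slope and intercept for the fitted line.
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
intercept = ybar - slope * xbar
yhat = [slope * x + intercept for x in xs]            # model predictions

ssy = sum((y - ybar) ** 2 for y in ys)                # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))   # residual error

print(ssy, ssr + sse)             # both come out to 26 (up to rounding)
print(1 - sse / ssy, ssr / ssy)   # the two forms of R-squared agree
```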
Practical Example: Monthly Sales Data
A retailer records five months of sales figures (in thousands of dollars): 12, 15, 20, 18, 25. Step 1 — compute the mean: ȳ = 90 / 5 = 18. Step 2 — compute deviations: −6, −3, 2, 0, 7. Step 3 — square each: 36, 9, 4, 0, 49. Step 4 — sum: SSY = 36 + 9 + 4 + 0 + 49 = 98. Before any forecasting model is applied, SSY = 98 fully characterizes the total variability in this sales dataset, providing the reference point for every subsequent R² or F-statistic calculation.
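The same four steps applied to the sales figures, as a quick programmatic check:

```python
sales = [12, 15, 20, 18, 25]            # monthly sales, thousands of dollars
mean = sum(sales) / len(sales)          # 18.0
deviations = [y - mean for y in sales]  # [-6.0, -3.0, 2.0, 0.0, 7.0]
ssy = sum(d * d for d in deviations)
print(ssy)  # 98.0
```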
Primary Use Cases for SSY
- Regression diagnostics: SSY benchmarks the total variance a model must explain; high SSY with high SSR indicates a well-fit model.
- ANOVA tables: often labeled SST (Total Sum of Squares), SSY anchors the F-test for overall model significance across groups.
- Model comparison: Identical SSY values across competing models ensure fair R² comparisons without scale distortion.
- Quality control: Engineers measure SSY to quantify process variability before and after interventions, tracking improvement objectively.
- Outlier detection: A single observation contributing a disproportionately large squared deviation signals a potential outlier requiring investigation.
Common Calculation Mistakes
The most frequent error is confusing SSY with the variance. Sample variance equals SSY divided by n − 1; population variance equals SSY divided by n. SSY itself is the raw sum before any division. A second common mistake is substituting the predicted values ŷ for the grand mean ȳ — that operation computes SSE (residual error), not SSY. Always confirm that deviations are measured from the observed mean, not from model predictions.
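The SSY-versus-variance distinction is easy to confirm numerically, again using the small example dataset from the derivation above:

```python
data = [4, 7, 13, 2]
n = len(data)
mean = sum(data) / n
ssy = sum((y - mean) ** 2 for y in data)  # 69.0, the raw sum, no division

sample_variance = ssy / (n - 1)     # 23.0, divides by n - 1
population_variance = ssy / n       # 17.25, divides by n
```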
Authoritative References
This methodology follows the definitions established in the UC Berkeley Data Analysis Toolkit #10: Simple Linear Regression by Kirchner, the algebraic derivations in the University of Colorado statistics lecture notes on SSY and total variability in regression, and the ANOVA partition framework from Penn State STAT 501, Lesson 6.3.