Last verified · v1.0
Scatter Plot Linear Regression Calculator
Calculate linear regression slope, y-intercept, correlation coefficient, and predicted y values from up to 8 scatter plot data points.
Scatter Plot Linear Regression Calculator: Formula, Method & Examples
A scatter plot calculator with linear regression finds the best-fit straight line through a set of bivariate data points, quantifies the strength of the linear relationship, and predicts y for a given x. Predictions are most reliable within the range of the observed x values. The method is ordinary least squares (OLS) regression, which minimizes the sum of squared vertical distances between each observed point and the fitted line. OLS is widely used because it has a closed-form solution, is cheap to compute, and yields unbiased estimates of the slope and intercept under standard statistical assumptions.
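To make the least-squares criterion concrete, the sketch below scores a few candidate lines by their sum of squared vertical residuals. It uses the five study-hours points from the worked example in this article; the function name `sum_squared_residuals` and the three candidate lines are illustrative choices, not part of the calculator.

```python
# Hours studied (x) and exam scores (y) from the worked example in this article.
xs = [1, 2, 3, 4, 5]
ys = [55, 60, 65, 75, 80]


def sum_squared_residuals(m, b, xs, ys):
    """Total squared vertical distance between the points and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical candidate lines as (slope, intercept) pairs.
# OLS selects the pair with the smallest score over all possible lines.
candidates = [(6.0, 49.0), (6.5, 47.5), (7.0, 46.0)]
for m, b in candidates:
    print(f"m={m}, b={b}: SSE={sum_squared_residuals(m, b, xs, ys)}")
```

Among these three candidates, the line with slope 6.5 and intercept 47.5 scores lowest (7.5, against 10.0 for each of the other two).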
The Linear Regression Equation
The regression line follows the standard slope-intercept form:
ŷ = mx + b
Here, ŷ (y-hat) is the predicted y value for a given x, m is the slope of the line, and b is the y-intercept, the value of ŷ when x equals zero. Together, these two parameters fully define the position and angle of the best-fit line through the scatter plot. The regression equation gives the estimated average y for a given x, and every prediction from the fitted model comes from evaluating it.
Slope Formula (m)
The slope is calculated as:
m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
In this formula, n is the number of data points, ∑xy is the sum of the products of each paired x and y, ∑x and ∑y are the sums of all x and y values respectively, and ∑x² is the sum of the squared x values. The slope is the change in predicted y for every one-unit increase in x. A positive slope means y tends to increase as x increases; a negative slope means the opposite. The magnitude of the slope is a rate of change, expressed in units of y per unit of x. It depends on how the variables are scaled, so it does not measure how strong the relationship is; the correlation coefficient r does that. The formula is undefined when all x values are identical, because the denominator is then zero.
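The slope formula translates almost directly into code. The following is a minimal sketch, assuming the data arrive as two equal-length Python lists; the function name `slope_from_sums` is hypothetical and the sample points are the five study-hours pairs from the worked example in this article.

```python
def slope_from_sums(xs, ys):
    """Slope m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2), computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every x value is the same: the fitted line would be vertical.
        raise ValueError("all x values are identical; slope is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


hours = [1, 2, 3, 4, 5]
scores = [55, 60, 65, 75, 80]
print(slope_from_sums(hours, scores))  # 6.5

# Same data with x expressed in minutes instead of hours.
minutes = [h * 60 for h in hours]
print(slope_from_sums(minutes, scores))
```

The second call returns a slope 60 times smaller (about 0.108) for exactly the same relationship, which illustrates that the slope depends on the units chosen for x.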
Y-Intercept Formula (b)
After computing the slope, the y-intercept is determined by:
b = (∑y − m∑x) / n
Because ∑x / n = x̄ and ∑y / n = ȳ, this formula is equivalent to b = ȳ − m·x̄. It anchors the regression line to the centroid of the data (x̄, ȳ), so the line always passes through the mean of both variables, a fundamental property of OLS regression with an intercept. The y-intercept can have practical meaning; for instance, in a sales forecast model it may represent baseline revenue when advertising spend is zero. Interpret the intercept with caution if zero lies outside the range of observed x values, because extrapolating beyond the data introduces greater uncertainty.
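The centroid property can be checked numerically. This sketch, which assumes the five study-hours points from the worked example in this article, computes the slope and intercept from the raw sums and then evaluates the fitted line at the mean of x; all variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [55, 60, 65, 75, 80]
n = len(xs)

# Slope from the raw-sum formula.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: b = (sum_y - m * sum_x) / n, which equals y_mean - m * x_mean.
b = (sum_y - m * sum_x) / n

x_mean, y_mean = sum_x / n, sum_y / n

# Evaluating the fitted line at x_mean should reproduce y_mean (up to rounding).
print(f"b = {b}")
print(f"line at x_mean = {m * x_mean + b}, y_mean = {y_mean}")
```

For these points the intercept is 47.5, and the line evaluated at x̄ = 3 returns ȳ = 67.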
Pearson Correlation Coefficient (r)
The Pearson correlation coefficient r measures both the direction and strength of the linear association:
r = (n∑xy − ∑x∑y) / √([n∑x² − (∑x)²][n∑y² − (∑y)²])
Values of r span from −1 to +1. A value of +1 indicates a perfect positive linear relationship, −1 a perfect negative relationship, and 0 indicates no linear pattern. As a general guideline, |r| > 0.7 signals a strong correlation, 0.4–0.7 a moderate correlation, and below 0.4 a weak correlation. The squared correlation coefficient, r², represents the proportion of variance in y that is explained by x, making it a useful measure of how well the regression line fits the observed data.
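As a rough sketch of the correlation formula, the code below computes r from raw sums and labels its magnitude with the guideline thresholds quoted above. The function names `pearson_r` and `describe_strength` are hypothetical, and the sample data are the five study-hours points from the worked example in this article.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from raw sums, mirroring the formula in the text."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the variables has no spread, so r is undefined.
        raise ValueError("x or y values are all identical; r is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


def describe_strength(r):
    """Label |r| using the guideline thresholds: > 0.7, 0.4 to 0.7, below 0.4."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"


r = pearson_r([1, 2, 3, 4, 5], [55, 60, 65, 75, 80])
print(round(r, 4), round(r ** 2, 4), describe_strength(r))
```

For these points r rounds to 0.9912 and r² to 0.9826, so roughly 98% of the variance in the scores is accounted for by the fitted line.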
Worked Example
Consider 5 data points relating hours studied (x) to exam scores (y): (1, 55), (2, 60), (3, 65), (4, 75), (5, 80).
Computed sums: n = 5, ∑x = 15, ∑y = 335, ∑xy = 1,070, ∑x² = 55, ∑y² = 22,875.
- Slope: m = (5×1070 − 15×335) / (5×55 − 15²) = (5350 − 5025) / (275 − 225) = 325 / 50 = 6.5
- Intercept: b = (335 − 6.5×15) / 5 = 237.5 / 5 = 47.5
- Regression line: ŷ = 6.5x + 47.5
- Prediction at x = 6: ŷ = 6.5(6) + 47.5 = 86.5 points
- Correlation: r = 325 / √(50 × (5×22875 − 335²)) = 325 / √(50 × 2150) ≈ 0.991, a very strong positive linear relationship
Real-World Applications
- Education: Predicting exam performance from hours of study or prior quiz scores
- Business: Forecasting monthly revenue from advertising expenditure
- Science: Modeling how temperature changes affect chemical reaction rates
- Health: Estimating caloric burn from minutes of cardiovascular exercise
- Engineering: Relating applied load to material deformation in stress tests
Sources & Methodology
The slope, intercept, and correlation formulas implemented here follow the standard OLS derivation described in OpenStax Introductory Statistics, "The Regression Equation". Applied step-by-step calculation procedures are drawn from SERC at Carleton College, "How Do I Calculate a Linear Regression?" Both sources confirm that the least-squares criterion is the standard for fitting a linear model to bivariate scatter plot data. Fitting the line itself requires no distributional assumptions, but statistical inference from the fitted model assumes that errors are independent, normally distributed, and have constant variance across all x values. These conditions should be verified before drawing strong conclusions.