terican

Last verified · v1.0

Scatter Plot Linear Regression Calculator

Calculate linear regression slope, y-intercept, correlation coefficient, and predicted y values from up to 8 scatter plot data points.

The formula

How the result is computed.

Scatter Plot Linear Regression Calculator: Formula, Method & Examples

A scatter plot calculator with linear regression identifies the best-fit straight line through a set of bivariate data points, quantifies the strength of that relationship, and predicts y at a chosen x value. The method applied is ordinary least squares (OLS) regression, which minimizes the total squared vertical distance between each observed point and the fitted line. OLS is widely used because it has a closed-form solution, is cheap to compute, and yields unbiased estimates of the slope and intercept under standard statistical assumptions.

The Linear Regression Equation

The regression line follows the standard slope-intercept form:

ŷ = mx + b

Here, ŷ (y-hat) is the predicted y value for a given x, m is the slope of the line, and b is the y-intercept, the value of ŷ when x equals zero. Together, these two parameters fully define the position and angle of the best-fit line through the scatter plot. The regression equation gives the estimated average y for a given x, and every prediction drawn from the bivariate relationship comes from it.
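As a small sketch of how the equation is applied, the function below evaluates ŷ = mx + b for one x. The name `predict` and the coefficients are hypothetical and chosen only for illustration.

```python
def predict(x, m, b):
    """Return the predicted value y-hat = m*x + b for a single x."""
    return m * x + b


# Hypothetical coefficients chosen only for illustration.
m, b = 2.0, 1.0

print(predict(0, m, b))  # at x = 0 the prediction equals the intercept b
print(predict(3, m, b))  # three units of x add 3*m to the intercept
```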

Slope Formula (m)

The slope is calculated as:

m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)

In this formula, n is the count of data points, ∑xy is the sum of each x multiplied by its paired y value, ∑x and ∑y are the sums of all x and y values respectively, and ∑x² is the sum of each squared x value. The slope describes how much the predicted y changes for every one-unit increase in x. A positive slope indicates that as x increases, y tends to increase; a negative slope indicates the opposite trend. The magnitude of the slope is the rate of change in y per unit of x and depends on the units of both variables. It does not measure how strong the relationship is; that is the role of the correlation coefficient r.
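The slope formula translates almost term by term into code. The sketch below is one possible implementation, not the calculator's source; the function name `slope_from_points` and the sample data are assumptions made for illustration.

```python
def slope_from_points(xs, ys):
    """Least-squares slope m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every x is the same value: a vertical line has no slope.
        raise ValueError("all x values are identical; the slope is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Hypothetical illustrative data: sums are n=5, Sx=15, Sy=20, Sxy=66, Sxx=55.
print(slope_from_points([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # (330 - 300) / 50
```

The zero-denominator check matters in practice: if all x values are equal, the formula divides by zero and no unique line exists.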

Y-Intercept Formula (b)

After computing the slope, the y-intercept is determined by:

b = (∑y − m∑x) / n

Dividing through by n gives the equivalent form b = ȳ − m·x̄, so the regression line always passes through the centroid of the data (x̄, ȳ), a fundamental property of OLS regression. The y-intercept has practical significance in many applications; for instance, in a sales forecast model, it may represent baseline revenue when advertising spend is zero. Interpret the intercept with caution when zero lies outside the range of observed x values, because extrapolation beyond the data range introduces greater uncertainty.
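The centroid property is easy to check numerically. In this sketch the function name `intercept_from_slope` and the data are hypothetical, and the slope 0.6 is assumed to have been computed already with the slope formula for these same points.

```python
from math import isclose


def intercept_from_slope(xs, ys, m):
    """Return b = (sum(y) - m * sum(x)) / n for an already computed slope m."""
    return (sum(ys) - m * sum(xs)) / len(xs)


# Hypothetical illustrative data; 0.6 is the least-squares slope for these points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m = 0.6
b = intercept_from_slope(xs, ys, m)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(b)
# Evaluating the line at the mean of x should return the mean of y.
print(isclose(m * x_bar + b, y_bar))
```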

Pearson Correlation Coefficient (r)

The Pearson correlation coefficient r measures both the direction and strength of the linear association:

r = (n∑xy − ∑x∑y) / √([n∑x² − (∑x)²][n∑y² − (∑y)²])

Values of r span from −1 to +1. A value of +1 indicates a perfect positive linear relationship, −1 a perfect negative relationship, and 0 indicates no linear pattern. As a general guideline, |r| > 0.7 signals a strong correlation, 0.4–0.7 a moderate correlation, and below 0.4 a weak correlation. The squared correlation coefficient, r², represents the proportion of variance in y that is explained by x, making it a useful measure of how well the regression line fits the observed data.
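A minimal sketch of the correlation formula and the rule-of-thumb bands follows. The names `pearson_r` and `strength_label` and the sample data are assumptions for illustration; the thresholds are the general guideline stated above, not a formal statistical test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed from the raw sums used in the formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        raise ValueError("r is undefined when all x values or all y values are identical")
    return (n * sum_xy - sum_x * sum_y) / sqrt(spread)


def strength_label(r):
    """Map |r| onto the guideline bands: > 0.7 strong, 0.4 to 0.7 moderate, else weak."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"


# Hypothetical illustrative data, not taken from the calculator.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, r squared = {r * r:.3f}, {strength_label(r)}")
```

For this data r squared is about 0.6, meaning roughly 60% of the variance in y is accounted for by the fitted line.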

Worked Example

Consider 5 data points relating hours studied (x) to exam scores (y): (1, 55), (2, 60), (3, 65), (4, 75), (5, 80).

Computed sums: n = 5, ∑x = 15, ∑y = 335, ∑xy = 1,070, ∑x² = 55, ∑y² = 22,875.

  • Slope: m = (5×1070 − 15×335) / (5×55 − 15²) = (5350 − 5025) / (275 − 225) = 325 / 50 = 6.5
  • Intercept: b = (335 − 6.5×15) / 5 = 237.5 / 5 = 47.5
  • Regression line: ŷ = 6.5x + 47.5
  • Prediction at x = 6: ŷ = 6.5(6) + 47.5 = 86.5 points
  • Correlation: r = 325 / √(50 × 2150) ≈ 0.991, a very strong positive linear relationship

Real-World Applications

  • Education: Predicting exam performance from hours of study or prior quiz scores
  • Business: Forecasting monthly revenue from advertising expenditure
  • Science: Modeling how temperature changes affect chemical reaction rates
  • Health: Estimating caloric burn from minutes of cardiovascular exercise
  • Engineering: Relating applied load to material deformation in stress tests

Sources & Methodology

The slope, intercept, and correlation formulas implemented here follow the standard OLS derivation described in OpenStax Introductory Statistics — The Regression Equation. Applied step-by-step calculation procedures are drawn from SERC at Carleton College — How Do I Calculate a Linear Regression?. Both sources confirm that the least-squares criterion is the standard for fitting a linear model to bivariate scatter plot data. These methods assume that errors are independent, normally distributed, and have constant variance across all x values — conditions that should be verified before drawing strong statistical inferences from the fitted model.

Reference

Frequently asked questions

What is a scatter plot linear regression calculator used for?
A scatter plot linear regression calculator finds the best-fit straight line through a set of (x, y) data points using the least-squares method. It computes the slope m, y-intercept b, and Pearson correlation coefficient r, and can predict y at any given x value. Typical applications include forecasting sales from advertising spend, predicting exam scores from study hours, and modeling scientific measurements across many fields.
How is the slope m calculated in linear regression?
The slope m uses the formula m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²). For 5 data points with ∑x = 15, ∑y = 335, ∑xy = 1070, and ∑x² = 55, the slope equals (5×1070 − 15×335) / (5×55 − 225) = 325/50 = 6.5. This means the predicted y value rises by 6.5 units for every one-unit increase in x.
What does the Pearson correlation coefficient r indicate about scatter plot data?
The Pearson correlation coefficient r measures the strength and direction of a linear relationship, ranging from −1 to +1. A value of r = +1 is a perfect positive correlation, r = −1 is a perfect negative correlation, and r = 0 means no linear pattern exists. Values with |r| above 0.7 are generally classified as strong correlations, 0.4 to 0.7 as moderate, and below 0.4 as weak. For the study-hours example above, r ≈ 0.991 indicates an almost perfect linear fit.
How many data points are needed for a reliable scatter plot regression?
Linear regression is mathematically possible with as few as 2 data points, but 2 points with different x and y values always produce a perfect r of +1 or −1, which carries no statistical meaning. For a meaningful trend, at least 5 to 8 data points are recommended. For formal inference, such as testing whether the slope is significantly different from zero, introductory statistics curricula typically recommend 10 to 15 or more observations, which also reduces the outsized influence of any single outlier.
How do you predict a y value using the regression line equation?
Substitute the desired x value into the regression equation ŷ = mx + b. For example, if the calculated regression line is ŷ = 6.5x + 47.5, then at x = 7 hours studied, the predicted score is ŷ = 6.5(7) + 47.5 = 93 points. Predictions are most reliable when x falls within the range of the original data set, a process called interpolation. Extending predictions far beyond the observed x range, called extrapolation, carries substantially higher uncertainty.
What is the difference between a positive and a negative slope on a scatter plot?
A positive slope (m > 0) means y increases as x increases, indicating a positive linear relationship — for example, more hours of exercise correlating with higher calorie burn. A negative slope (m < 0) means y decreases as x increases — for example, higher product prices correlating with lower unit sales. The absolute value of m determines steepness: a slope of 10 means y changes twice as fast per unit of x compared to a slope of 5.