A scatter plot uses Cartesian coordinates to display two variables for a given set of data.
Scatter plots consist of a horizontal axis, a vertical axis, and a collection of points. Each point on the scatter plot corresponds to one observation in the data set, positioned by that observation's values for the two variables, one on each axis. Thus, a scatter plot shows two dimensions at once.
Scatter plots are often used to identify relationships between two variables, such as annual income and years of education. The degree to which the two variables change together is called the correlation; the closer the points come to forming a straight line, the stronger the linear correlation.
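The idea that points lying closer to a straight line mean a stronger correlation can be made concrete with a correlation coefficient. The sketch below assumes the Pearson coefficient, which the text above does not name; the function name `pearson_r` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")

    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, not real survey data.
years_of_education = [10, 12, 14, 16, 18]
income_on_a_line = [2 * x + 1 for x in years_of_education]  # points exactly on a line
income_scattered = [30, 45, 38, 60, 52]                     # same trend, more scatter

print(pearson_r(years_of_education, income_on_a_line))
print(pearson_r(years_of_education, income_scattered))
```

A coefficient near 1 or -1 corresponds to points that fall almost exactly on a line, while a value near 0 corresponds to a cloud with no linear pattern. The coefficient only measures linear association, so a plot with a clear curved pattern can still score close to 0.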
When analyzing scatter plots, the viewer also looks for the slope and strength of the data pattern. Slope refers to the direction in which one variable changes as the other increases: upward for a positive relationship, downward for a negative one. Strength refers to how widely the points scatter: if they are tightly concentrated around a line, the relationship is strong.
Scatter plots can also show unusual features of the data set, such as clusters, patterns, or outliers, that would be hidden if the data were merely in a table.
Regression lines, or best-fit lines, are a type of annotation on scatter plots that shows the overall trend of a set of data.
Linear regression is a statistical method for modeling the relationship between two variables. The method pairs naturally with scatter plots because both deal with two variables at a time. The line produced by a linear regression analysis can be drawn on a scatter plot of the same data, where it shows the general trend. The goal of linear regression is to create a mathematical model that predicts the value of Y when the value of X is known.
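As a rough illustration of how such a line can be obtained and then used for prediction, here is a minimal sketch. It assumes ordinary least squares as the fitting criterion, which is the most common choice but is not specified in the text above; the names `fit_line` and `predict` and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns a (slope, intercept) tuple.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept


def predict(slope, intercept, x):
    """Predict Y for a known X using the fitted line."""
    return intercept + slope * x


# Made-up points that lie exactly on y = 1 + 2x, so the fit is easy to verify.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(predict(slope, intercept, 5))
```

Drawing the line on the scatter plot then amounts to evaluating `predict` at the smallest and largest X values and connecting the two resulting points.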
While regression lines are most often seen on scatter plots, they are also compatible with line charts and with bar and column charts whose bars are ordered by time.