Simple Bivariate Regression

Multiple regression is a statistical procedure that allows you to determine whether a group of variables has an impact on another variable. For example, I have always been fascinated with jazz improvisation. Specifically, how does a musician learn to improvise? Over the past several years, I have isolated several variables that have been shown to be statistically significant predictors of jazz improvisation achievement, and I would like to use a pre-existing data set to provide a basic introduction to various regression analyses.

My first example involves a procedure known as simple bivariate regression, which uses only one independent variable. I want to know if self-assessment (independent variable) is a statistically significant predictor of jazz improvisation achievement (dependent variable). Participants (N = 102) were student musicians enrolled in their high school jazz ensemble. To measure the independent variable of self-assessment, participants recorded the melody and two improvised choruses for B-flat Blues and one improvised chorus for Satin Doll. Immediately following the recording process, participants listened to the recordings and assessed their performances using the Jazz Improvisation Self-Assessment (JISA) measure. To measure the dependent variable of jazz improvisation achievement, a panel of three judges evaluated the recordings for each participant using the Jazz Improvisation Performance Assessment (JIPA) measure.

Frequencies and Descriptive Statistics

Prior to conducting the regression analysis, I examined the frequencies and descriptive statistics for both variables. There were no missing data, and the range of values seemed to be in order.

Correlation Analysis

[SPSS output: Correlations table]

  • The correlation between self-assessment and jazz improvisation achievement (r = .487) is considered moderate, as it lies in the middle of the positive continuum.
  • In addition, the correlation is statistically significant (p < .001).
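
The significance test behind that p value can be sketched from the two reported numbers alone (r = .487, N = 102). This is a minimal sketch, not the SPSS procedure itself: it uses the standard t test for a correlation, and the critical value is an assumption read from a standard t table rather than something shown in the output.

```python
import math

r = 0.487   # correlation reported in the SPSS output
n = 102     # sample size reported above

# t statistic for testing H0: rho = 0, with n - 2 degrees of freedom
df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Approximate two-tailed critical value for alpha = .001 and df = 100,
# taken from a standard t table (an assumption, not part of the SPSS output)
T_CRIT_001 = 3.39

print(f"t({df}) = {t:.2f}")
print("significant at p < .001:", t > T_CRIT_001)
```

Because r is rounded to three decimals, the t computed here is only approximately the value SPSS would report.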

The Regression Analysis

Using SPSS, I regressed the dependent variable (jazz improvisation achievement) on the independent variable (self-assessment).
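
For readers without SPSS, the least-squares fit behind a simple bivariate regression can be sketched in a few lines of plain Python. The function name `fit_simple_regression` and the five pairs of scores are hypothetical and are not the study data; they only illustrate how the slope and intercept are obtained.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical scores for five students (NOT the study data)
self_assessment = [10, 12, 15, 18, 20]
achievement = [55, 60, 58, 70, 72]

intercept, slope = fit_simple_regression(self_assessment, achievement)
print(f"predicted achievement = {intercept:.2f} + {slope:.2f} * self_assessment")
```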

PART 1: Model Summary

[SPSS output: Model Summary table]

  • What is R? In the case of a simple bivariate regression, R is the zero-order correlation (i.e., the Pearson correlation coefficient between the two variables). R (.487) is the same as the correlation coefficient (see above).
  • What is R²? R² (.238) is the proportion of variance in the dependent variable that is explained by the independent variable. Self-assessment accounts for 23.8% of the variance in jazz improvisation achievement.
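
The link between the two bullets above can be checked with a quick calculation. This is a sketch using the rounded R from the output; squaring .487 gives roughly .237 rather than .238, and the small gap is presumably because SPSS squares the unrounded R (an assumption, since the unrounded value is not shown).

```python
r = 0.487            # R from the Model Summary, rounded to three decimals
r_squared = r ** 2   # in a one-predictor model, R squared is simply r squared

print(f"R squared from rounded R: {r_squared:.3f}")
print(f"Variance explained: {r_squared * 100:.1f}%")
```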

PART 2: ANOVA

[SPSS output: ANOVA table]

  • An ANOVA (i.e., F test) is used to test the overall significance of the regression.
  • Sum of squares regression (SSregression) measures the variation in the dependent variable that is explained by the independent variable.
  • Sum of squares residual (SSresidual) measures the variation in the dependent variable that is left unexplained by the regression.
  • Degrees of freedom for the regression are equal to the number of independent variables (here, 1).
  • Degrees of freedom for the residual (i.e., error) are equal to the sample size minus the number of independent variables in the equation minus 1 (here, 102 - 1 - 1 = 100).
  • According to the output, the F value (31.167) is statistically significant (p < .001).
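
The degrees of freedom and the F value can be approximated from numbers already reported. The sketch below uses the standard relationship between F and R²; because R² is rounded to three decimals, the result lands near, but not exactly on, the 31.167 in the SPSS output.

```python
n = 102             # sample size
k = 1               # number of independent variables
r_squared = 0.238   # R squared from the Model Summary (rounded)

df_regression = k
df_residual = n - k - 1

# F is the ratio of explained to unexplained variance,
# each divided by its degrees of freedom
f_value = (r_squared / df_regression) / ((1 - r_squared) / df_residual)

print("df regression:", df_regression, "df residual:", df_residual)
print("approximate F:", round(f_value, 2))
```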

PART 3: Regression Equation

[SPSS output: Coefficients table]

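  • b (1.187) is the unstandardized regression coefficient (SPSS uses an uppercase B).
  • β (.487) is the standardized regression coefficient (SPSS labels it Beta). With only one independent variable, β is equal to the Pearson correlation coefficient.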
    • The regression coefficient describes the change in the dependent variable for each unit change in the independent variable. So, which regression coefficient (standardized or unstandardized) do we interpret for our current example? It all depends on what is being measured.
    • If the variables were measured using a metric that is easy to interpret, then it would be appropriate to use the unstandardized regression coefficient. For example, let’s say the independent variable was hours spent practicing per week. Since hours spent practicing per week is a meaningful metric that can easily be interpreted by the general public, you could interpret the unstandardized regression coefficient as follows: For each extra hour spent practicing per week, a student’s jazz improvisation achievement score will increase by X units.
    • For the current analysis, both the independent variable and the dependent variable were measured using points derived from a rating scale. These scales do not utilize a meaningful metric that is easy to interpret by the general public. As such, it would be appropriate to interpret the standardized regression coefficient. Why? Because the standardized regression coefficient measures change using standard deviation (SD) units. As a result, the current example can be interpreted as follows: For each standard deviation (SD) increase in self-assessment, jazz improvisation achievement will increase by .487 of a standard deviation (about one half of a SD).
  • The constant (47.465), otherwise known as the intercept, is the predicted score on the dependent variable for someone who scored a zero on the independent variable.
  • According to the SPSS output, the t-statistic (5.583) is significant (p < .001). This tells us that self-assessment is a statistically significant predictor of jazz improvisation achievement.
  • Confidence Interval: Like any statistic computed from a sample, b is an estimate. We really want to know the value of the regression coefficient in the population, so we refer to the confidence interval. According to the SPSS output, the 95% confidence interval ranges from .765 to 1.609. Interpretation: if we were to repeat this study many times and compute a 95% confidence interval each time, about 95% of those intervals would contain the true population regression coefficient. Since the range does not include zero, we know that b is statistically significant at the .05 level.
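
The regression equation and the confidence interval can be reproduced from the reported coefficients. This is a minimal sketch under stated assumptions: the self-assessment score of 20 is a hypothetical input, the helper `predict` is illustrative, and the critical t value is read from a standard t table rather than from the SPSS output.

```python
intercept = 47.465   # constant from the Coefficients table
b = 1.187            # unstandardized regression coefficient
t_stat = 5.583       # t statistic reported for b

def predict(self_assessment_score):
    """Predicted achievement score from the fitted regression line."""
    return intercept + b * self_assessment_score

# Standard error recovered from b and t, since t = b / SE
se_b = b / t_stat

# Approximate two-tailed critical t for alpha = .05 and df = 100,
# taken from a standard t table (an assumption, not in the SPSS output)
T_CRIT_05 = 1.984

ci_lower = b - T_CRIT_05 * se_b
ci_upper = b + T_CRIT_05 * se_b

# 20 is a hypothetical self-assessment score used only for illustration
print(f"predicted score at self-assessment 20: {predict(20):.3f}")
print(f"95% CI for b: {ci_lower:.3f} to {ci_upper:.3f}")
```

The interval computed this way agrees with the reported .765 to 1.609 to three decimals. In a one-predictor model the F value is also the square of this t statistic, which is why 5.583 squared comes out close to 31.167.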
