
1. Simple Linear Regression Model

The relationship between study hours (X) and exam grades (Y) is modeled using the simple linear regression equation:

Y = β₀ + β₁X + ε

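This model assumes a linear relationship between the predictor variable (study hours) and the response variable (grade), with ε representing random error. For the t-test and intervals below, the errors are assumed to be independent and normally distributed with mean 0 and constant variance σ².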

2. Least-Squares Estimation

The regression parameters are estimated using the least-squares method, which minimizes the sum of squared errors (SSE):

SSE = Σ (Yᵢ − Ŷᵢ)²
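As a minimal sketch, the SSE of any candidate intercept and slope can be computed directly from this definition. The sketch uses Python with only the standard library. The function name `sse` and the five-point toy dataset are hypothetical and are not part of the project's R analysis.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors for the candidate line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, not the project's dataset.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sse(xs, ys, 2.2, 0.6))  # the least-squares line for this toy data
print(sse(xs, ys, 2.0, 0.7))  # another candidate line gives a larger SSE
```

Least squares selects the pair (b0, b1) that makes this quantity as small as possible.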

The slope and intercept estimators are:

β̂₁ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = 5.0128
β̂₀ = Ȳ − β̂₁X̄ = 39.7469
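The two closed-form estimators translate almost line for line into code. Below is a stdlib-only Python sketch. The function name `least_squares` and the toy data are hypothetical. The reported values 39.7469 and 5.0128 come from the project's dataset, which is not reproduced here, so the toy data will not reproduce them.

```python
from statistics import fmean

def least_squares(xs, ys):
    """Return (b0, b1) from the closed-form least-squares estimators."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Numerator and denominator of the slope estimator.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The intercept forces the line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical toy data
print(b0, b1)
```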

The fitted regression line is:

Ŷ = 39.7469 + 5.0128X

Interpretation: Each additional hour of study is associated with an increase of approximately 5 points (5.0128) in the expected exam grade.
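The fitted line can serve as a simple grade predictor. The sketch below hard-codes the coefficients reported above. The function name `predict_grade` is hypothetical and is not taken from the project's R code.

```python
# Coefficients of the fitted line reported above.
B0 = 39.7469
B1 = 5.0128

def predict_grade(hours: float) -> float:
    """Point estimate of the expected exam grade for a given number of study hours."""
    return B0 + B1 * hours

for h in (2, 5, 8):
    print(h, round(predict_grade(h), 2))
```

For example, 5 hours gives 39.7469 + 5 × 5.0128 ≈ 64.81 points. The range of study hours in the dataset is not stated in this section. Predictions are only meaningful inside that range, because the linear fit has not been checked outside it.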


3. Inference and Hypothesis Testing

The null hypothesis for the slope is:

H₀: β₁ = 0

The test statistic is:

t = β̂₁ / SE(β̂₁) = 103.72

With 4,998 degrees of freedom (n − 2, so n = 5,000 observations), the p-value is far below 0.001, so the null hypothesis is rejected. Study hours have a statistically significant linear association with exam grades.
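The sketch below shows where SE(β̂₁) comes from. It uses the residual standard error s = √(SSE / (n − 2)) and SE(β̂₁) = s / √Sxx, where Sxx = Σ(Xᵢ − X̄)². The function name and toy data are hypothetical. The two-sided p-value is 2·P(T ≥ |t|) for T following a t distribution on n − 2 degrees of freedom. Computing it requires a t-distribution CDF (for example `scipy.stats.t.sf`), which this stdlib-only sketch deliberately omits.

```python
import math
from statistics import fmean

def slope_t_statistic(xs, ys):
    """Return (b1, se_b1, t, df) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                   # two parameters (b0, b1) were estimated
    s = math.sqrt(sse / df)      # residual standard error
    se_b1 = s / math.sqrt(sxx)   # standard error of the slope
    return b1, se_b1, b1 / se_b1, df

b1, se_b1, t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # toy data
print(b1, se_b1, t, df)
```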

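95% Confidence Interval for the slope, computed as β̂₁ ± t(0.025, 4998) · SE(β̂₁):

(4.918, 5.108)

Because this interval does not contain 0, it agrees with the rejection of H₀.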
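As a consistency check, the interval can be rebuilt from the reported slope and t statistic alone, since SE(β̂₁) = β̂₁ / t. This is a sketch under one stated assumption: it uses the standard normal critical value (about 1.96) in place of the exact t critical value on 4,998 df. The exact value is only slightly larger, and the difference does not change the interval at three decimals.

```python
from statistics import NormalDist

# Values reported in the text above.
b1_hat = 5.0128
t_stat = 103.72

# Recover the standard error from t = b1_hat / SE(b1_hat).
se_b1 = b1_hat / t_stat

# Large-sample approximation (assumption): normal critical value for 95%.
crit = NormalDist().inv_cdf(0.975)

lower = b1_hat - crit * se_b1
upper = b1_hat + crit * se_b1
print(round(se_b1, 4), round(lower, 3), round(upper, 3))  # prints 0.0483 4.918 5.108
```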

4. Mean Response and Prediction

The estimated mean grade for a fixed study time x₀ is:

Ŷ(x₀) = β̂₀ + β̂₁x₀

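A confidence interval estimates the mean grade of all students who study x₀ hours. A prediction interval estimates the grade of a single new student who studies x₀ hours.

Prediction intervals are always wider. They include the uncertainty in the estimated mean and also the variance σ² of an individual grade around that mean.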
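The two intervals share a center and differ only by an extra "1 +" under the square root. The mean-response half-width is t·s·√(1/n + (x₀ − X̄)²/Sxx). The prediction half-width is t·s·√(1 + 1/n + (x₀ − X̄)²/Sxx). Below is a stdlib-only sketch; the function name and toy data are hypothetical. The t critical value is passed in by the caller because the standard library has no t distribution.

```python
import math
from statistics import fmean

def mean_and_prediction_intervals(xs, ys, x0, t_crit):
    """Return (y_hat, CI for the mean response, PI for a new observation) at x0.

    t_crit is the t critical value on n - 2 df, supplied by the caller.
    """
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = t_crit * s * math.sqrt(leverage)      # uncertainty in the mean only
    half_pi = t_crit * s * math.sqrt(1 + leverage)  # adds one observation's variance
    return y_hat, (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical toy data; 3.182 is the 97.5th percentile of t with 3 df.
y_hat, ci, pi = mean_and_prediction_intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(y_hat, ci, pi)
```

Both intervals are narrowest at x₀ = X̄ and widen as x₀ moves away from it, because the (x₀ − X̄)² term grows.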

5. Correlation Analysis

The Pearson correlation coefficient is:

r = 0.8263

This indicates a strong positive linear association between study hours and exam grades.
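For reference, r is computed as r = Sxy / √(Sxx·Syy), using the same sums of squares as the slope estimator. A stdlib-only sketch with hypothetical toy data:

```python
import math
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical toy data
print(round(r, 4))  # prints 0.7746
```

Because β̂₁ = Sxy / Sxx shares the numerator Sxy, the slope and r always have the same sign. On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient.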

The coefficient of determination is:

R² = 0.6828

Approximately 68.3% of the variability in exam grades is explained by the linear relationship with study hours. In simple linear regression R² = r², and indeed 0.8263² ≈ 0.6828.
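R² can also be computed as 1 − SSE/SST, where SST = Σ(Yᵢ − Ȳ)² is the total variability in grades. The stdlib-only sketch below computes R² both ways on hypothetical toy data to show that the two agree for a single predictor. The function name is an assumption for illustration.

```python
from statistics import fmean

def r_squared(xs, ys):
    """Return (1 - SSE/SST, r**2) for the least-squares line; they agree in simple regression."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variability in Y
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variability
    return 1 - sse / sst, sxy ** 2 / (sxx * sst)

from_sse, from_r = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical toy data
print(from_sse, from_r)
```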

Figures (Generated in R)

All figures were generated in R using RStudio and correspond to the regression analysis above.

6. Conclusion

This project demonstrates how simple linear regression and correlation can be used for modeling, inference, and prediction. The results confirm a strong and statistically significant relationship between study time and academic performance.