Laboratory work 2 (video, part 2). Paired Regression Analysis

Section: Fundamentals of Mathematical Statistics. Topic: Paired Regression Analysis.

Let me offer you a short test. Question 1. Depending on the number of interrelated features, regressions can be… Possible answers. Choose the correct answer.

Question 2. Depending on the type of equation selected, regressions can be… Possible answers. Choose the correct answer.

Question 3. The equation y=1+0.5x defines the following type of regression… Possible answers. Choose the correct answer.

Let’s check it out. Question 1. The correct answer is 2. Question 2. The correct answer is 1. Question 3. The correct answer is 3.

Let’s recall the form of the paired linear regression equation. You can find the regression equation in textbooks in this form or, equivalently, in this form. Here e is the random error, a and b are called the regression coefficients, X is the independent variable, and Y is the dependent variable. The coefficients a and b can be found by these formulas. Let me remind you that the coefficient b shows the average change in the variable Y when X changes by 1 unit. The coefficient a is the intercept, that is, the value of Y predicted at X = 0; its sign shows how the relative change in Y compares with that of X. For positive b and positive X, if the coefficient a is positive, Y changes relatively more slowly than X. If the coefficient a is negative, Y changes relatively faster.
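
The slide formulas are not reproduced in this transcript. The standard textbook forms, assumed here to be the ones meant, are given below; the overbar denotes a sample average, which matches the averages calculated in the example that follows.

```latex
% Paired linear regression: model form and fitted (predicted) form
y = a + b x + e, \qquad \hat{y} = a + b x

% Least-squares coefficients expressed through sample averages
b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^{2}} - \bar{x}^{2}},
\qquad
a = \bar{y} - b\,\bar{x}
```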

Let’s consider the following example. The relationship between the weight of mothers X, measured at the beginning of pregnancy (kg), and the weight of newborn baby chimpanzees Y (kg) was studied. The task is to find the linear regression equation. There is a data table. Now we will calculate the products XY and the squares X^2 and Y^2, respectively. Then we find the average values. Note that the average value for X is 11.87. The average value for the variable Y is 0.703. Here is the average of the products XY. Here is the average for X^2. Here is the average for Y^2. Of course, we will take the averages rounded to hundredths or thousandths.
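
The same calculation can be sketched in Python. This is a minimal illustration, assuming the usual averages-based least-squares formulas. The function name `paired_regression` is hypothetical, and the sample points are synthetic (they lie exactly on the line y = 1 + 0.5x from Question 3), not the lecture data, which is not reproduced in the transcript.

```python
from statistics import fmean


def paired_regression(x, y):
    """Return (a, b) for y = a + b*x using the averages-based formulas."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = fmean(x)
    mean_y = fmean(y)
    mean_xy = fmean([xi * yi for xi, yi in zip(x, y)])  # average of products XY
    mean_x2 = fmean([xi * xi for xi in x])              # average of X^2
    var_x = mean_x2 - mean_x ** 2
    if var_x == 0:
        raise ValueError("x must not be constant")
    b = (mean_xy - mean_x * mean_y) / var_x  # slope
    a = mean_y - b * mean_x                  # intercept
    return a, b


# Synthetic check (NOT the lecture data): points exactly on y = 1 + 0.5x
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1 + 0.5 * v for v in xs]
a, b = paired_regression(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real measurements the two columns of the data table would be passed in place of `xs` and `ys`.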

And now we substitute the averages into the formulas for the linear regression coefficients and find their values. The coefficient b is approximately 0.0235 and a ≈ 0.424, which agrees with a = 0.703 − 0.0235 · 11.87. Now let’s move the data to an Excel worksheet, plot the correlation field, and build a summary table. Let’s start with the correlation field. We go to the Insert tab, find the Scatter chart, and select it. Now we select any point, click the right mouse button, choose Add Trendline, and select the Linear trendline. Remember to check the Display Equation on chart and Display R-squared value on chart boxes. The equation is on the screen.

If we look closely, the equation on the chart matches the one we just found using the formulas. So, the regression equation that relates the mass of cubs to the mass of their mothers looks like this. Please note again: to plot the trendline, we inserted a Scatter chart and added a linear trendline. The coefficient of determination R-squared is approximately 0.319, which means that 31.9% of the variation in the weight of newborns is explained by the variation in the weight of mothers.
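
For paired linear regression, R-squared equals the square of the sample correlation coefficient between X and Y. The sketch below illustrates this under that assumption. The function name `r_squared` is hypothetical, and the four points are made up for the check, not taken from the lecture data.

```python
from math import sqrt
from statistics import fmean


def r_squared(x, y):
    """Coefficient of determination for a paired linear regression of y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov_xy = fmean([xi * yi for xi, yi in zip(x, y)]) - mean_x * mean_y
    var_x = fmean([xi * xi for xi in x]) - mean_x ** 2
    var_y = fmean([yi * yi for yi in y]) - mean_y ** 2
    if var_x == 0 or var_y == 0:
        raise ValueError("x and y must not be constant")
    return cov_xy ** 2 / (var_x * var_y)  # squared correlation coefficient


# Made-up illustrative points (NOT the lecture data)
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"R^2 = {r2:.2f}")

# The lecture's R^2 of about 0.319 corresponds to |r| of roughly 0.565
r_lecture = sqrt(0.319)
```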

Now we continue to work in Excel. I suggest building a summary table. To do this, select Data, Data Analysis, and Regression. The Input Y Range is the mass of cubs, so we select column B. The Input X Range is the mass of mothers in this case, so we select column A. The columns have headers, so don’t forget to check the Labels box. Then we set the Confidence Level and the Output Range. We get the table. We have already found R-squared. The coefficients a and b are shown in the third table of the output. Note that the P-values for the coefficients and the significance of the regression equation as a whole (Significance F) are also presented here.

And now we will evaluate them. This is the coefficient a in our regression equation. This is the coefficient b. This is the coefficient of determination. Now let’s look at Significance F. Significance F is the p-value of the F-test, that is, the probability of obtaining such a value of F if there were in fact no linear relationship. I remind you that the F-test allows us to evaluate the significance of the regression equation as a whole. The chosen confidence level is 95%, so the allowed error is 5%, or 0.05. Therefore, we compare Significance F with 0.05. Significance F is less than 0.05, which indicates that the regression equation found is statistically significant.
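
For a paired regression the F statistic can be obtained directly from R-squared and the number of observations n as F = R^2 / (1 − R^2) · (n − 2). The sketch below assumes this standard formula. The function name `f_statistic` is hypothetical, and n = 20 is a placeholder because the transcript does not state the sample size.

```python
def f_statistic(r2, n):
    """F statistic for a paired linear regression (1 regressor, n observations)."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    if n <= 2:
        raise ValueError("need more than 2 observations")
    return r2 / (1 - r2) * (n - 2)  # degrees of freedom: 1 and n - 2


# n = 20 is a hypothetical sample size, NOT taken from the lecture
f_value = f_statistic(0.319, 20)
print(f"F = {f_value:.2f}")
```

The resulting value would then be compared with the critical value for 1 and n − 2 degrees of freedom at the 0.05 level, for example from a table or from Excel's F.INV.RT function.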

Well. Let’s look at the P-values for the coefficients. We will also compare them with 0.05. They are below 0.05, so both coefficients pass the t-test, which means that the regression coefficients are statistically significant at the 5% level. The next two columns, Lower 95% and Upper 95%, are the bounds of the confidence intervals for the coefficients a and b. Thus, we can say with 95% confidence that the model we have built is adequate and suitable for further use in forecasting and data analysis.
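
For reference, the t-statistics and confidence bounds in the summary table follow the standard forms below. This is a sketch under the usual assumptions of paired linear regression; SE denotes the standard error of a coefficient and n the number of observations, neither of which is quoted in the transcript.

```latex
% t-statistics for the intercept and the slope
t_a = \frac{a}{SE(a)}, \qquad t_b = \frac{b}{SE(b)}

% 95% confidence intervals (n - 2 degrees of freedom)
a \pm t_{0.975,\,n-2}\, SE(a), \qquad b \pm t_{0.975,\,n-2}\, SE(b)
```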

I have some tasks for you. Please note that here you are also expected to calculate a forecast. A forecast can be calculated once the regression model has been found. You can use the FORECAST statistical function in Excel. Alternatively, you can substitute the value of X into the equation you found and calculate the forecast manually. I wish you success. Thank you for your attention.
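
The manual forecast amounts to substituting a new value of X into the fitted equation. A minimal sketch follows. The function name `forecast` is hypothetical, and the coefficients are those of the equation y = 1 + 0.5x from Question 3, used only as an illustration rather than as the result of the chimpanzee example.

```python
def forecast(a, b, x_new):
    """Predict Y for a new X from the fitted line y = a + b*x."""
    return a + b * x_new


# Illustration with the equation from Question 3: y = 1 + 0.5x
a, b = 1.0, 0.5
y_hat = forecast(a, b, 4.0)
print(f"forecast at x = 4: {y_hat}")
```

In Excel the same result would come from FORECAST(x, known_y's, known_x's), which fits the line and evaluates it at x in one step.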

Last modified: Thursday, 5 December 2024, 10:24