Tutorial 2: Association


Recap:
rate(A|B)
How to see if two variables are associated with each other in a graph?
If at time t, the two variables are (m,t) and (p,t), consider (m,p) and plot the graph against time.
The gradient shows whether the relationship is positive or negative.

1. The relationship between two variables
- Deterministic
e.g. X = Vt + X(0)
Time determines X exactly.

- Statistical
e.g. when the market went up, the portfolio went up. But that doesn't mean they are linked by a formula.

- Categorical Variables
Odds ratio and risk ratio

- Numerical Variables
Scatter diagrams
Linear correlation coefficient, r.
r = 0 means no linear correlation.
The closer r is to 1 or -1, the stronger the correlation.
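
The summary measures named in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the 2x2 table layout (a, b, c, d) and all the numbers are made up for the example.

```python
import math

def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table (layout is an assumption for this sketch):
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table."""
    return (a / b) / (c / d)

def pearson_r(xs, ys):
    """Linear correlation coefficient r for two numerical variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
rr = risk_ratio(30, 70, 10, 90)    # (30/100) / (10/100), about 3
odds = odds_ratio(30, 70, 10, 90)  # (30/70) / (10/90), about 3.86

# Made-up scores for five people.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 0.77
print(rr, odds, r)
```

A risk ratio or odds ratio of 1 would indicate no association between the two categorical variables, in the same way that r = 0 indicates no linear association between two numerical ones.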

4. Association between numerical variables, linear correlation coefficient, r
Consider a single point: the closest possible line passes through it.
Add a second point: the closest possible line passes through both of them.
In general, the more points there are, the more the closest possible line runs in between all of them. This is the line of best fit.

Application: Machine learning. We cannot predict the future, but by averaging over many observations and finding the line that best fits the points, we can make a prediction for a given value, provided it lies within the range of the data used to fit the line.
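
The best-fit idea and the within-range prediction can be sketched as follows. This assumes an ordinary least-squares fit, which is one common way to define "best fit"; the helper names fit_line and predict and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept, x_min, x_max):
    """Predict y for x, refusing values outside the range of the fitted data."""
    if not (x_min <= x <= x_max):
        raise ValueError("x is outside the range of the fitted data")
    return slope * x + intercept

# Two points: the best-fit line passes through both of them.
slope2, intercept2 = fit_line([1, 3], [2, 6])  # slope 2, intercept 0

# Five made-up points: the line runs in between them.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)            # about 0.6 and 2.2

y_hat = predict(3.5, slope, intercept, min(xs), max(xs))  # about 4.3
print(slope2, intercept2, slope, intercept, y_hat)
```

Calling predict with x = 10 raises an error here, since 10 lies outside the range 1 to 5 that the line was fitted on.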

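=> The closer the points are to the regression line, the stronger the correlation, and the more accurate the linear model's predictions. The prediction is exact only when every point lies on the line (r = 1 or -1).
The sign of r matches the sign of the slope of the regression line, but the size of r does not depend on how steep the line is.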
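
The link between r and the slope can be made precise: for a least-squares line, slope = r * (sd of y) / (sd of x). So r and the slope always share a sign, while rescaling y changes the slope and leaves r unchanged. A small check with made-up numbers (r_and_slope is a hypothetical helper):

```python
import math

def r_and_slope(xs, ys):
    """Return (r, least-squares slope) for the same set of points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r1, slope1 = r_and_slope(xs, ys)                    # about 0.77 and 0.6
r2, slope2 = r_and_slope(xs, [10 * y for y in ys])  # same r, slope 10x steeper
r3, slope3 = r_and_slope(xs, [-y for y in ys])      # both turn negative
print(r1, slope1, r2, slope2, r3, slope3)
```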

5. Linear Correlation Coefficient

- Ecological Correlation
Correlation based on aggregated data such as group average or rate.
The association tends to be overstated when it is based on aggregated data.

- Ecological Fallacy
Drawing inferences about the correlation for individuals based on aggregated data.

e.g. plotting average height (y) against GDP per capita (x) for countries and claiming that taller people earn more.

- Atomistic Fallacy
Generalising about groups (the aggregate level) based on individual-level data.

- Attenuation effect
If the data form an oval-shaped cloud and you restrict the range of one variable, r tends to move closer to 0.
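
The attenuation effect can be seen with a small made-up data set: y follows x with a fixed wobble of +3 or -3, and r is computed once on the full range and once on a restricted range. The data and the cut-off of -3 to 3 are assumptions chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: x runs from -10 to 10, y is x plus or minus 3.
xs = list(range(-10, 11))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

r_full = pearson_r(xs, ys)  # roughly 0.90

# Keep only the points whose x lies in a narrower range.
narrow = [(x, y) for x, y in zip(xs, ys) if -3 <= x <= 3]
r_narrow = pearson_r([x for x, _ in narrow], [y for _, y in narrow])  # roughly 0.56

print(r_full, r_narrow)
```

The wobble is the same size in both cases, but over the narrow range it is large relative to the spread in x, so the cloud looks rounder and r drops.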

6. Ecological fallacy, atomistic fallacy
Imagine we have two graphs
1) Math against chem scores where each point represents a school (Ecological Correlation)
2) Math against chem scores where each point is a student (Correlation)
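
The two graphs can be compared numerically. The sketch below uses three hypothetical schools with four students each (all scores are made up) and computes r once per student and once per school average.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up (math, chem) scores per student, grouped by school.
schools = {
    "A": [(60, 40), (40, 60), (60, 60), (40, 40)],
    "B": [(70, 50), (50, 70), (70, 70), (50, 50)],
    "C": [(80, 60), (60, 80), (80, 80), (60, 60)],
}

# Graph 2: each point is a student.
students = [pair for pupils in schools.values() for pair in pupils]
r_student = pearson_r([m for m, _ in students], [c for _, c in students])

# Graph 1: each point is a school average.
avg_math = [sum(m for m, _ in pupils) / len(pupils) for pupils in schools.values()]
avg_chem = [sum(c for _, c in pupils) / len(pupils) for pupils in schools.values()]
r_school = pearson_r(avg_math, avg_chem)

print(r_student, r_school)  # about 0.4 for students, about 1.0 for schools
```

Averaging removes the spread within each school, so the school-level correlation comes out much stronger than the student-level one. Reading the school-level r as a statement about individual students would be the ecological fallacy.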

- Ecological:
Observe: Student body
Conclusion: one student

- Atomistic:
Observe: One student
Conclusion: Student body
# Generalising from a sample to the population it was drawn from is not the atomistic fallacy.
