October 8, 2018

Are there certain features that play a larger role in the final prediction? The chi-squared test is one approach to understanding the importance of each feature: it gives you an idea of how much impact each feature will have on the final prediction.
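As a quick sketch of that idea (the data below is made up), scikit-learn's `chi2` scorer ranks features by their chi-squared statistic against the target; it assumes nonnegative feature values such as counts:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical data: 6 samples, 3 count-valued features, binary target
X = np.array([[1, 9, 4],
              [2, 8, 5],
              [1, 7, 4],
              [8, 1, 5],
              [9, 2, 4],
              [7, 1, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# One chi-squared score (and p-value) per feature; higher scores
# suggest the feature is more strongly related to the target
scores, p_values = chi2(X, y)
```

Here the first two features separate the classes cleanly, so their scores come out much higher than the third's.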

```
Chi-squared is used in a "Goodness of fit" test that tells you if your sample data fits within the distribution of a given population. Does your data represent what you would expect to find from this given population?
Chi-squared is also used to test for independence -- comparing two variables in a table to see if they are related to each other. Do the distributions of these variables differ from each other? A small chi-squared test statistic means the observed counts fit the counts expected under independence very well, so there is little evidence of a relationship. A large chi-squared statistic means there was a poor fit to the independence assumption, and therefore evidence that the two variables are related
```
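The independence test described above can be sketched with pandas and SciPy (the column names and counts here are hypothetical):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical categorical feature and binary label
df = pd.DataFrame({
    'feature': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'label':   [ 1,   1,   1,   1,   0,   0,   0,   0 ],
})

# Contingency table of observed counts
table = pd.crosstab(df['feature'], df['label'])

# A large statistic / small p-value is evidence the two are related
chi2_stat, p_value, dof, expected = chi2_contingency(table)
```

Since the feature perfectly separates the labels here, the test rejects independence at the usual 0.05 level.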

```
import pandas as pd
import numpy as np

def chi_squared_test(dataframe, observed, expected):
    '''
    Function receives a dataframe that contains a
    column that represents your observed values and
    a column that represents your expected values. It
    then calculates and returns the chi-squared value.

    args:
        dataframe <pandas.DataFrame> : Pandas DataFrame
        observed <string> : column name of your observed values
        expected <string> : column name of your expected values

    returns:
        chi_squared <float> : chi-squared value

    Notes:
        If the chi-squared value is about zero, the observed
        values fit the expected values almost perfectly. Take
        this chi-squared value and compare it to a critical
        value from a chi-squared table.
    '''
    # sum of (observed - expected)^2 / expected over all rows
    diff = dataframe[observed] - dataframe[expected]
    diff = np.power(diff, 2)
    diff = np.divide(diff, dataframe[expected])
    chi_squared = diff.sum()
    return chi_squared
```
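A quick check of this computation on made-up die-roll counts, verified against `scipy.stats.chisquare`, which returns the same statistic plus a p-value:

```python
import pandas as pd
import numpy as np
from scipy.stats import chisquare

# Hypothetical goodness-of-fit check: 60 rolls of a die that we
# expect to be fair (10 of each face)
df = pd.DataFrame({
    'observed': [8, 9, 10, 11, 12, 10],
    'expected': [10, 10, 10, 10, 10, 10],
})

# Same sum of (observed - expected)^2 / expected as above
chi_squared = np.sum((df['observed'] - df['expected']) ** 2 / df['expected'])

# SciPy returns the identical statistic plus a p-value; a large
# p-value means the observed counts fit the expected distribution
stat, p_value = chisquare(df['observed'], df['expected'])
```

Here the statistic is small relative to the critical value (11.07 at 5 degrees of freedom, alpha = 0.05), so the observed rolls are consistent with a fair die.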

`Chi-squared helps show the relationship between two categorical variables.`