I am currently writing this post with two days to go until my module 1 project, Linear Regression Analysis of a housing data set, is due. First, I wanted to start off with saying how much of a struggle, yet great experience this was. The project is the most fun I never want to have again. I have learned so much in just a week and a half. The project was a wake up call; in a good way. I honestly did not know how much of the information we were learning was just being regurgitated, or if I was fully grasping all of the concepts.
Fortunately when push came to shove and I started the project, I realized how much I truly learned. It was just my data, a Jupyter notebook, and I. The feeling of empowerment overwhelmed me as I started to code away cleaning the data. As with any relationship, the honeymoon phase only lasts so long. The end of the honeymoon phase came to an end right when I started to try and code the categorical variables.
Looking back on it, I am not sure why the categorical variables seemed so difficult to me. Sure enough with hard work and determination I was able to get through the categorical variable selection process. I used Google as my best friend and researched definitions and examples of categorical variables. The next part was decding whether to use one hot coding or binning techniques. I think for whatever reason this was what scared me the most. Luckily with the help of the labs and stack overflow, I was able to model my categorical data creation off of the researched examples. I decided to use the pandas method “get_dummies” for most of my variables. The one variable I will touch on here is the “yr_built”. I decided to bin this variable into decades using the binning techniques taught in the FlatIron curriculum.
I am still not done this project, but I already have learned so much about myself, the coding project process, and about python. The best part of the whole project has to be the feeling you get when your r-squared value is higher than the required .80. This is the true testament to the OSEMN process. After all the strenuous data manipulation and cleaning, knowing your model is accurately predicting the target variable is such a rewarding feeling. I am only at the beginning and I know Data Science has already blown my mind and will continue to do so along this journey.