Learn Data Science in 3 Months
6/24
Week 1 - Learn Python - EdX https://www.edx.org/course/introducti...
- Siraj Raval https://www.youtube.com/watch?v=T5pRl...
Week 2 - Statistics & Probability - Khan Academy https://www.khanacademy.org/math/stat...
Week 3 - Data Pre-processing, Data Visualization, Exploratory Data Analysis - EdX https://www.edx.org/course/introducti...
Week 4 - Kaggle Project #1
Weeks 5-6 - Algorithms & Machine Learning - Columbia https://courses.edx.org/courses/cours...
Week 7 - Deep Learning - Parts 1 and 2 of the Deep Learning Book https://www.deeplearningbook.org/
- Siraj Raval https://www.youtube.com/watch?v=vOppz...
Week 8 - Kaggle Project #2
Week 9 - Databases (SQL + NoSQL) - Udacity https://www.udacity.com/course/intro-...
- EdX https://www.edx.org/course/introducti...
Week 10 - Hadoop & Map Reduce + Spark - Udacity https://www.udacity.com/course/intro-...
- Spark Workshop https://stanford.edu/~rezab/sparkclas...
Week 11 - Data Storytelling - EdX https://www.edx.org/course/analytics-...
Week 12 - Kaggle Project #3
Learn Machine Learning in 3 Months
Month 1
Week 1 - Linear Algebra https://ocw.mit.edu/courses/mathemati...
Week 2 - Calculus https://www.youtube.com/playlist?list...
Week 3 - https://www.edx.org/course/introducti...
Week 4 - Algorithms https://www.coursera.org/courses?lang...
Month 2
Week 1 - Learn Python for Data Science https://www.youtube.com/watch?v=T5pRl...
- Math of Intelligence https://www.youtube.com/watch?v=xRJCO...
- Intro to TensorFlow https://www.youtube.com/watch?v=2FmcH...
Week 2 - Intro to ML (Udacity) https://eu.udacity.com/course/intro-t...
Weeks 3-4 - ML Project Ideas https://github.com/NirantK/awesome-pr...
Month 3 (Deep Learning)
Week 1 - Intro to Deep Learning https://www.youtube.com/watch?v=vOppz...
Week 2 - Deep Learning by Fast.AI http://course.fast.ai/
Weeks 3-4 - Re-implement the deep learning projects from my GitHub https://github.com/llSourcell?tab=rep..
Linear regression
Logistic regression
Random forest
Gradient boosting
PCA
k-means clustering
k-nearest neighbors
Natural language processing (2 sessions)
Exploratory data analysis
Python web APIs
Feature engineering (2 sessions)
Object-oriented programming
Forecasting
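As an illustration of the k-means clustering topic, here is a minimal NumPy sketch of Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The function name `kmeans`, the random initialisation, and the toy data are illustrative assumptions, not taken from any particular course.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy blobs of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```

In practice `sklearn.cluster.KMeans` adds smarter initialisation (k-means++) and multiple restarts, which this sketch omits.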
Linear regression
Logistic regression
SVM
Random forest
Gradient boosting
PCA
k-means
Collaborative filtering
kNN
ARIMA
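PCA from the list above can be sketched in a few lines of NumPy: centre the data, take its singular value decomposition, and project onto the top right-singular vectors. This is a minimal sketch under those assumptions (the helper name `pca` and the synthetic data are hypothetical); `sklearn.decomposition.PCA` does the same job in practice.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto their first n_components principal axes."""
    X_centered = X - X.mean(axis=0)  # PCA works on zero-mean features
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]


rng = np.random.default_rng(0)
# 200 points that vary almost entirely along the direction (1, 1).
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])
Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```

Because the two synthetic features are nearly identical, a single component keeps almost all of the variance.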
Business use case -> Domain expertise
Data gathering from various data sources (check whether the dataset is balanced or imbalanced)
Data preparation: check that the data is in the right format, then clean, wrangle, and explore it (EDA), and decide how to handle missing values so the data can be fed into an ML algorithm. (Feature engineering: also apply basic statistics such as the mean, median, and mode.)
Feature selection (backward elimination in regression, based on p-values)
Modeling (select an ML or DL algorithm, then evaluate it with 1. accuracy, 2. a confusion matrix, 3. cross-validation)
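The workflow steps above can be sketched end to end with pandas and scikit-learn. This is a minimal illustration, not a prescribed pipeline: the built-in breast cancer dataset stands in for real business data, the 5% missing values are injected artificially so the cleaning step has something to do, and logistic regression is just one example model.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data gathering: a built-in dataset stands in for real data.
data = load_breast_cancer(as_frame=True)
X, y = data.data.copy(), data.target

# Balanced vs. imbalanced: look at the class proportions first.
print(y.value_counts(normalize=True))

# Assumption: simulate missing values (about 5% of cells) for illustration.
rng = np.random.default_rng(0)
X = X.mask(rng.random(X.shape) < 0.05)

# Quick EDA with basic statistics per feature.
print(X[["mean radius", "mean texture"]].agg(["mean", "median"]))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Impute missing values with the median, scale, then fit a classifier.
model = make_pipeline(SimpleImputer(strategy="median"),
                      StandardScaler(),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation: 1. accuracy, 2. confusion matrix, 3. cross-validation.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy = {acc:.3f}")
print(cm)
print(f"5-fold CV accuracy = {cv_scores.mean():.3f}")
```

Putting the imputer inside the pipeline means cross-validation re-fits it on each training fold, so statistics from the held-out fold do not leak into preprocessing.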
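Coding libraries
Python:
The inplace parameter
In pandas, setting inplace=True makes a method modify the DataFrame itself and return None; the default, inplace=False, returns a new object and leaves the original unchanged. The inplace parameter is commonly used with the following methods: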
dropna()
drop_duplicates()
fillna()
query()
rename()
reset_index()
sort_index()
sort_values()
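The difference between the two modes can be seen on a small toy DataFrame (the column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "b", None],
                   "score": [3.0, np.nan, np.nan, 1.0]})

# Default (inplace=False): each method returns a new DataFrame and
# leaves df untouched, so the result has to be assigned.
cleaned = df.drop_duplicates().fillna({"score": 0.0})

# inplace=True: df itself is modified and the method returns None.
result = df.sort_values("score", inplace=True)

print(result)   # None
print(cleaned)  # 3 rows, duplicate "b" row removed, NaN score filled with 0.0
print(df)       # still 4 rows, now sorted by score (NaN last)
```

A common pitfall follows from this: `df = df.dropna(inplace=True)` rebinds `df` to `None`. Use either `df = df.dropna()` or `df.dropna(inplace=True)`, not both.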
import itertools
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython only: render plots inside the notebook
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
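# Grid search over ARIMA(p, d, q) orders using walk-forward validation.
# Assumes shampoo (36 monthly sales values) and the candidate lists
# p_values, d_values, q_values are defined earlier.
from sklearn.metrics import mean_squared_error
# Newer statsmodels ARIMA class; its fit() no longer takes disp.
from statsmodels.tsa.arima.model import ARIMA

for p in p_values:
    for d in d_values:
        for q in q_values:
            order = (p, d, q)
            train, test = list(shampoo[0:25]), list(shampoo[25:36])
            history = list(train)
            predictions = list()
            try:
                for i in range(len(test)):
                    # Refit on everything seen so far, forecast one step ahead.
                    model = ARIMA(history, order=order)
                    model_fit = model.fit()
                    pred_y = model_fit.forecast()[0]
                    predictions.append(pred_y)
                    history.append(test[i])  # reveal the true value
                error = mean_squared_error(test, predictions)
                print('ARIMA%s MSE = %.2f' % (order, error))
            except Exception:
                # Skip orders that fail to fit.
                continue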
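The ARIMA grid search scores each order by forecasting the held-out months one step at a time. A common way to do this is walk-forward validation: predict the next value, record the prediction, then add the true observation to the history before predicting again. Below is a model-agnostic sketch of that procedure. The helper `walk_forward_mse` and the naive persistence forecaster (predict the last observed value) are hypothetical stand-ins; persistence makes a useful baseline that any ARIMA order should beat.

```python
import numpy as np


def walk_forward_mse(series, n_train, forecast_one):
    """Walk-forward validation: predict one step ahead, then reveal the truth.

    forecast_one(history) must return a single next-step prediction.
    """
    history = list(series[:n_train])
    test = list(series[n_train:])
    predictions = []
    for actual in test:
        predictions.append(forecast_one(history))
        history.append(actual)  # the real value becomes known before the next step
    return float(np.mean((np.array(test) - np.array(predictions)) ** 2))


def persistence(history):
    """Naive baseline: the next value equals the last observed one."""
    return history[-1]


series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
mse = walk_forward_mse(series, n_train=3, forecast_one=persistence)
print(f"Persistence MSE = {mse:.2f}")  # → Persistence MSE = 3.00
```

To evaluate an ARIMA order instead, `forecast_one` could fit a statsmodels ARIMA on `history` and return its one-step forecast.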