NumPy – basic data types

You can have int64 and float64, among others…

dtype – data type object

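NumPy infers the data type from the values you pass in. Integers give an integer array (int64 on most 64-bit Linux and macOS systems), and decimal numbers give float64. Functions such as np.zeros and np.ones default to floating point (float64).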

import numpy as np
a = np.array([1,2,3])
a.dtype
dtype('int64')
b = np.array([1.,2.,3.])
b.dtype
dtype('float64')

You can explicitly specify which data-type you want:

c = np.array([1, 2, 3], dtype=float)
c.dtype
dtype('float64')

Other types:

Complex:
d = np.array([1+2j, 3+4j, 5+6j])
d.dtype
dtype('complex128')
Bool:
e = np.array([True, False, False, True])
e.dtype
dtype('bool')
Strings:
f = np.array(['Bonjour', 'Hello', 'Hallo'])  # Python 3 str gives a Unicode dtype
f.dtype
dtype('<U7')
Even more:
  • int32
  • int64
  • uint32
  • uint64
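A small sketch of asking for one of these types explicitly and converting between them with astype. The array values are made up for illustration; np.iinfo reports the range an integer type can hold, which is the main practical difference between int32, int64, uint32 and uint64.

```python
import numpy as np

# Ask for a specific integer type when creating the array
small = np.array([1, 2, 3], dtype=np.int32)
unsigned = np.array([1, 2, 3], dtype=np.uint64)

# Convert an existing array to another type (astype returns a new array)
as_float = small.astype(np.float64)

# Each integer type covers a different range of values
int32_max = np.iinfo(np.int32).max    # 2147483647
uint32_min = np.iinfo(np.uint32).min  # 0, unsigned types have no negatives

print(small.dtype, unsigned.dtype, as_float.dtype)  # int32 uint64 float64
print(int32_max, uint32_min)
```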

NumPy – learn it and constantly try to keep up… I know it is a pain, but that is life…

Someone more eloquent than me said: in NumPy there are two different data types for dealing with rows and columns of numbers, array (ndarray) and matrix. Be careful, because they look similar, but simple mathematical operations such as multiplication can have different meanings on the two types. The matrix data type behaves more like matrices in MATLAB.
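A minimal sketch of that difference, using a made-up 2x2 example. With a plain array, `*` multiplies element by element; with np.matrix, `*` is the matrix product. For plain arrays the matrix product is written with `@` (or np.dot), and the current NumPy documentation recommends plain arrays over np.matrix.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # ndarray
m = np.matrix([[1, 2], [3, 4]])  # matrix subclass

elementwise = a * a  # ndarray: element by element -> [[1, 4], [9, 16]]
matrix_prod = m * m  # matrix: matrix product      -> [[7, 10], [15, 22]]
same_prod = a @ a    # ndarray: @ gives the matrix product as well

print(elementwise)
print(matrix_prod)
print(same_prod)
```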

First install Jupyter (the IPython notebook). It is cool to have notebooks!

How to install:

/cube/apps/py3.5.2/bin/pip3 install --upgrade pip

/cube/apps/py3.5.2/bin/pip3 install jupyter

Now run it:

/cube/apps/py3.5.2/bin/jupyter notebook

Choose New -> Python 3

and you are good to go…

OK, let’s start with a nice tutorial:

NumPy arrays:

For me this was simple, and it is very helpful to go through the NumPy tutorial…

Cool functions: arange, linspace, ones, zeros, eye and diag
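A quick sketch of what each of these functions returns; the sizes and ranges are arbitrary choices for illustration.

```python
import numpy as np

evens = np.arange(0, 10, 2)    # start, stop (excluded), step -> [0 2 4 6 8]
grid = np.linspace(0, 1, 5)    # 5 evenly spaced points, stop included -> [0. 0.25 0.5 0.75 1.]
ones = np.ones((2, 3))         # 2x3 array filled with 1.0
zeros = np.zeros(4)            # [0. 0. 0. 0.]
identity = np.eye(3)           # 3x3 identity matrix
diagonal = np.diag([1, 2, 3])  # 3x3 matrix with 1, 2, 3 on the diagonal
back = np.diag(diagonal)       # given a 2-D array, diag extracts the diagonal -> [1 2 3]

print(evens, grid, zeros, back, sep="\n")
```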

Shaping is also cool, for example:

import numpy as np

a = np.array([[[1],[2]],[[3],[4]]])

a.shape

(2, 2, 1) -> what does it mean? The outer array holds 2 arrays, each of those holds 2 arrays, and each of those holds one element.

It took me a while to digest this attribute… but now I am OK.

To reshape this nested array into something simpler, you can do this:

a.shape = (4, 1)  # now 4 arrays with one element each, instead of (2, 2, 1)

a

array([[1],
       [2],
       [3],
       [4]])
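Setting a.shape changes the array in place. A minimal sketch of the alternatives: reshape returns a reshaped array (a view when possible) and leaves the original alone, -1 lets NumPy work out one dimension, and the total number of elements must stay the same.

```python
import numpy as np

a = np.array([[[1], [2]], [[3], [4]]])  # shape (2, 2, 1)

b = a.reshape(4, 1)   # same layout as a.shape = (4, 1), but a keeps its shape
c = a.reshape(2, -1)  # -1: NumPy infers the second dimension -> (2, 2)
flat = a.ravel()      # flatten to 1-D -> [1 2 3 4]

# 4 elements cannot be arranged as 3 x 1, so this raises ValueError
try:
    a.reshape(3, 1)
except ValueError as err:
    print("cannot reshape:", err)
```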

Cool, huh?


Machine Learning according to Sasha

Everyone is an expert in machine learning… everyone slaps it on their resume or CV, I know this… In interviews, when you ask a candidate some questions about ML, you quickly realize they do not know anything below the surface, which is a painful experience for both parties.

Why this is, is beyond me; I guess people think they can make money pretending to be data scientists…

OK, let’s dig into some interesting stuff…

Supervised Learning

Supervised learning is when we learn from our data by specifying a target variable or value that we want to predict…

Classification – predicting which class an instance of data falls into. Simple as that.

The target variables could be:

  1. nominal values, like true or false, zero or one, animal or plant…
  2. an infinite number of numeric values… (regression!!)

Regression – prediction of a numeric value. Remember the “best-fit” line from your school days?
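That school-days best-fit line is the simplest regression. A minimal sketch with np.polyfit, which does a least-squares fit of a polynomial (degree 1 here, so a straight line); the five data points are made up and lie roughly on y = 2x + 1.

```python
import numpy as np

# Made-up points that lie roughly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Use the fitted line to predict a numeric value for a new x
prediction = slope * 5.0 + intercept
print(f"y = {slope:.2f}x + {intercept:.2f}, prediction at x=5: {prediction:.2f}")
```

For these points the fit works out to slope 1.97 and intercept 1.06, close to the line the points were made up from.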

One problem facing machine learning algorithms is that some problems out there do not have deterministic solutions… motivation in humans is one example. That is hard to model…

For example, I used a vectorized version of the fuzzywuzzy algorithm to match sentences.
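This is not the vectorized version mentioned above. It is a minimal sketch of the same idea that uses only the standard library: difflib.SequenceMatcher stands in for fuzzywuzzy, scaled to 0 to 100 in the spirit of fuzzywuzzy's fuzz.ratio. The helper names similarity and best_match and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical helper: similarity score from 0 to 100 (case-insensitive)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query, candidates):
    """Hypothetical helper: candidate sentence most similar to the query."""
    return max(candidates, key=lambda c: similarity(query, c))


sentences = ["open a savings account", "close my checking account", "transfer money abroad"]
print(best_match("I want to open a saving account", sentences))  # open a savings account
print(similarity("Hello", "Hallo"))                               # 80
```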

OK, so what are expert systems? Expert systems are an interesting part of machine learning. Basically, an “expert system” is a system that can substitute for a human expert in some domain.

Think about a mathematician or statistician working through numbers by hand. An expert system can do the same job as well or better, and more precisely.

If you measure some subject, you are talking about rows and columns. Those columns are called “features” or “attributes”.

Each row of the table is an “instance”, and each column is a “feature”.

The table below shows how customers of different races and ethnicities use their bank accounts (annual deposits and withdrawals).

#   deposit   withdraw   account type   race    ethnicity
1   100000    70000      checking       white   Serbia
2   50000     45000      savings        white   America
3   20000     25000      checking       black   America
4   50000     10000      savings        white   Japan
5   200000    100000     checking       brown   Argentina

The first two features are numeric, so they can take decimal values.

The third feature (account type) is binary: here it can only be checking or savings, which we can encode as 1 or 0.

The fourth feature (race) is nominal and can be enumerated by integers, so each category is represented by a number 1, 2, 3, … n.
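A minimal sketch of that encoding with NumPy, using the account type and race columns from the table. np.unique sorts the categories alphabetically, so the numbers assigned here are one possible choice, not the only one. Integer codes also imply an order (white > brown > black) that has no meaning, which is why one-hot encoding (one 0/1 column per category) is often used instead.

```python
import numpy as np

# Columns from the table above
account_type = np.array(["checking", "savings", "checking", "savings", "checking"])
race = np.array(["white", "white", "black", "white", "brown"])

# Binary feature: checking -> 1, savings -> 0
is_checking = (account_type == "checking").astype(int)

# Nominal feature: np.unique returns the sorted categories and, with
# return_inverse=True, the 0-based category index of every value
race_names, race_codes = np.unique(race, return_inverse=True)
race_codes = race_codes + 1  # shift to 1, 2, ..., n as in the text

print(is_checking)  # [1 0 1 0 1]
print(race_names)   # ['black' 'brown' 'white']
print(race_codes)   # [3 3 1 3 2]
```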

So we want to do classification on this data set. First we need to choose a classification algorithm and train it. To do that we need a training set, i.e. data for which we already know the target variable.

We have 5 training examples.

We have 4 features and one target variable.

In a classification problem the values of the target variable are called classes, and there is assumed to be a finite number of them.

We will assume that the data set above is our training set.

We train a machine learning algorithm until it reaches a desired level of accuracy. Can we then describe what it has learned, i.e. its knowledge representation? It depends: some algorithms have a knowledge representation we can read, and some don’t.

Examples of knowledge representation might be:

  • a set of rules
  • a probability distribution
  • examples from the training set
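k-Nearest Neighbors is the classic case where the knowledge is simply the training examples themselves. A minimal sketch, assuming (hypothetically) that we predict account type from deposit and withdraw in the table above; knn_predict is a made-up helper, and a real version would scale the features first.

```python
from collections import Counter

import numpy as np

# Training set from the table: (deposit, withdraw) -> account type
X_train = np.array([[100000, 70000],
                    [50000, 45000],
                    [20000, 25000],
                    [50000, 10000],
                    [200000, 100000]], dtype=float)
y_train = np.array(["checking", "savings", "checking", "savings", "checking"])


def knn_predict(x, k=3):
    """Hypothetical helper: majority vote among the k nearest training examples."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every example
    nearest = np.argsort(distances)[:k]              # indices of the k closest examples
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]


print(knn_predict(np.array([60000, 40000])))  # savings
```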

Machine Learning tasks:

So we are working on the classification task.

Classification is predicting which class an instance of data belongs to.

Regression is another task in machine learning. Regression is prediction of a numerical value.

Classification and regression are examples of supervised learning …

The opposite of supervised learning is unsupervised learning: there is no label or target value in the data. Clustering, finding statistical values that describe the data, and reducing the data from many features to a small number of features for visualization are all unsupervised learning tasks.


Supervised learning tasks

Classification:
  • k-Nearest Neighbors
  • Naive Bayes
  • Support vector machines
  • Decision trees

Regression:
  • Linear regression
  • Locally weighted linear regression
  • Ridge regression
  • Lasso

Unsupervised learning tasks

Clustering:
  • k-Means
  • DBSCAN

Density estimation:
  • Expectation maximization
  • Parzen window
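To make the unsupervised side concrete, here is a minimal from-scratch sketch of k-means. The kmeans helper and the six toy points are made up, initialization is random with a fixed seed, and there is no restart logic. For real work, scikit-learn's sklearn.cluster.KMeans is the usual choice.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Hypothetical helper: minimal k-means, returns a label per row and the k centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, pick the nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = centroids.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centroids[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two obvious groups of made-up points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```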


What is important to understand in ML?

How to choose an algorithm?

Consider the goal.

What are you trying to get out?

Maybe it is the probability of lowering risk for the bank, or the shared interests of users who order some product from the retail bank.

So the answer also depends on what data you have or could collect.

Obviously, if you are looking for target values you need to look at supervised learning.

If the value you are looking for is discrete, such as 1/0, yes/no, a/b/c or black/yellow/white, then you will use classification. If you are looking for a value that can take any number in a range, then you are going to use regression, e.g. 0.00 to 100.00, -100 to 100, or -∞ to +∞.


If you are not looking for a target value, you need unsupervised learning…

Fitting data into discrete groups needs a clustering algorithm. If you also want a numerical estimate of how strongly the data fits each group, then you should use a density estimation algorithm.
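A minimal sketch of density estimation with a Parzen window (a Gaussian kernel density estimate) on 1-D data. The parzen_density helper, the samples and the bandwidth h=0.5 are made-up choices. The result is high near the two groups of samples and close to zero between them, and it integrates to about 1 like any density.

```python
import numpy as np


def parzen_density(x, samples, h=1.0):
    """Hypothetical helper: Gaussian Parzen-window density estimate at point(s) x (1-D data)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]  # shape (m, 1)
    u = (x - samples[None, :]) / h                          # scaled distance to every sample
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)       # standard normal kernel
    return kernel.sum(axis=1) / (len(samples) * h)


samples = np.array([1.0, 1.5, 2.0, 8.0, 8.5])
print(parzen_density([1.5, 5.0, 8.2], samples, h=0.5))
```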

It is absolutely essential to know your data. Questions to ask:

Are the features nominal or continuous?

Are there missing values in the features?

If there are missing values, why is that?

Are there outliers in the data?

Are you looking for something that is very infrequent?
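A few of these questions can be answered with a couple of NumPy calls. This is a minimal sketch on made-up data: a hypothetical deposit column with one missing value (nan) and one suspiciously large value, plus a hypothetical label column with a rare class. The 1.5 × IQR rule used to flag outliers is one common convention, not the only one.

```python
import numpy as np

# Hypothetical deposit column: one missing value and one very large value
deposits = np.array([100000, 50000, np.nan, 50000, 200000, 45000, 5000000], dtype=float)

# Missing values
missing = np.isnan(deposits)
print("missing values:", missing.sum())

# Outliers: outside 1.5 * IQR below the first or above the third quartile
clean = deposits[~missing]
q1, q3 = np.percentile(clean, [25, 75])
iqr = q3 - q1
outliers = clean[(clean < q1 - 1.5 * iqr) | (clean > q3 + 1.5 * iqr)]
print("outliers:", outliers)

# Something very infrequent: check how often each class occurs
labels = np.array(["ok"] * 98 + ["fraud"] * 2)
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), (counts / len(labels)).tolist())))  # {'fraud': 0.02, 'ok': 0.98}
```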

How to Develop an ML Application