1   "String".upper()
2   [Link]
3   Series +, -, *, / number      (applied element-wise)
4   [Link]
5   Series.is_unique
6   [Link]
7   Series1 + Series2             NaN wherever an index label has no match between the two Series
8   sales_h1 = sales_q1.add(sales_q2, fill_value=0)
        Series.add(other, level=None, fill_value=None, axis=0)
9   Series.value_counts()
        Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
        normalize=True: the object returned will contain the relative frequencies of the unique values.
        dropna=False: also show NaN.
10  dict(series)
11  sorted(series)
12  Series.squeeze(axis=None)
        Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.
13  Series.sort_values()
        Series.sort_values(*, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
14  Series.sort_index()
15  value in [] or "value" in [Link]
16  Series.get(key, default=None)
        Returns the default value if the key is not found. Can be used for a DataFrame as well.
17  pokemon[[1, 2, 4]] = ["Firemon", "Flamemon", "Blazemon"]      overwrites the values
18  pokemon_df = pd.read_csv("[Link]", usecols=["Pokemon"])
    pokemon_series = pokemon_df.squeeze("columns").copy()
19  google = google.sort_values()      # google.sort_values(inplace=True) can be recreated by this reassignment
20  [Link]()
21  [Link]()
22  Series.map(arg, na_action=None)    arg: mapping correspondence
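The notes on Series arithmetic, `value_counts` and `get` can be tried out with a small sketch like the one below. The data and the names `colors`, `shares`, `with_nan` and `missing` are made up for illustration; only `sales_q1`, `sales_q2` and `sales_h1` come from the notes, and their values here are assumptions.

```python
import pandas as pd

# Hypothetical quarterly sales: "c" and "d" each appear in only one Series.
sales_q1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
sales_q2 = pd.Series([1, 2, 3], index=["a", "b", "d"])

# Plain addition aligns on the index; labels without a match become NaN.
plain_sum = sales_q1 + sales_q2

# add(..., fill_value=0) treats the missing side as 0 instead.
sales_h1 = sales_q1.add(sales_q2, fill_value=0)

# Hypothetical categorical data with one missing value.
colors = pd.Series(["red", "blue", "red", None])
shares = colors.value_counts(normalize=True)   # relative frequencies, NaN excluded
with_nan = colors.value_counts(dropna=False)   # NaN counted as its own value

# get() returns the default instead of raising when the label is absent.
missing = sales_q1.get("z", default=0)

print(plain_sum)
print(sales_h1)
print(shares)
print(with_nan)
print(missing)
```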
Time method
                                list                                          tuple
Element order                   yes (sequence)                                yes (sequence)
Mutability                      mutable                                       immutable
Duplicates                      duplicate elements allowed                    duplicate elements allowed
Notes                           cannot be a dict key                          uses less memory than a list
                                cannot be a set element                       can be a dict key
                                                                              can be a set element
                                                                              named tuples can replace a simple class
Create empty                    l1 = list()                                   t1 = tuple()
                                l1 = []                                       t1 = ()
Initialize                      l1 = ['a', 'b', 'c']                          t1 = ('a', 'b', 'c')
                                                                              t1 = 'a', 'b', 'c'
                                                                              # a single element needs the trailing comma
                                                                              t1 = ('a',)
                                                                              t1 = 'a',
Initialize (via class)          l1 = list(['a', 'b', 'c'])                    t1 = tuple(('a', 'b', 'c'))
                                l1 = list(('a', 'b', 'c'))                    t1 = tuple(['a', 'b', 'c'])
Number of elements              len(l1)                                       len(t1)
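Two points from the rows above are easy to get wrong: the trailing comma is what makes a one-element tuple, and a tuple can be a dict key while a list cannot. A minimal sketch follows; the variable names and the distance data are made up for illustration.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ('a')     # this is just the string 'a'
one_tuple = ('a',)      # a tuple with one element
also_tuple = 'a',       # the parentheses are optional

# A tuple of hashable items can be used as a dict key.
distances = {('Tokyo', 'Osaka'): 400}   # hypothetical data

# A list is unhashable, so using it as a key raises TypeError.
try:
    bad = {['Tokyo', 'Osaka']: 400}
except TypeError as exc:
    error_name = type(exc).__name__

print(type(not_a_tuple).__name__, one_tuple, also_tuple, error_name)
```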
Append                          # to the end                                  -
                                l1.append('d')
                                l1 += ['d']
                                # at a specific position
                                l1.insert(1, 'e')
                                l1[1:1] = 'e'
Replace                         l1 = ['a', 'b', 'c']                          -
                                l1[2] = 'x'
                                # IndexError if the index does not exist
                                l1[9] = 'x'
                                # this is OK (appends to the end)
                                l1[9:] = 'x'
Delete (by position)            del l1[2]                                     -
Delete (by value)               l1.remove('a')                                -
Delete (by key)                 -                                             -
Delete (clear all)              l1.clear()                                    # a workaround: rebind to a new empty tuple
                                del l1[:]                                     t1 = tuple()
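The append, replace and delete rows for list can be checked with a short sketch. The values are made up; the one behaviour worth noticing is that slice assignment iterates over the right-hand side, so a multi-character string is split into characters.

```python
l1 = ['a', 'b', 'c']
l1.append('d')          # add to the end
l1.insert(1, 'e')       # insert at position 1
l1[1:1] = ['f']         # slice assignment also inserts at position 1

# Slice assignment iterates over the right-hand side:
# a string of two characters inserts two elements.
l2 = ['a', 'b']
l2[2:] = 'xy'

# Index assignment past the end raises IndexError ...
try:
    l2[9] = 'z'
    out_of_range = False
except IndexError:
    out_of_range = True

# ... while slice assignment past the end simply appends.
l2[9:] = ['z']

del l2[0]               # delete by position
l2.remove('b')          # delete by value (first match)

print(l1)
print(l2)
```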
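Element access (slicing)        # start (first element)                       same as list
                                l1[0]
                                # start:end
                                l1[0:2]
                                # last
                                l1[-1]
                                # by 2 (every second element)
                                l1[::2]
Get and remove                  # default: from the end (-1)                  -
                                # IndexError if there is no such element
                                l1.pop()
                                # by position
                                l1.pop(2)
LIFO (stack)                    # add                                         -
                                append()
                                # take out (same as pop(-1))
                                pop()
FIFO (queue)                    # add                                         -
                                append()
                                # take out
                                pop(0)
Position of an element          l1.index('b')                                 same as list
Membership test                 # True/False                                  same as list
                                'a' in l1
Two-dimensional                 l1 = [[1, 2], [3, 4], [5, 6]]                 t1 = ((1, 2), (3, 4), (5, 6))
                                # element access                              # element access
                                l1[1][1]                                      t1[1][1]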
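The LIFO and FIFO rows can be sketched with a plain list as follows. The `collections.deque` variant at the end is not part of the table; it is included only as a commonly used standard-library alternative, because `pop(0)` on a list has to shift every remaining element.

```python
from collections import deque

# LIFO (stack): append() to push, pop() to take the most recent item.
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()

# FIFO (queue) with a list: append() to add, pop(0) to take the oldest item.
queue = [1, 2, 3]
queue.append(4)
first = queue.pop(0)

# Alternative not shown in the table: deque removes from the left efficiently.
dq = deque([1, 2, 3])
dq.append(4)
dq_first = dq.popleft()

print(top, stack)
print(first, queue)
print(dq_first, list(dq))
```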
Merge                           l1 = [1, 2, 3]                                -
                                l2 = [4, 5, 6]
                                l1.extend(l2)
Merge (2)                       l1 = ['a', 'b', 'c']                          t1 = (1, 2, 3)
                                l2 = ['d', 'e', 'f']                          t2 = (4, 5, 6)
                                l1 += l2                                      t3 = t1 + t2
                                # this gives a different result
                                l1 = ['a', 'b']
                                l2 = ['c', 'd']
                                l1.append(l2)
                                --> ['a', 'b', ['c', 'd']]
Count of elements with          l1.count('a')                                 t1.count('a')
a given value
Sort (in place)                 l1.sort()                                     -
                                l1.sort(reverse=True)
Sort (not in place)             # sorted() => list
                                l2 = sorted(l1)                               t2 = tuple(sorted(t1))
Reverse (in place)              l1.reverse()                                  -
Reverse (not in place)          l2 = list(reversed(l1))                       t2 = tuple(reversed(t1))
                                l2 = l1[::-1]                                 t2 = t1[::-1]
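A short sketch of the merge, sort and reverse rows, with made-up values. It shows that `append` nests the second list while `+` concatenates, and that `sorted` and `reversed` leave the original untouched (`reversed` returns an iterator, so it is wrapped in `list`).

```python
l1 = ['a', 'b']
l2 = ['c', 'd']

concatenated = l1 + l2          # flat result
nested = ['a', 'b']
nested.append(l2)               # l2 becomes a single nested element

nums = [3, 1, 2]
ordered = sorted(nums)          # new list, nums unchanged
backwards = list(reversed(nums))
snapshot = list(nums)           # state of nums before sorting in place

nums.sort(reverse=True)         # sorts nums itself

t_sorted = tuple(sorted((3, 1, 2)))   # tuples have no sort(); go through sorted()

print(concatenated, nested)
print(ordered, backwards, snapshot, nums, t_sorted)
```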
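Reference (no copy; both        a = [1, 2, 3]                                 a = (1, 2, 3)
names point to one object)      b = a                                         b = a
Copy (shallow)                  a = [1, 2, 3]                                 import copy
                                b = a.copy()                                  a = (1, 2, 3)
                                ---                                           b = copy.copy(a)
                                c = list(a)                                   ---
                                ---                                           c = tuple(a)
                                d = a[:]                                      ---
                                                                              d = a[:]
                                # a.copy(), list(a) and a[:] copy only the outer container; nested objects are shared
Copy (deep)                     import copy                                   import copy
                                b = copy.deepcopy(a)                          b = copy.deepcopy(a)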
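The difference between plain assignment, a shallow copy and a deep copy only shows up when the list contains another mutable object. The sketch below uses a made-up nested list; `copy.deepcopy` comes from the standard library.

```python
import copy

a = [1, [2, 3]]

alias = a                   # no copy: a second name for the same list
shallow = a.copy()          # new outer list, inner list still shared
deep = copy.deepcopy(a)     # inner list copied as well

a[1].append(4)              # mutate the nested list
a.append(5)                 # mutate the outer list

print(alias)     # follows every change to a
print(shallow)   # sees the nested change, not the outer one
print(deep)      # unaffected
```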
Sum of values                   sum(l1)                                       same as list
Maximum value                   max(l1)                                       same as list
Minimum value                   min(l1)                                       same as list
Convert (to string)             l1 = ['a', 'b', 'c']                          same as list
                                ','.join(l1)
                                --> a,b,c
Convert (to list)               -                                             list(t1)
Convert (to tuple)              tuple(l1)                                     -
Convert (to set)                set(l1)                                       set(t1)
Convert (to dict)               l1 = [['a', 'b'], ['c', 'd'], ['e', 'f']]     t1 = (('a', 'b'), ('c', 'd'), ('e', 'f'))
                                d1 = dict(l1)                                 d1 = dict(t1)
                                ---
                                k = ['a', 'b', 'c']
                                v = [1, 2, 3]
                                d1 = dict(zip(k, v))
Iterate over several            zip(l1, l2)                                   same as list
sequences in step
Comprehension                   [x for x in l1]                               tuple(x for x in t1)
Storing mutable objects         possible                                      possible
                                l1 = ['a', [1, 2, 3]]                         t1 = ('a', [1, 2, 3])
Set operation (union)           -                                             -
Set operation (difference)      -                                             -
Set operation (intersection)    -                                             -
Set operation (symmetric        -                                             -
difference)
Access by key                   -                                             -
Get keys and loop               -                                             -
Get values and loop             -                                             -
Get key/value pairs and loop    -                                             -
Swap keys and values            -                                             -
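                                set                                           dictionary
Element order                   no                                            yes, from Python 3.7 (*note)
Mutability                      mutable                                       mutable
Duplicates                      duplicate elements not allowed                duplicate keys not allowed
                                                                              duplicate values allowed
Notes                           set operations are available                  keys must be unique
                                elements are unique                           a duplicate key overwrites the value (upsert)
                                add/replace behaves as an upsert
                                useful for de-duplicating a list or tuple
Create empty                    s1 = set()                                    d1 = dict()
                                                                              d1 = {}
Initialize                      s1 = {'a', 'b', 'c'}                          d1 = {'a': 1, 'b': 2, 'c': 3}
Initialize (via class)          s1 = set({'a', 'b', 'c'})                     d1 = dict(a=1, b=2, c=3)
                                s1 = set(['a', 'b', 'c'])                     d1 = dict({'a': 1, 'b': 2, 'c': 3})
                                s1 = set(('a', 'b', 'c'))                     d1 = dict((('a', 1), ('b', 2), ('c', 3)))
Number of elements              len(s1)                                       len(d1)
Append                          s1.add('d')                                   d1[key] = val
                                s1 |= {'d'}                                   d1.update({'e': 4})
                                                                              d1.update(e=4)
                                                                              d1.update(dict(e=4))
Replace                         same as append (upsert)                       same as append (upsert)
Delete (by position)            -                                             -
Delete (by value)               s1.remove('d')                                -
                                s1 -= {'d'}
Delete (by key)                 -                                             del d1[key]
Delete (clear all)              s1.clear()                                    d1.clear()
                                s1 = set()                                    d1 = {}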
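The "upsert" behaviour noted for set and dictionary can be seen in a few lines. The values are made up for illustration.

```python
s1 = {'a', 'b', 'c'}
s1.add('d')             # new element: inserted
s1.add('a')             # existing element: no change, no error

d1 = {'a': 1, 'b': 2}
d1['b'] = 20            # existing key: value overwritten
d1.update({'c': 3})     # new key: inserted
d1.update(d=4)          # keyword form of update

# De-duplicating a list through a set (sorted only to get a stable order).
unique_items = sorted(set(['x', 'y', 'x', 'z', 'y']))

s1 -= {'d'}             # remove by value
del d1['a']             # remove by key

print(s1, d1, unique_items)
```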
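Element access (slicing)        -                                             -
Get and remove                  # KeyError if missing                         # KeyError if missing
                                s1.remove('a')                                d1.pop(key)
                                # no error if missing                         # returns default if missing
                                s1.discard('a')                               d1.pop(key, default)
LIFO (stack)                    -                                             -
FIFO (queue)                    -                                             -
Position of an element          -                                             -
Membership test                 same as list                                  key in d1             # True/False
                                                                              val in d1.values()    # True/False
Nesting                         s1 = {(1, 2), (3, 4)}                         # a dict can be stored as a value
                                # sets cannot be nested                       d1 = {'a': {'x': 1}, 'b': {'y': 2}}
                                # not allowed: s1 = {{1, 2}, {3, 4}}
Merge                           s1 = {1, 2, 3}                                d1 = {'a': 1, 'b': 2}
                                s2 = {4, 5, 6}                                d2 = {'b': 9, 'c': 3}
                                s3 = s1.union(s2)                             d1.update(d2)
                                                                              # on a duplicate key the latter (d2) value wins
Merge (2)                       s1 = {1, 2, 3}                                -
                                s2 = {4, 5, 6}
                                s3 = s1 | s2
Merge (2), nested append        -                                             -
Count of elements with          -                                             d1 = {'a': 3, 'b': 2, 'c': 1, 'd': 3}
a given value                                                                 len({k: v for k, v in d1.items() if v == 3})
                                                                              ---
                                                                              sum(v == 3 for v in d1.values())
Sort (in place)                 -                                             -
Sort (not in place)             # sorted() => list                            d2 = sorted(d1.items(), key=lambda x: x[1])
                                s2 = set(sorted(s1))
                                # use case??
Reverse (in place)              -                                             -
Reverse (not in place)          -                                             -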
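The dictionary entries for merging, counting by value and sorting by value can be run as below. The first two dictionaries and `scores` reuse the table's values; the variable names `scores`, `count_of_3`, `by_value` and `sorted_dict` are made up.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 9, 'c': 3}
d1.update(d2)           # on a duplicate key, the value from d2 wins

scores = {'a': 3, 'b': 2, 'c': 1, 'd': 3}

# Count how many entries have the value 3 (True counts as 1 in sum()).
count_of_3 = sum(v == 3 for v in scores.values())

# sorted() on items() returns a list of (key, value) tuples, ordered by value.
by_value = sorted(scores.items(), key=lambda x: x[1])

# Wrap in dict() to get a dictionary back in that order.
sorted_dict = dict(by_value)

print(d1, count_of_3, by_value, sorted_dict)
```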
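Reference (no copy; both        a = {1, 2, 3}                                 a = {'a': 1, 'b': 2, 'c': 3}
names point to one object)      b = a                                         b = a
Copy (shallow)                  a = {1, 2, 3}                                 a = {'a': 1, 'b': 2, 'c': 3}
                                b = a.copy()                                  b = a.copy()
                                ---
                                c = set(a)
Sum of values                   same as list                                  sum(d1.keys())
                                                                              sum(d1.values())
Maximum value                   same as list                                  max(d1.keys())
                                                                              max(d1.values())
Minimum value                   same as list                                  min(d1.keys())
                                                                              min(d1.values())
Convert (to string)             same as list                                  ','.join(d1.keys())
                                                                              ','.join(d1.values())
Convert (to list)               list(s1)                                      list(d1.keys())
                                                                              list(d1.values())
Convert (to tuple)              tuple(s1)                                     tuple(d1.keys())
                                                                              tuple(d1.values())
                                                                              tuple(d1.items())
Convert (to set)                -                                             set(d1.keys())
                                                                              set(d1.values())
                                                                              set(d1.items())
Convert (to dict)               s1 = {('a', 1), ('b', 2), ('c', 3)}           -
                                d1 = dict(s1)
                                ---
                                s1 = {'a', 'b', 'c'}
                                s2 = {1, 2, 3}
                                d1 = dict(zip(s1, s2))
Iterate over several            # zip(s1, s2) works, but the pairing and order are not guaranteed
sequences in step               s1 = {'a', 'b', 'c'}                          -
                                s2 = {1, 2, 3}
                                l3 = set(zip(s1, s2))
                                --> e.g. {('a', 1), ('c', 3), ('b', 2)}
Comprehension                   {x for x in s1}                               {k: v for k, v in d1.items()}
Storing mutable objects         not possible                                  not possible as a key (TypeError)
                                s1 = {'a', [1, 2, 3]}                         d1 = {[1, 2, 3]: 1}
                                --> TypeError                                 possible as a value
                                                                              d1 = {'a': [1, 2, 3]}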
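Set operation (union)           s1 | s2                                       -
                                s1.union(s2)
Set operation (difference)      s1 - s2                                       -
                                s1.difference(s2)
Set operation (intersection)    s1 & s2                                       -
                                s1.intersection(s2)
Set operation (symmetric        s1 ^ s2                                       -
difference)                     s1.symmetric_difference(s2)
Access by key                   -                                             # KeyError if the key is missing
                                                                              d1[key]
                                                                              # returns None if missing
                                                                              d1.get(key)
                                                                              # returns default if missing
                                                                              d1.get(key, default)
Get keys and loop               -                                             d1.keys()
                                                                              for key in d1.keys():
Get values and loop             -                                             d1.values()
                                                                              for val in d1.values():
Get key/value pairs and loop    -                                             # each (k, v) pair comes back as a tuple
                                                                              d1.items()
                                                                              for key, value in d1.items():
Swap keys and values            -                                             d2 = {v: k for k, v in d1.items()}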
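The set operations and dictionary access patterns listed above can be exercised in one place. The sample values are made up; each operator form gives the same result as the method form listed beside it.

```python
s1 = {1, 2, 3}
s2 = {3, 4, 5}

union = s1 | s2                 # same as s1.union(s2)
difference = s1 - s2            # same as s1.difference(s2)
intersection = s1 & s2          # same as s1.intersection(s2)
symmetric = s1 ^ s2             # same as s1.symmetric_difference(s2)

d1 = {'a': 1, 'b': 2, 'c': 3}

missing = d1.get('z')           # None instead of KeyError
fallback = d1.get('z', 0)       # explicit default

keys = [key for key in d1.keys()]
values = [val for val in d1.values()]
pairs = [(key, value) for key, value in d1.items()]

# Swapping keys and values only round-trips when the values are unique.
d2 = {v: k for k, v in d1.items()}

print(union, difference, intersection, symmetric)
print(missing, fallback, keys, values, pairs, d2)
```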
                    Category            Continuous
Category            Chi square          t-test
Continuous          Anova / t-test      Correlation
Paired t test         ・A paired t-test is used when we are interested in the difference between two variables for the same subject.
                      ・Often the two variables are separated by time.
                      ・For example, in the Dixon and Massey data set we have cholesterol levels in 1952 and cholesterol levels in 1962 for each subject.
Two samples t test    A method used to test whether the unknown population means of two groups are equal or not.
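To make the distinction concrete, here is a minimal sketch that computes both t statistics on the same numbers using only the standard library. The measurements are hypothetical (they are not the Dixon and Massey data), and the two-sample version assumes equal variances (pooled variance), which is one common form of the test.

```python
import math
import statistics

# Hypothetical before/after measurements for the same four subjects.
before = [200, 210, 190, 205]
after = [195, 204, 187, 199]

# Paired t statistic: work on the per-subject differences.
diffs = [a - b for a, b in zip(after, before)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Two-sample t statistic (pooled variance): treats the groups as independent.
n1, n2 = len(before), len(after)
pooled_var = ((n1 - 1) * statistics.variance(before)
              + (n2 - 1) * statistics.variance(after)) / (n1 + n2 - 2)
t_two_sample = (statistics.mean(after) - statistics.mean(before)) / math.sqrt(
    pooled_var * (1 / n1 + 1 / n2))

print(f"paired t = {t_paired:.4f}")
print(f"two-sample t = {t_two_sample:.4f}")
```

With these numbers the paired statistic is far larger in magnitude, because pairing removes the subject-to-subject variation that dominates the two-sample denominator.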
Confidence interval for difference of two means, dependent samples
Weight loss example, lbs
Background    The 365 team has developed a diet and an exercise program for losing weight. It seems that it works like a charm. However, you are interested in how much weight you are likely to lose.
              You have a sample of 10 people who have already completed the 12-week program. The second sheet shows the data in kg, if you feel more comfortable using kg as a unit of measurement.
Task 1        Calculate the mean and standard deviation of the dataset
Task 2        Determine the appropriate statistic to use
Task 3        Calculate the 95% confidence interval
Task 4        Interpret the result
Optional      You can try to calculate the 90% and 99% confidence intervals to see the difference. There is no solution provided for these cases.
Solution:
Subject    Weight before (lbs)    Weight after (lbs)    Difference
1          228.58                 204.74                -23.83
2          244.01                 223.95                -20.06
3          262.46                 232.94                -29.52
4          224.32                 212.04                -12.28
5          202.14                 191.74                -10.41
6          246.98                 233.47                -13.51
7          195.86                 177.60                -18.25
8          231.88                 213.85                -18.03
9          243.32                 218.85                -24.47
10         266.74                 236.86                -29.87
Task 1:    Mean             -20.02
           St. deviation      6.86
Task 2:    Population variance is unknown
           We have a small sample
           We assume that the population is normally distributed
           The appropriate statistic to use is the t-statistic
Task 3:    95% CI, t_{9, 0.025} = 2.26
           T      CI low    CI high
           95%    -24.93    -15.12
Task 4:    You are 95% confident that you will lose between 15.12 lbs and 24.93 lbs, given that you follow the program as strictly as the sample did.
Note that the solution is exactly the same no matter the unit of measurement.
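The solution values can be reproduced with the standard library. This is a sketch under two assumptions: the differences are typed in as printed in the table (already rounded to two decimals), and the critical value is taken as 2.262, the usual three-decimal table value for t with 9 degrees of freedom at 0.025, which the sheet displays as 2.26.

```python
import math
import statistics

# Differences in lbs, copied from the "Difference" column.
differences = [-23.83, -20.06, -29.52, -12.28, -10.41,
               -13.51, -18.25, -18.03, -24.47, -29.87]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)       # sample standard deviation (n - 1)
standard_error = sd_diff / math.sqrt(n)

# Assumed critical value t_{9, 0.025}; shown rounded to 2.26 in the sheet.
T_CRIT_95 = 2.262

margin = T_CRIT_95 * standard_error
ci_low = mean_diff - margin
ci_high = mean_diff + margin

print(f"mean = {mean_diff:.2f}, st. deviation = {sd_diff:.2f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

For the optional 90% and 99% intervals, only `T_CRIT_95` would change to the corresponding table value for 9 degrees of freedom.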
# A custom IQR function
def iqr(column):
    # Interquartile range: 75th percentile minus 25th percentile
    return column.quantile(0.75) - column.quantile(0.25)

# Print IQR of the temperature_c column
print(sales["temperature_c"].agg(iqr))
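The snippet above depends on a `sales` DataFrame that is not defined in these notes. A self-contained sketch with a tiny made-up `sales` table (the temperature values are assumptions chosen so the quartiles are easy to check by hand) might look like this:

```python
import pandas as pd

def iqr(column):
    # Interquartile range: 75th percentile minus 25th percentile
    return column.quantile(0.75) - column.quantile(0.25)

# Hypothetical stand-in for the real sales data.
sales = pd.DataFrame({"temperature_c": [0.0, 10.0, 20.0, 30.0, 40.0]})

# agg() passes the whole column to the custom function.
temperature_iqr = sales["temperature_c"].agg(iqr)

print(temperature_iqr)
```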