Pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation
tool, built on top of the Python programming language.

Pandas is composed of 3 main data structures:


Series [1-Dim]
DataFrame [2-Dim]
Panel [N-Dim] [a combination of Series and DataFrames]
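
A minimal sketch of the first two structures (note: pd.Panel was removed in pandas 1.0, so N-dimensional data is nowadays usually handled with a MultiIndex DataFrame):

import pandas as pd

s = pd.Series([10, 20, 30])                    # 1-D: data + index + dtype
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})  # 2-D: rows and columns
print(type(s), type(df))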

Series
In [1]: import pandas as pd

In [2]: # create an empty Series
        pd.Series()

        DeprecationWarning: The default dtype for empty Series will be 'object'
        instead of 'float64' in a future version. Specify a dtype explicitly to
        silence this warning.

Out[2]: Series([], dtype: float64)

In [3]: pd.Series(dtype='object')

Out[3]: Series([], dtype: object)

pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool' = False,
    fastpath: 'bool' = False,
)
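
A small sketch exercising the parameters listed above (the values here are made-up examples, not from the notebook):

marks = pd.Series(
    data=[81, 92, 73],
    index=['Asha', 'Ravi', 'Mina'],  # labels instead of the default RangeIndex
    dtype='int64',
    name='marks',                    # the name is used when the Series joins a DataFrame
)
print(marks['Ravi'])                 # 92 -- label-based lookup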

Features of Series
In [5]: # It is a combination of 3 things:
        # - data/values
        # - index
        # - dtype
        pd.Series([10,20,30,40])

Out[5]: 0 10
1 20
2 30
3 40
dtype: int64
In [6]: # It can accept homogeneous and heterogeneous data
        # homogeneous
        pd.Series([12.3,4.5,6.8])

Out[6]: 0 12.3
1 4.5
2 6.8
dtype: float64

In [7]: pd.Series([10,20,19.0])

Out[7]: 0 10.0
1 20.0
2 19.0
dtype: float64

In [8]: # when we supply heterogeneous data or str data, the dtype is object
        pd.Series(['A',10,20,'30'])

Out[8]: 0 A
1 10
2 20
3 30
dtype: object

In [9]: # supply all str data
        pd.Series(['A','B','C'])

Out[9]: 0 A
1 B
2 C
dtype: object

In [ ]: # Difference between dtypes of different widths, e.g. int64 (8 bytes) vs int8 (1 byte)

In [10]: a = pd.Series([10,20,30,40,50])
a

Out[10]: 0 10
1 20
2 30
3 40
4 50
dtype: int64

In [11]: a.__sizeof__()

Out[11]: 168

In [12]: b = pd.Series([10,20,30,40,50],dtype='int8')
b

Out[12]: 0 10
1 20
2 30
3 40
4 50
dtype: int8
In [13]: b.__sizeof__()

Out[13]: 133
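
Besides __sizeof__, pandas exposes the size of the underlying data directly; a quick sketch comparing the two Series above:

print(a.nbytes, b.nbytes)            # 40 vs 5: five int64 values (8 bytes each) vs five int8 values
print(a.memory_usage(index=False))   # 40 -- same idea, excluding the index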

In [14]: # the backing data structure is a NumPy array
         a

Out[14]: 0 10
1 20
2 30
3 40
4 50
dtype: int64

In [15]: # indexing is possible


# access 40
a[3]

Out[15]: 40
In [16]: a[-1] # with the default integer index, -ve indexing is not allowed

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: -1 is not in range

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
----> 1 a[-1] # with the default integer index, -ve indexing is not allowed
...
KeyError: -1

In [17]: # Is direct item assignment possible?
         a

Out[17]: 0 10
1 20
2 30
3 40
4 50
dtype: int64

In [18]: a[0] = 100


In [19]: a

Out[19]: 0 100
1 20
2 30
3 40
4 50
dtype: int64

In [20]: id(a)

Out[20]: 2479599958864

In [21]: a[4] = 999

In [22]: a

Out[22]: 0 100
1 20
2 30
3 40
4 999
dtype: int64

In [23]: id(a)

Out[23]: 2479599958864

# After the change, id() doesn't change and the change persists in the same object,
# hence Series is a mutable data structure

In [25]: # slicing supported


a[:]

Out[25]: 0 100
1 20
2 30
3 40
4 999
dtype: int64

In [26]: a[:2]

Out[26]: 0 100
1 20
dtype: int64

In [27]: #access 100,30,999


a[::2]

Out[27]: 0 100
2 30
4 999
dtype: int64
In [28]: # replace 20,30,40 by 2,3,4 resp.
a

Out[28]: 0 100
1 20
2 30
3 40
4 999
dtype: int64

In [29]: a[1:4]

Out[29]: 1 20
2 30
3 40
dtype: int64

In [30]: a[1:4] = [2,3,4]

In [31]: a

Out[31]: 0 100
1 2
2 3
3 4
4 999
dtype: int64

In [32]: # replace 100 and 999 by 0


a[::4]

Out[32]: 0 100
4 999
dtype: int64

In [33]: a[::4] = 0

In [34]: a

Out[34]: 0 0
1 2
2 3
3 4
4 0
dtype: int64

In [38]: a[0],a[4] = (100,100)

In [39]: a

Out[39]: 0 100
1 2
2 3
3 4
4 100
dtype: int64
In [42]: # we cannot store multiple values at a single position
         a[0] = [10,20]
---------------------------------------------------------------------------
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
----> 2 a[0] = [10,20]
...
ValueError: setting an array element with a sequence.

In [45]: a[0] = bytes((1,2))

In [46]: a

Out[46]: 0 b'\x01\x02'
1 2
2 3
3 4
4 100
dtype: object

Manipulation of index

In [47]: import numpy as np
         s = pd.Series(np.arange(101,111))
         s
s

Out[47]: 0 101
1 102
2 103
3 104
4 105
5 106
6 107
7 108
8 109
9 110
dtype: int32
In [54]: # as we can't use a -ve index, how do we fetch the last element?
         s.tail(1)

Out[54]: 9 110
dtype: int32

In [53]: s[len(s)-1]

Out[53]: 110

In [55]: s[len(s)-1:]

Out[55]: 9 110
dtype: int32

In [57]: # if we want to access only the data
         # returns an array of values
         s.values

Out[57]: array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [58]: np.array(s)

Out[58]: array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [59]: #list
list(s)

Out[59]: [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]

In [61]: #dict
print(dict(s))

{0: 101, 1: 102, 2: 103, 3: 104, 4: 105, 5: 106, 6: 107, 7: 108, 8: 109, 9: 110}
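
pandas also provides dedicated converters for these, which read a little more idiomatically than list()/dict():

s.to_list()    # [101, 102, ..., 110]
s.to_dict()    # {0: 101, 1: 102, ..., 9: 110}
s.to_numpy()   # array([101, ..., 110])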

In [63]: # access the index
         s.index

Out[63]: RangeIndex(start=0, stop=10, step=1)

In [64]: # check the dtype
         s.dtype

Out[64]: dtype('int32')

In [65]: # Series type
         type(s)

Out[65]: pandas.core.series.Series

In [66]: # check the dimensions
         s.ndim

Out[66]: 1

In [67]: # check the shape
         s.shape

Out[67]: (10,)
In [75]: # now let's see index manipulation
         s = pd.Series(np.arange(101,111),index=range(10,20))
s

Out[75]: 10 101
11 102
12 103
13 104
14 105
15 106
16 107
17 108
18 109
19 110
dtype: int32

In [70]: s

Out[70]: 10 101
11 102
12 103
13 104
14 105
15 106
16 107
17 108
18 109
19 110
dtype: int32

In [ ]: # after creating the Series, change the index
        s.index = range(10)

In [73]: s

Out[73]: 0 101
1 102
2 103
3 104
4 105
5 106
6 107
7 108
8 109
9 110
dtype: int32

In [82]: # scalar Series: 10 customers with the same branch_name
         b = pd.Series('SBI-Pune',index=[1,2,3,4,5,1,7,8,1,1])
b

Out[82]: 1 SBI-Pune
2 SBI-Pune
3 SBI-Pune
4 SBI-Pune
5 SBI-Pune
1 SBI-Pune
7 SBI-Pune
8 SBI-Pune
1 SBI-Pune
1 SBI-Pune
dtype: object
In [83]: # we have a duplicate index label; selecting it returns all matches
         b[1]

Out[83]: 1 SBI-Pune
1 SBI-Pune
1 SBI-Pune
1 SBI-Pune
dtype: object

In [84]: pd.Series('SBI-Pune',index=['A','B',3,4,5,1,7,8,1,1])

Out[84]: A SBI-Pune
B SBI-Pune
3 SBI-Pune
4 SBI-Pune
5 SBI-Pune
1 SBI-Pune
7 SBI-Pune
8 SBI-Pune
1 SBI-Pune
1 SBI-Pune
dtype: object

In [85]: c = pd.Series(['SBI','SBI','BOI','SBI'])
c

Out[85]: 0 SBI
1 SBI
2 BOI
3 SBI
dtype: object

In [88]: # I don't want to access BOI using its index
         for i in c.str.find('BOI'):
             print(i)
             if i == 0:
                 print()

-1
-1
0
-1

In [89]: c.str.startswith('B')

Out[89]: 0 False
1 False
2 True
3 False
dtype: bool

In [93]: # the boolean output above can be supplied as an index to fetch the rows that are True
         c[c.str.startswith('B')]

Out[93]: 2 BOI
dtype: object
Let's create a Series of strings and do some data analysis
In [ ]: n = pd.Series([])
        n

To do analysis we need questions; a few prompts follow (a consolidated sketch of possible answers appears right after them, and the cells further below work through the same ideas):

In [ ]: # fetch employees whose name starts with the letter P


In [ ]: # we can use the boolean output as an index


In [ ]: # convert names to upper case


In [ ]: # sort in alphabetical order


In [ ]: # in descending order

In [ ]: # create a series which will hold the length of each name
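
A consolidated sketch of possible answers, assuming a small example Series n of employee names (the names below are placeholders, not from the notebook):

n = pd.Series(['Pallavi', 'Onkar', 'Prasad', 'Sachin'])

n[n.str.startswith('P')]        # names with initial letter P (boolean mask used as an index)
n.str.upper()                   # names in upper case
n.sort_values()                 # alphabetical order
n.sort_values(ascending=False)  # descending order
n.apply(len)                    # Series holding the length of each name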



In [1]: import pandas as pd

In [2]: s = pd.Series(['Viraj','Sushen','Tasmeen','Abhishek','Pallavi','Onkar','Sachin','Sanjay'])
        s

Out[2]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [3]: # check the first 3 records
        s.head(3)

Out[3]: 0 Viraj
1 Sushen
2 Tasmeen
dtype: object

In [4]: s[:3]

Out[4]: 0 Viraj
1 Sushen
2 Tasmeen
dtype: object

In [5]: # fetch the last record
        s.tail(1)

Out[5]: 7 Sanjay
dtype: object

In [7]: # sort the names in ascending order
        s.sort_values()

Out[7]: 3 Abhishek
5 Onkar
4 Pallavi
6 Sachin
7 Sanjay
1 Sushen
2 Tasmeen
0 Viraj
dtype: object
In [8]: # in the above case the index is unordered
        s.sort_values(ignore_index=True)

Out[8]: 0 Abhishek
1 Onkar
2 Pallavi
3 Sachin
4 Sanjay
5 Sushen
6 Tasmeen
7 Viraj
dtype: object

In [10]: s.sort_values().sort_index()

Out[10]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object
In [11]: print(dir(s))
['T', '_AXIS_LEN', '_AXIS_ORDERS', '_AXIS_REVERSED', '_AXIS_TO_AXIS_NUMBER', '_HANDL
ED_TYPES', '__abs__', '__add__', '__and__', '__annotations__', '__array__', '__array
_priority__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__conta
ins__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir
__', '__divmod__', '__doc__', '__eq__', '__finalize__', '__float__', '__floordiv__',
'__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstat
e__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__imod__', '__
imul__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipo
w__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lo
ng__', '__lt__', '__matmul__', '__mod__', '__module__', '__mul__', '__ne__', '__neg_
_', '__new__', '__nonzero__', '__or__', '__pos__', '__pow__', '__radd__', '__rand_
_', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rm
atmul__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rsub__', '_
_rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof_
_', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '__xor_
_', '_accessors', '_accum_func', '_add_numeric_operations', '_agg_by_level', '_agg_e
xamples_doc', '_agg_see_also_doc', '_align_frame', '_align_series', '_arith_method',
'_as_manager', '_attrs', '_binop', '_can_hold_na', '_check_inplace_and_allows_duplic
ate_labels', '_check_inplace_setting', '_check_is_chained_assignment_possible', '_ch
eck_label_or_level_ambiguity', '_check_setitem_copy', '_clear_item_cache', '_clip_wi
th_one_bound', '_clip_with_scalar', '_cmp_method', '_consolidate', '_consolidate_inp
lace', '_construct_axes_dict', '_construct_axes_from_arguments', '_construct_resul
t', '_constructor', '_constructor_expanddim', '_convert', '_convert_dtypes', '_dat
a', '_dir_additions', '_dir_deletions', '_drop_axis', '_drop_labels_or_levels', '_du
plicated', '_find_valid_index', '_flags', '_from_mgr', '_get_axis', '_get_axis_nam
e', '_get_axis_number', '_get_axis_resolvers', '_get_block_manager_axis', '_get_bool
_data', '_get_cacher', '_get_cleaned_column_resolvers', '_get_index_resolvers', '_ge
t_label_or_level_values', '_get_numeric_data', '_get_value', '_get_values', '_get_va
lues_tuple', '_get_with', '_gotitem', '_hidden_attrs', '_index', '_indexed_same', '_
info_axis', '_info_axis_name', '_info_axis_number', '_init_dict', '_init_mgr', '_inp
lace_method', '_internal_names', '_internal_names_set', '_is_cached', '_is_copy', '_
is_label_or_level_reference', '_is_label_reference', '_is_level_reference', '_is_mix
ed_type', '_is_view', '_item_cache', '_ixs', '_logical_func', '_logical_method', '_m
ap_values', '_maybe_update_cacher', '_memory_usage', '_metadata', '_mgr', '_min_coun
t_stat_function', '_name', '_needs_reindex_multi', '_protect_consolidate', '_reduc
e', '_reindex_axes', '_reindex_indexer', '_reindex_multi', '_reindex_with_indexers',
'_replace_single', '_repr_data_resource_', '_repr_latex_', '_reset_cache', '_reset_c
acher', '_set_as_cached', '_set_axis', '_set_axis_name', '_set_axis_nocheck', '_set_
is_copy', '_set_labels', '_set_name', '_set_value', '_set_values', '_set_with', '_se
t_with_engine', '_slice', '_stat_axis', '_stat_axis_name', '_stat_axis_number', '_st
at_function', '_stat_function_ddof', '_take_with_is_copy', '_typ', '_update_inplac
e', '_validate_dtype', '_values', '_where', 'abs', 'add', 'add_prefix', 'add_suffi
x', 'agg', 'aggregate', 'align', 'all', 'any', 'append', 'apply', 'argmax', 'argmi
n', 'argsort', 'array', 'asfreq', 'asof', 'astype', 'at', 'at_time', 'attrs', 'autoc
orr', 'axes', 'backfill', 'between', 'between_time', 'bfill', 'bool', 'clip', 'combi
ne', 'combine_first', 'compare', 'convert_dtypes', 'copy', 'corr', 'count', 'cov',
'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'div', 'divide', 'divmo
d', 'dot', 'drop', 'drop_duplicates', 'droplevel', 'dropna', 'dtype', 'dtypes', 'dup
licated', 'empty', 'eq', 'equals', 'ewm', 'expanding', 'explode', 'factorize', 'ffil
l', 'fillna', 'filter', 'first', 'first_valid_index', 'flags', 'floordiv', 'ge', 'ge
t', 'groupby', 'gt', 'hasnans', 'head', 'hist', 'iat', 'idxmax', 'idxmin', 'iloc',
'index', 'infer_objects', 'interpolate', 'is_monotonic', 'is_monotonic_decreasing',
'is_monotonic_increasing', 'is_unique', 'isin', 'isna', 'isnull', 'item', 'items',
'iteritems', 'keys', 'kurt', 'kurtosis', 'last', 'last_valid_index', 'le', 'loc', 'l
t', 'mad', 'map', 'mask', 'max', 'mean', 'median', 'memory_usage', 'min', 'mod', 'mo
de', 'mul', 'multiply', 'name', 'nbytes', 'ndim', 'ne', 'nlargest', 'notna', 'notnul
l', 'nsmallest', 'nunique', 'pad', 'pct_change', 'pipe', 'plot', 'pop', 'pow', 'pro
d', 'product', 'quantile', 'radd', 'rank', 'ravel', 'rdiv', 'rdivmod', 'reindex', 'r
eindex_like', 'rename', 'rename_axis', 'reorder_levels', 'repeat', 'replace', 'resam
ple', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rolling', 'round', 'rpow', 'rsu
b', 'rtruediv', 'sample', 'searchsorted', 'sem', 'set_axis', 'set_flags', 'shape',
'shift', 'size', 'skew', 'slice_shift', 'sort_index', 'sort_values', 'squeeze', 'st
d', 'str', 'sub', 'subtract', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_cl
ipboard', 'to_csv', 'to_dict', 'to_excel', 'to_frame', 'to_hdf', 'to_json', 'to_late
x', 'to_list', 'to_markdown', 'to_numpy', 'to_period', 'to_pickle', 'to_sql', 'to_st
ring', 'to_timestamp', 'to_xarray', 'transform', 'transpose', 'truediv', 'truncate',
'tz_convert', 'tz_localize', 'unique', 'unstack', 'update', 'value_counts', 'value
s', 'var', 'view', 'where', 'xs']

In [16]: # convert series to csv file


s.to_csv('[Link]',index=False)

In [14]: s

Out[14]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [17]: s2 = pd.Series(['Suraj','Sarika','Shavez','Sheela','Jyoti','Sandip'])
s2

Out[17]: 0 Suraj
1 Sarika
2 Shavez
3 Sheela
4 Jyoti
5 Sandip
dtype: object

In [18]: # let's combine the two Series
         s.append(s2)

Out[18]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
0 Suraj
1 Sarika
2 Shavez
3 Sheela
4 Jyoti
5 Sandip
dtype: object
In [19]: # to get a continuous index
         s.append(s2,ignore_index=True)

Out[19]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
8 Suraj
9 Sarika
10 Shavez
11 Sheela
12 Jyoti
13 Sandip
dtype: object

In [21]: # apply(): invoke a function on the values of the Series
         s.apply(len)

Out[21]: 0 5
1 6
2 7
3 8
4 7
5 5
6 6
7 6
dtype: int64
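
For string lengths specifically, the vectorised str accessor gives the same result; a one-line alternative sketch:

s.str.len()    # same output as s.apply(len) for string data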

In [23]: # all names needed in upper case
         s.apply(str.upper)

Out[23]: 0 VIRAJ
1 SUSHEN
2 TASMEEN
3 ABHISHEK
4 PALLAVI
5 ONKAR
6 SACHIN
7 SANJAY
dtype: object

In [27]: s.apply(lambda nm: 'Mr.' + nm)

Out[27]: 0 Mr.Viraj
1 Mr.Sushen
2 Mr.Tasmeen
3 Mr.Abhishek
4 Mr.Pallavi
5 Mr.Onkar
6 Mr.Sachin
7 Mr.Sanjay
dtype: object
In [28]: s

Out[28]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [31]: # find out the people whose name ends with n
         s.str.endswith('n')

Out[31]: 0 False
1 True
2 True
3 False
4 False
5 False
6 True
7 False
dtype: bool

In [32]: s[s.str.endswith('n')]

Out[32]: 1 Sushen
2 Tasmeen
6 Sachin
dtype: object

In [33]: #fetch name with length of name > 6

Out[33]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [35]: s.apply(len)>6

Out[35]: 0 False
1 False
2 True
3 True
4 True
5 False
6 False
7 False
dtype: bool

In [36]: s[s.apply(len)>6]

Out[36]: 2 Tasmeen
3 Abhishek
4 Pallavi
dtype: object
In [38]: s.sort_values()

Out[38]: 3 Abhishek
5 Onkar
4 Pallavi
6 Sachin
7 Sanjay
1 Sushen
2 Tasmeen
0 Viraj
dtype: object

In [39]: s.sort_values(ascending=False)

Out[39]: 0 Viraj
2 Tasmeen
1 Sushen
7 Sanjay
6 Sachin
4 Pallavi
5 Onkar
3 Abhishek
dtype: object

In [45]: # take names without any vowels


'Viraj'.replace('a','')

Out[45]: 'Virj'

In [47]: s.map(map({'a':'','e':'','i':'','o':'','u':''}))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 s.map(map({'a':'','e':'','i':'','o':'','u':''}))

TypeError: map() must have at least two arguments.

In [49]: s.map({'Viraj':1})

Out[49]: 0 1.0
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
dtype: float64

In [50]: s.apply(i.replace(i,'') if i in ['a','e','i','o','u'])

  File "C:\Users\hakim\AppData\Local\Temp/ipykernel_4884/[Link]", line 1
    s.apply(i.replace(i,'') if i in ['a','e','i','o','u'])
                                                          ^
SyntaxError: invalid syntax
In [54]: for i in s:
             for nm in i:
                 print(nm,end='')
             print()

Viraj
Sushen
Tasmeen
Abhishek
Pallavi
Onkar
Sachin
Sanjay

In [58]: s.apply(lambda x: x.lower().replace('a','').replace('e','').replace('i','').replace('o','').replace('u',''))

Out[58]: 0 vrj
1 sshn
2 tsmn
3 bhshk
4 pllv
5 nkr
6 schn
7 snjy
dtype: object
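
A shorter alternative sketch using the vectorised str accessor with a regular expression (unlike the lambda above, this keeps the original casing):

s.str.replace('[aeiouAEIOU]', '', regex=True)   # e.g. 'Viraj' -> 'Vrj'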

In [59]: s

Out[59]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [61]: # reverse each name
         s.apply(lambda nm: nm[::-1])

Out[61]: 0 jariV
1 nehsuS
2 neemsaT
3 kehsihbA
4 ivallaP
5 raknO
6 nihcaS
7 yajnaS
dtype: object

In [67]: for i in s.apply(reversed):
             print(''.join(list(i)))

jariV
nehsuS
neemsaT
kehsihbA
ivallaP
raknO
nihcaS
yajnaS
In [68]: s.apply(str.capitalize)

Out[68]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [70]: #access alternate name


s[::2]

Out[70]: 0 Viraj
2 Tasmeen
4 Pallavi
6 Sachin
dtype: object

In [76]: s[::2]

Out[76]: 0 Viraj
1 Sushen
2 Tasmeen
3 Abhishek
4 Pallavi
5 Onkar
6 Sachin
7 Sanjay
dtype: object

In [77]: g = pd.Series(['123-ABC','456-PQR'])
g

Out[77]: 0 123-ABC
1 456-PQR
dtype: object

In [78]: s.apply(lambda x: x[::-1])

Out[78]: 0 jariV
1 nehsuS
2 neemsaT
3 kehsihbA
4 ivallaP
5 raknO
6 nihcaS
7 yajnaS
dtype: object
In [82]: s.map({'Viraj':4764})

Out[82]: 0 4764.0
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
dtype: float64

In [ ]: ​
Series
In [1]: import pandas as pd

In [ ]: #Series(data,index,dtype)

In [6]: d = [10,20,30,40]
        ind = ['A','B','C','D']
        pd.Series(data=d,index=ind)

Out[6]: A 10
B 20
C 30
D 40
dtype: int64

In [4]: # positional args, with the two objects swapped
        pd.Series(ind,d)

Out[4]: 10 A
20 B
30 C
40 D
dtype: object

In [7]: # let's play with dtype
        pd.Series(d) # default dtype is int64

Out[7]: 0 10
1 20
2 30
3 40
dtype: int64

In [8]: # let's change the dtype
        pd.Series(d,dtype='float')

Out[8]: 0 10.0
1 20.0
2 30.0
3 40.0
dtype: float64

In [9]: pd.Series(d,dtype='float32')

Out[9]: 0 10.0
1 20.0
2 30.0
3 40.0
dtype: float32

In [10]: pd.Series(d,dtype='f')

Out[10]: 0 10.0
1 20.0
2 30.0
3 40.0
dtype: float32
In [13]: # float64
         pd.Series(d,dtype='f8')

Out[13]: 0 10.0
1 20.0
2 30.0
3 40.0
dtype: float64

In [19]: # using numpy
         import numpy as np
         pd.Series(d,dtype=np.float32)

Out[19]: 0 10.0
1 20.0
2 30.0
3 40.0
dtype: float32

In [21]: import numpy as np
         pd.Series(d,dtype=np.str_)

Out[21]: 0 10
1 20
2 30
3 40
dtype: object

In [22]: # what if the Series is already created and I want to change its dtype?
         sf = pd.Series(d)
         sf

Out[22]: 0 10
1 20
2 30
3 40
dtype: int64

In [23]: # convert dtype
         sf.astype('str')
         # astype is temporary -- it only returns a converted copy

Out[23]: 0 10
1 20
2 30
3 40
dtype: object

In [24]: sf.astype('object')

Out[24]: 0 10
1 20
2 30
3 40
dtype: object
In [25]: sf #original sf is unchanged

Out[25]: 0 10
1 20
2 30
3 40
dtype: int64
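
To make the conversion stick, the usual pattern is to assign the result back (a small sketch):

sf = sf.astype('float64')   # reassign, since astype returns a new Series
sf.dtype                    # dtype('float64')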

In [27]: print(dir(sf))

['T', '_AXIS_LEN', '_AXIS_ORDERS', ...]   # same attribute listing as dir(s) shown earlier; output truncated here
In [28]: sf

Out[28]: 0 10
1 20
2 30
3 40
dtype: int64

In [29]: sf.shape

Out[29]: (4,)

In [30]: sf.ndim

Out[30]: 1

DataFrame
# 2D data structure
# composed of rows and columns
# we can create multiple rows and multiple columns with different data types
# a powerful option for data science and analysis
# it contains many options for selection, filtering, merging, deletion, ...
# a DataFrame is a combination of multiple Series (see the sketch below)
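
A minimal sketch of that last point: each column of a DataFrame is itself a Series, and a DataFrame can be built from a dict of Series (the values here are made-up):

import pandas as pd

age = pd.Series([23, 24], name='age')
city = pd.Series(['Pune', 'Sangli'], name='city')
people = pd.DataFrame({'age': age, 'city': city})

type(people['age'])   # pandas.core.series.Series -- a single column is a Series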

In [31]: import pandas as pd


In [32]: # empty DataFrame
         print(pd.DataFrame())

Empty DataFrame
Columns: []
Index: []

# Structure of DataFrame
pd.DataFrame(
    data=None,
    index: 'Axes | None' = None,
    columns: 'Axes | None' = None,
    dtype: 'Dtype | None' = None,
    copy: 'bool | None' = None,
)

Creation of df
# using: list, tuple, set, dict, numpy array, Series, ...

In [33]: pd.DataFrame([10,20,30,40])

Out[33]:     0
         0  10
         1  20
         2  30
         3  40

In [35]: pd.DataFrame([10,20,30,40]).shape

Out[35]: (4, 1)

In [36]: pd.DataFrame([10,20,30,40]).ndim

Out[36]: 2

In [34]: pd.Series([10,20,30,40])

Out[34]: 0 10
1 20
2 30
3 40
dtype: int64

In [37]: pd.Series([10,20,30,40]).shape

Out[37]: (4,)

In [38]: pd.Series([10,20,30,40]).ndim

Out[38]: 1
In [39]: # list of lists
         k = [[1,2],[3,4]]
         pd.DataFrame(k)
         # each inner list becomes a row

Out[39]:    0  1
         0  1  2
         1  3  4

In [40]: # let's change the index and the column names
         pd.DataFrame(k,index=[101,102],columns=['Data_1','Data_2'])

Out[40]:      Data_1  Data_2
         101       1       2
         102       3       4

In [42]: # positional arguments
         #pd.DataFrame(k,[101,102],['Data_1','Data_2'])
         pd.DataFrame(k,['Data_1','Data_2'],[101,201])

Out[42]:         101  201
         Data_1    1    2
         Data_2    3    4

# if we interchange the positions of the index and columns arguments, the output changes accordingly
In [43]: # create a df using a tuple
         t = 7,8,9 # packing of data
         pd.DataFrame(t)

Out[43]:    0
         0  7
         1  8
         2  9

In [44]: 7,8,9

Out[44]: (7, 8, 9)
In [49]: # it accepts homogeneous and heterogeneous values
         #pd.DataFrame([10,20])
         #pd.DataFrame([10,20,30,40.])
         pd.DataFrame([10,'20','A','45',67,'B'])

Out[49]:     0
         0  10
         1  20
         2   A
         3  45
         4  67
         5   B

In [50]: # using a set
         s = {(1,2,3),(1,2,5)}
         # one tuple is one row
         pd.DataFrame(s)

Out[50]:    0  1  2
         0  1  2  3
         1  1  2  5

In [51]: s = {2,19,10,1,0,99,4}
         pd.DataFrame(s)

Out[51]:     0
         0   0
         1   1
         2   2
         3  99
         4   4
         5  19
         6  10

In [57]: # using a dict
         d = {'name':['Ashok','Seema'],'age':[23,24],'place':['pune','sangli']}
         pd.DataFrame(d)
         # each key becomes a column label
         # the values are filled in under their keys

Out[57]:     name  age   place
         0  Ashok   23    pune
         1  Seema   24  sangli
In [53]: d = {'name':['Ashok','Seema'],'age':[23,24,45],'place':['pune','sangli']}
         pd.DataFrame(d)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 2 pd.DataFrame(d)
...
ValueError: All arrays must be of the same length
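
If the columns genuinely have different lengths, one common workaround (a sketch, not from the notebook) is to wrap each list in a Series so the shorter columns are padded with NaN:

pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
#     name  age   place
# 0  Ashok   23    pune
# 1  Seema   24  sangli
# 2    NaN   45     NaN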
In [62]: dg = pd.DataFrame(np.random.randint(20,45,5),columns=['age'])
         dg

Out[62]:    age
         0   28
         1   28
         2   31
         3   30
         4   30

In [65]: # age == 28
         dg[dg.age == 28]

Out[65]:    age
         0   28
         1   28

In [66]: dg.query('age==28')

Out[66]:    age
         0   28
         1   28

In [67]: # using a Series
         s = pd.Series([120,300,500,750])
         s

Out[67]: 0 120
1 300
2 500
3 750
dtype: int64

In [68]: pd.DataFrame(s)

Out[68]:      0
         0  120
         1  300
         2  500
         3  750

In [69]: # using numpy
         import numpy as np
         n = np.random.random(5)
         n

Out[69]: array([0.53627799, 0.81381906, 0.35778332, 0.75316441, 0.94354277])

In [70]: n = np.random.random((5,4))
         n

Out[70]: array([[0.48607796, 0.68671019, 0.37005585, 0.18746466],
                [0.8645104 , 0.19885713, 0.49144365, 0.15115642],
                [0.45557155, 0.41497094, 0.18698425, 0.16966141],
                [0.46457068, 0.06416644, 0.70635964, 0.9814973 ],
                [0.42752716, 0.29029562, 0.72440363, 0.98770071]])

In [71]: pd.DataFrame(n)

Out[71]:           0         1         2         3
         0  0.486078  0.686710  0.370056  0.187465
         1  0.864510  0.198857  0.491444  0.151156
         2  0.455572  0.414971  0.186984  0.169661
         3  0.464571  0.064166  0.706360  0.981497
         4  0.427527  0.290296  0.724404  0.987701

In [76]: # labelling the columns
         pd.DataFrame(n,columns=['a','v','c','d'])

Out[76]:           a         v         c         d
         0  0.486078  0.686710  0.370056  0.187465
         1  0.864510  0.198857  0.491444  0.151156
         2  0.455572  0.414971  0.186984  0.169661
         3  0.464571  0.064166  0.706360  0.981497
         4  0.427527  0.290296  0.724404  0.987701

In [77]: # you may even repeat a column name
         pd.DataFrame(n,columns=['a','v','c','a'])

Out[77]:           a         v         c         a
         0  0.486078  0.686710  0.370056  0.187465
         1  0.864510  0.198857  0.491444  0.151156
         2  0.455572  0.414971  0.186984  0.169661
         3  0.464571  0.064166  0.706360  0.981497
         4  0.427527  0.290296  0.724404  0.987701

In [80]: g = pd.Series([10,20,30],[10,20,10])
g[10]

Out[80]: 10 10
10 30
dtype: int64

In [83]: # let's create a DataFrame with different data types
         y = pd.DataFrame({'Name':['A','V','D','A','F','A'],
                           'Age':[23,45,60,23,18,90],
                           'salary':[25.,45.,67.,66,55,89]})
         y

Out[83]:   Name  Age  salary
         0    A   23    25.0
         1    V   45    45.0
         2    D   60    67.0
         3    A   23    66.0
         4    F   18    55.0
         5    A   90    89.0

In [84]: y.dtypes

Out[84]: Name       object
         Age         int64
         salary    float64
         dtype: object

In [86]: # need a summary of the DataFrame
         # prints a concise summary of a DataFrame
         y.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    6 non-null      object
 1   Age     6 non-null      int64
 2   salary  6 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 272.0+ bytes
In [87]: # descriptive statistics
         y.describe() # by default works on numeric data

Out[87]:              Age     salary
         count   6.000000   6.000000
         mean   43.166667  57.833333
         std    28.024394  21.784551
         min    18.000000  25.000000
         25%    23.000000  47.500000
         50%    34.000000  60.500000
         75%    56.250000  66.750000
         max    90.000000  89.000000

In [88]: # if we want the description of the object column
         y.describe(include='object')

Out[88]:        Name
         count     6
         unique    4
         top       A
         freq      3

In [91]: y.Name.unique()

Out[91]: array(['A', 'V', 'D', 'F'], dtype=object)

In [92]: y.Name.nunique()

Out[92]: 4

In [94]: y.Name.value_counts()
         # it counts the number of occurrences of each category

Out[94]: A 3
V 1
D 1
F 1
Name: Name, dtype: int64

In [ ]: ​
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [7]: # load dataset


#pd.read_csv(r'C:\Users\hakim\Downloads\[Link]')
df = pd.read_csv('C:\\Users\hakim\Downloads\[Link]')
df

Out[7]: [DataFrame display: 1000 rows × 7 columns -- Job Title, Email Address,
        FirstName LastName, Duration, Country, Empty, New column.
        Sample rows: Cashier / Chadwick Gordon / Gabon, Healthcare Specialist /
        Gil Lindop / Germany, ..., HR Coordinator / Regina Grey / Equatorial Guinea.
        The Empty column is all NaN; the New column holds credit-card-style numbers.]

In [8]: # check the column names
        df.columns

Out[8]: Index(['Job Title', 'Email Address', 'FirstName LastName', 'Duration',
               'Country', 'Empty', 'New column'],
              dtype='object')
In [13]: # as we can see, there is a need to change the column names
         # use the rename function
         # change Job Title
         df.rename(columns={'Job Title':'JobTitle'},inplace=True)

In [14]: df.columns

Out[14]: Index(['JobTitle', 'Email Address', 'FirstName LastName', 'Duration',
                'Country', 'Empty', 'New column'],
               dtype='object')

In [15]: # 2nd approach to change the columns
         df.columns = ['JobTitle', 'EmailAddress', 'f_l_name', 'Duration',
                       'Country', 'Empty', 'Credit_card']

In [16]: df.columns

Out[16]: Index(['JobTitle', 'EmailAddress', 'f_l_name', 'Duration', 'Country', 'Empty',
                'Credit_card'],
               dtype='object')

In [10]: # access a single column using the dot (.) operator
         df.Country

Out[10]: 0 Gabon
1 Germany
2 Egypt
3 France
4 Seychelles
...
995 Barbados
996 Saudi Arabia
997 Singapore
998 Antigua and Barbuda
999 Equatorial Guinea
Name: Country, Length: 1000, dtype: object

In [20]: # another option for accessing a column
         df['JobTitle']

Out[20]: 0 Cashier
1 Healthcare Specialist
2 Audiologist
3 Clerk
4 Healthcare Specialist
...
995 Webmaster
996 Healthcare Specialist
997 Insurance Broker
998 HR Specialist
999 HR Coordinator
Name: JobTitle, Length: 1000, dtype: object
In [22]: # if we want to access multiple columns
df[['JobTitle','Duration']]

Out[22]: JobTitle Duration

0 Cashier 1126-03-27 [Link]Z

1 Healthcare Specialist 9490-11-24 [Link]Z

2 Audiologist 7708-12-09 [Link]Z

3 Clerk 4121-06-08 [Link]Z

4 Healthcare Specialist 0893-05-29 [Link]Z

... ... ...

995 Webmaster 4529-07-05 [Link]Z

996 Healthcare Specialist 1296-08-27 [Link]Z

997 Insurance Broker 0704-08-08 [Link]Z

998 HR Specialist 2871-01-09 [Link]Z

999 HR Coordinator 1688-01-25 [Link]Z

1000 rows × 2 columns

# df.col      ==> Series
# df['col']   ==> Series
# df[['col']] ==> DataFrame
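
A quick sketch confirming the distinction with type():

type(df['JobTitle'])      # pandas.core.series.Series
type(df[['JobTitle']])    # pandas.core.frame.DataFrame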

Now let's work on rows


In [27]: df.JobTitle # it gives all records

Out[27]: 0 Cashier
1 Healthcare Specialist
2 Audiologist
3 Clerk
4 Healthcare Specialist
...
995 Webmaster
996 Healthcare Specialist
997 Insurance Broker
998 HR Specialist
999 HR Coordinator
Name: JobTitle, Length: 1000, dtype: object

In [26]: # indexing over the Series
         df.JobTitle[0]

Out[26]: 'Cashier'

In [31]: df.JobTitle.head(1) # returns a Series

Out[31]: 0 Cashier
Name: JobTitle, dtype: object
In [28]: # multiple elements using slicing
         df.JobTitle[:10]

Out[28]: 0 Cashier
1 Healthcare Specialist
2 Audiologist
3 Clerk
4 Healthcare Specialist
5 Auditor
6 Cashier
7 CNC Operator
8 Staffing Consultant
9 Retail Trainee
Name: JobTitle, dtype: object

In [30]: # the above output as a DataFrame
         df[['JobTitle']][:10]

Out[30]: JobTitle

0 Cashier

1 Healthcare Specialist

2 Audiologist

3 Clerk

4 Healthcare Specialist

5 Auditor

6 Cashier

7 CNC Operator

8 Staffing Consultant

9 Retail Trainee

In [34]: df.Country[:5][::-1]

Out[34]: 4 Seychelles
3 France
2 Egypt
1 Germany
0 Gabon
Name: Country, dtype: object

In [37]: df.Country[3]

Out[37]: 'France'

In [38]: # replace France by India
         df.Country[3] = 'India'

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.Country[3] = 'India'
In [39]: df.Country

Out[39]: 0 Gabon
1 Germany
2 Egypt
3 India
4 Seychelles
...
995 Barbados
996 Saudi Arabia
997 Singapore
998 Antigua and Barbuda
999 Equatorial Guinea
Name: Country, Length: 1000, dtype: object

In [41]: # access 2-3 columns of the DataFrame
         df[['f_l_name','Country','JobTitle']]

Out[41]: f_l_name Country JobTitle

0 Chadwick Gordon Gabon Cashier

1 Gil Lindop Germany Healthcare Specialist

2 Lillian Burge Egypt Audiologist

3 Cedrick Farrant India Clerk

4 Leslie Wright Seychelles Healthcare Specialist

... ... ... ...

995 Lucy Whatson Barbados Webmaster

996 Rose Kirby Saudi Arabia Healthcare Specialist

997 Logan Silva Singapore Insurance Broker

998 Aileen Wise Antigua and Barbuda HR Specialist

999 Regina Grey Equatorial Guinea HR Coordinator

1000 rows × 3 columns


In [45]: # fetch 50% records
#df[['f_l_name','Country','JobTitle']][:500]
#df[['f_l_name','Country','JobTitle']].head(500)
#df[['f_l_name','Country','JobTitle']][::2]
#df[['f_l_name','Country','JobTitle']].tail(500)

Out[45]: f_l_name Country JobTitle

500 Havana Gray Turkmenistan Systems Administrator

501 Leah Villiger Moldova Clerk

502 Matthew Addley Rwanda Baker

503 Stacy Benson Antigua and Barbuda Ambulatory Nurse

504 Liliana Baker Trinidad and Tobago Lecturer

... ... ... ...

995 Lucy Whatson Barbados Webmaster

996 Rose Kirby Saudi Arabia Healthcare Specialist

997 Logan Silva Singapore Insurance Broker

998 Aileen Wise Antigua and Barbuda HR Specialist

999 Regina Grey Equatorial Guinea HR Coordinator

500 rows × 3 columns

Access rows and columns using loc (location) and iloc (integer location)

loc: label-based selection -- we select by row/column labels
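
A tiny sketch (with a made-up DataFrame, not the CSV above) of the key difference demonstrated in the next cells: loc slices by label and its stop is inclusive, while iloc slices by position and its stop is exclusive:

toy = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

toy.loc['a':'c']    # rows a, b, c  -- label slice, stop included
toy.iloc[0:2]       # rows a, b     -- positional slice, stop excluded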

In [51]: # df[:,:] won't work on a DataFrame
         # solution?
         df.loc[:,:] # [rows, columns]

Out[51]: [full DataFrame display: 1000 rows × 7 columns -- JobTitle, EmailAddress,
         f_l_name, Duration, Country, Empty, Credit_card]


In [52]: # select the first rows
         df.loc[:10,:] # [rows, all columns]
         # here you can observe that the stop label is inclusive:
         # :10 returns 11 rows (labels 0..10), so for exactly 10 rows you would slice :9

Out[52]: [rows 0-10 (11 rows) × 7 columns, from Cashier / Chadwick Gordon / Gabon
         through Healthcare Specialist / Chester Wills / Denmark]

In [55]: # select index rows 10-15 and the first 2 columns
         df.loc[10:15,'JobTitle':'EmailAddress']

Out[55]: JobTitle EmailAddress

10 Healthcare Specialist Chester_Wills2904@[Link]

11 Mobile Developer Erick_Redwood6161@[Link]

12 Investment Advisor Havana_Marshall9254@[Link]

13 Global Logistics Supervisor Ethan_Blythe262@[Link]

14 Banker Jacob_Emmett4946@[Link]

15 Associate Professor Alessia_Hale9847@[Link]


In [58]: # rows 21:25 and the columns f_l_name, Country, Credit_card
         df.loc[21:25,'f_l_name':'Credit_card':2]

Out[58]: f_l_name Country Credit_card

21 Alma Brennan Guyana 7063-8350-8578-5531

22 Barney Dempsey New Zealand 8807-4573-2724-5220

23 Angel Mackenzie Spain 5515-8700-4838-7870

24 Analise Turner Liberia 5631-0776-5646-1144

25 Britney Weasley Morocco 0484-4647-5288-1133

In [62]: # requirement: access specific rows 10, 20, 55, 90
         # columns: Duration, JobTitle, Credit_card
         df.loc[[10,20,55,90],['Duration','JobTitle','Credit_card']]

Out[62]: Duration JobTitle Credit_card

10 1843-06-19 [Link]Z Healthcare Specialist 5400-7711-3768-0034

20 1233-10-14 [Link]Z Pharmacist 5617-3685-4812-7221

55 1224-09-29 [Link]Z Treasurer 2172-5454-3475-5308

90 2717-01-21 [Link]Z Baker 5501-8051-5887-1838

In [64]: # select a few random rows from df (here, 3 of them)
         df[['f_l_name','Country','JobTitle']].sample(3)

Out[64]: f_l_name Country JobTitle

484 Stephanie Nanton Taiwan Food Technologist

754 Gabriel Lindop Tajikistan IT Support Staff

415 Rufus Vollans East Timor (Timor-Leste) Designer


In [65]: df.sample(frac=.5)

Out[65]: [a random sample of 500 rows × 7 columns]

In [66]: # it gives a summary of the dataset
         df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   JobTitle      1000 non-null   object
 1   EmailAddress  1000 non-null   object
 2   f_l_name      1000 non-null   object
 3   Duration      1000 non-null   object
 4   Country       1000 non-null   object
 5   Empty         0 non-null      float64
 6   Credit_card   1000 non-null   object
dtypes: float64(1), object(6)
memory usage: 54.8+ KB

# a column can be referred to by its name/label as well as by its integer position

iloc: integer/position based access
In [67]: df.iloc[:,:]

Out[67]: [full DataFrame display: 1000 rows × 7 columns -- JobTitle, EmailAddress,
         f_l_name, Duration, Country, Empty, Credit_card]

In [68]: # work on rows
         # in case of iloc the stop is exclusive
         df.iloc[:5,:]

Out[68]: [first 5 rows × 7 columns, from Cashier / Chadwick Gordon / Gabon
         through Healthcare Specialist / Leslie Wright / Seychelles]
In [70]: # work on columns: the first 3 columns
         df.iloc[:5,:3] # stop is exclusive

Out[70]: JobTitle EmailAddress f_l_name

0 Cashier Chadwick_Gordon5732@[Link] Chadwick Gordon

1 Healthcare Specialist Gil_Lindop9421@[Link] Gil Lindop

2 Audiologist Lillian_Burge7970@[Link] Lillian Burge

3 Clerk Cedrick_Farrant4152@[Link] Cedrick Farrant

4 Healthcare Specialist Leslie_Wright2861@[Link] Leslie Wright

In [72]: # access EmailAddress and Credit_card
         df.iloc[:,1::5]

Out[72]: EmailAddress Credit_card

0 Chadwick_Gordon5732@[Link] 5247-4084-7638-0340

1 Gil_Lindop9421@[Link] 3722-3713-3784-4667

2 Lillian_Burge7970@[Link] 2101-6418-4041-2162

3 Cedrick_Farrant4152@[Link] 0720-6612-6768-4737

4 Leslie_Wright2861@[Link] 6625-7785-7435-0444

... ... ...

995 Lucy_Whatson4266@[Link] 0532-1318-2153-2268

996 Rose_Kirby6758@[Link] 6810-6415-0575-0738

997 Logan_Silva1028@[Link] 4171-2722-8456-1171

998 Aileen_Wise351@[Link] 7571-3307-6622-5084

999 Regina_Grey6872@[Link] 7173-6042-6326-1836

1000 rows × 2 columns


In [73]: # reverse the columns
         df.iloc[:,::-1]

Out[73]: [1000 rows × 7 columns with the column order reversed: Credit_card, Empty,
         Country, Duration, f_l_name, EmailAddress, JobTitle]

In [75]: # access arbitrary rows and columns by position
         df.iloc[[0,34,98,100],[3,0,5]]

Out[75]: Duration JobTitle Empty

0 1126-03-27 [Link]Z Cashier NaN

34 3152-07-22 [Link]Z Paramedic NaN

98 8071-06-02 [Link]Z Staffing Consultant NaN

100 0497-05-25 [Link]Z Operator NaN


In [78]: # find out the count per job title
         df.JobTitle.value_counts()

Out[78]: Healthcare Specialist 28


Staffing Consultant 26
Health Educator 25
Physician 24
Machine Operator 24
CNC Operator 23
Electrician 23
Associate Professor 23
Inspector 22
Web Developer 22
Pharmacist 21
Project Manager 21
Ambulatory Nurse 21
HR Coordinator 21
Webmaster 20
Paramedic 20
IT Support Staff 20
Global Logistics Supervisor 20
Doctor 20
Lecturer 19
Systems Administrator 19
Assistant Buyer 19
Stockbroker 19
Banker 18
Operator 18
Steward 18
Clerk 18
Dentist 17
Software Engineer 17
Cook 16
Service Supervisor 16
Bellman 16
Budget Analyst 15
Laboratory Technician 15
Baker 15
Cashier 14
Fabricator 14
Call Center Representative 14
Cash Manager 14
Business Broker 14
Investment Advisor 14
Design Engineer 14
Food Technologist 13
Loan Officer 13
Mobile Developer 12
Retail Trainee 12
Bookkeeper 12
Biologist 12
Audiologist 12
Auditor 12
Production Painter 11
Accountant 11
Designer 11
Executive Director 11
Front Desk Coordinator 11
Chef Manager 11
Treasurer 11
Insurance Broker 11
HR Specialist 11
Restaurant Manager 6
Name: JobTitle, dtype: int64
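
Since matplotlib.pyplot was imported as plt earlier, one possible way to visualise these counts (a sketch, not part of the original notebook):

df.JobTitle.value_counts().head(10).plot(kind='bar')   # top 10 job titles
plt.ylabel('count')
plt.show()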
In [ ]: ​
