NumPy W3school

This document provides a comprehensive overview of NumPy, covering its foundational theory, array creation, indexing, data types, and memory management. It also delves into various probability distributions such as normal, binomial, Poisson, and uniform distributions, along with their theoretical underpinnings and practical implementations. Additionally, it discusses universal functions (ufuncs) in NumPy, highlighting their architecture and optimization features.

Uploaded by maytinh182
Copyright © All Rights Reserved

NumPy: Fundamentals, Probability Distributions, and Universal Functions


Compiled Documentation
July 15, 2025

Contents
1 Introduction to NumPy
1.1 Foundational Theory
1.2 Creating Arrays
1.2.1 Memory Analysis
1.3 Indexing & Slicing
1.3.1 Access Mechanism
1.4 Data Types
1.4.1 Optimization
1.5 Copy vs View
1.5.1 Memory Mechanism

2 Probability Distributions in NumPy
2.1 Normal Distribution (Gaussian Distribution)
2.1.1 Measure-Theoretic Theory
2.2 Binomial Distribution
2.2.1 Combinatorial Theory
2.3 Poisson Distribution
2.3.1 Stochastic Processes Theory
2.4 Uniform Distribution
2.4.1 Measure Theory

3 Universal Functions in NumPy
3.1 Nature of ufuncs
3.1.1 C-level Architecture
3.2 Creating custom ufuncs
3.2.1 PyUFunc_FromFuncAndData Mechanism
3.3 Basic Arithmetic
3.3.1 Broadcasting Rules
3.4 Rounding Functions
3.4.1 IEEE 754 Rounding Modes
3.5 Logarithm Functions
3.5.1 Floating-point arithmetic optimization

4 Advanced NumPy Features
4.1 Structured Arrays
4.2 Masked Arrays
4.3 Memory Management

5 Conclusion

1 Introduction to NumPy
1.1 Foundational Theory
NumPy is built on the multi-dimensional array object (ndarray), which has three key characteristics:
• Strided memory layout: elements live in a single memory buffer, and newly created arrays store them contiguously
• Homogeneous data types: all elements share the same dtype
• Vectorized operations: operations apply to the whole array, with the element loop running in compiled C instead of Python

Listing 1: Basic NumPy Import


import numpy as np
print("NumPy version:", np.__version__)
# This code imports NumPy and prints its version number

1.2 Creating Arrays
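As a minimal sketch, a few of NumPy's standard constructors (np.array, np.zeros, np.arange, np.linspace, np.empty) create arrays as follows; the variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # from nested Python lists: shape (2, 3)
z = np.zeros((2, 3))                   # filled with 0.0, dtype float64 by default
r = np.arange(0, 10, 2)                # like range(): 0, 2, 4, 6, 8
pts = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced points, both ends included
e = np.empty(3, dtype=np.int32)        # uninitialised buffer: write before reading

print(a.shape, z.dtype, r, pts)
```

np.empty skips initialisation, which is why its contents are arbitrary until they are assigned.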


1.2.1 Memory Analysis
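NumPy arrays use less memory than Python lists because:
• Elements are stored as raw values of one dtype, not as pointers to separate Python objects that each carry their own type information and reference count
• The array as a whole carries one fixed-size header (about 100 bytes; the exact size depends on the NumPy version and platform)
• Elements are packed at their natural alignment, so no per-element boxing or padding is needed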

Listing 2: Memory Comparison


import numpy as np

arr = np.array([1, 2, 3], dtype=np.int8)  # data buffer is only 3 bytes (arr.nbytes == 3)
lst = [1, 2, 3]  # 8-byte pointer per slot, each to an int object of ~28 bytes
# This demonstrates the memory efficiency of NumPy arrays
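To see the difference at a larger size, the sketch below compares the NumPy data buffer with a rough estimate of a list's footprint. It assumes CPython, counts the list object plus each int object once, and ignores allocator overhead, so the list figure is only an approximation.

```python
import sys
import numpy as np

n = 1_000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# ndarray: one buffer of n * 8 bytes (the fixed header is not included)
array_bytes = arr.nbytes

# list: the list object (its pointer array) plus every int object it points to
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)

print("ndarray data buffer:", array_bytes, "bytes")
print("list (approximate):", list_bytes, "bytes")
```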

1.3 Indexing & Slicing


1.3.1 Access Mechanism
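Each element is located at the array's base data pointer plus a byte offset computed from the indices and the strides:
$$\text{offset} = i \cdot \text{strides}[0] + j \cdot \text{strides}[1] + \cdots \qquad (1)$$
• Basic slicing creates a view that references the original buffer
• Fancy indexing (integer-array or boolean indexing) creates a new copy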

Listing 3: Array Slicing Example


import numpy as np

data = np.arange(10, dtype=np.int64)
view = data[3:7]  # stride = 8 bytes, offset = 3 * 8 = 24 bytes
# Creates a view of elements 3 through 6, sharing memory with the original array
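Equation (1) can be checked directly. The sketch below assumes a C-contiguous int64 array, so that tobytes() returns the buffer in storage order; it computes the byte offset of a[2, 1] from the strides, reads that element back from the raw bytes, and then contrasts a basic slice with fancy indexing using np.shares_memory.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)  # C-contiguous, itemsize 8
i, j = 2, 1

# Equation (1): byte offset from the start of the buffer
offset = i * a.strides[0] + j * a.strides[1]     # 2 * 32 + 1 * 8 = 72

# Read one int64 from the raw bytes at that offset
value = np.frombuffer(a.tobytes(), dtype=np.int64, count=1, offset=offset)[0]
print(a.strides, offset, value, a[i, j])

# Basic slicing returns a view; fancy indexing returns a copy
sliced = a[:, 1:3]
fancy = a[:, [1, 2]]
print(np.shares_memory(a, sliced), np.shares_memory(a, fancy))  # True False
```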

1.4 Data Types
1.4.1 Optimization
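• Type coercion: mixed inputs are converted automatically according to NumPy's type promotion rules

• Memory alignment: each element sits on its natural boundary (for example 4 bytes for int32, 8 bytes for float64); structured dtypes are packed unless created with align=True

• Structured arrays: compound dtypes with named fields of different types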

Listing 4: Data Type Examples


import numpy as np

# Implicit type conversion
arr = np.array([1, 2.5])  # float64
# Structured dtype
dt = np.dtype([('name', 'S10'), ('age', 'i4')])
# The first example shows automatic promotion to float64
# The second example creates a structured data type with name and age fields
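The promotion rules and memory alignment described above can be inspected with np.result_type and dtype.itemsize. A minimal sketch; the commented results are what standard NumPy promotion produces for these dtype pairs.

```python
import numpy as np

# Promotion picks the smallest dtype that can represent both inputs
print(np.result_type(np.int8, np.int16))     # int16
print(np.result_type(np.uint8, np.int8))     # int16: holds 0..255 and negatives
print(np.result_type(np.int32, np.float32))  # float64: float32 cannot hold every int32

# Structured dtypes are packed by default; align=True inserts padding
packed = np.dtype([('name', 'S10'), ('age', 'i4')])
aligned = np.dtype([('name', 'S10'), ('age', 'i4')], align=True)
print(packed.itemsize, aligned.itemsize)     # 14 16
print(aligned.fields['age'][1])              # 12: 'age' starts on a 4-byte boundary
```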

1.5 Copy vs View


1.5.1 Memory Mechanism
• View shares the buffer memory:

– its base attribute refers to the array that owns the memory

• Copy creates a new buffer:

– it costs O(n) memory and time, and its base attribute is None

Listing 5: View vs Copy Example


import numpy as np

a = np.arange(4)  # Buffer: [0, 1, 2, 3]
b = a[1:3]  # View: shares memory with a
c = a[1:3].copy()  # Copy: independent memory
print(b.base is a)  # True - b is a view of a
print(c.base is a)  # False - c is a copy (c.base is None)
# This demonstrates the difference between views and copies
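The practical consequence is that writes through a view are visible in the original array, while a copy is isolated. A short sketch that uses the OWNDATA flag to tell the two apart:

```python
import numpy as np

a = np.arange(4)
b = a[1:3]          # view into a's buffer
c = a[1:3].copy()   # new buffer holding [1, 2]

b[0] = 100          # writes through to a[1]
c[1] = -5           # changes only c

print(a)                                       # a is now [0, 100, 2, 3]
print(b.flags['OWNDATA'], c.flags['OWNDATA'])  # False True
print(c.base is None)                          # True: c owns its memory
```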

2 Probability Distributions in NumPy
2.1 Normal Distribution (Gaussian Distribution)
2.1.1 Measure-Theoretic Theory
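Defined by the density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (2)$$
Important properties:
• Central Limit Theorem (CLT): suitably normalized sums of many independent, identically distributed variables with finite variance converge in distribution to a normal distribution

• Infinitely differentiable

• Gaussian integral: $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$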

Listing 6: Normal Distribution Sampling and Visualization


import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

mu, sigma = 0, 0.1
samples = np.random.normal(mu, sigma, 10000)

# Visualize KDE
sns.kdeplot(samples, bw_adjust=0.5)
plt.title('Normal Distribution (Gaussian)')
plt.show()
# This code generates 10,000 samples from a normal distribution
# with mean 0 and standard deviation 0.1, then plots the density
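Equation (2) can also be checked numerically without plotting. This sketch uses the newer np.random.default_rng generator with an arbitrary seed, rather than the legacy np.random.normal shown above; it evaluates the density on a grid to confirm it integrates to about 1, and checks that roughly 68.3% of samples fall within one standard deviation of the mean.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density, written out term by term from Eq. (2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
mu, sigma = 0.0, 0.1
samples = rng.normal(mu, sigma, 100_000)

# Riemann sum over mu +/- 8 sigma; the tails beyond that are negligible
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 10_001)
area = np.sum(normal_pdf(x, mu, sigma)) * (x[1] - x[0])

within_one_sigma = np.mean(np.abs(samples - mu) < sigma)
print(area, samples.mean(), samples.std(), within_one_sigma)
```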

2.2 Binomial Distribution


2.2.1 Combinatorial Theory
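PMF:
$$P(k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Properties:
• $E[X] = np$

• $\mathrm{Var}(X) = np(1-p)$

• For fixed p, the standardized variable $(X - np)/\sqrt{np(1-p)}$ converges to the standard normal as $n \to \infty$ (De Moivre-Laplace theorem)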

Listing 7: Binomial Distribution Sampling and Visualization


from math import comb
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

n, p = 10, 0.5
samples = np.random.binomial(n, p, 1000)

# Histogram with theoretical bars
x = np.arange(0, n + 1)
# math.comb takes scalars (np.math was removed in NumPy 2.0), so build coefficients per k
pmf = np.array([comb(n, int(k)) for k in x]) * p**x * (1 - p)**(n - x)
sns.histplot(samples, stat='density', discrete=True)
plt.vlines(x, 0, pmf, colors='r', lw=5, alpha=0.5)
plt.show()
# This code generates 1,000 samples from a binomial distribution
# with n=10 and p=0.5, then compares the histogram with the theoretical PMF
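The moment formulas above can be confirmed from the PMF itself and from a larger sample. A sketch; the seed and sample size are arbitrary choices.

```python
from math import comb
import numpy as np

n, p = 10, 0.5
k = np.arange(n + 1)
pmf = np.array([comb(n, int(i)) * p**int(i) * (1 - p)**(n - int(i)) for i in k])

mean = np.sum(k * pmf)               # E[X] computed from the PMF
var = np.sum((k - mean) ** 2 * pmf)  # Var(X) computed from the PMF

rng = np.random.default_rng(seed=1)
samples = rng.binomial(n, p, 200_000)
print(pmf.sum(), mean, n * p, var, n * p * (1 - p))
print(samples.mean(), samples.var())
```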

2.3 Poisson Distribution


2.3.1 Stochastic Processes Theory
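Models the number of events in a fixed interval when events occur independently at a constant average rate λ (typically rare events):
$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!} \qquad (3)$$
Related to:

• Poisson process

• Exponential distribution (time between events)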

Listing 8: Poisson Distribution Sampling and Visualization


import math
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

lambda_ = 4
samples = np.random.poisson(lambda_, 10000)

# Compare with theoretical PMF
k = np.arange(0, 15)
factorials = np.array([math.factorial(int(i)) for i in k])  # np.math was removed in NumPy 2.0
pmf = np.exp(-lambda_) * lambda_**k / factorials
sns.histplot(samples, stat='density', discrete=True)
plt.plot(k, pmf, 'ro-')
plt.show()
# This code generates 10,000 samples from a Poisson distribution
# with lambda=4, then compares the histogram with the theoretical PMF
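The link to the Poisson process and the exponential distribution can be simulated directly. The sketch below, an illustration with an arbitrary seed and horizon, draws exponential inter-arrival times with mean 1/λ, counts the arrivals in each unit-length interval, and checks that the counts have mean and variance close to λ, as a Poisson variable should.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
lam = 4.0       # average number of events per unit time
T = 50_000      # number of unit-length intervals to simulate

# Inter-arrival times of a Poisson process are Exponential with mean 1/lam;
# draw about 10% more gaps than needed so the arrivals cover [0, T)
gaps = rng.exponential(scale=1 / lam, size=int(1.1 * lam * T))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < T]

# Number of events in each interval [t, t + 1)
counts = np.bincount(arrivals.astype(int), minlength=T)
print(counts.mean(), counts.var())  # both should be close to lam
```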

2.4 Uniform Distribution


2.4.1 Measure Theory
Density function:
$$f(x) = \frac{1}{b-a} \quad \forall x \in [a, b] \qquad (4)$$
and $f(x) = 0$ outside $[a, b]$.
Properties:

• Maximum entropy among all continuous distributions supported on [a, b]

• Basis for Monte Carlo methods

Listing 9: Uniform Distribution Example
import numpy as np
import matplotlib.pyplot as plt

# Generate uniform samples
a, b = 0, 10
samples = np.random.uniform(a, b, 5000)

# Plot histogram and theoretical PDF
plt.hist(samples, bins=30, density=True, alpha=0.7)
plt.axhline(y=1/(b - a), color='r', linestyle='-')
plt.title('Uniform Distribution')
plt.show()
# This code generates 5,000 samples from a uniform distribution
# between 0 and 10, and plots the histogram with the theoretical PDF
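As an example of the Monte Carlo use mentioned above, the sketch below estimates $\int_0^{\pi} \sin x \, dx = 2$ as $(b-a)$ times the average of $\sin U$ for $U \sim \mathrm{Uniform}(a, b)$. The integrand, seed, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
a, b = 0.0, np.pi
n = 200_000

u = rng.uniform(a, b, n)
# E[sin(U)] equals the integral of sin over [a, b] divided by (b - a), so rescale
estimate = (b - a) * np.mean(np.sin(u))
std_error = (b - a) * np.std(np.sin(u)) / np.sqrt(n)
print(estimate, "+/-", std_error)  # exact value is 2
```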

3 Universal Functions in NumPy
3.1 Nature of ufuncs
3.1.1 C-level Architecture
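• Ufuncs are implemented as optimized C loops over array elements

• Each ufunc holds a table of type-specific inner loops, one per supported dtype combination (listed in its types attribute)

• They support automatic broadcasting, type casting, and buffering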

Listing 10: Examining ufunc Properties


import numpy as np

# Check ufunc information
add_ufunc = np.add
print("Inputs:", add_ufunc.nin)  # 2
print("Outputs:", add_ufunc.nout)  # 1
print("Signature:", add_ufunc.signature)  # None: a plain element-wise ufunc
# This code examines the properties of the addition ufunc,
# showing it takes 2 inputs and produces 1 output
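Beyond nin, nout, and signature, every ufunc exposes its table of type-specific loops and its reduction methods. A short sketch; np.matmul is used because, unlike np.add, it is a generalized ufunc with core dimensions in its signature.

```python
import numpy as np

print(np.add.ntypes)              # number of registered inner loops
print('dd->d' in np.add.types)    # True: a float64 + float64 -> float64 loop exists
print(np.matmul.signature)        # core dimensions, e.g. '(n?,k),(k,m?)->(n?,m?)'

# Methods shared by binary ufuncs
x = np.array([1, 2, 3, 4])
print(np.add.reduce(x))           # 10: same as x.sum()
print(np.add.accumulate(x))       # running totals: 1, 3, 6, 10
print(np.multiply.outer([1, 2], [10, 20]))  # rows: 10, 20 and 20, 40
```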

3.2 Creating custom ufuncs


3.2.1 PyUFunc_FromFuncAndData Mechanism
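• At the C level, PyUFunc_FromFuncAndData builds a ufunc from an array of compiled inner-loop functions plus their type signatures

• From Python, np.frompyfunc wraps a function as a true ufunc that returns object arrays, while np.vectorize is a convenience wrapper (not a ufunc) that casts results to the dtypes given in otypes; both still call the Python function once per element

• numba.vectorize compiles the function to machine code and produces a NumPy-compatible ufunc, giving much higher performance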

Listing 11: Creating and Benchmarking Custom ufuncs


import numpy as np

# Create ufunc from Python function
def custom_relu(x):
    return x if x > 0 else 0

vec_relu = np.vectorize(custom_relu, otypes=[np.float64])
ufunc_relu = np.frompyfunc(custom_relu, 1, 1)  # returns object-dtype arrays

# Benchmark (IPython/Jupyter %timeit magic; timings depend on the machine)
arr = np.random.randn(10**6)  # randn needs an integer size, not 1e6
%timeit vec_relu(arr)  # e.g. ~500 ms
%timeit ufunc_relu(arr)  # e.g. ~200 ms
# This code creates two versions of a ReLU function and compares their performance.
# frompyfunc is often faster than vectorize for simple operations,
# but both still call the Python function once per element
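The dtype difference between the two wrappers, and the fact that only frompyfunc yields a real ufunc, can be checked directly. The sketch also includes np.maximum(arr, 0.0), a built-in ufunc that computes the same ReLU in compiled code; this is a swapped-in alternative, not one of the two approaches in the listing.

```python
import numpy as np

def custom_relu(x):
    return x if x > 0 else 0

ufunc_relu = np.frompyfunc(custom_relu, 1, 1)
vec_relu = np.vectorize(custom_relu, otypes=[np.float64])

arr = np.array([-1.5, 0.0, 2.0, 3.5])
out_frompyfunc = ufunc_relu(arr)    # object-dtype array of Python numbers
out_vectorize = vec_relu(arr)       # float64, as requested by otypes
out_builtin = np.maximum(arr, 0.0)  # native ufunc, no Python call per element

print(out_frompyfunc.dtype, out_vectorize.dtype, out_builtin.dtype)  # object float64 float64
print(isinstance(ufunc_relu, np.ufunc), isinstance(vec_relu, np.ufunc))  # True False
```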

3.3 Basic Arithmetic
3.3.1 Broadcasting Rules
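• Align the shapes from the right (trailing dimensions first)

• If one array has fewer dimensions, prepend 1s to its shape

• Each pair of aligned dimensions must be equal, or one of them must be 1; size-1 dimensions are stretched to match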

Listing 12: Broadcasting Example


import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([10, 20])

# Broadcasting mechanism
print(A + B)  # [[11, 22], [13, 24]]
# This demonstrates how broadcasting works by adding a 1-D array
# to each row of a 2-D array automatically
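The three rules can be written out as a small function and compared with NumPy's own np.broadcast_shapes (available since NumPy 1.20). broadcast_shape below is a hypothetical helper written only for illustration.

```python
import numpy as np

def broadcast_shape(*shapes):
    """Hypothetical helper: apply the broadcasting rules by hand."""
    ndim = max(len(s) for s in shapes)
    # Align from the right by prepending 1s to shorter shapes
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        # Aligned dimensions must be equal or 1
        if len(sizes) > 1:
            raise ValueError(f"incompatible dimensions {dims}")
        result.append(sizes.pop() if sizes else 1)  # size-1 axes stretch
    return tuple(result)

print(broadcast_shape((2, 2), (2,)))             # (2, 2), as in Listing 12
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))
```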

3.4 Rounding Functions


3.4.1 IEEE 754 Rounding Modes
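• around: round to a given number of decimals; ties are rounded to the nearest even value

• floor: round toward −∞ (largest integer ≤ x)

• ceil: round toward +∞ (smallest integer ≥ x)

• trunc: round toward zero by discarding the fractional part

• rint: round to the nearest integer, with ties to even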

Listing 13: Rounding Function Examples


import numpy as np

arr = np.array([1.2345, -2.5678])
print(np.around(arr, 2))  # [1.23, -2.57]
print(np.floor(arr))  # [1., -3.]
print(np.rint(arr))  # [1., -3.] (round to nearest integer)
# This code demonstrates different rounding functions in NumPy,
# each with slightly different behavior for positive and negative numbers
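Ties and negative inputs are where these functions differ most. A small sketch; the comments list the element values each call returns.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, -0.5, -1.5])
print(np.rint(halves))    # ties go to the even neighbour: 0, 2, 2, -0, -2
print(np.around(halves))  # same round-half-to-even rule for decimals=0

x = np.array([-2.7, -2.5, 2.5, 2.7])
print(np.floor(x))  # toward -inf: -3, -3, 2, 2
print(np.ceil(x))   # toward +inf: -2, -2, 3, 3
print(np.trunc(x))  # toward zero: -2, -2, 2, 2
```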

3.5 Logarithm Functions


3.5.1 Floating-point arithmetic optimization
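• log1p(x) computes ln(1 + x) accurately when |x| is small, where forming 1 + x first would discard the low-order digits of x

• logaddexp(a, b) computes log(exp(a) + exp(b)) without overflow or underflow in the intermediate exponentials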

Listing 14: Optimized Logarithm Functions
import numpy as np

# Standard vs optimized logarithm
x_small = 1e-10
print(np.log(1 + x_small))  # May lose precision
print(np.log1p(x_small))  # More accurate

# Numerical stability in log-space calculations
a, b = 1000, 1000
# Direct calculation would overflow: np.exp(1000) is inf
result = np.logaddexp(a, b)
print(result)  # log(e^1000 + e^1000) = 1000 + log(2) ≈ 1000.693
# These functions provide numerical stability for calculations
# involving very small or very large numbers
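The stability of logaddexp comes from factoring out the larger argument: $\log(e^a + e^b) = \max(a, b) + \log(1 + e^{-|a-b|})$, so the remaining exponential never exceeds 1. The sketch below implements that identity as a hypothetical helper, stable_logaddexp, using log1p for the second term, and compares it with the naive formula and with np.logaddexp.

```python
import numpy as np

def stable_logaddexp(a, b):
    """Hypothetical helper: log(exp(a) + exp(b)) via the max trick."""
    m = np.maximum(a, b)
    return m + np.log1p(np.exp(-np.abs(a - b)))

with np.errstate(over='ignore'):
    naive = np.log(np.exp(1000.0) + np.exp(1000.0))  # exp(1000) overflows to inf

print(naive)                             # inf
print(stable_logaddexp(1000.0, 1000.0))  # about 1000.693
print(np.logaddexp(1000.0, 1000.0))      # about 1000.693
```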

4 Advanced NumPy Features


4.1 Structured Arrays
Structured arrays allow for complex data types with named fields:
Listing 15: Structured Arrays Example
import numpy as np

# Create a structured array for personnel records
personnel_dtype = np.dtype([
    ('name', 'U20'),
    ('age', 'i4'),
    ('salary', 'f8'),
    ('department', 'U20')
])

# Create an array with this structure
employees = np.array([
    ('Alice', 30, 75000.0, 'Engineering'),
    ('Bob', 35, 65000.0, 'Marketing'),
    ('Charlie', 45, 85000.0, 'Management')
], dtype=personnel_dtype)

# Access by field name
print(employees['name'])
print(employees['salary'].mean())  # 75000.0
# This demonstrates how to create and use structured arrays
# with named fields for complex data organization

4.2 Masked Arrays


Masked arrays allow for handling missing or invalid data:
Listing 16: Masked Arrays Example
import numpy as np
from numpy import ma

# Create data with some invalid values
data = np.array([1, 2, -999, 4, -999, 6])
masked_data = ma.masked_values(data, -999)

# Operations ignore masked values
print(masked_data.mean())  # Average of [1, 2, 4, 6] = 3.25
print(masked_data.std())  # Standard deviation of [1, 2, 4, 6]

# Fill masked values
filled_data = masked_data.filled(0)  # Replace with zeros
print(filled_data)  # [1, 2, 0, 4, 0, 6]
# Masked arrays are useful for handling missing data:
# masked entries are excluded from statistical calculations
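When invalid entries are NaN or infinity rather than a sentinel such as -999, ma.masked_invalid builds the mask automatically, and compressed() returns only the unmasked values. A brief sketch with made-up readings:

```python
import numpy as np
from numpy import ma

readings = np.array([1.0, np.nan, 3.0, np.inf, 5.0])  # illustrative data
masked = ma.masked_invalid(readings)  # masks NaN and +/-inf

print(masked.mask)          # True where the reading is invalid
print(masked.compressed())  # only the valid values: 1, 3, 5
print(masked.mean())        # 3.0
```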

4.3 Memory Management


Understanding memory layout is crucial for optimizing NumPy operations:
Listing 17: Memory Layout Analysis
import numpy as np

# Create arrays with different memory layouts
C_array = np.ones((1000, 1000), order='C')  # Row-major
F_array = np.ones((1000, 1000), order='F')  # Column-major

# Check memory layout
print(C_array.flags)
print(F_array.flags)

# Performance comparison (IPython %timeit magic; timings are machine-dependent)
%timeit C_array.sum(axis=0)  # Column sums: each column is strided in C order
%timeit F_array.sum(axis=0)  # Column sums: each column is contiguous in F order
# Understanding memory layout is important for performance:
# operations that access memory in storage order are usually faster,
# although NumPy's iterator can reorder loops and narrow the gap
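The flags printed above can be complemented by the strides, which show directly how far apart neighbouring rows and columns are in memory. A sketch; note that transposing only swaps the strides, whereas converting the layout with np.asfortranarray copies the data.

```python
import numpy as np

C_array = np.ones((1000, 1000), order='C')
F_array = np.ones((1000, 1000), order='F')

print(C_array.strides)  # (8000, 8): moving down one row skips 8000 bytes
print(F_array.strides)  # (8, 8000): moving right one column skips 8000 bytes

# Transposing swaps the strides without touching the data
T = C_array.T
print(T.flags['F_CONTIGUOUS'], np.shares_memory(C_array, T))  # True True

# Changing the layout requires a copy
F_copy = np.asfortranarray(C_array)
print(F_copy.flags['F_CONTIGUOUS'], np.shares_memory(C_array, F_copy))  # True False
```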

5 Conclusion
NumPy provides a powerful foundation for numerical computing in Python through its efficient array operations, comprehensive mathematical functions, and optimized implementation. The combination of a single memory buffer per array, homogeneous data types, and vectorized operations enables high-performance computations that would be significantly slower using standard Python data structures.
The sampling functions in np.random offer tools for statistical analysis and simulation, while universal functions (ufuncs) provide optimized element-wise operations that can be customized for specific applications. Understanding these core components is essential for effective scientific computing and data analysis with Python.
