NumPy: Fundamentals, Probability Distributions,
and Universal Functions
Compiled Documentation
July 15, 2025
Contents
1 Introduction to NumPy
1.1 Foundational Theory
1.2 Creating Arrays
1.2.1 Memory Analysis
1.3 Indexing & Slicing
1.3.1 Access Mechanism
1.4 Data Types
1.4.1 Optimization
1.5 Copy vs View
1.5.1 Memory Mechanism
2 Probability Distributions in NumPy
2.1 Normal Distribution (Gaussian Distribution)
2.1.1 Measure-Theoretic Theory
2.2 Binomial Distribution
2.2.1 Combinatorial Theory
2.3 Poisson Distribution
2.3.1 Stochastic Processes Theory
2.4 Uniform Distribution
2.4.1 Measure Theory
3 Universal Functions in NumPy
3.1 Nature of ufuncs
3.1.1 C-level Architecture
3.2 Creating custom ufuncs
3.2.1 PyUFunc_FromFuncAndData Mechanism
3.3 Basic Arithmetic
3.3.1 Broadcasting Rules
3.4 Rounding Functions
3.4.1 IEEE 754 Rounding Modes
3.5 Logarithm Functions
3.5.1 Floating-point arithmetic optimization
4 Advanced NumPy Features
4.1 Structured Arrays
4.2 Masked Arrays
4.3 Memory Management
5 Conclusion
1 Introduction to NumPy
1.1 Foundational Theory
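NumPy is built around the multi-dimensional array object (ndarray), which has three key characteristics:
• Contiguous memory layout: data is stored in a single buffer, contiguous by default (views may step through it non-contiguously, as described by their strides)
• Homogeneous data types: all elements share the same data type (dtype)
• Vectorized operations: operations apply to the entire array at once, with the loops running in compiled C code instead of explicit Python loops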
Listing 1: Basic NumPy Import
import numpy as np
print("NumPy version:", np.__version__)
# This code imports NumPy and prints its version number
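The three characteristics above can be inspected directly on an array. The sketch below assumes an explicit int64 dtype so that the stride values are predictable; it shows the shared dtype, the strides that describe the buffer layout, and how a transposed view reuses the same buffer without being C-contiguous.

```python
import numpy as np

# A 2x3 array of 8-byte integers; int64 is set explicitly so the strides are predictable
a = np.arange(6, dtype=np.int64).reshape(2, 3)

print(a.dtype)                    # int64: one dtype shared by every element
print(a.shape)                    # (2, 3)
print(a.strides)                  # (24, 8): bytes to step along axis 0 and axis 1
print(a.flags['C_CONTIGUOUS'])    # True: a single unbroken row-major buffer

# A transposed view reuses the buffer with swapped strides, so it is not C-contiguous
print(a.T.strides, a.T.flags['C_CONTIGUOUS'])  # (8, 24) False

# Homogeneity: mixing ints and floats converts every element to float64
print(np.array([1, 2.5, 3]).dtype)  # float64

# Vectorization: a single expression; the loop runs in compiled code
print(a * 2)
```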
1.2 Creating Arrays
1.2.1 Memory Analysis
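NumPy arrays use less memory than Python lists because:
• They store raw values rather than pointers to separate Python objects, each of which carries its own type and reference-count header
• Each array has a single fixed-size header (on the order of 100 bytes, depending on NumPy version and platform) regardless of its length
• Elements are packed at their native size (for example, 1 byte per int8 element) and aligned to suit the CPU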
Listing 2: Memory Comparison
import numpy as np
arr = np.array([1, 2, 3], dtype=np.int8)  # data buffer is 3 bytes (arr.nbytes == 3)
lst = [1, 2, 3]  # one 8-byte pointer per element, each to an int object of ~28 bytes
# This demonstrates the memory efficiency of NumPy arrays
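A rough size comparison at a larger scale makes the per-element overhead visible. This sketch assumes 64-bit CPython and uses sys.getsizeof, which reports only each object's own footprint, so the list total is an estimate. The values start at 1000 so that every int is a distinct object rather than one of CPython's cached small ints.

```python
import sys
import numpy as np

n = 1000
# Values start at 1000 so each int is a distinct object (CPython caches only small ints)
lst = list(range(1000, 1000 + n))
arr = np.array(lst, dtype=np.int64)

# List cost: the list object (an 8-byte pointer per slot) plus every int object it points to
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)

# Array cost: the raw data buffer, size * itemsize (the array header comes on top)
print(arr.itemsize)             # 8
print(arr.nbytes)               # 8000
print(list_bytes)               # exact figure depends on the Python version
print(list_bytes / arr.nbytes)  # several times larger for the list
```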
1.3 Indexing & Slicing
1.3.1 Access Mechanism
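The address of an element is the base pointer plus a byte offset computed from the strides:
$$\text{offset} = i \cdot \text{strides}[0] + j \cdot \text{strides}[1] + \cdots \qquad (1)$$
• Basic slicing returns a view that references the original buffer
• Fancy indexing (with integer or boolean arrays) returns a new copy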
Listing 3: Array Slicing Example
import numpy as np
data = np.arange(10, dtype=np.int64)
view = data[3:7]  # stride = 8 bytes, offset = 3 * 8 = 24 bytes
# Creates a view of elements 3 through 6, sharing memory with the original array
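Equation (1) can be checked by hand: compute the byte offset from the strides, then read the raw buffer at that position. The sketch below assumes a C-ordered int64 array (so tobytes() returns the buffer in storage order) and also uses np.shares_memory to confirm the view/copy distinction.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # strides (32, 8)
i, j = 2, 1

# Byte offset from equation (1): 2 * 32 + 1 * 8 = 72
offset = i * a.strides[0] + j * a.strides[1]

# Reinterpret the raw C-ordered buffer and index it by element position
flat = np.frombuffer(a.tobytes(), dtype=a.dtype)
print(flat[offset // a.itemsize], a[i, j])        # 9 9

# Slicing returns a view; fancy indexing returns a copy
print(np.shares_memory(a, a[1:, ::2]))  # True
print(np.shares_memory(a, a[[0, 2]]))   # False
```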
1.4 Data Types
1.4.1 Optimization
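• Type coercion: mixed inputs are converted to a common dtype according to NumPy's type promotion rules
• Memory alignment: buffers are normally aligned to the element size (for example, 4 bytes for int32 and 8 bytes for float64)
• Structured arrays: compound dtypes with named fields of different types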
Listing 4: Data Type Examples
import numpy as np
# Implicit type conversion
arr = np.array([1, 2.5])  # float64
# Structured dtype
dt = np.dtype([('name', 'S10'), ('age', 'i4')])
# The first example shows automatic promotion to float64
# The second example creates a structured data type with name and age fields
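The promotion and memory points above can be queried without building large arrays. This sketch uses np.promote_types and np.result_type to show which dtype NumPy picks, astype to show the memory saved by a smaller dtype, and the align flag of np.dtype to show how alignment adds padding to the structured dtype from Listing 4.

```python
import numpy as np

# Promotion: the result is the smallest dtype that can represent both inputs
print(np.promote_types(np.int8, np.int16))     # int16
print(np.promote_types(np.int64, np.float32))  # float64
print(np.result_type(np.uint8, np.int8))       # int16

# Choosing a smaller dtype explicitly halves the buffer
x64 = np.zeros(1_000, dtype=np.float64)
x32 = x64.astype(np.float32)
print(x64.nbytes, x32.nbytes)                  # 8000 4000

# Structured dtype: S10 (10 bytes) + i4 (4 bytes), packed by default
dt = np.dtype([('name', 'S10'), ('age', 'i4')])
print(dt.itemsize)                             # 14
# align=True pads 'age' to a 4-byte boundary
print(np.dtype([('name', 'S10'), ('age', 'i4')], align=True).itemsize)  # 16
```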
1.5 Copy vs View
1.5.1 Memory Mechanism
• View shares buffer memory:
– base attribute points to the original array
• Copy creates a new buffer:
– Uses O(n) memory and time
Listing 5: View vs Copy Example
import numpy as np
a = np.arange(4)  # buffer: [0, 1, 2, 3]
b = a[1:3]  # view: shares memory with a
c = a[1:3].copy()  # copy: independent memory
print(b.base is a)  # True: b is a view of a
print(c.base is a)  # False: c is a copy (its base is None)
# This demonstrates the difference between views and copies
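The practical consequence of sharing a buffer is that writes through a view change the original, while writes to a copy do not. This short sketch extends Listing 5 with a write to each and checks ownership with the OWNDATA flag and np.shares_memory.

```python
import numpy as np

a = np.arange(4)
b = a[1:3]          # view
c = a[1:3].copy()   # copy, taken before any writes: [1, 2]

b[0] = 100          # writes through to a's buffer
c[1] = -1           # touches only c's private buffer

print(a)                         # [  0 100   2   3]
print(c)                         # [ 1 -1]
print(c.base is None)            # True: c owns its data
print(b.flags['OWNDATA'])        # False: b borrows a's buffer
print(np.shares_memory(a, b), np.shares_memory(a, c))  # True False
```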
2 Probability Distributions in NumPy
2.1 Normal Distribution (Gaussian Distribution)
2.1.1 Measure-Theoretic Theory
Defined by the density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (2)$$
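Important properties:
• Central Limit Theorem (CLT): suitably standardized sums of independent, identically distributed variables with finite variance converge in distribution to a normal distribution
• The density is infinitely differentiable
• Gaussian integral: $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$, which yields the normalizing constant $\frac{1}{\sigma\sqrt{2\pi}}$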
Listing 6: Normal Distribution Sampling and Visualization
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

mu, sigma = 0, 0.1
samples = np.random.normal(mu, sigma, 10000)

# Visualize KDE
sns.kdeplot(samples, bw_adjust=0.5)
plt.title('Normal Distribution (Gaussian)')
plt.show()
# This code generates 10,000 samples from a normal distribution
# with mean 0 and standard deviation 0.1, then plots a kernel density estimate
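A plot-free check of the same ideas: compare the sample moments with μ and σ, compare a normalized histogram with the density in equation (2), and illustrate the CLT with means of uniform draws. The seed, sample sizes, bin count, and the helper name normal_pdf are illustrative choices, not part of the original text; the example uses the newer Generator API (np.random.default_rng) instead of the legacy np.random.normal.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 0.0, 0.1
samples = rng.normal(mu, sigma, 100_000)

# Sample moments should be close to mu and sigma
print(samples.mean(), samples.std())

def normal_pdf(x, mu, sigma):
    """Hypothetical helper: equation (2) evaluated element-wise."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Normalized histogram versus the analytic density (peak value is about 3.99)
hist, edges = np.histogram(samples, bins=50, range=(-0.4, 0.4), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - normal_pdf(centers, mu, sigma))))

# CLT: means of 30 Uniform(0, 1) draws are approximately normal
means = rng.uniform(0, 1, size=(50_000, 30)).mean(axis=1)
print(means.mean(), means.std())   # near 0.5 and sqrt(1/360), about 0.0527
```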
2.2 Binomial Distribution
2.2.1 Combinatorial Theory
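PMF: $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, \ldots, n$
Properties:
• $E[X] = np$
• $\mathrm{Var}(X) = np(1-p)$
• De Moivre-Laplace theorem: for fixed p, the standardized variable $(X - np)/\sqrt{np(1-p)}$ converges in distribution to the standard normal as $n \to \infty$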
Listing 7: Binomial Distribution Sampling and Visualization
from math import comb
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

n, p = 10, 0.5
samples = np.random.binomial(n, p, 1000)

# Histogram with theoretical PMF bars
x = np.arange(0, n + 1)
# math.comb takes scalars (np.math was only an alias of Python's math module
# and is gone from recent NumPy), so build the coefficients element by element
pmf = np.array([comb(n, int(k)) for k in x]) * p**x * (1 - p)**(n - x)
sns.histplot(samples, stat='density', discrete=True)
plt.vlines(x, 0, pmf, colors='r', lw=5, alpha=0.5)
plt.show()
# This code generates 1,000 samples from a binomial distribution
# with n = 10 and p = 0.5, then compares the histogram with the theoretical PMF
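The listed properties can be verified numerically. This sketch checks the sample mean and variance against np and np(1 − p), confirms the PMF sums to 1, and illustrates the De Moivre-Laplace theorem by comparing an exact binomial CDF value with its normal approximation (with a continuity correction of 0.5). The seed, N = 1000, and the cut-off m = 520 are arbitrary illustrative choices; the normal CDF is built from math.erf.

```python
from math import comb, erf, sqrt
import numpy as np

rng = np.random.default_rng(seed=1)
n, p = 10, 0.5
samples = rng.binomial(n, p, 200_000)

# Sample moments versus E[X] = np and Var(X) = np(1 - p)
print(samples.mean(), n * p)            # both close to 5.0
print(samples.var(), n * p * (1 - p))   # both close to 2.5

# The exact PMF sums to 1
k = np.arange(n + 1)
pmf = np.array([comb(n, int(i)) for i in k]) * p**k * (1 - p) ** (n - k)
print(pmf.sum())

# De Moivre-Laplace: P(X <= m) is close to Phi((m + 0.5 - Np) / sqrt(Np(1 - p))) for large N
N, m = 1000, 520
exact = sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(m + 1))
z = (m + 0.5 - N * p) / sqrt(N * p * (1 - p))
approx = 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF via math.erf
print(exact, approx)
```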
2.3 Poisson Distribution
2.3.1 Stochastic Processes Theory
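Models the number of events in a fixed interval when events occur independently at a constant average rate λ, which makes it a standard model for rare events:
$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots \qquad (3)$$
Related to:
• Poisson process: the number of events in an interval of length t is Poisson with mean λt
• Exponential distribution: the times between consecutive events in a Poisson process with rate λ are exponential with mean 1/λ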
Listing 8: Poisson Distribution Sampling and Visualization
from math import factorial
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

lambda_ = 4
samples = np.random.poisson(lambda_, 10000)

# Compare with theoretical PMF
k = np.arange(0, 15)
# math.factorial replaces np.math.factorial (np.math is gone from recent NumPy)
pmf = np.exp(-lambda_) * (lambda_ ** k) / np.array([factorial(int(i)) for i in k])
sns.histplot(samples, stat='density', discrete=True)
plt.plot(k, pmf, 'ro-')
plt.show()
# This code generates 10,000 samples from a Poisson distribution
# with lambda = 4, then compares the histogram with the theoretical PMF
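The link between the Poisson distribution, the Poisson process, and exponential waiting times can be simulated directly. This sketch draws Poisson counts (whose mean and variance should both be close to λ), then builds a Poisson process from exponential gaps with mean 1/λ and counts arrivals per unit interval. The seed, the horizon T, and the 10% surplus of gaps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
lam = 4.0

# Poisson counts: mean and variance should both be close to lambda
counts = rng.poisson(lam, 100_000)
print(counts.mean(), counts.var())

# Poisson process: exponential gaps with mean 1 / lambda (NumPy's parameter is the scale)
T = 100_000                                         # number of unit intervals simulated
gaps = rng.exponential(1 / lam, size=int(lam * T * 1.1))  # surplus so total time exceeds T
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < T]

# Count arrivals in each interval [t, t + 1)
per_interval = np.bincount(arrivals.astype(int), minlength=T)
print(per_interval.mean(), per_interval.var())      # both close to lambda
```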
2.4 Uniform Distribution
2.4.1 Measure Theory
Density function:
$$f(x) = \frac{1}{b-a} \quad \forall x \in [a, b], \qquad f(x) = 0 \text{ otherwise} \qquad (4)$$
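Properties:
• Maximum entropy among all continuous distributions supported on [a, b]
• Basis for Monte Carlo methods: uniform draws are the raw input that other samplers transform into different distributions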
Listing 9: Uniform Distribution Example
import numpy as np
import matplotlib.pyplot as plt

# Generate uniform samples
a, b = 0, 10
samples = np.random.uniform(a, b, 5000)

# Plot histogram and theoretical PDF
plt.hist(samples, bins=30, density=True, alpha=0.7)
plt.axhline(y=1/(b - a), color='r', linestyle='-')
plt.title('Uniform Distribution')
plt.show()
# This code generates 5,000 samples from a uniform distribution
# between 0 and 10, and plots the histogram with the theoretical PDF
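Two small Monte Carlo uses of uniform samples, as a sketch: estimating an integral as (b − a) times the average of g(U), and inverse-transform sampling, where −ln(1 − U)/λ turns a Uniform(0, 1) draw into an exponential draw with mean 1/λ. The integrand sin(x) on [0, π] (exact value 2), the seed, and the sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
N = 1_000_000

# Monte Carlo integration: (b - a) * mean(g(U)) estimates the integral of g over [a, b]
a, b = 0.0, np.pi
u = rng.uniform(a, b, N)
estimate = (b - a) * np.mean(np.sin(u))
print(estimate)      # close to the exact value 2

# Inverse-transform sampling: U in [0, 1), so 1 - U is in (0, 1] and the log is finite
lam = 4.0
expo = -np.log1p(-rng.uniform(0.0, 1.0, N)) / lam
print(expo.mean())   # close to 1 / lam = 0.25
```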
3 Universal Functions in NumPy
3.1 Nature of ufuncs
3.1.1 C-level Architecture
• Ufuncs are implemented as optimized C loops
• Each ufunc keeps a table of type-specific inner loops and selects one based on the input dtypes
• Support automatic broadcasting and buffering
Listing 10: Examining ufunc Properties
import numpy as np

# Check ufunc information
add_ufunc = np.add
print("Inputs:", add_ufunc.nin)  # 2
print("Outputs:", add_ufunc.nout)  # 1
print("Signature:", add_ufunc.signature)  # None: element-wise, not a generalized ufunc
# This code examines the properties of the addition ufunc,
# showing it takes 2 inputs and produces 1 output
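Beyond nin and nout, a ufunc exposes its loop table and a set of methods shared by every binary ufunc. This sketch prints the registered type signatures of np.add (their number varies by NumPy version), uses reduce, accumulate, and outer, and contrasts np.add with np.matmul, a generalized ufunc whose signature is not None.

```python
import numpy as np

# Type-specific inner loops, written as "inputs->output" codes (e.g. 'dd->d' for float64)
print(np.add.ntypes)        # number of registered loops; varies by NumPy version
print(np.add.types[:5])

# Methods available on every binary ufunc
x = np.array([1, 2, 3, 4])
print(np.add.reduce(x))             # 10, same as x.sum()
print(np.add.accumulate(x))         # [ 1  3  6 10], same as np.cumsum(x)
print(np.multiply.outer(x[:2], x))  # 2x4 table of products

# Generalized ufuncs declare core dimensions in .signature
print(np.add.signature, np.matmul.signature)
```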
3.2 Creating custom ufuncs
3.2.1 PyUFunc_FromFuncAndData Mechanism
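• In C, PyUFunc_FromFuncAndData builds a ufunc from an array of type-specific loop functions
• np.frompyfunc wraps any Python function as a ufunc that returns object arrays; np.vectorize is a convenience layer built on it that adds output-type (otypes) handling. Both still call the Python function once per element
• numba.vectorize compiles the function to machine code and builds a true ufunc, which is typically much faster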
Listing 11: Creating and Benchmarking Custom ufuncs
import numpy as np

# Create ufunc from Python function
def custom_relu(x):
    return x if x > 0 else 0

vec_relu = np.vectorize(custom_relu, otypes=[np.float64])
ufunc_relu = np.frompyfunc(custom_relu, 1, 1)  # returns an object array

# Benchmark (%timeit is an IPython/Jupyter magic; timings are machine-dependent)
arr = np.random.randn(10**6)  # randn needs an integer size, so 1e6 would fail
%timeit vec_relu(arr)  # ~500 ms
%timeit ufunc_relu(arr)  # ~200 ms
# This code creates two versions of a ReLU function and compares their performance
# frompyfunc usually has less overhead than vectorize for simple operations,
# but its result has dtype object
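When an operation can be expressed with existing compiled ufuncs, no custom ufunc is needed at all. This sketch compares the frompyfunc ReLU with np.maximum(arr, 0.0), confirms they agree, and shows the object dtype that frompyfunc produces; it uses the standard-library timeit module so it runs outside IPython. The array size and repeat count are arbitrary, and the printed timings will vary by machine.

```python
import timeit
import numpy as np

def custom_relu(x):
    return x if x > 0 else 0.0

arr = np.random.default_rng(0).standard_normal(100_000)

py_ufunc = np.frompyfunc(custom_relu, 1, 1)
out_obj = py_ufunc(arr)
print(out_obj.dtype)              # object: frompyfunc always returns object arrays
out_py = out_obj.astype(np.float64)

# Same result from an existing compiled ufunc
out_native = np.maximum(arr, 0.0)
print(np.array_equal(out_py, out_native))  # True

t_py = timeit.timeit(lambda: py_ufunc(arr), number=5)
t_native = timeit.timeit(lambda: np.maximum(arr, 0.0), number=5)
print(f"frompyfunc: {t_py:.3f}s  np.maximum: {t_native:.3f}s")
```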
3.3 Basic Arithmetic
3.3.1 Broadcasting Rules
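• Align shapes from the right (compare trailing dimensions first)
• If one array has fewer dimensions, prepend 1s to its shape
• Each pair of dimensions must be equal or one of them must be 1; size-1 dimensions are stretched to match, and any other mismatch raises a ValueError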
Listing 12: Broadcasting Example
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([10, 20])

# Broadcasting mechanism: B's shape (2,) is treated as (1, 2), then stretched to (2, 2)
print(A + B)  # [[11, 22], [13, 24]]
# This demonstrates how broadcasting works by adding a 1D array
# to each row of a 2D array automatically
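The three broadcasting rules listed above are simple enough to implement directly. The function broadcast_shape below is a hypothetical helper written for illustration, not a NumPy API; its result is compared against np.broadcast_shapes, which NumPy provides for the same purpose.

```python
import numpy as np

def broadcast_shape(*shapes):
    """Hypothetical helper: apply the three broadcasting rules to compute the result shape."""
    ndim = max(len(s) for s in shapes)
    # Rules 1 and 2: align from the right by left-padding shorter shapes with 1s
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        # Rule 3: all sizes other than 1 must agree
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"incompatible dimensions: {dims}")
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)

print(broadcast_shape((2, 2), (2,)))                 # (2, 2), as in Listing 12
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))      # (8, 7, 6, 5)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```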
3.4 Rounding Functions
3.4.1 IEEE 754 Rounding Modes
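• around: round to a given number of decimal places; exact halves go to the nearest even value
• floor: round toward −∞
• ceil: round toward +∞
• trunc: round toward zero (drop the fractional part)
• rint: round to the nearest integer, with ties going to the even neighbour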
Listing 13: Rounding Function Examples
import numpy as np
arr = np.array([1.2345, -2.5678])
print(np.around(arr, 2))  # [1.23, -2.57]
print(np.floor(arr))  # [1., -3.]
print(np.ceil(arr))  # [2., -2.]
print(np.trunc(arr))  # [1., -2.]
print(np.rint(arr))  # [1., -3.] (round to nearest integer)
# This code demonstrates different rounding functions in NumPy,
# each with slightly different behavior for positive and negative numbers
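The differences between the rounding functions show up most clearly at exact halves. This sketch applies each function to a set of ties, shows that rint rounds ties to even, and contrasts it with a round-half-away-from-zero rule written as a hypothetical helper (not a NumPy API). The last line compares np.around with Python's built-in round on 2.675, whose binary value lies just below 2.675; the two functions use different algorithms and can disagree on such inputs.

```python
import numpy as np

ties = np.array([0.5, 1.5, 2.5, -0.5, -1.5])
print(np.rint(ties))   # [ 0.  2.  2. -0. -2.]  ties go to the even neighbour
print(np.trunc(ties))  # [ 0.  1.  2. -0. -1.]
print(np.floor(ties))  # [ 0.  1.  2. -1. -2.]
print(np.ceil(ties))   # [ 1.  2.  3. -0. -1.]

def round_half_away(x):
    """Hypothetical helper: round half away from zero, for comparison with rint."""
    return np.sign(x) * np.floor(np.abs(x) + 0.5)

print(round_half_away(ties))  # [ 1.  2.  3. -1. -2.]

# np.around scales by 10**decimals, rounds, then scales back, so it can differ
# from Python's correctly rounded round() on values near a tie
print(np.around(2.675, 2), round(2.675, 2))
```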
3.5 Logarithm Functions
3.5.1 Floating-point arithmetic optimization
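• log1p(x) computes ln(1 + x) accurately when x ≈ 0; forming 1 + x first would round away most of the digits of x
• logaddexp(a, b) computes log(exp(a) + exp(b)) without overflow by factoring out the larger of a and b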
Listing 14: Optimized Logarithm Functions
import numpy as np
# Standard vs optimized logarithm
x_small = 1e-10
print(np.log(1 + x_small))  # may lose precision
print(np.log1p(x_small))  # more accurate

# Numerical stability in log-space calculations
a, b = 1000, 1000
# Direct calculation, np.log(np.exp(a) + np.exp(b)), would overflow
result = np.logaddexp(a, b)
print(result)  # log(e^1000 + e^1000) = 1000 + log(2) ≈ 1000.693
# These functions provide numerical stability for calculations
# involving very small or very large numbers
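The log-sum-exp trick behind logaddexp can be written in two lines, and the accuracy gain of log1p can be measured against a Taylor-series reference. In this sketch logaddexp_manual is a hypothetical helper for illustration; the reference value x − x²/2 is accurate here because the next term, x³/3, is about 3e-31.

```python
import numpy as np

def logaddexp_manual(a, b):
    """Hypothetical helper: log(exp(a) + exp(b)) computed by factoring out the larger exponent."""
    m = np.maximum(a, b)
    return m + np.log1p(np.exp(-np.abs(a - b)))

a, b = 1000.0, 1000.0
with np.errstate(over='ignore'):
    naive = np.log(np.exp(a) + np.exp(b))   # exp(1000) overflows to inf
print(naive)                      # inf
print(logaddexp_manual(a, b))     # about 1000.693
print(np.logaddexp(a, b))         # about 1000.693

# Relative error of log(1 + x) versus log1p(x) for a tiny x
x = 1e-10
reference = x - x**2 / 2          # Taylor series of ln(1 + x)
err_log = abs(np.log(1 + x) - reference) / reference
err_log1p = abs(np.log1p(x) - reference) / reference
print(err_log)     # around 1e-7
print(err_log1p)   # near machine precision
```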
4 Advanced NumPy Features
4.1 Structured Arrays
Structured arrays allow for complex data types with named fields:
Listing 15: Structured Arrays Example
import numpy as np
# Create a structured array for personnel records
personnel_dtype = np.dtype([
    ('name', 'U20'),
    ('age', 'i4'),
    ('salary', 'f8'),
    ('department', 'U20')
])

# Create an array with this structure
employees = np.array([
    ('Alice', 30, 75000.0, 'Engineering'),
    ('Bob', 35, 65000.0, 'Marketing'),
    ('Charlie', 45, 85000.0, 'Management')
], dtype=personnel_dtype)

# Access by field name
print(employees['name'])
print(employees['salary'].mean())
# This demonstrates how to create and use structured arrays
# with named fields for complex data organization
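Structured arrays also support record-level filtering and sorting by field. This sketch reuses the personnel dtype from Listing 15 to select records with a boolean mask on one field, sort with the order argument of np.sort, and show the fixed record size (a 'U20' field stores 20 four-byte characters).

```python
import numpy as np

personnel_dtype = np.dtype([('name', 'U20'), ('age', 'i4'),
                            ('salary', 'f8'), ('department', 'U20')])
employees = np.array([
    ('Alice', 30, 75000.0, 'Engineering'),
    ('Bob', 35, 65000.0, 'Marketing'),
    ('Charlie', 45, 85000.0, 'Management'),
], dtype=personnel_dtype)

# A boolean mask on one field selects whole records
senior = employees[employees['age'] > 32]
print(senior['name'])                  # ['Bob' 'Charlie']

# Sort records by a named field
by_salary = np.sort(employees, order='salary')
print(by_salary['name'])               # ['Bob' 'Alice' 'Charlie']

# Fixed record size: 80 + 4 + 8 + 80 bytes
print(personnel_dtype.itemsize)        # 172
print(employees[0]['department'])      # Engineering
```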
4.2 Masked Arrays
Masked arrays allow for handling missing or invalid data:
Listing 16: Masked Arrays Example
import numpy as np
from numpy import ma

# Create data with some invalid values
data = np.array([1, 2, -999, 4, -999, 6])
masked_data = ma.masked_values(data, -999)

# Operations ignore masked values
print(masked_data.mean())  # average of [1, 2, 4, 6] = 3.25
print(masked_data.std())  # standard deviation of [1, 2, 4, 6]

# Fill masked values
filled_data = masked_data.filled(0)  # replace with zeros
print(filled_data)  # [1, 2, 0, 4, 0, 6]
# Masked arrays are useful for handling missing data
# without letting placeholder values distort statistical calculations
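Masks need not come from a sentinel value. This sketch uses ma.masked_invalid to mask NaN and infinite readings, and ma.masked_where to mask by an arbitrary condition; the temperature data and the threshold of 50 are hypothetical values chosen for illustration. compressed() returns only the unmasked values as a plain array.

```python
import numpy as np
from numpy import ma

readings = np.array([1.0, np.nan, 3.0, np.inf, 5.0])

# masked_invalid masks NaN and +/-inf
m = ma.masked_invalid(readings)
print(m.mask)          # [False  True False  True False]
print(m.mean())        # 3.0, the mean of [1, 3, 5]

# Masks can also come from any boolean condition (hypothetical sensor-error threshold)
temps = np.array([21.5, 22.0, 85.0, 21.8])
clean = ma.masked_where(temps > 50, temps)
print(clean.mean())
print(clean.compressed())  # [21.5 22.  21.8]
```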
4.3 Memory Management
Understanding memory layout is crucial for optimizing NumPy operations:
Listing 17: Memory Layout Analysis
import numpy as np
# Create arrays with different memory layouts
C_array = np.ones((1000, 1000), order='C')  # row-major
F_array = np.ones((1000, 1000), order='F')  # column-major

# Check memory layout
print(C_array.flags)
print(F_array.flags)

# Performance comparison (%timeit is an IPython/Jupyter magic)
%timeit C_array.sum(axis=1)  # each row is contiguous in C order
%timeit F_array.sum(axis=1)  # row elements are 8,000 bytes apart in F order
# NumPy's iterator can reorder loops, so reduction timings may differ only modestly;
# in general, operations that access memory in the same order as storage are faster
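The layout difference is visible in the strides, and it matters most when a single row or column is extracted. This sketch assumes a 1000x1000 float64 array: copying one row is a contiguous read in C order but a strided read in F order, and converting between layouts with np.ascontiguousarray costs a full copy. The timings are printed for inspection only and will vary by machine.

```python
import timeit
import numpy as np

C = np.ones((1000, 1000), order='C')
F = np.asfortranarray(C)           # same values, column-major buffer

print(C.strides, F.strides)        # (8000, 8) (8, 8000)

# Copying one row: contiguous in C order, 8000-byte steps in F order
t_c = timeit.timeit(lambda: C[0, :].copy(), number=2000)
t_f = timeit.timeit(lambda: F[0, :].copy(), number=2000)
print(f"row copy  C-order: {t_c:.4f}s  F-order: {t_f:.4f}s")

# Converting the layout allocates a new buffer
C2 = np.ascontiguousarray(F)
print(C2.flags['C_CONTIGUOUS'], np.shares_memory(C2, F))  # True False
```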
5 Conclusion
NumPy provides a powerful foundation for numerical computing in Python through its efficient array operations, comprehensive mathematical functions, and optimized implementation. The combination of a compact, contiguous-by-default memory layout, homogeneous data types, and vectorized operations enables computations that would be significantly slower with standard Python data structures.
The random module (np.random) offers tools for sampling from probability distributions for statistical analysis and simulation, while universal functions (ufuncs) provide optimized element-wise operations that can be customized for specific applications. Understanding these core components is essential for effective scientific computing and data analysis with Python.