Basrah University of Oil and Gas
College of Oil and Gas Engineering
Department of Oil and Gas Engineering
Random Data Widget in Orange
Prepared by:
Yousef R. Nouri
Hussein A. Hassan
Supervisor:
AL. ALi Al-Eidani
AL. Jaffar A. Mustafa
3rd Stage — Morning Study
October, 2025
Table of Contents
List of Figures
1 Purpose and Interface
1.1 Inputs and Outputs
1.2 Interface Controls
2 Data Shape Options (Distributions)
2.1 Continuous Distributions
2.2 Discrete Distributions
3 The Output Table and Visualization
3.1 Observing the Data
4 Example Use Case
4.1 Pressure (psi)
4.1.1 Normal (Gaussian) Distribution
4.1.2 Uniform Distribution
4.1.3 Exponential Distribution
4.1.4 Gamma Distribution
4.1.5 Student’s t Distribution
4.2 Temperature (°C)
4.2.1 Bivariate Normal Distribution
4.2.2 Bernoulli Distribution
4.2.3 Binomial Distribution
4.2.4 Discrete Uniform Distribution
4.3 Gas Rate (MSCF/day)
4.3.1 Multinomial Distribution
4.3.2 Hypergeometric Distribution
4.3.3 Negative Binomial Distribution
4.3.4 Poisson Distribution
List of Figures
1.1 The interface controls of the Random Data widget.
3.1 The Random Data, Data Table, and Distributions widgets.
3.2 Data Table widget showing the data generated from the Random Data widget, with 1000 instances/sample size and 100 variables for each of the normal, gamma, and uniform distributions, resulting in 300 features.
3.3 Fitted Normal Distribution Graph made using the data in Var100 from the Data Table in Figure 3.2.
3.4 Fitted Kernel Density Distribution Graph made using the data in Var200 from the Data Table in Figure 3.2.
3.5 Fitted Gamma Distribution Graph made using the data in Var300 from the Data Table in Figure 3.2.
4.1 Data Table widget showing the data generated from the Random Data widget, with 1000 instances/sample size and 5 variables/features, one for each distribution type used for the pressure.
4.2 Normal Distribution Fitted Graph made using the data in the Pressure(psi)-Normal column from the Data Table in Figure 4.1.
4.3 Kernel Density Distribution Fitted Graph made using the data in the Pressure(psi)-Uniform column from the Data Table in Figure 4.1.
4.4 Pareto Distribution Fitted Graph made using the data in the Pressure(psi)-Exponential column from the Data Table in Figure 4.1.
4.5 Beta Distribution Fitted Graph made using the data in the Pressure(psi)-Gamma column from the Data Table in Figure 4.1.
4.6 Normal Distribution Fitted Graph made using the data in the Pressure(psi)-Student’s t column from the Data Table in Figure 4.1.
4.7 Data Table widget showing the data generated from the Random Data widget, with 1000 instances/sample size and 5 variables/features across the distribution types used for the temperature.
4.8 Scatter Graph showing a relationship between Temperature(x1) and Pressure(y1) made using the data in the x1 and y1 columns from the Data Table in Figure 4.7.
4.9 Distribution Graph showing 60% probability of a “temperature high” (1) and 40% of “temperature low” (0) made using the data in the Temperature(°C)-Bernoulli column from the Data Table in Figure 4.7.
4.10 Beta Distribution Fitted Graph made using the data in the Temperature(°C)-Binomial column from the Data Table in Figure 4.7.
4.11 Distribution Graph made using the data in the Temperature(°C)-Discrete Uniform column from the Data Table in Figure 4.7.
4.12 Data Table widget showing the data generated from the Random Data widget, with 1000 instances/sample size and 7 variables/features across the distribution types used for the gas rate.
4.13 Line Graph made using the data in the Gas Rate-Multinomial1,2,3, and 4 columns from the Data Table in Figure 4.12.
4.14 Normal Distribution Fitted Graph made using the data in the Gas Rate-Hypergeometric column from the Data Table in Figure 4.12.
4.15 Beta Distribution Fitted Graph made using the data in the Gas Rate-Negative Binomial column from the Data Table in Figure 4.12.
4.16 Gamma Distribution Fitted Graph made using the data in the Gas Rate-Poisson column from the Data Table in Figure 4.12.
1 Purpose and Interface
The primary function of the Random Data widget is to generate random data samples. It allows the
user to create synthetic data sets where the variables are governed by selected probability distributions.
The underlying distributions come from SciPy’s stats module.
1.1 Inputs and Outputs
• Inputs: None. The widget takes no input, as it generates the data internally.
• Outputs: The output is the randomly generated data.
1.2 Interface Controls
The user interface allows for control over the size and composition of the generated dataset:
1. Sample Size (Number of Rows): Users define the size of the data sample, which corresponds
to the number of rows or instances in the output table (default is 1000).
2. Distribution Variables: For each selected distribution, the user specifies the number of variables
to generate for that distribution.
3. Adding/Removing Variables: Users can select “Add more variables...” to choose new
distributions from a list, thereby adding new columns (variables) to the generated dataset.
Distributions can be removed by pressing the “X” button in the top left corner of the distribution settings.
4. Generation: The user must press the Generate button to output the data set.
Figure 1.1 The interface controls of the Random Data widget.
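The controls above amount to a small recipe: choose a sample size, choose distributions, and state how many variables each one contributes. The sketch below mimics that recipe using only Python's standard random module as a stand-in for the SciPy distributions the widget draws from. The function name generate_table, the column-naming scheme, and the parameter values are illustrative assumptions, not the widget's actual code.

```python
import random

def generate_table(sample_size, spec, seed=0):
    """Build a column-oriented table of random data.

    spec maps a label to (number_of_variables, sampler), where sampler is a
    function that takes a random.Random instance and returns one value.
    """
    rng = random.Random(seed)
    table = {}
    for label, (n_vars, sampler) in spec.items():
        for i in range(1, n_vars + 1):
            table[f"{label}{i}"] = [sampler(rng) for _ in range(sample_size)]
    return table

# Hypothetical settings: 2 normal variables and 1 uniform variable, 1000 rows.
spec = {
    "normal": (2, lambda rng: rng.gauss(0.0, 1.0)),
    "uniform": (1, lambda rng: rng.uniform(0.0, 1.0)),
}
table = generate_table(1000, spec)
print(list(table), len(table["normal1"]))
```

Each key of the returned dictionary plays the role of one column (variable), and each list holds one value per instance (row).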
2 Data Shape Options (Distributions)
The Random Data widget supports various standard and advanced distributions, categorized below based
on whether they are continuous or discrete random variables:
2.1 Continuous Distributions
Distribution      Type        Parameters Required
Normal            Continuous  Number of variables, the mean, and the variance.
Uniform           Continuous  Number of variables, and the lower and upper bound of the distribution.
Exponential       Continuous  Number of variables.
Gamma             Continuous  Number of variables, the shape, and the scale (larger scale parameter means more spread out distribution).
Student’s t       Continuous  Number of variables and the degrees of freedom.
Bivariate normal  Continuous  Fixed to 2 variables; requires mean and variance of each variable, and the covariance matrix.
2.2 Discrete Distributions
Distribution       Type      Parameters Required
Bernoulli          Discrete  Number of variables and the probability of success p (which fixes the probability mass function).
Binomial           Discrete  Number of variables, the number of trials, and the probability of success.
Discrete uniform   Discrete  Number of variables and the number of values per variable.
Multinomial        Discrete  Probabilities (must sum to one) and the number of trials. The number of probabilities determines the final number of variables generated.
Hypergeometric     Discrete  Number of variables, number of objects, positives, and trials.
Negative binomial  Discrete  Number of variables, number of successes, and the probability of a success.
Poisson            Discrete  Number of variables and the event rate (expected number of occurrences).
3 The Output Table and Visualization
The data set generated by the Random Data widget is typically channeled to other widgets in the
workflow for inspection and use.
3.1 Observing the Data
Users commonly observe the resulting data set using the Data Table widget and the Distributions widget.
Figure 3.1 The Random Data, Data Table, and Distributions widgets.
• Data Table: This widget visualizes the generated data in a spreadsheet format, showing the
instances (rows) and the variables (columns) created based on the chosen distributions.
Figure 3.2 Data Table widget showing the data generated from the Random Data widget, with 1000
instances/sample size and 100 variables for each of the normal, gamma, and uniform
distributions, resulting in 300 features.
• Distributions: This widget helps visualize the value distributions of the generated data features
in a graph.
Figure 3.3 Fitted Normal Distribution Graph made using the data in Var100 from the Data Table in
Figure 3.2.
Figure 3.4 Fitted Kernel Density Distribution Graph made using the data in Var200 from the Data
Table in Figure 3.2.
Figure 3.5 Fitted Gamma Distribution Graph made using the data in Var300 from the Data Table in
Figure 3.2.
4 Example Use Case
In this example, we use three quantities: Pressure, Temperature, and Gas Rate. Each quantity is
generated with several different distributions: 5 for the Pressure, 4 for the Temperature, and 4
for the Gas Rate, all with a constant sample size of 1000.
4.1 Pressure (psi)
Figure 4.1 Data Table widget showing the data generated from the Random Data widget, with 1000
instances/sample size and 5 variables/features, one for each distribution type used for the pressure.
4.1.1 Normal (Gaussian) Distribution
A symmetric bell-shaped distribution — most values cluster around the mean, fewer far away.
Parameters:
• Mean (µ): the central/average value.
• Standard deviation (σ): typical spread around the mean (σ² is the variance).
Values Used:
• Mean = 4200 psi
• Standard deviation = 100 psi (variance σ² = 10,000 psi²), so about 95% of values fall within 4200 ± 200 psi
Figure 4.2 Normal Distribution Fitted Graph made using the data in the Pressure(psi)-Normal column
from the Data Table in Figure 4.1.
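The "about 95% within 4200 ± 200" statement can be checked numerically. The sketch below uses Python's standard library (statistics.NormalDist and random.gauss) rather than the widget itself; the seed and variable names are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

MEAN_PSI = 4200.0
SD_PSI = 100.0  # the widget asks for the variance, i.e. SD_PSI ** 2 = 10000

pressure = NormalDist(MEAN_PSI, SD_PSI)

# Exact probability of falling within two standard deviations of the mean.
p_within_2sd = pressure.cdf(MEAN_PSI + 2 * SD_PSI) - pressure.cdf(MEAN_PSI - 2 * SD_PSI)

# Draw 1000 values, matching the sample size used in this example.
rng = random.Random(42)
samples = [rng.gauss(MEAN_PSI, SD_PSI) for _ in range(1000)]
share_in_band = sum(4000 <= x <= 4400 for x in samples) / len(samples)

print(f"exact P(4000 <= X <= 4400) = {p_within_2sd:.4f}")
print(f"sample mean = {mean(samples):.1f}, sample sd = {stdev(samples):.1f}")
print(f"share of samples in band = {share_in_band:.3f}")
```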
4.1.2 Uniform Distribution
Every value between the low and high bounds is equally likely.
Parameters:
• Low (min): smallest possible value.
• High (max): largest possible value.
Values Used:
• Low = 4000 psi
• High = 4400 psi
Figure 4.3 Kernel Density Distribution Fitted Graph made using the data in the Pressure(psi)-
Uniform column from the Data Table in Figure 4.1.
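For the bounds used here, the mean and spread of the uniform distribution follow from closed-form expressions: mean = (low + high)/2 and standard deviation = (high − low)/√12. The sketch below evaluates them and draws 1000 values with the standard random module (an illustrative stand-in for the widget).

```python
import math
import random

LOW_PSI, HIGH_PSI = 4000.0, 4400.0

# Closed-form mean and standard deviation of a continuous uniform distribution.
uniform_mean = (LOW_PSI + HIGH_PSI) / 2
uniform_sd = (HIGH_PSI - LOW_PSI) / math.sqrt(12)

rng = random.Random(42)
samples = [rng.uniform(LOW_PSI, HIGH_PSI) for _ in range(1000)]

print(f"theoretical mean = {uniform_mean:.1f} psi, sd = {uniform_sd:.2f} psi")
print(f"sample min = {min(samples):.1f}, sample max = {max(samples):.1f}")
```

With these bounds the standard deviation (about 115 psi) is close to the 100 psi used for the normal case, so the two columns have comparable spread but different shapes.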
4.1.3 Exponential Distribution
Right-skewed distribution, often used for waiting times. It produces many small values and a few large
ones. (To use it for a positive physical quantity such as pressure, scale or shift it so that the mean
is near the target.)
Parameters:
• Rate (λ) or Scale (1/λ) depending on the implementation:
– Rate λ = 1/mean
– Scale = mean (= 1/λ)
Values Used:
• Rate λ = 1/4200 ≈ 0.0002381, which is the same as Scale = 4200 (so the mean ≈ 4200 psi), or
• generate exponential values and add a baseline (e.g., add 3800) so that the values sit in a
realistic pressure range.
Figure 4.4 Pareto Distribution Fitted Graph made using the data in the Pressure(psi)-Exponential
column from the Data Table in Figure 4.1.
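The two options above (scaling versus shifting) behave quite differently, which a short sketch makes visible. It uses random.expovariate from the standard library, which is parameterized by the rate λ. The choice of a 400 psi mean for the shifted exponential part is an assumption made here so that baseline plus mean equals 4200 psi; the text only specifies the 3800 psi baseline.

```python
import random
from statistics import mean

TARGET_MEAN_PSI = 4200.0

# Option 1: scale = mean, so rate = 1 / mean.
rate = 1.0 / TARGET_MEAN_PSI

# Option 2 (assumed values): baseline of 3800 psi plus an exponential part
# whose mean is the remaining 400 psi, so the overall mean is still 4200 psi.
BASELINE_PSI = 3800.0
shifted_rate = 1.0 / (TARGET_MEAN_PSI - BASELINE_PSI)

rng = random.Random(42)
# random.expovariate takes the rate (lambda), not the scale.
scaled = [rng.expovariate(rate) for _ in range(20000)]
shifted = [BASELINE_PSI + rng.expovariate(shifted_rate) for _ in range(20000)]

print(f"rate = {rate:.7f}")
print(f"scaled:  mean = {mean(scaled):.0f}, min = {min(scaled):.0f}")
print(f"shifted: mean = {mean(shifted):.0f}, min = {min(shifted):.0f}")
```

Both variants have a mean near 4200 psi, but the scaled one produces values close to zero, while the shifted one never falls below the baseline.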
4.1.4 Gamma Distribution
Flexible right-skewed distribution (generalizes exponential). Good for positive-only quantities with skew.
Parameters:
• Shape (k or α): controls the shape (k = 1 → exponential; larger k → more symmetric).
• Scale (θ): multiplies the distribution; mean = k · θ.
Values Used:
• Choose shape = 2 and scale = 2100 → mean = 2 × 2100 = 4200 psi.
• (Alternative: shape = 3, scale = 1400 also gives mean 4200 but less skew.)
Figure 4.5 Beta Distribution Fitted Graph made using the data in the Pressure(psi)-Gamma column
from the Data Table in Figure 4.1.
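The claim that shape = 3, scale = 1400 gives the same mean with less skew can be verified with the standard gamma formulas: mean = k·θ, standard deviation = √k·θ, skewness = 2/√k. The sketch below evaluates them and draws samples with random.gammavariate; the helper name gamma_summary is hypothetical.

```python
import math
import random
from statistics import mean

def gamma_summary(shape, scale):
    """Return the theoretical (mean, standard deviation, skewness)."""
    return shape * scale, math.sqrt(shape) * scale, 2 / math.sqrt(shape)

chosen = gamma_summary(2, 2100)       # values used in this section
alternative = gamma_summary(3, 1400)  # the alternative mentioned above

rng = random.Random(42)
# random.gammavariate(alpha, beta): alpha is the shape, beta is the scale.
samples = [rng.gammavariate(2, 2100) for _ in range(20000)]

print("shape=2, scale=2100:", chosen)
print("shape=3, scale=1400:", alternative)
print(f"sample mean = {mean(samples):.0f} psi")
```

Note that with shape = 2 the standard deviation is roughly 2970 psi, far wider than the 100 psi used for the normal case, so this column spans a much larger pressure range.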
4.1.5 Student’s t Distribution
Like the normal distribution but with heavier tails — more probability of larger deviations (outliers).
Useful to model measurements with occasional big deviations.
Parameters:
• Degrees of freedom (df): lower df → heavier tails; higher df → closer to normal.
• Location (µ) and scale (s) (if widget provides them): location = center, scale = spread.
Values Used:
• df = 5 (heavy-ish tails but not extreme)
• Location = 4200
• Scale = 80–120 (choose 100 to match Normal spread)
Figure 4.6 Normal Distribution Fitted Graph made using the data in the Pressure(psi)-Student’s t
column from the Data Table in Figure 4.1.
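To see what "heavier tails" means in numbers, the sketch below builds a Student's t value from its textbook construction, Z/√(χ²/df), using only the standard library, and then applies the location and scale above. The helper name student_t is hypothetical, and whether the widget exposes location and scale is left open, as noted in the parameter list.

```python
import math
import random
from statistics import median

DF = 5
LOCATION_PSI = 4200.0
SCALE_PSI = 100.0

def student_t(rng, df):
    """Draw one standard Student's t value as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

# For df > 2 the variance of a standard t is df / (df - 2), so the scaled
# variable has a standard deviation larger than SCALE_PSI itself.
theoretical_sd = SCALE_PSI * math.sqrt(DF / (DF - 2))

rng = random.Random(42)
samples = [LOCATION_PSI + SCALE_PSI * student_t(rng, DF) for _ in range(5000)]
outside_3_scales = sum(abs(x - LOCATION_PSI) > 3 * SCALE_PSI for x in samples) / len(samples)

print(f"theoretical sd = {theoretical_sd:.1f} psi")
print(f"sample median = {median(samples):.1f} psi")
print(f"share beyond 3 scale units = {outside_3_scales:.4f}")
```

For a normal distribution only about 0.3% of values lie more than three standard deviations from the mean; with df = 5 the share beyond three scale units is roughly ten times larger.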
4.2 Temperature (°C)
Figure 4.7 Data Table widget showing the data generated from the Random Data widget, with 1000
instances/sample size and 5 variables/features across the distribution types used for the
temperature.
4.2.1 Bivariate Normal Distribution
A distribution describing two correlated normal variables — useful for simulating temperature variation
that may depend on another factor (e.g., pressure). Each variable follows a normal distribution, but
they are related.
Parameters:
• Mean vector (µ₁, µ₂): average values of the two correlated variables.
• Covariance matrix (Σ): describes how the two variables vary together (correlation).
Values Used:
• Means = (90, 4200) (temperature and pressure means)
• Covariance: $\Sigma = \begin{bmatrix} 4 & 30 \\ 30 & 10{,}000 \end{bmatrix}$
Figure 4.8 Scatter Graph showing a relationship between Temperature(x1) and Pressure(y1) made
using the data in the x1 and y1 columns from the Data Table in Figure 4.7.
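The covariance matrix implies a correlation of ρ = 30 / (√4 · √10,000) = 30 / 200 = 0.15, i.e. a weak positive relationship. The sketch below computes ρ and draws correlated pairs from two independent standard normals (the usual Cholesky-style construction for the 2×2 case), using only the standard library. The function name draw_pair and the sample count are illustrative assumptions.

```python
import math
import random

MEAN_T, MEAN_P = 90.0, 4200.0
VAR_T, VAR_P, COV_TP = 4.0, 10000.0, 30.0

sd_t, sd_p = math.sqrt(VAR_T), math.sqrt(VAR_P)
rho = COV_TP / (sd_t * sd_p)  # correlation implied by the covariance matrix

def draw_pair(rng):
    """Draw one (temperature, pressure) pair from two independent normals."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    temperature = MEAN_T + sd_t * z1
    pressure = MEAN_P + sd_p * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    return temperature, pressure

rng = random.Random(42)
pairs = [draw_pair(rng) for _ in range(5000)]

# Sample correlation, computed by hand to avoid version-specific helpers.
n = len(pairs)
mt = sum(t for t, _ in pairs) / n
mp = sum(p for _, p in pairs) / n
cov = sum((t - mt) * (p - mp) for t, p in pairs) / (n - 1)
var_t = sum((t - mt) ** 2 for t, _ in pairs) / (n - 1)
var_p = sum((p - mp) ** 2 for _, p in pairs) / (n - 1)
sample_rho = cov / math.sqrt(var_t * var_p)

print(f"implied correlation = {rho:.2f}, sample correlation = {sample_rho:.2f}")
```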
4.2.2 Bernoulli Distribution
Used for binary outcomes, for instance “temperature high” (1) or “temperature low” (0). This is
mainly for understanding the concept rather than a real continuous temperature.
Parameters:
• Probability (p): chance of success (1).
Values Used:
• p = 0.6 (60% of the time temperature is “high”)
Figure 4.9 Distribution Graph showing 60% probability of a “temperature high” (1) and 40% of
“temperature low” (0) made using the data in the Temperature(°C)-Bernoulli column
from the Data Table in Figure 4.7.
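A Bernoulli variable is the simplest case to reproduce by hand: compare a uniform random number with p. The sketch below does this with the standard random module; the seed and variable names are arbitrary.

```python
import random

P_HIGH = 0.6

rng = random.Random(42)
# 1 = "temperature high", 0 = "temperature low"
flags = [1 if rng.random() < P_HIGH else 0 for _ in range(1000)]
share_high = sum(flags) / len(flags)

# For a Bernoulli variable, mean = p and variance = p * (1 - p).
print(f"share of 1s = {share_high:.3f} (expected about {P_HIGH})")
print(f"theoretical variance = {P_HIGH * (1 - P_HIGH):.2f}")
```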
4.2.3 Binomial Distribution
Represents the number of successes in a series of independent yes/no trials. Here, it can model the count
of readings above a threshold in a small sample.
Parameters:
• Number of trials (n): number of independent attempts.
• Probability of success (p): likelihood of one success.
Values Used:
• n = 10 readings
• p = 0.5 probability of being above 90°C
Figure 4.10 Beta Distribution Fitted Graph made using the data in the Temperature(°C)-Binomial
column from the Data Table in Figure 4.7.
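With n = 10 and p = 0.5 the binomial mean is n·p = 5 and the variance is n·p·(1 − p) = 2.5. The sketch below evaluates the probability mass function with math.comb and also simulates each count as a sum of ten Bernoulli trials; the helper name binomial_pmf is hypothetical.

```python
import math
import random

N_READINGS = 10
P_ABOVE = 0.5

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = [binomial_pmf(k, N_READINGS, P_ABOVE) for k in range(N_READINGS + 1)]
binomial_mean = N_READINGS * P_ABOVE
binomial_var = N_READINGS * P_ABOVE * (1 - P_ABOVE)

rng = random.Random(42)
# Each sample is the count of successes among N_READINGS Bernoulli trials.
counts = [sum(rng.random() < P_ABOVE for _ in range(N_READINGS)) for _ in range(1000)]

print(f"mean = {binomial_mean}, variance = {binomial_var}")
print(f"P(exactly 5 of 10 above 90 °C) = {pmf[5]:.4f}")
print(f"sample mean = {sum(counts) / len(counts):.2f}")
```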
4.2.4 Discrete Uniform Distribution
Generates discrete integer values between two limits, all equally likely. Used to simulate sensor readings
that only report integer temperatures.
Parameters:
• Low (min): smallest integer value.
• High (max): largest integer value.
Values Used:
• Low = 85°C
• High = 95°C
Figure 4.11 Distribution Graph made using the data in the Temperature(°C)-Discrete Uniform
column from the Data Table in Figure 4.7.
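With both bounds included, the range 85 to 95 °C contains 11 integer values, each with probability 1/11. The sketch below draws such readings with random.randint. Note that the table in Section 2.2 lists "number of values per variable" as the widget's control, so expressing the setting as a low/high pair is an assumption about how the range was obtained.

```python
import random
from collections import Counter

LOW_C, HIGH_C = 85, 95

n_values = HIGH_C - LOW_C + 1  # both bounds are included
p_each = 1 / n_values

rng = random.Random(42)
# random.randint includes both endpoints.
readings = [rng.randint(LOW_C, HIGH_C) for _ in range(1000)]
counts = Counter(readings)

print(f"{n_values} possible values, each with probability {p_each:.4f}")
for value in range(LOW_C, HIGH_C + 1):
    print(value, counts[value])
```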
4.3 Gas Rate (MSCF/day)
Figure 4.12 Data Table widget showing the data generated from the Random Data widget, with 1000
instances/sample size and 7 variables/features across the distribution types used for the gas rate.
4.3.1 Multinomial Distribution
Generalization of the binomial for more than two categories, used to model categorical rates such as
“low,” “medium,” “high,” and “very high” gas flow outcomes.
Parameters:
• Number of trials (n): number of total observations.
• Probability vector (p₁, p₂, ..., pₖ): probabilities for each category.
Values Used:
• n = 1
• p = [0.2, 0.5, 0.25, 0.05] for low, medium, high, very high
Figure 4.13 Line Graph made using the data in the Gas Rate-Multinomial1,2,3, and 4 columns from
the Data Table in Figure 4.12.
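With n = 1, every generated row contains a single 1 in one of the four category columns and 0 elsewhere, which is why four probabilities produce four variables. The sketch below reproduces this with random.choices from the standard library; the helper name multinomial_row is hypothetical.

```python
import random

CATEGORIES = ["low", "medium", "high", "very high"]
PROBS = [0.2, 0.5, 0.25, 0.05]
N_TRIALS = 1

def multinomial_row(rng, n_trials, probs):
    """Return one row of category counts that sums to n_trials."""
    row = [0] * len(probs)
    for index in rng.choices(range(len(probs)), weights=probs, k=n_trials):
        row[index] += 1
    return row

rng = random.Random(42)
rows = [multinomial_row(rng, N_TRIALS, PROBS) for _ in range(1000)]

# Column totals divided by the number of rows approximate the probabilities.
shares = [sum(row[i] for row in rows) / len(rows) for i in range(len(PROBS))]
for name, share in zip(CATEGORIES, shares):
    print(f"{name}: {share:.3f}")
```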
4.3.2 Hypergeometric Distribution
Used for drawing samples without replacement, for example selecting readings classified as “high rate”
from a fixed population.
Parameters:
• Population size (N): total items.
• Number of successes in population (K): items classified as high rate.
• Number of draws (n): samples taken.
Values Used:
• N = 100 total gas samples
• K = 40 high-rate samples
• n = 10 draws
Figure 4.14 Normal Distribution Fitted Graph made using the data in the Gas Rate-Hypergeometric
column from the Data Table in Figure 4.12.
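For these values the expected number of high-rate samples per draw is n·K/N = 10 × 40 / 100 = 4. The sketch below evaluates the hypergeometric probability mass function with math.comb and simulates draws without replacement using random.sample; the helper name hypergeom_pmf is hypothetical.

```python
import math
import random

POPULATION = 100   # N: total gas samples
HIGH_RATE = 40     # K: samples classified as high rate
DRAWS = 10         # n: samples drawn without replacement

def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k high-rate samples among n draws."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

pmf = [hypergeom_pmf(k, POPULATION, HIGH_RATE, DRAWS) for k in range(DRAWS + 1)]
expected_high = DRAWS * HIGH_RATE / POPULATION

rng = random.Random(42)
population = [1] * HIGH_RATE + [0] * (POPULATION - HIGH_RATE)
# random.sample draws without replacement, which is what makes this hypergeometric.
counts = [sum(rng.sample(population, DRAWS)) for _ in range(1000)]

print(f"expected high-rate samples per draw of 10 = {expected_high}")
print(f"most likely count = {pmf.index(max(pmf))}")
print(f"sample mean = {sum(counts) / len(counts):.2f}")
```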
4.3.3 Negative Binomial Distribution
Models the number of failures before a fixed number of successes — useful for representing the number
of low-flow intervals before achieving a desired gas output.
Parameters:
• Number of successes (r): target count of successful flow intervals.
• Probability of success (p): likelihood of one success.
Values Used:
• r = 5
• p = 0.4
Figure 4.15 Beta Distribution Fitted Graph made using the data in the Gas Rate-Negative Binomial
column from the Data Table in Figure 4.12.
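Under the "failures before the r-th success" definition given above, the mean is r(1 − p)/p = 5 × 0.6 / 0.4 = 7.5 and the variance is r(1 − p)/p² = 18.75. The sketch below checks this by simulating Bernoulli trials directly with the standard library; the helper name failures_before_r_successes is hypothetical.

```python
import random

R_SUCCESSES = 5
P_SUCCESS = 0.4

# Mean and variance of the number of failures before the r-th success.
nb_mean = R_SUCCESSES * (1 - P_SUCCESS) / P_SUCCESS
nb_var = R_SUCCESSES * (1 - P_SUCCESS) / P_SUCCESS**2

def failures_before_r_successes(rng, r, p):
    """Run Bernoulli trials until r successes occur; return the failure count."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(42)
counts = [failures_before_r_successes(rng, R_SUCCESSES, P_SUCCESS) for _ in range(5000)]

print(f"theoretical mean = {nb_mean:.2f}, variance = {nb_var:.2f}")
print(f"sample mean = {sum(counts) / len(counts):.2f}")
```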
4.3.4 Poisson Distribution
Models how many independent events occur in a fixed time period — often used for event counts like
gas bubble releases or pulse flows.
Parameters:
• Lambda (λ): expected number of events per period (also the mean).
Values Used:
• λ = 410 (average daily gas rate, in MSCF/day)
Figure 4.16 Gamma Distribution Fitted Graph made using the data in the Gas Rate-Poisson column
from the Data Table in Figure 4.12.
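For a Poisson variable the mean and the variance both equal λ, so λ = 410 gives a standard deviation of √410 ≈ 20.2. The sketch below draws Poisson counts with the standard library by counting exponential inter-arrival times that fit into one time unit, which is one textbook way to generate them and not necessarily what the widget does. The helper name poisson_draw is hypothetical.

```python
import math
import random
from statistics import mean, pvariance

LAMBDA = 410.0

def poisson_draw(rng, lam):
    """Count arrivals in one unit of time with exponential gaps of rate lam."""
    count, elapsed = 0, rng.expovariate(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

rng = random.Random(42)
samples = [poisson_draw(rng, LAMBDA) for _ in range(1000)]

# For a Poisson variable the mean and the variance both equal lambda.
print(f"theoretical mean = variance = {LAMBDA}, sd = {math.sqrt(LAMBDA):.2f}")
print(f"sample mean = {mean(samples):.1f}, sample variance = {pvariance(samples):.1f}")
```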