Addition of weighted selection to custom providers #1414
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,136 @@ | ||||||||||
| package net.datafaker.service; | ||||||||||
|
|
||||||||||
| import java.util.Arrays; | ||||||||||
| import java.util.HashSet; | ||||||||||
| import java.util.List; | ||||||||||
| import java.util.Map; | ||||||||||
| import java.util.Random; | ||||||||||
| import java.util.Set; | ||||||||||
|
|
||||||||||
| /** | ||||||||||
| * A utility class for selecting a random element from a list based on assigned weights. | ||||||||||
| **/ | ||||||||||
| public record WeightedRandomSelector(Random random) { | ||||||||||
| private static final String WEIGHT_KEY = "weight"; | ||||||||||
| private static final String VALUE_KEY = "value"; | ||||||||||
|
|
||||||||||
| public WeightedRandomSelector(Random random) { | ||||||||||
| this.random = random != null ? random : new Random(); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| /** | ||||||||||
| * Returns a weighted random element from the given list, where each element is represented as a Map | ||||||||||
| * containing a weight and the corresponding value. | ||||||||||
| * <p> | ||||||||||
| * | ||||||||||
| * @param items A list of maps, where each map contains: | ||||||||||
| * - weight: A Double representing the weight of the element, influencing its selection probability. | ||||||||||
| * - value: The actual element of type T to be randomly selected based on its weight. | ||||||||||
| * @param <T> The type of the element to be selected from the list. The value associated with the weight can be of any type. | ||||||||||
| * @return A randomly selected element based on its weight. | ||||||||||
| * @throws IllegalArgumentException if: | ||||||||||
| * - the list is null or empty, | ||||||||||
| * - any item in the list is null or empty, | ||||||||||
| * - the item does not contain 'weight' or 'value' keys, | ||||||||||
| * - any weight is null, not a Double, negative, NaN or infinite, | ||||||||||
| * - any values in the list are not unique or null, | ||||||||||
| * - the sum of weights is not greater than 0 or exceeds Double.MAX_VALUE. | ||||||||||
| */ | ||||||||||
| public <T> T select(List<Map<String, Object>> items) { | ||||||||||
|
Collaborator
looks like most of the methods here do not work with object state => could be turned to
Contributor
Author
The relevant methods were made static and private.
Collaborator
How about this method: <T> T select(List<Map<String, Object>> items)? Why can it not be switched to
Collaborator
Suggested change
this seems not addressed yet
Contributor
Author
By the current design, the method public <T> T select(List<Map<String, Object>> items) cannot be static because it relies on random, which is an instance field of the WeightedRandomSelector record. |
||||||||||
| validateItemsList(items); | ||||||||||
|
|
||||||||||
| Object[] values = new Object[items.size()]; | ||||||||||
| double[] cumulativeWeights = preprocessItems(items, values); | ||||||||||
|
|
||||||||||
| double randomValue = random.nextDouble() * cumulativeWeights[cumulativeWeights.length - 1]; | ||||||||||
| return selectWeightedElement(randomValue, cumulativeWeights, values); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| private static void validateItemsList(List<Map<String, Object>> items) { | ||||||||||
| if (items == null) { | ||||||||||
| throw new IllegalArgumentException("Input list cannot be null"); | ||||||||||
| } | ||||||||||
| if (items.isEmpty()) { | ||||||||||
| throw new IllegalArgumentException("Input list cannot be empty"); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| Set<Object> uniqueValues = new HashSet<>(); | ||||||||||
|
|
||||||||||
| for (var item : items) { | ||||||||||
| validateItem(item); | ||||||||||
| assertUniqueValues(item, uniqueValues); | ||||||||||
| } | ||||||||||
| } | ||||||||||
|
|
||||||||||
| private static void assertUniqueValues(Map<String, Object> item, Set<Object> values) { | ||||||||||
| Object value = item.get(VALUE_KEY); | ||||||||||
|
Collaborator
It doesn't guarantee uniqueness ...
name:
  - name1
ref1:
  - #{MyTest.name}
ref2:
  - #{MyTest.name}
dataForWeighted:
  - value: #{MyTest.name}
    weight: 2.0
  - value: #{MyTest.ref1}
    weight: 3.0
  - value: #{MyTest.ref2}
    weight: 5.0
...in fact values for |
||||||||||
| if (!values.add(value)) { | ||||||||||
| throw new IllegalArgumentException("Duplicate value found: " + value + ". Values must be unique."); | ||||||||||
| } | ||||||||||
| } | ||||||||||
|
|
||||||||||
| private static void validateItem(Map<String, Object> item) { | ||||||||||
| if (item == null) { | ||||||||||
| throw new IllegalArgumentException("Item cannot be null"); | ||||||||||
| } | ||||||||||
| if (item.isEmpty()) { | ||||||||||
| throw new IllegalArgumentException("Item cannot be empty"); | ||||||||||
| } | ||||||||||
| if (!item.containsKey(WEIGHT_KEY) || !item.containsKey(VALUE_KEY)) { | ||||||||||
|
Collaborator
Are we checking the existence of the keys and the non-null values separately only for the sake of different error messages?
Contributor
Author
I split the condition into separate checks with distinct error messages. |
||||||||||
| throw new IllegalArgumentException("Each item must contain 'weight' and 'value' keys"); | ||||||||||
| } | ||||||||||
| validateValue(item.get(VALUE_KEY)); | ||||||||||
| validateWeight(item.get(WEIGHT_KEY)); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| private static void validateValue(Object valueObj) { | ||||||||||
| if (valueObj == null) { | ||||||||||
| throw new IllegalArgumentException("Value cannot be null"); | ||||||||||
| } | ||||||||||
| } | ||||||||||
|
|
||||||||||
| private static void validateWeight(Object weightObj) { | ||||||||||
| if (!(weightObj instanceof Double weight)) { | ||||||||||
| throw new IllegalArgumentException("Weight must be a non-null Double"); | ||||||||||
| } | ||||||||||
| if (weight < 0 || Double.isNaN(weight) || Double.isInfinite(weight)) { | ||||||||||
| throw new IllegalArgumentException("Weight must be a non-negative number and cannot be NaN or infinite"); | ||||||||||
| } | ||||||||||
| } | ||||||||||
|
|
||||||||||
| private static void validateTotalWeight(double totalWeight) { | ||||||||||
| if (totalWeight <= 0) { | ||||||||||
| throw new IllegalArgumentException("The total weight must be greater than 0. At least one item must have a positive weight"); | ||||||||||
| } | ||||||||||
| } | ||||||||||
|
|
||||||||||
| static double[] preprocessItems(List<Map<String, Object>> items, Object[] values) { | ||||||||||
| double[] cumulativeWeights = new double[items.size()]; | ||||||||||
|
|
||||||||||
| double totalWeight = 0.0; | ||||||||||
| for (int i = 0; i < items.size(); i++) { | ||||||||||
| double weight = (Double) items.get(i).get(WEIGHT_KEY); | ||||||||||
| if (Double.MAX_VALUE - totalWeight < weight) { | ||||||||||
| throw new IllegalArgumentException("Sum of the weights exceeds Double.MAX_VALUE"); | ||||||||||
| } | ||||||||||
| totalWeight += weight; | ||||||||||
| cumulativeWeights[i] = totalWeight; | ||||||||||
| values[i] = items.get(i).get(VALUE_KEY); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| validateTotalWeight(totalWeight); | ||||||||||
|
|
||||||||||
| return cumulativeWeights; | ||||||||||
| } | ||||||||||
|
|
||||||||||
| static <T> T selectWeightedElement(double randomValue, double[] cumulativeWeights, Object[] values) { | ||||||||||
| int index = Arrays.binarySearch(cumulativeWeights, randomValue); | ||||||||||
| index = (index < 0) ? -index - 1 : index; | ||||||||||
|
|
||||||||||
| if (index >= cumulativeWeights.length) { | ||||||||||
| index = cumulativeWeights.length - 1; | ||||||||||
| } | ||||||||||
|
Comment on lines +130 to +132
Collaborator
Suggested change
i think it would be shorter
Contributor
Author
Proposed change will always set index to |
||||||||||
|
|
||||||||||
| return (T) values[index]; | ||||||||||
| } | ||||||||||
| } | ||||||||||
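For readers following the thread on diff lines +130 to +132, the index computation in selectWeightedElement can be tried in isolation. The sketch below is a standalone illustration, not code from this PR: the class name CumulativeWeightDemo and the helper pickIndex are made up for the example, and the sample weights 2.0, 3.0 and 5.0 are borrowed from the YAML snippet in the review comment above.

```java
import java.util.Arrays;

// Hypothetical demo class; only the three statements inside pickIndex mirror the diff.
class CumulativeWeightDemo {

    static int pickIndex(double randomValue, double[] cumulativeWeights) {
        int index = Arrays.binarySearch(cumulativeWeights, randomValue);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        index = (index < 0) ? -index - 1 : index;
        // The insertion point equals the array length when randomValue is larger
        // than every entry, so clamp it to the last valid index.
        if (index >= cumulativeWeights.length) {
            index = cumulativeWeights.length - 1;
        }
        return index;
    }

    public static void main(String[] args) {
        // Weights 2.0, 3.0, 5.0 give cumulative sums 2.0, 5.0, 10.0.
        double[] cumulative = {2.0, 5.0, 10.0};
        System.out.println(pickIndex(1.5, cumulative));  // 0
        System.out.println(pickIndex(4.9, cumulative));  // 1
        System.out.println(pickIndex(2.0, cumulative));  // 0, an exact hit maps to the matching entry
        System.out.println(pickIndex(10.5, cumulative)); // 2, only reachable through the clamp
    }
}
```

Because random.nextDouble() returns a value in [0, 1), the product with the total weight should stay below the last cumulative entry in the select method, so the clamp acts as a safety net for callers that pass an arbitrary randomValue.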
Am I right that the only way to change a weight is to edit the file and rebuild everything?

The weights are provided as part of the input items list and are processed at runtime. Changing them does not require rebuilding the application: modify the input list or the file (if applicable) and rerun the program.

I didn't get this.
We have data files inside resources, which are part of the jar, right?
There might also be custom data, which could be either in resources or in separate files.
With normal files it is clear how to change weights without a rebuild.
How can we do this with a file from the resources folder?
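One possible way to frame the question about files under resources, sketched under assumptions: a resource packaged in the jar is fixed at build time, so changing it without a rebuild generally means letting an external file take precedence over the bundled copy. The names WeightSourceDemo and loadDefinition and the resource path below are hypothetical and are not Datafaker API; the sketch only shows the lookup order, not how the library actually resolves its data files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Datafaker.
class WeightSourceDemo {

    // Prefer an editable file on disk; fall back to the copy bundled on the classpath.
    static String loadDefinition(Path externalFile, String classpathResource) throws IOException {
        if (externalFile != null && Files.isRegularFile(externalFile)) {
            // Editing this file changes the weights on the next run, no rebuild needed.
            return Files.readString(externalFile, StandardCharsets.UTF_8);
        }
        try (InputStream in = WeightSourceDemo.class.getResourceAsStream(classpathResource)) {
            if (in == null) {
                throw new IOException("Neither " + externalFile + " nor " + classpathResource + " was found");
            }
            // This copy lives inside the jar and only changes when the jar is rebuilt.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a user-maintained override file.
        Path override = Files.createTempFile("weights", ".yml");
        Files.writeString(override, "weight: 2.0", StandardCharsets.UTF_8);
        System.out.println(loadDefinition(override, "/weights.yml"));
        Files.delete(override);
    }
}
```

With this lookup order, weights kept in an external file can be changed between runs, while the bundled resource serves as a default.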