Refactor data sources to implement IDataSource and include GetNearest functionality#4270
Refactor data sources to implement IDataSource and include GetNearest functionality#4270swharden merged 25 commits intoScottPlot:mainfrom
IDataSource and include GetNearest functionality#4270Conversation
- Introduces the 'IGetNearest' interface for plottables that have the `GetNearest` and `GetNearestX` methods - Applies the IGetNearest interface to all Scatter data sources, as well as SignalXY and VectorFieldDataSource - Implement GetNearestX for all scatter data sources
- Adds missing using for Interfaces.IGetNearest
|
Hi @RFBomb, You have done a great job in the right direction. I also want to warn you that the existing implementation of the nearest point search does not work for |
|
@StendProg Thanks for looking into this, I was actually going to mention that PR just now, as they can go hand-in-hand! I thought about having a standardized utilities method as well, but wasn't sure if it was allowed (theres code duplication all over this library, so I just followed suit) As for solving for different axes, I found that out myself, here is what I came up with for my UI.
Coordinates mouseLoc = plot.GetCoordinates(mousePixel, plot.Axes.Bottom, item.YAxis);
DataPoint dataPoint = item.GetNearest(mouseLoc, plot.LastRender, 15);Now, there could be an argument that GetNearest includes a IYAxis and IXAxis properties, but I think I'd rather not do that because the DataSources that can implement it do not have such properties. I'd rather introduce accompanying |
The problem is a little deeper than that. public interface IGetNearestDataSource
{
DataPoint GetNearest(Coordinates mouseLocation, double PxPerUnitX, double PxPerUnitY, float maxDistance = 15);
DataPoint GetNearestX(Coordinates mouseLocation, double PxPerUnitX, double PxPerUnitY, float maxDistance = 15);
}While public interface IGetNearestPlottable
{
DataPoint GetNearest(Coordinates mouseLocation, RenderDetails renderInfo, float maxDistance = 15);
DataPoint GetNearestX(Coordinates mouseLocation, RenderDetails renderInfo, float maxDistance = 15);
}But once again, the design is not approved, so let's leave it as it is for now. |
|
So I took a peek at your last comment, and I see what you're getting at. Its an interesting situation. Shifting the axes up and down worked allowed a crosshair on the mouse to still follow the plot, even with a maxDistance of 5. I believe the snippet in question got the mouse coordinates for the main axes, relative to the main axes, so when it got the nearest position according the the main axes, it seems to grab the correct values. (That said, I didn't step through the code to validate that, only went by what was drawn on screen during testing) |
|
@StendProg I have a few thoughts on how to write the utility class to handle more sources easily, but it would entail an internal interface. What are your thoughts between these two ? The first one is more flexible overall, while the secondary one might be more performant, but if caching converted values is implemented it assumes that the earlier indexes in the collection are not going to be changed /// This one uses a function to get the X at specified index.
internal interface IDataSource
{
int RenderIndexCount { get; }
int MinRenderIndex { get; }
int MaxRenderIndex { get; }
double GetX(int index);
double GetY(int index);
}
/// array/list implementations
double IDataSource.GetX(int index) => Xs[index];
double IDataSource.GetY(int index) => Ys[index];
/// Implementation for generics:
double IDataSource.GetX(int index)
{
T1 x = Xs[index];
return NumericConversion.GenericToDouble(ref x);
}
double IDataSource.GetY(int index)
{
T2 y = Ys[index];
return NumericConversion.GenericToDouble(ref y);
}OR // Xs and Ys would pass their collections to the consumer methods
internal interface IDataSource
{
IList<double> Xs {get;}
IList<double> Ys {get;}
int RenderIndexCount { get; }
int MinRenderIndex { get; }
int MaxRenderIndex { get; }
}A new wrapper collection could be implemented to wrap generics, and have that passed via the interface. Without Caching /// Helper class used to convert a generic collection to various other collections and cache their values
internal class NumericList<T> : IList<double>
{
private readonly IList<T> Values;
public NumericList(IList<T> collectionToWrap)
{
Values = collectionToWrap;
}
public int Count => Values.Count;
public double this[int index] {
get
{
T baseValue = Values[index];
return NumericConversion.GenericToDouble(ref baseValue);
}
set => throw new NotImplementedException();
}
bool ICollection<double>.Contains(double item) => Values.Any(i => NumericConversion.GenericToDouble(ref i) == item);
public IEnumerator<double> GetEnumerator()
{
T val;
foreach (T item in Values.ToArray())
{
val = item;
yield return NumericConversion.GenericToDouble(ref val);
}
}
bool ICollection<double>.IsReadOnly => true;
}With Cachinginternal class NumericList<T> : IList<double>
{
private readonly IList<T> Values;
private readonly IList<double> ConvertedValues;
private bool isConverted => ConvertedValues.Count == Values.Count;
public NumericList(IList<T> collectionToWrap)
{
Values = collectionToWrap;
ConvertedValues = Values is IList<double> dbl? dbl : new List<double>(Values.Count);
}
public int Count => Values.Count;
public double this[int index] {
get
{
EnsureConverted(index);
return ConvertedValues[index];
}
set => throw new NotImplementedException();
}
private void EnsureConverted(int index)
{
if (index < 0) throw new ArgumentOutOfRangeException("index can not be less than 0", nameof(index));
if (index < ConvertedValues.Count) return;
if (index > Values.Count) throw new ArgumentOutOfRangeException("index exceeds number of values in collection", nameof(index));
for (int i = ConvertedValues.Count - 1; i <= index; i++)
{
T value = Values[i];
ConvertedValues[i] = NumericConversion.GenericToDouble(ref value);
}
}
bool ICollection<double>.Contains(double item) => this.Any(i => i == item);
public IEnumerator<double> GetEnumerator()
{
int max = Values.Count;
for (int i = 0; i < max; i++)
{
if (i < ConvertedValues.Count)
{
yield return ConvertedValues[i];
}
else
{
T val = Values[i];
ConvertedValues[i] = NumericConversion.GenericToDouble(ref val);
yield return ConvertedValues[i];
}
}
} |
|
After some playing and implementing, I've landed on the first one, as it's a simpler interface and more flexible. I have added a few properties to it so that the static method for GetNearest can optimize a little bit (such as if using coordinates get the coordinate for XY instead of grabbing the index once for X and again for Y) It will have parameters for passing in X and Y axes, and attempt to get the pixel per unit from the render pack. I noticed that SignalXY accounts for offset and rotation in its logic, which is a flaw I noticed other items don't account for. So what I'm thinking is introduce a record struct with the additional parameters necessary (IsRotated, XOffset,YOffset, IYAxis?, IXAxis?). This way the method signature stays simple, while the struct is used to pass in the additional calculation parameters. |
|
@RFBomb I apologize for the confusion. My comment about code duplication can be considered invalid. I realized only now that despite the fact that all implementations are similar to each other, they operate with different types/interfaces. I'm afraid that in an attempt to reduce everything to a common base, behind all these abstractions we can seriously lose performance. And for |
IDataSource Interface - Interface applied to DataSources to use with the DataSourceUtilities static class - Implemented on Scatter and Signal sources where able - Implemented on Scatter and Signal + SignalXY plottables DataSourceUtilities - Extracted code from various GetNearest() functions and normalized them into GetNearest and GetNearestX in this static class - Any rotation or scaling necessary for the calculation is expected to be performed via the IDataSource interface.
|
@StendProg Too late, i ran with it lol The JIT should be able to optimize that static method heavily over its many runs. As for handling rotation and offsets, that is taken care of by the interface Oh, this also handles the issue with the Y/X axis selection when calling GetNearest on the plot itself, which is aware of what axes it uses |
|
I did a little performance testing, and was quite surprised. You managed to generalize all the GetNearest() code and still it runs faster than the old one, at least for the if (Data is IDataSource ds)
return DataSourceUtilities.GetNearestX(ds, location, renderInfo, maxDistance, Axes.XAxis);
else
return Data.GetNearestX(location, renderInfo, maxDistance);You've probably thought of different scenarios for future DataSources, I understand, but maybe don't overengineer and simplify? You've done a great job, a lot of Plottables now supports finding the closest point, and it will be easy to support and add in new ones. Here are the benchmark results if you are interested: SignalXY to search X=0..Points, Y = rnd.NextDouble().
|
So, because so many consumers might implement the various existing interfaces already in custom sources, I didn't want to update the interface (such as ISignal and IScatterSource) to accommodate this new functionality. Case in point : ISignal has GetNearest while IScatterSource does not. (But the types themselves now implement IGetNearest and IDataSource). Further, the DataSources have no knowledge of the assigned axis, that belongs to the Plottable. This allows the publicly exposed GetNearest methods to not have complex overloads where you would be supplying known information. The data source falls back to using (Bottom,Left) if no axis is supplied, which is why I check in that order. Another point is that the Scatter DataSources don't have offset or scaling information, while the Scatter plot does. So that plot uses the IDataSource interface to perform the GetXScaled() and GetYScaled() calls, nearly everything else is passed back through the Data object. Also, yay benchmarks! I wasn't sure if it would actually be faster or not, I was just hoping |
src/ScottPlot5/ScottPlot5/DataSources/ScatterSourceCoordinatesList.cs
Outdated
Show resolved
Hide resolved
|
Next step for this is to implement the improvements you suggested over in your PR, as it was something I had looked at but wanted a functional version before taking a look at the BinarySearch. But you already figured out how to that! And those benchmarks are impressive. I'm thinking the best way to do this is likely use the IDataSource interface and add |
- Moves the Binary Search Extensions into DataSourceUtilities - Adds several overloads to common scenarios ( double[] ) - Renames the default comparer and extends it for more IComparer interfaces - Create the generic comparer - Adds a new helper class `CoordinateDataSource` for scenarios (specifically Scatter plot) that should reduce expensive interface calls where the IDataSource does not own the data collection
|
@StendProg Alright, I've Adapted your work over on #4265 into For example, any of the SignalSource classes only accept Y values, and their X values are calculated on demand. So these will not require using binary search to locate the index. Whereas SignalXY will use the binary search that it implements privately. Scatter is a unique one currently, and we may want to expand I have next few days off, so I won't be contributing to this during that time, but if you wanted to run more benchmarks I wouldn't say no! |
Add Unit Tests for IGetNearest and IDataSource
|
Hi @RFBomb, Benchmark for Scatter:
I am not at all confused by the long time for Scatter.GetNearest(). You are wrapping the array to support advanced functionality, this can be eliminated in the future by modifying the original DataSource. Let it not confuse you either. I made a separate benchmark for DataSource without wrappers. It took me hours to get this result. Before that your search was consistently winning starting with a more 10x advantage :-) In this benchmark you have chosen the worst case scenario where only the last point is not sorted in ascending order. That's why you method lose a bit because of the additional run through the array. Your ordering test works surprisingly well. I think most users load sorted values into the scatter anyway. But Scatter itself declares an arbitrary order of points. And it turns out we are checking and exploiting a user error in the wrong Plottable type. If we can think of prejudices against using SignalXY for ordered X values, they could be closed by creating a separate Scatter type that allows only ordered values. In spite of the above, I am impressed by your solution. You have developed a whole subsystem for DataSources that I had to understand for hours. You took into account Generics, generalized existing duplicate code, put a fast algorithm from PR #4265 even in Scatter.... Not to derail your creative impulse, but I would still ask you to stop adding new functionality in this PR and leave it for future ones. It's better to test and document the current one. If you have time, definitely try to correct the description of the PR in the first post, you've gone way beyond what you described earlier.
ScottPlot/src/ScottPlot5/ScottPlot5/DataSources/SignalXYSourceDoubleArray.cs Lines 418 to 423 in fe01ccb ScottPlot/src/ScottPlot5/ScottPlot5/DataSources/SignalXYSourceGenericArray.cs Lines 412 to 417 in fe01ccb Should be: int IDataSource.GetXClosestIndex(Coordinates mouseLocation)
{
return Rotated
? GetIndex(mouseLocation./*X*/ Y)
: GetIndex(mouseLocation./*Y*/ X);
} |
|
@StendProg i much appreciate the feedback! After implemented IDataSource on my desired signals (just the Scatters and Signals) I have only been tweaking trying to get it to work properly. This took a while for me to analyze the various implementations of GetNearest (and how some accounted for offsets and others didn't), so not surprised it took someone else a few hours to understand my abstraction. But my goal was to standardize the calls. And each source has its own nuances to work through. I think your work with the Optimized search (now As of now, I'm not adding more functionality per se, just documentation and unit tests. IDataSource was not my initial goal in this commit, but as I tried implementing IGetNearest I was running into issues, and needed it. One hiccup I am having a hard time with is the difference between IScatterSource and ISignalSource is that IScatterSource does not have offsets or scaling built in. This makes Signal and SignalXY perfect for testing the methods in unit tests, but difficult for scatter. Because the Signal sources can handle their own GetX/Y with scaling, it makes it easier than scatter. This is why I created the proxy class, in case the IScatterSource is not an IDataSource. But that should be fallback and not default, because as of now all the IScatterSources implement IDataSource. The proxy should exist only the duration of GetNearest (creating many of them is expensive, but again it really should be not be created if using built in sources). Another issue I'm running into while unit testing is the GetClosestIndex is inconsistent, and I don't know why yet. Sometimes it returns the exact index, other times it returns 0 or 1 (expecting 20). I think offsets and scaling due to mouse coordinates are coming into play here, I likely have to de-scale the mouse coordinate by the pixels-per-unit. (This will also eliminate multiplying it in the loop!) |
You may have misunderstood, I purposely chose the worst-case scenario. When you check the ordering of Scatter, and only the last point is not ordered. Because the Scatter is unordered, you fall into a simple minimum search, which is essentially identical to the old one (if you optimize old implementation). A small loss is obtained only by checking for ordering over the entire array up to the last point. I have conducted various benchmarks, but there is nothing to say, the signals work in nanoseconds on millions of points. Ordered Scatter fast too, despite the fact that it runs through the entire array to check for orderliness every time. |
|
Would you mind clarifying this?
There are several scatter sources that take arrays, but I think 3 that accept a list. The array I wouldn't expect to be modified once added to the source, but the list I can see being modified during life of the program (say for example as data comes in it gets added). Is updating the array of values common in your experience? I didn't think so, which is why input the |
- Fix broken demo for SignalXY Interaction
- disable the erroneous warning, as it shouldn't be a concern at runtime
|
Alright, I believe work on this PR is complete. All unit tests I've added are passing, all the original ones are still passing, and I've checked out the demos and nothing appeared broken. ( I think this even may have fixed #4261, as I cannot replicate it in the demo. ( I'll have to check my app to verify ) Scatter, Signal, and SignalXY now use the And 'IsAscending' linq method was added to help determine that. A generic comparer, which prioritizers One item to note: Using GetNearest from the plottables should be preferred over the GetNearest of the DataSources, as not all DataSources (looking at you |
src/ScottPlot5/ScottPlot5 Tests/Unit Tests/UnitTests/IDataSourceTests.cs
Show resolved
Hide resolved
src/ScottPlot5/ScottPlot5 Tests/Unit Tests/UnitTests/IDataSourceTests.cs
Show resolved
Hide resolved
- Updates `CoordinateDataSource` to use properties over fields, and allows settings of min/max render index.
- Updates `IDataSource.IsSorted { get; }` to `IDataSource.IsSorted()`
- Updates Scatter Sources to prefer the `DataSourceUtilities.GetRenderIndexCount` function over a private property
- Adds `IndexRange.None` to represent an index range for an empty collection
DataSourceUtilties
- Improves `IsAscending` and handles enumerator disposal
- `GetClosestIndex` now uses `BinarySearch` function
- Better validation on `GetRenderIndexCount` and `GetRenderIndexRange`
- added documentation, fixed formatting
- Rename the file and class to better describe what it is testing (not just the interface, but the utilty class itself)
- Corrected name to not compete with class it tests
The generic comparer does not need to implement IComparer, it just needs to provide one.
- Fix erroneous warning preventing github compile
- Make GenericComparer<T> static - Rename 'Instance' to 'Default' to more closesly mimick the Comparer<T>.Default api
- also renames BinarySearch parameter' count' to 'length' to pick up the inherited docs description
|
Should IndexRange validate the values at all? Typically this is used by the EX: /// <summary>
/// Represents a range of indexes in an array (inclusive)
/// </summary>
public readonly record struct IndexRange(int Min, int Max)
{
/// <summary>
/// The IndexRange that represents an empty collection
/// </summary>
public static readonly IndexRange None = new(-1, -1);
/// <summary>
/// The number of indexes the within the range
/// </summary>
public int Length { get; } = Validate(Min, Max);
/// <summary>
/// <see langword="true"/> if the Min & Max represent valid indexes, otherwise <see langword="false"/>
/// </summary>
public bool IsValid => Min >= 0;
private static int Validate(int Min, int Max)
{
if (Min == Max && Min == -1) return 0; // special case
if (Min < 0) throw new ArgumentException($"{nameof(IndexRange.Min)} must be at least 0", nameof(IndexRange.Min));
if (Max < Min) throw new ArgumentException($"{nameof(IndexRange.Max)} cannot be less than {nameof(IndexRange.Min)}", nameof(IndexRange.Max));
return Max - Min + 1;
}
} |
DataSourceUtilities - Ensure `GetRenderIndexRange().Length` == `GetRenderIndexCount()` Scatter Sources - Prefer GetRenderIndexRange() over direct instantiation of IndexRange()
|
@StendProg - I found out a nuance that I think needs consideration. When testing my app, which uses multiple Y axes but only 1 X, the new methods in this produces more accurate results, especially now that SignalXY and Scatter account for axes and offsets properly. BUT, it still required me to get the mouse-coordinates per-axis like so to get accurate results: for (int i = 0; i < items.Count; i++)
{
var item = items[i]; // items is a collection of wrappers for Scatter and SignalXY
if (!item.IsVisible) continue;
Coordinates mouseLoc = plot.GetCoordinates(mousePixel, plot.Axes.Bottom, item.YAxis);
DataPoint dataPoint = item.GetNearest(mouseLoc, plot.LastRender, 15);
if (dataPoint.IsReal)
{
// calculate the distance of the point to the mouse
double distance = dataPoint.Coordinates.DistanceSquared(mouseLoc);
if (distance < smallestDistance)
{
// store the datapoint
nearestPoint = dataPoint;
nearestPointSource = item;
pointSelected = true;
smallestDistance = distance;// update the smallest distance
}
}
}The issue is that when using multiple axes, you have to use the overload for 'plot.GetCoordinate' that specifies the axis. Otherwise you end up with coordinates based on the default axis. One possible solution is to have overloads in the Plottables like this: DataPoint GetNearest(Pixel mousePixel, RenderDetails renderDetails, float maxDistance)
{
var x = this.Axes.XAxis.GetCoordinate(mousePixel.X, renderDetails.DataRect);
var y = this.Axes.YAxis.GetCoordinate(mousePixel.Y, renderDetails.DataRect);
return GetNearest(new Coordinates(x, y), renderDetails, maxDistance);
}It would require that public IPlottable Plottable(IPlottable plottable)
{
Plot.PlottableList.Add(plottable);
plottable.Axes.XAxis ??= Plot.Axes.Bottom;
plottable.Axes.YAxis ??= Plot.Axes.Left;
return plottable;
}That said, I don't intend to make any more changes like this, just food for thought until this is accepted |
|
Hi @RFBomb,
If you think that recalculating mouse coordinates with additional axes for each Plottable looks messy (I totally agree with you on this). Maybe just create an additional overloaded method that simplifies this: public class Plot
{
..
Coordinates GetCoordinates(Pixel mousePixel, IPlottable plottable)
{
return GetCoordinates(mousePixel, plottable.Axes.XAxis, plottable.Axes.YAxis);
}
..
}I don't like your solution, but I can't suggest something better either.
You've been held hostage to a large and complex PR. This is what I tried to protect you from by advising you not to add additional functionality. If it were simple, you would not depend so much and could create a parallel PR. |
Well, to be fair I think it's a more robust PR than originally intended as well. All stemming from "maybe this should be in a utility class". The main API hasn't changed, just how things are calculated has been.
Agreed. After reviewing some of the other scenarios and plottables, even within my own app, I couldn't find a way to cleanly do a GetNearest that I'd be happy committing. One thought maybe tho is a method on plot itself. (DataPoint datapoint, Iplottable? signal) GetNearest(Pixel mousePixel);The plot knows last render, can get coordinates for all plottables, just the pixel would have to be supplied. It returns the datapoint, and if it was a match the plottable item. The consumer would then be able to check if the plottable item implements interfaces or is a specific type as their app requirements dictate. |
|
Hi @RFBomb and @StendProg, thanks for this PR and the insightful discussion! This is a complicated and nuanced topic, as we are all discovering 😅
This is my favorite quote from this week 🤣 This stuff is very difficult! I'm going to leave this PR open while I resolve some less complex open issues and PRs for the next several days, then I'll return to this one and review it carefully and provide some deeper feedback. |
IDataSource and include GetNearest functionality
To be clear, this statement refers to a discussion of further development beyond the scope of this PR. |
Thanks for the clarification! Knowing this, I'll review this now and merge it in 👍 |

Issue #3807
Introduces the 'IGetNearest' interface for plottables that have the
GetNearestandGetNearestXmethodsApplies the IGetNearest interface to all Scatter data sources, as well as SignalXY and VectorFieldDataSource
Implement GetNearestX for all scatter data sources
GetNearestFastwas adapted from WIP. SignalXY.GetNearest() improvements #4265, thanks to @StendProgIntroduces 'IDataSource' interface
'DataSourceUtilities' static class
GetNearest,GetNearestFast,GetNearestX,GetNearestFastX, and several other functions to standardize how these tasks work.- Previously, each class implemented these themselves, or did not have the functionality. Some accounted for offsets and scaling, while others did not.
- Now the scaling is handled by the
IDataSource.GetXScaledfunction, which allows a more standard and optimizedGetNearestto work in this static class.