Skip to content

Refactor data sources to implement IDataSource and include GetNearest functionality#4270

Merged
swharden merged 25 commits intoScottPlot:mainfrom
RFBomb:main
Oct 9, 2024
Merged

Refactor data sources to implement IDataSource and include GetNearest functionality#4270
swharden merged 25 commits intoScottPlot:mainfrom
RFBomb:main

Conversation

@RFBomb
Copy link
Contributor

@RFBomb RFBomb commented Sep 17, 2024

Issue #3807

  • Introduces the 'IGetNearest' interface for plottables that have the GetNearest and GetNearestX methods

  • Applies the IGetNearest interface to all Scatter data sources, as well as SignalXY and VectorFieldDataSource

  • Implement GetNearestX for all scatter data sources

  • GetNearestFast was adapted from WIP. SignalXY.GetNearest() improvements #4265, thanks to @StendProg

  • Introduces 'IDataSource' interface

    • IDataSource is fundamental to this PR, as it enables the plottables to 'talk to' their sources by getting values from them if needed, but more importantly this is the interface that is used to optimize the GetNearest and GetNearestFast methods. Data Sources implementing IDataSource offer an optimized way to access their data values in a standardized way. This PR updates all Signal, SignalXY, and Scatter sources to implement IDataSource.
  • 'DataSourceUtilities' static class

    • Several classes have been combined into the 'DataSourceUtilities' file (the BinarySearchComparer from DataSourceLogger)
    • Adds a generic comparer that will prefer to use Comparer.Default, but accomodates several of the ScottPlott structs
    • Includes GetNearest, GetNearestFast, GetNearestX, GetNearestFastX, and several other functions to standardize how these tasks work.
      - Previously, each class implemented these themselves, or did not have the functionality. Some accounted for offsets and scaling, while others did not.
      - Now the scaling is handled by the IDataSource.GetXScaled function, which allows a more standard and optimized GetNearest to work in this static class.

- Introduces the 'IGetNearest' interface for plottables that have the `GetNearest` and `GetNearestX` methods
- Applies the IGetNearest interface to all Scatter data sources, as well as SignalXY and VectorFieldDataSource
- Implement GetNearestX for all scatter data sources
- Adds missing using for Interfaces.IGetNearest
@StendProg
Copy link
Contributor

StendProg commented Sep 17, 2024

Hi @RFBomb,

You have done a great job in the right direction. The only thing that bothers me is a lot of code duplication. You can try to put repetitive parts into a separate method, even if it is a static Util method.

I also want to warn you that the existing implementation of the nearest point search does not work for Plottables with axes other than the main axis. I solved it for SignalXY in the neighboring PR #4265. Support for additional axes will require a slightly different IGetNearest interface. This design is not approved for now, so don't try to redesign everything, just take note.

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 17, 2024

@StendProg Thanks for looking into this, I was actually going to mention that PR just now, as they can go hand-in-hand!

I thought about having a standardized utilities method as well, but wasn't sure if it was allowed (theres code duplication all over this library, so I just followed suit)

As for solving for different axes, I found that out myself, here is what I came up with for my UI.

  • I created a helper class 'PlotHelper' that I use to store my data sources, signals, a reference to the plot itself, etc.
  • The 'item' in the below code is a helper class that wraps Scatter and SignalXY, and stores which YAxis was assigned to them
  • Get the mouse coordinates from the plot, specifying the Y axis that the signal belongs to
  • Pass that coordinate into the GetNearest method.
Coordinates mouseLoc = plot.GetCoordinates(mousePixel, plot.Axes.Bottom, item.YAxis);
DataPoint dataPoint = item.GetNearest(mouseLoc, plot.LastRender, 15);

Now, there could be an argument that GetNearest includes a IYAxis and IXAxis properties, but I think I'd rather not do that because the DataSources that can implement it do not have such properties. I'd rather introduce accompanying IHasAxes in addition to this if that were requested, then generics could constrain to either just IGetNearest or IGetNearest + IHasAxes

@StendProg
Copy link
Contributor

Coordinates mouseLoc = plot.GetCoordinates(mousePixel, plot.Axes.Bottom, item.YAxis);
DataPoint dataPoint = item.GetNearest(mouseLoc, plot.LastRender, 15);

The problem is a little deeper than that. GetNearest() uses PxPerUnit from RenderInfo. Which corresponds to the main axes only. If you modify RenderInfo so that it can return PxPerUnit for arbitrary axes, it turns out that DataSource does not own the axes of Plottable. That's why I suggested to pass already calculated PxPerUnitX and PxPerUnitY for necessary axes to DataSource instead of RenderInfo. And Plottable knowing which axes it uses will request these values from RenderInfo. Thus IGetNearest for DataSources could look like this:

public interface IGetNearestDataSource
{
    DataPoint GetNearest(Coordinates mouseLocation, double PxPerUnitX, double PxPerUnitY, float maxDistance = 15);

    DataPoint GetNearestX(Coordinates mouseLocation, double PxPerUnitX, double PxPerUnitY, float maxDistance = 15);
 }

While Plottable should implement its own IGetNearestPlottable exactly as you suggested.

public interface IGetNearestPlottable
{
  DataPoint GetNearest(Coordinates mouseLocation, RenderDetails renderInfo, float maxDistance = 15);

  DataPoint GetNearestX(Coordinates mouseLocation, RenderDetails renderInfo, float maxDistance = 15);
}

But once again, the design is not approved, so let's leave it as it is for now.

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 17, 2024

So I took a peek at your last comment, and I see what you're getting at. Its an interesting situation.
That said, I believe that my snippet for getting the mouse location accounts for it, as far as I can tell based on my testing.

Shifting the axes up and down worked allowed a crosshair on the mouse to still follow the plot, even with a maxDistance of 5.

I believe the snippet in question got the mouse coordinates for the main axes, relative to the main axes, so when it got the nearest position according the the main axes, it seems to grab the correct values. (That said, I didn't step through the code to validate that, only went by what was drawn on screen during testing)

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 17, 2024

@StendProg I have a few thoughts on how to write the utility class to handle more sources easily, but it would entail an internal interface. What are your thoughts between these two ?

The first one is more flexible overall, while the secondary one might be more performant, but if caching converted values is implemented it assumes that the earlier indexes in the collection are not going to be changed

/// This one uses a function to get the X at specified index.
internal interface IDataSource
{
    int RenderIndexCount { get; }
    int MinRenderIndex { get; }
    int MaxRenderIndex { get; }

    double GetX(int index);
    double GetY(int index);
}

/// array/list implementations
double IDataSource.GetX(int index) => Xs[index];
double IDataSource.GetY(int index) => Ys[index];

/// Implementation for generics:
double IDataSource.GetX(int index)
{
    T1 x = Xs[index];
    return NumericConversion.GenericToDouble(ref x);
}
double IDataSource.GetY(int index)
{
    T2 y = Ys[index];
    return NumericConversion.GenericToDouble(ref y);
}

OR

// Xs and Ys would pass their collections to the consumer methods

internal interface IDataSource
{
    IList<double> Xs {get;}
    IList<double> Ys {get;}
    int RenderIndexCount { get; }
    int MinRenderIndex { get; }
    int MaxRenderIndex { get; }
}

A new wrapper collection could be implemented to wrap generics, and have that passed via the interface.
If this had caching of the converted values, then it would also be usable in other spots, instead of calling the converters multiple times :

public static Coordinates[] GenericToCoordinates<T1, T2>(IEnumerable<T1> xs, IEnumerable<T2> ys)

Without Caching
 /// Helper class used to convert a generic collection to various other collections and cache their values
 internal class NumericList<T> : IList<double>
 {
     private readonly IList<T> Values;

     public NumericList(IList<T> collectionToWrap)
     {
         Values = collectionToWrap;
     }

     public int Count => Values.Count;
     public double this[int index] { 
         get
         {
             T baseValue = Values[index];
             return NumericConversion.GenericToDouble(ref baseValue);
         }
         set => throw new NotImplementedException(); 
     }

     bool ICollection<double>.Contains(double item) => Values.Any(i => NumericConversion.GenericToDouble(ref i) == item);

     public IEnumerator<double> GetEnumerator()
     {
         T val;
         foreach (T item in Values.ToArray())
         {
             val = item;
             yield return NumericConversion.GenericToDouble(ref val);
         }
     }

     bool ICollection<double>.IsReadOnly => true;
}
With Caching
internal class NumericList<T> : IList<double>
{
    private readonly IList<T> Values;
    private readonly IList<double> ConvertedValues;
    private bool isConverted => ConvertedValues.Count == Values.Count;


    public NumericList(IList<T> collectionToWrap)
    {
        Values = collectionToWrap;
        ConvertedValues = Values is IList<double> dbl? dbl : new List<double>(Values.Count);
    }

    public int Count => Values.Count;
    public double this[int index] { 
        get
        {
            EnsureConverted(index);
            return ConvertedValues[index];
        }
        set => throw new NotImplementedException(); 
    }

    private void EnsureConverted(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException("index can not be less than 0", nameof(index));
        if (index < ConvertedValues.Count) return;
        if (index > Values.Count) throw new ArgumentOutOfRangeException("index exceeds number of values in collection", nameof(index));
        for (int i = ConvertedValues.Count - 1; i <= index; i++)
        {
            T value = Values[i];
            ConvertedValues[i] = NumericConversion.GenericToDouble(ref value);
        }
    }

    bool ICollection<double>.Contains(double item) => this.Any(i => i == item);

    public IEnumerator<double> GetEnumerator()
    {
        int max = Values.Count;
        for (int i = 0; i < max; i++)
        {
            if (i < ConvertedValues.Count)
            {
                yield return ConvertedValues[i];
            }
            else
            {

                T val = Values[i];
                ConvertedValues[i] = NumericConversion.GenericToDouble(ref val);
                yield return ConvertedValues[i];
            }
        }
    }

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 17, 2024

After some playing and implementing, I've landed on the first one, as it's a simpler interface and more flexible. I have added a few properties to it so that the static method for GetNearest can optimize a little bit (such as if using coordinates get the coordinate for XY instead of grabbing the index once for X and again for Y)

It will have parameters for passing in X and Y axes, and attempt to get the pixel per unit from the render pack.

I noticed that SignalXY accounts for offset and rotation in its logic, which is a flaw I noticed other items don't account for. So what I'm thinking is introduce a record struct with the additional parameters necessary (IsRotated, XOffset,YOffset, IYAxis?, IXAxis?). This way the method signature stays simple, while the struct is used to pass in the additional calculation parameters.

@StendProg
Copy link
Contributor

@RFBomb I apologize for the confusion. My comment about code duplication can be considered invalid. I realized only now that despite the fact that all implementations are similar to each other, they operate with different types/interfaces. I'm afraid that in an attempt to reduce everything to a common base, behind all these abstractions we can seriously lose performance. And for GetNearest(), performance is important because it is a resource-intensive operation in itself.

IDataSource Interface
- Interface applied to DataSources to use with the DataSourceUtilities static class
- Implemented on Scatter and Signal sources where able
- Implemented on Scatter and Signal + SignalXY plottables

DataSourceUtilities
- Extracted code from various GetNearest() functions and normalized them into GetNearest and GetNearestX in this static class
- Any rotation or scaling necessary for the calculation is expected to be performed via the IDataSource interface.
@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 18, 2024

@StendProg Too late, i ran with it lol
Nearly all 'GetNearest' functions were identical based on what I saw. The primary differences here were transforming the values for handling offsets and 90* rotation (which was hit or miss if they were handled!)
This last commit introduced an IDataSource interface that I've implemented on any of the sources that have GetNearest(), and stuffed all the logic into the static DataSourceUtilities class.

The JIT should be able to optimize that static method heavily over its many runs. As for handling rotation and offsets, that is taken care of by the interface GetXScaled() function, which will perform any scaling or rotation as the implementing type dictates. For example, SignalXY has a 'rotate' function that swaps X and Y, but Signal does not.

Oh, this also handles the issue with the Y/X axis selection when calling GetNearest on the plot itself, which is aware of what axes it uses

@StendProg
Copy link
Contributor

I did a little performance testing, and was quite surprised. You managed to generalize all the GetNearest() code and still it runs faster than the old one, at least for the SignalXY I tested. You have also included support for additional axes in your HelperMethod. Great.
It's a little unclear why DataSource needs to have both IDataSource and IGetNearest interfaces at the same time. If it implements IDataSource, we can calculate the nearest points via HelperMethod. Accordingly, it is not clear all this workaround code of the form:

        if (Data is IDataSource ds)
            return DataSourceUtilities.GetNearestX(ds, location, renderInfo, maxDistance, Axes.XAxis);
        else
            return Data.GetNearestX(location, renderInfo, maxDistance);

You've probably thought of different scenarios for future DataSources, I understand, but maybe don't overengineer and simplify?

You've done a great job, a lot of Plottables now supports finding the closest point, and it will be easy to support and add in new ones.

Here are the benchmark results if you are interested:
GetNearestNew - This PR implementation
GetNearestOld - previous implementation
GetNearestOptimized - optimized version for SignalXY proposed in the #4265.

SignalXY to search X=0..Points, Y = rnd.NextDouble().
Drawn on the surface (400, 600).
MaxDistance = 15px.
There was a search for the point closest to (Points/2, 1).

Method Points Mean Error StdDev Allocated
GetNearestNew 100_000 120,196.5 ns 485.96 ns 430.79 ns -
GetNearestOld 100_000 182,061.7 ns 972.41 ns 862.02 ns -
GetNearestOptimized 100_000 982.1 ns 3.63 ns 3.40 ns -
GetNearestNew 1_000_000 1,127,873.4 ns 7,267.32 ns 6,797.86 ns 1 B
GetNearestOld 1_000_000 1,756,229.7 ns 28,607.35 ns 26,759.33 ns 1 B
GetNearestOptimized 1_000_000 12,097.8 ns 53.69 ns 47.60 ns -
GetNearestNew 10_000_000 11,978,655.0 ns 109,451.80 ns 102,381.29 ns 6 B
GetNearestOld 10_000_000 18,834,700.0 ns 63,646.54 ns 56,421.00 ns 12 B
GetNearestOptimized 10_000_000 14,119.5 ns 73.07 ns 68.35 ns -

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 18, 2024

It's a little unclear why DataSource needs to have both IDataSource and IGetNearest interfaces at the same time.

So, because so many consumers might implement the various existing interfaces already in custom sources, I didn't want to update the interface (such as ISignal and IScatterSource) to accommodate this new functionality. Case in point : ISignal has GetNearest while IScatterSource does not. (But the types themselves now implement IGetNearest and IDataSource).

Further, the DataSources have no knowledge of the assigned axis, that belongs to the Plottable. This allows the publicly exposed GetNearest methods to not have complex overloads where you would be supplying known information. The data source falls back to using (Bottom,Left) if no axis is supplied, which is why I check in that order.

Another point is that the Scatter DataSources don't have offset or scaling information, while the Scatter plot does. So that plot uses the IDataSource interface to perform the GetXScaled() and GetYScaled() calls, nearly everything else is passed back through the Data object.

Also, yay benchmarks! I wasn't sure if it would actually be faster or not, I was just hoping

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 18, 2024

Next step for this is to implement the improvements you suggested over in your PR, as it was something I had looked at but wanted a functional version before taking a look at the BinarySearch. But you already figured out how to that! And those benchmarks are impressive.

I'm thinking the best way to do this is likely use the IDataSource interface and add GetClosest(mouseCoordindate), that way the data source can call the appropriate overload for the binary search

- Moves the Binary Search Extensions into DataSourceUtilities
- Adds several overloads to common scenarios ( double[] )
- Renames the default comparer and extends it for more IComparer interfaces
- Create the generic comparer
- Adds a new helper class  `CoordinateDataSource` for scenarios (specifically Scatter plot) that should reduce expensive interface calls  where the IDataSource does not own the data collection
@RFBomb RFBomb marked this pull request as draft September 19, 2024 13:11
@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 19, 2024

@StendProg Alright, I've Adapted your work over on #4265 into DataSourceUtilities.GetNearest_Optimized
This required some changes to the interface and all implementing classes to defer the GetXClosestIndex to the implementing class.

For example, any of the SignalSource classes only accept Y values, and their X values are calculated on demand. So these will not require using binary search to locate the index. Whereas SignalXY will use the binary search that it implements privately.

Scatter is a unique one currently, and we may want to expand IScatterSource, but currently I've avoided that. If using any of the built-in sources, it should be optimized as they all implement IDataSource, but if a consumer had a custom source, I put in a helper to avoid constantly requesting the GetScatterPoints list.

I have next few days off, so I won't be contributing to this during that time, but if you wanted to run more benchmarks I wouldn't say no!

Add Unit Tests for IGetNearest and IDataSource
@StendProg
Copy link
Contributor

StendProg commented Sep 20, 2024

Hi @RFBomb,

Benchmark for Scatter:
X = 0..N-1, -1
Y = rand.NextDouble()
GetNearestNew - Scatter.GetNearest()
GetNearestNewDataSourceOnly - Scater.Data.GetNearest()
GetNearestOldDataSouceOnly - actually optimized old implementation, original old get ~2.5 times slower.

Method Points Mean Error StdDev Allocated
GetNearestNew 10_000_000 45.516 ms 0.0797 ms 0.0665 ms 50 B
GetNearestNewDataSourceOnly 10_000_000 8.472 ms 0.0445 ms 0.0395 ms 6 B
GetNearestOldDataSourceOnly 10_000_000 7.682 ms 0.0332 ms 0.0311 ms 6 B

I am not at all confused by the long time for Scatter.GetNearest(). You are wrapping the array to support advanced functionality, this can be eliminated in the future by modifying the original DataSource. Let it not confuse you either. I made a separate benchmark for DataSource without wrappers.

It took me hours to get this result. Before that your search was consistently winning starting with a more 10x advantage :-)
The old implementation was very suboptimal and recalculated non-variable values right in the loop.

In this benchmark you have chosen the worst case scenario where only the last point is not sorted in ascending order. That's why you method lose a bit because of the additional run through the array. Your ordering test works surprisingly well. I think most users load sorted values into the scatter anyway. But Scatter itself declares an arbitrary order of points. And it turns out we are checking and exploiting a user error in the wrong Plottable type. If we can think of prejudices against using SignalXY for ordered X values, they could be closed by creating a separate Scatter type that allows only ordered values.
Scatter ordering caching is fundamentally wrong, values can change without any notification. This is a typical case, not an exception.

In spite of the above, I am impressed by your solution. You have developed a whole subsystem for DataSources that I had to understand for hours. You took into account Generics, generalized existing duplicate code, put a fast algorithm from PR #4265 even in Scatter....

Not to derail your creative impulse, but I would still ask you to stop adding new functionality in this PR and leave it for future ones. It's better to test and document the current one. If you have time, definitely try to correct the description of the PR in the first post, you've gone way beyond what you described earlier.

  1. A small bug in SignalXY, but it breaks GetNearest():

int IDataSource.GetXClosestIndex(Coordinates mouseLocation)
{
return Rotated
? GetIndex(mouseLocation.X)
: GetIndex(mouseLocation.Y);
}

int IDataSource.GetXClosestIndex(Coordinates mouseLocation)
{
return Rotated
? GetIndex(mouseLocation.X)
: GetIndex(mouseLocation.Y);
}

Should be:

 int IDataSource.GetXClosestIndex(Coordinates mouseLocation) 
 { 
     return Rotated 
         ? GetIndex(mouseLocation./*X*/ Y) 
         : GetIndex(mouseLocation./*Y*/ X); 
 } 

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 20, 2024

@StendProg i much appreciate the feedback! After implemented IDataSource on my desired signals (just the Scatters and Signals) I have only been tweaking trying to get it to work properly.

This took a while for me to analyze the various implementations of GetNearest (and how some accounted for offsets and others didn't), so not surprised it took someone else a few hours to understand my abstraction. But my goal was to standardize the calls. And each source has its own nuances to work through.

I think your work with the Optimized search (now GetNearestFast) was great, and I incorporated that here. Unfortunately it took me a few hours to track down a bug that I introduced when I translated it, which was fixed in the last commit. (It was only checking the one index, so it would iterate the entire collection every time, instead of checking the next candidate and finding best result). That bug is likely why your tests took so long!

As of now, I'm not adding more functionality per se, just documentation and unit tests. IDataSource was not my initial goal in this commit, but as I tried implementing IGetNearest I was running into issues, and needed it.

One hiccup I am having a hard time with is the difference between IScatterSource and ISignalSource is that IScatterSource does not have offsets or scaling built in. This makes Signal and SignalXY perfect for testing the methods in unit tests, but difficult for scatter. Because the Signal sources can handle their own GetX/Y with scaling, it makes it easier than scatter.

This is why I created the proxy class, in case the IScatterSource is not an IDataSource. But that should be fallback and not default, because as of now all the IScatterSources implement IDataSource. The proxy should exist only the duration of GetNearest (creating many of them is expensive, but again it really should be not be created if using built in sources).

Another issue I'm running into while unit testing is the GetClosestIndex is inconsistent, and I don't know why yet. Sometimes it returns the exact index, other times it returns 0 or 1 (expecting 20). I think offsets and scaling due to mouse coordinates are coming into play here, I likely have to de-scale the mouse coordinate by the pixels-per-unit. (This will also eliminate multiplying it in the loop!)

@StendProg
Copy link
Contributor

(It was only checking the one index, so it would iterate the entire collection every time, instead of checking the next candidate and finding best result). That bug is likely why your tests took so long!

You may have misunderstood, I purposely chose the worst-case scenario. When you check the ordering of Scatter, and only the last point is not ordered. Because the Scatter is unordered, you fall into a simple minimum search, which is essentially identical to the old one (if you optimize old implementation). A small loss is obtained only by checking for ordering over the entire array up to the last point.
Unless you deliberately trick the system into checking for orderliness and let it stop somewhere at the beginning of the array. Your complete search is almost identical in time to the old one (very small losses at the error level).

I have conducted various benchmarks, but there is nothing to say, the signals work in nanoseconds on millions of points. Ordered Scatter fast too, despite the fact that it runs through the entire array to check for orderliness every time.
.

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 20, 2024

Would you mind clarifying this?

Scatter ordering caching is fundamentally wrong, values can change without any notification. This is a typical case, not an exception.

There are several scatter sources that take arrays, but I think 3 that accept a list. The array I wouldn't expect to be modified once added to the source, but the list I can see being modified during life of the program (say for example as data comes in it gets added).

Is updating the array of values common in your experience? I didn't think so, which is why input the bool? IsSorted cache field.

@RFBomb RFBomb changed the title Resolve Issue 3807, Add IGetNearest interface Resolve Issue 3807, Add IDataSource and IGetNearest Sep 22, 2024
- Fix broken demo for SignalXY Interaction
- disable the erroneous warning, as it shouldn't be a concern at runtime
@RFBomb RFBomb marked this pull request as ready for review September 22, 2024 15:29
@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 22, 2024

Alright, I believe work on this PR is complete. All unit tests I've added are passing, all the original ones are still passing, and I've checked out the demos and nothing appeared broken. ( I think this even may have fixed #4261, as I cannot replicate it in the demo. ( I'll have to check my app to verify )

Scatter, Signal, and SignalXY now use the GetNearest/GetNearestFast as appropriate. GetNearestFast is preferred.

And 'IsAscending' linq method was added to help determine that. A generic comparer, which prioritizers Comparer<T>.Default was also addedd. My benchmarks had it finishing sorted arrays with 100k values with the last item unsorted in about 20us thanks to using the default comparer for the generics, so impact on this check should be neglibile.

One item to note: Using GetNearest from the plottables should be preferred over the GetNearest of the DataSources, as not all DataSources (looking at you IScatterSource) account for offsets, while the plottables do. As such, GetNearest and GetNearestX on any DataSource may require the mouse coordinates to be unscaled prior to checking. A helper method has been introduced for this.

- Updates `CoordinateDataSource` to use properties over fields, and allows settings of min/max render index.
- Updates `IDataSource.IsSorted { get; }` to `IDataSource.IsSorted()`
- Updates Scatter Sources to prefer the `DataSourceUtilities.GetRenderIndexCount` function over a private property
- Adds `IndexRange.None` to represent an index range for an empty collection

DataSourceUtilties
- Improves `IsAscending` and handles enumerator disposal
- `GetClosestIndex` now uses `BinarySearch` function
- Better validation on `GetRenderIndexCount` and `GetRenderIndexRange`
- added documentation, fixed formatting
- Rename the file and class to better describe what it is testing (not just the interface, but the utilty class itself)
- Corrected name to not compete with class it tests
The generic comparer does not need to implement IComparer, it just needs to provide one.
- Fix erroneous warning preventing github compile
- Make GenericComparer<T> static
- Rename 'Instance' to 'Default' to more closesly mimick the Comparer<T>.Default api
- also renames BinarySearch parameter' count' to 'length' to pick up the inherited docs description
@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 25, 2024

Should IndexRange validate the values at all? Typically this is used by the MinRenderIndex and MaxRenderIndex values, but those are just properties that have nothing to guarantee they are >0. Granted, if someone were to make it less than 0, it would result in a runtime fault when the collection was attempted to be accessed. Just food for thought.

EX:

/// <summary>
/// Represents a range of indexes in an array (inclusive)
/// </summary>
public readonly record struct IndexRange(int Min, int Max)
{
    /// <summary>
    /// The IndexRange that represents an empty collection
    /// </summary>
    public static readonly IndexRange None = new(-1, -1);

    /// <summary>
    /// The number of indexes the within the range
    /// </summary>
    public int Length { get; } = Validate(Min, Max);

    /// <summary>
    /// <see langword="true"/> if the Min &amp; Max represent valid indexes, otherwise <see langword="false"/>
    /// </summary>
    public bool IsValid => Min >= 0;
    private static int Validate(int Min, int Max)
    {
        if (Min == Max && Min == -1) return 0; // special case
        
        if (Min < 0) throw new ArgumentException($"{nameof(IndexRange.Min)} must be at least 0", nameof(IndexRange.Min));
        if (Max < Min) throw new ArgumentException($"{nameof(IndexRange.Max)} cannot be less than {nameof(IndexRange.Min)}", nameof(IndexRange.Max));
        return Max - Min + 1;
    }
}

DataSourceUtilities
- Ensure `GetRenderIndexRange().Length` == `GetRenderIndexCount()`

Scatter Sources
- Prefer GetRenderIndexRange() over direct instantiation of IndexRange()
@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 26, 2024

@StendProg - I found out a nuance that I think needs consideration. When testing my app, which uses multiple Y axes but only 1 X, the new methods in this produces more accurate results, especially now that SignalXY and Scatter account for axes and offsets properly.

BUT, it still required me to get the mouse-coordinates per-axis like so to get accurate results:
 for (int i = 0; i < items.Count; i++)
 {
     var item = items[i]; // items is a collection of wrappers for Scatter and SignalXY 
     if (!item.IsVisible) continue;
     Coordinates mouseLoc = plot.GetCoordinates(mousePixel, plot.Axes.Bottom, item.YAxis);
     DataPoint dataPoint = item.GetNearest(mouseLoc, plot.LastRender, 15);
     if (dataPoint.IsReal)
     {
         // calculate the distance of the point to the mouse
         double distance = dataPoint.Coordinates.DistanceSquared(mouseLoc);
         if (distance < smallestDistance)
         {
             // store the datapoint
             nearestPoint = dataPoint;
             nearestPointSource = item;
             pointSelected = true;
             smallestDistance = distance;// update the smallest distance
         }
     }
 }

The issue is that when using multiple axes, you have to use the overload for 'plot.GetCoordinate' that specifies the axis. Otherwise you end up with coordinates based on the default axis.

image

One possible solution is to have overloads in the Plottables like this:

DataPoint GetNearest(Pixel mousePixel, RenderDetails renderDetails, float maxDistance)
{
    var x = this.Axes.XAxis.GetCoordinate(mousePixel.X, renderDetails.DataRect);
    var y = this.Axes.YAxis.GetCoordinate(mousePixel.Y, renderDetails.DataRect);
    return GetNearest(new Coordinates(x, y), renderDetails, maxDistance);
}

It would require that PlottableAdder.Plottable() get modified as such to ensure the plottables axes are always set (which I believe would be good practice anyway, otherwise the plottable may have null axes despite being on a plot - unless one plottable is usable on multiple plots, oof):

public IPlottable Plottable(IPlottable plottable)
{
    Plot.PlottableList.Add(plottable);
    plottable.Axes.XAxis ??= Plot.Axes.Bottom;
    plottable.Axes.YAxis ??= Plot.Axes.Left;
    return plottable;
}

That said, I don't intend to make any more changes like this, just food for thought until this is accepted

@StendProg
Copy link
Contributor

Hi @RFBomb,

BUT, it still required me to get the mouse-coordinates per-axis like so to get accurate results:

If you think that recalculating mouse coordinates with additional axes for each Plottable looks messy (I totally agree with you on this). Maybe just create an additional overloaded method that simplifies this:

public class Plot
{
..
    Coordinates GetCoordinates(Pixel mousePixel, IPlottable plottable)
    {
        return GetCoordinates(mousePixel, plottable.Axes.XAxis, plottable.Axes.YAxis);
    }
..
}

I don't like your solution, but I can't suggest something better either.

  1. Your solution will break existing interfaces. GetNearest(Coordinates Pixel, ...)
  2. Will bring additional responsibility to own mouse coordinates in Plottable/DataSource.
  3. But worst of all, it will abandon null for the default axes. Not everyone creates a Plottable via PlottableAdder. It's not hard to manually create an instance and add it to the Plottable list. You'll break all such existing code and make it harder to create new Plottables manually. Also by having null as a default you are not tied to a specific Plot. You can easily move/duplicate an existing Plottable to a completely different Plot. (I've done this a few times, I think it's a relatively common scenario.)

That said, I don't intend to make any more changes like this, just food for thought until this is accepted

You've been held hostage to a large and complex PR. This is what I tried to protect you from by advising you not to add additional functionality. If it were simple, you would not depend so much and could create a parallel PR.
This issue doesn't seem to be as strongly related and could be discussed in a separate Isuue.

@RFBomb
Copy link
Contributor Author

RFBomb commented Sep 27, 2024

You've been held hostage to a large and complex PR. This is what I tried to protect you from by advising you not to add additional functionality. If it were simple, you would not depend so much and could create a parallel PR.

Well, to be fair I think it's a more robust PR than originally intended as well. All stemming from "maybe this should be in a utility class". The main API hasn't changed, just how things are calculated has been.

This issue doesn't seem to be as strongly related and could be discussed in a separate Isuue.

Agreed. After reviewing some of the other scenarios and plottables, even within my own app, I couldn't find a way to cleanly do a GetNearest that I'd be happy committing.

One thought maybe tho is a method on plot itself.

(DataPoint datapoint, Iplottable? signal) GetNearest(Pixel mousePixel);

The plot knows last render, can get coordinates for all plottables, just the pixel would have to be supplied. It returns the datapoint, and if it was a match the plottable item. The consumer would then be able to check if the plottable item implements interfaces or is a specific type as their app requirements dictate.

@swharden
Copy link
Member

swharden commented Oct 9, 2024

Hi @RFBomb and @StendProg, thanks for this PR and the insightful discussion!

This is a complicated and nuanced topic, as we are all discovering 😅

I don't like your solution, but I can't suggest something better either.

This is my favorite quote from this week 🤣 This stuff is very difficult!

I'm going to leave this PR open while I resolve some less complex open issues and PRs for the next several days, then I'll return to this one and review it carefully and provide some deeper feedback.

@swharden swharden changed the title Resolve Issue 3807, Add IDataSource and IGetNearest Refactor data sources to implement IDataSource and include GetNearest functionality Oct 9, 2024
@StendProg
Copy link
Contributor

This is a complicated and nuanced topic, as we are all discovering 😅

I don't like your solution, but I can't suggest something better either.

To be clear, this statement refers to a discussion of further development beyond the scope of this PR.
I like the PR itself very much and fully support its approval.

@swharden
Copy link
Member

swharden commented Oct 9, 2024

I like the PR itself very much and fully support its approval.

Thanks for the clarification! Knowing this, I'll review this now and merge it in 👍

@swharden swharden enabled auto-merge (squash) October 9, 2024 18:26
@swharden swharden merged commit 3eed4e5 into ScottPlot:main Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants