Skip to content

SignalPlotList#815

Closed
swharden wants to merge 1 commit intomasterfrom
SignalPlotList
Closed

SignalPlotList#815
swharden wants to merge 1 commit intomasterfrom
SignalPlotList

Conversation

@swharden
Copy link
Member

@swharden swharden commented Feb 21, 2021

The goal of SignalPlotList is to make it easier to efficiently display growing datasets with Signal plots.

SignalPlotList doesn't use List<T> to store data; it just behaves like List<T> with Add(), AddRange(), Capacity, and Count. It still uses double[] Ys for data storage, but when data is added maxRenderIndex is moved automatically and Ys is resized as needed.

@StendProg, do you have any thoughts about this idea? Do you think it would be a good idea to move this functionality into SignalPlotBase so it could be used in all types of Signal plots?

WinForms Demonstration

readonly ScottPlot.Plottable.SignalPlotList SignalPlotList;

public Form1()
{
    InitializeComponent();
    SignalPlotList = formsPlot1.Plot.AddSignalList();
}

// high frequency tick (1 ms) adds data
private void timer1_Tick(object sender, EventArgs e)
{
    SignalPlotList.AddRange(/* new data */);
}

// low frequency tick (20 ms) renders the plot
private void timer2_Tick(object sender, EventArgs e)
{
    var axisLimits = formsPlot1.Plot.GetAxisLimits();
    if (axisLimits.XMax < SignalPlotList.LastX)
        formsPlot1.Plot.SetAxisLimits(xMax: axisLimits.XMax + 250);
    formsPlot1.Render();
}
list.mp4

/// <summary>
/// When the Ys array must be expanded, increase its length by this number of values.
/// </summary>
private readonly int CapacityIncriment;
Copy link
Member

@bclehmann bclehmann Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a constant-valued increment is not going to perform well, which is why dynamic arrays have geometric (or sometimes something fancier) growth.

Because I couldn't find the source for List<T>, I found the source for Microsoft's implementation of the C++ STL. You're looking at the function which determines how to grow an std::vector<T>'s backing buffer:
https://github.com/microsoft/STL/blob/main/stl/inc/vector#L1672-L1688

If you plot the two you see that constant increment leads to a linear number of resizes while constant multiple leads to a logarithmic number of resizes. If you take the example where you multiply the array length by k every time you resize it, resizing the array takes T(kn), but this only needs to be done once for every kn elements. Since T(kn) / kn is O(1) pushing to the array is amortized constant time.

I wrote a thingy in Python to illustrate this:
image

Code:

import matplotlib.pyplot as plt
import math

INIT_SIZE = 50
INCREMENT = 50
FACTOR = 1.5


def constant_growth(size):
	new_size = INIT_SIZE
	
	i = 0
	while new_size < size:
		new_size += INCREMENT
		i += 1
	
	return i

def geometric_growth(size):
	new_size = INIT_SIZE
	
	i = 0
	while new_size < size:
		new_size *= FACTOR
		i += 1
	
	return i

ns = [i for i in range(1, 10_000)]

plt.plot(ns, ys_constant)
plt.plot(ns, ys_geometric)
plt.show()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closed Form Representation

A closed form representation of the number of resizes for each function is below:
Constant Increment:
image

Constant Multiple:
image

Cost of Pushing

Simply multiply the cost of resizing by the frequency of resizing:

image
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change the resizing algorithm, but it'd likely be easier for you and the user to just allow them to pass in a List<double>, and then you don't need to manage any memory yourself. Plus the BCL can probably be faster because it'll be using unmanaged buffers, not C# arrays.

Copy link
Member Author

@swharden swharden Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bclehmann,

These are excellent points! I considered the growth rate, but went with the linear model here because it seemed logical considering that Signal plot data is expected to come in at a fixed rate (SampleRate), so being able to set that would let the user intuit how often the array will be copied. A counter argument to the exponential growth model is that it's wasteful when the arrays get large. I guess the fundamental trade off is CPU vs. memory? Considering how slow plotting is, an occasional array copy doesn't seem likely to make a big difference either way, so it may make sense to favor conserving memory here. Ultimately it probably doesn't matter much one way or another, but this is very good to consider!

Maybe an intermediate option is a hybrid solution that uses two public fields?

public float GrowthFactor = 1.5;
public int MaximumCapacityIncrement = 1e6;

Your last comment noted "Allow them to pass in List<double>", which would be great, but the signal plots use min/max algorithms that require double or generic arrays, making it difficult to support lists #526, #447, #103

Copy link
Member

@bclehmann bclehmann Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use < or > with generics, however, you can define a comparator function. It would be a pain for the user to have to specify a comparator function, but I suppose you could have a dictionary of common ones (i.e. numeric types).

Also, I'm not sure how this applies, I was suggesting List<double>, not List<T>. You have the same problem with comparison operators with T[] as you do with List<T>, even if we did want to allow users to pass in int or float (or whatever new floating point format NVidia comes up with next). Perhaps I misunderstood what you're saying?

Copy link
Contributor

@StendProg StendProg Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bclehmann,

You cannot use < or > with generics, however, you can define a comparator function. It would be a pain for the user to have to specify a comparator function, but I suppose you could have a dictionary of common ones (i.e. numeric types).

Also, I'm not sure how this applies, I was suggesting List, not List. You have the same problem with comparison operators with T[] as you do with List, even if we did want to allow users to pass in int or float (or whatever new floating point format NVidia comes up with next). Perhaps I misunderstood what you're saying?

This problem is solvable, and has already been solved for Signal<T>.

One hack from Jon Skeet is to use compiled at runtime lambda expressions. At runtime type of T already can be taken, and you can manualy compile any math on it to high performance IL code and call it. Thats make code complex and unreadable, and compiler cant check for errors. But using lambda reduce performance drop to only 10-30% compare to nongeneric implementation.

#103 (comment)

/// The internal data structure will be automatically expanded in memory if needed.
/// </summary>
/// <param name="ys">new Y values</param>
public void AddRange(IEnumerable<double> ys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a nice little optimization (which you can also see in Microsoft's implementation of the STL, where you can compare the requested size to the size given by your growth function. If your growth function gives enough space for the requested size then you go with that, otherwise you just allocate the requested amount:

https://github.com/microsoft/STL/blob/main/stl/inc/vector#L1683-L1685

You can simply get the requested size by taking the current size and adding the length of the new data. However, this optimization requires casting ys to a double[] or List<double> or similar as IEnumerable.Count() is a LINQ extension method that loops over the entire collection to get the count as IEnumerable does not provide a Count property (so you can use it to create generators).

So unfortunately this optimization requires checking if the ys is an array or list or something that has an O(1) count field, and then casting to whichever type to use that field.

@StendProg
Copy link
Contributor

StendProg commented Feb 22, 2021

Hi @swharden, @bclehmann,

As far as I understand, there is an attempt to create your own version of the List.
So why not do it explicitly, i.e. create a separate type of say CustomList.
All we need from this type for further binding with signals is a few events:

  1. event SizeChanged(double lastValue, double newValue);
  2. event CapacityChanged(double[] newY);

If we subscribe to these events a method that changes the MaxRenderIndex of the existing Signal (1 event).
And the method that calls the replacement of the ys at the second event (event 2).
Then maybe everything will work out of the box without creating a separate signal type.
In case the SignalConst it is also necessary to track the corresponding changes of any ranges of the array and call UpdateData.

List<T> uses a fairly simple strategy for changing the capacity. When capacity is not enough, it just doubles, always, and never decreases automatically (Only through a manual call).
If it was possible to reach the source array in the List<T> and add a couple of events, this would be the easiest option, but I have big doubts that this is possible...

@swharden
Copy link
Member Author

swharden commented Feb 23, 2021

Hi @StendProg, thank you for your input! To be sure I understand your suggestion, I'll re-state it with sample code. The user will do something like this:

SignalData temperatures = new SignalData();

void Init()
{
    formsPlot1.Plot.AddSignal(temperatures);
}

void Timer1_Tick()
{
    double temperature = GetCurrentTemperature();
    values.Add(temperature);
    formsPlot1.Render();
}

This strategy is indeed very easy for the user! What the user does not see is that Add() may increase the size of the array. If this happens, an event is invoked that tells the Signal plot to update Ys to point to the new array. Every time data is added UpdateData() is called so trees can be recalculated if necessary.

Did I understand your suggestion correctly? If so, this is a very interesting design I had not considered. SignalData could store Ys and SampleRate, and maybe a similar ScatterData class could store X/Y pairs for other plot types. It also seems more logical for the user to store data objects (rather than plottables themselves) if the intent is to Add() data later. This is an interesting design to consider!

EDIT / more thoughts:

  • validation could occur in the data module
  • min/max search and tree management could occur in the data module
  • plottables would be simpler, limited to just rendering
  • both modules would be easier to test
  • GetAxisLimits() could be put in the data module
  • Events may not be needed if Scatter plots store the SignalData and reference SignalData.Ys

@StendProg
Copy link
Contributor

Hi @swharden,

Did I understand your suggestion correctly?

I just noticed that your implementation can be well encapsulated. What to do with this further, a completely different question, a set of ScatterList, SignaList, ... classes also seems to be a good option if they can be easy implemented relying on this separate type (it can be private field in ScatterList for example).

The approach using a separate set of data adapted for plot purpouse is widely known and is used by other libraries, only all of them are slow when drawing large amounts of data, I will not presume that this is why, but it is very easy to lose performance with a large number of abstractions ..

@swharden
Copy link
Member Author

This is a great discussion, and may evolve into something extremely useful. I no longer wish to proceed with the implementation in this PR so I will close it, but I started an issue to track development on this front #841 and added exploration of this topic to the current triage page #716

@swharden swharden closed this Feb 28, 2021
@swharden swharden mentioned this pull request Feb 28, 2021
48 tasks
@swharden swharden deleted the SignalPlotList branch March 28, 2021 17:33
@swharden swharden mentioned this pull request May 17, 2021
82 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants