Conversation
| /// <summary> | ||
| /// When the Ys array must be expanded, increase its length by this number of values. | ||
| /// </summary> | ||
| private readonly int CapacityIncriment; |
There was a problem hiding this comment.
I think a constant-valued increment is not going to perform well, which is why dynamic arrays have geometric (or sometimes something fancier) growth.
Because I couldn't find the source for List<T>, I found the source for Microsoft's implementation of the C++ STL. You're looking at the function which determines how to grow an std::vector<T>'s backing buffer:
https://github.com/microsoft/STL/blob/main/stl/inc/vector#L1672-L1688
If you plot the two you see that constant increment leads to a linear number of resizes while constant multiple leads to a logarithmic number of resizes. If you take the example where you multiply the array length by k every time you resize it, resizing the array takes T(kn), but this only needs to be done once for every kn elements. Since T(kn) / kn is O(1) pushing to the array is amortized constant time.
I wrote a thingy in Python to illustrate this:

Code:
import matplotlib.pyplot as plt
import math
INIT_SIZE = 50
INCREMENT = 50
FACTOR = 1.5
def constant_growth(size):
new_size = INIT_SIZE
i = 0
while new_size < size:
new_size += INCREMENT
i += 1
return i
def geometric_growth(size):
new_size = INIT_SIZE
i = 0
while new_size < size:
new_size *= FACTOR
i += 1
return i
ns = [i for i in range(1, 10_000)]
plt.plot(ns, ys_constant)
plt.plot(ns, ys_geometric)
plt.show()There was a problem hiding this comment.
You can change the resizing algorithm, but it'd likely be easier for you and the user to just allow them to pass in a List<double>, and then you don't need to manage any memory yourself. Plus the BCL can probably be faster because it'll be using unmanaged buffers, not C# arrays.
There was a problem hiding this comment.
Hey @bclehmann,
These are excellent points! I considered the growth rate, but went with the linear model here because it seemed logical considering that Signal plot data is expected to come in at a fixed rate (SampleRate), so being able to set that would let the user intuit how often the array will be copied. A counter argument to the exponential growth model is that it's wasteful when the arrays get large. I guess the fundamental trade off is CPU vs. memory? Considering how slow plotting is, an occasional array copy doesn't seem likely to make a big difference either way, so it may make sense to favor conserving memory here. Ultimately it probably doesn't matter much one way or another, but this is very good to consider!
Maybe an intermediate option is a hybrid solution that uses two public fields?
public float GrowthFactor = 1.5;
public int MaximumCapacityIncrement = 1e6;Your last comment noted "Allow them to pass in List<double>", which would be great, but the signal plots use min/max algorithms that require double or generic arrays, making it difficult to support lists #526, #447, #103
There was a problem hiding this comment.
You cannot use < or > with generics, however, you can define a comparator function. It would be a pain for the user to have to specify a comparator function, but I suppose you could have a dictionary of common ones (i.e. numeric types).
Also, I'm not sure how this applies, I was suggesting List<double>, not List<T>. You have the same problem with comparison operators with T[] as you do with List<T>, even if we did want to allow users to pass in int or float (or whatever new floating point format NVidia comes up with next). Perhaps I misunderstood what you're saying?
There was a problem hiding this comment.
Hi @bclehmann,
You cannot use < or > with generics, however, you can define a comparator function. It would be a pain for the user to have to specify a comparator function, but I suppose you could have a dictionary of common ones (i.e. numeric types).
Also, I'm not sure how this applies, I was suggesting List, not List. You have the same problem with comparison operators with T[] as you do with List, even if we did want to allow users to pass in int or float (or whatever new floating point format NVidia comes up with next). Perhaps I misunderstood what you're saying?
This problem is solvable, and has already been solved for Signal<T>.
One hack from Jon Skeet is to use compiled at runtime lambda expressions. At runtime type of T already can be taken, and you can manualy compile any math on it to high performance IL code and call it. Thats make code complex and unreadable, and compiler cant check for errors. But using lambda reduce performance drop to only 10-30% compare to nongeneric implementation.
| /// The internal data structure will be automatically expanded in memory if needed. | ||
| /// </summary> | ||
| /// <param name="ys">new Y values</param> | ||
| public void AddRange(IEnumerable<double> ys) |
There was a problem hiding this comment.
There's a nice little optimization (which you can also see in Microsoft's implementation of the STL, where you can compare the requested size to the size given by your growth function. If your growth function gives enough space for the requested size then you go with that, otherwise you just allocate the requested amount:
You can simply get the requested size by taking the current size and adding the length of the new data. However, this optimization requires casting ys to a double[] or List<double> or similar as IEnumerable.Count() is a LINQ extension method that loops over the entire collection to get the count as IEnumerable does not provide a Count property (so you can use it to create generators).
So unfortunately this optimization requires checking if the ys is an array or list or something that has an O(1) count field, and then casting to whichever type to use that field.
|
Hi @swharden, @bclehmann, As far as I understand, there is an attempt to create your own version of the
If we subscribe to these events a method that changes the
|
|
Hi @StendProg, thank you for your input! To be sure I understand your suggestion, I'll re-state it with sample code. The user will do something like this: SignalData temperatures = new SignalData();
void Init()
{
formsPlot1.Plot.AddSignal(temperatures);
}
void Timer1_Tick()
{
double temperature = GetCurrentTemperature();
values.Add(temperature);
formsPlot1.Render();
}This strategy is indeed very easy for the user! What the user does not see is that Did I understand your suggestion correctly? If so, this is a very interesting design I had not considered. EDIT / more thoughts:
|
|
Hi @swharden,
I just noticed that your implementation can be well encapsulated. What to do with this further, a completely different question, a set of ScatterList, SignaList, ... classes also seems to be a good option if they can be easy implemented relying on this separate type (it can be private field in ScatterList for example). The approach using a separate set of data adapted for plot purpouse is widely known and is used by other libraries, only all of them are slow when drawing large amounts of data, I will not presume that this is why, but it is very easy to lose performance with a large number of abstractions .. |




The goal of
SignalPlotListis to make it easier to efficiently display growing datasets with Signal plots.SignalPlotListdoesn't useList<T>to store data; it just behaves likeList<T>withAdd(),AddRange(),Capacity, andCount. It still usesdouble[] Ysfor data storage, but when data is addedmaxRenderIndexis moved automatically andYsis resized as needed.@StendProg, do you have any thoughts about this idea? Do you think it would be a good idea to move this functionality into
SignalPlotBaseso it could be used in all types of Signal plots?WinForms Demonstration
list.mp4