Replies: 8 comments 24 replies
-
I guess my main response would be: this feature would not actually be moving away from arrays! `Memory<T>` and `Span<T>` can both natively wrap managed arrays, and are designed to have array-like performance. Neither side as it exists today would notice much difference. On the ScottPlot side, the data is accessed using the same notation as it is today, mostly through indexers. Basically, only a function's input types need to change in most cases; output types don't change unless you want to take it a step further and reduce ScottPlot's memory allocations (I'm looking into this separately). No additional allocations happen either, since `Memory<T>`, `Span<T>`, and `PlotData` are all readonly structs (hence the `in` qualifier); there are much slower things that far outweigh the extra couple of instructions needed to execute the implicit operator that wraps a `double[]` into a `PlotData`.
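To make the kind of change being described concrete, here is a hypothetical before/after example (not ScottPlot's actual code); `ReadOnlySpan<double>` stands in for the proposed `PlotData` wrapper because it already converts implicitly from `double[]`:

```csharp
using System;

public static class Example
{
    // Before: public static double Max(double[] ys)
    // After: only the parameter type changes; the body and any call sites
    // that pass a double[] stay exactly the same.
    public static double Max(ReadOnlySpan<double> ys)
    {
        double max = ys[0];
        for (int i = 1; i < ys.Length; i++)   // same indexer notation as an array
            if (ys[i] > max) max = ys[i];
        return max;
    }

    public static void Main()
    {
        double[] fromArray = { 1, 5, 3 };
        Console.WriteLine(Max(fromArray));    // array wrapped implicitly, no copy

        Span<double> fromStack = stackalloc double[] { 2, 9 };
        Console.WriteLine(Max(fromStack));    // non-array memory works too
    }
}
```

The call sites passing arrays compile unchanged, which is the "drop-in" property being claimed.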
-
I think the performance and compatibility conversation is an important one, but I'm wondering how one communicates to users what a parameter of the new type accepts. One option is obviously to maintain two overloads, one of which takes a plain array. I also wonder if one could simply rename the type to make its purpose clearer.
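A sketch of the two-overload option mentioned above, using placeholder names (`PlotApi` and this `AddSignal` signature are illustrative, not ScottPlot's actual surface):

```csharp
using System;

public static class PlotApi
{
    // New flexible entry point: accepts any Memory-backed data.
    public static int AddSignal(ReadOnlyMemory<double> ys)
    {
        // A real implementation would store ys for rendering; here we just
        // return the point count so the behavior is observable.
        return ys.Length;
    }

    // Familiar array overload, kept for discoverability and source
    // compatibility. It forwards to the Memory overload without copying.
    public static int AddSignal(double[] ys) => AddSignal(ys.AsMemory());
}
```

This keeps IntelliSense showing the familiar `double[]` signature for typical users, while the `Memory<double>` overload serves advanced callers; the cost is the duplication noted elsewhere in this thread.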
-
I ran some performance benchmarks:
I used Scott's built-in benchmark info. These are average values measured by eye; the error bars are about 10-20 ms.
-
@bclehmann Please re-read the original post. Performance is only included as a statement of non-degradation. The focus is very significantly (and I keep repeating this) on flexibility of input data. The fact that the lack of flexibility causes a performance issue does not mean the proposed changes are performance-centric; they are flexibility-centric. In fact, performance is mentioned only once, as the reason flexibility is needed. Not once did I propose a change to improve ScottPlot's performance.

@StendProg You are making all sorts of false analogies here. Let us look at some. First off, I showed you exactly the relevant portions of my benchmark. It includes the exact API I was running and analyzing, no more, no less. We can certainly apply further and further edge cases as you move the goalposts, so that I can show you the next way to address an obscure situation. If I show that 20 million points still outperforms any graphics drawing library in existence, you will raise the number to 100 million and tell me that some new library might be created.

I will start by saying that the highest-performance scenarios running .NET are moving from arrays to `Span<T>`/`Memory<T>`. The question must thus be: why should ScottPlot have more stringent performance requirements than any other high-performance scenario? And is that even a goal of ScottPlot? I would say it is fairer to assume that a balance of performance and flexibility is desired for a charting library. We're not running compute workloads on million-dollar hardware here; a bit of inefficiency is not the end of the world. A great example that should really set the stage is ImageSharp, an image processing library for .NET which does raw-pixel operations.

Next, we must ask: what does ScottPlot really do? Is ScottPlot a data processing/compute library? Right now there are many limitations, several of which would be removed with `Memory<T>`/`Span<T>`. Your argument keeps coming back to large datasets with millions of points. We have examined several extremely unlikely scenarios, and you have provided no normal scenario to consider. In fact, in all normal scenarios there is no noticeable change anyway, so you are focusing on unlikely events in absurd situations that should be far outside the scope of this project. To complete the thought, though: if we have 50 million data points, and they are not absurdly updated continuously, the answer to large processing requirements is never to improve performance through hardware or compiler optimizations.

I find it very unreasonable when someone uses edge cases and a poor understanding of out-of-scope scenarios, ones that are not even currently supported, as a blocker to forward progress. It causes others to spend energy discussing things that will either never happen or that should be addressed differently, it provides no benefit (since you're not actually developing a high-performance data analysis algorithm), and it is incredibly pedantic (not in a good way): everything has a performance penalty.

This is my conclusion: there are better ways to improve performance when displaying 20 million data points than making sure the compiler optimizes array access. Please don't pretend otherwise; this is not an opinion, it is a well-known fact in computing, and it really needs to be a separate discussion. Premature optimization is a very well-known phenomenon, and you have dug yourself into a very deep optimization hole. I am done talking about performance and optimization. Let's talk about general-purpose, non-specialized features and techniques that will help general-purpose users use a general-purpose library in a general-purpose way. I promise you, any tiny edge case you can think of can be addressed independently as its own problem to be solved. This isn't high-performance computing. Please don't treat it as such.
-
As a final afterthought, here are the performance results for 100 million points:
Indeed, as you mentioned, span access is now more costly than the GDI drawing calls.
-
Unfortunately, while I do accept there is a flexibility benefit, I question how much value this would add over converting the data to an array up front. Even with the proposed change, I would be interested to see an application that needs to interop a very large amount of data (so that the copying cost is relevant), but where additions are so frequent that the copying cost cannot be written off as a one-time thing.
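For concreteness, here is a small sketch of the copying cost in question; the buffer size and names are made up for illustration. Wrapping existing data with `AsMemory()` is O(1) and allocation-free, while converting Memory-backed data back to an array (as an array-only API would force) copies every element:

```csharp
using System;

public static class Demo
{
    public static void Main()
    {
        double[] big = new double[1_000_000];
        big[500_000] = 42;

        // O(1): a view over the existing array; no allocation, no copy.
        // Useful for plotting a window into a larger buffer.
        Memory<double> window = big.AsMemory(500_000, 1_000);

        // O(n): the copy an array-only API would require on every update.
        double[] copy = window.ToArray();

        Console.WriteLine(copy.Length);   // 1000
        Console.WriteLine(copy[0]);       // 42
    }
}
```

The debate above is essentially whether that `ToArray()` call happens once (amortizable) or on every data change (not amortizable).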
-
The fact of the matter is that I can already do it with plain arrays.
I think the last word still belongs to the author of this library, @swharden. He, of course, listens very well and adjusts to the opinions of others, up to the point where they would cross out the main concept of the project (his vision). As for this discussion, it seems to me that it has already gone on long enough. To summarize why I am against the proposed changes:
The proposed solution forces you to make a choice at the library level; there is no way to keep both options and offer both to the user (while retaining the original performance of the first option). Based on these considerations, I choose the first option, especially since it has already been implemented.
-
A lot of messages have been posted in this discussion over the last few days. The topic of how ScottPlot should manage plot data in memory is advanced, complex, and nuanced, and this discussion is worth preserving! I'm appreciative to each of you for sharing your thoughts, code, tests, benchmarks, and ideas for how to make this library better.

Some of the messages in this discussion are more adversarial than is typical for this community. The opening of ScottPlot's Code of Conduct communicates our desire to foster an open and welcoming environment for contributors. I greatly value and wish to encourage open discourse and disagreement, but in response to the escalating intensity of the messages over the last several days I've decided to lock this discussion. Everyone has been given an opportunity to share their opinion, agreement has been reached on some topics but not all (likely due to differences in people's specific use cases and personal experiences), and I don't think this group will reach improved consensus through additional posts on this page. If someone wants to add or modify something on this final post they are welcome to message me directly: [email protected]

I'll briefly summarize this discussion, including the advantages and concerns associated with the proposal, then indicate what action I intend to take. I'm going to avoid technical discussion; we're all programmers, and with some time and effort we can figure out the details. I'll keep this summary at a high level.

Proposed Change
Advantages
Concerns
I'm just listing the concerns here, not addressing them. Detailed discussions of each of these topics can be found above.
Notes
Scott's Decision / Next Action
Resources
-
I'm doing some relevant personal work to enable the use of the `Memory<T>` and `Span<T>` model for plot data; you can check out the code at this fork/branch: https://github.com/Jmerk523/ScottPlot/tree/plot_data
The idea is to use `Memory<T>` and `Span<T>` here for the same reasons they are useful in general.
I won't go into a long explanation of why these types are useful; suffice it to say they are becoming more and more common in .NET, including in core APIs such as System.IO.Stream.
Basically, these types let you wrap arbitrary (managed or unmanaged) memory with fast read/write access.
Having to convert between arrays and any other form of memory is a significant performance issue, involving copying and allocations, so supporting more storage strategies seems useful. Plus, since `Memory<T>` and `Span<T>` are available for all major releases of .NET Framework and .NET Core, no version upgrades are needed.
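As an illustration of the point above (hypothetical `Sum` helper, not ScottPlot code): the same span-accepting function can consume a view over a managed array or over stack memory, with the same indexer syntax, and with pointers it can wrap native memory as well.

```csharp
using System;

public static class Demo
{
    // Works identically regardless of what memory backs the span.
    public static double Sum(ReadOnlySpan<double> values)
    {
        double total = 0;
        foreach (double v in values) total += v;
        return total;
    }

    public static void Main()
    {
        // View over a managed array: no copy is made.
        double[] managed = { 1, 2, 3 };
        Console.WriteLine(Sum(managed));   // 6

        // View over stack-allocated memory: no heap allocation at all.
        Span<double> stack = stackalloc double[] { 4, 5 };
        Console.WriteLine(Sum(stack));     // 9
    }
}
```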
In my test implementation, I created an intermediate `PlotData` struct, which is essentially a `Memory<double>` on which I can define my own operators, allowing existing array data to be converted implicitly (`Memory<T>` and `Span<T>` support array storage natively).
I would have used `Memory<T>` and `Span<T>` directly, but that would have required explicit conversions in a lot of locations that can instead be performed automatically by an implicit operator.
The result was that `PlotData` was almost a perfect drop-in replacement for externally visible APIs that take arrays as parameters, while also allowing `Memory<T>`-backed data to be used for plots.
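To make that description concrete, here is a minimal sketch of what such a `PlotData` struct could look like. This is not the actual code from the linked fork; the members shown are illustrative, and only the implicit-conversion idea is demonstrated.

```csharp
using System;

public readonly struct PlotData
{
    private readonly Memory<double> _memory;

    public PlotData(Memory<double> memory) => _memory = memory;

    public int Length => _memory.Length;
    public Span<double> Span => _memory.Span;
    public double this[int i] => _memory.Span[i];

    // These operators are what make PlotData a near drop-in replacement:
    // existing double[] arguments convert implicitly, and Memory<double>
    // data can be passed without copying.
    public static implicit operator PlotData(double[] array) => new PlotData(array);
    public static implicit operator PlotData(Memory<double> memory) => new PlotData(memory);
}
```

A call site that previously passed a `double[]` would compile unchanged, while a `Memory<double>` slice of a larger buffer could now be passed without copying.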
The main drawback is the API change: binary backwards compatibility would require duplicating every changed API.
However, recompiling is almost seamless and requires changes only under some very specific conditions.
Also, a dependency on the System.Memory package is required, but this is not a very large addition.
I'm wondering whether anyone else would find this useful, or has any related thoughts.