Skip to content

Datagen docs#903

Merged
swharden merged 4 commits intoScottPlot:masterfrom
bclehmann:datagen-docs
Apr 10, 2021
Merged

Datagen docs#903
swharden merged 4 commits intoScottPlot:masterfrom
bclehmann:datagen-docs

Conversation

@bclehmann
Copy link
Member

New Contributors:
please review CONTRIBUTING.md

Purpose:
#902

New Functionality:
N/A

@bclehmann
Copy link
Member Author

I've documented most of it but I'm going to come back to a few bits that I missed. Note that it fails test intentionally because I added a test to ensure that all DataGen methods are documented.

I'm not entirely sure what this function does:

public static double[] SinSweep(int pointCount, double density = 50.0)

A quick google says it's some audio thingy but I elected not to document something I'm not sure about.

It may also be a good idea to document the Electrocardiogram class, but that's not currently part of this PR.

@bclehmann
Copy link
Member Author

bclehmann commented Mar 25, 2021

Aside from documenting the class, I have a few questions about some methods:

  • Why does NoisyLinear use a normal distribution but NoisySin uses a uniform distribution
  • Why does an overload which takes double[] xs, double mult = 1 exist for sine but not cosine and tangent (I added this as it does not break backwards compatibility)
  • Why is there an overload for RandomStockPrices which takes a TimeSpan as well as deltaMinutes and deltaDays
    • Why does the deltaMinutes overload have a flag to disable DateTime axis when an overload for sequential axes already exists?

@swharden
Copy link
Member

Hi @bclehmann,

Thanks so much for this PR! It's shaping up really well. I'll add a couple comments for you to consider along the way:

  • I like the use of tests to ensure public methods are documented. One day I hope to enable CS1591 at the project level, so working toward that really helps. I just added this to the triage page

  • You may have noticed, but a triaged goal is to create a new Generate module to replace the DataGen module. I'll probably never delete DataGen, but it was created a long time ago and the argument names and orders weren't very carefully considered. While you're working on DataGen, perhaps consider function signatures you would like to see in a new Generate module. There's an opportunity to have a fresh slate there, and might be interesting to explore some options.

@swharden
Copy link
Member

Aside from documenting the class, I have a few questions about some methods

The answer to all 3 questions, perhaps unsatisfyingly, is that there is no good reason/justification for those decisions. I suspect they originated in a test or a WinForms demo application and were copy/pasted into the DataGen module when a cookbook example was made. These types of inconsistencies are all over the place, and are one of the reasons I think a new, carefully-crafted, conservative Generate module with better argument names and overloads may be a good idea.

If possible, I'd like to take as many hints from the numpy API as possible. They have most of these functions already, and the documentation is pretty good. We can probably recycle a lot of their design and documentation. E.g., https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

@bclehmann
Copy link
Member Author

I've finished most of it, here's what I have left:

  • SinSweep and SinSweepByte (I don't really know what they do)
  • Electrocardiogram class (not really part of this PR, not to mention I haven't taken a Bio course in 2 years)

In terms of what I'd like if we were to make a new class, here's my thinking:

  • Consecutive or Range, they do essentially the same thing. Perhaps a mix of the two.
  • Trig functions
  • The Random functions. Currently we have normal and uniform distributions. Others may be appropriate.
    • Plus RandomWalk, RandomStockPrices, etc
  • The Noisy functions, perhaps with the distribution of the residuals configurable to address the issue where some have normal and some have uniform residuals
  • The Zeroes function, and possibly the Ones function
  • The SampleImageData is there because it's used by one of the demos, I think having it return a constant is not great, but I think it could be made more useful generically. Currently it feels like it's more there as part of a demo than it is to be useful.

A lot of these functions have overloads that can be gotten rid of, either through default parameters or through just not having a reason to exist as a separate overload. Additionally, the consistency could be improved, some random functions take a seed, and some take a Random object. Some check if the Random object is null and if so they initialize it. Some do not.

@bclehmann
Copy link
Member Author

Support for lots of random distributions could probably be done by having each distribution have a GetRandomValue function and then using the strategy pattern to let that be used by functions that return an array of them.

If you wanted to get extra fancy, you could find a way to extend this to things like RandomWalk or Consecutive or Sin which require knowledge of more state such as what came before them or what the index is. But I don't know if there's a way to do that such that you actually get a benefit in terms of clean or easy to maintain code.

@swharden swharden linked an issue Mar 28, 2021 that may be closed by this pull request
@swharden swharden mentioned this pull request Mar 28, 2021
48 tasks
@swharden
Copy link
Member

I think SinSweep was used for testing rendering artifacts when plotting data with different densities of data. They generate sine waves that sweep frequency from low to high.

double[] ys = ScottPlot.DataGen.SinSweep(pointCount: 5000, density: 20);

image

Related, I've used this to study some biological properties of neurons: https://github.com/swharden/pyABF/blob/master/docs/advanced/creating-waveforms

image

SinSweepByte doesn't look particularly useful and could be recreated using LINQ, so I deleted that method.

Altogether the sweeping sine wave is a pretty niche need and can probably be omitted from future data generation modules

@swharden
Copy link
Member

DataGen.Electrocardiogram

When it comes time to redesign the DataGen class as new modules (#716), it's probably a good idea to have a ScottPlot.Generate static class and a ScottPlot.SampleData namespace. The sample data can hold things like images (Mona Lisa), sample audio, and that ECG module.

Specifically, DataGen.Electrocardiogram is used as a data source to produce signals that look like heartbeats when testing apps that display live, updating data.

@swharden swharden marked this pull request as ready for review April 10, 2021 15:25
@swharden swharden merged commit 268ca22 into ScottPlot:master Apr 10, 2021
@swharden swharden mentioned this pull request May 17, 2021
82 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Better document DataGen

2 participants