Curve Fitting by m-mohr · Pull Request #240 · Open-EO/openeo-processes

m-mohr · 2021-04-26T15:22:29Z

A first draft for the curve fitting process, named fit_curve and heavily inspired by scipy.optimize.curve_fit. This is mostly meant to spur discussion rather than to serve as a final proposal.

Related to #231

m-mohr · 2021-04-27T10:28:09Z

The process is written from a use-case perspective based on an example from @clausmichele.

My question is whether this can be implemented by our back-ends, so for Platform, this is especially aimed at @jdries and @lforesta.

proposals/fit_curve.json

clausmichele · 2021-06-07T14:56:50Z

Hi @m-mohr, I'm trying to implement fit_curve, but I don't understand how to write the fitting function.
parameters+(parameters*cos(2*pi/362.25*x))+(parameters*sin(2*pi/362.25*x)) ? How do I specify which parameters to use if it's an array?

I will use dimension_labels to get the timestamps, is it ok in your opinion?

clausmichele · 2021-06-15T06:48:57Z

@m-mohr could you please have a look here? I can't proceed with the implementation otherwise

m-mohr · 2021-06-15T08:55:06Z

What is parameters in your case? Is that a single parameter?

PS: I was on vacation last week and I'm still catching up.

clausmichele · 2021-06-15T08:56:55Z

parameters is an array of coefficients. In this specific case it is [a0,a1,a2], because the function I need to fit is:
a0+(a1*cos(2*pi/362.25*x))+(a2*sin(2*pi/362.25*x))
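A minimal Python sketch of that model function may help show how a single "parameters" array maps to the individual coefficients. The function name `model` and the constant `PERIOD` are illustrative names that do not appear in the proposal; the period value is taken as written in this thread.

```python
import math

PERIOD = 362.25  # period exactly as written in the formula above


def model(x, parameters):
    """Evaluate a0 + a1*cos(2*pi/PERIOD*x) + a2*sin(2*pi/PERIOD*x)."""
    # The coefficients are picked from the array by position,
    # which mirrors array_element with index 0, 1 and 2.
    a0, a1, a2 = parameters
    angle = 2 * math.pi / PERIOD * x
    return a0 + a1 * math.cos(angle) + a2 * math.sin(angle)


value_at_start = model(0, [1.0, 2.0, 3.0])
value_at_quarter = model(PERIOD / 4, [1.0, 2.0, 3.0])
```

At x = 0 only the constant and cosine terms contribute, and at a quarter period only the constant and sine terms do, which makes the role of each array element easy to check.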


m-mohr · 2021-06-15T12:24:43Z

I've pushed an update to the process specification which adds a "parameters" parameter, which we (may?) need to define the number of parameters.

Here's what a process graph could look like for the formula you gave:

{
  "process_graph": {
    "load": {
      "process_id": "load_collection",
      "arguments": {
        "id": "S2_L2A_T32TPS",
        "spatial_extent": null,
        "temporal_extent": null
      }
    },
    "labels": {
      "process_id": "dimension_labels",
      "arguments": {
        "data": {
          "from_node": "load"
        },
        "dimension": "temporal"
      }
    },
    "fit": {
      "process_id": "fit_curve",
      "arguments": {
        "data": {
          "from_node": "load"
        },
        "labels": {
          "from_node": "labels"
        },
        "parameters": [
          1,
          1,
          1
        ],
        "function": {
          "process_graph": {
            "765a9m5jr": {
              "process_id": "pi",
              "arguments": {}
            },
            "a0": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "parameters"
                },
                "index": 0
              }
            },
            "a1": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "parameters"
                },
                "index": 1
              }
            },
            "a2": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "parameters"
                },
                "index": 2
              }
            },
            "viadlidxt": {
              "process_id": "multiply",
              "arguments": {
                "x": 2,
                "y": {
                  "from_node": "765a9m5jr"
                }
              }
            },
            "dzcmqylf5": {
              "process_id": "multiply",
              "arguments": {
                "x": 2,
                "y": {
                  "from_node": "765a9m5jr"
                }
              }
            },
            "az83u1fgl": {
              "process_id": "divide",
              "arguments": {
                "x": {
                  "from_node": "viadlidxt"
                },
                "y": 362.25
              }
            },
            "l3wirzu5v": {
              "process_id": "divide",
              "arguments": {
                "x": {
                  "from_node": "dzcmqylf5"
                },
                "y": 362.25
              }
            },
            "4kt3ezdz5": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "az83u1fgl"
                },
                "y": {
                  "from_parameter": "x"
                }
              }
            },
            "4kceskraz": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "l3wirzu5v"
                },
                "y": {
                  "from_parameter": "x"
                }
              }
            },
            "o7s99j9pd": {
              "process_id": "cos",
              "arguments": {
                "x": {
                  "from_node": "4kt3ezdz5"
                }
              }
            },
            "kydj4bwy3": {
              "process_id": "sin",
              "arguments": {
                "x": {
                  "from_node": "4kceskraz"
                }
              }
            },
            "e19f54kwn": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "a1"
                },
                "y": {
                  "from_node": "o7s99j9pd"
                }
              }
            },
            "82rlj4j02": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "a2"
                },
                "y": {
                  "from_node": "kydj4bwy3"
                }
              }
            },
            "f9xv6wbqn": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_node": "a0"
                },
                "y": {
                  "from_node": "e19f54kwn"
                }
              }
            },
            "rjb4pmv5a": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_node": "f9xv6wbqn"
                },
                "y": {
                  "from_node": "82rlj4j02"
                }
              },
              "result": true
            }
          }
        }
      },
      "result": true
    }
  }
}

Please ensure that dimension labels for the temporal dimension are actually numerical and not just ISO strings!
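As a rough illustration of what "numerical" labels could mean, the following Python sketch converts ISO 8601 date-time strings into fractional days since the Unix epoch. The choice of unit and reference date is an assumption made for this example only; the proposal itself does not prescribe one, and `label_to_days` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Assumed reference date for this sketch; not defined by the proposal.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def label_to_days(label):
    """Convert an ISO 8601 date-time string to days since EPOCH."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so it is replaced with an explicit UTC offset first.
    moment = datetime.fromisoformat(label.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Treat labels without an offset as UTC (an assumption).
        moment = moment.replace(tzinfo=timezone.utc)
    return (moment - EPOCH).total_seconds() / 86400.0


labels = ["2015-01-01T00:00:00Z", "2015-01-02T12:00:00Z"]
numeric_labels = [label_to_days(label) for label in labels]
```

Whatever unit is chosen, it has to match the period used in the fitting function, since x is multiplied by 2*pi divided by that period.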

PS: This is purely theoretical and I've never actually tried or implemented it, so happy to take any suggestions that may make it easier or better etc.

clausmichele · 2021-06-15T13:20:50Z

Thanks. I was actually thinking that the fit_curve process takes care of translating the temporal labels/strings (or whatever format they are) into the format required by the process (numerical).

m-mohr · 2021-06-15T13:22:51Z

If that's a common use case we could think about this special case and support it, yes.

m-mohr · 2021-06-15T13:36:03Z

Hmm, I'm just thinking whether it would be easier if this process returns an actual process graph instead of just the parameters?
What do you really need from this? The parameters or the function to pass your values to?

clausmichele · 2021-06-15T14:00:30Z

With the computed parameters [a0,a1,a2] and new time steps coming from another "testing" datacube (load_collection with a subsequent time extent), I need to predict the values the fitted function would take at the new time steps. This would basically return a datacube with the same temporal extent as the "test" one, but with different values: predicted values instead of real data.

m-mohr · 2021-06-15T14:41:42Z

Results from a call with @clausmichele:

- There's no way yet to properly pass the "data" parameter. We considered passing a raster cube and the name of the dimension to work on, but decided against it as it's not very generic. Instead, I'll add an openEO process to get a (labeled?) array with values for a dimension (tbc whether that actually makes sense).
- We may want to make the "labels" parameter optional and default to the labels/indices from the array.
- We want to allow labels to be date-times for convenience.
- The process should still return the computed parameters so that they can be compared, stored, or re-used as defaults for the parameters in later calls to fit_curve.
- I should highlight in the process that the function should be stored as a user-defined process so that users can re-use it both for the curve fitting and for later applying the function with the computed parameters to the data cube. The computed parameters can be passed as "context" in apply.

m-mohr · 2021-06-21T16:27:34Z

@clausmichele I made an attempt to cover what we discussed in a single process, which returns 2(!) data cubes. I have not had the time to fine-tune the descriptions yet, but do you think it would work for you this way?

clausmichele · 2021-06-22T06:20:41Z

How would I have to use the result? How do I select the "parameters" datacube to predict values for a different datacube with different timesteps (same x,y,bands)?
Up to this point it's clear to me how to feed the data to the process and compute the output, but I'm not sure about what could follow.

m-mohr · 2021-06-22T07:54:25Z

> How would I have to use the result?

You get the data cube with array_element(result, 0).

> How do I select the "parameters" datacube

As above, but use 1 instead of 0.

> to predict values for a different datacube with different timesteps (same x,y,bands)?

Not possible. That use case didn't come up yet or I did not understand it.

> Up to this point it's clear to me how to feed the data to the process and compute the output, but I'm not sure about what could follow.

I'm not sure what you want to have. You input a data cube (e.g. S2 with dimensions x,y,t,b) and get back a data cube (x,y,t,b) with the predicted dates (as passed into predict_labels) on the temporal dimension instead of the original dates. Could you give some more use-case examples?
I just tried to make the use case from yesterday work.

clausmichele · 2021-06-22T09:03:14Z

> to predict values for a different datacube with different timesteps (same x,y,bands)?

> Not possible. That use case didn't come up yet or I did not understand it.

Some time ago I've shared with you the python code of the use case implementation, which is here https://github.com/SARScripts/SAR2Cube_use_cases/blob/main/SAR2Cube_Forest_Change.ipynb

You can see that we need to fit the curve function over a certain period (2 years of data in this case) but then we need to predict values of a different period (the "testing" one) as well.

Maybe this sketch can clarify the pipeline:

m-mohr · 2021-06-22T13:30:12Z

> Some time ago I've shared with you the python code of the use case implementation, which is here https://github.com/SARScripts/SAR2Cube_use_cases/blob/main/SAR2Cube_Forest_Change.ipynb

Yes, but the code is mostly undocumented and has a lot of other things in it. I can't exactly figure out what the code does and what part of the code is needed for the openEO use case, sorry.

> You can see that we need to fit the curve function over a certain period (2 years of data in this case) but then we need to predict values of a different period (the "testing" one) as well.
>
> Maybe this sketch can clarify the pipeline:

Okay, so if I understand correctly the issue is that the "parameters" parameter is not a datacube, but an array, right? So the node "array_element (parameters)" is meant to be passed into the "parameters" parameter of fit_curve, right?

m-mohr · 2021-06-22T13:32:05Z

Or do you want to predict values without an actual curve_fitting just based on the parameters that are returned for the first fit_curve? Then we'd need two processes: fit + predict. The first fit_curve would be replaced with fit + predict and the second fit_curve would only use predict.

clausmichele · 2021-06-22T14:07:55Z

> Or do you want to predict values without an actual curve_fitting just based on the parameters that are returned for the first fit_curve? Then we'd need two processes: fit + predict. The first fit_curve would be replaced with fit + predict and the second fit_curve would only use predict.

This! It is indeed what we would need

m-mohr · 2021-06-22T14:09:41Z

Okay, thanks for clarifying. That likely makes things a bit more complicated, but I'll come up with another proposal (for two processes!) then.

m-mohr · 2021-06-25T15:47:41Z

I've committed a new proposal that splits the fit_curve process into two parts: fit_curve and predict_curve (names TBD). I have to polish the descriptions a lot, but for now it's meant to discuss whether the proposal is feasible and makes sense for many use cases.

Here's an example:

First of all, the model function should be stored separately, here as an example function stored as a process with the name example_fitting_function_eurac. We'll use this function in the other process below.

{
  "id": "example_fitting_function_eurac",
  "parameters": [
    {
      "name": "x",
      "schema": {
        "type": "number"
      }
    },
    {
      "name": "parameters",
      "schema": {
        "type": "array",
        "items": {
          "type": "number"
        }
      }
    }
  ],
  "process_graph": {
    "765a9m5jr": {
      "process_id": "pi",
      "arguments": {}
    },
    "a0": {
      "process_id": "array_element",
      "arguments": {
        "data": {
          "from_parameter": "parameters"
        },
        "index": 0
      }
    },
    "a1": {
      "process_id": "array_element",
      "arguments": {
        "data": {
          "from_parameter": "parameters"
        },
        "index": 1
      }
    },
    "a2": {
      "process_id": "array_element",
      "arguments": {
        "data": {
          "from_parameter": "parameters"
        },
        "index": 2
      }
    },
    "dzcmqylf5": {
      "process_id": "multiply",
      "arguments": {
        "x": 2,
        "y": {
          "from_node": "765a9m5jr"
        }
      }
    },
    "viadlidxt": {
      "process_id": "multiply",
      "arguments": {
        "x": 2,
        "y": {
          "from_node": "765a9m5jr"
        }
      }
    },
    "l3wirzu5v": {
      "process_id": "divide",
      "arguments": {
        "x": {
          "from_node": "dzcmqylf5"
        },
        "y": 362.25
      }
    },
    "az83u1fgl": {
      "process_id": "divide",
      "arguments": {
        "x": {
          "from_node": "viadlidxt"
        },
        "y": 362.25
      }
    },
    "4kceskraz": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "l3wirzu5v"
        },
        "y": {
          "from_parameter": "x"
        }
      }
    },
    "4kt3ezdz5": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "az83u1fgl"
        },
        "y": {
          "from_parameter": "x"
        }
      }
    },
    "kydj4bwy3": {
      "process_id": "sin",
      "arguments": {
        "x": {
          "from_node": "4kceskraz"
        }
      }
    },
    "o7s99j9pd": {
      "process_id": "cos",
      "arguments": {
        "x": {
          "from_node": "4kt3ezdz5"
        }
      }
    },
    "82rlj4j02": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "a2"
        },
        "y": {
          "from_node": "kydj4bwy3"
        }
      }
    },
    "e19f54kwn": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "a1"
        },
        "y": {
          "from_node": "o7s99j9pd"
        }
      }
    },
    "f9xv6wbqn": {
      "process_id": "add",
      "arguments": {
        "x": {
          "from_node": "a0"
        },
        "y": {
          "from_node": "e19f54kwn"
        }
      }
    },
    "rjb4pmv5a": {
      "process_id": "add",
      "arguments": {
        "x": {
          "from_node": "f9xv6wbqn"
        },
        "y": {
          "from_node": "82rlj4j02"
        }
      },
      "result": true
    }
  }
}
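To make the roles of "from_node" and "from_parameter" in such a stored function more concrete, here is a small Python sketch of how a process graph of this shape could be evaluated. It is not how any openEO back-end is implemented; the `PROCESSES` table covers only the handful of processes used in this thread, and `evaluate` is a hypothetical helper. The demo graph is a deliberately shorter function (a0 + a1*x) rather than the full one above.

```python
import math

# Illustrative subset of processes; real back-ends implement many more.
PROCESSES = {
    "pi": lambda: math.pi,
    "add": lambda x, y: x + y,
    "multiply": lambda x, y: x * y,
    "divide": lambda x, y: x / y,
    "sin": lambda x: math.sin(x),
    "cos": lambda x: math.cos(x),
    "array_element": lambda data, index: data[index],
}


def evaluate(process_graph, parameters):
    """Evaluate the node flagged with "result": true and return its value."""
    cache = {}

    def resolve(value):
        # Only top-level argument values are resolved in this sketch.
        if isinstance(value, dict):
            if "from_node" in value:
                return run(value["from_node"])
            if "from_parameter" in value:
                return parameters[value["from_parameter"]]
        return value

    def run(node_id):
        if node_id not in cache:
            node = process_graph[node_id]
            args = {name: resolve(arg) for name, arg in node["arguments"].items()}
            cache[node_id] = PROCESSES[node["process_id"]](**args)
        return cache[node_id]

    result_id = next(key for key, node in process_graph.items() if node.get("result"))
    return run(result_id)


# Demo graph for a0 + a1 * x, using the same node structure as above.
graph = {
    "a0": {"process_id": "array_element",
           "arguments": {"data": {"from_parameter": "parameters"}, "index": 0}},
    "a1": {"process_id": "array_element",
           "arguments": {"data": {"from_parameter": "parameters"}, "index": 1}},
    "scaled": {"process_id": "multiply",
               "arguments": {"x": {"from_node": "a1"}, "y": {"from_parameter": "x"}}},
    "sum": {"process_id": "add",
            "arguments": {"x": {"from_node": "a0"}, "y": {"from_node": "scaled"}},
            "result": True},
}

result = evaluate(graph, {"x": 3, "parameters": [1, 2]})
```

The sketch also shows why "data" has to be a "from_parameter" reference: a literal array would be handed to array_element unchanged instead of being replaced by the fitted coefficients.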

Here's an example workflow based on @clausmichele's image above (if I understood it correctly as the actual values/inputs are missing partially from the image):

{
  "process_graph": {
    "1": {
      "process_id": "load_collection",
      "arguments": {
        "id": "COPERNICUS/S2",
        "spatial_extent": null,
        "temporal_extent": [
          "2015-01-01T00:00:00Z",
          "2018-01-01T00:00:00Z"
        ],
        "bands": null
      }
    },
    "2": {
      "process_id": "filter_temporal",
      "arguments": {
        "data": {
          "from_node": "1"
        },
        "extent": [
          "2015-01-01T00:00:00Z",
          "2017-01-01T00:00:00Z"
        ]
      }
    },
    "3": {
      "process_id": "filter_temporal",
      "arguments": {
        "data": {
          "from_node": "1"
        },
        "extent": [
          "2017-01-01T00:00:00Z",
          "2018-01-01T00:00:00Z"
        ]
      }
    },
    "4": {
      "process_id": "fit_curve",
      "arguments": {
        "data": {
          "from_node": "2"
        },
        "function": {
          "process_graph": {
            "1": {
              "process_id": "example_fitting_function_eurac",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "parameters": {
                  "from_parameter": "parameters"
                }
              },
              "result": true
            }
          }
        },
        "parameters": [
          1,
          1,
          1
        ],
        "dimension": "t"
      }
    },
    "5": {
      "process_id": "predict_curve",
      "arguments": {
        "data": {
          "from_node": "3"
        },
        "parameters": {
          "from_node": "4"
        },
        "function": {
          "process_graph": {
            "1": {
              "process_id": "example_fitting_function_eurac",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "parameters": {
                  "from_parameter": "parameters"
                }
              },
              "result": true
            }
          }
        },
        "dimension": "t"
      },
      "result": true
    }
  }
}

The example likely misses the "labels" in "predict_curve" as they are missing from the example from @clausmichele. It's not clear for which values predictions are computed.
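The fit-then-predict split can be sketched in plain Python on a single synthetic time series. This is an illustration under stated assumptions, not the specification of either process: the Python functions `fit_curve` and `predict_curve` merely borrow the process names, the data are synthetic and noise-free, and because this particular model is linear in a0, a1 and a2, ordinary linear least squares (normal equations) stands in for the nonlinear optimizer that a scipy-style curve fit would use.

```python
import math

PERIOD = 362.25  # period as written in this thread


def basis(x):
    """Return the terms multiplied by a0, a1 and a2 for a given x."""
    angle = 2 * math.pi / PERIOD * x
    return [1.0, math.cos(angle), math.sin(angle)]


def predict_curve(xs, parameters):
    """Evaluate the model with fixed parameters at each x."""
    return [sum(p * b for p, b in zip(parameters, basis(x))) for x in xs]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_curve(xs, ys):
    """Least-squares estimate of [a0, a1, a2] via the normal equations."""
    rows = [basis(x) for x in xs]
    size = 3
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(ata, aty)


# Synthetic "training" period (about two years) and "testing" period (one year).
true_parameters = [0.5, 2.0, -1.0]
train_x = list(range(0, 730, 10))
train_y = predict_curve(train_x, true_parameters)
test_x = list(range(730, 1095, 10))

fitted = fit_curve(train_x, train_y)
predicted = predict_curve(test_x, fitted)
```

The point of the split is visible in the last two lines: the parameters are estimated once on the training period and then reused, together with the same function, on time steps that were never part of the fit.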


m-mohr · 2021-07-15T15:21:51Z

This seems to work, so shall we merge, @clausmichele ?

clausmichele · 2021-07-16T14:31:54Z

Yes, for me it's fine!

First draft for fit_curve process #231

31a5fd1

m-mohr added new process platform labels Apr 26, 2021

m-mohr added this to the 1.1.0 milestone Apr 26, 2021

m-mohr requested review from aljacob, clausmichele, jdries and soxofaan April 26, 2021 15:22

m-mohr changed the title ~~First draft for fit_curve process #231~~ Curve Fitting Apr 26, 2021

m-mohr requested a review from lforesta April 27, 2021 10:26

m-mohr mentioned this pull request Apr 27, 2021

Additional use cases #231

Closed

10 tasks

soxofaan reviewed Apr 27, 2021

m-mohr mentioned this pull request Apr 27, 2021

array_labels: Return something meaningful for arrays #243

Closed

Split data (x+y) into data (y) and labels (x).

b8a0687

m-mohr modified the milestones: 1.1.0, 1.2.0 May 18, 2021

m-mohr linked an issue May 18, 2021 that may be closed by this pull request

Additional use cases #231

Closed

10 tasks

m-mohr mentioned this pull request Jun 4, 2021

Release openEO processes v1.1.0 Open-EO/PSC#12

Closed

m-mohr added 2 commits June 15, 2021 13:51

Merge remote-tracking branch 'origin/draft' into fit-curve

128b9c9

# Conflicts: # CHANGELOG.md

Add parameters parameter

7ff993d

Merge remote-tracking branch 'origin/draft' into fit-curve

d829f42

m-mohr self-assigned this Jun 15, 2021

m-mohr mentioned this pull request Jun 18, 2021

Add array_create_labels process #268

Merged

Allow date-times to be passed as labels

5d2bb8e

m-mohr force-pushed the fit-curve branch from 549eea7 to 5d2bb8e Compare June 18, 2021 16:38

m-mohr added 2 commits June 21, 2021 12:55

function should be stored as udp

808a44b

Draft to work on data cubes directly

08ed3d2

Split process into two parts

848df83

m-mohr force-pushed the fit-curve branch from 626c94e to 848df83 Compare June 25, 2021 15:52

Update CHANGELOG

ef1e0f0

jdries added a commit to Open-EO/openeo-python-client that referenced this pull request Jul 15, 2021

Add fit_curve and predict_curve

7d0e28f

Open-EO/openeo-processes#240 openEOPlatform/architecture-docs#53

m-mohr merged commit 4e532e2 into draft Jul 16, 2021

m-mohr deleted the fit-curve branch July 16, 2021 14:32
