Curve Fitting#240

Merged
m-mohr merged 10 commits into draft from fit-curve
Jul 16, 2021

Conversation

@m-mohr
Member

@m-mohr m-mohr commented Apr 26, 2021

A first draft for the curve fitting process. Named fit_curve and heavily inspired by scipy.optimize.curve_fit. This is mostly meant to spur discussion rather than to be a final proposal.

Related to #231

@m-mohr m-mohr added this to the 1.1.0 milestone Apr 26, 2021
@m-mohr m-mohr changed the title from "First draft for fit_curve process #231" to "Curve Fitting" Apr 26, 2021
@m-mohr m-mohr requested a review from lforesta April 27, 2021 10:26
@m-mohr
Member Author

m-mohr commented Apr 27, 2021

The process is written from a use-case perspective based on an example from @clausmichele.

My question is whether this can be implemented by our back-ends, so for Platform, this is especially aimed at @jdries and @lforesta.

@m-mohr m-mohr mentioned this pull request Apr 27, 2021
10 tasks
@m-mohr m-mohr modified the milestones: 1.1.0, 1.2.0 May 18, 2021
@m-mohr m-mohr linked an issue May 18, 2021 that may be closed by this pull request
10 tasks
@clausmichele
Member

Hi @m-mohr, I'm trying to implement fit_curve, but I don't understand how to write the fitting function.
parameters+(parameters*cos(2*pi/362.25*x))+(parameters*sin(2*pi/362.25*x)) ? How do I specify which parameters to use if it's an array?

I will use dimension_labels to get the timestamps, is it ok in your opinion?
image

@clausmichele
Member

@m-mohr could you please have a look here? I can't proceed with the implementation otherwise

@m-mohr
Member Author

m-mohr commented Jun 15, 2021

What is parameters in your case? Is that a single parameter?

PS: I was on vacation last week and I'm still catching up.

@clausmichele
Member

clausmichele commented Jun 15, 2021

parameters is an array of coefficients. In this specific case it is [a0,a1,a2], because the function I need to fit is:
a0+(a1*cos(2*pi/362.25*x))+(a2*sin(2*pi/362.25*x))
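
For reference, the formula above can be written as a small Python function. This is only a minimal sketch: the names `harmonic_model` and `PERIOD` are illustrative and not part of the proposal, and the value 362.25 is copied verbatim from the formula.

```python
import math

# Period as written in the formula above (copied verbatim, not verified).
PERIOD = 362.25


def harmonic_model(x, parameters):
    """Evaluate a0 + a1*cos(2*pi/PERIOD*x) + a2*sin(2*pi/PERIOD*x)."""
    a0, a1, a2 = parameters  # the array of coefficients [a0, a1, a2]
    phase = 2 * math.pi / PERIOD * x
    return a0 + a1 * math.cos(phase) + a2 * math.sin(phase)
```

Here `x` is a single numeric time value and `parameters` is the coefficient array, which mirrors the two inputs the fitting function receives in the process.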

@m-mohr
Member Author

m-mohr commented Jun 15, 2021

I've pushed an update to the process specification which adds a "parameters" parameter, which we (may?) need to define the number of parameters.

Here's what a process graph could look like for the formula you gave:

{
  "process_graph": {
    "load": {
      "process_id": "load_collection",
      "arguments": {
        "id": "S2_L2A_T32TPS",
        "spatial_extent": null,
        "temporal_extent": null
      }
    },
    "labels": {
      "process_id": "dimension_labels",
      "arguments": {
        "data": {
          "from_node": "load"
        },
        "dimension": "temporal"
      }
    },
    "fit": {
      "process_id": "fit_curve",
      "arguments": {
        "data": {
          "from_node": "load"
        },
        "labels": {
          "from_node": "labels"
        },
        "parameters": [
          1,
          1,
          1
        ],
        "function": {
          "process_graph": {
            "765a9m5jr": {
              "process_id": "pi",
              "arguments": {}
            },
            "a0": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "parameters"
                },
                "index": 0
              }
            },
            "a1": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "parameters"
                },
                "index": 1
              }
            },
            "a2": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "parameters"
                },
                "index": 2
              }
            },
            "viadlidxt": {
              "process_id": "multiply",
              "arguments": {
                "x": 2,
                "y": {
                  "from_node": "765a9m5jr"
                }
              }
            },
            "dzcmqylf5": {
              "process_id": "multiply",
              "arguments": {
                "x": 2,
                "y": {
                  "from_node": "765a9m5jr"
                }
              }
            },
            "az83u1fgl": {
              "process_id": "divide",
              "arguments": {
                "x": {
                  "from_node": "viadlidxt"
                },
                "y": 362.25
              }
            },
            "l3wirzu5v": {
              "process_id": "divide",
              "arguments": {
                "x": {
                  "from_node": "dzcmqylf5"
                },
                "y": 362.25
              }
            },
            "4kt3ezdz5": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "az83u1fgl"
                },
                "y": {
                  "from_parameter": "x"
                }
              }
            },
            "4kceskraz": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "l3wirzu5v"
                },
                "y": {
                  "from_parameter": "x"
                }
              }
            },
            "o7s99j9pd": {
              "process_id": "cos",
              "arguments": {
                "x": {
                  "from_node": "4kt3ezdz5"
                }
              }
            },
            "kydj4bwy3": {
              "process_id": "sin",
              "arguments": {
                "x": {
                  "from_node": "4kceskraz"
                }
              }
            },
            "e19f54kwn": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "a1"
                },
                "y": {
                  "from_node": "o7s99j9pd"
                }
              }
            },
            "82rlj4j02": {
              "process_id": "multiply",
              "arguments": {
                "x": {
                  "from_node": "a2"
                },
                "y": {
                  "from_node": "kydj4bwy3"
                }
              }
            },
            "f9xv6wbqn": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_node": "a0"
                },
                "y": {
                  "from_node": "e19f54kwn"
                }
              }
            },
            "rjb4pmv5a": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_node": "f9xv6wbqn"
                },
                "y": {
                  "from_node": "82rlj4j02"
                }
              },
              "result": true
            }
          }
        }
      },
      "result": true
    }
  }
}

Please ensure that dimension labels for the temporal dimension are actually numerical and not just ISO strings!

PS: This is purely theoretical and I've never actually tried or implemented it, so happy to take any suggestions that may make it easier or better etc.
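
One way to turn ISO date-time labels into numerical values before fitting is sketched below. The unit (days) and the reference origin are assumptions made for illustration, since the thread does not fix either, and `to_days` is a hypothetical helper name.

```python
from datetime import datetime


def to_days(label, origin="1970-01-01T00:00:00Z"):
    """Convert an ISO 8601 date-time string to fractional days since `origin`."""

    def parse(value):
        # Replace a trailing "Z" so that older fromisoformat() versions accept it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))

    delta = parse(label) - parse(origin)
    return delta.total_seconds() / 86400.0


labels = [
    "2015-01-01T00:00:00Z",
    "2015-01-11T00:00:00Z",
    "2015-01-21T12:00:00Z",
]
# Use the first label as origin so the numeric axis starts at zero.
numeric = [to_days(label, origin=labels[0]) for label in labels]
```

Whatever origin and unit are chosen, the same conversion has to be applied to every set of labels that is later passed to the fitted function.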

@clausmichele
Member

Thanks. I was actually thinking that the fit_curve process takes care of translating the temporal labels/strings (or whatever format they are) into the format required by the process (numerical).

@m-mohr
Member Author

m-mohr commented Jun 15, 2021

If that's a common use case we could think about this special case and support it, yes.

@m-mohr
Member Author

m-mohr commented Jun 15, 2021

Hmm, I'm just thinking whether it would be easier if this process returns an actual process graph instead of just the parameters?
What do you really need from this? The parameters or the function to pass your values to?

@clausmichele
Member

With the computed parameters [a0,a1,a2] and new time steps coming from another "testing" datacube (load_collection with a subsequent time extent), I need to predict the values the fitted function would take at the new time steps. This would basically return a datacube with the same temporal extent as the "test" one, but with different values: predicted values instead of real data.

@m-mohr
Member Author

m-mohr commented Jun 15, 2021

Results from a call with @clausmichele:

  • There's no way yet to properly pass the "data" parameter - we were at a point to pass a raster-cube and the dimension name to work on, but decided against it as it's not very generic. Instead, I'll add an openEO process to get a (labeled?) array with values for a dimension (tbc if that actually makes sense).
  • We may want to make the "labels" parameter optional and default to the labels/indices from the array.
  • We want to allow labels to be date-times for convenience
  • The process should still return the computed parameters so that they can be compared, stored or be re-used for later calls to fit_curve as defaults for the parameters.
  • I should highlight in the process that the function should be stored as user-defined function so that users can re-use it for both the curve fitting and later applying the function with the computed parameters to the data cube. The computed parameters can be passed as "context" in apply.

@m-mohr m-mohr self-assigned this Jun 15, 2021
@m-mohr
Member Author

m-mohr commented Jun 21, 2021

@clausmichele I made an attempt to cover what we discussed in a single process, which returns 2(!) data cubes. I have not had the time to fine-tune the descriptions yet, but do you think it would work this way for you?

@clausmichele
Member

How would I have to use the result? How do I select the "parameters" datacube to predict values for a different datacube with different timesteps (same x,y,bands)?
Up to this point it's clear to me how to feed the data to the process and compute the output, but I'm not sure about what could follow.

@m-mohr
Member Author

m-mohr commented Jun 22, 2021

How would I have to use the result?

You get the data cube with array_element(result, 0)

How do I select the "parameters" datacube

As above, but use 1 instead of 0.

to predict values for a different datacube with different timesteps (same x,y,bands)?

Not possible. That use case didn't come up yet or I did not understand it.

Up to this point it's clear to me how to feed the data to the process and compute the output, but I'm not sure about what could follow.

I'm not sure what you want to have. You input a data cube (e.g. S2 with dimensions x,y,t,b) and get returned a data cube (x,y,t,b) with the predicted dates (as passed into predict_labels) on the temporal dimension instead of the original dates. Could you give some more use-case examples?
I just tried to make the use-case from yesterday working.

@clausmichele
Member

to predict values for a different datacube with different timesteps (same x,y,bands)?

Not possible. That use case didn't come up yet or I did not understand it.

Some time ago I shared the Python code of the use case implementation with you; it is here: https://github.com/SARScripts/SAR2Cube_use_cases/blob/main/SAR2Cube_Forest_Change.ipynb

You can see that we need to fit the curve function over a certain period (2 years of data in this case) but then we need to predict values of a different period (the "testing" one) as well.

Maybe this sketch can clarify the pipeline:
image

@m-mohr
Member Author

m-mohr commented Jun 22, 2021

Some time ago I shared the Python code of the use case implementation with you; it is here: https://github.com/SARScripts/SAR2Cube_use_cases/blob/main/SAR2Cube_Forest_Change.ipynb

Yes, but the code is mostly undocumented and has a lot of other things in it. I can't exactly figure out what the code does and what part of the code is needed for the openEO use case, sorry.

You can see that we need to fit the curve function over a certain period (2 years of data in this case) but then we need to predict values of a different period (the "testing" one) as well.

Maybe this sketch can clarify the pipeline:
image

Okay, so if I understand correctly the issue is that the "parameters" parameter is not a datacube, but an array, right? So the node "array_element (parameters)" is meant to be passed into the "parameters" parameter of fit_curve, right?

@m-mohr
Member Author

m-mohr commented Jun 22, 2021

Or do you want to predict values without an actual curve_fitting just based on the parameters that are returned for the first fit_curve? Then we'd need two processes: fit + predict. The first fit_curve would be replaced with fit + predict and the second fit_curve would only use predict.
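
The fit + predict split could be sketched in plain Python as follows. This is an illustration under stated assumptions, not how a back-end would implement it: because the example model is linear in its coefficients, the sketch solves ordinary least squares through the normal equations instead of the iterative optimisation that scipy.optimize.curve_fit performs, so no initial parameter guess is needed. All function names and the synthetic data are hypothetical.

```python
import math

PERIOD = 362.25  # copied verbatim from the formula in this thread


def basis(x):
    """Basis functions of the model: 1, cos(2*pi/PERIOD*x), sin(2*pi/PERIOD*x)."""
    phase = 2 * math.pi / PERIOD * x
    return [1.0, math.cos(phase), math.sin(phase)]


def model(x, parameters):
    return sum(p * b for p, b in zip(parameters, basis(x)))


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs, ys):
    """Step 1: compute the parameters from the training series."""
    rows = [basis(x) for x in xs]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(ata, aty)


def predict(xs, parameters):
    """Step 2: evaluate the model at new labels without fitting again."""
    return [model(x, parameters) for x in xs]


# Synthetic, noise-free training data covering roughly two periods.
train_x = [float(d) for d in range(0, 720, 12)]
train_y = [model(x, [0.5, 0.2, -0.1]) for x in train_x]
fitted = fit(train_x, train_y)

# Predict for a later "testing" period using only the fitted parameters.
test_x = [float(d) for d in range(720, 1080, 12)]
predicted = predict(test_x, fitted)
```

The point of the split is visible in the last two lines: `predict` needs only the labels of the testing period and the parameters returned by `fit`, not the testing values themselves.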

@clausmichele
Member

Or do you want to predict values without an actual curve_fitting just based on the parameters that are returned for the first fit_curve? Then we'd need two processes: fit + predict. The first fit_curve would be replaced with fit + predict and the second fit_curve would only use predict.

This! It is indeed what we would need

@m-mohr
Member Author

m-mohr commented Jun 22, 2021

Okay, thanks for clarifying. That likely makes things a bit more complicated, but I'll come up with another proposal (for two processes!) then.

@m-mohr
Member Author

m-mohr commented Jun 25, 2021

I've committed a new proposal that splits the fit_curve process into two parts: fit_curve and predict_curve (names TBD). I have to polish the descriptions a lot, but for now it's meant to discuss whether the proposal is feasible and makes sense for many use cases.

Here's an example:

First of all, the model function should be stored separately, here as an example function stored as a process with the name example_fitting_function_eurac. We'll use this function in the other process below.

{
  "id": "example_fitting_function_eurac",
  "parameters": [
    {
      "name": "x",
      "schema": {
        "type": "number"
      }
    },
    {
      "name": "parameters",
      "schema": {
        "type": "array",
        "items": {
          "type": "number"
        }
      }
    }
  ],
  "process_graph": {
    "765a9m5jr": {
      "process_id": "pi",
      "arguments": {}
    },
    "a0": {
      "process_id": "array_element",
      "arguments": {
        "data": {
          "from_parameter": "parameters"
        },
        "index": 0
      }
    },
    "a1": {
      "process_id": "array_element",
      "arguments": {
        "data": {
          "from_parameter": "parameters"
        },
        "index": 1
      }
    },
    "a2": {
      "process_id": "array_element",
      "arguments": {
        "data": {
          "from_parameter": "parameters"
        },
        "index": 2
      }
    },
    "dzcmqylf5": {
      "process_id": "multiply",
      "arguments": {
        "x": 2,
        "y": {
          "from_node": "765a9m5jr"
        }
      }
    },
    "viadlidxt": {
      "process_id": "multiply",
      "arguments": {
        "x": 2,
        "y": {
          "from_node": "765a9m5jr"
        }
      }
    },
    "l3wirzu5v": {
      "process_id": "divide",
      "arguments": {
        "x": {
          "from_node": "dzcmqylf5"
        },
        "y": 362.25
      }
    },
    "az83u1fgl": {
      "process_id": "divide",
      "arguments": {
        "x": {
          "from_node": "viadlidxt"
        },
        "y": 362.25
      }
    },
    "4kceskraz": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "l3wirzu5v"
        },
        "y": {
          "from_parameter": "x"
        }
      }
    },
    "4kt3ezdz5": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "az83u1fgl"
        },
        "y": {
          "from_parameter": "x"
        }
      }
    },
    "kydj4bwy3": {
      "process_id": "sin",
      "arguments": {
        "x": {
          "from_node": "4kceskraz"
        }
      }
    },
    "o7s99j9pd": {
      "process_id": "cos",
      "arguments": {
        "x": {
          "from_node": "4kt3ezdz5"
        }
      }
    },
    "82rlj4j02": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "a2"
        },
        "y": {
          "from_node": "kydj4bwy3"
        }
      }
    },
    "e19f54kwn": {
      "process_id": "multiply",
      "arguments": {
        "x": {
          "from_node": "a1"
        },
        "y": {
          "from_node": "o7s99j9pd"
        }
      }
    },
    "f9xv6wbqn": {
      "process_id": "add",
      "arguments": {
        "x": {
          "from_node": "a0"
        },
        "y": {
          "from_node": "e19f54kwn"
        }
      }
    },
    "rjb4pmv5a": {
      "process_id": "add",
      "arguments": {
        "x": {
          "from_node": "f9xv6wbqn"
        },
        "y": {
          "from_node": "82rlj4j02"
        }
      },
      "result": true
    }
  }
}
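
To make the roles of "from_node" and "from_parameter" concrete, here is a minimal sketch of how such a flat process graph could be evaluated for a single x value. It is a toy interpreter written for illustration only: the `evaluate` function, the `OPERATIONS` table and the shortened graph (which covers only a0 + a1*cos(2*pi/362.25*x)) are assumptions, not part of any openEO implementation.

```python
import math

# A tiny subset of processes, just enough for the shortened graph below.
OPERATIONS = {
    "pi": lambda args: math.pi,
    "add": lambda args: args["x"] + args["y"],
    "multiply": lambda args: args["x"] * args["y"],
    "divide": lambda args: args["x"] / args["y"],
    "cos": lambda args: math.cos(args["x"]),
    "sin": lambda args: math.sin(args["x"]),
    "array_element": lambda args: args["data"][args["index"]],
}


def evaluate(graph, parameters):
    """Evaluate a flat process graph and return the value of its result node."""
    cache = {}

    def resolve(value):
        # References are replaced by node results or parameter values;
        # everything else is passed through as a literal.
        if isinstance(value, dict) and "from_node" in value:
            return run(value["from_node"])
        if isinstance(value, dict) and "from_parameter" in value:
            return parameters[value["from_parameter"]]
        return value

    def run(node_id):
        if node_id not in cache:
            node = graph[node_id]
            args = {k: resolve(v) for k, v in node["arguments"].items()}
            cache[node_id] = OPERATIONS[node["process_id"]](args)
        return cache[node_id]

    result_id = next(k for k, node in graph.items() if node.get("result"))
    return run(result_id)


def param_element(index):
    """Build an array_element node that reads parameters[index]."""
    data = {"from_parameter": "parameters"}
    return {"process_id": "array_element", "arguments": {"data": data, "index": index}}


def binary(process_id, x, y, result=False):
    """Build a two-argument node such as multiply, divide or add."""
    return {"process_id": process_id, "arguments": {"x": x, "y": y}, "result": result}


def node(node_id):
    return {"from_node": node_id}


# Shortened graph: a0 + a1 * cos(2 * pi / 362.25 * x)
GRAPH = {
    "pi": {"process_id": "pi", "arguments": {}},
    "a0": param_element(0),
    "a1": param_element(1),
    "two_pi": binary("multiply", 2, node("pi")),
    "omega": binary("divide", node("two_pi"), 362.25),
    "phase": binary("multiply", node("omega"), {"from_parameter": "x"}),
    "cos": {"process_id": "cos", "arguments": {"x": node("phase")}},
    "term": binary("multiply", node("a1"), node("cos")),
    "sum": binary("add", node("a0"), node("term"), result=True),
}

value = evaluate(GRAPH, {"x": 0, "parameters": [1.0, 2.0, 3.0]})
```

In this sketch the fitting routine would call `evaluate` repeatedly with different "parameters" arrays, while "x" takes the numeric dimension labels.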

Here's an example workflow based on @clausmichele's image above (if I understood it correctly, as the actual values/inputs are partially missing from the image):

{
  "process_graph": {
    "1": {
      "process_id": "load_collection",
      "arguments": {
        "id": "COPERNICUS/S2",
        "spatial_extent": null,
        "temporal_extent": [
          "2015-01-01T00:00:00Z",
          "2018-01-01T00:00:00Z"
        ],
        "bands": null
      }
    },
    "2": {
      "process_id": "filter_temporal",
      "arguments": {
        "data": {
          "from_node": "1"
        },
        "extent": [
          "2015-01-01T00:00:00Z",
          "2017-01-01T00:00:00Z"
        ]
      }
    },
    "3": {
      "process_id": "filter_temporal",
      "arguments": {
        "data": {
          "from_node": "1"
        },
        "extent": [
          "2017-01-01T00:00:00Z",
          "2018-01-01T00:00:00Z"
        ]
      }
    },
    "4": {
      "process_id": "fit_curve",
      "arguments": {
        "data": {
          "from_node": "2"
        },
        "function": {
          "process_graph": {
            "1": {
              "process_id": "example_fitting_function_eurac",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "parameters": {
                  "from_parameter": "parameters"
                }
              },
              "result": true
            }
          }
        },
        "parameters": [
          1,
          1,
          1
        ],
        "dimension": "t"
      }
    },
    "5": {
      "process_id": "predict_curve",
      "arguments": {
        "data": {
          "from_node": "3"
        },
        "parameters": {
          "from_node": "4"
        },
        "function": {
          "process_graph": {
            "1": {
              "process_id": "example_fitting_function_eurac",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "parameters": {
                  "from_parameter": "parameters"
                }
              },
              "result": true
            }
          }
        },
        "dimension": "t"
      },
      "result": true
    }
  }
}

image

The example is likely missing the "labels" in "predict_curve", as they are also missing from @clausmichele's example. It's not clear for which values predictions are computed.
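
The two filter_temporal nodes in the workflow share the boundary 2017-01-01. The sketch below shows how the training and testing periods stay disjoint, assuming the end of each extent is exclusive (a half-open interval). The helper `split_by_extent` and the sample labels are hypothetical and only mimic the filtering on a list of labels.

```python
from datetime import datetime


def parse(value):
    # Replace a trailing "Z" so that older fromisoformat() versions accept it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_by_extent(labels, extent):
    """Keep labels inside the half-open interval [start, end)."""
    start, end = (parse(v) for v in extent)
    return [label for label in labels if start <= parse(label) < end]


labels = [
    "2015-06-01T00:00:00Z",
    "2016-12-31T00:00:00Z",
    "2017-01-01T00:00:00Z",
    "2017-08-15T00:00:00Z",
]

# Same extents as nodes "2" and "3" in the workflow above.
training = split_by_extent(labels, ["2015-01-01T00:00:00Z", "2017-01-01T00:00:00Z"])
testing = split_by_extent(labels, ["2017-01-01T00:00:00Z", "2018-01-01T00:00:00Z"])
```

Under that assumption, the label exactly on the boundary falls only into the testing period, so no observation is used for both fitting and prediction.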

jdries added a commit to Open-EO/openeo-python-client that referenced this pull request Jul 15, 2021
@m-mohr
Member Author

m-mohr commented Jul 15, 2021

This seems to work, so shall we merge, @clausmichele?

@clausmichele
Member

Yes, for me it's fine!

@m-mohr m-mohr merged commit 4e532e2 into draft Jul 16, 2021
@m-mohr m-mohr deleted the fit-curve branch July 16, 2021 14:32

Development

Successfully merging this pull request may close these issues.

Additional use cases

3 participants