Skip to content

Create function to trim output json file from not necessary data#625

Merged
Bachibouzouk merged 19 commits intodevfrom
feature/reduce-output
Nov 5, 2020
Merged

Create function to trim output json file from not necessary data#625
Bachibouzouk merged 19 commits intodevfrom
feature/reduce-output

Conversation

@Bachibouzouk
Copy link
Copy Markdown
Collaborator

@Bachibouzouk Bachibouzouk commented Oct 29, 2020

Address part of #606

Changes proposed in this pull request:

  • Move retrieve_date_time_info from C0 to B0

Tasks from #606 (comment):

  • 2. After closely inspecting the JSON file (~29MB), we observed timestamps are the same for all timeseries. We can significantly reduce the size and redundancy of the file by removing all timestamps vectors from the file. The period can be deduced by simulation settings.

  • 3. We also spotted attributes which do not exist in the EPA side (e.g. costs_investment_over_lifetime). These can be left-out at first and added by the end user.

  • 4. “EnergyBusses" field should be provided directly by EPA as MVS input, it can therefore be dropped

  • 5. Data which are NOT modified by MVS (e.g. economic_data, EnergyConsumption{development_costs, age_installed, optimizedCap, input timeSeries, etc.}) shall not be send back to EPA to prevent redundancy.

  • 7. “path_to_plots” seem to be referencing plotly dash generated figures. We can omit it.

  • 8. cost_matrix, scalar_matrix
    not sure if needed

  • 9. optimizedFlows is a duplicate of the flows of each asset, it can be omitted in the returned json


  • Use in-line comments to explain your code
  • Write docstrings to your code (example docstring)
  • For new functionalities: Explain in readthedocs
  • Write test(s) for your new patch of code (pytests, assertion debug messages)
  • Update the CHANGELOG.md
  • Apply black (black . --exclude docs/)
  • Check if benchmark tests pass locally (EXECUTE_TESTS_ON=master pytest)

For more information on how to contribute check the CONTRIBUTING.md.

@Bachibouzouk Bachibouzouk force-pushed the feature/reduce-output branch 2 times, most recently from e42d019 to 6c65ce0 Compare October 30, 2020 18:23
@Bachibouzouk Bachibouzouk force-pushed the feature/reduce-output branch from 110a923 to fcc5f6c Compare November 2, 2020 22:33
@smartie2076
Copy link
Copy Markdown
Collaborator

* [x]  2. After closely inspecting the JSON file (~29MB), we observed `timestamps `are the same for all timeseries. We can significantly reduce the size and redundancy of the file by removing all timestamps vectors from the file. The period can be deduced by simulation settings.

Nice

* [ ]  3. We also spotted attributes which do not exist in the EPA side (e.g. costs_investment_over_lifetime). These can be left-out at first and added by the end user.

Do you want to address this as well in this PR? It will be huge that way. Also, if I understood correctly, ICOM will ignore/drop unknown parameters/entries, so we would not have to put work into that right now (especially as we get feedback from the end users next friday, whether they would like to keep the parameters you would need to drop).

* [x]  4.  “EnergyBusses" field should be provided directly by EPA as MVS input, it can therefore be dropped

I should create another issue for this, and we should remove it seperately, from my point of view. If we drop it from our outputs, eventhough it is still part of our processing in C, this may be misleading. If we drop it, we should replace it for our manual/local use with a new csv input file energy_busses.csv. This is also connected to #559.

* [ ]  5. Data which are NOT modified by MVS (e.g. economic_data, EnergyConsumption{`development_costs, age_installed, optimizedCap, input timeSeries`, etc.}) shall not be send back to EPA to prevent redundancy.

Makes sense, should not be too hard to do? However, do we want to have this for our general MVS output as well (local execution) or only for the json file sent to the EPA? I think it would be good to keep inputs and outputs together for local execution.

* [x]  9. `optimizedFlows ` is a duplicate of the flows of each asset, it can be omitted in the returned json

Yeah, if they use the assets individually to plot this is true. Not sure why they do that, though.

Copy link
Copy Markdown
Collaborator

@smartie2076 smartie2076 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes:

  • You also created the function C0.compute_timeseries_properties(), which is not in the changelog. Have you written a test for that? f0cad40

  • In 61555b8 you wrote a lot of tests, but can you add debug assertion messages?

  • I need to remember c1b4908 for the energyBus issue

  • In 60dcc9d you have left a number of parameters as strings instead of constant variables.

  • Is there an overview on the helper functions that you create in the utils? data_parser.py does not have extensive docstrings that could explain what you can do with the function.

  • In 82a0868 you replace == etc. with is. Can you add this to the changelog?

Comment on lines +334 to +330
logging.debug(f"Not loading {group} asset {asset} from file")
compute_timeseries_properties(dict_values[group][asset])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure what that debug message tells me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* You also created the function `C0.compute_timeseries_properties()`, which is not in the changelog. Have you written a test for that? [f0cad40](https://github.com/rl-institut/multi-vector-simulator/commit/f0cad40d86fa47d45e66895820b8d771a12021db)

I move this code into a function because I wanted to be able to execute this also when not loading timeseries from csv, I did not write tests for it (here the function applies standard sum and max to a timeseries. There was already a kind of local test of the normalized timeseries which I kept. testing that sum does the sum is already done by the builtin test suite on the sum function in that case.

# If Filename defines the generation timeseries, then we have an asset with a lack of dispatchability
dict_values[group][asset].update({DISPATCHABILITY: False})
else:
logging.debug(f"Not loading {group} asset {asset} from file")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure what that debug message tells me.

)
return
else:
logging.debug(f"Not loading {group} asset {asset} from file")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure what that debug message tells me.

return answer


def select_essential_results(dict_values):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings can be improved

Comment on lines +258 to +324
for asset_group in [
ECONOMIC_DATA,
SIMULATION_SETTINGS,
FIX_COST,
ENERGY_BUSSES,
PATHS_TO_PLOTS,
OPTIMIZED_FLOWS,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may work for now, but in a different pr we should find a way to define energy busses manually

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we will run into issues this way, as energyBusses bay be overwritten with C

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is possible, I did not check how the output looked, I only checked that the simulation was running through with the provided energyBusses, thinking we will then iterate on EPA outputs which are then input of MVS etc...

Comment on lines +86 to +111
ENERGY_PROVIDERS: [
"asset_type",
LABEL,
OEMOF_ASSET_TYPE,
"energy_vector",
INFLOW_DIRECTION,
OUTFLOW_DIRECTION,
CONNECTED_CONSUMPTION_SOURCE,
CONNECTED_FEEDIN_SINK,
DEVELOPMENT_COSTS,
DISPATCH_PRICE,
ENERGY_PRICE,
FEEDIN_TARIFF,
"installed_capacity",
LIFETIME,
"optimize_capacity",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arent energy_vector, installed_capacity, optimize_capacity also constant variables? Or are you using EPA names here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are EPA names, I wanted to specifically not have constant variables for EPA names, so that we see the ones that are different and we are aware of them. It is also helpful in the future to know which constant variable values need to be adapted as they differ than the EPA ones

Comment on lines +302 to +311
# TODO fix input from time parameters for simulation settings
# def test_load_json_parse_pandas_series(self):
# """ """
# k = TYPE_SERIES
# assert self.value_dict[k].equals(JSON_TEST_DICTIONARY[k])
#
# def test_load_json_parse_pandas_series_tuple_name(self):
# """ """
# k = "pandas_series_tuple_name"
# assert self.value_dict[k].equals(JSON_TEST_DICTIONARY[k])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you addressed this in a later commit?

Copy link
Copy Markdown
Collaborator Author

@Bachibouzouk Bachibouzouk Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires a definition of simulation time parameters prior to running these tests #626 (comment)

Comment on lines +102 to +109

epa_dict_values = data_parser.convert_mvs_params_to_epa(dict_values)

output_processing.select_essential_results(epa_dict_values)

json_values = output_processing.store_as_json(epa_dict_values)

return json.loads(json_values)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that there is only the option to store the data in EPA-format. Would it not make sense to have the option to keep the json output as it is, without dropping duplicates? This would keep all data intact and in one place for future consideration/comparision. Does ´mvs_report.py` also work with the lightweight json output?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three things:

  • dropping indexes of timeseries to reduce size is done anyway (store_as_json)
  • convert from EPA to MVS
  • drop some fields that might be redundant (project_data, energy_busses etc)

We can add an option for the server enable or not the last two. I just considered that server.py is primarily meant for MVS/EPA communication so I made this the default.

python mvs_tool.py -i ... (which btw you can replace now by mvs_tool -i ....) calls main function of cli.py, so it is only concerned by the changes which are dropping indexes of timeseries to reduce size (store_as_json).

@Bachibouzouk Bachibouzouk force-pushed the feature/reduce-output branch from fcc5f6c to b3eabdc Compare November 5, 2020 21:36
@Bachibouzouk Bachibouzouk marked this pull request as ready for review November 5, 2020 22:39
@Bachibouzouk Bachibouzouk force-pushed the feature/reduce-output branch from b3eabdc to 66e2b7e Compare November 5, 2020 22:47
@Bachibouzouk Bachibouzouk merged commit 7649baf into dev Nov 5, 2020
@Bachibouzouk Bachibouzouk deleted the feature/reduce-output branch November 5, 2020 23:03
@smartie2076 smartie2076 mentioned this pull request Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants