MISO2 Database

Created on Tue Aug 23 10:20:55 2022

@author: bgrammer

class model.output.MISO2_database.MISO2Database

Aggregate object for MISO2Outputs.

Stores regional outputs in larger dataframes and provides convenience wrappers to subset the data. Only outputs with the same system definition and run options can be added to a database object. Both MISO2Outputs and MISO2Stats can be added, although MISO2Stats must be finalized first. The output type under which data is aggregated is determined by the shape and index of the MISO2Output’s data: series have only one column, enduse data contain a sector index, and cohorts contain both a time and a sector index. All other dataframes are added as regular outputs.
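The sorting rule above can be sketched with plain pandas. This is a hypothetical helper, not the library's code: the index level names “time” and “sector” are assumptions, and checking for a single column first simply follows the order in which the rule is described.

```python
import pandas as pd

# Hypothetical index level names; the real MISO2Output layout may differ.
TIME_LEVEL = "time"
SECTOR_LEVEL = "sector"


def classify_output(df: pd.DataFrame) -> str:
    """Return the database bucket a dataframe would be sorted into."""
    levels = set(df.index.names)
    if df.shape[1] == 1:
        # A single column is treated as a series
        return "series"
    if {TIME_LEVEL, SECTOR_LEVEL} <= levels:
        # Cohorts carry both a time and a sector level, so test this before "enduse"
        return "cohorts"
    if SECTOR_LEVEL in levels:
        return "enduse"
    return "outputs"
```

Checking for cohorts before enduse matters, since a cohort index also contains a sector level.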

_values

Holds dataframes for four output types (“series”, “enduse”, “cohorts”, “outputs”).

Type:

dict

_metadata

Metadata of database object. Contains references for MFA system definition and run options as well as original outputs and stats metadata and saved file names.

Type:

dict

add_miso_outputs(miso_outputs, save_cohorts=True, db_folder=None)

Add MISO2Output files to database.

If not yet set, the control references for MFA system definition and run options are taken from the first output in the list. If enduse cohorts are present, they are saved as parquet files into an external folder, from which they can be lazily loaded as a Dask dataframe. Note that this method is not optimized for efficiency, since adding outputs is expected to be a one-time operation; a large number of output files may lead to significant processing time.

Parameters:
  • miso_outputs (list) – List of MISO2Outputs.

  • save_cohorts (bool) – If True (default), any cohorts present in the outputs are saved as parquet files in the database folder.

  • db_folder (str) – Relative path of the folder where parquet files are saved. This overrides any existing database folder associated with the object. Defaults to “miso_database”.

Raises:
  • TypeError – If a non-MISO2Output object or an empty list is passed.

  • ValueError – If the MFA system definition or run options of an output do not match the database control values.
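The validation described for add_miso_outputs can be sketched as follows. OutputStub and check_outputs are hypothetical stand-ins, and the run option key “stochastic” in the usage is invented for illustration; the sketch only mirrors the documented behaviour (control references taken from the first output, TypeError on an empty list or wrong type, ValueError on a mismatch).

```python
from dataclasses import dataclass, field


@dataclass
class OutputStub:
    """Hypothetical stand-in for a MISO2Output, reduced to the fields checked here."""
    system_definition: str
    run_options: dict = field(default_factory=dict)


def check_outputs(outputs, control=None):
    """Validate a batch of outputs and return the control references."""
    if not outputs:
        raise TypeError("Expected a non-empty list of outputs")
    for out in outputs:
        if not isinstance(out, OutputStub):
            raise TypeError(f"Cannot add object of type {type(out).__name__}")
    if control is None:
        # No references set yet: take them from the first output in the list
        control = (outputs[0].system_definition, outputs[0].run_options)
    for out in outputs:
        if (out.system_definition, out.run_options) != control:
            raise ValueError("System definition or run options do not match the database")
    return control
```

Returning the control tuple lets a caller store it once and pass it back in for later batches.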

create_database_from_output_files(directory, output_type='output')

Convenience wrapper to fill a MISO2 database from outputs on disk.

The function reads all .json metadata files in the directory and all of its subdirectories.

Note that there are some validity checks in place to prevent adding outputs with different system definitions and run options, but these do not cover all possible scenarios of garbled input. Make sure that the specified directory only contains MISO2Outputs and MISO2Stats that you really want to add to the database.

Parameters:
  • directory (str) – Path to folder which is to be read in.

  • output_type (str) – One of “output” (default) or “stats”.
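Collecting the metadata files from a directory tree can be sketched with pathlib. find_metadata_files is a hypothetical helper; it only illustrates the recursive search, not how the library then reads or filters the files.

```python
from pathlib import Path


def find_metadata_files(directory):
    """Return every .json file below directory, including all subdirectories."""
    # rglob recurses into subdirectories; sorting gives a deterministic load order
    return sorted(p for p in Path(directory).rglob("*.json") if p.is_file())
```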

get_categories(df_type=None)

Return unique index values of the database.

Parameters:
  • df_type (string) – Specific dataframe type to be returned. One of “series”, “enduse”, “cohorts” or “outputs”. If None (default), all categories are returned.

Returns:

Dictionary of available categories.

Return type:

indices (dict)
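Gathering the unique index values per level can be sketched with pandas. collect_categories is hypothetical, it assumes a dict laid out like _values, and the nested dict it returns is an assumed shape rather than the documented return format.

```python
import pandas as pd


def collect_categories(values, df_type=None):
    """Map each dataframe type to the unique values of every index level."""
    # values is assumed to look like _values: {"series": df, "outputs": df, ...}
    frames = values if df_type is None else {df_type: values[df_type]}
    return {
        name: {
            level: df.index.get_level_values(level).unique().tolist()
            for level in df.index.names
        }
        for name, df in frames.items()
    }
```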

get_cohorts(cohort_paths=None)

Returns a Dask dataframe of the enduse cohorts from parquet.

Parameters:
  • cohort_paths (str/list/glob) – Path, list of paths or glob pattern from which to read cohorts. If None (default), the database tries to read cohorts from its associated file paths.

Returns:

Dask DataFrame with cohorts.

Return type:

dask_cohorts (dask.dataframe.DataFrame)
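The three accepted forms of cohort_paths can be normalized into a concrete file list before reading, as in this hypothetical helper. The resulting list could then be passed to dask.dataframe.read_parquet, which itself accepts a path, a list of paths or a glob string; whether get_cohorts performs such a step is not stated here.

```python
import glob


def resolve_cohort_paths(cohort_paths):
    """Expand a path, a list of paths or a glob pattern into a sorted list of files."""
    patterns = list(cohort_paths) if isinstance(cohort_paths, (list, tuple)) else [cohort_paths]
    resolved = []
    for pattern in patterns:
        # glob.glob returns an existing plain path unchanged and expands wildcards
        matches = sorted(glob.glob(str(pattern)))
        if not matches:
            raise FileNotFoundError(f"No parquet files match {pattern!r}")
        resolved.extend(matches)
    return resolved
```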

get_subset(output_type='outputs', result_type=slice(None, None, None), region=slice(None, None, None), parameter=slice(None, None, None), material=slice(None, None, None), sector=slice(None, None, None), years=slice(None, None, None), drop_levels=False)

Convenience method to subset the multi-index database and return a copy of the result.

Single value arguments can be passed as strings, while multi-value arguments need to be passed as lists or tuples. All arguments default to slice(None), returning the entire index if not otherwise specified. Use the get_categories() method to see which index values are available.

Parameters:
  • output_type (string) – One of “series”, “enduse”, “cohorts” or “outputs” (the default).

  • result_type (str/list) – The type of result to be returned, e.g. “result”, “mean” or “s_var”.

  • region (str/list) – The region, e.g. “Austria” or “World”.

  • parameter (str/list) – The name of the output parameter.

  • material (str/list) – The name of the material.

  • sector (str/list) – The name of the sector.

  • years (tuple) – Start and end year to be returned. The upper bound is included.

  • drop_levels (bool) – If True, remove index levels that contain only a single value from the returned subset.

Returns:

A subset of a dataframe.

Return type:

df (pandas.DataFrame/pandas.Series)

Raises:

ValueError
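A subset query of this kind maps naturally onto pandas MultiIndex slicing with .loc. The sketch below is hypothetical: it assumes the dataframe has a MultiIndex whose level names match the keyword arguments (e.g. “result_type”, “region”, “parameter”, “material”) and that years are integer column labels. Single strings are wrapped in lists so that .loc keeps the level, and a (start, end) tuple becomes a label slice, which includes the upper bound.

```python
import pandas as pd


def _as_indexer(value):
    """Wrap a single label in a list so that .loc keeps the index level."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, tuple):
        return list(value)
    return value  # slice(None) or a list of labels


def subset_frame(df, years=slice(None), drop_levels=False, **selectors):
    """Select rows by index-level name and year columns; return a copy."""
    df = df.sort_index().sort_index(axis=1)  # label slicing expects sorted labels
    key = tuple(_as_indexer(selectors.get(name, slice(None))) for name in df.index.names)
    if isinstance(years, tuple):
        years = slice(*years)  # label-based slices include the upper bound
    result = df.loc[key, years].copy()
    if drop_levels:
        single = [n for n in result.index.names
                  if result.index.get_level_values(n).nunique() == 1]
        if len(single) == result.index.nlevels:
            single = single[1:]  # droplevel cannot remove every level
        if single:
            result = result.droplevel(single)
    return result
```

For example, subset_frame(df, result_type="result", region="Austria", years=(1900, 1901)) keeps all index levels and the columns 1900 and 1901. Selectors whose names do not match an index level are silently ignored in this sketch.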

restore_from_parquet(folder_path, metadata_filename)

Restore a MISO2Database object from saved files.

Parameters:
  • folder_path (str) – Location of files.

  • metadata_filename (str) – Name of metadata JSON file.

Raises:

FileNotFoundError – When metadata file does not exist at given location.
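The metadata check can be sketched as a hypothetical load_metadata helper. It only covers locating and parsing the JSON file; reading the parquet files listed in that metadata is not shown.

```python
import json
from pathlib import Path


def load_metadata(folder_path, metadata_filename):
    """Read the metadata JSON, failing early if it is missing."""
    path = Path(folder_path) / metadata_filename
    if not path.is_file():
        raise FileNotFoundError(f"No metadata file at {path}")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```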

save_to_file(file_type, output_path, export_cohorts=False, filename='MISO2_data')

Wrapper method for saving database to file.

Cohorts are already stored as parquet files, so they are not exported again when saving in that format.

Parameters:
  • file_type (str) – One of “xls”, “parquet” or “csv”.

  • output_path (str) – Output folder.

  • export_cohorts (bool) – Export cohorts. These files may be very large. Defaults to False.

  • filename (str) – Filename prefix of the output files.
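A dispatch on file_type can be sketched with pandas writers. save_frames, the WRITERS mapping and the “{filename}_{type}.{ext}” naming scheme are hypothetical. Because recent pandas versions no longer write the legacy .xls format, this sketch writes .xlsx for “xls” (requires openpyxl); parquet output requires pyarrow or fastparquet. Cohorts are skipped unless requested and are never re-exported as parquet, as described above.

```python
from pathlib import Path
from typing import Mapping

import pandas as pd

# Hypothetical mapping from file_type to (extension, writer).
WRITERS = {
    "csv": ("csv", lambda df, path: df.to_csv(path)),
    "xls": ("xlsx", lambda df, path: df.to_excel(path)),  # needs openpyxl
    "parquet": ("parquet", lambda df, path: df.to_parquet(path)),  # needs pyarrow or fastparquet
}


def save_frames(values: Mapping[str, pd.DataFrame], file_type, output_path,
                export_cohorts=False, filename="MISO2_data"):
    """Write each dataframe to its own file and return the written paths."""
    if file_type not in WRITERS:
        raise ValueError(f"Unsupported file_type {file_type!r}")
    extension, write = WRITERS[file_type]
    out_dir = Path(output_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in values.items():
        # Cohorts are skipped unless requested, and never re-exported as parquet
        if name == "cohorts" and (not export_cohorts or file_type == "parquet"):
            continue
        path = out_dir / f"{filename}_{name}.{extension}"
        write(df, path)
        written.append(path)
    return written
```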