New workflow: MapReduce #14

@gpauloski

MapReduce Workflow

Goal: Create a simple map-reduce workflow for WEBS.

This will require finding a suitable map-reduce dataset, but we are open to pretty much any example. For instance, counting word occurrences in a text dataset would be sufficient.
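
To make the word-count idea concrete, here is a minimal sketch using only the Python standard library. It is not written against the WEBS workflow structure; the function names (`map_count`, `reduce_counts`, `word_count`) and the use of a `concurrent.futures` executor to run the map tasks are assumptions made for illustration, and the actual workflow should follow the guide.

```python
from __future__ import annotations

import re
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import reduce

# Hypothetical tokenizer: lowercase ASCII letters and apostrophes only.
WORD_PATTERN = re.compile(r"[a-z']+")


def map_count(chunk: str) -> Counter[str]:
    """Map task: count the words in one chunk of text."""
    return Counter(WORD_PATTERN.findall(chunk.lower()))


def reduce_counts(left: Counter[str], right: Counter[str]) -> Counter[str]:
    """Reduce task: merge two partial word counts."""
    return left + right


def word_count(chunks: list[str], executor: Executor) -> Counter[str]:
    """Run one map task per chunk, then fold the partial counts together."""
    partials = executor.map(map_count, chunks)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    chunks = ["the quick brown fox", "jumps over the lazy dog", "The end"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        counts = word_count(chunks, pool)
    print(counts.most_common(1))  # [('the', 3)]
```

Each map task is independent of the others, so the map phase can run in parallel; only the reduce step needs all of the partial results.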

Steps

  • Read the Creating Workflows in WEBS Guide.
  • Rewrite the reference implementation using the structure described in the guide.
    At the end, you should be able to run:
    python -m webs.run mapreduce {args}
    to run the workflow.
  • Write up instructions for running the workflow as a module docstring in webs/wf/mapreduce/__init__.py. This should include:
    • Installation instructions if there are external dependencies.
    • Data download instructions.
    • Discussion of important parameters or results.

Tips

  • Please reply to the issue with any questions.
  • The WEBS CI requires 100% code coverage. It is okay to exclude the workflow from coverage by adding the following to pyproject.toml.
    [tool.coverage.run]
    ...
    omit = [
        "examples",
        "webs/wf/{workflow-dir}/",
    ]
  • Python dependencies needed by the workflow can be included as an "extras" option in the pyproject.toml.
    [project.optional-dependencies]
    mapreduce = ["numpy", "pandas"]
    Non-Python dependencies will require adding documentation instructions in webs/wf/{workflow-name}/__init__.py.
  • If the reference implementation includes a license, we will need to include that in third_party_licenses/{workflow-name}-{licence-type}.md.

Metadata

Labels

help wanted: Extra attention is needed
