All generator CLIs should accept an --output argument and handle output args in the same way #455

@cmungall

Looking at this together with @sujaypatil96

Currently each gen-X command writes to stdout, so the idiomatic usage is:

```bash
gen-X my.yaml > my.X_format
```

This is fine, but we should strive to follow https://clig.dev/ and provide an explicit -o/--output option.
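
A minimal sketch of the proposed option, written with the standard library's argparse only to keep it self-contained (LinkML's CLIs are built on click, where the equivalent would be a click option). The command name `gen-x` and the helper `build_parser` are hypothetical. Defaulting to None, meaning "write to stdout", keeps the redirect-based usage working unchanged.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a gen-x command with an explicit -o/--output option."""
    parser = argparse.ArgumentParser(prog="gen-x")
    parser.add_argument("yamlfile", help="LinkML schema to generate from")
    # None means "write to stdout", so omitting -o preserves the
    # existing `gen-x my.yaml > my.x_format` behaviour.
    parser.add_argument(
        "-o",
        "--output",
        default=None,
        help="path of the file to write (default: stdout)",
    )
    return parser
```

For example, `build_parser().parse_args(["my.yaml", "-o", "my.json"])` yields `output == "my.json"`, while `parse_args(["my.yaml"])` leaves `output` as None.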

Some wrinkles to be aware of:

Currently, each generator in the framework print()s its output. Sometimes this is done in a streaming fashion, and sometimes all at once, at end_schema time.

E.g. in owlgen we have:

```python
def end_schema(self, output: Optional[str] = None, **_) -> None:
    data = self.graph.serialize(format='turtle' if self.format in ['owl', 'ttl'] else self.format).decode()
    if output:
        with open(output, 'w') as outf:
            outf.write(data)
    else:
        print(data)
```

Note that the end_schema signature is overridden to accept an output arg (this isn't accessible via the CLI, though!)

jsonschemagen also prints its full payload in end_schema; however, it doesn't override the signature of end_schema:

```python
def end_schema(self, **_) -> None:
    # create more lax version of every class that is used as an inlined dict reference;
    # in this version, the primary key/identifier is optional, since it is used as the key of the dict
    for cls_name, (id_slot, cls_name_lax) in self.optional_identifier_class_map.items():
        lax_cls = deepcopy(self.schemaobj['$defs'][cls_name])
        lax_cls.required.remove(id_slot)
        self.schemaobj['$defs'][cls_name_lax] = lax_cls
    print(as_json(self.schemaobj, sort_keys=True))
```
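
The comment in that end_schema describes a small transformation: each class used as an inlined dict reference gets a copy in which the identifier slot is no longer required, because the identifier becomes the dict key. A sketch of the same idea on plain dicts; the function name `add_lax_classes` and the dict layout are illustrative assumptions, not jsonschemagen's internals.

```python
from copy import deepcopy
from typing import Dict, Tuple


def add_lax_classes(defs: Dict[str, dict],
                    optional_identifier_class_map: Dict[str, Tuple[str, str]]) -> None:
    """Add a lax copy of each mapped class with its identifier slot made optional.

    optional_identifier_class_map maps a class name to (id_slot, lax_class_name).
    """
    for cls_name, (id_slot, cls_name_lax) in optional_identifier_class_map.items():
        # Deep-copy so that editing the lax version leaves the strict one intact.
        lax_cls = deepcopy(defs[cls_name])
        lax_cls["required"].remove(id_slot)
        defs[cls_name_lax] = lax_cls
```

The strict definition is kept for places where the identifier must appear as a property; the lax one is for values nested under their identifier as a dict key.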

In contrast, graphqlgen streams its output:

```python
def visit_class(self, cls: ClassDefinition) -> bool:
    etype = 'interface' if (cls.abstract or cls.mixin) and not cls.mixins else 'type'
    mixins = ', '.join([camelcase(mixin) for mixin in cls.mixins])
    print(f"{etype} {camelcase(cls.name)}" + (f" implements {mixins}" if mixins else ""))
    print("  {")
    return True
```

In all cases, each generator prints its output.

The parent generator class takes care of routing this to a buffer, whose contents serialize() then returns:

```python
def serialize(self, **kwargs) -> str:
    """
    Generate output in the required format

    :param kwargs: Generator specific parameters
    :return: Generated output
    """
    output = StringIO()
    with redirect_stdout(output):
        self.visit_schema(**kwargs)
        ...
        self.end_schema(**kwargs)
    return output.getvalue()
```
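
To see why both styles end up in the same place, here is a much-reduced stand-in for the parent class: serialize() redirects stdout into a StringIO buffer, so text printed while visiting (streaming, as in graphqlgen) and text printed once in end_schema (as in owlgen and jsonschemagen) are captured alike. `MiniGenerator`, `StreamingGen`, `EndSchemaGen`, and the plain list of class names are hypothetical simplifications; the real base class walks a full schema.

```python
from contextlib import redirect_stdout
from io import StringIO
from typing import List


class MiniGenerator:
    """Hypothetical, much-reduced stand-in for the generator base class."""

    def __init__(self, classes: List[str]) -> None:
        self.classes = classes

    def visit_schema(self, **kwargs) -> None:
        pass

    def visit_class(self, cls: str) -> bool:
        return True

    def end_schema(self, **kwargs) -> None:
        pass

    def serialize(self, **kwargs) -> str:
        output = StringIO()
        # Everything print()ed by the hooks below lands in `output`.
        with redirect_stdout(output):
            self.visit_schema(**kwargs)
            for cls in self.classes:
                self.visit_class(cls)
            self.end_schema(**kwargs)
        return output.getvalue()


class StreamingGen(MiniGenerator):
    """Prints as it goes, like graphqlgen."""

    def visit_class(self, cls: str) -> bool:
        print(f"type {cls}")
        return True


class EndSchemaGen(MiniGenerator):
    """Accumulates state, then prints once in end_schema, like jsonschemagen."""

    def __init__(self, classes: List[str]) -> None:
        super().__init__(classes)
        self.names: List[str] = []

    def visit_class(self, cls: str) -> bool:
        self.names.append(cls)
        return True

    def end_schema(self, **kwargs) -> None:
        print(", ".join(self.names))
```

`StreamingGen(["Person", "Org"]).serialize()` returns `"type Person\ntype Org\n"` and `EndSchemaGen(["Person", "Org"]).serialize()` returns `"Person, Org\n"`; callers cannot tell which style produced the string.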

A typical CLI then looks like this:

```python
def cli(yamlfile, **kwargs):
    """ Generate JSON Schema representation of a LinkML model """
    print(JsonSchemaGenerator(yamlfile, **kwargs).serialize(**kwargs))
```

Some generators, such as javagen and markdowngen, are deliberate exceptions: they output multiple files into a directory passed as an argument. For example, in javagen:

```python
def serialize(self, directory: str) -> None:
    sv = self.schemaview

    if self.template_file is not None:
        with open(self.template_file) as template_file:
            template_obj = Template(template_file.read())
    else:
        template_obj = Template(default_template)

    oodocs = self.create_documents()
    self.directory = directory
    for oodoc in oodocs:
        cls = oodoc.classes[0]
        code = template_obj.render(doc=oodoc, cls=cls)

        os.makedirs(directory, exist_ok=True)
        filename = f'{oodoc.name}.java'
        path = os.path.join(directory, filename)
        with open(path, 'w') as stream:
            stream.write(code)
```
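
For the multi-file case, the core job reduces to writing one named document per file into a target directory. A minimal sketch, assuming the documents have already been rendered to strings (`write_documents` is a hypothetical helper; javagen itself renders a template per OO document). It creates the directory once, before the loop.

```python
import os
from typing import Dict, List


def write_documents(docs: Dict[str, str], directory: str,
                    suffix: str = ".java") -> List[str]:
    """Write each rendered document into `directory`; return the paths written."""
    os.makedirs(directory, exist_ok=True)  # create once, up front
    paths = []
    for name, code in docs.items():
        path = os.path.join(directory, f"{name}{suffix}")
        with open(path, "w") as stream:
            stream.write(code)
        paths.append(path)
    return paths
```

Because the output is a directory rather than a single payload, a generator of this kind would take a directory argument rather than a single -o/--output file.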

This heterogeneity is causing confusion; see also #451.

I propose that we think in terms of two kinds of generators:

  1. single-payload generators (most)
  2. multi-file generators (javagen, markdowngen, projectgen)

For single-payload generators, an individual generator should not have to worry about output file paths; its only responsibility should be to write to a stream. The generator superclass should take care of whether that stream goes to a file or to stdout (alternatively, this can be deferred to the CLI part of each generator).
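
One possible shape for this split, sketched under assumptions: each single-payload generator implements one method that writes to whatever text stream it is handed, and the base class alone decides between a file and stdout. `SinglePayloadGenerator`, `generate`, `write_output`, and `HelloGenerator` are hypothetical names, not the current LinkML API. Since the stream is passed explicitly, this variant would not need redirect_stdout.

```python
import sys
from io import StringIO
from typing import Optional, TextIO


class SinglePayloadGenerator:
    """Hypothetical base class: subclasses write to a stream, never to paths."""

    def generate(self, stream: TextIO) -> None:
        raise NotImplementedError

    def serialize(self) -> str:
        """Return the whole payload as a string."""
        buf = StringIO()
        self.generate(buf)
        return buf.getvalue()

    def write_output(self, output: Optional[str] = None) -> None:
        """Write to the file at `output`, or to stdout when it is None or '-'."""
        if output is None or output == "-":
            self.generate(sys.stdout)
        else:
            with open(output, "w") as stream:
                self.generate(stream)


class HelloGenerator(SinglePayloadGenerator):
    """Toy subclass: it only knows how to write lines to a stream."""

    def generate(self, stream: TextIO) -> None:
        stream.write("hello\n")
        stream.write("world\n")
```

A CLI built on this would only need to pass its --output value (or None) to `write_output`, and no generator would carry its own file-handling branch like the one in owlgen's end_schema.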

I think all of the output arguments currently in end_schema can probably be removed safely? Presumably they are legacy.

For streaming, I don't love the `output = StringIO(); with redirect_stdout(output):` idiom, but I suggest keeping it in place at first while making everything consistent.

Metadata

Labels

- cli
- developer-days: smallish tickets that can be considered "maintenance" and fixed within a single session
- generator-misc: Pertaining to more than one generator, or perhaps one that doesn't exist yet
- good first issue: Good for newcomers

Project status: Todo
