All generator CLIs should accept an --output argument and handle output args in the same way

Looking at this together with @sujaypatil96 

Currently each gen-X command writes to stdout, so the idiomatic usage is:

```bash
gen-X my.yaml > my.X_format
```

This is fine, but we should strive to follow https://clig.dev/ and have an explicit `-o/--output` option.
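
A minimal sketch of the convention being proposed, using stdlib `argparse` purely for illustration. The real generator CLIs have their own option handling, and `build_parser`, `open_output` and `main` are hypothetical names. When `-o/--output` is omitted or given as `-`, output goes to stdout; otherwise it goes to the named file.

```python
import argparse
import sys
from typing import Optional, Sequence, TextIO


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for a gen-X style command.
    parser = argparse.ArgumentParser(prog="gen-X")
    parser.add_argument("yamlfile", help="LinkML schema to generate from")
    parser.add_argument(
        "-o", "--output",
        default="-",
        help="output file; '-' (the default) means stdout",
    )
    return parser


def open_output(path: str) -> TextIO:
    # '-' is the common CLI convention for "use stdout".
    if path == "-":
        return sys.stdout
    return open(path, "w", encoding="utf-8")


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # Stand-in for the real generator output (assumption for this sketch).
    payload = f"# generated from {args.yamlfile}\n"
    out = open_output(args.output)
    try:
        out.write(payload)
    finally:
        # Never close the process-wide stdout, only files we opened.
        if out is not sys.stdout:
            out.close()
```

With this shape, `gen-X my.yaml > my.X_format` keeps working unchanged, and `gen-X my.yaml -o my.X_format` becomes the explicit equivalent.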

Some wrinkles to be aware of:

Currently, each generator in the framework `print()`s its output, sometimes in a streaming fashion and sometimes all at once, at `end_schema` time.

E.g. in owlgen we have:

```python
    def end_schema(self, output: Optional[str] = None, **_) -> None:
        data = self.graph.serialize(format='turtle' if self.format in ['owl', 'ttl'] else self.format).decode()
        if output:
            with open(output, 'w') as outf:
                outf.write(data)
        else:
            print(data)
```

Note that the `end_schema` signature is overridden to add an `output` argument (this isn't accessible via the CLI, though!)

jsonschemagen also prints its full payload in `end_schema`; however, it doesn't override the `end_schema` signature:

```python
    def end_schema(self, **_) -> None:
        # create more lax version of every class that is used as an inlined dict reference;
        # in this version, the primary key/identifier is optional, since it is used as the key of the dict
        for cls_name, (id_slot, cls_name_lax) in self.optional_identifier_class_map.items():
            lax_cls = deepcopy(self.schemaobj['$defs'][cls_name])
            lax_cls.required.remove(id_slot)
            self.schemaobj['$defs'][cls_name_lax] = lax_cls
        print(as_json(self.schemaobj, sort_keys=True))
```

In contrast, graphqlgen streams its output:

```python
    def visit_class(self, cls: ClassDefinition) -> bool:
        etype = 'interface' if (cls.abstract or cls.mixin) and not cls.mixins else 'type'
        mixins = ', '.join([camelcase(mixin) for mixin in cls.mixins])
        print(f"{etype} {camelcase(cls.name)}" + (f" implements {mixins}" if mixins else ""))
        print("  {")
        return True
```

In all cases, each generator `print`s its output, and the parent generator class takes care of routing this to a buffer, which is later exported:

```python
    def serialize(self, **kwargs) -> str:
        """
        Generate output in the required format

        :param kwargs: Generator specific parameters
        :return: Generated output
        """
        output = StringIO()
        with redirect_stdout(output):
            self.visit_schema(**kwargs)
            ...
            self.end_schema(**kwargs)
        return output.getvalue()
```
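
To make the mechanism concrete, here is a heavily simplified, hypothetical stand-in for the superclass (`ToyGenerator`, `StreamingGen` and `AllAtOnceGen` are illustrative names, not LinkML classes). It shows why the two styles look identical from the outside: whether a subclass prints per class (graphqlgen style) or once in `end_schema` (jsonschemagen style), everything lands in the same `StringIO` buffer.

```python
from contextlib import redirect_stdout
from io import StringIO
from typing import List


class ToyGenerator:
    """Hypothetical, heavily simplified stand-in for the Generator superclass."""

    def __init__(self, class_names: List[str]) -> None:
        self.class_names = class_names

    def visit_class(self, name: str) -> None:
        pass

    def end_schema(self, **_) -> None:
        pass

    def serialize(self, **kwargs) -> str:
        output = StringIO()
        # Anything printed by the visit_* / end_schema hooks is captured here.
        with redirect_stdout(output):
            for name in self.class_names:
                self.visit_class(name)
            self.end_schema(**kwargs)
        return output.getvalue()


class StreamingGen(ToyGenerator):
    # graphqlgen style: prints as each class is visited.
    def visit_class(self, name: str) -> None:
        print(f"type {name}")


class AllAtOnceGen(ToyGenerator):
    # jsonschemagen style: collects state, prints once at the end.
    def __init__(self, class_names: List[str]) -> None:
        super().__init__(class_names)
        self.seen: List[str] = []

    def visit_class(self, name: str) -> None:
        self.seen.append(name)

    def end_schema(self, **_) -> None:
        print("\n".join(f"type {n}" for n in self.seen))
```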

and a typical CLI will look like this:

```python
def cli(yamlfile, **kwargs):
    """ Generate JSON Schema representation of a LinkML model """
    print(JsonSchemaGenerator(yamlfile, **kwargs).serialize(**kwargs))
```

Some generators, such as javagen and markdowngen, are deliberate exceptions: they write multiple files into a directory passed as an argument. E.g. in javagen:

```python
    def serialize(self, directory: str) -> None:
        sv = self.schemaview

        if self.template_file is not None:
            with open(self.template_file) as template_file:
                template_obj = Template(template_file.read())
        else:
            template_obj = Template(default_template)

        oodocs = self.create_documents()
        self.directory = directory
        for oodoc in oodocs:
            cls = oodoc.classes[0]
            code = template_obj.render(doc=oodoc, cls=cls)

            os.makedirs(directory, exist_ok=True)
            filename = f'{oodoc.name}.java'
            path = os.path.join(directory, filename)
            with open(path, 'w') as stream:
                stream.write(code)

```

This heterogeneity is causing confusion; see also #451.

I propose that we think in terms of two kinds of generators:

 1. single-payload generators (most)
 2. multi-file generators (javagen, markdowngen, projectgen)
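
One hypothetical way to encode this split, as a sketch only (`SinglePayloadGenerator`, `MultiFileGenerator`, `documents`, `ToyJsonGen` and `ToyJavaGen` are illustrative names, not existing LinkML classes): single-payload generators return one string, while multi-file generators describe their files as a name-to-contents mapping and share one directory-writing routine, much like the loop in javagen above.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict


class SinglePayloadGenerator(ABC):
    """Hypothetical base: produce exactly one text payload."""

    @abstractmethod
    def serialize(self) -> str:
        ...


class MultiFileGenerator(ABC):
    """Hypothetical base: produce several named files under a directory."""

    @abstractmethod
    def documents(self) -> Dict[str, str]:
        """Return a mapping of relative filename -> file contents."""

    def serialize(self, directory: str) -> None:
        # Shared directory handling, so subclasses never touch paths themselves.
        os.makedirs(directory, exist_ok=True)
        for filename, text in self.documents().items():
            with open(os.path.join(directory, filename), "w", encoding="utf-8") as stream:
                stream.write(text)


class ToyJsonGen(SinglePayloadGenerator):
    def serialize(self) -> str:
        return '{"title": "toy"}\n'


class ToyJavaGen(MultiFileGenerator):
    def documents(self) -> Dict[str, str]:
        return {"Person.java": "class Person {}\n", "Org.java": "class Org {}\n"}
```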

For single-payload generators, an individual generator should not have to worry about output file paths; its only responsibility should be to write to a stream. The generator superclass should take care of whether that stream goes to a file or to stdout (alternatively, this can be deferred to the CLI part of each generator).
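
A minimal sketch of the superclass taking over the destination, assuming a hypothetical `emit(output)` method (not an existing LinkML API): subclasses only build text in `serialize`, and the superclass decides whether it goes to stdout or a file, so no generator needs its own `output` handling.

```python
import sys
from typing import Optional


class SinglePayloadBase:
    """Hypothetical superclass: subclasses build text; this class decides where it goes."""

    def serialize(self) -> str:
        raise NotImplementedError

    def emit(self, output: Optional[str] = None) -> None:
        text = self.serialize()
        if output is None or output == "-":
            # sys.stdout is looked up at call time, so shell redirection still works.
            sys.stdout.write(text)
        else:
            with open(output, "w", encoding="utf-8") as stream:
                stream.write(text)


class ToyGen(SinglePayloadBase):
    # Hypothetical generator: knows nothing about files or stdout.
    def serialize(self) -> str:
        return "type Person\n"
```

A CLI would then reduce to something like `ToyGen().emit(output)`, with `output` taken straight from the `-o/--output` option.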

I think all of the `output` arguments currently in `end_schema` methods can probably be safely removed. Are they some kind of legacy?

For streaming, I don't love the `output = StringIO(); with redirect_stdout(output):` idiom, but I suggest keeping it in place at first while we make everything consistent.
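
For later reference, one possible alternative, sketched under assumptions (`StreamGenerator` and its `out` attribute are hypothetical). `contextlib.redirect_stdout` swaps the process-wide `sys.stdout`, which the Python documentation notes makes it unsuitable for library code and most threaded applications. Since `print()` accepts an explicit `file=` target, hooks can write to a generator-owned stream instead.

```python
import sys
from io import StringIO
from typing import List, TextIO


class StreamGenerator:
    """Hypothetical alternative: hooks write to self.out rather than relying on redirect_stdout."""

    def __init__(self, class_names: List[str]) -> None:
        self.class_names = class_names
        self.out: TextIO = sys.stdout

    def visit_class(self, name: str) -> None:
        # Explicit target: no global redirection of sys.stdout is involved.
        print(f"type {name}", file=self.out)

    def serialize(self) -> str:
        buffer = StringIO()
        previous = self.out
        self.out = buffer
        try:
            for name in self.class_names:
                self.visit_class(name)
        finally:
            # Restore whatever stream was in place before serializing.
            self.out = previous
        return buffer.getvalue()
```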

All generator CLIs should accept an --output argument and handle output args in the same way #455