All generator CLIs should accept an --output argument and handle output args in the same way #455
Description
Looking at this together with @sujaypatil96
Currently each gen-X method will write to stdout, so the idiomatic usage is:
    gen-X my.yaml > my.X_format
This is fine, but we should strive to follow https://clig.dev/ and have an explicit -o/--output option.
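A minimal sketch of what an explicit -o/--output could look like. It uses the standard library's argparse purely as a stand-in for whatever CLI framework the generators use, and `generate`, `build_parser` and `main` are hypothetical names for illustration, not existing LinkML functions:

```python
import argparse
import sys
from typing import List, Optional


def generate(yamlfile: str) -> str:
    """Hypothetical placeholder for a generator's serialize() call."""
    return f"# generated from {yamlfile}\n"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gen-X")
    parser.add_argument("yamlfile")
    # Omitting -o means "write to stdout", so `gen-X my.yaml > out` keeps working.
    parser.add_argument("-o", "--output", default=None,
                        help="write to this file instead of stdout")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    payload = generate(args.yamlfile)
    if args.output:
        with open(args.output, "w", encoding="utf-8") as outf:
            outf.write(payload)
    else:
        sys.stdout.write(payload)
```

With this shape, `main(["my.yaml", "-o", "my.X_format"])` writes a file, while `main(["my.yaml"])` behaves like today's stdout-only usage.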
Some wrinkles to be aware of:
The way the generator framework currently works is that each generator print()s its output. This is sometimes done in a streaming fashion; sometimes all at once, at end_schema time.
E.g. in owlgen we have:
    def end_schema(self, output: Optional[str] = None, **_) -> None:
        data = self.graph.serialize(format='turtle' if self.format in ['owl', 'ttl'] else self.format).decode()
        if output:
            with open(output, 'w') as outf:
                outf.write(data)
        else:
            print(data)
Note that the end_schema signature is overridden to take an output arg (this isn't accessible via the CLI though!)
jsonschemagen also prints its full payload in end_schema -- however, it doesn't override the signature of end_schema:
    def end_schema(self, **_) -> None:
        # create more lax version of every class that is used as an inlined dict reference;
        # in this version, the primary key/identifier is optional, since it is used as the key of the dict
        for cls_name, (id_slot, cls_name_lax) in self.optional_identifier_class_map.items():
            lax_cls = deepcopy(self.schemaobj['$defs'][cls_name])
            lax_cls.required.remove(id_slot)
            self.schemaobj['$defs'][cls_name_lax] = lax_cls
        print(as_json(self.schemaobj, sort_keys=True))
In contrast, graphqlgen will stream the output:
    def visit_class(self, cls: ClassDefinition) -> bool:
        etype = 'interface' if (cls.abstract or cls.mixin) and not cls.mixins else 'type'
        mixins = ', '.join([camelcase(mixin) for mixin in cls.mixins])
        print(f"{etype} {camelcase(cls.name)}" + (f" implements {mixins}" if mixins else ""))
        print("  {")
        return True
In all cases, each generator prints its output.
The parent generator class takes care of routing this to a buffer, which is later exported:
    def serialize(self, **kwargs) -> str:
        """
        Generate output in the required format
        :param kwargs: Generater specific parameters
        :return: Generated output
        """
        output = StringIO()
        with redirect_stdout(output):
            self.visit_schema(**kwargs)
            ....
            self.end_schema(**kwargs)
        return output.getvalue()
And a typical CLI will look like this:
    def cli(yamlfile, **kwargs):
        """ Generate JSON Schema representation of a LinkML model """
        print(JsonSchemaGenerator(yamlfile, **kwargs).serialize(**kwargs))
Some generators such as javagen and markdowngen are deliberate exceptions, as they output multiple files given a directory arg; e.g. in javagen:
    def serialize(self, directory: str) -> None:
        sv = self.schemaview
        if self.template_file is not None:
            with open(self.template_file) as template_file:
                template_obj = Template(template_file.read())
        else:
            template_obj = Template(default_template)
        oodocs = self.create_documents()
        self.directory = directory
        for oodoc in oodocs:
            cls = oodoc.classes[0]
            code = template_obj.render(doc=oodoc, cls=cls)
            os.makedirs(directory, exist_ok=True)
            filename = f'{oodoc.name}.java'
            path = os.path.join(directory, filename)
            with open(path, 'w') as stream:
                stream.write(code)
This heterogeneity is causing confusion; see also #451.
I propose that we think in terms of two kinds of generators:
- single-payload generators (most)
- multi-file generators (javagen, markdowngen, projectgen)
For single-payload generators, each individual generator should not have to worry about output file paths. Its responsibility should just be to write to a stream. The generator superclass should take care of the details of whether that stream goes to a file or stdout. (or this can be deferred to the CLI part of each generator)
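One way this split of responsibilities could look, as a rough sketch only. The `Generator` class here is a simplified hypothetical stand-in for the real superclass, and the `write` method and `ToyGenerator` are invented for illustration:

```python
import sys
from contextlib import redirect_stdout
from io import StringIO
from typing import Optional


class Generator:
    """Hypothetical, simplified stand-in for the generator superclass."""

    def visit_schema(self) -> None:
        """Subclasses print() their output here; no file paths involved."""
        raise NotImplementedError

    def serialize(self) -> str:
        # Same buffering idiom as the existing serialize() quoted earlier.
        buffer = StringIO()
        with redirect_stdout(buffer):
            self.visit_schema()
        return buffer.getvalue()

    def write(self, output: Optional[str] = None) -> None:
        """Send the payload to `output` if given, otherwise to stdout."""
        payload = self.serialize()
        if output:
            with open(output, "w", encoding="utf-8") as outf:
                outf.write(payload)
        else:
            sys.stdout.write(payload)


class ToyGenerator(Generator):
    """Hypothetical single-payload generator that only prints."""

    def visit_schema(self) -> None:
        print("type Person")
```

Under this assumption, `ToyGenerator().write("out.graphql")` and `ToyGenerator().write()` share one code path, and the subclass never sees a file name.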
I think all of the output arguments currently in end_schema can probably be safely removed? This must be some legacy?
For streaming, I don't love the output = StringIO(); with redirect_stdout(output): idiom, but I suggest keeping it in place at first and making everything consistent.
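To illustrate why the idiom can stay for now: it captures output the same way whether a generator streams or prints once at the end. In this toy sketch, `capture`, `streaming_emit` and `all_at_once_emit` are hypothetical names that only mimic the two styles described above:

```python
from contextlib import redirect_stdout
from io import StringIO
from typing import Callable


def capture(emit: Callable[[], None]) -> str:
    """Run `emit` and return everything it printed."""
    buffer = StringIO()
    with redirect_stdout(buffer):
        emit()
    return buffer.getvalue()


def streaming_emit() -> None:
    # Prints piece by piece, in the style of graphqlgen's visit_class.
    for name in ("Person", "Address"):
        print(f"type {name}")
        print("  {")


def all_at_once_emit() -> None:
    # Builds the full payload first and prints once, in the style of end_schema.
    lines = [f"type {name}\n  {{" for name in ("Person", "Address")]
    print("\n".join(lines))


# Both styles yield the same captured payload.
assert capture(streaming_emit) == capture(all_at_once_emit)
```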