feat: pull Python types into TypeScript#126
Conversation
There was a problem hiding this comment.
We can use this to validate responses or generate OpenAPI / Swagger docs if we like.
| commands = [*parser._get_positional_actions()[0].choices.keys()] | ||
| cli_schema = CliCommands.json_schema() | ||
|
|
||
| # There needs to be a 1-1 match between subcommands and CliCommands properties. |
There was a problem hiding this comment.
This is the check that the subcommands and the CliCommands dataclass stay in sync.
| """ | ||
| prepared_references = [] | ||
| for ref in references: | ||
| # no need to include reference abstract, contents, or chunks in response |
There was a problem hiding this comment.
JSON serialization can be done using the dataclass's .to_json() or .to_dict() method, no need to roll it by hand. If we want to drop these fields, we can define a new dataclass that doesn't have them and put it in the API schema.
There was a problem hiding this comment.
@danvk it seems this doesn't work for nested dataclasses. for example:
references = [
Reference(
source_filename="some_file.pdf",
filename_md5="abc123",
title="Some title",
authors=[
Author(full_name="John Doe"),
Author(full_name="Jane Doe"),
],
chunks=[
Chunk(text="This is some text."),
Chunk(text="This is some more text."),
]
),
Reference(
source_filename="another_file.pdf",
filename_md5="abcdefg123",
title="Another title",
authors=[
Author(full_name="John Doe"),
Author(full_name="Jane Doe"),
],
chunks=[
Chunk(text="Some text about something interesting."),
Chunk(text="More text about something."),
]
)
]In [26]: ref = references[0]
In [27]: ref.to_json()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 ref.to_json()
File ~/Library/Caches/pypoetry/virtualenvs/refstudio-iTeERiwC-py3.10/lib/python3.10/site-packages/dataclasses_jsonschema/__init__.py:1017, in JsonSchemaMixin.to_json(self, omit_none, validate, **json_kwargs)
1016 def to_json(self, omit_none: bool = True, validate: bool = False, **json_kwargs) -> str:
-> 1017 return json.dumps(self.to_dict(omit_none, validate), **json_kwargs)
File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
226 # cached encoder
227 if (not skipkeys and ensure_ascii and
228 check_circular and allow_nan and
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
195 return encode_basestring(o)
196 # This doesn't pass the iterator directly to ''.join() because the
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
252 else:
253 _iterencode = _make_iterencode(
254 markers, self.default, _encoder, self.indent, floatstr,
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
160 def default(self, o):
161 """Implement this method in a subclass such that it returns
162 a serializable object for ``o``, or calls the base implementation
163 (to raise a ``TypeError``).
(...)
177
178 """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
180 f'is not JSON serializable')
TypeError: Object of type Author is not JSON serializableCalling to_dict() will run, but Author will not be serialized and will remain as a dataclass. maybe i'm doing something wrong?
i know that asdict will serialize everything.
There was a problem hiding this comment.
i think this was ultimately related to using str | None vs Optional[str]
| chat = Chat(prompt, n_options) | ||
| response = chat.get_response() | ||
| sys.stdout.write(json.dumps(response)) | ||
| sys.stdout.write(json.dumps([r.to_dict() for r in response])) |
There was a problem hiding this comment.
I can't call response.to_dict() or response.to_json() here since it's a list. (Generally it's better to have the top-level response type of APIs be a dict so that you can more easily add new fields.)
There was a problem hiding this comment.
Agree that we should try to return dict at top-level to be able to evolve the response payload. We should keep it for how, but create a follow-up ticket.
There was a problem hiding this comment.
That's nice to have these automatically generated types! 🎉
I don't believe the optional fields are that much of an issue, because we are converting the received type to our internal type anyway (in parsePdfIngestionResponse).
Maybe this function could throw an error (or ignore the entry and log the problem somewhere) if authors is empty. This way we don't have to specify the default value both in python and in typescript
This goes a long way towards #19
The Python API is defined using dataclasses in
typing.py. The top-level entry point isCliCommands, which defines the return type for each subcommand. These get serialized to JSON Schema using dataclasses-jsonschema. This JSON Schema file gets converted to TypeScript using json-schema-to-typescript.So Python is the source of truth. To update the TypeScript, run
yarn codegen. There's a check for diffs on CI so this should be hard to get wrong (here's what it looks like when this fails). There's also a check that properties inCliCommandsand the subcommands stay in sync.On the TypeScript side, we have a new
callSidecarhelper that gets us types based on the subcommand:This should work well for now. A few warts:
authorsis optional when you create aReferencebut because it has a default value, that property will always be present when you receive it as a response. But it still comes out of codegen as optional.