feat: pull Python types into TypeScript by danvk · Pull Request #126 · refstudio/refstudio

danvk · 2023-06-08T21:09:41Z

This goes a long way towards #19

The Python API is defined using dataclasses in typing.py. The top-level entry point is CliCommands, which defines the return type for each subcommand. These get serialized to JSON Schema using dataclasses-jsonschema. This JSON Schema file gets converted to TypeScript using json-schema-to-typescript.

So Python is the source of truth. To update the TypeScript, run yarn codegen. There's a check for diffs on CI so this should be hard to get wrong (here's what it looks like when this fails). There's also a check that properties in CliCommands and the subcommands stay in sync.

On the TypeScript side, we have a new callSidecar helper that gets us types based on the subcommand:

This should work well for now. A few warts:

I made no attempt to encode the arguments to each subcommand. We'll probably to tackle this as part of a move towards a long-running Python service, whatever form that takes (JSON on stdin, websockets, etc.)
Default properties on a dataclass are a bit awkward to model. For example authors is optional when you create a Reference but because it has a default value, that property will always be present when you receive it as a response. But it still comes out of codegen as optional.
dataclasses-jsonschema is marked as being "in maintenance mode" as of last fall.

danvk · 2023-06-08T21:50:55Z

We can use this to validate responses or generate OpenAPI / Swagger docs if we like.

danvk · 2023-06-08T21:51:26Z

+    commands = [*parser._get_positional_actions()[0].choices.keys()]
+    cli_schema = CliCommands.json_schema()
+
+    # There needs to be a 1-1 match between subcommands and CliCommands properties.


This is the check that the subcommands and the CliCommands dataclass stay in sync.

danvk · 2023-06-08T21:52:53Z

        """
-        prepared_references = []
-        for ref in references:
-            # no need to include reference abstract, contents, or chunks in response


JSON serialization can be done using the dataclass's .to_json() or .to_dict() method, no need to roll it by hand. If we want to drop these fields, we can define a new dataclass that doesn't have them and put it in the API schema.

@danvk it seems this doesn't work for nested dataclasses. for example:

references = [ Reference( source_filename="some_file.pdf", filename_md5="abc123", title="Some title", authors=[ Author(full_name="John Doe"), Author(full_name="Jane Doe"), ], chunks=[ Chunk(text="This is some text."), Chunk(text="This is some more text."), ] ), Reference( source_filename="another_file.pdf", filename_md5="abcdefg123", title="Another title", authors=[ Author(full_name="John Doe"), Author(full_name="Jane Doe"), ], chunks=[ Chunk(text="Some text about something interesting."), Chunk(text="More text about something."), ] ) ]

In [26]: ref = references[0] In [27]: ref.to_json() --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[27], line 1 ----> 1 ref.to_json() File ~/Library/Caches/pypoetry/virtualenvs/refstudio-iTeERiwC-py3.10/lib/python3.10/site-packages/dataclasses_jsonschema/__init__.py:1017, in JsonSchemaMixin.to_json(self, omit_none, validate, **json_kwargs) 1016 def to_json(self, omit_none: bool = True, validate: bool = False, **json_kwargs) -> str: -> 1017 return json.dumps(self.to_dict(omit_none, validate), **json_kwargs) File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw) 226 # cached encoder 227 if (not skipkeys and ensure_ascii and 228 check_circular and allow_nan and 229 cls is None and indent is None and separators is None and 230 default is None and not sort_keys and not kw): --> 231 return _default_encoder.encode(obj) 232 if cls is None: 233 cls = JSONEncoder File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o) 195 return encode_basestring(o) 196 # This doesn't pass the iterator directly to ''.join() because the 197 # exceptions aren't as detailed. The list call should be roughly 198 # equivalent to the PySequence_Fast that ''.join() would do. --> 199 chunks = self.iterencode(o, _one_shot=True) 200 if not isinstance(chunks, (list, tuple)): 201 chunks = list(chunks) File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot) 252 else: 253 _iterencode = _make_iterencode( 254 markers, self.default, _encoder, self.indent, floatstr, 255 self.key_separator, self.item_separator, self.sort_keys, 256 self.skipkeys, _one_shot) --> 257 return _iterencode(o, 0) File ~/.pyenv/versions/3.10.11/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o) 160 def default(self, o): 161 """Implement this method in a subclass such that it returns 162 a serializable object for ``o``, or calls the base implementation 163 (to raise a ``TypeError``). (...) 177 178 """ --> 179 raise TypeError(f'Object of type {o.__class__.__name__} ' 180 f'is not JSON serializable') TypeError: Object of type Author is not JSON serializable

Calling to_dict() will run, but Author will not be serialized and will remain as a dataclass. maybe i'm doing something wrong?

i know that asdict will serialize everything.

i think this was ultimately related to using str | None vs Optional[str]

danvk · 2023-06-08T21:55:11Z

    chat = Chat(prompt, n_options)
    response = chat.get_response()
-    sys.stdout.write(json.dumps(response))
+    sys.stdout.write(json.dumps([r.to_dict() for r in response]))


I can't call response.to_dict() or response.to_json() here since it's a list. (Generally it's better to have the top-level response type of APIs be a dict so that you can more easily add new fields.)

Agree that we should try to return dict at top-level to be able to evolve the response payload. We should keep it for how, but create a follow-up ticket.

sehyod

That's nice to have these automatically generated types! 🎉
I don't believe the optional fields are that much of an issue, because we are converting the received type to our internal type anyway (in parsePdfIngestionResponse).
Maybe this function could throw an error (or ignore the entry and log the problem somewhere) if authors is empty. This way we don't have to specify the default value both in python and in typescript

gjreda

LGTM - thanks for this one @danvk!

This reverts commit 6c02266.

danvk added 15 commits June 8, 2023 16:13

some codegen!

c926c76

types from Python schema; some optional property issues

13c700a

Force subcommands to match

0391a93

More verification, use dataclasses on Python side

5a6e6d3

Try to run on CI

879d7f5

rm unused import

129683e

Rework Python modules

251ec10

make a change w/o corresponding codegen

8b1bcd7

fix python import path

cf9875b

fix python test

42fbea6

set -x in codegen.sh

d9b0421

rv codegen change

bdd47d7

trying to make this fail

d41ca6c

pre-push hook, rv doc change

839cfb1

Use types.ts

c0c7d51

danvk commented Jun 8, 2023

View reviewed changes

danvk marked this pull request as ready for review June 8, 2023 22:01

danvk assigned gjreda and unassigned gjreda Jun 8, 2023

danvk requested review from gjreda and sehyod June 8, 2023 22:01

cguedes self-requested a review June 9, 2023 09:42

sehyod reviewed Jun 9, 2023

View reviewed changes

cguedes added 2 commits June 9, 2023 12:42

Merge branch 'main' into api-types

4e97271

Fix main

0d1c28e

cguedes approved these changes Jun 9, 2023

View reviewed changes

gjreda approved these changes Jun 9, 2023

View reviewed changes

sergioramos changed the title ~~Pull Python types into TypeScript~~ feat: pull Python types into TypeScript Jun 9, 2023

sergioramos merged commit 6c02266 into main Jun 9, 2023

sergioramos deleted the api-types branch June 9, 2023 16:47

gjreda added a commit that referenced this pull request Jun 9, 2023

Revert "feat: pull Python types into TypeScript (#126)"

a783aef

This reverts commit 6c02266.

hammer mentioned this pull request Jun 10, 2023

Find replacement for dataclasses-jsonschema #136

Closed

gjreda mentioned this pull request Jun 15, 2023

Defining an API schema #19

Closed

danvk mentioned this pull request Jul 6, 2023

TypeScript types for sidecar CLI args #260

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: pull Python types into TypeScript#126

feat: pull Python types into TypeScript#126
sergioramos merged 17 commits into
mainfrom
api-types

danvk commented Jun 8, 2023 •

edited

Loading

Uh oh!

danvk Jun 8, 2023

Uh oh!

danvk Jun 8, 2023

Uh oh!

Uh oh!

danvk Jun 8, 2023

Uh oh!

gjreda Jun 9, 2023

Uh oh!

gjreda Jun 10, 2023

Uh oh!

danvk Jun 8, 2023

Uh oh!

cguedes Jun 9, 2023

Uh oh!

sehyod left a comment •

edited

Loading

Uh oh!

gjreda left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

danvk commented Jun 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danvk Jun 8, 2023

Choose a reason for hiding this comment

Uh oh!

danvk Jun 8, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

danvk Jun 8, 2023

Choose a reason for hiding this comment

Uh oh!

gjreda Jun 9, 2023

Choose a reason for hiding this comment

Uh oh!

gjreda Jun 10, 2023

Choose a reason for hiding this comment

Uh oh!

danvk Jun 8, 2023

Choose a reason for hiding this comment

Uh oh!

cguedes Jun 9, 2023

Choose a reason for hiding this comment

Uh oh!

sehyod left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gjreda left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

danvk commented Jun 8, 2023 •

edited

Loading

sehyod left a comment •

edited

Loading