A quick prototype of client side inferencing in the browser. See https://sushanthr.github.io/RapidChat/
Original model - https://github.com/jzhang38/TinyLlama
This page however uses the quantized model from https://huggingface.co/shoibl/TinyLlama-Chat-v1.1-onnx_quantized.