Stream parsing is limited to one item only

Open seprich opened this issue 10 years ago • 6 comments

I ran into a problem with stream parsing: messages mysteriously disappear when using parse-stream on a stream of the form (java.io.BufferedReader. (java.io.InputStreamReader. (java.io.PipedInputStream. op))), where op is a java.io.PipedOutputStream that ultimately gets its content from a TCP socket.

After debugging it turns out that the problem boils down to this:

user=> (require '[cheshire.core :as json])
nil
user=> (def reader (java.io.BufferedReader. (java.io.StringReader. "{\"a\": \"test\"}{\"b\": \"second\"}")))
#'user/reader
user=> (def a (json/parse-stream reader true))
#'user/a
user=> (def b (json/parse-stream reader true))
#'user/b
user=> a
{:a "test"}
user=> b
nil

parse-stream returns the first item and throws away the rest. If the stream content comes from the network, consecutive parse-stream calls may even appear to work, as long as only one JSON message at a time becomes available to the Reader's .read method. With streams, however, the number of bytes returned is arbitrary: a single .read(..) may return multiple JSON objects, and the parser should yield them all.

ADDED: parsed-seq seems to resolve the String-backed BufferedReader example above, but it doesn't seem to work with a PipedInputStream to which data is still being added. Its intended use case also doesn't seem to match this problem (lazy access to static content vs. streams?).

seprich • Dec 08 '15 00:12

Here is an even better example:

user=> (require '[cheshire.core :as json])
nil
user=> (def op (java.io.PipedOutputStream.))
#'user/op
user=> (def ip (java.io.PipedInputStream. op))
#'user/ip
user=> (def reader (->> (java.nio.charset.Charset/forName "UTF8")
  #_=>                  (java.io.InputStreamReader. ip)
  #_=>                  (java.io.BufferedReader.)))
#'user/reader
user=> (def blob (byte-array (concat (.getBytes (json/generate-string {:a "test"})) (.getBytes (json/generate-string {:b "sec"})))))
#'user/blob
user=> (.write op blob 0 (count blob))
nil
user=> (json/parsed-seq reader true)

... at which point it deadlocks, probably waiting for EOF.

  • Using parse-stream instead drops the second message, as shown in the StringReader example above.
  • Replacing cheshire with (clojure.data.json/read reader :key-fn keyword) makes everything work as expected.
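
For comparison, here is a minimal sketch of the clojure.data.json route mentioned in the last bullet, reading several concatenated values from one reader. It assumes clojure.data.json is on the classpath; read-all-values is a hypothetical helper name, and a StringReader stands in for the piped stream. The same PushbackReader is reused for every call so that no input is lost between reads.

```clojure
(require '[clojure.data.json :as json])
(import '[java.io PushbackReader StringReader])

(defn read-all-values
  "Hypothetical helper: reads JSON values from rdr until EOF
  and returns them as a vector."
  [^PushbackReader rdr]
  (loop [acc []]
    ;; :eof-error? false makes read return :eof-value instead of throwing
    (let [v (json/read rdr :key-fn keyword :eof-error? false :eof-value ::eof)]
      (if (= ::eof v)
        acc
        (recur (conj acc v))))))

(def values
  (read-all-values
   (PushbackReader. (StringReader. "{\"a\":\"test\"}{\"b\":\"sec\"}"))))
```

On a live pipe there is no EOF until the writer closes, so a loop like this would block after the last available value; in that setting one would call json/read once per expected message instead.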

seprich • Dec 08 '15 01:12

@dakrone I am also running into this issue. It would be great if we could call parse-stream multiple times on the same reader. I have this use case:

Content-Type: foo

{"a": 1 ....
some json
}

Content-Type: foo

{"b": 1 ....
some other json
}

So I need to alternate read-line with json/parse-stream. I can do this using clojure.data.json, but I would prefer to have this in cheshire, since cheshire is bundled with babashka and I would like to support this in babashka scripts.

If you are open to this, I can look into a PR for this.

borkdude • Feb 02 '21 19:02

@dakrone In jet I managed to do it this way:

  
(ns jet.formats
  {:no-doc true}
  (:require
   [cheshire.core :as cheshire]
   [cheshire.factory :as factory]
   [cheshire.parse :as cheshire-parse]
   ...)
  (:import [com.fasterxml.jackson.core JsonFactory]
           [java.io Reader]
        ...))

(set! *warn-on-reflection* true)

(defn json-parser []
  (.createParser ^JsonFactory (or factory/*json-factory*
                                  factory/json-factory)
                 ^Reader *in*))

(defn parse-json [json-reader keywordize]
  (cheshire-parse/parse-strict json-reader keywordize ::EOF nil))

So maybe just exposing a function like json-parser (maybe in the parse namespace?) and adding some docs of how to use these together with parse/parse-strict would be sufficient as an alternative solution.

However, even with this approach I still can't read anything other than JSON from the same stream.
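
A rough sketch of how the two functions above might be used together, assuming Cheshire is on the classpath. Here json-parser takes the reader as an argument instead of using *in*, read-values is a hypothetical helper, and a StringReader stands in for real input. The point is that one parser is created per reader and reused, so whatever Jackson has already buffered is still there for the next value.

```clojure
(require '[cheshire.factory :as factory]
         '[cheshire.parse :as cheshire-parse])
(import '[com.fasterxml.jackson.core JsonFactory JsonParser]
        '[java.io Reader StringReader])

(defn json-parser
  "Creates one Jackson parser for rdr; reuse it for every value on that reader."
  [^Reader rdr]
  (.createParser ^JsonFactory (or factory/*json-factory*
                                  factory/json-factory)
                 rdr))

(defn read-values
  "Hypothetical helper: reads values from a single parser until it
  reports end of input, signalled here by the ::EOF sentinel."
  [^JsonParser jp keywordize]
  (loop [acc []]
    (let [v (cheshire-parse/parse-strict jp keywordize ::EOF nil)]
      (if (= ::EOF v)
        acc
        (recur (conj acc v))))))

(def values
  (read-values
   (json-parser (StringReader. "{\"a\": \"test\"}{\"b\": \"second\"}"))
   true))
```

With the input from the original report, this should yield both maps rather than only the first. On a stream that stays open, the loop would block waiting for more input instead of returning.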

borkdude • Feb 02 '21 19:02

@borkdude I'd definitely be interested if you would like to look into a PR for this!

dakrone • Feb 05 '21 00:02

@borkdude I found that JsonParser has a .releaseBuffered method that you could use, but I imagine it requires some annoying buffer management.
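
To illustrate the idea, here is a small sketch using Jackson directly (the library Cheshire wraps). It is not an existing Cheshire API: parse-one-and-release is a hypothetical helper, and the input string is made up to mimic the JSON-then-header layout described earlier. After one JSON value has been consumed, .releaseBuffered writes the characters the parser had already read ahead into a Writer.

```clojure
(import '[com.fasterxml.jackson.core JsonFactory JsonParser]
        '[java.io Reader StringReader StringWriter])

(defn parse-one-and-release
  "Hypothetical helper: skips over one JSON value on rdr, then returns
  the text the parser had already buffered beyond that value."
  [^Reader rdr]
  (let [^JsonParser jp (.createParser (JsonFactory.) rdr)
        out (StringWriter.)]
    (.nextToken jp)           ; first token of the value, e.g. START_OBJECT
    (.skipChildren jp)        ; advance to the matching END_OBJECT
    (.releaseBuffered jp out) ; hand back the unread, already-buffered chars
    (str out)))

(def leftover
  (parse-one-and-release (StringReader. "{\"a\": 1}\nContent-Type: foo\n")))
```

The released text is no longer available from the underlying reader, so a caller would have to stitch it back in front of that reader (for example with a PushbackReader sized for the leftover) before calling read-line. That is presumably the buffer management referred to above.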

nilern • Mar 18 '21 17:03

@nilern Luckily, when reading from stdin it works as intended. You can feed one JSON object to stdin and read it, wait, then feed the next object and read that one.

borkdude • Mar 18 '21 18:03