Generalize StringLike to StreamLike fix #58 by safareli · Pull Request #62 · purescript-contrib/purescript-parsing

safareli · 2017-05-26T02:32:18Z

safareli · 2017-05-26T02:35:10Z

I was thinking about how to solve the issue, but decided to open a PR instead of writing this up as a comment.

The API of whiteSpace has changed. We can rename string to stream and char to tok, and remove the alternatives from the Token module.

Let me know what you think @paf31

safareli · 2017-05-26T19:22:37Z

src/Text/Parsing/Parser/String.purs

+-- |
+class StreamLike f c | f -> c where
+  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: (Position -> Position) }
+  drop :: Prefix f -> f -> Maybe {  rest :: f, updatePos :: (Position -> Position) }


We can name it stripPrefix

paf31 · 2017-06-03T17:54:18Z

src/Text/Parsing/Parser/String.purs

+  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: (Position -> Position) }
+  drop :: Prefix f -> f -> Maybe {  rest :: f, updatePos :: (Position -> Position) }
+
+instance stringLikeString :: StreamLike String Char where


stringLikeString here should be streamLikeString? Same below.

paf31 · 2017-06-03T17:56:48Z

src/Text/Parsing/Parser/String.purs

+instance listcharLikeString :: (Eq a, HasUpdatePosition a) => StreamLike (L.List a) a where
+  uncons f = L.uncons f <#> \({ head, tail}) ->
+    { head: head, updatePos: (_ `updatePos` head), tail}
+  drop (Prefix p') s' = case (tailRecM3 go p' s' id) of -- no MonadRec for Maybe


Maybe it's worth adding stripPrefix to Data.List?

I think yes.

Should we define it as in String (i.e. add a Pattern type)?

I'm not sure. I'm tempted to say no, but if you'd like to open a PR, we can discuss it.

purescript/purescript-lists#120
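For readers following the stripPrefix discussion, here is a minimal sketch of what prefix stripping on a list could look like. `stripPrefixList` is a hypothetical name used only for illustration; it takes a plain list rather than a `Pattern` or `Prefix` wrapper, and uses direct tail recursion where the PR used `tailRecM3`. It is not the function that was proposed in purescript-lists#120.

```purescript
module StripPrefixSketch where

import Prelude

import Data.List (List(..), (:))
import Data.Maybe (Maybe(..))

-- | Hypothetical helper: remove `prefix` from the front of `input`.
-- | Returns the remaining input, or Nothing if `input` does not start
-- | with `prefix`.
stripPrefixList :: forall a. Eq a => List a -> List a -> Maybe (List a)
-- An empty prefix always matches and leaves the input untouched.
stripPrefixList Nil input = Just input
-- Compare element by element; the recursive call is in tail position.
stripPrefixList (p : ps) (x : xs) =
  if p == x then stripPrefixList ps xs else Nothing
-- The prefix is longer than the input.
stripPrefixList _ Nil = Nothing
```

A `drop`/`stripPrefix` member of the stream class would additionally have to report how the position changes, which this sketch leaves out.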

paf31 · 2017-06-03T18:01:46Z

src/Text/Parsing/Parser/String.purs

+-- | Instances must satisfy the following laws:
+-- |
+class StreamLike f c | f -> c where
+  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: (Position -> Position) }


Parens are redundant here around the type of updatePos.

paf31 · 2017-06-03T18:02:16Z

Sorry for the delay. I really like this. Once the tailrec and list dependencies are updated, I'll merge this and make a major release. Thanks!

safareli · 2017-06-04T16:07:19Z

src/Text/Parsing/Parser/String.purs

-  cs <- many $ satisfy \c -> c == '\n' || c == '\r' || c == ' ' || c == '\t'
-  pure $ fromCharArray cs
+-- | Match many whitespace characters.
+whiteSpace :: forall f m g. StreamLike f Char => Unfoldable g => Monoid f => Monad m => ParserT f m (g Char)


Are you fine with the signature?

safareli · 2017-06-04T16:08:35Z

src/Text/Parsing/Parser/String.purs

+
+-- | Match a whitespace characters but returns them as Array.
+whiteSpace' :: forall f m. StreamLike f Char => Monad m => ParserT f m (Array Char)
+whiteSpace' = many $ satisfy \c -> c == '\n' || c == '\r' || c == ' ' || c == '\t'


Are we sure Data.Array.many is fine here? Maybe we should use a catenable list or something similar.

safareli · 2017-06-04T16:09:23Z

src/Text/Parsing/Parser/String.purs

-
-- | Match end-of-file.
-eof :: forall s m. StringLike s => Monad m => ParserT s m Unit
+-- |


This description is outdated.

Are we fine with the description and the law?

I think it's fine, yeah.

safareli · 2017-06-04T16:10:58Z

We should update the descriptions here and possibly remove some of the combinators from Token.

paf31 · 2017-06-05T17:00:06Z

👍 Looks good to me, thanks!

I'd be fine with removing overlapping functionality from Token.

@garyb Any comments before I merge this? Obviously it'll be a breaking change.

garyb · 2017-06-05T17:01:42Z

Looks like it's conflicting currently?

garyb · 2017-06-05T17:03:06Z

LGTM if it LGTY though!

paf31 · 2017-06-06T00:05:03Z

@safareli Can you please merge with what's on master?

safareli · 2017-06-06T07:14:27Z

@paf31 I will tackle this on the weekend.


safareli · 2017-06-10T12:13:36Z

src/Text/Parsing/Parser/String.purs

+-- |
+class StreamLike f c | f -> c where
+  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: Position -> Position }
+  drop :: Prefix f -> f -> Maybe {  rest :: f, updatePos :: Position -> Position }


Are we fine with this name? stripPrefix could be used instead.

safareli · 2017-06-10T12:44:37Z

src/Text/Parsing/Parser/String.purs


-- | Match the specified string.
-string :: forall s m. StringLike s => Monad m => String -> ParserT s m String
+-- | Match the specified stream.


Are we fine with the descriptions?
We can use function names from Token instead of char and anyChar.

Yes, I think that's better. So maybe prefix or something?

OK, I will use:

match for char

token for anyChar

prefix for string

I will also rename drop in StreamLike to stripPrefix so that it lines up with prefix.

safareli · 2017-06-10T12:45:02Z

src/Text/Parsing/Parser/String.purs

-  cs <- many $ satisfy \c -> c == '\n' || c == '\r' || c == ' ' || c == '\t'
-  pure $ fromCharArray cs
+-- | Match many whitespace character in some Unfoldable.
+whiteSpace :: forall f m g. StreamLike f Char => Unfoldable g => Monoid f => Monad m => ParserT f m (g Char)


We can remove this function, as this is a breaking change anyway.

Remind me again why we'd need to remove it?

If you are operating on a list of tokens, you are most likely not going to use it.

The major use case would be to get a String as the result, but String is not Unfoldable, so you would still need to map over the result with stringFromChars.

I think just returning Array Char is fine, and if the client wants a String they can map over it (as they would need to do anyway).

If you agree, I will remove this function and rename whitespace' to whitespace (this way we won't have two whitespace functions).

Sounds good, thanks!
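The "map over it" step agreed on above can be sketched as follows. `asString` is a hypothetical helper name, and `fromCharArray` is taken from `Data.String.CodeUnits` as it exists in current purescript-strings (the module layout at the time of this PR was different). The helper works for any `Functor`, so it would apply to a parser returning `Array Char` as well.

```purescript
module WhiteSpaceResultSketch where

import Prelude

import Data.String.CodeUnits (fromCharArray)

-- | Hypothetical helper: turn an `Array Char` produced inside any functor
-- | (for example, a parser that matches whitespace) into a `String`.
asString :: forall f. Functor f => f (Array Char) -> f String
asString = map fromCharArray
```

With this shape the library only needs one whitespace parser, and callers that want a String pay for the conversion explicitly.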

safareli · 2017-07-30T18:25:09Z

@paf31 Can you take a look? I have renamed StreamLike to Stream and added m to it.

paf31 · 2017-07-30T18:53:26Z

src/Text/Parsing/Parser/Stream.purs

-class StreamLike f c | f -> c where
-  uncons :: f -> Maybe { head :: c, tail :: f, updatePos :: Position -> Position }
-  stripPrefix :: Prefix f -> f -> Maybe {  rest :: f, updatePos :: Position -> Position }
+class StreamLike s m t | s -> t where


I'm wondering if we should add s -> m here too.

Not sure. In parsec it looks like this: class (Monad m) => Stream s m t | s -> t

paf31 · 2017-07-30T18:54:46Z

Looks good to me! Just one comment about that potential fundep. What do you think?

Also, could you please add a small test for a custom stream type which involves a non-trivial monad - Eff perhaps?

safareli · 2017-07-30T18:58:05Z

Will try to add tests using Eff.

paf31 · 2017-11-19T20:07:55Z

@safareli Any more thoughts on testing? If you don't have time, I'm happy to merge this and leave it until later.

safareli · 2017-11-20T18:32:28Z

This comment made me think that maybe pos -> pos is not a good thing, but I don't fully understand ekmett's comment, and I haven't found time to dig into it further. What do you think about that comment?

If you think the current signature is fine, we can merge it.

safareli · 2017-12-03T01:09:47Z

@paf31 I have addressed the last issue I had. Can you take a look?

s -> m (Maybe { head :: t, tail :: s, updatePos :: Position -> Position }) Instead of having updatePos as part of the result of uncons or stripPrefix, these operations now take the position together with the input, as part of the parser state. This way we should allocate fewer intermediate objects.
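To make the difference between the two shapes concrete, here is a rough sketch under several assumptions: `Position`, `advance`, `unconsOld` and `unconsNew` are illustrative names defined locally, the stream is fixed to `List Char`, and the monad `m` is omitted. The actual signatures in the PR branch may differ.

```purescript
module UnconsPositionSketch where

import Prelude

import Data.List (List(..), (:))
import Data.Maybe (Maybe(..))

-- Hypothetical stand-in for the library's Position type.
type Position = { line :: Int, column :: Int }

-- Advance a position past one character.
advance :: Position -> Char -> Position
advance pos '\n' = { line: pos.line + 1, column: 1 }
advance pos _ = pos { column = pos.column + 1 }

-- Earlier shape: every step returns a closure that the parser
-- must later apply to its current position.
unconsOld
  :: List Char
  -> Maybe { head :: Char, tail :: List Char, updatePos :: Position -> Position }
unconsOld Nil = Nothing
unconsOld (c : cs) = Just { head: c, tail: cs, updatePos: \p -> advance p c }

-- Later shape: the current position is passed in, and the new position
-- is returned directly, so no closure is allocated per step.
unconsNew
  :: Position
  -> List Char
  -> Maybe { head :: Char, tail :: List Char, pos :: Position }
unconsNew _ Nil = Nothing
unconsNew pos (c : cs) = Just { head: c, tail: cs, pos: advance pos c }
```

The second shape trades a little generality (the stream operation must know about positions up front) for one less allocation per consumed token.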

safareli · 2017-12-04T12:26:36Z

@ekmett has made some good points about this design here. If you think it's still fine for us, we can go ahead and merge it; otherwise I would need to think about it a bit more.

JordanMartinez · 2021-09-19T03:39:15Z

Closing due to lack of activity.

JordanMartinez · 2021-09-19T03:39:38Z

Also, it sounded like the design needed to be thought through more to account for kmett's feedback.

safareli · 2021-09-20T08:36:56Z

Also, it sounded like the design needed to be thought through more to account for kmett's feedback.

It's exactly what was stated in the last comment of this PR #62 (comment)

JordanMartinez · 2021-09-20T14:34:31Z

@ekmett has made some good points about this design here. if you think it's still fine for us we can go ahead and merge it, otherwise I would need to think on it for a bit more.

I skimmed through kmett's comment, but without studying this in greater detail, I wasn't sure of how to proceed. My interpretation of your last sentence above was that merging this as is could be done, but that we might want to reconsider some things after thinking about kmett's feedback more.

I'll keep this PR open.

safareli · 2021-09-20T15:17:09Z

I'm not actively doing PS any more (for now), so this could be closed. (If anyone is up for taking this on, they can just check out this branch or copy and paste the code.)

Correctly handle UTF-16 surrogate pairs in `String`s.

All prior tests pass with no modifications. Add a few new tests.

Non-breaking changes
====================

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s.

Add the `match` combinator.

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it.

Breaking changes
================

Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`.

Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module.

To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`.

https://github.com/search?l=&q=StringLike+language%3APureScript&type=code

The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62

Breaking changes which won’t be caught by the compiler
======================================================

Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`.

`anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed.

We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment). Rather we are keeping most of the current API and making it work properly with astral Unicode.

We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example, the parser `char :: forall s m. Monad m => Char -> ParserT s m Char` is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`.

Benchmarks
==========

For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to affect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t.

Before
------

```
runParser parse23
mean   = 43.36 ms
stddev = 6.75 ms
min    = 41.12 ms
max    = 124.65 ms
runParser parseSkidoo
mean   = 22.53 ms
stddev = 3.86 ms
min    = 21.40 ms
max    = 61.76 ms
```

After
-----

```
runParser parse23
mean   = 42.90 ms
stddev = 6.01 ms
min    = 40.97 ms
max    = 115.74 ms
runParser parseSkidoo
mean   = 22.03 ms
stddev = 2.79 ms
min    = 20.78 ms
max    = 53.34 ms
```

Correctly handle UTF-16 surrogate pairs in `String`s.

We keep all of the API, but we change the primitive parsers so that instead of succeeding and incorrectly returning half of a surrogate pair, they will fail.

All prior tests pass with no modifications. Add a few new tests.

Non-breaking changes
====================

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s.

Add the `match` combinator.

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it.

Split dev dependencies into spago-dev.dhall.

Add benchmark suite.

Add astral UTF-16 test.

Breaking changes
================

Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`.

To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`.

https://github.com/search?l=&q=StringLike+language%3APureScript&type=code

The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62

Breaking changes which won’t be caught by the compiler
======================================================

Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`.

`anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed.

We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment). Rather we are keeping most of the current API and making it work properly with astral Unicode.

We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example, the parser `char :: forall s m. Monad m => Char -> ParserT s m Char` is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`.

Benchmarks
==========

For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to affect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t.

Before
------

```
runParser parse23
mean   = 43.36 ms
stddev = 6.75 ms
min    = 41.12 ms
max    = 124.65 ms
runParser parseSkidoo
mean   = 22.53 ms
stddev = 3.86 ms
min    = 21.40 ms
max    = 61.76 ms
```

After
-----

```
runParser parse23
mean   = 42.90 ms
stddev = 6.01 ms
min    = 40.97 ms
max    = 115.74 ms
runParser parseSkidoo
mean   = 22.03 ms
stddev = 2.79 ms
min    = 20.78 ms
max    = 53.34 ms
```
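The code-unit versus code-point distinction that the commit message above relies on can be seen in a small sketch. The module and value names here are illustrative only; the functions used are `length` and `uncons` from `Data.String.CodeUnits` and `Data.String.CodePoints` in purescript-strings. The example string is a single astral character, U+1D11E (musical G clef), which UTF-16 encodes as a surrogate pair.

```purescript
module SurrogatePairSketch where

import Prelude

import Data.Maybe (Maybe)
import Data.String.CodePoints as CP
import Data.String.CodeUnits as CU

-- One astral character, stored as two UTF-16 code units.
clef :: String
clef = "𝄞"

-- Counting code units sees both halves of the surrogate pair: 2.
codeUnitCount :: Int
codeUnitCount = CU.length clef

-- Counting code points sees one character: 1.
codePointCount :: Int
codePointCount = CP.length clef

-- Consuming by code point takes the whole pair, leaving an empty tail.
-- Consuming with CU.uncons would instead leave a lone low surrogate behind.
restAfterCodePoint :: Maybe String
restAfterCodePoint = map _.tail (CP.uncons clef)
```

This is why a `Char`-based parser cannot return such a character as a single `Char`, while a `CodePoint`-based parser can consume it in one step.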

jamesdbrock · 2021-10-08T13:23:36Z

In v7.0.0 we eliminated the StringLike class entirely. The string parser now operates on type String.

Generalize StringLike to StreamLike

f0ba9e4

safareli force-pushed the string branch from d02e52a to f0ba9e4 Compare May 26, 2017 19:06


paf31 reviewed Jun 3, 2017


safareli mentioned this pull request Jun 4, 2017

add stripPrefix and Pattern for Data.List purescript/purescript-lists#120

Merged

update list instance

a991f94


safareli added 2 commits June 4, 2017 20:13

fix redundant parens and imports

2f59245

update lists

fdcb5ba

safareli added 4 commits June 10, 2017 15:43

Merge branch 'master' into string

4f74e34

update description

9ff887b

add script.test

2471c05

remove Token{token,when,match}

ad4a76c

instead String{anyChar,satisfy,char} chould be used


safareli force-pushed the string branch from 30ae3d9 to 3a7d083 Compare June 11, 2017 07:27

add 'drop (Prefix a) a >>= uncons = Nothing' law

b89442b

safareli force-pushed the string branch from 3a7d083 to b89442b Compare June 11, 2017 07:27

safareli added 3 commits June 18, 2017 15:45

remove String.whitespace

67926be

rename String.char to String.match

453d6a1

rename String.anyChar to String.token

96dc7da

use correct wording in setisfy

ea96e73

paf31 reviewed Jul 30, 2017


safareli force-pushed the string branch from 80785b0 to 61d6317 Compare December 3, 2017 01:18

Merge branch 'master' into string

13d4bf1

safareli mentioned this pull request Dec 3, 2017

combine purescript-string-parsers ans purescript-parsing #69

Closed

thomashoneyman changed the base branch from master to main October 6, 2020 02:58

JordanMartinez closed this Sep 19, 2021

safareli mentioned this pull request Sep 20, 2021

Generalize StringLike to StreamLike fix safareli/purescript-parsing#1

Open

JordanMartinez reopened this Sep 20, 2021

jamesdbrock mentioned this pull request Sep 24, 2021

Unicode correctness #119

Merged

jamesdbrock closed this Oct 8, 2021
