Skip to content

Conversation

@spamegg1
Copy link
Contributor

@LeeTibbert
Copy link
Contributor

LeeTibbert commented Jul 18, 2023

@spamegg1

Thank you for submitting this PR.

The "lint" failure can be corrected by running scalafmt at the command line. At the project root "scripts/scalafmt",
then git add, git commit, git push. Yell, if that does not work for you. I usually wait for the rest of CI to finish
to see if there are other errors which can be corrected.

Doc built sucessfully, so that is a good thing.

@@ -0,0 +1,52 @@
// Ported from Scala.js commit: 57d71da dated: 2023-05-31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this origin & commit info is quite useful to future developers trying to keep things up to date.

}

@Test def testNPE(): Unit = {
assumeTrue("requires compliant null pointers", Platform.hasCompliantNullPointers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different developers and/or reviewers may well give different suggestions. Wojciech is the lead developer and gatekeeper of the merge, so that is the most valid opinion.

By my understanding, aided by @armanbilge, both JVM and Scala Native have compliant null pointer checks.

My practical suggestion is to comment out line #1736 add a /* */ comment before it stating that both Scala JVM & Scala Native have compliant pointers (giving the url from Arman: https://discord.com/channels/632150470000902164/635668881951686686/1130847953062473738) and stating
that the commented Scala.js line is left as a reference point to the Scala.js original.

Copy link
Contributor

@LeeTibbert LeeTibbert Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A full fix would be to edit both

./unit-tests/jvm/src/test/scala/org/scalanative/testsuite/utils/Platform.scala
./unit-tests/native/src/test/scala/org/scalanative/testsuite/utils/Platform.scala

and add a line such as (cribbed from Scala.js JVM Platform.scala) def hasCompliantNullPointers: Boolean = true Again, posting the reference URL as justification in the Scala Native case would be good.

Since the hasCompliant is unlikely to be used outside of the StringJoinerTest, is to place the declaration at the top of the class, with a comment about both JVM and Scala Native
having compliant null pointers, so defining to allow the
Scala.js code to remain unaltered.

Now that I think of it, I like the third approach as useful but minimal effort and unlikely to trip-up future maintainers.
Your thoughts?

Actually, the third alternative requires a slight edit of Platform. to be removed from the assumeTrue.
That makes the include of Platform unnecessary & unused, increasing the edit count.
Dancing in the land of complexity. Arrgh.

Copy link
Contributor Author

@spamegg1 spamegg1 Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original approach was to just remove it entirely. But commenting it out (your first suggestion) sounds good.

@LeeTibbert
Copy link
Contributor

Almost there, the failures are because Platform.hasCompliantNullPointers is not defined. But then
we knew that.

The third alternative in one of my review comments suggests defining hasCompliantNullPointers
with appropriate comments, at the top of the file and not mucking around with Platform.

- fix linting
- fix `hasCompliantNullPointer` issues, add comments

@Test def testNPE(): Unit = {

/** Both Scala Native and Scala JVM has compliant null pointers. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice handling of complexity.

A native speaker of English would probably use "have" instead of "has". Not worth a CI run, leave some room for improvement for the next generation.

The content is what is important, IMO.

@LeeTibbert
Copy link
Contributor

LeeTibbert commented Jul 18, 2023

Looks good to me (LGTM). Well done @spamegg1

I am going offline now, but will check again this evening to see if CI encountered any problems.

Comment on lines 44 to 48
if (isEmpty)
isEmpty = false
else
value += delimiter
value += newElement // if newElement is null, adds "null"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala.js has used string concat here becouse string are native construct in JS runtime, and this action does not contain performance penalty. However, on JVM or Native we don't have native String primitives and their concatontation is expensive. We should use java.lang.StringBuilder instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this?

    val builder = java.lang.StringBuilder(value)
    if (isEmpty)
      isEmpty = false
    else
      builder.append(delimiter)

    if (newElement == null)
      builder.append("null")
    else
      builder.append(newElement)

    value = builder.toString()
    this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, something like that. Should do the job. We probably don't need to store result in value

- use `StringBuilder` instead of `+`
- fix typo
Comment on lines 50 to 51
// `+` is expensive on JVM/Native, use StringBuilder instead.
val builder = new java.lang.StringBuilder(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When having a new string builder for each add invocation, its cost would be similar to String.+. Instead we should keep single instance of StringBuilder that would be used for all add operations. It should replace value: String which would not be required any more. In to string we materialize the curren, joined value using StringBuilder.toString.
Sorry if I was not clear about it previously.

Comment on lines +28 to +33
def this(
delimiter: CharSequence,
prefix: CharSequence,
suffix: CharSequence
) =
this(delimiter.toString(), prefix.toString(), suffix.toString())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use this auxilary constructor as main constructor here. The variant using Strings is a Scala.js specific optimization. We should keep all string-like elements as CharSequence in JVM/Native.

Why to do that? In all this operations we don't really need to materialize a string. By having a CharSequence we can directly use String, StringBuilder, StringBuffer, CharBuffer and others and pass them directly to final StringBuilder for efficent joining.

Comment on lines 19 to 20
/** The current value, excluding prefix and suffix. */
private var value: String = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's works nice in JS, but for native/jvm we can do better. The main operation for adding new string is add(CharSequence). Upon String materialization in toString we need to add custom prefix and suffix. Instead of constructing intermiediete string we can store parts of it comming from add. We can use the fact the List prepend in Scala is very cheap. Here's how I think it could be implemented to more efficient in jvm/native. We have extreamly fast add, and slower materializaiton

private def delimiter, prefix, suffix: CharSequence 
private def emptyValue: CharSequence | Null 
private var parts: List[CharSequence] = Nil 

def add(v: CharSequence) = { parts ::= v; this } 
def toString() = {
 if parts.isEmpty && emptyValue != null 
 then emptyValue 
 else {
   val sb = new java.lang.StringBuilder(this.length()) // alloc exactly as much buffer as required to skip internal buffer resize
   sb.append(prefix) 
   val head :: tail = parts.reverse(): @unchecked // we know parts is non-empty, parts are in reverse order
   sb.append(head)
   tail.foreach{ part => sb.append(delimiter).append(part) }
   sb.append(suffix)
   sb.toString()
 }
 def merge(other: StringJoiner): StringJoiner = { 
    // take all the parts from other and join 
    if (!other.isEmpty){
      parts = other.parts ::: this.parts
    }
    this
 }
 def length(): Int = {
   val baseLength = if(emptyValue != null) emptyValue.length else prefix.length + suffix.length() 
   val delimLength = delimiter.length() 
   parts.foldLeft(baseLength){case (acc, part) => acc + delimLength + part.length }
 }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like, with merge, we have to use the delimiter of other for parts that come from other:

scala> val multiple = StringJoiner(";", "[", "]").setEmptyValue("--").add("a").add("b").add("c")
val multiple: java.util.StringJoiner = [a;b;c]
                                                                                                                      
scala> val joiner = StringJoiner(", ", "{", "}").add("one").merge(multiple).add("two")
val joiner: java.util.StringJoiner = {one, a;b;c, two}
// here a;b;c is using delimiter of multiple, not of joiner

So the parts = other.parts ::: this.parts approach doesn't work (or maybe I'm not seeing something obvious).
What should I do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ok, that's oversight from me. So docs state that the other.parts are concatanate using other delimieter, and it should not add prefix/suffix or empty value. So we can do either do 1 one of things here:

  1. flatMap other.parts to create a list of delimiters and parts and then prepend them to this.parts
  2. create new StringBuilder (which is also an instance of CharSequence) and materialize the other joiner without prefix/suffix or empty value. For that we can just copy-past code from toString with minor modifications, or even refector them out to common function. I think this solution might be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I kinda understand what you're saying.
I'll have to figure out the string length of (other.parts + the delimiters that go between them) for the new StringBuilder, I might also refactor that into another helper function and reuse it in length().

@ekrich
Copy link
Member

ekrich commented Jul 20, 2023

Just another monkey wrench but you should only use javalibs not Scala but you can use ScalaOps to get some Scala like operations. https://github.com/scala-native/scala-native/blob/main/javalib/src/main/scala/java/util/ScalaOps.scala

@spamegg1
Copy link
Contributor Author

spamegg1 commented Jul 21, 2023

@ekrich

you should only use javalibs not Scala but you can use ScalaOps to get some Scala like operations.

I have an implementation that passes the tests. What can I use in place of scala.collection.immutable.List[CharSequence] that still follows along Wojciech's approach above, and good performance-wise? (Lots of stuff in the java/util folder but I'm not too familiar with them)

@LeeTibbert
Copy link
Contributor

LeeTibbert commented Jul 21, 2023

FYI, the reason for the "please do not use scalalib" is to break a dependency where scalalib relies upon
javalib which relies upon scalalib. We are trying to eventually get rid of all scalalib usage in the javalib
code. If you are reading existing code, you will see many such, mostly from the early days of Scala Native.
The current practice is to be stricter with new code being added or undergoing major renovation.

For me, most of the time knowing the reasoning for a prohibition makes it easier to live with it.

@ekrich Please keep me on the rails here...
I believe the "rule of thumb" is that Scala Collections are not used in javalib but can be used in
the Tests for javalib, i.e. testsuite. I usually avoid the latter just to avoid confusion in my head.

I also usually use a simple List and just avoid any Scala specific methods on List. That is use
only the methods defined in JVM. One can be a little more explicit and specify java.util.List.
Sometimes you will see imports at the top of the file import java.{util => ju} which allow
the former to become ju.List.

What pattern matching is concerning you. I did a quick look through the StringJoiner.scala file
and nothing jumped out as "non-JVM". Sorry if I missed obvious things in my haste.

@spamegg1
Copy link
Contributor Author

spamegg1 commented Jul 21, 2023

@LeeTibbert

I also usually use a simple List and just avoid any Scala specific methods on List. That is use only the methods defined in JVM. One can be a little more explicit and specify java.util.List.

Considering Wojciech's approach above (relying on prepend :: being cheap), how can I use java.util.List to achieve something similar?

I did a quick look through the StringJoiner.scala file and nothing jumped out as "non-JVM".

My current implementation changed completely, I haven't pushed the changes yet. I'm using scala.collection.immutable.List[CharSequence].

Should I just push it?

- making suggested changes
@ekrich
Copy link
Member

ekrich commented Jul 21, 2023

@spamegg1 I really should have given more info like Lee has done. I have been doing this awhile so I wasn't really communicating enough information. ScalaOps gives you basic Scala Operations like foreach and the like so you can use java.util.List and java.util.Set etc. and not have to give up all the niceties of Scala.

We try to learn from Scala.js which has removed all the Scala stdlib from their javalib so we are headed that direction. Most of our collection code has examples because much of it has been ported from Scala.js so that is a good resource many times. Here is some code, it has 2 annotations from the Scala stdlib but not much else.

https://github.com/scala-native/scala-native/blob/main/javalib/src/main/scala/java/util/Properties.scala

This has examples of ScalaOps.exists|foldLeft. https://github.com/scala-native/scala-native/blob/main/javalib/src/main/scala/java/util/IdentityHashMap.scala

- add missing parentheses
Comment on lines 46 to 58
private def lengthOf(
parts: scala.collection.immutable.List[CharSequence],
delim: CharSequence
): Int = {
val delimLength = delim.length
parts match {
case Nil => 0
case head :: tail =>
tail.foldLeft(head.length)((acc, part) =>
acc + delimLength + part.length
)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private def lengthOf(
parts: scala.collection.immutable.List[CharSequence],
delim: CharSequence
): Int = {
val delimLength = delim.length
parts match {
case Nil => 0
case head :: tail =>
tail.foldLeft(head.length)((acc, part) =>
acc + delimLength + part.length
)
}
}
// at the top
import java.{util => ju}
import ScalaOps._
private def lengthOf(
parts: ju.List[CharSequence],
delim: CharSequence
): Int = {
val delimLength = delim.length
if (parts.length = 0) 0
else {
val headLen = parts(0).length
parts.subList(1, parts.length).ScalaOps.foldLeft(headLen)((acc, part) =>
acc + delimLength + part.length
)
}
}

This should be close to what you need. Hope this helps.

- greatly simplify implementation using `ScalaOps`
@LeeTibbert
Copy link
Contributor

@spamegg1 CI is all green, great.

I want you/us to succeed early, especially on a first PR.

Without holding up this PR, let me mention a few things for one or more possible PRs.
I mention these as encouragement and to illustrate some of the factors that go into
SN library code. Also the power of supportive review to improve the product (but not
at the last minute).

I have heard the expression "Those who propose, dispose." more than
once. (Meaning, "You think it is a good idea, you do it." Said to keep projects from
ballooning with 1+s and never getting shipped).

  • While reviewing the later change to this code, I discovered that two String.join() are not
    implemented. Carve another notch on your coup stick.
    I created PR javalib String is missing two static join methods #3408 to keep track of the issue.

  • In StringJoiner.scala line 33, it seems like private var isEmpty: Boolean = true and its
    associated bookkeeping later can be replaced by contents.isEmpty() in toString() and merge().
    This gets rid of a var and bookeeping. Taking the size of an ArrayList is more expensive than
    accessing a variable, but it is pretty cheap and easy to get right. I suspect the crossover point
    where one method call is cheaper than a series of increments (in adds) is probably about 10 or so.
    I have no data to backup that WAG (guess).

  • The implementation of StringJoiner.length looks easy to get right but expensive. Instantiating the
    String and doing nothing beyond taking its length means doing a, potentially, lot of memory allocations.

    Seems like a follow on could determine the length by doing the math. All the lengths are known at the
    time of the call. Something like (untried):
    lengthOfPrefix + lengthOfSuffix + (lengthOfSeparator * lengthOfArrayList) +
    contents.foldLeft(0)((a: Int, b: String) => a + b.length())

    The potentially painful part is that Scala 2 can have problems with anonymous_functions/lambdas,
    especially with certain version of Java. Scala2.12 & Java 17 has been giving me a kicking several times
    this week.

    I mention this to cut down frustration if a first submission to CI fails.

    Wojciech has fixed my StreamTest.scala several (three or more, set face red) times., so the
    iterate() tests there give an example of making a lambda robust. I usually just create a
    local def for the function and use that. (I did such two days or so ago. If you want, I can
    try to find the file.)

spamegg1 added 2 commits July 22, 2023 07:26
- making suggested changes
- fixing parentheses
@LeeTibbert
Copy link
Contributor

@spamegg1
The timing of this PR is excellent, see Issue #3409. It sure has flushed out a number of bugs.
Thank you

@LeeTibbert
Copy link
Contributor

LeeTibbert commented Jul 23, 2023

Bug reports are a sign that someone is actually using the product of your hard work (I keep telling myself that
when I get bug reports ;-)).

I tried using the three argument constructor of StringJoner in this PR from java.util.streamCollectors.scala.
I got compilation errors, which sent me off to the races of reading both the Scala.js code and this Scala Native code.
I was quite astonished by the compilation errors, since StreamJoinerTest has cases which use the 3Arg form.

  1. Both have a @inline annotation. That may be good for Scala.js. I can understand its "this was supposed to be
    an easy, straightforward port' transfer to SN. I suspect/believe that the @inline should be deleted. My concern
    is that it will generate lots of code at each use site. (and I am using it in private work).

  2. The problem at hand. The current code declares the StringJoiner constructor with private args.

final class StringJoiner  private (                                                   
    delimiter: CharSequence,                                                    
    prefix: CharSequence,                                                       
    suffix: CharSequence                                                        
) extends AnyRef {  

I believe this makes the constructor inaccessible (outside the package, including sub-packages.??)

Scala.Js declares the primary constructor as private but then has a 1-arg & 3-arg public method (corresponding
to Java API (edited):

final class StringJoiner private (delimiter: String, prefix: String, suffix: String) extends AnyRef {

  def this(delimiter: CharSequence) =
    this(delimiter.toString(), "", "")

  def this(delimiter: CharSequence, prefix: CharSequence, suffix: CharSequence) =
    this(delimiter.toString(), prefix.toString(), suffix.toString())

I think that the toString() will trigger the NullPointerExceptions described in the API without requiring
explicit Objects.NotNull() checks.

There must be a good reason why the code of this PR is the way it is. Sorry if I am missing it.

PS: Once I hacked a private copy of StringJoiner to get the 3-arg constructor, the rest, including merge(),
worked just fine in my work-in-progress Collectors.joiner() tests.

@spamegg1
Copy link
Contributor Author

spamegg1 commented Jul 24, 2023

There must be a good reason why the code of this PR is the way it is. Sorry if I am missing it.

It's my mistake. I wasn't sure but I thought switching to CharSequence made the alternate constructor redundant since every String is a CharSequence. Didn't pay attention to the private keyword there! (I've never seen it used that way in Scala code before.) Yes the null checks were added due to lack of toString().

I'm not sure how to make an alternate constructor now. Should it also take CharSequences? Then I get ambiguous reference to overloaded definition. Do I change it to take String instead? But above Wojciech said to use Scala.js's alternate constructor (with CharSequences) as the main one... should I remove the private from the main constructor instead?

So confused 😕

- changing constructors and fields to use `String`, and removing null checks
@LeeTibbert
Copy link
Contributor

@spamegg1 I need to apologize. You may be confused because it looks like I gave you advise which
conflicts with the CharSequence advice given above by @WojciechMazur. In re-reading this series of
replys, I just came across that. Sorry.

I think the part that is tripping me/us up in trying to convert the original .Js code to SN is the
convenient but limiting Scala mkstring(). That requires String arguments, and the pressure to
use Strings backs up from there all the way to the constructor.

@LeeTibbert
Copy link
Contributor

I have spent several hours on this in the middle of the night and have run out of time.
I will be offline most of the time until Thursday.

I created a private study version of StringJoiner.scala. That revealed a problem with having
the primary constructor arguments as CharSequence. As the testState test in StringJoinerTest
shows, those arguments can be changed from outside after they are passed to the StringJoiner
argument. That means that StringJoiner needs to make a deep copy. Taking a deep copy of a StringBuilder
or a CharSequence other than String is not going to be simple. Cost/benefit considers may well drive
us to using String as the simplest copy.

I suggest that this PR, assuming it passes CI, be merged and remaining issues be addressed in one or
more followup PRs.

I will check in on Thursday to see where the discussion goes.

@spamegg1
Copy link
Contributor Author

@LeeTibbert
Yes in my previous approach I was making immutable String copies of the parameters to fix testState.

In the latest commit I changed the primary (private) constructor to use String instead, while the auxiliary (public) constructor still uses CharSequence. So testState is fine now.

@LeeTibbert
Copy link
Contributor

LeeTibbert commented Jul 27, 2023

This PR gives a good outward facing base level 1. The reasonably done is far better than the absent
prefect.

I have a next evolution PR ready to submit after this PR is merged. It uses a StringBuilder and some other
details to reduce the number of String allocation and character copies in StringJoiner.scala. It passes
StringJoinerTest of this PR, without changing the tests.

It also passes CollectorsTest.scala using my WIP Collectors#joining code which uses StringJoiner.

I have not done performance metrics but think it will be correct and have a lower resource cost.

This will make a difference when someone wants to Collect a stream of 10_000 elements.

I think submitting the file as a follow-on PR will cause less confusion than posting a gist.
Let's get a firm baseline established.

Footnotes

  1. I believe there is one small bug in that CharSequences can be altered after they have been passed
    to add() and the mutation shows up in StringJoiner.toString(). I have not written a test but
    have inspected the code.

@WojciechMazur WojciechMazur merged commit b1bba76 into scala-native:main Aug 8, 2023
@LeeTibbert
Copy link
Contributor

@spamegg1

Congratulation on the merge of your first Scala Native PR.

Programming/Software_Engineering in the large.

Looking forward to your next PR.

WojciechMazur pushed a commit to WojciechMazur/scala-native that referenced this pull request Sep 1, 2023
* Support java.util.StringJoiner
- ported from Scala.js scala-js/scala-js#4873

(cherry picked from commit b1bba76)
WojciechMazur pushed a commit that referenced this pull request Sep 4, 2023
* Support java.util.StringJoiner
- ported from Scala.js scala-js/scala-js#4873

(cherry picked from commit b1bba76)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support java.util.StringJoiner

4 participants