Skip to content

Stdlib: New functions Map.find_after and Map.find_before#665

Closed
gerdstolpmann wants to merge 4 commits intoocaml:trunkfrom
gerdstolpmann:map-find
Closed

Stdlib: New functions Map.find_after and Map.find_before#665
gerdstolpmann wants to merge 4 commits intoocaml:trunkfrom
gerdstolpmann:map-find

Conversation

@gerdstolpmann
Copy link

This implements the suggestion of http://caml.inria.fr/mantis/view.php?id=7277

I'm going here with Alain's idea of iterating over the successors/predecessors of the key until the callback function returns true.

@alainfrisch
Copy link
Contributor

My proposal was a bit different, namely:

 val find_first: (key -> bool) -> 'a t -> (key * 'a) option
 val find_last: (key -> bool) -> 'a t -> (key * 'a) option

where the predicate is assumed to be monotonic. One could also add:

val partition: (key -> bool) -> 'a t -> 'a t * 'a t

(And probably the same for Set.)

@c-cube
Copy link
Contributor

c-cube commented Jul 6, 2016

This find_first looks like it would be easily implemented with iterators from #635 (with a generic Iter.find p or Iter.filter p |> Iter.head). Just saying! :-)
The find_last would require the iterators that traverse the map in decreasing key order.

@gerdstolpmann
Copy link
Author

Yep, it's a little bit generalized. find_first could already be emulated by iteration, and find_last can often be reduced to find_firstby the user. These may also be considered as gaps, but it is not as bad as when you want to find something in the vicinity of a key, where you can so far only resort to Map.split. However, splitting is fairly inefficient.

@c-cube You can reduce find_before and find_after to iterators (but that's IMHO an alternate implementation and not a substitute). My guess is that you'll face resistence to your iterator approach because some people may regard it as too imperative (though it isn't). A better name/direction might here be views - you restrict the view of the map to a subset without having to construct the subset. Implementation-wise this is close to iterators, because views mostly also allow iteration to access the elements (but you could extend that, and allow other operations like find).

@alainfrisch
Copy link
Contributor

I don't see how an iterator-based approach would work. The idea here is that the lookup should take O(log n) time.

@alainfrisch
Copy link
Contributor

find_first could already be emulated by iteration

Except that because of the assumption on the predicate to be monotonic, the lookup is O(log N) instead of O(n).

@gerdstolpmann
Copy link
Author

Sometimes it is good to have lunch. I finally got your idea, Alain, it's a bisection following the tree, right? Given that c-cube was also on the wrong track, I guess we should also name it like that, i.e. bisect_first and bisect_last. This is definitely something different.

Implementation-wise there is a nasty case - when you get to a leaf for bisect_first you'd have to find the next bigger element which is then somewhere else in the tree, and you have to wind up.

I wonder whether this is better, I'm not sure whether you always have a monotone f. This depends on the application.

@Gbury
Copy link
Contributor

Gbury commented Jul 6, 2016

@alainfrisch I don't understand your remark about complexity. From what I understand of the code and what the documentation says, the given predicate will be called on every key greater than the given start (in the case of find_after) in increasing order, in which case it is exactly the same as calling the predicate on every element returned by an iterator which starts at the given key (see to_iter_at in PR #635 ). Indeed, the first lookup to get to the correct key will be in O(log(N)), but afterwards, you have to iterate over all elements which is in O(N) (well, linear in the number of remaining elements). Or at least that's what the current code seems to do.
Using the fact that the predicate is monotonic, some sort of dichotomy could work and be faster (and probably what you mean by a logarithmic complexity), but then the documentation should probably be changed to reflect the fact that the predicate may not be called on every element greater than the given start.

@alainfrisch
Copy link
Contributor

alainfrisch commented Jul 6, 2016

My comments applied to my proposal, not to this specific PR. To fix the idea, my proposal is:

    let rec find_first f = function
      | Empty ->
          None
      | Node(l, v, d, r, _) ->
          if f v then
            match find_first f l with
            | None -> Some (v, d)
            | Some _ as r -> r
          else
            find_first f r

It's indeed a bisection, although bisecting seems more like an internal implementation detail. The specification is really: find the first element matching a monotonic predicate.

I'm not sure whether you always have a monotone f.

Typical uses would be:

 find_first (fun x -> compare x0 x <= 0)

or

 find_first (fun x -> compare x0 x < 0)

which correspond to the initial need (find the first binding after a given key, with the two strict / non-strict variants of "after").

@bobot
Copy link
Contributor

bobot commented Jul 6, 2016

@alainfrisch I like your proposal. Do you compared the speed of your version and a tail-rec one?

@alainfrisch
Copy link
Contributor

A tail-rec version that avoids intermediate allocations:

    let rec find_first_aux v0 d0 f = function
      | Empty ->
          Some (v0, d0)
      | Node(l, v, d, r, _) ->
          if f v then
            find_first_aux v d f l
          else
            find_first_aux v0 d0 f r


    let rec find_first f = function
      | Empty ->
          None
      | Node(l, v, d, r, _) ->
          if f v then
            find_first_aux v d f l
          else
            find_first f r

I did not benchmark any of those, but at least the tailrec version seems hard to beat (except by providing versions specialized to given predicate forms).

@bobot
Copy link
Contributor

bobot commented Jul 7, 2016

@alainfrisch Oh nice, I though only about the tail-rec version with allocation. Yours is better and indeed no benchmark is needed. If one day the optimizer can transform the tail-rec with allocation into a tail-rec without (the ('a * 'b) option is passed in registers) we could simplify the code.

@alainfrisch
Copy link
Contributor

@gerdstolpmann Would my proposal cover your use cases? If so, feel free to include the code in your PR. I'd also suggest to add similar functions to Set.

@alainfrisch
Copy link
Contributor

If one day the optimizer can transform the tail-rec

I would not bet on that, and generally speaking, such hand-optimization in low-level common data structures are not too bad (more explicit performance profile, less dependent on compiler internals, benefits to alternative backends -- js_of_ocaml or Bucklescript).

@gerdstolpmann
Copy link
Author

Thinking twice, it is probably less work to copy map.ml to my own sources and add it there than to push for a patch. Sorry for opening this, I should have known better.

@bobzhang
Copy link
Member

g2p added a commit to g2p/ocaml that referenced this pull request Oct 24, 2016
Finds a binding where the key is the ceiling for a given key.

Addresses ocaml#665.
@g2p g2p mentioned this pull request Oct 24, 2016
@g2p
Copy link
Contributor

g2p commented Oct 24, 2016

I've implemented a similar find_ceil before reading @alainfrisch's proposal.
Sending the pull request in case it's useful to anyone else.

g2p added a commit to g2p/ocaml that referenced this pull request Oct 24, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
g2p added a commit to g2p/ocaml that referenced this pull request Oct 24, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
@g2p
Copy link
Contributor

g2p commented Oct 24, 2016

And I've implemented something relatively complete (first and last, maps and sets) in #869, based on the above proposal.

g2p added a commit to g2p/ocaml that referenced this pull request Oct 24, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
@yminsky
Copy link

yminsky commented Oct 25, 2016

You might want to look at the somewhat more general "closest_key" function in Core/Base:

https://github.com/janestreet/base/blob/master/src/base_map.mli#L343

Having the ability to use > or >= as the criterion is valuable in my experience.

@alainfrisch
Copy link
Contributor

The functions proposed in #869 do allow to use > or >=.

@yminsky
Copy link

yminsky commented Oct 25, 2016

Ah, you're right. Sorry for the noise.

On Tue, Oct 25, 2016 at 7:46 AM Alain Frisch [email protected]
wrote:

The functions proposed in #869 #869
do allow to use > or >=.


You are receiving this because you commented.

Reply to this email directly, view it on GitHub
#665 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AArqJpkqO1HOr0dESMTv-D21eUhvngBYks5q3ewMgaJpZM4JGCVV
.

g2p added a commit to g2p/ocaml that referenced this pull request Nov 8, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 9, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 10, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 10, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
g2p added a commit to g2p/ocaml that referenced this pull request Nov 12, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
alainfrisch pushed a commit that referenced this pull request Nov 12, 2016
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses #665, supersedes #868.
Thanks @alainfrisch for the idea and most of the implementation.
camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017
Finds the first/last binding where the key satisfies a monotonic
predicate.

Addresses ocaml#665, supersedes ocaml#868.
Thanks @alainfrisch for the idea and most of the implementation.
@vicuna vicuna mentioned this pull request Mar 14, 2019
stedolan added a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022
Simplify "make build_upstream"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants