Skip to content

[6.0] Add lazy collections and an enumerable contract #29415

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 16, 2019

Conversation

JosephSilber
Copy link
Contributor

@JosephSilber JosephSilber commented Aug 5, 2019

This PR adds a new LazyCollection class and an Enumerable interface.

This is similar to Swift's LazyCollection and C#.NET's IEnumerable.

Here's a pithy post describing the benefits of lazy collections.


This PR is purely additive. It does not affect the current Collection class, and is fully backwards compatible.


Motivation

The Collection class is one of the most underappreciated tools in the Laravel toolchain. It allows you to declaratively chain methods using its fluent API, mapping, filtering, tranforming, sorting and reducing its values in an extremely readable format.

However, this comes with two downsides:

  1. Every additional method invocation creates a new array. If you're dealing with a collection with thousands (or even millions) of items, reallocating and garbage collecting such a huge array at every method call can be quite taxing.

  2. The Collection class only works with a static list of items. If the items are being streamed (from an API, from the DB, or from lines in a multi-GB file), the collection class as it stands can't help us.

Explanation

  1. Whereas the existing Collection class wraps a native array, the new LazyCollection class wraps a native iterator, while still providing most of the methods that we know and love from a regular collection:

    LazyCollection::times(INF)
        ->map(function ($number) {
            return $number ** 3;
        })
        ->filter(function ($number) {
            return $number % 2;
        })
        ->take(100)
        ->all();
    
        // [1, 27, 125, 343, 729, 1331, ...]

    This starts off by creating a lazy collection that has an infinite number of elements, maps those numbers, filters them, and then only takes 100 of them. Since the collection is a lazy collection, starting off with an infitine amount of elements is not a problem. No elements are actually generated until we pull them out of the collection. Since we finished our chain with ->take(100), it will stop generating new values after it generated 100 items that satisfy the filter.

    This isn't really a very useful example, but it serves as a simple demonstration of what's possible.

  2. Let's look at a more realistic example. Imagine a remote products API. We're looking for a specific product, but the API unfortunately only has a single, paginated list of products. Here's how we would write this using a lazy collection:

    $products = new LazyCollection(function () {
        do {
            $page = 1;
    
            $response = Client::get('products', $page++);
    
            yield from $response['products'];
        } while($response['hasMore']);
    });
    
    $product = $products->where('itemNumber', $itemNumber)->first();

    Using a lazy collection, we nicely abstracted the pagination, and ended up dealing with a simple collection on our end.

    (BTW, the Collection class has a whereFirst(...) method, to get the first item matching the given check without having to spin through the whole collection. With a lazy collection, this is of no concern; calling ->where(...)->first() will stop enumerating the values as soon as any item matches).

  3. Here's another example. The query builder's cursor() method lets us iterate over the records in the database without pulling them out all at once. It currently returns a simple Generator, which is a little crude. We could easily change it to return a LazyCollection:

    public function cursor()
    {
        if (is_null($this->columns)) {
            $this->columns = ['*'];
        }
    
        return new LazyCollection(function () {
            yield from $this->connection->cursor(
                $this->toSql(), $this->getBindings(), ! $this->useWritePdo
            );
        });
    }

    This would allow us to treat the result of a cursor the same way as a regular Collection instance:

    $postIds = Post::cursor()
        ->filter(function ($post) {
            return Gate::can('view', $post);
        })
        ->pluck('id')
        ->take(20)
        ->all();

    This will get the IDs of the first 20 posts that the user may view. Doing this with a native generator would be a lot more code, and far less readable.

  4. Another use-case would be reading the lines of a multi-GB log file. Now imagine every log is 4 lines long, so we want to be able to easily read 4 lines, pass it off to a value object, and inspect it.

    LazyCollection::make(function () {
        $handle = fopen('log.txt', 'r');
    
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    })
    ->chunk(4)
    ->map(function ($lines) {
        return LogEntry::fromLines($lines);
    })
    ->each(function ($logEntry) {
        // Do what you gotta do with $logEntry
    });

I can keep going, but I think I've made my point. Lazy collections are of immense use whenever we're not dealing with an existing (finite) set of items.

Implementation

The way this PR implements lazy collections is as follows:

  • Create a new LazyCollection class, which implements most of the methods available on the Collection class, with identical functionality.

  • Create a new Enumerable interface, so that both collection classes can adhere to the same contract.

  • Create a new EnumeratesValues trait. This houses all of the methods that have an identical implementation for both collection classes.

  • In order to reuse as much code as possible, the LazyCollection class has 4 types of methods:

    1. Those that have an identical implementation to the Collection class. These methods are actually in the EnumeratesValues trait, and are shared between the two classes.

    2. Those for which it has its own implementation.

    3. Those methods, such as sorting and grouping, which cannot be done incrementally; you need the full dataset in order to tranform the collection. In these cases, the lazy collection will defer to the standard collection class for the actual operation, but only when the values are actually enumerated. So long as no values are being pulled out of the lazy collection, no operation will run prematurely.

    4. Those methods that don't return a collection. For example, the implode method. In these cases, it simply calls that method on the standard collection and returns its value.

  • The SupportCollectionTest has been updated to use PHPUnit data providers, to make sure that both collection classes are tested.

Commits

For cleanliness, for organization, and to ease review of this PR, the commits have been thoughtfully constructed and compartmentalized:

  1. Create Enumerable contract - this commit simply creates the new Enumerable contract, which contains all methods that will be common to both collection classes. It also moves the additional interfaces from the Collection's implements clause to the Enumerable's extends clause.

  2. Extract reusable methods into the EnumeratesValues trait - this commit creates a new EnumeratesValues trait, and moves many of the Collection's methods into that trait. This trait will later be shared with the LazyCollection class.

  3. Support all Enumerables in proxy - This simply switches the typehint for the HigherOrderCollectionProxy class from Collection to Enumerable, so that it'll support both collection classes.

  4. Use collectionClassProvider to inject the collection class into each test - To prepare the tests to test both implementations, we'll use PHPUnit data providers to inject the collection class (as a string) into each test method that should test both implementations. Without changing the actual tests, every method now instantiates the class being passed into it, instead of the hard-coded Collection class.

  5. Add Enumerable skip() method - Since skip is an important method for lazy collections, we'll first add it here to the Collection class, the Enumerable interface, and the tests.

  6. Create LazyCollection class - Finally, after all this preparation, we can actually create the implementation for the LazyCollection. To test it, we simply add it as an entry in the collectionClassProvider method, and now all existing tests will also test those methods against the LazyCollection implementation!

  7. Add lazy() method to standard collection - Finally, we add a lazy() method to the Collection class, to easily get a lazy collection from a regular collection.

Phew! I do hope that if this is merged, these precious commits won't be squashed into oblivion 😛

@JosephSilber JosephSilber force-pushed the lazy-collection branch 3 times, most recently from 317c288 to a20da28 Compare August 5, 2019 05:07
@JosephSilber
Copy link
Contributor Author

JosephSilber commented Aug 5, 2019

Interesting that Travis fails the testShiftReturnsAndRemovesFirstItemInCollection test (the other 250 tests in that file pass properly).

This test passes properly on my machine (running PHP v7.2.1) 🤷‍♂️

@nokios
Copy link

nokios commented Aug 5, 2019

Having had to use generators in the past, I love the abstraction this provides. Dunno if my opinion matters, but I definitely would love to see this merged into the core! +1

@taylorotwell
Copy link
Member

So, I think is a cool feature, but my main concern is that now any addition to the Collection class requires developers to also think about writing a separate implementation for LazyCollection.

How do you generally convert a regular collection method into a LazyCollection method. What do you need to avoid? What is the general approach?

@JosephSilber
Copy link
Contributor Author

JosephSilber commented Aug 5, 2019

my main concern is that now any addition to the Collection class requires developers to also think about writing a separate implementation for LazyCollection

You definitely have to think about it, the same way you have to think about any two classes implementing the same interface. We already have many of those in Laravel, such as DB/cache/queue drivers, auth guards and much more.

That said, many of the collection methods can actually be shared, as you can see in the EnumeratesValues trait. When they can't be shared, they can often defer to each other.

In fact, many of the existing methods on the standard collection can be rewritten in the opposite direction: by deferring back to the lazy collection. Take for example the contains method, which could be rewritten as:

public function contains($key, $operator = null, $value = null)
{
	return $this->lazy()->contains(...func_get_args());
}

Or, as another example, take the mapWithKeys method:

public function mapWithKeys(callable $callback)
{
    return $this->lazy()->mapWithKeys($callable)->collect();
}

I chose not to do it yet in this PR, because it's big enough as is. But I can definitely see a future where there are only a small handful of methods that have a separate implementation.


How do you generally convert a regular collection method into a LazyCollection method. What do you need to avoid? What is the general approach?

The idea is basically to not enumerate any values prematurely. So instead of creating a local array and accumulating the values, you instead yield the values one by one.

Let's take the collapse method as an example. Instead of first collecting all of the values into a local $results array:

public static function collapse($array)
{
    $results = [];

    foreach ($array as $values) {
        if ($values instanceof Collection) {
            $values = $values->all();
        } elseif (! is_array($values)) {
            continue;
        }

        $results[] = $values;
    }

    return array_merge([], ...$results);
}

...we instead yield each value individually.

public function collapse()
{
    return new static(function () {
        foreach ($this as $values) {
            if (is_array($values) || $values instanceof Enumerable) {
                foreach ($values as $value) {
                    yield $value;
                }
            }
        }
    });
}

This will definitely require a little more thought when adding new methods to the collection, but I do think it's well worth it.

The collection class at this point is mature enough that it doesn't see any meaningful daily changes, so the additional implementation shouldn't be that much of a burden.

@Gummibeer
Copy link
Contributor

I'm a bit confused about the naming of the Enumerable interface. In which kind are the methods related to enumeration?
They are some kind of math (median, average) but also query (where) and so on.

I'm honest that I don't have a better/good name but enumerable seems a bit confusing.

@thecrypticace
Copy link
Contributor

iterable would've likely been the preferred name but that's actually a type in PHP 7.2 which means it cannot be used as an interface or class name.

@Gummibeer
Copy link
Contributor

Ok, but even this doesn't seem good for me because the contained methods rely to multiple interfaces for me. Like Queryable, Calculatable and things like this.
For sure this would make the list of interfaces much longer. But I can't imagine any name in which an average calculation matches the same like a query by attribute method.

@garygreen
Copy link
Contributor

Woah this is amazing Joseph, amount of work that went into this is mind blowing. It's definitely a big win for the community 👍

I'm guessing this is also similar to how Lazy.js works?

@staudenmeir
Copy link
Contributor

staudenmeir commented Aug 5, 2019

The failing test is caused by this line. Mockery::namedMock() registers a Foo class and this causes is_callable() to return true in the LazyCollection constructor.

@JosephSilber JosephSilber force-pushed the lazy-collection branch 3 times, most recently from 26772d1 to 1f8f584 Compare August 6, 2019 03:09
@JosephSilber
Copy link
Contributor Author

JosephSilber commented Aug 6, 2019

@staudenmeir amazing catch! (I do wonder how come it worked for me locally) I was only running the SupportCollectionTest class.

I changed the test to use non-conflicting values, and now it passes 🎉

public function testShiftReturnsAndRemovesFirstItemInCollection($collection)
{
$data = new $collection(['Taylor', 'Otwell']);
$this->assertEquals('Taylor', $data->shift());
$this->assertEquals('Otwell', $data->first());
$this->assertEquals('Otwell', $data->shift());
$this->assertEquals(null, $data->first());
}

@JosephSilber JosephSilber force-pushed the lazy-collection branch 3 times, most recently from 4817593 to 05ed863 Compare August 9, 2019 03:20
@laravel laravel deleted a comment from garygreen Aug 9, 2019
@laravel laravel deleted a comment from staudenmeir Aug 9, 2019
@laravel laravel deleted a comment from staudenmeir Aug 9, 2019
@laravel laravel deleted a comment from staudenmeir Aug 9, 2019
@GrahamCampbell
Copy link
Member

I wonder if the contract belongs in the contracts repo?

@taylorotwell taylorotwell merged commit 2bc4288 into laravel:master Aug 16, 2019
@drupol
Copy link

drupol commented Aug 16, 2019

Nice ! Congratulations, very nice addition.

@garygreen
Copy link
Contributor

Congrats Joseph! Awesome work on this, excited to try it out. 😃

giphy

@JosephSilber JosephSilber deleted the lazy-collection branch August 16, 2019 17:37
@JosephSilber
Copy link
Contributor Author

For anyone following along:

I decided to remove the mutating methods.
They don't really make sense on a lazy collection.

See [6.0] Remove mutating methods from lazy collection.

@nalin897
Copy link

nalin897 commented Nov 2, 2019

Really excited to try this feature after the end of my semester exam... Love from India 🇮🇳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants