-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[6.0] Add lazy collections and an enumerable contract #29415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
317c288
to
a20da28
Compare
Interesting that Travis fails the This test passes properly on my machine (running PHP v7.2.1) 🤷♂️ |
Having had to use generators in the past, I love the abstraction this provides. Dunno if my opinion matters, but I definitely would love to see this merged into the core! +1 |
So, I think is a cool feature, but my main concern is that now any addition to the Collection class requires developers to also think about writing a separate implementation for LazyCollection. How do you generally convert a regular collection method into a LazyCollection method. What do you need to avoid? What is the general approach? |
You definitely have to think about it, the same way you have to think about any two classes implementing the same interface. We already have many of those in Laravel, such as DB/cache/queue drivers, auth guards and much more. That said, many of the collection methods can actually be shared, as you can see in the In fact, many of the existing methods on the standard collection can be rewritten in the opposite direction: by deferring back to the lazy collection. Take for example the public function contains($key, $operator = null, $value = null)
{
return $this->lazy()->contains(...func_get_args());
} Or, as another example, take the public function mapWithKeys(callable $callback)
{
return $this->lazy()->mapWithKeys($callable)->collect();
} I chose not to do it yet in this PR, because it's big enough as is. But I can definitely see a future where there are only a small handful of methods that have a separate implementation.
The idea is basically to not enumerate any values prematurely. So instead of creating a local array and accumulating the values, you instead Let's take the public static function collapse($array)
{
$results = [];
foreach ($array as $values) {
if ($values instanceof Collection) {
$values = $values->all();
} elseif (! is_array($values)) {
continue;
}
$results[] = $values;
}
return array_merge([], ...$results);
} ...we instead public function collapse()
{
return new static(function () {
foreach ($this as $values) {
if (is_array($values) || $values instanceof Enumerable) {
foreach ($values as $value) {
yield $value;
}
}
}
});
} This will definitely require a little more thought when adding new methods to the collection, but I do think it's well worth it. The collection class at this point is mature enough that it doesn't see any meaningful daily changes, so the additional implementation shouldn't be that much of a burden. |
I'm a bit confused about the naming of the I'm honest that I don't have a better/good name but enumerable seems a bit confusing. |
|
Ok, but even this doesn't seem good for me because the contained methods rely to multiple interfaces for me. Like |
Woah this is amazing Joseph, amount of work that went into this is mind blowing. It's definitely a big win for the community 👍 I'm guessing this is also similar to how Lazy.js works? |
The failing test is caused by this line. |
26772d1
to
1f8f584
Compare
@staudenmeir amazing catch! I changed the test to use non-conflicting values, and now it passes 🎉 framework/tests/Support/SupportCollectionTest.php Lines 144 to 152 in ca4a708
|
4817593
to
05ed863
Compare
I wonder if the contract belongs in the contracts repo? |
05ed863
to
2f00816
Compare
2f00816
to
2bc4288
Compare
Nice ! Congratulations, very nice addition. |
For anyone following along: I decided to remove the mutating methods. |
Really excited to try this feature after the end of my semester exam... Love from India 🇮🇳 |
This PR adds a new
LazyCollection
class and anEnumerable
interface.This is similar to Swift's
LazyCollection
and C#.NET'sIEnumerable
.Here's a pithy post describing the benefits of lazy collections.
This PR is purely additive. It does not affect the current
Collection
class, and is fully backwards compatible.Motivation
The
Collection
class is one of the most underappreciated tools in the Laravel toolchain. It allows you to declaratively chain methods using its fluent API, mapping, filtering, tranforming, sorting and reducing its values in an extremely readable format.However, this comes with two downsides:
Every additional method invocation creates a new array. If you're dealing with a collection with thousands (or even millions) of items, reallocating and garbage collecting such a huge array at every method call can be quite taxing.
The
Collection
class only works with a static list of items. If the items are being streamed (from an API, from the DB, or from lines in a multi-GB file), the collection class as it stands can't help us.Explanation
Whereas the existing
Collection
class wraps a native array, the newLazyCollection
class wraps a native iterator, while still providing most of the methods that we know and love from a regular collection:This starts off by creating a lazy collection that has an infinite number of elements, maps those numbers, filters them, and then only takes 100 of them. Since the collection is a lazy collection, starting off with an infitine amount of elements is not a problem. No elements are actually generated until we pull them out of the collection. Since we finished our chain with
->take(100)
, it will stop generating new values after it generated 100 items that satisfy the filter.This isn't really a very useful example, but it serves as a simple demonstration of what's possible.
Let's look at a more realistic example. Imagine a remote products API. We're looking for a specific product, but the API unfortunately only has a single, paginated list of products. Here's how we would write this using a lazy collection:
Using a lazy collection, we nicely abstracted the pagination, and ended up dealing with a simple collection on our end.
(BTW, the
Collection
class has awhereFirst(...)
method, to get the first item matching the given check without having to spin through the whole collection. With a lazy collection, this is of no concern; calling->where(...)->first()
will stop enumerating the values as soon as any item matches).Here's another example. The query builder's
cursor()
method lets us iterate over the records in the database without pulling them out all at once. It currently returns a simpleGenerator
, which is a little crude. We could easily change it to return aLazyCollection
:This would allow us to treat the result of a cursor the same way as a regular
Collection
instance:This will get the IDs of the first 20 posts that the user may view. Doing this with a native generator would be a lot more code, and far less readable.
Another use-case would be reading the lines of a multi-GB log file. Now imagine every log is 4 lines long, so we want to be able to easily read 4 lines, pass it off to a value object, and inspect it.
I can keep going, but I think I've made my point. Lazy collections are of immense use whenever we're not dealing with an existing (finite) set of items.
Implementation
The way this PR implements lazy collections is as follows:
Create a new
LazyCollection
class, which implements most of the methods available on theCollection
class, with identical functionality.Create a new
Enumerable
interface, so that both collection classes can adhere to the same contract.Create a new
EnumeratesValues
trait. This houses all of the methods that have an identical implementation for both collection classes.In order to reuse as much code as possible, the
LazyCollection
class has 4 types of methods:Those that have an identical implementation to the
Collection
class. These methods are actually in theEnumeratesValues
trait, and are shared between the two classes.Those for which it has its own implementation.
Those methods, such as sorting and grouping, which cannot be done incrementally; you need the full dataset in order to tranform the collection. In these cases, the lazy collection will defer to the standard collection class for the actual operation, but only when the values are actually enumerated. So long as no values are being pulled out of the lazy collection, no operation will run prematurely.
Those methods that don't return a collection. For example, the
implode
method. In these cases, it simply calls that method on the standard collection and returns its value.The
SupportCollectionTest
has been updated to use PHPUnit data providers, to make sure that both collection classes are tested.Commits
For cleanliness, for organization, and to ease review of this PR, the commits have been thoughtfully constructed and compartmentalized:
Create
Enumerable
contract - this commit simply creates the newEnumerable
contract, which contains all methods that will be common to both collection classes. It also moves the additional interfaces from theCollection
'simplements
clause to theEnumerable
'sextends
clause.Extract reusable methods into the
EnumeratesValues
trait - this commit creates a newEnumeratesValues
trait, and moves many of theCollection
's methods into that trait. This trait will later be shared with theLazyCollection
class.Support all Enumerables in proxy - This simply switches the typehint for the
HigherOrderCollectionProxy
class fromCollection
toEnumerable
, so that it'll support both collection classes.Use
collectionClassProvider
to inject the collection class into each test - To prepare the tests to test both implementations, we'll use PHPUnit data providers to inject the collection class (as a string) into each test method that should test both implementations. Without changing the actual tests, every method now instantiates the class being passed into it, instead of the hard-codedCollection
class.Add Enumerable skip() method - Since
skip
is an important method for lazy collections, we'll first add it here to theCollection
class, theEnumerable
interface, and the tests.Create
LazyCollection
class - Finally, after all this preparation, we can actually create the implementation for theLazyCollection
. To test it, we simply add it as an entry in thecollectionClassProvider
method, and now all existing tests will also test those methods against theLazyCollection
implementation!Add
lazy()
method to standard collection - Finally, we add alazy()
method to theCollection
class, to easily get a lazy collection from a regular collection.Phew! I do hope that if this is merged, these precious commits won't be squashed into oblivion 😛