Skip to content

Conversation

@cjihrig
Copy link
Contributor

@cjihrig cjihrig commented Oct 29, 2018

WIP continuation of #416

Details

Some questions about the current API design:

  • uv_fs_opendir() currently allocates a uv_dir_t that is used when reading and closing the directory. It's not clear to me where the matching free() should live (uv_fs_req_cleanup(), uv_fs_close_dir(), both, other). It might be better to pass the uv_dir_t to uv_fs_opendir() to avoid the issue.
  • How should errors in uv__fs_readdir() be handled? Memory is allocated via uv__strdup(). It can be freed via uv_fs_req_cleanup(). But, what condition should the outputs be left in if one of the readdir() calls fails in the middle of the call to uv__fs_readdir()?
  • In general, I think this attaches too much new information to uv_fs_s, which impacts all other fs operations.

@cjihrig
Copy link
Contributor Author

cjihrig commented Nov 10, 2018

Anyone have any thoughts on the points in the OP?

@Fishrock123
Copy link
Contributor

@cjihrig I think it would be more libuv-style to pass uv_dir_t to uv_fs_opendir()?

which impacts all other fs operations.

Impacts in what way?

@cjihrig
Copy link
Contributor Author

cjihrig commented Dec 4, 2018

Impacts in what way?

I didn't mean anything too drastic. Adding fields to a struct that is shared between all fs operations means increased memory on all operations. If we initialize the new fields to some value, then it's also a (very minor) performance hit.

@rijnhard
Copy link

Not to nag but any update?

@Fishrock123
Copy link
Contributor

@cjihrig Can uv_dir_t "extend" (not that C has "extend" functionality) uv_fs_t? (Or have the new struct have it as a field.)

It may be worthwhile to have this return/use a non-uv_fs_t struct, based on what you said.

@Fishrock123
Copy link
Contributor

Fishrock123 commented Feb 11, 2019

Not 100% sure this would work but this is what I had in mind: Fishrock123/libuv@91e5a09...06fa01b

(Compiles but tests have some failures.)

@Fishrock123
Copy link
Contributor

@cjihrig Any thoughts?

@cjihrig
Copy link
Contributor Author

cjihrig commented Mar 18, 2019

@Fishrock123 sorry for the delay. Looking at this again I'm thinking...

Instead of adding the fields dir, dirents, and nentries to the uv_fs_t, maybe we could just add the uv_dir_t* to uv_fs_t, and add everything else to the uv_dir_t. That would prevent the problem of adding a bunch of things to uv_fs_t, while keeping a similar API to the rest of the fs functions.

Maybe the uv_dir_t* could even go in the UV_FS_PRIVATE_FIELDS to avoid breaking ABI.

I'll take another look at this tonight with fresh eyes.

@guymguym
Copy link

Hey @cjihrig I still don't see mutex around the call to linux readdir()...

@cjihrig cjihrig force-pushed the readdir branch 3 times, most recently from 99c1f24 to 2fb2d05 Compare March 20, 2019 16:06
@cjihrig
Copy link
Contributor Author

cjihrig commented Mar 20, 2019

I've refactored the unix implementation (including adding the mutex), and I think I've got this to a place where it can land on the v1.x branch. I still want to do a bit more cleanup, and I haven't updated the Windows implementation yet.

@cjihrig cjihrig force-pushed the readdir branch 11 times, most recently from 9050b07 to 18fc857 Compare March 24, 2019 03:10
@cjihrig
Copy link
Contributor Author

cjihrig commented Mar 26, 2019

@guymguym
Copy link

guymguym commented Mar 26, 2019

Hey @bnoordhuis @cjihrig

Thanks for your replies!

That isn't so different from multiple uv_fs_write() requests for the same file descriptor, for example. We don't make any promises about correctness or order there, either.

IMHO there is quite a difference with concurrent uv_fs_write() calls - the worst outcome is that the caller will not know which write won - which is expected for any unsynchronized write operations in any storage API known to man, and also the case between different processes. With concurrent uv_fs_readdir(), which is an unsynchronized read operation, the effect is unspecified and will likely include memory corruptions and crash depending on the underlying implementation.

Note that the reason that readdir_r() was deprecated by POSIX is that struct dirent is poorly defined in a cross platform way and the caller cannot reliably send a length for the provided d_name buffer. I think that this POSIX behavior is unfortunate to have been created in the first place (just needed to standardize d_reclen?), and a modern library like libuv should not expose these kind of unneeded POSIX complexities.

Anyway, if you only care for how Node.js uses this API then I am sure it can handle synchronization on its own, but as a general library this is a TMI API :)

Because the code currently isn't equipped to deal with that. The mutex guards the dir->dir handle but not the dir->dirents array. Two concurrent threads will overwrite each other's entries in that array.

Oh I just noticed that dirents are passed on the dir structure itself. I would actually expect that dirents and ndirents be passed to the uv_fs_readdir call as arguments, and store it in req->ptr as a request scope struct. And actually, since the underlying impl is producing entries one by one anyway, isn't it best to design the API to return a single entry at a time? that would simplify it even further for the caller.

Thanks!

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, LGTM. Since Colin wrote it, I'll defer to him on whether or not to make allowances for thread-safety, but I think it's fine the way it is.

@bnoordhuis
Copy link
Member

You mean something more in the shape of uv_fs_scandir*...?

uv_fs_scandir() has an iterator interface for pulling out the results but the actual work is done in a single round-trip through the thread pool. uv_readdir() is different because it segments the work (i.e., does multiple round-trips.)

After thinking about it some more, I think the uv_readdir() interface is fine. I can live with it being slightly different from everything else.

@guymguym
Copy link

Hey @bnoordhuis @cjihrig

I had a second thought about what I wrote above.

I still think it is safer and simpler to the caller to implement with a mutex internally, but I also understood that concurrent calls to uv_fs_readdir() would probably not make sense from a usage perspective because it's a streaming API that maintains the dir position inside the DIR structure (there is no POSIX API for a "stateless" call to readdir(dir, position) like we have for files read(file, offset)). So the caller will inevitably have to serialize the readdir calls to collect all the entries, and therefore the mutex overhead is redundant.

Anyway, I also think the PR looks good.
Too bad we can't just protect the calling code from making that mistake.

Thanks!

cjihrig added a commit to cjihrig/libuv that referenced this pull request Mar 26, 2019
Co-authored-by: Julien Gilli <[email protected]>
Co-authored-by: Jeremy Whitlock <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
PR-URL: libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv#175
Refs: nodejs/node#583
Refs: libuv#416
Refs: libuv#170
Co-authored-by: Julien Gilli <[email protected]>
Co-authored-by: Jeremy Whitlock <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
PR-URL: libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv#175
Refs: nodejs/node#583
Refs: libuv#416
Refs: libuv#170
@cjihrig
Copy link
Contributor Author

cjihrig commented Mar 26, 2019

Squashed the commits down. Final CI before landing with no related failures: https://ci.nodejs.org/view/libuv/job/libuv-test-commit/1327/

Landing. Finally 🎉. Thanks everyone involved in this PR and all of its previous iterations.

@cjihrig cjihrig merged commit 99440bb into libuv:v1.x Mar 26, 2019
@cjihrig cjihrig deleted the readdir branch March 26, 2019 23:08
njlr pushed a commit to buckaroo-pm/libuv that referenced this pull request Apr 5, 2019
Co-authored-by: Julien Gilli <[email protected]>
Co-authored-by: Jeremy Whitlock <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
PR-URL: libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv#175
Refs: nodejs/node#583
Refs: libuv#416
Refs: libuv#170
Fishrock123 added a commit to Fishrock123/node that referenced this pull request Oct 7, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: nodejs#583
Refs: libuv/libuv#2057
Fishrock123 added a commit to nodejs/node that referenced this pull request Oct 8, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: #583
Refs: libuv/libuv#2057

PR-URL: #29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
BridgeAR pushed a commit to nodejs/node that referenced this pull request Oct 9, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: #583
Refs: libuv/libuv#2057

PR-URL: #29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
addaleax pushed a commit to nodejs/quic that referenced this pull request Oct 9, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: nodejs/node#583
Refs: libuv/libuv#2057

PR-URL: nodejs/node#29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
addaleax pushed a commit to nodejs/quic that referenced this pull request Oct 9, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: nodejs/node#583
Refs: libuv/libuv#2057

PR-URL: nodejs/node#29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
addaleax pushed a commit to nodejs/quic that referenced this pull request Oct 11, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: nodejs/node#583
Refs: libuv/libuv#2057

PR-URL: nodejs/node#29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
juanarbol pushed a commit to juanarbol/quic that referenced this pull request Dec 17, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: nodejs/node#583
Refs: libuv/libuv#2057

PR-URL: nodejs/node#29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
juanarbol pushed a commit to juanarbol/quic that referenced this pull request Dec 17, 2019
This adds long-requested methods for asynchronously interacting and
iterating through directory entries by using `uv_fs_opendir`,
`uv_fs_readdir`, and `uv_fs_closedir`.

`fs.opendir()` and friends return an `fs.Dir`, which contains methods
for doing reads and cleanup. `fs.Dir` also has the async iterator
symbol exposed.

The `read()` method and friends only return `fs.Dirent`s for this API.
Having a entry type or doing a `stat` call is deemed to be necessary in
the majority of cases, so just returning dirents seems like the logical
choice for a new api.

Reading when there are no more entries returns `null` instead of a
dirent. However the async iterator hides that (and does automatic
cleanup).

The code lives in separate files from the rest of fs, this is done
partially to prevent over-pollution of those (already very large)
files, but also in the case of js allows loading into `fsPromises`.

Due to async_hooks, this introduces a new handle type of `DIRHANDLE`.

This PR does not attempt to make complete optimization of
this feature. Notable future improvements include:
- Moving promise work into C++ land like FileHandle.
- Possibly adding `readv()` to do multi-entry directory reads.
- Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation.

Refs: nodejs/node-v0.x-archive#388
Refs: nodejs/node#583
Refs: libuv/libuv#2057

PR-URL: nodejs/node#29349
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: David Carlier <[email protected]>
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Jul 23, 2025
Co-authored-by: Julien Gilli <[email protected]>
Co-authored-by: Jeremy Whitlock <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
PR-URL: libuv/libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv/libuv#175
Refs: nodejs/node#583
Refs: libuv/libuv#416
Refs: libuv/libuv#170
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Dec 16, 2025
Co-authored-by: Julien Gilli <[email protected]>
Co-authored-by: Jeremy Whitlock <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
PR-URL: libuv/libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv/libuv#175
Refs: nodejs/node#583
Refs: libuv/libuv#416
Refs: libuv/libuv#170
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants