Conversation

@dagitses (Collaborator) commented Sep 9, 2021

Stack from ghstack:

The previous implementation was wrong when a CPU scheduling affinity
is set; in fact, it is still wrong when Ninja is not being used.

When a CPU scheduling affinity is set, the number of processors
available on the system likely exceeds the number of processors that
the build may actually use. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.
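
For illustration, a minimal sketch of that computation (not the
literal build-script code; `os.sched_getaffinity` is Linux-only, so
this falls back to the total processor count elsewhere):

```python
import os

def effective_num_jobs() -> int:
    # Prefer the set of CPUs this process is actually allowed to run on.
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is unavailable (e.g. macOS, Windows);
        # fall back to the total processor count.
        return os.cpu_count() or 1
```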

This change is more minimal: it simply delegates to Ninja (which
handles this correctly) when Ninja is used.

Test Plan:
I verified this works correctly with Ninja on a 96-core machine
with 24 cores available for scheduling by checking that:

  • the cmake command did not specify "-j"
  • the number of top-level jobs in top/pstree never exceeded 26 (24 + 2)

I also verified that we get the legacy behavior by specifying
USE_NINJA=0 for the build.

Finally, I verified that MAX_JOBS trumps everything when specified.
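
As a rough sketch of the resulting precedence (the variable names
here are stand-ins for illustration, not the actual setup.py code):

```python
import multiprocessing
import os

# Stand-ins for illustration; not the real build-script variables.
use_ninja = os.environ.get("USE_NINJA", "1") != "0"
build_args = []

max_jobs = os.environ.get("MAX_JOBS")
if max_jobs is not None:
    # MAX_JOBS trumps everything when specified.
    build_args += ["-j", max_jobs]
elif not use_ninja:
    # Legacy, affinity-unaware behavior for non-Ninja builds.
    build_args += ["-j", str(multiprocessing.cpu_count())]
# When Ninja is used and MAX_JOBS is unset, no "-j" is passed at all;
# Ninja picks the parallelism itself and handles CPU scheduling
# affinity correctly.

print(build_args)
```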

Differential Revision: D30968796

@facebook-github-bot (Contributor) commented Sep 9, 2021

💊 CI failures summary and remediations

As of commit 2e28130 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.

dagitses pushed a commit that referenced this pull request Sep 9, 2021
ghstack-source-id: 3ed3bf6
Pull Request resolved: #64733
@dagitses marked this pull request as ready for review September 9, 2021 12:06
@dagitses requested a review from malfet September 9, 2021 17:54
dagitses pushed a commit that referenced this pull request Sep 15, 2021
ghstack-source-id: c3cc538
Pull Request resolved: #64733
@dagitses requested a review from driazati September 15, 2021 18:56
@dagitses (Collaborator, Author) commented:

@dagitses has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dagitses (Collaborator, Author) commented:

@dagitses has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dagitses pushed a commit that referenced this pull request Sep 16, 2021
ghstack-source-id: 752dcd0
Pull Request resolved: #64733
dagitses pushed a commit that referenced this pull request Sep 16, 2021
ghstack-source-id: b9e932c
Pull Request resolved: #64733
@dagitses (Collaborator, Author) commented:

@dagitses has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dagitses pushed a commit that referenced this pull request Sep 16, 2021
ghstack-source-id: 03c51be
Pull Request resolved: #64733
@dagitses (Collaborator, Author) commented:

@dagitses has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dagitses pushed a commit that referenced this pull request Sep 16, 2021
ghstack-source-id: 5061a5f
Pull Request resolved: #64733
@dagitses (Collaborator, Author) commented:

@dagitses has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dagitses pushed a commit that referenced this pull request Sep 17, 2021
ghstack-source-id: 3457894
Pull Request resolved: #64733
@dagitses (Collaborator, Author) commented:

@dagitses has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@dagitses merged this pull request in 047e682.

@facebook-github-bot deleted the gh/dagitses/27/head branch September 21, 2021 14:17
@cloudhan (Contributor) commented Sep 22, 2021

@dagitses @malfet @driazati

This PR causes a build failure on Windows when building with

$env:MAX_JOBS="15"
python setup.py develop

which fails with:

cmake --build . --target install --config Release -- /p:CL_MPCount=15
ninja: error: unknown target '/p:CL_MPCount=15'

See comment
047e682#r56870845
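
The error shows the MSBuild-only flag `/p:CL_MPCount` being handed to
Ninja, which does not understand it. A minimal sketch of
generator-aware flag selection (the helper name is hypothetical, not
the actual setup.py code):

```python
def parallelism_flags(max_jobs, use_ninja):
    # Hypothetical helper: pick the parallelism flag that matches the
    # CMake generator actually driving the build.
    if use_ninja:
        # Ninja understands plain -j.
        return ["-j", str(max_jobs)]
    # /p:CL_MPCount is an MSBuild property, valid only for the
    # Visual Studio generator.
    return ["/p:CL_MPCount={}".format(max_jobs)]

print(parallelism_flags(15, use_ninja=True))   # ['-j', '15']
print(parallelism_flags(15, use_ninja=False))  # ['/p:CL_MPCount=15']
```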

@dagitses (Collaborator, Author) commented:

> This PR causes a build failure on Windows when building with `$env:MAX_JOBS="15"; python setup.py develop`, failing with `ninja: error: unknown target '/p:CL_MPCount=15'` (see 047e682#r56870845)

Thanks! @malfet is fixing this in #65444.

facebook-github-bot pushed a commit that referenced this pull request Sep 22, 2021
Summary:
Reported by cloudhan in #64733 (comment)

Fixes regression introduced by 047e682

cc malfet seemethere

Pull Request resolved: #65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
malfet added a commit to malfet/pytorch that referenced this pull request Oct 5, 2021
malfet added a commit that referenced this pull request Oct 5, 2021