_halide_buffer_crop() needs to check for runtime failures #6402

steven-johnson · 2021-11-09T22:50:49Z

We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption: it can call device_crop(), which can fail either because of an unexpected runtime error or because a backend simply leaves the device_crop field at its default (unimplemented) value, as is currently the case for the OGLC backend.

When this happens, the dst buffer is left in an inconsistent, invalid state (which is what led to the crashes fixed by #6401).

This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and wraps all callers in a require() clause that checks the result. (This is not optimal, of course, since the specific error returned by device_crop is dropped on the floor, but the existence of an error is no longer ignored.)

This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend.

(Also: drive-by whitespace fix in CodegenC)
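
A minimal, self-contained sketch of the behavior described above may help. It does not use the real Halide runtime: `ToyBuffer`, `DeviceCropFn`, `toy_buffer_crop`, and the two hook functions are hypothetical stand-ins invented for illustration, and the error code is arbitrary. The sketch only shows the shape of the fix, where a failing device crop turns into a nullptr return that the caller can test.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified stand-in for a buffer descriptor.
// It is NOT halide_buffer_t; it keeps only what the sketch needs.
struct ToyBuffer {
    uint64_t device = 0;  // nonzero means a device allocation exists
    int32_t min = 0;
    int32_t extent = 0;
};

// Stand-in for a backend's device_crop hook: 0 on success, nonzero on failure.
using DeviceCropFn = int (*)(const ToyBuffer *src, ToyBuffer *dst);

// Models a backend that leaves the hook at an unimplemented default.
inline int device_crop_unimplemented(const ToyBuffer *, ToyBuffer *) {
    return -1;  // arbitrary nonzero error code chosen for this sketch
}

// Models a backend whose device crop succeeds.
inline int device_crop_ok(const ToyBuffer *src, ToyBuffer *dst) {
    dst->device = src->device;
    return 0;
}

// Returns dst on success and nullptr if the device crop fails.
// As in the description, the specific error code is discarded;
// only the fact that an error occurred is reported.
inline ToyBuffer *toy_buffer_crop(const ToyBuffer *src, ToyBuffer *dst,
                                  int32_t min, int32_t extent,
                                  DeviceCropFn device_crop) {
    dst->min = min;
    dst->extent = extent;
    dst->device = 0;
    // Host-only buffers never reach the device hook.
    if (src->device != 0 && device_crop(src, dst) != 0) {
        return nullptr;
    }
    return dst;
}
```

A caller in this sketch would compare the result against nullptr before touching `dst`, which is the role the require() wrapper plays in the generated code.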

abadams · 2021-11-09T22:56:49Z

src/StorageFolding.cpp

                vector<Expr> new_args = op->args;
                new_args[3] = new_mins;
-                expr = Call::make(op->type, op->name, new_args, op->call_type);
+                internal_assert(op->type == type_of<halide_buffer_t *>() &&


This mutates an existing buffer_crop call, so isn't this going to redundantly wrap it in another require?

hmm, yes, I suppose it could -- let me see if it will get optimized away or not

The gotcha here is that this code transmutes _halide_buffer_crop() -> _halide_buffer_set_bounds(_halide_buffer_crop()); if we leave out the require(), we can risk passing nullptr to the enclosing call. I suppose we could just make that call check for an incoming nullptr and bail early...
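
The "check for an incoming nullptr and bail early" option mentioned above could look roughly like the following. This is a sketch in plain C++ rather than Halide runtime code: `MiniBuffer`, `mini_crop`, and `mini_set_bounds` are hypothetical names, and the `device_crop_fails` flag exists only to simulate a failure.

```cpp
#include <cstdint>

// Hypothetical one-dimensional buffer descriptor used only in this sketch.
struct MiniBuffer {
    int32_t min = 0;
    int32_t extent = 0;
};

// Stand-in for a crop that can fail; the flag simulates a device_crop error.
inline MiniBuffer *mini_crop(MiniBuffer *dst, int32_t min, int32_t extent,
                             bool device_crop_fails) {
    if (device_crop_fails) {
        return nullptr;
    }
    dst->min = min;
    dst->extent = extent;
    return dst;
}

// Stand-in for the enclosing call. It propagates a null input instead of
// dereferencing it, so a failed inner crop surfaces as a null result.
inline MiniBuffer *mini_set_bounds(MiniBuffer *buf, int32_t min, int32_t extent) {
    if (buf == nullptr) {
        return nullptr;  // bail early: the inner crop already failed
    }
    buf->min = min;
    buf->extent = extent;
    return buf;
}
```

With this shape, `mini_set_bounds(mini_crop(...), ...)` needs only one null check on the outermost result, at the cost of every wrapping call having to tolerate a null argument.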

abadams

The other runtime calls are checked using AssertStmts after the lets, not by wrapping a require. For consistency, I think it would be better to add a non-null AssertStmt for each entry in cropped_buffers to the pre_call stmt in the loop on line ~790 (edit: in ScheduleFunctions.cpp)

steven-johnson · 2021-11-09T23:10:05Z

The other runtime calls are checked using AssertStmts after the lets, not by wrapping a require.

Not so: look at the code you just commented on, StorageFolding.cpp:132

abadams · 2021-11-09T23:14:10Z

Fair enough, I guess both idioms are in use.
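
For readers unfamiliar with the two idioms being compared, here is a loose analogy in plain C++. It is not Halide IR: `require_that`, `Crop`, `fake_crop`, and both caller functions are hypothetical, and exceptions stand in for the runtime's error path purely so the sketch is easy to exercise.

```cpp
#include <stdexcept>

// Hypothetical analogue of require(condition, value): yields value when the
// condition holds and reports an error otherwise.
template<typename T>
T require_that(bool condition, T value, const char *message) {
    if (!condition) {
        throw std::runtime_error(message);
    }
    return value;
}

struct Crop {
    int id;
};

// Stand-in for a crop call that returns nullptr on failure.
inline Crop *fake_crop(Crop *dst, bool fails) {
    return fails ? nullptr : dst;
}

// Idiom 1: the check wraps the call, so the checked value can be nested
// directly inside another expression.
inline Crop *crop_wrapped(Crop *dst, bool fails) {
    Crop *result = fake_crop(dst, fails);
    return require_that(result != nullptr, result, "crop failed");
}

// Idiom 2: bind the result first (the "let"), then assert on the bound
// name in a separate statement before any use.
inline int consume_after_let(Crop *dst, bool fails) {
    Crop *cropped = fake_crop(dst, fails);
    if (cropped == nullptr) {
        throw std::runtime_error("crop failed");
    }
    return cropped->id;
}
```

The first form keeps the check attached to the expression, which matters when a pass rewrites or nests the call; the second keeps all checks together as statements, which is the consistency argument made in the review.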

(Alternate to #6402) We currently assume that _halide_buffer_crop() will never fail. This is a bad assumption: it can call device_crop(), which can fail either because of an unexpected runtime error or because a backend simply leaves the device_crop field at its default (unimplemented) value, as is currently the case for the OGLC backend. When this happens, the dst buffer is left in an inconsistent, invalid state (which is what led to the crashes fixed by #6401). This change modifies _halide_buffer_crop() to return nullptr in the event of an error, and ensures that all cropped buffers are checked for null at the right point. (This is not optimal, of course, since the specific error returned by device_crop is dropped on the floor, but the existence of an error is no longer ignored.) This addresses at least some of the failure issues we are seeing in performance_async_gpu with the OpenGLCompute backend. (Also: drive-by whitespace fix in CodegenC)

steven-johnson · 2021-11-09T23:29:41Z

The other runtime calls are checked using AssertStmts after the lets, not by wrapping a require. For consistency, I think it would be better to add a non-null AssertStmt for each entry in cropped_buffers to the pre_call stmt in the loop on line ~790 (edit: in ScheduleFunctions.cpp)

Please see #6403

steven-johnson · 2021-11-09T23:44:15Z

Closing in favor of #6403

* _halide_buffer_crop() needs to check for runtime failures (v2)
* Oops

steven-johnson requested review from abadams and shoaibkamil November 9, 2021 22:50

Merge branch 'master' into srj/halide-buffer-crop

24a8ddd

abadams reviewed Nov 9, 2021

steven-johnson mentioned this pull request Nov 9, 2021

_halide_buffer_crop() needs to check for runtime failures (v2) #6403

Merged

steven-johnson closed this Nov 9, 2021
