Optimize disjointed unions & differences (+ introduce caching of costly nef<->polyset conversions) by ochafik · Pull Request #3636 · openscad/openscad

ochafik · 2021-01-29T00:30:02Z

Hi all!

This PR introduces a fast-union feature that avoids using Nef polyhedra for unions of "clearly disjoint" solids (as per their respective bboxes). It simply concatenates PolySets in that case (converting from Nef polyhedra if needed), which is much quicker (Nef joins themselves are very slow, but creating Nef polyhedra can be very slow too at the first place). It plays well with lazy-union (and some upgrades to it I'm proposing in #3637).

Edit: As suggested below, now also testing for fast-unionability of subtracted terms in differences (fast-difference feature).

It also makes full use of the caches for Nef polyhedra <-> polysets conversions in the process.

As it turns out, spheres love caching: the following example now renders in 5 sec (instead of 4min30sec; will add more examples here):

// openscad --enable=fast-union test.scad -o test.stl

N = 5;

// When this is true, we end up with lots of overlaps.
// Rendering is still faster than before thanks to the caching of the spheres,
// but the true benefits will come from the reordering of unioned bodies, see details below.
overlap = false;

// A simple grid of smooth spheres
union() {
  for (i=[0:N-1], j=[0:N-1])
    translate([i, j, 0])
      sphere(d=(overlap ? 1.1 : 0.9), $fn=50);
}

(less dramatic, but this brick wall benchmark's rendering time is almost cut in half)

Some implementation comments: (suggested entry point for review: CGALUtils::applyUnion3DFast):

Added a BoundingBoxes helper, which allows testing a new solid's bbox against all the previously unioned solids' bounding boxes in the union (allows optimizing non-overlapping grid patterns). It currently does a crude linear search on those bounding bboxes (could speed up with some smart CGAL structure but this isn't the new bottleneck anyway)
Added a LazyGeometry helper, which does nef ↔ polyset conversions on demand, provides their bboxes and does disjointed unions or normal ones (using / filling the cache in the process). Because of some pathological cases, even disjointed unions may fall back to Nef polyhedra operations (as a result, need to keep the original polysets, which prevents in-place concatenation: copying and checking increasingly large PolySets over can become the bottleneck in large grids of objects, for instance the example above with N=10). May want to revisit the costly detection of such cases + fallback strategy (edit: added an optional -DOPTIMISTIC_FAST_UNION define to disable these checks: the spheres then render in 700ms).
The order of unions matters now even more than ever. Currently I'm only using the order of the children, but I plan to find "easy" groups of disjoint operands that can be unioned as polysets (say, looking at morton invervals overlaps of the operands' bounding boxes), to minimize the number of Nef polyhedra built and operated on. Much potential for parallelism here too.
I had to switch some APIs to return shared_ptr and propagate the changes, hence a bit of diff noise, sorry.
There are other cache-unaware code paths (e.g. intersection / difference / minkowski) that I'd be happy to tackle in a follow up.

Managed to get all tests green locally (macosx), except for differences-tests for which I had to do some compromise.

Welcoming comments & suggestions / happy to refactor.

…f non-overlapping solids

…ed polysets

…full unions)

Crucial now that we're concatenating polysets everywhere!

Difference somehow seems marked green on Nef bodies, which is an issue when we now only have PolySets after unions (when --enable=fast-union, which I set for all tests). Updated expectation + added a union-free test that checks colors

… older boost version linux builds run with (should address build error on ubuntu 18, which has Boost 1.65.1 ; ubuntu 20 has Boost 1.71.0 which has has_value)

…oint union Also allow skipping the (relatively) costly pathological case detection with `-DOPTIMISTIC_FAST_UNION` (e.g. `qmake ... QMAKE_CSSFLAGS+=-DOPTIMISTIC_FAST_UNION ...` to build on mac)

thehans · 2021-01-29T14:58:42Z

Seems like a very good ideal overall. I get substantial speedup for the sample script with overlap=false;
3.5s vs 82s with fast-union off, about 23x speedup.

And just for fun, I wrote a variation of your script which is more cache-friendly (it makes the inner loop / entire columns of spheres cache-able)

union() {
  for (i=[0:N-1]) translate([i,0,0])
    for(j=[0:N-1]) translate([0,j,0])
      sphere(d=(overlap ? 1.1 : 0.9), $fn=50);
}

That gets my render time down to 0.732s, which is outstanding!

...However, going back to your original script and changing to overlap=true;, my testing actually shows that fast-union is significantly slower than before:
fast-union OFF: Total rendering time: 0:01:30.999
fast-union ON: Total rendering time: 0:03:08.003
So just over 2x slower with fast-union enabled.
And I similar speeds of ~1m30s on latest nightly snapshot, and even 2019.05 release as well.

// Rendering is still faster than before thanks to the caching of the spheres,

Since I can't reproduce any speedup, I'm curious what you mean about caching spheres? I haven't had a chance to look over the whole PR yet, but spheres should have already gone into Geometry Cache, no different from other Polysets.

And on the topic of prioritizing order of operations, I don't know if you've seen it, but the discussion in #1234 and my subsequent PR #3121 are maybe worth looking at.

As a quick example of why combining least-complex-geometries first, is a reasonable heuristic:
Say we have 8 objects A,B,C...H, which are all disjoint and have the same complexity.
The main cost here is basically memory accesses to read input operands, and write the output.
Since these all have same complexity, lets call it 1 Standardized Memory Access Cost Unit (SMACU)

If we just union them in order: ((((((A+B)+C)+D)+E)+F)+G)+H
Then each subsequent operation costs increasingly more:
1+1 (read) = 2 (write) (4 total SMACUs for this op)
2+1 = 3 (6 SMACUs)
3+1 = 4 (8 SMACUs)
...
7+1 = 8 (16 SMACUs)
In the end we get (4+6+8+10+12+14+16) = Grand Total 70 SMACUs

If we instead pick the lowest complexity pair between inputs and intermediate results at each step, then we end up looking more like this: ((A+B)+(C+D))+((E+F)+(G+H))
1+1=2 (4)
1+1=2 (4)
1+1=2 (4)
1+1=2 (4)
2+2 = 4 (8)
2+2 = 4 (8)
4+4 = 8 (16)
(4+4+4+4+8+8+16) = Grand Total 48 SMACUs

Differences would be even more dramatic if for example the first object A had a cost of 100, while the remaining objects stayed at cost = 1.

Anyways, using the "Morton" Z-Order curve you mentioned (i had to look it up) seems promising as well. I had also thought about mapping bounds or centroid to a position on a 3D Hilbert Curve to facilitate sorting by proximity. More or less the same idea, just using a different space filling curve.

ochafik · 2021-01-29T20:35:46Z

Seems like a very good ideal overall. I get substantial speedup [...]

Thanks for trying things out!

variation of your script which is more cache-friendly

Hah, neat trick!

what you mean about caching spheres? [...] spheres should have already gone into Geometry Cache, no different from other Polysets.

Sorry I was a bit cryptic: the spheres' PolySets are indeed in the geometry cache, but when unioning N of them we convert each one to a Nef polyhedron independently (createNefPolyhedronFromGeometry in applyUnion3D) and don't cache that Nef. When I ran with $fn=100 (that is where I saw the speedup), each conversion alone was taking 7 seconds on my laptop.

The questionably-named LazyGeometry helper provides a transparent on-demand conversion & cache behaviour and I think could potentially be used beyond the union case.

I don't know if you've seen it, but the discussion in #1234 and my subsequent PR #3121 are maybe worth looking at.

I hadn't, thanks for the great pointers! I hadn't understood the extent of the problem that queue was solving, thanks for elaborating.

I'm working on refactoring the code to preserve the queue logic ("almost" done), will ping this thread when it's ready (moving the BoundingBoxes object inside the LazyGeometry, lots of the code should stay the same).

pick the lowest complexity pair between inputs and intermediate results at each step

I'd assume the most efficient way to multithread this would be work-stealing from the same queue with a proper lock?

mapping bounds or centroid to a position on a 3D Hilbert Curve to facilitate sorting by proximity.

I haven't played with Hilbert's curve yet, could be a better pick (thanks for adding the links ;-)). In any case, overlaps of the 1D intervals corresponding to a bounding box in the curve space may not result in actual overlaps of the 3D bounding boxes, so will still need some tests on the bboxes but it should make it more likely for the heuristic to find good non-overlaps in some bounded time. Can't wait to give it a try :-)

ochafik · 2021-01-29T20:47:52Z

more cache-friendly (it makes the inner loop / entire columns of spheres cache-able)

Oh, and this opens an entirely new can of worms. A rendering tree optimizer might be able to detect repeating subpatterns and reproduce this manual optimization... Something I'd love to explore (see my ramblings in #3637 about tree rewrites).

Aligns applyUnion3DFast on applyUnion3D's behaviour. Can be disabled w/ -DFAST_UNION_LINEAR_UNION for comparison

Confirmed noticeable speed up on large disjoint grids

ochafik · 2021-01-30T04:48:52Z

I've retrofitted the original queue logic and did see sensible improvements, although haven't done thorough benchmarks yet.

(Also, I tried caching intermediate results to see if it could somehow approximate your manual optimization: it resulted in lots of cache evictions, so not a free pass just yet; left it out of the last batch of commits)

lgtm-com · 2021-01-30T06:18:20Z

This pull request introduces 1 alert when merging 8ea55bc into 99cae25 - view on LGTM.com

new alerts:

1 for Inconsistent definition of copy constructor and assignment ('Rule of Two')

thehans · 2021-01-30T12:27:24Z

what you mean about caching spheres? [...] spheres should have already gone into Geometry Cache, no different from other Polysets.

Sorry I was a bit cryptic: the spheres' PolySets are indeed in the geometry cache, but when unioning N of them we convert each one to a Nef polyhedron independently (createNefPolyhedronFromGeometry in applyUnion3D) and don't cache that Nef. When I ran with $fn=100 (that is where I saw the speedup), each conversion alone was taking 7 seconds on my laptop.

~~I tried again with $fn=100 and i still get 2x slowdown:~~

N=5, overlap=true, $fn=100
nightly:    Total rendering time: 0:06:31.726
fast-union: Total rendering time: 0:12:57.206

Edit: Took me a while from starting this reply to finishing it, and in the meantime I pulled your latest changes and retested.
I am seeing some speedup now even with overlap = true;, so that's good.

N=5, overlap=true, $fn=50
nightly:    Total rendering time: 0:01:32.899
fast-union  Total rendering time: 0:01:17.869

The questionably-named LazyGeometry helper provides a transparent on-demand conversion & cache behaviour and I think could potentially be used beyond the union case.

OK, this sounds a bit similar to the smartCache* functions in GeometryEvaluator. It is supposed to handle getting from and inserting to the proper cache: GeometryCache vs CGALCache, but I think in many ways it is suboptimal, and not particularly "smart" in how it operates. Maybe the LazyGeometry functions can be used to replace this instead?
I definitely would prefer more general solutions that can be applied in multiple places over having tons of niche classes for every little thing.

I wonder about the actual complexity of Nef unions, I feel like there might be some log complexity lookups (do they use some bounding box hierarchy of faces to compute intersections?) that would make the costs even less linear?

I don't know precisely, but I'm pretty sure its very close to linear time by total number of faces. IIRC @rcolyer helped show this to me a while ago by writing some scripts and benching them at various complexities, maybe around when I was writing PR 3121.

pick the lowest complexity pair between inputs and intermediate results at each step

I'd assume the most efficient way to multithread this would be work-stealing from the same queue with a proper lock?

I don't know much about work stealing, but i guess that implies each thread would have maintain its own separate queue of work to be stolen from? I think it would be better to have a centralized queue so that ordering can be applied across threads. So I would probably start with a basic design where all threads share a single priority queue and just pull from and push to that (with locking), something like:

lock priority queue
if worker has a result (from previous loop iteration) then insert onto the queue
if (queue.size < 2) { release lock; break; }
pop next two objects
release lock
do work
loop to 1)

But yes proper locking will be crucial for any shared resources (and there's a lot of static / global data around the codebase, for various geometry caches, stat cache, module cache, stackchecker, progress bar, console output handlers etc. which will require careful consideration). There has been some talk, about working towards minimizing such global state wherever possible so that its more compatible with multi-threading.

I haven't played with Hilbert's curve yet, could be a better pick (thanks for adding the links ;-)). In any case, overlaps of the 1D intervals corresponding to a bounding box in the curve space may not result in actual overlaps of the 3D bounding boxes, so will still need some tests on the bboxes but it should make it more likely for the heuristic to find good non-overlaps in some bounded time. Can't wait to give it a try :-)

Another idea is maybe to just calculate volume of bbox intersections.
If you assume uniformly distributed vertices/faces, then you could estimate that the number of faces removed by a union would be roughly:
NumFaces(A) * Volume( bbox(A) ⋂ bbox(B) ) / Volume( bbox(A) ) + NumFaces(B) * Volume( bbox(A) ⋂ bbox(B) ) / Volume( bbox(B) )

I don't think this would be compatible in any way with the priority queue as-is, since that only calculates cost of individual geometries and not pairs of them.
Evaluating all possible pairs would also be a O(n^2) operation for n geometries, which might be too much overhead to do in-between every operation, maybe there could be some threshold of face count, below which the code doesn't bother to find optimal bounding box pairs etc, and falls back to priority queue. For example if a script is only doing union on a huge number axis aligned cubes, I would think checking bounding boxes etc could be as much overhead or more compared to just doing the operation itself.

Its hard to predict the impact (positive or negative) of various optimization attempts without just implementing and measuring them directly.

thehans · 2021-01-30T12:44:48Z

Oh, and this opens an entirely new can of worms. A rendering tree optimizer might be able to detect repeating subpatterns and reproduce this manual optimization...
(Also, I tried caching intermediate results to see if it could somehow approximate your manual optimization: it resulted in lots of cache evictions, so not a free pass just yet; left it out of the last batch of commits)

I don't think this type of optimization will be possible, due to the way caching works. Cache keys are basically just the substring / subtree of the CSG Tree which you see from "Design -> Display CSG Tree..."

The manual optimization works by splitting the translate into two separate operations. So the inner loop gets repeated verbatim, as a child of the outer translate. Without splitting up the translates, then each one will always be unique, so it would be pretty tough I think to match patterns like that, and even if something was devised it would probably have to be specific to grid-like translations, which is a niche solution.

Also, I will probably expand some more in your other PR eventually, but keep in mind that CSGTreeNormalizer already basically converts to a sum of products form. You can see CSGTreeNormalizer::match_and_replace for the replacement rules etc.
But this is only used for preview and not tied to CGAL evaluation. The OpenCSG rendering library requires that the CSG terms be in this form, but I have doubts that it would be helpful for CGAL due to the way that the number of terms tend to multiply to quite large values for certain combinations of ops.

ochafik · 2021-01-30T18:27:59Z

I am seeing some speedup now

Yeah for me it only seems significant with large $fn values (i.e. when nefs are costly to create)

The questionably-named LazyGeometry helper provides a transparent on-demand conversion & cache behaviour and I think could potentially be used beyond the union case.

OK, this sounds a bit similar to the smartCache* functions in GeometryEvaluator. It is supposed to handle getting from and inserting to the proper cache: GeometryCache vs CGALCache, but I think in many ways it is suboptimal, and not particularly "smart" in how it operates. Maybe the LazyGeometry functions can be used to replace this instead? I definitely would prefer more general solutions that can be applied in multiple places over having tons of niche classes for every little thing.

I've seen it, and agree! I've split this logic to a mixed_cache.h file as a starting point (with a meta-templated getOrSet that picks the right cache(s) for the asked type). Would it be fine if I attempted to unify cache access in a subsequent PR? (this one is already getting large)

So I would probably start with a basic design where all threads share a single priority queue and just pull from and push to that (with locking), something like [...]

Glad we're on the same page on this.

I've actually taken a stab at this a few weeks ago: parallel_reduce.h. I wasn't using the priority queue mechanism yet, and the rest of that branch would need rethink or refresh. It does naive parallelism using futures: I essentially swapped shared_ptr<const Geometry> for lazy_ptr<const Geometry>, which is a drop-in replacement that blocks when the actual geom is dereferenced (lazy_ptr.h, pardon my clunky C++). I unlocked a few blocking calls (i.e. in caching code or everywhere that wants to know about the actual geometry's class or dimension) and make a few calls async (transforms, operations... which use that parallel_reduce, but really should probably just use TBB), and despite being globally thread-unsafe, it still seemed very stable (prolly didn't try to crash it hard enough).

My take away from that experiment was that unions were still too slow even with multithreading (and the evaluation tree is too deep, which serializes lots of operations), hence the lazy union extravaganza from #3637 (to flatten the tree) and now this fast union route.

FYI here's the real-world model that motivated this month of hacking: it renders 20x faster with all my (monothreaded) features combined.

BTW: would you be open to having some benchmark model collection + scripts? (I'm sure there's cool tools to use?) I'd be happy to contribute.

[...] calculate volume of bbox intersections. [...]

Interesting idea! (even if we don't apply it here, could help with the cost function of some global optimizer... thinking out loud, but imagine if we could build a gross, octree-based approximation of the model with just the leaf #facet & bbox information? we might have wild estimates of the number of vertices in each quadrant after CSG operations, which could help rule out some global tree rewrites)

Evaluating all possible pairs would also be a O(n^2) [...] some threshold [...] Its hard to predict the impact (positive or negative) of various optimization attempts without just implementing and measuring them directly.

So... I couldn't resist, I did a crude heuristic that tries to find pair-wise disjoint geometries in the following neigbhours, examining up to K geometries unsuccessfully before calling it a day. But that K allowance is reset if it gathered something in its snowball. Since K is a constant, the search isn't quite quadratic, but I'm too lazy to find out if how much more than linear it actually is.

I ran a few tests and it cuts the N=5 overlap=true example by a sizeable amount (full test results + code): the single loop case is 2x faster than baseline, your nested loops trick is 35% faster than baseline.

The code is a bit crude so I'll work on it it to ensure we fast-union the solids in the right order before I commit anything, but I'm pleased :-D

Btw, added some copyright headers to the new files as per my employer's wishes (I'm doing all this just for fun, though), please let me know if that's an issue.

caching intermediate results to see if it could somehow approximate your manual optimization: it resulted in lots of cache evictions, so not a free pass just yet; left it out of the last batch of commits)

I don't think this type of optimization will be possible, due to the way caching works. Cache keys are basically just the substring / subtree of the CSG Tree which you see from "Design -> Display CSG Tree..."

What I did is to retain a sorted set of the cache keys of the geometries that have been unioned in the LazyGeometry, so that I can create a semantic key ("union(){" + join(componentKeys) + "}) that's order invariant (order of children and/or operations, e.g. same key for union{ a; union { b; c; } } as for union{ a; b; c; } or for union{ c; b; a; }, especially if we flatten the unions with the other PR). An issue though is the priority queue would need to dynamically prioritize already cached subsets of the operands (should be possible?)

I haven't looked enough at the cache size issue but maybe with persistent caching it could become okay to store all that extra data?

keep in mind that CSGTreeNormalizer already basically converts to a sum of products form. You can see CSGTreeNormalizer::match_and_replace

I saw this too early in my discovery of the codebase, concluded it wasn't used in rendering and completely forgot about it! 🤦‍♂️

lgtm-com · 2021-01-30T18:42:46Z

This pull request introduces 1 alert when merging be8ab70 into 99cae25 - view on LGTM.com

new alerts:

1 for Inconsistent definition of copy constructor and assignment ('Rule of Two')

ochafik · 2021-01-30T18:44:25Z

Maybe the LazyGeometry functions can be used to replace this instead? I definitely would prefer more general solutions that can be applied in multiple places over having tons of niche classes for every little thing.

I've seen it, and agree! I've split this logic to a mixed_cache.h file as a starting point

(oh, I think I now see what you mean, did you suggest also moving the on-demand conversion logic to that centralized cache facade? lots more code could get speed ups with that, e.g. intersections / differences)

t-paul · 2021-01-30T19:28:34Z

src/bounding_boxes.h

+ * OpenSCAD (www.openscad.org)
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or


GPL2-only does not work for OpenSCAD, while the OpenSCAD code itself is GPL2+, CGAL which is a very core part of the application is GPL3+, so we need the GPL2+ (or at your option) any later version definition to build an application that can be distributed.

Hi Olivier,

* OpenSCAD (www.openscad.org) * Copyright 2021 Google LLC

I would also note that this part is a little confusing, since under the standard structure for copyright notices, it looks like Google wrote OpenSCAD (which of course was not your intention). Of course, Google via you is a contributor now, and it's perfectly reasonable to note that to satisfy your employer's conditions on your flex time. To prepare ahead for the new files being modified by other contributors after being integrated, and to include the GPL2+ status without being wordy, perhaps a good notice is something like:

// Portions of this file are Copyright 2021 Google LLC, and licensed under GPL2+. See COPYING.

Do you think this would satisfy it adequately?

Oops good to know, I should have checked, now fixed.

@rcolyer sorry, replied before seeing your comment. I'll reword to something less ambiguous indeed!

Used your suggestion verbatim, hope it wasn't copyrighted ;-)

…ested by reviewer

lgtm-com · 2021-01-30T22:48:45Z

This pull request introduces 1 alert when merging 7b6e2d2 into 99cae25 - view on LGTM.com

new alerts:

1 for Inconsistent definition of copy constructor and assignment ('Rule of Two')

thehans · 2021-01-30T23:15:52Z

I think could potentially be used beyond the union case
...
(oh, I think I now see what you mean, did you suggest also moving the on-demand conversion logic to that centralized cache facade? lots more code could get speed ups with that, e.g. intersections / differences)

Yeah, I guess I was just meaning that it would be a shame if on demand conversion code ended up not being used beyond the union case, and that it sounds like there's now two places where we have some code choosing between two caches in one way or another. Definitely needs some unified solution, I had at one point looked into making a single GeometryCache which took take a tuple of shared_ptrs for the different types, not sure if that's ideal either, but it was one idea.

Just advocating the DRY principle basically, after all I'm quite proud that my sum contributions to OpenSCAD is in the negative LoC :D

What I did is to retain a sorted set of the cache keys of the geometries that have been unioned in the LazyGeometry, so that I can create a semantic key ("union(){" + join(componentKeys) + "}) that's order invariant (order of children and/or operations, e.g. same key for union{ a; union { b; c; } } as for union{ a; b; c; } or for union{ c; b; a; }, especially if we flatten the unions with the other PR). An issue though is the priority queue would need to dynamically prioritize already cached subsets of the operands (should be possible?)

I see, that's clever and maybe useful in some cases, but I'm not sure. I still don't see it helping with the original script's translate([i,j,0]) ...; because that single loop's translate* never has any repeating values for the combined [i,j,0] ... every single translate is unique. Does that make sense?

* Well its technically a multmatrix node, after being formatted as CSG Tree text.

I haven't looked enough at the cache size issue but maybe with persistent caching it could become okay to store all that extra data?

If there is still some use case for caching intermediate results like that, then maybe instead of appending disjoint geometries to Polysets, we could just useGeometryList (or maybe a new derived DisjointGeometryList?) so that only the shared_ptrs are copied.

As an aside, if you are interested in a more immediate / informal discussion, I'd invite you to join the IRC channel: #openscad on irc.freenode.net
The three of us: t-paul, rcolyer, and I are pretty much on there 24/7 along with some other regular users, and happy to discuss new ideas for improving the project, help with scad scripts, etc.

ochafik · 2021-01-31T19:21:07Z

Ok, so I'm currently reworking the code to be faster, and less complex.

My current approach is to do an analysis phase where I partition union operands into clusters of mutually non-overlapping geometries (no LazyGeometry class anymore). Some of these clusters will be of size 1. Then I reduce each of these non-overlapping clusters by appending in-place to a PolySet (avoids many copies, especially in pathological cases with, say, hundreds of thousands of small geometries). At the end of each cluster reduction I do a single check of integrity and abort that cluster if it's not closed or valid ("explode" it to clusters of size 1 for subsequent normal union), to amortize the costs of integrity checks (which should rarely fail anyway?). The resulting clusters of size 1 are slow-unioned as usual.

I've got a working proof of concept (using CGAL::spatial_sort and CGAL::Union_find) and I'm working on switching to CGAL::box_intersection_d for faster bbox tests.

All this to say, please hold off before reviewing too much of this as I hope the code will become smaller (will also try and merge some of the LazyGeometry methods with existing cache helpers as suggested by @thehans ).

Added CGALUtils helpers: spatialSort, generalized bbox, getNumberOfFacets.

…ing CGAL packages (*massive* speedup) Uses blazing fast PMP csg operations, with in-place updates as much as possible. Detects non-manifold meshes not handled by PMP and reverts back to Nefs (to avoid costly checks, convex polysets are trusted, but also introduced --enable=trust-manifold to allow speedups on large STL imports, for instance). Also includes "fast-union" trick introduced in openscad#3636 (concatenate disjoint bodies) Speed improvements in the 10x to 100x range for my various grid patterns. More improvements to come when Polyhedron will become a Geometry subclass (should be able to cache a read-only version of them), and chain CSG operations instead of converting to PolySet as currently done.

ochafik · 2021-02-02T23:38:11Z

So, that algorithm wasn't such a good idea after all, it did find large patches of fast-unionable polysets and that bit was fast, but if there were any overlaps left it ended up unioning much larger nef polyhedra at the end, making things slower for some large models.

I've now pursued a different route in #3641, which is to use the Polygon Mesh Processing library (and to salvage the fast-union idea from this PR, but that's now almost a detail).

That PR completely obsoletes this one so I'll close this one :-)

…ing CGAL packages (*massive* speedup) Uses blazing fast PMP csg operations, with in-place updates as much as possible. Detects non-manifold meshes not handled by PMP and reverts back to Nefs (to avoid costly checks, convex polysets are trusted, but also introduced --enable=trust-manifold to allow speedups on large STL imports, for instance). Also includes "fast-union" trick introduced in openscad#3636 (concatenate disjoint bodies) Speed improvements in the 10x to 100x range for my various grid patterns. More improvements to come when Polyhedron will become a Geometry subclass (should be able to cache a read-only version of them), and chain CSG operations instead of converting to PolySet as currently done. Create scoped_timer.h Some cleanups Update test expectations (5 failing tests left) Conditional import path for PMP::is_non_manifold_vertex (moved files in https://doc.cgal.org/5.1/Polygon_mesh_processing/group__PMP__repairing__grp.html#ga121f588ac324938d9a6b6931a08661e1 Conditional import, take two [fast-csg] Require CGAL 5.0+ for now (compile fast polyhedron away in earlier versions) Most of the functionality still seems possible in 4.x (e.g. is_non_manifold_vertex available from 4.14), but that's too many hoops to go through for this initial version. Require CGAL 5.1.2 :-( include <CGAL/version.h> otherwise the feature isn't built, ever, d'oh [fast-csg] Renaming to FastPolyhedron to avoid naming clashes w/ OGL class [fast-csg] Add helpers to simplify boost::variant code Fix scoped performance timings syntax -DPERFORMANCE_TIMINGS [fast-csg] Drastically reduce manifoldness check costs w/ PMP::duplicate_non_manifold_vertices instead of quadratic calls to PMP::is_non_manifold_vertex [fast-csg] Pass manifoldness through PolySet [fast-csg] Rename: CGALPolyhedron Update test expectation (different triangulation of nonplanar polys) [fast-csg] Don't invert faces in CGALUtils::createPolyhedronFromPolySet [fast-csg] Surface_mesh cgal utils + copy fixes (exact conversions) [fast-csg] Fence cgal hash helpers behind ENABLE_CGAL Exact hull geometry conversion [fast-csg] Optimize differences + add fast-csg-exact Difference sped up by union their subtracted terms. Exact mode converts to Nefs instead of PolySet. It's transitional as the new poly class will soon become a Geometry, so conversion will only happen at the top of the visitation [fast-csg] CGALUtils: transform, getGeometryAs{NefPolyhedron,PolySet} Also, handle CGALPolyhedron in createNefPolyhedronFromGeometry [fast-csg] CGALPolyhedron extends Geometry! Also, a few helpers: - ResultObject.asMutableGeometry - Geometry.{transform, resize} Note that GeometryEvaluator force-converts to PolySet at the root level for now, for higher compatibility w/ exports [fast-csg] Hull for CGALPolyhedron [fast-csg] Fix convexity on editable geom [fast-csg] Fix a few crashes + move resize logic to CGALUtils [fast-csg] Fix a few crashes + move resize logic to CGALUtils [fast-csg] Fix a few crashes + move resize logic to CGALUtils [fast-csg] Download CGAL 5.2 in {uni,macosx}-build-dependencies.sh [fast-csg] Disable failing tests. One last regression to fix. remove files added by mistake [fast-csg] remove unused code in export_stl [fast-csg] Fix stl export regression [fast-csg] Add invert_orientation option to CGALUtils::createPolyhedronFromPolySet defaults to true for backwards compatibility [fast-csg] Attempt to have Travis download CGAL 5.2 on Linux [fast-csg] Take2 of CGAL 5.2 on travis linux [fast-csg] omit fast-csg on more tests [fast-csg] fix finding of boost for uni-build-dependencies cgal [fast-csg] Hunt for cgal in github-ci.sh [fast-csg] (attempt to) fix win build w/o fast-csg support [fast-csg] update test expectations Update lazyunion-difference-for-expected.png [fast-csg] fix build w/ old cgal versions [fast-csg] More/fix #ifdef FAST_CSG_AVAILABLE fences [fast-csg] Fix minkowksi crash + label skipped tests [fast-csg] Give CGAL 5.2 a spin in mxe builds (uses / tests my mxe branch for now, but should be upstreamed to openscad/mxe then mxe/mxe once it works) [fast-csg] Attempt to force CGAL 5.2 onto circleci setup Revert "[fast-csg] Attempt to force CGAL 5.2 onto circleci setup" This reverts commit 8f83001. [fast-csg] Try using images w/ cgal5.2 Built with: docker run -it openscad/mxe-i686-deps:latest /bin/bash git remote set-url origin https://github.com/ochafik/mxe.git git pull git checkout openscad-cgal-5.2 make MXE_PLUGIN_DIRS=plugins/gcc7 cgal MXE_TARGETS=i686-w64-mingw32.static.posix -j 4 JOBS=4 rm -fR .ccache/ tmp-* pkg/ rm -fR usr/bin/x86_64-* usr/x86_64-* usr/*/gcc/x86_64-* exit docker commit `docker ps -a | head -n 2 | tail -n 1 | awk '{ print $1 }'` ochafik/openscad-mxe-i686-deps-fast-csg docker push ochafik/openscad-mxe-i686-deps-fast-csg docker run -it openscad/mxe-x86_64-deps:latest /bin/bash git remote set-url origin https://github.com/ochafik/mxe.git git pull git checkout openscad-cgal-5.2 make MXE_PLUGIN_DIRS=plugins/gcc7 cgal MXE_TARGETS=x86_64-w64-mingw32.static.posix -j 4 JOBS=4 rm -fR .ccache/ tmp-* pkg/ rm -fR usr/bin/i686-* usr/i686-* usr/*/gcc/i686-* exit docker commit `docker ps -a | head -n 2 | tail -n 1 | awk '{ print $1 }'` ochafik/openscad-mxe-x86_64-deps-fast-csg docker push ochafik/openscad-mxe-x86_64-deps-fast-csg docker run -it openscad/appimage-x86_64-base:latest /bin/bash export CGAL_VERSION=5.2 curl -L --insecure https://github.com/CGAL/cgal/releases/download/v${CGAL_VERSION}/CGAL-${CGAL_VERSION}-library.tar.xz --output CGAL-${CGAL_VERSION}.tar.xz xz -f -d CGAL-${CGAL_VERSION}.tar.xz tar -xf CGAL-${CGAL_VERSION}.tar cd CGAL-${CGAL_VERSION} cmake . -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release make install cd .. rm -fR CGAL- exit docker commit `docker ps -a | head -n 2 | tail -n 1 | awk '{ print $1 }'` ochafik/openscad-appimage-x86_64-base-fast-csg docker push ochafik/openscad-appimage-x86_64-base-fast-csg Phony change to restart circleci (docker images hadn't finished pushing) [fast-csg] don't rely on broken custom images fix typo in circleci image names [fast-csg] only build w/ 1 CPU tp avoid OOM [fast-csg] Fix compilation w/ CGAL 5.0.2 circleci: use medium+ resource class https://circleci.com/docs/2.0/configuration-reference/#resource_class Revert "circleci: use medium+ resource class" This reverts commit 8a32b95. [fast-csg] Rename: CGALHybridPolyhedron [fast-csg] reduce object code size by splitting some utils out [fast-csg] More code split + drop unused file [fast-csg] fix build [fast-csg] remove some dead code [fast-csg] Proactively fix face orientation (this will probably break gcc again, will fix tomorrow) Revert "[fast-csg] Proactively fix face orientation" This reverts commit c523fe7. [fast-csg] Fix orientations of polygons automatically Helps with models built with sweep [fast-csg] update polyhedron-tests expectations Update polyhedron-tests-expected.png added a comment to openscad.desktop file Update release list and fix order of entries. Update release notes. Initial install target for cmake. Add SUFFIX option for installation. Add suffix replacement for desktop and appstream files. Ensure template is always replaced. Allow OPENSCAD_VERSION to be set as build parameter. Use GET instead of SUBLIST which is only available in cmake >= 3.12. Don't use VERSION_GREATER_EQUAL which is only available in cmake >= 3.7. Update console text colors for dark theme Sets the default background color to transparent and sets the foreground to black for text that has a set background color. Fix resource path with suffix for cmake build. Convert CircleCI AppImage build to cmake. Init MCAD git submodule. Update command line for gpg signing. [fast-csg] Add missing -DCGAL_USE_GMPXX to cmake [fast-csg] Smaller object files (more cgalutils splitting) + remove now-useless PolySet.manifold [fast-csg] fix build [fast-csg] More code split... cgal-minkowski.o is now largest object file, can't go down? [fast-csg] Reenable fast-csg that was just.. compiled away. [fast-csg] Add a "Compile w/ CGAL 4.x" github workflow + make circleci builds fast again Update polyhedron-tests-expected.png [fast-csg] fix tests (stlcgalpngtest_polyhedron-tests and monotonepngtest_polyhedron-tests share same golden, d'oh) [fast-csg] fix build Delete legacy-cgal.yml [fast-csg] Run tests w/ different expectations separately [fast-csg] fix skipping of fast-csg tests Update travis-ci-before-install-linux.sh [fast-csg] fast-csg with CGAL 4.x on github ci w/ cmake (+ avoid weird lazy-union interaction) [fast-cgal] one build per github workflow file? [fast-csg] update expectations (somehow these tests are only run on windows?) [fast-csg] minkowski3-erosion needs different expectation. Also CSG.scad isn't heavy w/ fast-union Cleanup some comments & dead code [fast-csg] Use corefinement w/o preconditions, check their return value & catch their exceptions. Also, support using CGAL_Kernel3 w/ latest 5.2.x branch [fast-csg] Drop difference "optimization", obsolete remnant from earlier iterations of the code [fast-csg] Rehabilitate vertex tests: found tests where corefinement leaves polys in states that nef chokes on [fast-csg] fix bbox regression (for-tests) [fast-csg] add edge-cases tests (literally) [fast-csg] Bit more logs [fast-csg] fix build w/ CGAL < 5.1 [fast-csg] fix build [fast-csg] mawr... lawgs! (let's be a bit verbose, shall we?) [fast-csg] Drop the grid! [fast-csg] Skip mysterious test (new failure only on CI - not on local mac) [fast-csg] Support CGAL 5.2.x branch from github in uni-build-dependencies.sh [fast-csg] Test more CGAL versions (including 5.2.x branch for latest fixes) [fast-csg] Attempt to fix the cgal versions github workflow matrix Update cgal-versions.yml [fast-csg] Disable shared vertices testing (unless -DFAST_CSG_TEST_SHARED_VERTICES is set) [fast-csg] Don't use corefinement on non-closed polyhedra (e.g. testdata/scad/3D/issues/issue1223.scad) [fast-csg] DIsable fast-csg for mirror-tests, as it triggers a corefinement condition that leaves polyhedra in a state that can't be converted to a nef! Update cgal-versions.yml Update cgal-versions.yml [fast-csg] Don't test w/ 5.2.x branch, it doesn't build yet /openscad/libraries/include/CGAL/boost/graph/copy_face_graph.h:28:10: fatal error: boost/iterator/function_output_iterator.hpp: No such file or directory #include <boost/iterator/function_output_iterator.hpp> [fast-csg] A single manifoldness check is enough in union

Olivier Chafik added 16 commits January 28, 2021 00:47

Add fast-union feature: fixes a cache loophole + fast-tracks unions o…

062b3ce

…f non-overlapping solids

Move no_get_cache_key_fn to LazyGeometry

a1a68c4

[fast-union] Union of exact/double bbox types using boost::variant

a984984

[fast-union] Bypass faulty PolySet::is_convex detection on concatenat…

6cf96dc

…ed polysets

[lazy-union] Fix bbox of nefs (don't convert to polyhedron)

2b18e82

[fast-union] Handle pathological cases (quantize vertices, revert to …

eee3690

…full unions)

[fast-union] Fix PolySet::append(PolySet): clear convexity mark

4283533

Crucial now that we're concatenating polysets everywhere!

[fast-union] Clear is_convex hack now that it's fixed

7fb1073

[fast-union] progress_tick and some echo

2d4478b

[fast-union] fix segfault when there's no node

14b12ed

Allow extra experimental options in cgalstlsanitytest

ff104e9

Enable fast-union in all tests: one last failure left (difference-tests)

85e40c6

Light reformat + comments

8775711

[fast-union] Fix case of empty polysets (test for empty bbox)

1a37d07

Fix some stray tabs. I hate tabs.

91f60dd

ochafik mentioned this pull request Jan 29, 2021

Exploration of even lazier unions (bubbling through modules & transforms) for large speed ups #3637

Draft

Olivier Chafik added 5 commits January 29, 2021 01:03

[fast-union] Add lazy_geometry.cc to cmake file (needed for linux build)

e1f30b9

[fast-union] Don't rely on boost::optional::has_value as missing from…

e5d90ea

… older boost version linux builds run with (should address build error on ubuntu 18, which has Boost 1.65.1 ; ubuntu 20 has Boost 1.71.0 which has has_value)

remove stale comment

40ec56d

Update render-tests-expected.png

6e20c8e

[fast-union] Remove superfluous quantizeVertices and copies from disj…

753dd74

…oint union Also allow skipping the (relatively) costly pathological case detection with `-DOPTIMISTIC_FAST_UNION` (e.g. `qmake ... QMAKE_CSSFLAGS+=-DOPTIMISTIC_FAST_UNION ...` to build on mac)

Olivier Chafik added 3 commits January 30, 2021 04:25

[fast-union] Order union by increasing facet count

5cc61ce

Aligns applyUnion3DFast on applyUnion3D's behaviour. Can be disabled w/ -DFAST_UNION_LINEAR_UNION for comparison

[fast-union] Remove FAST_UNION_LINEAR_UNION option

9d66c53

Confirmed noticeable speed up on large disjoint grids

Update highlight-modifier-expected.png

8ea55bc

Add copyright headers

be8ab70

t-paul reviewed Jan 30, 2021

View reviewed changes

Olivier Chafik added 2 commits January 30, 2021 19:54

Fix license headers to align with rest of codebase

2a9c4b2

[fast-union] Leaner, less ambiguous licence/copyright headers as sugg…

7b6e2d2

…ested by reviewer

Olivier Chafik added 5 commits February 1, 2021 01:51

[fast-union] Much faster algorithm, less code.

481619e

Added CGALUtils helpers: spatialSort, generalized bbox, getNumberOfFacets.

[fast-union] New fast-difference feature (because it's easy now)

5b4576f

Merge remote-tracking branch 'upstream/master' into fast_union

55aa17a

Update openscad.pro

e08a006

Enable fast-difference for all tests

31f7da2

ochafik changed the title ~~Optimize disjointed unions (and fix unions' cache loophole)~~ Optimize disjointed unions & differences (+ introduce caching of costly nef<->polyset conversions) Feb 1, 2021

ochafik mentioned this pull request Feb 2, 2021

Blazingly fast rendering using Polygon Mesh Processing when possible #3641

Closed

8 tasks

ochafik closed this Feb 2, 2021

ochafik mentioned this pull request Feb 27, 2023

Use Manifold for much faster & multithreaded CSG & minkowski operations #4533

Merged

12 tasks

Uh oh!

Conversation

ochafik commented Jan 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thehans commented Jan 29, 2021

Uh oh!

ochafik commented Jan 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ochafik commented Jan 29, 2021

Uh oh!

ochafik commented Jan 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lgtm-com bot commented Jan 30, 2021

Uh oh!

thehans commented Jan 30, 2021

Uh oh!

thehans commented Jan 30, 2021

Uh oh!

ochafik commented Jan 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lgtm-com bot commented Jan 30, 2021

Uh oh!

ochafik commented Jan 30, 2021

Uh oh!

t-paul Jan 30, 2021

Choose a reason for hiding this comment

Uh oh!

rcolyer Jan 30, 2021

Choose a reason for hiding this comment

Uh oh!

ochafik Jan 30, 2021

Choose a reason for hiding this comment

Uh oh!

ochafik Jan 30, 2021

Choose a reason for hiding this comment

Uh oh!

ochafik Jan 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgtm-com bot commented Jan 30, 2021

Uh oh!

thehans commented Jan 30, 2021

Uh oh!

ochafik commented Jan 31, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ochafik commented Feb 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ochafik commented Jan 29, 2021 •

edited

Loading

ochafik commented Jan 29, 2021 •

edited

Loading

ochafik commented Jan 30, 2021 •

edited

Loading

ochafik commented Jan 30, 2021 •

edited

Loading

ochafik Jan 30, 2021 •

edited

Loading

ochafik commented Jan 31, 2021 •

edited

Loading

ochafik commented Feb 2, 2021 •

edited

Loading