Showing posts with label Java. Show all posts
Showing posts with label Java. Show all posts

Thursday, April 13, 2023

The rJava Curse Strikes Again

Apparently I have not needed the rJava R package in a while, because when I wanted to install an R package today that has rJava as a dependency, it was not there. So I tried to install it (this is on Linux Mint), and of course it failed to install. I have a long history of installation battles with rJava (see here, here and here in chronological order ... or better still don't traumatize yourself by reading them). Why should this time be different?

All my previous battles involved older versions of Java with apparently different directory locations or structures, and none of the previous fixes worked. After considerable aggravation, I found a very helpful post by "datawookie" that nearly got the job done. I did in fact get an error message about "jni" and used the trick in datawookie's post of setting JAVA_HOME to the correct path (in my case to Open JDK 17) as an argument to javareconf. When I then attempted to install rJava, I got a different error ("could not find -lbz2"), which prompted me to install libbz2.dev ("sudo apt-get install libbz2.dev"), after which R was finally able to install rJava (woo-hoo!).

I'd say this is getting ridiculous, but we passed that milestone years ago.

Monday, May 10, 2021

Symmetric Difference of Sets in Java

The symmetric difference of two sets $A$ and $B$ is defined as $$C=(A \cup B)\backslash (A \cap B).$$It is the set of objects that belong to either $A$ or $B$ but not both. I've been working on some Java code for a research project in which I will need to compute the sizes of the symmetric differences of a large number of pairs of sets. The contents of the sets are nonnegative integers (Java type Integer), which makes life a bit simpler because integers are nicely ordered. (In Java-speak, Integer implements the Comparable<Integer> interface.) Since I will be doing a large number of differences, and since the sets involved will be moderately large (by my standards), I wanted to find the fastest way to compute a difference. So I wrote a Java program to generate random pairs of integer sets and compute the size of their symmetric difference four different ways.

Since the introduction of streams in Java (I believe in version 8), what I think is the most obvious/intuitive way to do it is to convert each set to a stream, filter out members of the other set, and count up the survivors. On the other hand, I am a happy user of the Apache Commons Collections library, whose CollectionsUtils class provides not one but two ways to do this. One is to use the subtract() method twice, switching the order of the arguments. This does the same thing that my first (stream-based) method does. The other is to use the disjunction() method, which calculates the symmetric difference in one invocation.

Equally obvious to me from a mathematical perspective, but not from a Java perspective, is to take the bitwise exclusive OR of the characteristic vectors of the two sets. Java contains a BitSet class, with set() and flip() methods for individual bits, which makes this easy to do.

My program runs the experiments on instances of both HashSet<Integer> and TreeSet<Integer>. Hash sets are generally faster, but tree sets have more features (and impose an internal ordering on their contents). My somewhat naive intuition was that having the contents accessed in ascending order would make differencing tree sets faster than differencing hash sets.

There are lots of moving parts here (how large the integers get, how big the sets involved in the comparisons are, ...), so I hesitate to read too much into the results. That said, here are some timing results using sets of cardinality between 100 and 200 with integers from 0 to 100,000. Times are in microseconds and are averages of 10,000 replications. (You can afford big samples when things go this quickly.)


Method Streams subtract()
disjunction() BitSet
HashSet 10.07 36.40 62.15 9.68
TreeSet 23.79 35.37 60.94 7.21

 

I was surprised (OK, shocked) to find that the disjunction() method, which directly computes the symmetric difference, was almost twice as slow as two invocations of the subtract() method. Less surprising was that turning both sets into streams and filtering each other out was faster than calling subtract() twice. (Note that I am computing the size of the symmetric difference, not the symmetric difference itself. Uniting the two filtered streams or the two sets resulting from calls to subtract() into one combined set might move their times closer to those of disjunction()).

Another surprise to me was that the approach using streams was consistently slower with tree sets than it was with hash sets, despite the tree sets being inherently ordered. The characteristic vector (BitSet) approach seemed to have a slight preference for tree sets, but I'm not entirely sure that was consistent.

In this particular run, the characteristic vector approach (using BitSet) beat all comers. In other runs with similar parameters, streams sometimes beat BitSet and sometimes did not when the sets were instances of HashSet. With TreeSet, using BitSet appeared to be consistently faster. Let's contrast that with a second experiment, in which the integers are still between 0 and 100,000 but the sets are smaller (cardinality between 10 and 20).


Method Streams subtract()
disjunction() BitSet
HashSet 3.18 5.96 9.75
6.24
TreeSet 3.11
4.89
8.10
4.07

 

Note that with the smaller sets the method using streams beats the characteristic vector (BitSet) method, and even the Commons Collections subtract() method may be a bit better than using the characteristic vector.

For my project, I am using instances of TreeSet, and I'm fairly certain the sets will (mostly) have cardinality in the low hundreds, so I will probably go with the BitSet approach. If anyone would like to run their own tests, my code is available in a GitLab repository.

Update: It belatedly occurred to me that in my research project, where I only need to get the size of the symmetric difference and not the symmetric difference itself, it might make sense to exploit the fact that the size of the symmetric difference between $A$ and $B$ is$$\vert A\vert + \vert B \vert - 2 \cdot \vert A \cap B\vert.$$So I can compute the size of the intersection, do a little arithmetic, and have the size of the symmetric difference.

I updated my code to include two versions of this approach. One version computes the intersection by streaming one set and filtering it based on inclusion in the other set. The other version uses the CollectionUtils.intersection() method. Once again, the CollectionUtils method was not competitive. Comparing the stream intersection method to the original stream approach and the characteristic vector approach using the original specifications for sets (cardinality 100 to 200), it seems the streamed intersection method is about twice as fast as the original stream method on both hash sets and tree sets (which makes sense, since it does with one stream what the original method did with two). It is also about twice as fast as the characteristic vector method on hash sets, but roughly half as fast on tree sets, as seen in the table below. (Times for the first two methods differ from previous tables because exact timings are unfortunately not reproducible.)


Method Streams BitSet Intersection
HashSet 11.33 10.33 5.59
TreeSet 23.32 7.47
12.23

 

So the single stream intersection approach looks to be the fastest for computing the size (but, again, not contents) of the symmetric difference if I'm using hash sets, while the characteristic vector (BitSet) approach is fastest if I'm using tree sets.

Wednesday, April 28, 2021

A BDD with JGraphT

Decision diagrams, and in particular binary decision diagrams (BDDs) [1], were originally introduced in computer science to evaluate logic propositions or boolean functions. Lately, they've taken on multiple roles in discrete optimization [2]. I've been reading an excellent book [3] on them, with ideas about using BDDs in a current research project. As is my wont, I'll be coding the research in Java, so I wanted to do a little demo project to figure out how to build and process a BDD in Java.

Not wanting to reinvent any wheels, I looked for an open-source Java graph library with which to build the diagrams, and settled on JGraphT [4]. Not only does JGraphT have the necessary building blocks, it has much better online documentation than many libraries. Also, there is a very helpful article [5] about it on the Baeldung web site (which is itself an extremely useful site for all things Java).

A BDD is a directed, acyclic, layered multigraph with edge weights. If you're not familiar with the term "multigraph", it means that there can be two or more distinct edges between the same pair of nodes, in the same direction. In a BDD, each node represents a state of the system, with up to two outbound arcs, corresponding to true (1) or false (0) values for a particular decision variable. The decision variable is the same for all nodes in a particular layer. An arc is omitted if it represents a decision which, given the state, would make the solution infeasible. To keep the size of the BDD in check (somewhat), you do not want multiple nodes in a layer with the same state. The multigraph aspect arises because, in some circumstances, the next state may be the same regardless of the decision at the current node (so both arcs go to the same child node). Among the attractions of the JGraphT library are its support for nodes based on arbitrary classes (which in a BDD means the state at the node) and for multigraphs.

To learn how to build BDDs with JGraphT, I decided to solve a maximal independent set problem (MISP) [6] with integer node weights. This means choosing the subset of nodes with greatest total weight such that no two chosen nodes are adjacent. JGraphT contains routines to generate some well-known (to graph theorists -- less well known to me) graphs, and for convenience I chose the Chvátal graph [7], which has 12 nodes and 24 edges. Here is an illustration of the Chvátal graph, with the (randomly generated) node weights in parentheses.

Chvatal graph
My Java program uses routines in the JGraphT library to turn the graph into a DOT file [8], which it saves. I then use GraphViz [9] outside the Java program to convert the DOT file into the format I need for wherever the plot is going.

Using the same DOT export trick, I managed to generate a plot of the BDD, in which nodes display the set of vertices still available for addition to the independent set, arcs are solid if a vertex is being added to the independent and dotted if not, and solid arcs are annotated with the number of the vertex being added.

BDD graph
Unfortunately, Blogger does not accept SVG images and the BDD is a bit too big for a legible PNG graph. If you want to see a better image, click it and an SVG version should open in a new window or tab.

This post is already a bit long, so I won't go into details about the various coding issues I ran into or how I worked around them. I will point out one minor mathematical issue. Since the MISP is a maximization problem, the goal is to find the longest (in terms of weight, not number of edges) path from root node to terminal node in the BDD. JGraphT has a package containing shortest path algorithms, but no longest path algorithms. Fortunately, the number of layers in the graph is fixed (one layer per decision variable, plus one to hold the terminal node), which means the number $L$ of links in a longest path is fixed. So we simply find the maximum weight $W$ of any node in the graph, change the weight $w_e$ of each edge $e$ to $LW - w_e$, and find the shortest path using the modified weights. That path is guaranteed to be the longest path with respect to the original weights.

Last thing: As usual, my code is available for you to play with from my GitLab repository.

References

[1] Wikipedia entry: Binary decision diagram
[3] Bergman, D.; Cire, A. A.; van Hoeve, W.-J. & Hooker, J. Decision Diagrams for Optimization. Springer International Publishing AG, 2016.
[4] JGraphT library
[6] Wikipedia entry: Maximal independent set
[7] Wikipedia entry: Chvátal graph
[9] Graphviz - Graph Visualization Software

Wednesday, April 21, 2021

Lagrangean Relaxation: The Sequel

In a previous post, I looked at a way to solve a multiple assignment problem (where multiple users can be assigned to each server and each user can be assigned to multiple servers) using Lagrangean relaxation (LR). I won't repeat the details of the problem, or why LR was of interest, here. The post included some computational experiments in R, using CPLEX to get the optimal solution (for confirmatory purposes) and then trying out various nonlinear optimization algorithms on the Lagrangean function.

I've been looking for an open-source, derivative-free nonlinear optimizer (capable of taking box constraints) in Java, and I came across a couple in the Apache Commons Mathematics Library. Wanting to test one of them out, I repeated the experiment with the assignment problem in Java, again using CPLEX to get the optimal solution, and using the BOBYQA algorithm for minimizing the Lagrangean. As is my habit, I've made my Java code available via a GitLab repository for anyone who might want to see it. The Apache Commons library is a bit funky when it comes to using the optimization classes, so I had to do a little trial and error (and considerable staring at the Javadocs), along with a web search for examples. Hopefully my code is simple enough to be easy to digest.


Sunday, August 23, 2020

Multiobjective Optimization in CPLEX

In my previous post, I discussed multiobjective optimization and ended with a simple example. I'll use this post to discuss some of the new (as of version 12.9) features in CPLEX related to multiobjective optimization, and then apply them to the example from the previous post. My Java code can be downloaded from my GitLab repository.

Currently (meaning as of CPLEX version 12.10), CPLEX supports multiple objectives in linear and integer programs. It allows mixtures of "blended" objective functions (weighted combinations of original criteria) and "lexicographic" hierarchical objectives. Basically, you set one or more hierarchy (priority) levels, and in each one you can have a single criterion or a weighted combination of criteria. So the "classical" preemptive priority approach would involve multiple priority levels with a single criterion in each, while the "classical" weighted combination approach would involve one priority level with a blended objective in it. Overall, you are either maximizing or minimizing, but you can use negative weights for criteria that should go the opposite direction of the rest. In the example here, which is a minimization problem, the third priority level gives maximum provider utilization a weight of +1 (because we want to minimize it) and minimum provider utilization a weight of -1 (because we want to maximize it).

There are some limitations to the use of multiple objectives. The ones I think are of most general interest are the following:

  • objectives and constraints must be linear (no quadratic terms); and
  • all generic callbacks and legacy information callbacks can be used, but other legacy callbacks, and in particular legacy control callbacks (branch callbacks, cut callbacks etc.) cannot be used. So if you need to use callbacks with a multiobjective problem, now would be a good time to learn the new generic callback system.

Every criterion you specify has a priority level and, within that priority level, a weight. A feature that I appreciate, and which I will use in the code, is that you can also specify an absolute and/or a relative tolerance for each criterion. The tolerances tell CPLEX how much it can sacrifice in that criterion to improve lower priority criteria. The default tolerance is zero, meaning higher priority criteria must be optimized before lower priority criteria are even considered. A nonzero tolerance basically tells CPLEX that is allowed to sacrifice some amount (either an absolute amount or a percentage of the optimal value) in that criterion in order to improve lower priority criteria.

Defining the variables and building the constraints of a multiobjective model is no different from a typical single criterion model. Getting the solution after solving the model is also unchanged. The differences come mainly in how you specify the objectives and how you invoke the solver.

To build the objective function, you need to use one of the several overloads of IloCplex.staticLex(). They all take as first argument a one dimensional array of expressions IloNumExpr[], and they all return an instance of the new interface IloCplexMultiCriterionExpr. In addition to an array of objective expressions, one of the overloads lets you also specify arrays of weights, priorities and tolerances (absolute and relative). That's the version used in my sample code. 

This brings me to a minor inconvenience relative to a conventional single objective problem. Ordinarily, I would use IloCplexModeler.addMinimize(expr) or IloCplexModeler.addMaximize(expr) to add an objective to a model, where expr is an instance of IloNumExpr. I naively thought to do the same here, using the output of staticLex() as the expression, but that is not (currently) supported. There's no overload of addMinimize() or addMaximize() that accepts a multicriterion expression. So it's a three step process: use cplex.staticLex(...) to create the objective and save it to a temporary variable (where cplex is your IloCplex instance); pass that variable to either cplex.minimize(...) or cplex.maximize(...) and save the resulting instance of IloObjective in a temporary variable; and then invoke cplex.add(...) on that variable.

When you are ready to solve the model, you invoke the solve() method on it. You can continue to use the version of solve() that takes no arguments (which is what my code does), or you can use a new version that takes as argument an array of type IloCplex.ParameterSet[]. This allows you to specify different parameter settings for different priority levels.

Other methods you might be interested in are IloCplex.getMultiObjNSolves() (which gets the number of subproblems solved) and IloCplex.getMultiObjInfo() (which lets you look up a variety of things that I really have not explored yet).

The output from my code (log file), which is in the repository, is too lengthy to show here, but if you want you can use this link to open it in a new tab. Here's a synopsis. I first optimized each of the three objective functions separately. (Recall that maximum and minimum provider utilization are blended into one objective.) This gives what is sometimes referred to as the "Utopia point" or "ideal point". This is column "U" in the table below. Next, I solved the prioritized multiobjective problem. The results are in column "O" of the table. Finally, to demonstrate the ability to be flexible with priorities, I resolved the multiobjective problem using a relative tolerance of 0.1 (10%) for the top priority objective (average distance traveled) and 0.05 (5%) for the second priority objective (maximum distance traveled). Those results are in column "F".


U O F
Avg. distance 14.489 14.489 15.888
Max distance 58.605 58.605 60.000
Utilization spread 0.030 0.267 0.030
Max utilization 0.710 0.880 0.710
Min utilization 0.680 0.613 0.680

There are a few things to note.

  1. The solution to the multiobjective model ("O") achieved the ideal values for the first two objectives. One would expect to match the ideal value on the highest priority objective; matching on the second objective was luck. The third objective (utilization spread) was, not surprisingly, somewhat worse than the ideal value.
  2. Absolute and relative tolerances appear to work the same way that absolute and relative gap tolerances do: if a solution is within either absolute or relative tolerance of the best possible value on a higher priority objective, it can be accepted. In the third run, I set relative tolerances but let the absolute tolerances stay at default values.
  3. The relative tolerances I set in the last run would normally allow CPLEX to accept a solution with an average travel distance as large as $(1 + 0.1)*14.489 = 15.938$ and a maximum travel distance as large as $(1 + 0.05)*58.605 = 61.535$. There is a constraint limiting travel distance to at most 60, though, which supersedes the tolerance setting.
  4. The "flexible" solution (column "F") exceeds the ideal average distance by about 9.7%, hits the cap of 60 on maximum travel distance, and actually achieves the ideal utilization spread. However, without knowing the ideal point you would not realize that last part. I put a fairly short time limit (30 seconds) on the run, and it ended with about a 21% gap due to a very slow-moving best bound.

I'll close with one last observation. At the bottom of the log, after solving the "flexible" variant, you will see the following lines.

Solver status = Unknown.
Objective 0: Status = 101, value = 14.489, bound = 14.489.
Objective 1: Status = 101, value = 58.605, bound = 58.605.
Objective 2: Status = 107, value = 0.030, bound = 0.023.
Final value of average distance traveled = 15.888.
Final value of longest distance traveled = 60.000.
Final value of maximum provider utilization = 0.710.
Final value of minimum provider utilization = 0.680.

The first four lines are printed by CPLEX, the last four by my code. Note the mismatch in objective values of the first two criteria (bold for CPLEX, italic for my results). CPLEX prints the best value it achieved for each objective before moving on to lower priority objectives. When you are using the default tolerances of zero (meaning priorities are absolute), the printed values will match what you get in the final solution. When you specify non-zero tolerances, though, CPLEX may "give back" some of the quality of the higher priority results to improve lower priority results, so you will need to recover the objective values yourself.

Thursday, August 20, 2020

Multiobjective Optimization

Multiobjective optimization (making "optimal" decisions involving multiple, frequently conflicting, criteria) is a big subject. I will only nibble at the fringes of it here. In the next post, I'll describe recent additions to CPLEX that facilitate solving some multiobjective problems.

Among the various approaches to multiobjective problems, two are probably the most common, weighting and prioritization. The first approach is to merge the various criteria into a single one, usually (almost always?) by taking a weighted sum of the criteria. The CPLEX documentation refers to this as a blended objective. For this to make sense, the units of the various criteria really should be commensurable (e.g., all monetary values), but I'm pretty sure having criteria that are not commensurable doesn't stop people from trying. The weights serve two roles. First, they bring the units into some semblance of parity (so if $f()$ is in dollars and $g()$ in millions of dollars, $g()$ gets a weight roughly on millionth the size of the weight of $f()$). Second, they convey relative importance of various criteria.

The second approach is to prioritize the criteria. The solver initially optimizes the highest priority criterion, without regard to any others. Once an optimal value of the highest priority criterion is known, maintaining that value becomes a constraint, and the solver moves to the second highest priority criterion, and so on. The CPLEX documentation refers to this as a lexicographic objective, meaning that the objective function is vector-valued rather than scalar-valued, and optimization means achieving the lexicographically largest or smallest objective vector possible. A variant of this allows a little "slippage" in the value of each criterion, so that for example the solver can accept a solution that is 1% below optimal on the first criterion in return for optimizing the second criterion. A key limitation here is the solver will trade any amount of degradation in a lower priority criterion, no matter how much, for any improvement in a higher priority criterion, no matter how small.

Although they are not relevant to the recent CPLEX additions, I will mention two other approaches. One is a variant of the priority method, known as goal programming (GP). This was originally developed as an extension of linear programming, but the same general approach can be extended to problems with integer variables. The user sets target levels for each criterion, and then prioritizes them. If a goal is underachieved, work on meeting lower priority goals cannot sacrifice any amount of the higher priority criterion. On the other hand, if a goal is overachieved, any portion of the overachievement can be sacrificed in the quest to reach a lower priority goal. An interesting attribute of goal programming is that the same criterion can be used with more than one goal. Suppose that you are building a GP model allocating a budget to various conservation projects. Your highest priority goal might be to allocate at least 50% of the budget to projects in underserved communities (USCs, to save me typing, with apologies to the universities of South Carolina and Southern Califonia). Your second highest priority goal might be to allocate at least 30% of the budget to projects with matching funds from outside sources. Your third highest priority goal might be to allocate at least 75% of the budget to USCs. The other approach is to investigate the Pareto frontier, the set of all solutions for which no other solution does as well in all criteria and better in at least one. In essence, you want to present the decision-maker with the entire Pareto frontier and say "here, pick one". In practice, computing the Pareto frontier can be very computationally expensive, and trying to make sense of it might cause the decision maker to melt down.

To close this post, I'll pose a small sample problem and formulate the model for it. Suppose that we have $N$ patients in a health care system and $M$ providers, and that each patient needs to be assigned to a single provider. Provider $j$ has a limit $c_j$ on the number of patients they can handle. (To keep the example simple, and at the expense of some realism, we treat all patients as identical with regard to their capacity consumption.) We are given a matrix $D\in \mathbb{R}^{N\times M}$ of distances from patients to providers, as well as a cap $D_{max}$ on the distance that a patient can be required to travel. There are four criteria to be considered:

  • the average distance patients will travel (minimize, highest priority);
  • the maximum distance any patient must travel (minimize, second highest priority);
  • the maximum utilization of any provider as a fraction of their capacity (minimize, tied for third highest priority); and
  • the minimum utilization of any provider as a fraction of their capacity (maximize, tied for third highest priority).

So we have a mix of three things to minimize and one to maximize, with the last two criteria combining to somewhat level the workload across providers. 

Let $x_{ij}$ be 1 if patient $i$ is assigned to provider $j$ and 0 if not, let $w$ be the longest distance traveled by any patient, let $y_j$ be the fraction of provider $j$'s capacity that is utilized, and let $z_{lo}$ and $z_{hi}$ be the minimum and maximum capacity utilization rates, respectively (where 0 means the provider is unused and 1 means the provider is operating at capacity). The objective expression is $f\in\mathbb{R}^3$, whose lexicographic minimum we seek, where

\[ f=\left[\begin{array}{c} \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}d_{ij}x_{ij}\\ w\\ z_{hi}-z_{lo} \end{array}\right]. \]

The first and second components of $f$ are the average and maximum client travel distances. The third component is a weighted mix of maximum and minimum provider utilization, where the weights (+1, -1) are equal in magnitude to reflect the equal importance I am assigning to them and the negative coefficient for minimum utilization allows it to be maximized in what is otherwise a minimization problem.


The constraints of the model are easy to state:

\begin{align*} \sum_{j=1}^{M}x_{ij} & =1\quad\forall i\in\left\{ 1,\dots,N\right\} & (1)\\ d_{ij}x_{ij} & \le w\quad\forall i\in\left\{ 1,\dots,N\right\} ,\forall j\in\left\{ 1,\dots,M\right\} & (2)\\ \frac{1}{c_{j}}\sum_{i=1}^{N}x_{ij} & =y_{j}\quad\forall j\in\left\{ 1,\dots,M\right\} & (3)\\ y_{j} & \le z_{hi}\quad\forall j\in\left\{ 1,\dots,M\right\} & (4)\\ y_{j} & \ge z_{lo}\quad\forall j\in\left\{ 1,\dots,M\right\} & (5)\\ x & \in\left\{ 0,1\right\} ^{N\times M} & (6)\\ x_{ij} & =0\quad\forall i,j\ni d_{ij}>D_{max} & (7)\\ y & \in\left[0,1\right]^{M} & (8)\\ z_{hi},z_{lo} & \in\left[0,1\right] & (9)\\ w & \in\left[0,D_{max}\right] & (10) \end{align*} 

  • Constraint (1) ensures that each patient is assigned to exactly one provider.
  • Constraint (2) defines $w$, the maximum distance traveled.
  • Constraint (3) defines the fraction $y_j$ of capacity used at each provider $j$.
  • Constraints (4) and (5) define $z_{lo}$ and $z_{hi}$.
  • Constraints (6), (8), (9) and (10) define variable domains. The upper bound of 1 for $y_j$ in (8) ensures that no provider is assigned more patients than their capacity allows.
  • Constraint (7) enforces the travel distance limit $D_{max}$ by preventing any assignments that would violate the limit (effectively removing those assignment variables from the model).

In the next post, I will show how to solve the model using CPLEX (with, as usual, the Java API).

 

Tuesday, February 11, 2020

Collections of CPLEX Variables

Recently, someone asked for help online regarding an optimization model they were building using the CPLEX Java API. The underlying problem had some sort of network structure with $N$ nodes, and a dynamic aspect (something going on in each of $T$ periods, relating to arc flows I think). Forget about solving the problem: the program was running out of memory and dying while building the model.

A major issue was that they allocated two $N\times N\times T$ arrays of variables, and $N$ and $T$ were big enough that $2N^2T$ was, to use a technical term, ginormous. Fortunately, the network was fairly sparse, and possibly not every time period was relevant for every arc. So by creating only the IloNumVar instances they needed (meaning only for arcs that actual exist in time periods that were actually relevant), they were able to get the model to build.

That's the motivation for today's post. We have a tendency to write mathematical models using vectors or matrices of variables. So, for instance, $x_i \, (i=1,\dots,n)$ might be an inventory level at each of $n$ locations, or $y_{i,j} \, (i=1,\dots,m; j=1,\dots,n)$ might be the inventory of item $i$ at location $j$. It's a natural way of expressing things mathematically. Not coincidentally, I think, CPLEX APIs provide structures for storing vectors or matrices of variables and for passing them into or out of functions. That makes it easy to fall into the trap of thinking that variables must be organized into vectors or matrices.

Last year I did a post ("Using Java Collections with CPLEX") about using what Java calls "collections" to manage CPLEX variables. This is not unique to Java. I know that C++ has similar memory structures, and I think they exist in other languages you might use with CPLEX. The solution to the memory issue I mentioned at the start was to create a Java container class for each combination of an arc that actually exists and time epoch for which it would have a variable, and then associate instances of that class with CPLEX variables. So if we call the new class AT (my shorthand for "arc-time"), I suggested the model owner use a Map<AT, IloNumVar> to associate each arc-time combination with the variable representing it and a Map<IloNumVar, AT> to hold the reverse association. The particular type of map is mostly a matter of taste. (I generally use HashMap.) During model building, they would create only the AT instances they actually need, then create a variable for each and pair them up in the first map. When getting a solution from CPLEX, they would get a value for each variable and then use the second map to figure out for which arc and time that value applied. (As a side note, if you use maps and then need the variables in vector form, you can apply the values() method to the first map (or the getKeySet() method to the second one), and then apply the toArray() method to that collection.)

Now you can certainly get a valid model using just arrays of variables, which was all that was available to me back in the Dark Ages when I used FORTRAN, but I think there are some benefits to using collections. Using arrays requires you to develop an indexing scheme for your variables. The indexing scheme tells you that the flow from node 3 to node 7 at time 4 will be occupy slot 17 in the master variable vector. Here are my reasons for avoiding that.
  • Done correctly, the indexing scheme is, in my opinion, a pain in the butt to manage. Finding the index for a particular variable while writing the code is time-consuming and has been known to kill brain cells.
  • It is easy to make mistakes while programming (calculate an index incorrectly).
  • Indexing invites the error of declaring an array or vector with one entry for each combination of component indices (that $N\times N\times T$ matrix above), without regard to whether you need all those slots. Doing so wastes time and space, and the space, as we saw, may be precious.
  • Creating slots that you do not need can lead to execution errors. Suppose that I allocating a vector IloNumVar x = new IloNumVar[20] and use 18 slots, omitting slots 0 and 13. If I solve the model and then call getValues(x), CPLEX will throw an exception, because I am asking for values of two variables (x[0] and x[13]) that do not exist. Even if I create variables for those two slots, the exception will occur, because those two variables will not belong to the model being solved. (There is a way to force CPLEX to include those variables in the model without using them, but it's one more pain in the butt to deal with.) I've lost count of how many times I've seen messages on the CPLEX help forums about exceptions that boiled down to "unextracted variables".
So my advice is to embrace collections when building models where variables do not have an obvious index scheme (with no skips).

Thursday, October 17, 2019

A Value-ordered Java Map

For a project I'm working on (in Java), I wanted to store pairs of items in a key-value map. The catch is that I need the items sorted according to the values, not the keys. Java provides an efficient, presorted mapping in the TreeMap class, but it sorts on the keys, not the values. Revering the roles of keys and values is not an option for me, as I might have several keys mapped to the same value.

A Google search wasn't particularly helpful, other than to confirm that I was screwed. The most practical suggestions I saw revolved around the theme of using a conventional mapping (HashMap, TreeMap, whatever) and then, as needed, dumping the entries into a list or stream and sorting that using a comparator based on the values. For my application, though, this would mean modifying the map contents, exporting it and sorting it thousands of times, which would put a big drag on the application. So I really wanted something moderately efficient that would maintain the map contents in sorted order. Eventually, I gave up looking for a solution and wrote one myself. I gave my class the not entirely compact name ValueOrderedMap. It's generic, meaning you can specify any types for keys and values (as long as both types are comparable).

I've implemented most of the commonly used methods associated with map classes (but not all known map methods), and I've tried to make it thread-safe (but as yet have not tested that). Anyone who wants to take it for a spin is welcome to. It's released under a Creative Commons license. There are only two classes, and only one (ValueOrderedMap) is essential; the other is just a small demo program. You can find the source code in my campus GitLab repository. There's an issue tracker (requires registration) where you can report bugs or ask for new features. If you just want a binary jar file (along with the Javadoc, README and license files), I've parked a zip archive on my faculty website.

Saturday, July 20, 2019

Using Java Collections with CPLEX

Disclaimer: What follows is specific to Java, but with some name changes will also apply to C++. If you are using one of the other programming APIs for CPLEX, something analogous may exist, but I would have no idea about it.

I've seen a few questions by CPLEX users on forums recently that suggest the questioners may be moving from using a modeling language for optimization to using a general purpose language. Modeling languages tend to be more expressive (at least in my opinion) when it comes to writing optimization models. That's hardly shocking given that they are built for that purpose (and general programming languages are not). Recent questions have been along the lines of the following: I was using a structure (a tuple, or something similar) supported by the modeling language, and I don't know how to translate that to the general language; or I was creating arrays of variables / constraints / whatever indexed by something human-readable (probably names of things) and I don't know how to switch to integer-indexed arrays and keep it readable.

In my mind, the answer to those issues in Java is to make good use of classes and the Java Collection interface (and its various implementations). I'll give some examples in the context of a network flow problem.

In a modeling language, I might use a tuple to represent an arc. The tuple would consist of some identifier (string? index?) for the tail node of the arc, some identifier for the head node of the arc, and maybe an arc weight or some other arc property. In Java, I would create a class named Node to represent a single node, with fields including the node's name or other unique identifier and any costs, weights or other parameters associated with the node. Then I would create a class named Arc with fields of type Node holding the tail and head nodes, along with fields for any parameters associated with the arc (such as cost), and maybe a string field with a human-friendly label for the arc.

Now suppose I have an instance of IloCplex named cplex, and that I need a variable for each arc representing the flow on the arc. The conventional (?) approach would be to create a one dimensional array of IloNumVar instances, one per arc, and park the variables there. In fact, CPLEX has convenience methods for creating vectors of variables. The catch is that it may be hard to remember which index corresponds to which arc. One partial answer is to attach a label to each variable. The various methods in the CPLEX API for creating variables (and constraints) all have versions with a final string argument assigning a name to that variable or constraint. This works fine when you print out the model, but it's not terribly helpful while you're doing the coding.

I pretty much always use that feature to assign labels to variables and constraints, but I also employ collections in various ways. In my hypothetical network flow example, I would do something like the following.

HashSet<Node> nodes = new HashSet<>();
// Fill nodes with all node instances.
HashSet<Arc> arcs = new HashSet<>();
// Fill arcs with all arc instances.
HashMap<Arc, IloNumVar> flowVars = new HashMap<>();
HashMap<IloNumVar, Arc> flowArcs = new HashMap<>();
for (Arc a : arcs) {
  IloNumVar x = cplex.numVar(0, Double.MAX_VALUE, "Flow_" + a.getID());
  flowVars.put(a, x);
  flowArcs.put(x, a);
}

I put the nodes and arcs in separate collections (I used sets here, but you could equally well use lists), and then I create mappings between model constructs (in this case, arcs) and CPLEX constructs (in this case, variables). Collections can be iterated over, so there is no need to mess around with going back and forth between model constructs and integer indices. For each arc, I create a variable (and give it a name that includes the string identifier of the arc -- I'm assuming here that I gave the Arc class a getID method). I then map the arc to the variable and the variable to the arc. Why two maps? The arc to variable map lets me grab the correct variable while I'm building the model. For instance, a flow balance constraint would involve the flow variables for all arcs leading into and out of a node. I would identify all those arcs, park them in a collection (list or set), iterate over that collection of arcs, and for use the flowVars map to get the correct variables.

The reverse mapping I set up comes into play when the solver has a solution I want to inspect. I can fetch the value of each variable and map it to the corresponding variable, along the following lines.

HashMap<IloNumVar, Double> solution = new HashMap<>();
for (IloNumVar x : flowArcs.keySet()) {
  solution.put(x, cplex.getValue(x));
}

When I need to associate variable values with arcs, I can iterate over solution.entrySet(). For each entry e, flowArcs.get(e.getKey()) gives me the arc and e.getValue() gives me the flow on the arc.

Similar things can be done with constraints (IloRange instances).

Saturday, June 22, 2019

Evaluating Expressions in CPLEX

There's a feature in the Java API for CPLEX (and in the C++ and C APIs; I'm not sure about the others) that I don't see mentioned very often, possibly because use cases may not arise all that frequently. It became relevant in a recent email exchange, though, so I thought I'd highlight it. As usual, I describe it in the context of Java and let users of other APIs figure out the corresponding syntax.

Anyone who uses CPLEX knows how to use IloCplex.getValue() or IloCplex.getValues() to retrieve the values of the model variables in the final solution. What might fly under the radar is that there is an overload that takes as an argument an expression involving the variables (an instance of IloNumExpr), evaluates that expression, and returns the value. This is more a convenience than an essential feature: armed with the variable values, you could presumably compute the expression value yourself. The convenience aspects are not trivial though. Using getValue() to evaluate an expression saves you having to retrieve coefficients from someplace and likely run through a loop each time you evaluate the expression. (You do have to create the expression and store a pointer to it, but you do that once.) Also, if the only reason you need the variable values is to evaluate the expression, it saves you having to use getValue() to get the variable values.

Where might you use this? It's possible to use it to compute slacks for cuts you generate somewhere. (Use it to evaluate the cut expression, then subtract the relevant bound on the cut inequality and you have the slack.) It's also possible to use it to track secondary criteria or objectives that you did not build into the model. I cobbled together some code to demonstrate the latter use, which is now available from my campus GitLab repository. The underlying problem comes from an old post by Erwin Kalvelagen ("Minimizing Standard Deviation"). It involves loading pallets and minimizing the standard deviation of the pallet loads. Let $p$ be the vector of pallet loads and $\mu$ the mean load per pallet. My code creates expressions for both $\parallel p-(\mu,\dots,\mu)\parallel_{1}$ and $\parallel p-(\mu,\dots,\mu)\parallel_{2}^{2}$, minimizes the $L_1$ norm (which is computationally easier than minimizing the $L_2$ norm), and evaluates for each newly found solution (in an incumbent callback) and for the final solution.

The use in a callback brings up an important point. There are getValue() methods in several classes, and you need to be a bit careful about which one you are using. IloCplex.getValue() evaluates the expression using the final solution, and will generate error 1217 (no solution exists) if you try to use before the solver is done. That in particular means you cannot use it in a callback. Fortunately, the relevant callbacks have their own versions. IloCplex.LazyConstraintCallback.getValue() and IloCplex.IncumbentCallback.getValue() evaluate expressions using the new candidate and accepted integer-feasible solution, respectively. Other callbacks (solve, branch, user cut) evaluate using the solution to the LP relaxation. To use any of them, call this.getValue(...) or just getValue(...). Do not call mip.getValue(...) inside a callback, where mip is the problem you are solving: that will invoke IloCplex.getValue() and trigger the aforementioned error.

Note that there are also getSlack() and getSlacks() methods for computing the slack in a constraint. The former takes a single instance of IloRange as argument and the latter takes a vector of IloRange instances. Like getValue(), there are separate versions in IloCplex and in the callback classes. I assume they behave the same way that getValue does, but I have not actually checked that.

Finally, if you have switched to generic callbacks, there is good news. The IloCplex.Callback.Context class (the class used by the argument to your callback function) has getCandidateValue(), getIncumbentValue() and getRelaxationValue() for evaluating expressions based on a proposed new integer-feasible solution, the current best integer-feasible solution and the current node LP solution, respectively. (IloCplex.getValue() remains as it was.) Not only can you do the same evaluations, but the method names are utterly unambiguous. Gotta love that!

Friday, June 14, 2019

A Java Container for Parameters

A few days ago, I posted about a Swing class (and supporting stuff) that I developed to facilitate my own computations research, and which I have now made open-source in a Bitbucket repository. I finally got around to cleaning up another Java utility class I wrote, and which I use regularly in experiments. I call it ParameterBlock. It is designed to be a container for various parameters that I need to set during experiments.

It might be easiest if I start with a couple of screen shots. The first one shows a Swing application I was using in a recent project. Specifically, it shows the "Settings" menu, which has multiple entries corresponding to different computational stages (the overall solver, an initial heuristic, two phases of post-heuristic number crunching), along with options to save and load settings.

Parameter settings menu

Computational research can involve a ton of choices for various parameters. CPLEX alone has what seems to be an uncountable number of them. In the Dark Ages, I hard-coded parameters, which meant searching the source code (and recompiling) every time I wanted to change one. Later I graduated to putting them on the command line, but that gets old quickly if there are more than just a handful. When I started writing simple Swing platforms for my work (like the one shown above), I added menu options to call up dialogs that would let me see the current settings and change them. Over time, this led me to my current solution.

I put each collection of parameters in a separate subclass of the (abstract) ParameterBlock class. So clicking on "Solver" would access on subclass, clicking on "Heuristic" would access a different subclass, and so on. A parameter block can contain parameters of any types. The next shot shows a dialog for the solver parameters in my application. Two of the parameters are boolean (and have check boxes), two are Java enumerations (and have radio button groups), and three are numeric (and have text fields). String parameters are also fine (handled by text boxes).

Defining a parameter block is easy (in my opinion). It pretty much boils down to deciding how many parameters you are going to have, assigning a symbolic name to each (so that in other parts of the code you can refer to "DOMINANCE" and not need to remember if it parameter 1, parameter 2 or whatever), giving each one a label (for dialogs), a type (such as boolean.class or double.class), a default value and a tool tip. The ParameterBlock class contains a static method for generating a dialog like the one below, and one or two other useful methods.

Solver parameters

You can more details in the README file at the GitLab repository I set up for this. The repository contains a small example for demonstration purposes, but to use it you just need to copy ParameterBlock.java into your application. As usual, I'm releasing it under a Creative Commons license. Hopefully someone besides me will find it useful.

Tuesday, June 11, 2019

A Swing Platform for Computational Experiments

Most of my research involves coding algorithms and running computational experiments with them. It also involves lots of trial-and-error, both with the algorithms themselves and with assorted parameters that govern their functioning. Back in the Dark Ages, I did all this with programs that ran at a command prompt (or, in Linux terms, in a terminal) and wrote any output to the terminal. Eventually I got hip and started writing simple GUI applications for those experiments. A GUI application lets me load different problem without having to stop and restart the program, lets me change parameter settings visually (without having to screw around with lots of command line options ... is the switch for using antisymmetry constraints -A or -a?), and save output to text files when it's helpful.

Since I code (mostly) in Java, the natural way to do this is with a Swing application. (When I use R, I typically use an R notebook.) Since there are certain features I always want, I found myself copying code from old projects and then cutting out the project-specific stuff and adding new stuff, which is a bit inefficient. So I finally got around to creating a minimal template version of the application, which I'm calling "XFrame" (short for "Experimental JFrame" or "JFrame for Experiments" or something).

I just uploaded it to Bitbucket, where it is open-source under a very nonrestrictive Creative Commons license. Feel free to download it if you want to try it. There's an issue tracker where you can report any bugs or sorely missing features (but keep in mind I'm taking a fairly minimalist approach here).

Using it is pretty simple. The main package (xframe) contains two classes and an interface. You can just plop them into your application somewhere. One class (CustomOutputStream) you will probably not want to change. The actual GUI is the XFrame class. You will want to add menu items (the only one it comes with is File > Exit) and other logic. Feel free to change the class name as well if you wish. Finally, your program needs to implement the interface defined in XFrameController so that XFrame knows how to talk to the program.

The layout contains a window title at the top and a status message area at the bottom, both of which can be fetched and changed by code. The central area is a scrollable text area where the program can write output. It has buttons to save the content and to clear it, and the program will not exit with unsaved text unless you explicitly bless the operation.

There is a function that lets the program open nonmodal, scrollable dialog (which can be left open while the main window is in use, and whose content can be saved to a file). Another function allows the program to pop up modal dialogs (typically warnings or error messages). Yet another function provides a place where you can insert logic to tell the GUI when to enable/disable menu choices (and maybe other things). Finally, there is a built-in extension of SwingWorker that lets you launch a computational operation in a separate thread (where it will not slow down or freeze the GUI).

I included a small "Hello world!" application to show it works. I'll end with a couple of screen shots, one of the main window and the other of the nonmodal dialog (both from the demo). If it looks like something you might want to use, please head over to Bitbucket and grab the source.

Main window
Nonmodal dialog


Monday, May 13, 2019

Randomness: Friend or Foe?

I spent a chunk of the weekend debugging some code (which involved solving an optimization problem). There was an R script to setup input files and a Java program to process them. The Java program included both a random heuristic to get things going and an integer program solved by CPLEX.

Randomness in algorithms is both a blessing and a curse. Without it, genetic algorithms would reduce to identical clones of one individual engaging in a staring contest, simulation models would produce absurdly tight confidence intervals (thanks to the output have zero variance) and so on. With it, though, testing and debugging software gets tricky, since you cannot rely on the same inputs, parameter settings, hardware etc. to produce the same output, even when the program is function correctly. If I rerun a problem and get different results, or different performance (e.g., same answer but considerably faster or slower), is that an indication of a bug or just luck of the draw?

Somewhere, back in the Dark Ages, I was told that the answer to all this was to set the seed of the pseudorandom number stream explicitly. This was after the introduction of pseudorandom number generators, of course. Before that, random numbers were generated based on things like fluctuations in the voltage of the power supplied to the mainframe, or readings from a thermocouple stuck ... well, never mind. Today hardware random number generators are used mainly in cryptography or gambling, where you explicitly do not want reproducibility. With pseudorandom numbers, using the same seed will produce the same sequences of "random" values, which should hypothetically make reproducibility possible.

I think I originally came across that in the context of simulation, but it also applies in optimization, and not just in obviously random heuristics and metaheuristics. CPLEX has a built-in pseudorandom number generator which is used for some things related to branching decisions when solving integer programs (and possibly other places too). So explicitly setting a seed for both the random number generator used by my heuristic and CPLEX should make my code reproducible, right?

Wrong. There are two more wildcards here. One is that my Java code uses various types of collections (HashMaps, HashSets, ...) and also uses Streams. Both of those can behave in nondeterministic ways if you are not very careful. (Whether a Stream is deterministic may boil down to whether the source is. I'm not sure about that.) The other, and I'm pretty sure this bites me in the butt more often than any other aspect, is the use of parallel threads. My code runs multiple copies of the heuristic in parallel (using different seeds), and CPLEX uses multiple threads. The problem with parallel threads is that timing is nondeterministic, even if the sequence of operations in each thread is. The operating system will interrupt threads willy-nilly to use those cores for other tasks, such as updating the contents of the display, doing garbage collection in Java, pinging the mail server or asking Skynet whether it's time to terminate the user). If there is any communication between the processes running in parallel threads in your code, the sequencing of the messages will be inherently random. Also, if you give a process a time limit and base it on "wall clock" time (which I'm prone to doing), how far the process gets before terminating will depend on how often and for how long it was interrupted.

Limiting everything to a single thread tends not to be practical, at least for the problems I find myself tackling. So I'm (somewhat) reconciled to the notion that "reproducibility" in what I do has to be interpreted somewhat loosely.

Tuesday, April 16, 2019

CPLEX Callbacks: ThreadLocal v. Clone

A while back, I wrote a post about the new (at the time) "generic" callbacks in CPLEX, including a brief discussion of adventures with multiple threads. A key element was that, with generic callbacks, IBM was making the user more responsible for thread safety. In that previous post, I explored a few options for doing so, including use of Java's ThreadLocal class. ThreadLocal is my current first choice when writing a generic callback.

Recently, I saw a reply by IBM's Daniel Junglas on a CPLEX user forum that contained the following information.
For each thread CPLEX creates a clone of your callback object. This is done by invoking the callback's clone() method. The Java default implementation of clone() returns a shallow copy of the class.
That set me to wondering about the differences between ThreadLocal and cloning, so I did a bit of experimentation. I'll summarize what I think I learned below.

Why is any of this necessary?


It's not if you only allow CPLEX to use a single thread. When using callbacks with multiple threads, it is possible that two or more threads will access the callback simultaneously. Simultaneously reading/fetching the value of something is harmless, but trying to modify a value simultaneously will either trigger an exception or cause the JVM to crash. (That's not hypothetical, by the way. I have a test program that routinely crashes the JVM.) "Information callbacks" are harmless, but "control callbacks" (where you attempt to alter what CPLEX is doing, by adding cuts or lazy constraints or by taking control of branching), usually involve modifying something. In particular, with Benders decomposition your callback needs to solve a subproblem after first adjusting it based on the proposed master problem solution. Two threads trying to adjust the subproblem at the same time spells disaster.

Is the use of ThreadLocal an option for both legacy and generic callbacks?


Yes. The only tricky part is in initialization of the storage. Let's say that I'm doing Benders decomposition, and I have a subproblem that is an instance of IloCplex. I am going to need to create a separate version of the subproblem for each thread. So my callback class will contain a class field declared something like
private ThreadLocal<IloCplex> subproblem;
and it will need to fill in a value of the subproblem for each thread.

With a generic callback, the ThreadUp context provided by CPLEX can be used to do this. Assuming that context is the argument to the callback function you write, you can use code like the following to initialize the subproblem for each thread.
if (context == IloCplex.Callback.Context.Id.ThreadUp) {
  IloCplex s = ...; // code to generate a new subproblem
  subproblem.set(s);
}
Once the subproblem is initialized, to use it when a candidate solution is being proposed, you need to extract it from the ThreadLocal field. Here is an example of how that would look.
if (context == IloCplex.Callback.Context.Id.Candidate) {
  IloCplex s = subproblem.get(s);
  // Do Benders stuff with s.
}

Legacy callbacks do not have a mechanism like ThreadUp for detecting the creation of a new thread. You can still initialize the subproblem "lazily". In the legacy callback, before using the subproblem check to see if it exists. If not, create it. Here's some sample code.
IloCplex s = subproblem.get();  // get the subproblem
if (s == null) {
  // First use: need to generate a fresh subproblem.
  s = ...; // code to generate a new subproblem
  subproblem.set(s);
}
// Do Benders stuff with s.

Is cloning an option for both legacy and generic callbacks?


No. I don't think cloning can be used with generic callbacks. In Java, objects can be cloned. CPLEX declares legacy callbacks as Java objects, but it declares generic callbacks by means of an interface. Cloning an interface is not really a "thing" in Java.

When you use a generic callback, you create a class that implements the Callback interface. It's certainly possible to make that class cloneable, but according to my experiments CPLEX will not call the clone() method on that class, even if it exists. So the solver will be working with a single instance of that class, and if the class is not thread-safe, kaboom.

What does cloning entail?


There are quite a few web sites that explain cloning in Java, and if you intend to use it I recommend you do a search. I'll just cover the bare bones here.
  1. Declare your class with the modifier "implements Cloneable".
  2. Override the protected method Object.clone.
  3. Call the parent method (super.clone()) as the very first executable line of the clone() method.
  4. Handle the CloneNotSupportedException that the parent method might hypothetically throw, either by catching it and doing something, or by rethrowing it.
  5. After calling the parent method, fix anything that needs fixing.
I left that last step rather vague. With a Benders lazy constraint callback, you will need to replace the original subproblem with a freshly generated version (so that no two threads are chewing on the same subproblem instance). If life were fair (it's not), you would just do a deep clone of the original problem. The catch is that the IloCplex class is not cloneable. So the best (only?) alternative I can see is to build a new copy of the subproblem, the same way you built the original copy.

If you have other class fields that the callback might modify (such as a vector that stores the best feasible solution encountered), you may need to do a deep clone of them (or replace them with new versions). A detailed analysis of the differences between shallow and deep cloning is beyond the scope of this post (or my competence). As Daniel points out in his answer, super.clone() only makes a shallow copy. You'll need to take additional steps to make sure that fields containing arrays and other objects (other than primitives) are handled properly if the callback might modify them.

Here is some skeleton code.

public class MyCallback extends IloCplex.LazyConstraintCallback
                        implements Cloneable {
  private IloCplex subproblem;

  // ...

  /**
   * Clone this callback.
   * @return the clone
   * @throws CloneNotSupportedException if the parent clone method bombs
   */
  @Override
  public MyCallback clone() throws CloneNotSupportedException {
    // Call Object.clone first.
    MyCallback cb = (CloneCallback) super.clone();
    // Replace the subproblem with a fresh copy.
    subproblem = ...;  // generate a fresh copy of the subproblem
    // Make deep copies (or new values) for other fields as needed.
    return cb;
  }

}

Tuesday, July 17, 2018

Selecting Box Sizes

Someone posted an interesting question about box sizes on Mathematics Stack Exchange. He (well, his girlfriend to be precise) has a set of historical documents that need to be preserved in boxes (apparently using a separate box for each document). He wants to find a solution that minimizes the total surface area of the boxes used, so as to minimize waste. The documents are square (I'll take his word for that) with dimensions given in millimeters.

To start, we can make a few simplifying assumptions.
  • The height of a box is not given, so we'll assume it is zero, and only consider the top and bottom surfaces of the box. For a box (really, envelope) with side $s$, that makes the total area $2s^2$. If the boxes have uniform height $h$, the area changes to $2s^2 + 4hs$, but the model and algorithm I'll pose are unaffected.
  • We'll charitably assume that a document with side $s$ fits in a box with side $s$. In practice, of course, you'd like the box to be at least slightly bigger, so that the document goes in and out with reasonable effort. Again, I'll let the user tweak the size formula while asserting that the model and algorithm work well regardless.
The problem also has three obvious properties.
  • Only document sizes need be considered as box sizes, i.e. for every selected size at least one document should fit "snugly".
  • The number of boxes you need at each selected size equals the number of documents too large to fit in a box of the next smaller selected size but capable of fitting in a box of this size.
  • You have to select the largest possible box size (since that is required to store the largest of the documents).
What interests me about this problem is that it can be a useful example of Maslow's Hammer: if all you have is a hammer, every problem looks like a nail. As an operations researcher (and, more specifically, practitioner of discrete optimization) it is natural to hear the problem and think in terms of general integer variables (number of boxes of each size), binary variables (is each possible box size used or not), assignment variables (mapping document sizes to box sizes) and so on. OR consultant and fellow OR blogger Erwin Kalvelagen did a blog post on this problem, laying out several LP and IP formulations, including a network model. I do recommend your reading it and contrasting it to what follows.

The first thought that crossed my mind was the possibility of solving the problem by brute force. The author of the original question supplied a data file with document dimensions. There are 1166 documents, with 384 distinct sizes. So the brute force approach would be to look at all $\binom{383}{2} = 73,153$ or $\binom{383}{3} = 9,290,431$ combinations of box sizes (in addition to the largest size), calculate the number of boxes of each size and their combined areas, and then choose the combination with the lowest total. On a decent PC, I'm pretty sure cranking through even 9 million plus combinations will only need a tolerable amount of time.

A slightly more sophisticated approach is to view the problem through the lens of a layered network. There are either three or four layers, representing progressively larger selected box sizes, plus a "layer 0" containing a start node. In the three or four layers other than "layer 0", you put one node for each possible box size, with the following restrictions:
  • the last layer contains only a single node, representing the largest possible box, since you know you are going to have to choose that size;
  • the smallest node in each layer is omitted from the following layer (since layers go in increasing size order); and
  • the largest node in each layer is omitted from the preceding layer (for the same reason).
Other than the last layer (and the zero-th one), the layers here will contain 381 nodes each if you allow four box sizes and 382 if you allow three box sizes. An arc connects the start node to every node in the first layer, and an arc connects every node (except the node in the last layer) to every node in the next higher layer where the head node represents a larger size box than the tail node. The cost of each arc is the surface area for a box whose size is given by the head node, multiplied by the number of documents too large to fit in a box given by the tail node but small enough to fit in a box given by the head node.

I wanted to confirm that the problem is solvable without special purpose software, so I coded it in Java 8. Although there are plenty of high quality open-source graph packages for Java, I wrote my own node, arc and network classes and my own coding of Dijkstra's shortest path algorithm just to prove a point about not needing external software. You are welcome to grab the source code (including the file of document sizes) from my Git repository if you like.

I ran both the three and four size cases and confirmed that my solutions had the same total surface areas that Erwin got, other than a factor of two (I count both top and bottom; he apparently counts just one of them). How long does it take to solve the problem using Dijkstra's algorithm? Including the time reading the data, the four box version takes about half a second on my decent but not workstation caliber PC. The three box version takes about 0.3 seconds, but of course gives a worse solution (since it is more tightly constrained). This is single-threaded, by the way. Both problem set up and Dijkstra's method are amenable to parallel threading, but that would be overkill given the run times.

So is it wrong to take a fancier modeling approach, along the lines of what Erwin did? Not at all. There are just trade-offs. The modeling approach produces more maintainable code (in the form of mathematical models, using some modeling language like GAMS or AMPL) that are also more easily modified if the use case changes. The brute force and basic network approaches I tried requires no extra software (so no need to pay for it, no need to learn it, ...) and works pretty well for a "one-off" situation where maintainability is not critical.

Mainly, though, I just wanted to make a point that we should not overlook simple (or even brute force) solutions to problems when the problem dimensions are small enough to make them practical ... especially with computers getting more and more powerful each year.

Tuesday, July 3, 2018

Usefulness of Computer Science: An Example

I thought I would follow up on my June 29 post, "Does Computer Science Help with OR?", by giving a quick example of how exposure to fundamentals of computer science recently helped me.

A current research project involves optimization models containing large numbers of what are basically set covering constraints, constraints of the form $$\sum_{i\in S} x_i \ge 1,$$ where the $x_i$ are binary variables and $S$ is some subset of the set of all possible indices. The constraints are generated on the fly (exactly how is irrelevant here).

In some cases, the same constraint may be generated more than once, since portions of the code run in parallel threads. Duplicates need to be weeded out before the constraints are added to the main integer programming model. Also, redundant constraints may be generated. By that, I mean we may have two cover constraints, summing over sets $S_1$ and $S_2$, where $S_1 \subset S_2$. When that happens, the first constraint implies the second one, so the second (weaker) constraint is redundant and should be dropped.

So there comes a "moment of reckoning" where all the constraints generated by all those parallel threads get tossed together, and duplicate or redundant ones need to be weeded out. That turns out to be a rather tedious, time-consuming operation, which brings me to how the constraints are represented. I'm coding in Java, which has various implementations of a Set interface to represent sets. The coding path of least resistance would be to toss the indices for each constraint into some class implementing that interface (I generally gravitate to HashSet). The Set interface defines an equals() method to test for equality and a containsAll() method to test whether another set is a subset of a given set. So this would be pretty straightforward to code.

The catch lies in performance. I have not found it documented anywhere, but I suspect that adding elements to a HashSet is $O(n)$ while checking subset status or equality is $O(n^2)$, where $n$ is the number of possible objects (indices). The reason I say $O(n^2)$ for the latter two operations is that, in the worst case, I suspect that Java takes each object from one subset and compares it to every object in the other set until it finds a match or runs out of things to which to compare. That means potentially $O(n)$ comparisons for each of $O(n)$ elements of the first set, getting us to $O(n^2)$.

A while back, I took the excellent (and free) online course "Algorithms, Part 1", offered by a couple of faculty from my alma mater Princeton University. I believe it was Robert Sedgewick who said at one point (and I'm paraphrasing here) that sorting is cheap, so if you have any inkling it might help, do it. The binary variables in my model represent selection or non-selection of a particular type of object, and I assigned a complete ordering to them in my code. By "complete ordering" I mean that, given two objects $i$ and $j$, I can tell (in constant time) which one is "preferable". Again, the details do not matter, nor does the plausibility (or implausibility) of the order I made up. It just matters that things are ordered.

So rather than just dump subscripts into HashSets, I created a custom class that stores them in a TreeSet, a type of Java set that maintains sort order using the ordering I created. The custom class also provides some useful functions. One of those functions is isSubsetOf(), which does pretty much what it sounds like: A.isSubsetOf(B) returns true if set $A$ is a subset of set $B$ and false if not.

In the isSubsetOf() method, I start with what are called iterators for the two sets $A$ and $B.$ Each starts out pointing to the smallest member of its set, "smallest" defined according to the ordering I specified. If the smallest member of $B$ is bigger than the smallest member of $A$, then the first element of $A$ cannot belong to $B$, and we have our answer: $A\not\subseteq B$. If the smallest element of $B$ is smaller than the smallest element of $A$, I iterate through element of $B$ until either I find a match to the smallest element of $A$ or run out of elements of $B$ (in which case, again, $A\not\subseteq B$). Suppose I do find a match. I bump the iterator for $A$ to find the second smallest element of $A$, then iterate through subsequent members of $B$ (picking up where I left off in $B$, which is important) until, again, I get a match or die trying. I keep doing this until I get an answer or run out of elements of $A$. At that point, I know that $A\subseteq B$.

What's the payoff for all this extra work? Since I look at each element of $A$ and each element of $B$ at most once, my isSubstOf() method requires $O(n)$ time, not $O(n^2)$ time. Using a TreeSet means the contents of each set have to be sorted at the time of creation, which is $O(n\log n)$, still better than $O(n^2)$. I actually did code it both ways (HashSet versus my custom class) and timed them on one or two moderately large instances. My way is in fact faster. Without having a bit of exposure to computer science (including the Princeton MOOC), though, it would never have occurred to me that I could speed up what was proving to be a bottleneck in my code.