Generics and Sets

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    Generics and Sets

    Greetings,

    Introduction

    This week I'll write a bit about generics (those funny angular brackets). I need
    an example and decided to use sets and some of their operations. This weeks'
    article discusses one set class and two interesting operations on the set:
    combinations and set partitioning. First a description of the algorithms is
    given and then we'll have some fun with the generic implementation of them.

    Combinations

    For a given set S = [x, y, z] there are eight sets representing all possible
    combinations of those three distinct elements:
    • []
    • [x]
    • [y]
    • [z]
    • [x, y]
    • [x, z]
    • [y, z]
    • [x, y, z]


    Given that set S, how can we find all these combinations? Here's how: suppose
    we can find all combinations of set S'= [y, z] which are: [], [y], [z] and [y, z]
    Of course these subsets are combinations of the original set S as well, but
    there's no 'x' to be found in them (of course not, because I left it out).

    Simply add that element 'x' to all the subsets that make up the combinations
    of set [y, z]; This is the result:

    [x], [x, y], [x, z] and [x, y, z]

    Together with the other four subsets we have constructed all eight combinations
    of set S= [x, y, z]

    But how did we find all combinations of set S'= [y, z]? We apply a similar
    recursive reasoning here: suppose we can find all combinations of set S''= [z];
    they are [] and [c]; they are part of all combinations of set S'= [y, z] and
    we add element 'y' to it: [y], [y, z].

    This may seem futile but here's how we find all combinations of set S''= [z]:
    we find all combinations of the empty set, which is the empty set itself: [],
    and we add element [z] to all the combinations (just one), which makes [z].

    For a set with n elements there are 2^n combinations. This can easily be seen:
    the empty set (zero elements) has one combination: the empty set itself. If
    we add an element to the set we end up with twice the number of combinations:
    the original combinations plus the original combinations with that new element
    added to every combination. A set of one element has two combinations, a set of
    two elements has four combinations etc. etc.

    The recursive algorithm described above generates them all.

    Partitions

    A partition of a set S are one or more disjunct non-empty subsets and the union
    of those sets form the set S again. For the set S= [x, y, z] the partitions are:
    • [x, y, z]
    • [x], [y, z]
    • [x, y], [z]
    • [x, z], [y]
    • [x], [y], [z]


    How do we find all partitions of a set? We apply a similar recursive reasoning:
    suppose we can find all partitions of set S'= [y, z] which is:
    • [y, z]
    • [y], [z]


    First we add the set [x] to every partition of set S':
    • [x], [y, z]
    • [x], [y], [z]


    Next we add element 'x' to every set in the partitions of S', one by one:
    • [x, y, z]
    • [x, y], [z]
    • [y], [x, z]


    We end up with the same five partitions again. Try to find all partitions of
    set S'= [y, z] given the partitions of set S''= [z]. The partitions of a single
    element set is the single element set itself, so all partitions of set S'' is
    the set S''= [z] itself.

    By definition all partitions of an empty set is the empty set itself.

    Sets

    As you might have noticed by now this article is about sets. The Java programming
    language has Set implementations available. I'm going to use the HashSet
    implementation for the example in this article. I am going to change the behaviour
    of the implementation a bit though: I want 'COW' sets. 'COW' is an acronym for
    'Copy On Write'. COW implies that the set copies itself when it needs to be
    altered; the copy is going to be altered instead. Let's get our hands dirty.
    Here are the constructors of our (generic) set:

    [code=java]
    public class S<T> extends HashSet<T> {

    private static final long serialVersionUI D = -887517932819850 7416L;

    public S(Collection<T> collection) { super(collectio n); }

    public S(T arg) { this.add(arg); }

    public S(T ... args) { super(Arrays.as List(args)); }

    ...
    [/code]

    I added the serialVerionUID to keep Eclipse's mouth shut. This class has three
    constructors: a form of a copy constructor taking a Collection<T> as its
    argument; a constructor that takes a single element T as its argument and
    a constructor that takes a (possibly empty) list of T arguements.

    The last constructor allows us to write new S<T>() i.e. no elements at all
    which results in the empty set.

    I included the second constructor again to keep Eclipse's mouth shut. I need
    sets of sets as in:

    [code=java]
    S<S<T>> set= new S<S<T>>(new S<T>());
    [/code]

    Suppose I didn't include the second contructor then the Java compiler has to
    deduce that given the last constructor a generic array of type S<T> needs to be
    constructed to put that single argument in. A type S<T> is not a type T so
    Eclipse warns about its own creativity. If I hadn't included that second
    constructor everything would still be fine but I just don't like warnings.

    The last constructor simply turns the varargs array to a List (which is a
    collection too) so a superclass constructor can handle it all.

    The Set interface defines 'add()' and 'addAll()' methods that return true or false
    indicating whether or not the element(s) could be added. The same applies to the
    'remove' and 'removeAll' methods. I don't want that behaviour, I want methods
    that do the same but return the new set instead. Here goes:

    [code=java]
    private S<T> unionHere(T arg) {

    add(arg);

    return this;
    }

    public S<T> union(T arg) {

    return new S<T>(this).unio nHere(arg);
    }

    public S<T> union(Collectio n<T> collection) {

    S<T> copy= new S<T>(this);

    copy.addAll(col lection);

    return copy;
    }

    public S<T> difference(T arg) {

    S<T> copy= new S<T>(this);

    copy.remove(arg );

    return copy;
    }

    public S<T> difference(Coll ection<T> collection) {

    S<T> copy= new S<T>(this);

    copy.removeAll( collection);

    return copy;
    }
    [/code]

    Note that I kept the first method private: I don't want the user to fiddle with
    the set itself; if they want to do that they can fiddle with the superclass
    methods that change the set itself.

    The public methods above simply copy the set and apply the wanted operation on
    the copy instead and return the copied set instead of returning a boolean value.

    Note that all methods are generic: they don't care about what type T actually is.

    So far so good, but I haven't done much yet. Here are the other set operations:
    intersection, difference and symmetric difference. First a few examples of the
    operations; A= [x, y, z], B=[y, z, t]
    • intersection: [y, z]
    • difference: [x]
    • symmetric difference: [x, t]


    The intersection of A and B are the elements that occur in both A and B. The
    difference of A and B are those elements that occur in A but not in B. The
    symmetric difference of A and B are those elements in A or B bot not in both.

    The symmetric difference plus the intersection of two sets is the union of those
    two sets. The symmetric difference of sets A and B is the union of the difference
    of A and B and the difference of B and A.

    Here's the source code:

    [code=java]
    public S<T> difference(Coll ection<T> collection) {

    S<T> copy= new S<T>(this);

    copy.removeAll( collection);

    return copy;
    }

    public S<T> symDifference(C ollection<T> collection) {

    return difference(coll ection).union(n ew S<T>(collection ).difference(th is));
    }

    public S<T> intersection(Co llection<T> collection) {

    S<T> copy= new S<T>(this);

    copy.retainAll( collection);

    return copy;
    }
    [/code]

    There's not much more to tell about those methods than that they first make a
    copy of the set and apply the operation on the copy of the set.

    Combinations

    The implementation of the 'combinations' method closely follows the description
    in the paragraph above. The method returns a set of sets. The sets being the
    individual combinations of the original set.

    First the method constructs a set that contains the empty set. If the original
    set isn't empty it walks over each element, removes it and generates all
    combinations of the reduced set. It copies the combinations to a 'result' set
    and copies them again after it had added that current element to the sets again.

    Here's the code:

    [code=java]
    public S<S<T>> combinations() {

    S<S<T>> result= new S<S<T>>(new S<T>());

    for (T current : this) {
    S<S<T>> comb= difference(curr ent).combinatio ns();
    result.addAll(c omb);
    for (S<T> set : comb)
    result.unionHer e(set.union(cur rent));
    }

    return result;
    }
    [/code]

    It comes in quite handy that my methods return the modified set itself instead
    of a true/false value. Not the generic type S<S<T>>, i.e. a set or sets of type T.

    The method itself is defined in a generic definition of class S. The generic
    type is T, so a (partial) specialization of S<S<T>> is valid too. In other
    words: if, say, a definition of the union method for a type T is valid, then
    automatically it is valid for a generic type S<T>. The above code makes use of
    Java's generic behaviour.

    I'm going to (ab)use that feature even more for the next algorithm implementation:

    Partitions

    This method uses the same implementation strategy: it recursively invokes itself
    with a smaller set and uses the return value to collect the partitions of the
    current set in an set of partitions. Here it is:

    [code=java]
    public S<S<S<T>>> partitions() {

    S<S<S<T>>> result= new S<S<S<T>>>();

    if (size() < 2)
    return result.unionHer e(new S<S<T>>(this)) ;

    T current= iterator().next ();
    S<S<S<T>>> part= difference(curr ent).partitions ();

    for (S<S<T>> set : part) {
    result.unionHer e(set.union(new S<T>(current))) ;

    for (S<T> elem : set) {
    S<S<T>> copy= set.difference( elem);
    result.unionHer e(copy.unionHer e(elem.union(cu rrent)));
    }
    }

    return result;
    }
    [/code]

    A partition is a set of sets, so all partitions together is a set of sets of
    sets; the last sets contain elements of type T.

    For sets containing at most one element a set containing a set that contains
    that (almost) empty set is returned. Otherwise the reasoning as explained in
    the paragraph above is applied and the filled 'result' set is returned.

    As you can see, in a generic definition of a method for type <T> a type S<T>,
    S<S<T>> and even S<S<S<T>>> are defined automatically because an S<T>, S<S<T>>
    and an S<S<S<T>>> are pure generic types themselves as well.

    This makes it possible to find intersections, combinations or whatever not just
    for sets of, say, Strings, i.e. S<String> but also for sets of sets of Strings,
    i.e. S<S<String>> or whatever elements in a set you want.

    Examples

    After all this programming I'd like to see some results; I tested the above
    methods like this:

    [code=java]
    public static void main(String[] args) {

    S<String> s= new S<String>("x", "y", "z");
    S<String> t= new S<String>("y", "z", "t");

    System.out.prin tln("union : "+s.union(t ));
    System.out.prin tln("intersecti on : "+s.intersectio n(t));
    System.out.prin tln("difference : "+s.difference( t));
    System.out.prin tln("symDiffere nce: "+s.symDifferen ce(t));
    System.out.prin tln("combinatio ns : "+s.combination s());
    System.out.prin tln("partitions : "+s.partitions( ));
    }
    [/code]

    And this was the (correct) output:

    Code:
    union        : [t, z, y, x]
    intersection : [z, y]
    difference   : [x]
    symDifference: [t, x]
    combinations : [[], [z, x], [z, y], [z], [y], [z, y, x], [x], [y, x]]
    partitions   : [[[z, x], [y]], [[z, y], [x]], [[z], [y], [x]], [[z, y, x]], [[z], [y, x]]]
    Concluding remarks

    The chapter described a bit of set juggling and fiddling with the generics
    feature of Java. I'm sure you'll have to read this little article a couple of
    times to fully appreciate the generic feature and the recursive implementation
    of both operations.

    I hope to see you next week and,

    kind regards,

    Jos
  • Shubhendu Sinha
    New Member
    • Jun 2011
    • 1

    #2
    partitions() algorithm

    Hi Jos

    Thanks for your post. I wish to implement partitions() in c++ to find all partitions of a set. Can you suggest me the best way. If you could also send me the source of partitions algorithm, i could try to write that in c++.

    Comment

    Working...