Purely Functional Data Structures & Algorithms : Union-Find

Sun, Aug 19, 2012

It’s been a while since I last posted in this series. Today we look at the disjoint-set data structure, specifically disjoint-set forests and the complementary algorithm : union-find.
In computing, a disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (nonoverlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
Union: Join two subsets into a single subset.

My inspiration comes from Sedgewick and Wayne’s class over at Coursera : Algorithms, Part I. So check the class out if you are unfamiliar with this and interested in the details.
I’m always curious how data structures and algorithms translate from their imperative counterparts(usually in Java) which are the norm for most classes on the subject and in most textbooks.

I think that this is a very unexplored part of the field of study in comparison with the usual approach to algorithms and data structures. So here we go with another example.

As before, we are using Shen as our implementation language.

First we define our disjoint-set type.

\**\
\* Disjoint set data type (weighted and using path compression) demonstrating  *\
\* 5(m + n) worst-case find time *\
\**\
(datatype disjoint-set
 Count : number ; Ids : (vector number) ; Sizes : (vector number);
 =================================================================
 [Count Ids Sizes] : disjoint-set;)

Then we add a few utilities for creating new instances, retrieving the disjoint subsets count and finding the root of an object.

\* Create a new disjoint-set type *\
(define new
 { number --> disjoint-set }
 N -> [N (range 1 N) (vector-init 1 N)])

\* Return the number of disjoint sets *\
(define count
 { disjoint-set --> number }
 [Count Ids Sizes] -> Count)

\* Return id of root object *\
(define find-root
 { disjoint-set --> number --> number }
 [Count Ids Sizes] P -> (let Parent
                         \* Path Compression *\
                         (<-vector Ids (<-vector Ids P))
                         (if (= P Parent)
                             P
                             (find-root [Count Ids Sizes] Parent)))

Next we define functions to check if two objects are connected along with the quick-union function that will actually connect two objects.

\* Are objects P and Q in the set ? *\
(define connected
 { disjoint-set --> number --> number --> boolean }
 UF P Q -> (= (find-root UF P) (find-root UF Q)))

\* Replace sets containing P and Q with their union *\
(define quick-union
 { disjoint-set --> number --> number --> disjoint-set }
 [Count Ids Sizes] P Q
 -> (let UF [Count Ids Sizes]
         I (find-root UF P)
         J (find-root UF Q)
         SizeI (<-vector Sizes I)
         SizeJ (<-vector Sizes J)
         SizeSum (+ SizeI SizeJ)
         CIds (vector-copy Ids)
         CSizes (vector-copy Sizes)
      (if (= I J)
          [Count CIds CSizes]
          \* Always make smaller root point to larger one *\
          (do (if (< SizeI SizeJ)
                  (do (vector-> CIds I J) (vector-> CSizes J SizeSum))
                  (do (vector-> CIds J I) (vector-> CSizes I SizeSum)))
              [(- Count 1) CIds CSizes]))))

After running our test we get the following output.

(50+) (test 10)
creating union find with 10 objects ...
DONE
[10 <1 2 3 4 5 6 7 8 9 10> <1 1 1 1 1 1 1 1 1 1>]
All objects are disconnected :
1 and 9 connected ? false
4 and 6 connected ? false
3 and 1 connected ? false
7 and 8 connected ? false
... creating unions ...
DONE
[1 <4 8 7 7 8 8 8 8 8 8> <1 1 1 2 1 1 4 10 1 1>]
All objects should be connected as there is only 1 group :
1 and 9 connected ? true
4 and 6 connected ? true
3 and 1 connected ? true
7 and 8 connected ? true
run time: 0.0 secs
1 : number

All the code can be found here.