Set (abstract Data Type) - Multiset

A variation of the set is the multiset or bag, which is the same as a set data structure, but allows repeated ("equal") values (duplicates). The set of all bags over type T is given by the expression bag T. It is possible for objects in computer science to be considered "equal" under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure; while others will collapse it down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.

  • C++'s Standard Template Library provides the multiset class for the sorted multiset, which implements this multiset using a self-balancing binary search tree. SGI's STL provides the hash_multiset class, which implements this hash multiset using a hash table.
  • For Java, third-party libraries provide multiset functionality:
    • Apache Commons Collections provides the Bag and SortedBag interfaces, with implementing classes like HashBag and TreeBag.
    • Google Collections provides the Multiset interface, with implementing classes like HashMultiset and TreeMultiset.
  • Apple provides the NSCountedSet class as part of Cocoa, and the CFBag and CFMutableBag types as part of CoreFoundation.
  • Python's standard library includes collections.Counter, which is similar to a multiset.
  • Smalltalk includes the Bag class, which can be instantiated to use either identity or equality as predicate for inclusion test.

Where a multiset data structure is not available, a workaround is to use a regular set, but override the equality predicate of its items to always return "not equal" on distinct objects (however, such will still not be able to store multiple occurrences of the same object) or use an associative array mapping the values to their integer multiplicities (this will not be able to distinguish between equal elements at all).

Typical operations on bags:

  • contains(B, x): checks whether the element x is present (at least once) in the bag B
  • is_sub_bag(B1, B2): checks whether each element in the bag B1 occurs in B1 no more often than it occurs in the bag B2; sometimes denoted as B1B2.
  • count(B, x): returns the number of times that the element x occurs in the bag B; sometimes denoted as B # x.
  • scaled_by(B, n): given a natural number n, returns a bag which contains the same elements as the bag B, except that every element that occurs m times in B occurs n * m times in the resulting bag; sometimes denoted as n ⊗ B.
  • union(B1, B2): returns a bag that containing just those values that occur in either the bag B1 or the bag B2, except that the number of times a value x occurs in the resulting bag is equal to (B1 # x) + (B2 # x); sometimes denoted as B1B2.

Read more about this topic:  Set (abstract Data Type)