Sunday, 12 April 2015

Automatic Sorting for Your Collections

The .NET Framework has two collections that will guarantee your items are always sorted whenever you process the collection. Here's how to choose between the two and how to control the sort order (including supporting duplicate entries).

Even if you want the items in your collection to be in a sorted order, you probably don't need either the SortedList or SortedDictionary collections. If you only need to sort your collection once before processing it, or if you need to sort your collection multiple times but into a different order each time, then you shouldn't use SortedList or SortedDictionary (stick with the List collection). But if you have a collection that you always want to have sorted in the same order no matter how many times you add or remove items, then the SortedList or the SortedDictionary collections are your bestest friends.

Using the Sorted Collections
To use a SortedList or SortedDictionary, you first have to declare it, specifying the data type of the sort key and the data type of the item you're keeping in the collection (the syntax for a SortedList and a SortedDictionary are virtually identical, so I'll arbitrarily switch between SortedList and SortedDictionary in these examples). This example sets up a list of Employees, sorted by date:


Dim Employees As New SortedDictionary(Of DateTime, Employee)

Adding an item to the collection looks very much like adding an item to a Dictionary: You call the collection's Add method, passing the data you want to sort by and the item you want to hold in the collection. This example adds an Employee to my Employees collection, sorting by the date the employee joined the company:

Employees.Add(emp.JoinDate, emp)

To retrieve an item, you pass in a key, which, in this case, is one of the dates that an employee joined the company:

emp = Employees(JoinDate)

You can also use LINQ queries to retrieve items from the collection. This allows you to find items in the collection by something other than the key. In a LINQ query you reference the sort key through the collection's Key property and the item through the collection's Value property. This example finds the JoinDate for the employee whose LastName is "Vogel":

JoinDate = (From e In Employees
Where e.Value.LastName = "Vogel"
Select e.Key).FirstOrDefault

To remove an item from the collection, you pass a key to the Remove method:

Employees.Remove(JoinDate)

You can also do a bulk load of your sorted collection by passing it another collection that isn't (necessarily) sorted when you create your sorted collection. This example loads my Employees collection by passing it a Dictionary of employees that may (or may not) be sorted:

Dim emps As Dictionary(of DateTime, Employee)
...loading emps...
w SortedList(Of DateTime, Employee)(emps)
Dim Employees As N e

Choosing Between SortedList and SortedDictionary
Because the interfaces for the two collections are identical, choosing between them has more to do with their performance than their functionality.

First, retrieval time doesn't really matter because it's essentially identical for both collections. The memory usage of the two collections is also almost identical (if you're extremely worried about the amount of memory you're using then you should use SortedList).
The two collections do, however, differ in speed when adding/removing items … but, even then, only if you have thousands or tens of thousands of items in your collection. In those scenarios, then, SortedDictionary is faster for adds/removes. However, that faster speed costs you something when you do a bulk load when you create your sorted collection. If that initial collection is already sorted, then SortedList loads faster; if your initial collection is unsorted, then SortedDictionary will load faster (or so the documentation tells me).
Because you're probably not doing a bulk load, SortedDictionary is your best choice … though I doubt you'll be able to measure the difference in any meaningful way unless you have lots and lots of objects in your collection.

Tweaking the Sort Order
Out of the box, SortedList and SortedDictionary will do what you want if all three of these conditions are true:

  • Your key's data type implements the IComparable interface (as do String, Integer, DateTime and many other .NET data types)
  • You don't have any duplicate entries
  • You're happy with your data sorted in ascending order
However, if any of those conditions aren't true (that is, if your key is some custom class that doesn't implement the IComparable interface, you have duplicate entries or you want a custom sort), then you'll want to tweak your collection's sorting strategy (for more on implementing IComparable -- which you really should do in any class you create -- see my column on sort strategies).
The first step in tweaking your collection's sorting strategy is to create a class that implements the IComparer interface. To enable this class to work with any data type used as a collection's key, you should declare your class as a generic (this does assume that whatever data type you're using as your key implements the IComparable interface and, as a result, has a CompareTo method). Here's an example that implements a descending sort by inverting the results of the returned by the CompareTo method:

Public Class DescendingKeysComparer(Of T As IComparable)
  Implements IComparer(Of T)

  Public Function Compare(
    first As T, second As T) As Integer Implements IComparer(Of T).Compare
    Return first.CompareTo(second) * -1
  End Function
End Class

If your key's data type doesn't implement IComparable, then you'll have to write your own comparison code for this method. You can create any test you want, but you need to return 0 if you don't care about the order of the two items that are passed in (that is, as far as you're concerned, the two items are equal); return a positive number if you feel the first item is greater than the second item; return a negative number if you feel the first item is less than the second item. Listing 1 has an IComparer class that sorts Customer objects by their Id property.

Listing 1: A Class for Sorting Customer Classes 

Public Class CustomerComparer
  Implements IComparer(Of Customer)

  Public Function Compare(
    first As Customer, second As Customer) As Integer Implements IComparer(Of Customer).Compare
    If first.Id = second.Id Then
      return 0
    End If
    If first.Id > second.Id Then
      return 1
    End If
    If first.Id < second.Id Then
      return -1
    End If
  End Function
End Class

The SortedList and SortedDictionary don't like duplicate entries and raise an exception if the comparison method returns a zero. Therefore, if you want to support duplicate keys, you need a comparer that converts a zero result into a negative or positive result. The example inListing 2 converts zeroes to a negative number:

Listing 2: Comparer That Converts Zero into a Negative Number 

Public Class MatchingKeysComparer(Of T As IComparable)
  Implements IComparer(Of T)

  Public Function Compare1(
    first As T, second As T) As Integer Implements IComparer(Of T).Compare
  Dim res As Integer
  
    res = first.CompareTo(Second)
    If res = 0 Then
      Return -1
    Else
      Return res
    End If
  End Function
End Class

To have your SortedList/SortedDictionary use your comparer, just pass it in to your collection when you create the collection, as this example does:

Dim DelayedRecordsForTagId As New SortedDictionary(
  Of Date, SelectorData)(New MatchingKeysComparer(Of Date))

And there you go: Your data is always sorted and, in addition, it's now sorted the way you want.

No comments:

Post a Comment