Similar Questions
Similar questions have been asked before, but always have specific qualities to the data which allow a more targeted "split it up and just sort by this part" approach, which does not work when you don't know the structure of the data in the column - or even the column, frankly. In other words, not a generic, "Natural" sort order - something roughly equivalent to SELECT * FROM [parts] ORDER BY [part_category] DESC, [part_number] NATURAL DESC
My Situation
I have a DataView in C# that has a Sort
parameter for specifying the ORDER BY
that would be used by ADO, and a requirement to sort by a column using a 'natural' sort algorithm. I could in theory do just about anything from creating a different column to sort by (based on the column I'd like to have 'sorted naturally') to not sorting in SQL, but rather sorting the result set in code afterwards. I'm looking for the best balance of flexibility, efficiency, preparation effort, and maintainability. I would benefit somewhat from being able to sort such data after retrieval (in C#) or completely within a stored procedure.
In my mind, and according to customer statements so far, 'Natural' sort order will mean treating upper and lower case letters equivalently, and considering the magnitude of a number, rather than the ASCII value of its digits (that is x90
comes before x100
). Jeff Atwood had a pretty decent discussion of this, but it didn't address SQL sorting. That said, these are my thoughts:
- Incorporating the magnitude awareness while also retaining the ability to sort alpha characters ASCII-betically may also come in handy
- Non-alphanumeric characters would probably have to be sorted ASCII-betically regardless
- Decimal point awareness might be more effort than it's worth, since most of the time periods and commas in alphanumeric fields are treated as merely punctuation/separators, and only denote fractional portions when they're representing a float field
My Question
What is a reasonably flexible, reasonably generic, reasonably efficient, approach to implementing a natural sort algorithm for SQL? Weighing the pros and cons, which is the best approach? Is there another option?
- Is there a native SQL way to
ORDER BY [field] NATURAL DESC
or something? - PURE SQL function to create a 'sort equivalent' - Could be used to create some sort of second, possibly indexed, 'sort value' column, or called from a stored procedure, or specified in an 'ORDER BY' clause - but how to write it efficiently? (loops? is there a set based solution at all??)
- CLR SQL Function - usability benefits of pure SQL function, but using procedural language, like C# (algorithm should be no problem, but can it be made to go faster than a pure SQL sort [set based??] implementation?) Also, could be referenced and utilized in C# if efficient enough.
- Avoid SQL Server - since parsing an arbitrary number of numbers amid all sorts of other characters is really best suited for looping or recursion, and T-SQL is not well suited for looping or recursion (though TECHNICALLY supported, All I see is 'DON'T USE LOOPS!!!' and 'CTE's are even worse!!!')
- Some sort of comparator in SQL(??) - doesn't seem like SQL lends itself to that sort of sorting and I don't see a way to specify a comparator to use - so I guess this won't work...
I have values at least as varied as the following:
100s455t
200s400
d399487
S0000005.2
d400400
d99222
cg9876
D550-9-1
CL2009-3-27
f2g099
f2g100
f2g1000
f2g999
cg 8837
99s1000f
These should be sorted as follows:
99s1000f
100s455t
200s400
cg9876
cg 8837
CL2009-3-27
D550-9-1
d99222
d399487
d400400
f2g099
f2g100
f2g999
f2g1000
S0000005.2
All I see is 'DON'T USE LOOPS!!!' and 'CTE's are even worse!!!'
CTEs are not worse if used correctly, but don't use loops. – Hogan