How to sort a list of files correctly
Several times over the years I've found myself swearing at incorrectly sorted file lists.
Say you have several files:
f1.txt, f2.txt, f3.txt, f4.txt, f5.txt, f6.txt, f7.txt, f8.txt, f9.txt, f10.txt.
In most cases normal alphabetical sorting is sufficient, and it gives you the files in this order:
f1.txt, f10.txt, f2.txt, f3.txt, f4.txt, f5.txt, f6.txt, f7.txt, f8.txt, f9.txt.
But in other cases, such as the chapters of a book, you really want the order to be correct, since it's quite annoying to have chapter 10 right after chapter 1.
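The alphabetical ordering shown above is easy to reproduce. This is only a small sketch with made-up file names matching the list above; it shows that plain string comparison goes character by character, so "f10.txt" lands before "f2.txt".

```python
# Build the ten example names: f1.txt ... f10.txt
files = ["f%d.txt" % n for n in range(1, 11)]

# Plain sorting compares strings one character at a time,
# so "f10.txt" comes before "f2.txt" because "1" < "2".
alphabetical = sorted(files)
print(alphabetical)
```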
Theory
The idea behind the algorithm is that digit groups should be treated as integers and not as separate characters. In that way any number will be compared to other numbers in the correct (or expected) way.
Examples:
"File123.txt" => ["File", int(123), ".txt"]
"File7.txt" => ["File", int(7), ".txt"]
"Folder5/File7of9.txt" => ["Folder", int(5), "/File", int(7), "of", int(9), ".txt"]
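The splitting in the examples above could be sketched roughly like this. The helper name `split_chunks` is hypothetical and only there for illustration; it is not the key function used for the actual sorting.

```python
import re

# Hypothetical helper, named only for this illustration.
def split_chunks(name):
    """Split a name into text chunks and integer chunks."""
    # Each match is a (digits, text) pair in which exactly one side is non-empty.
    pairs = re.findall(r"(\d+)|(\D+)", name)
    return [int(digits) if digits else text for digits, text in pairs]

print(split_chunks("File123.txt"))
print(split_chunks("Folder5/File7of9.txt"))
```

Once the digit groups are integers, 7 compares as smaller than 123, which is what you expect. Note that lists built this way are not always safe to compare with each other: a name starting with a digit and a name starting with a letter would put an integer and a string in the same position, and Python 3 refuses to compare those.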
The Algorithm
The first algorithm I wrote (and published) to solve this problem worked like a charm. But as always, someone else has done it before you and done it a little bit better ;-)
I rewrote my example based on the ideas and code found at these locations:
- http://nedbatchelder.com/blog/200712/human_sorting.html
- http://blog.pobblelabs.org/2007/12/11/exception-handling-slow/
When I played around with the code from the locations mentioned above, I found that (on my computer at least) I got slightly better performance using a function defined with lambda instead of a normal function.
I have absolutely no idea whether that holds on other platforms.
import re
RE_DIGIT = re.compile(r'(\d+)')
ALPHANUM_KEY = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]
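If you want to check the lambda versus normal function question on your own machine, something like the following rough sketch could be used. The names `key_lambda`, `key_def` and `names`, and the sizes chosen, are my own assumptions for the test; the numbers will vary between runs and platforms.

```python
import re
import timeit

RE_DIGIT = re.compile(r'(\d+)')

# The same key written twice: once as a lambda, once as a normal function.
key_lambda = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

def key_def(s):
    return [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

# Made-up test data: 2500 names of the form folderA/fileB.txt
names = ["folder%d/file%d.txt" % (a, b) for a in range(50) for b in range(50)]

# Time 20 full sorts with each key and print the total seconds.
for label, key in (("lambda", key_lambda), ("def", key_def)):
    seconds = timeit.timeit(lambda: sorted(names, key=key), number=20)
    print("%-6s %.4f s" % (label, seconds))
```

Both forms end up as ordinary function objects in CPython, so any difference you measure is likely to be small and may well be noise.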
Here is an example of how to use the algorithm.
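import random

# Build names like folder1/file10.txt for every combination of values
values = [1, 2, 10, 20]
file_list = []
for v in values:
    for vv in values:
        file_list.append("folder%d/file%d.txt" % (v, vv))

# Plain sort: compares the names character by character
print("*** NORMAL SORT ***")
random.shuffle(file_list)
file_list.sort()
for f in file_list:
    print(f)

# Sort with ALPHANUM_KEY (defined above): digit groups compare as integers
print("*** ALPHANUM_KEY SORT ***")
random.shuffle(file_list)
file_list.sort(key=ALPHANUM_KEY)
for f in file_list:
    print(f)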
Output from the example:
*** NORMAL SORT ***
folder1/file1.txt
folder1/file10.txt
folder1/file2.txt
folder1/file20.txt
folder10/file1.txt
folder10/file10.txt
folder10/file2.txt
folder10/file20.txt
folder2/file1.txt
folder2/file10.txt
folder2/file2.txt
folder2/file20.txt
folder20/file1.txt
folder20/file10.txt
folder20/file2.txt
folder20/file20.txt
*** ALPHANUM_KEY SORT ***
folder1/file1.txt
folder1/file2.txt
folder1/file10.txt
folder1/file20.txt
folder2/file1.txt
folder2/file2.txt
folder2/file10.txt
folder2/file20.txt
folder10/file1.txt
folder10/file2.txt
folder10/file10.txt
folder10/file20.txt
folder20/file1.txt
folder20/file2.txt
folder20/file10.txt
folder20/file20.txt