How to sort a list of files correctly

Several times over the years I've found myself swearing at incorrectly sorted file lists.

Say you have several files:

f1.txt, f2.txt, f3.txt, f4.txt, f5.txt, f6.txt, f7.txt, f8.txt, f9.txt, f10.txt.

In most cases normal alphabetical sorting is sufficient. It will sort your files like this:

f1.txt, f10.txt, f2.txt, f3.txt, f4.txt, f5.txt, f6.txt, f7.txt, f8.txt, f9.txt.

But in other cases, such as chapters in a book, you really want the order to be correct: it's quite annoying to have chapter 10 come right after chapter 1.

Theory

The idea behind the algorithm is that groups of digits should be treated as integers rather than as separate characters. That way, every number is compared to other numbers in the correct (or expected) way.

Examples:
"File123.txt" => ["File",int(123),".txt"]
"File7.txt" => ["File",int(7),".txt"]
"Folder5/File7of9.txt" => ["Folder",int(5),"/File",int(7),"of",int(9),".txt"]
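To see why this helps, note that Python compares strings one character at a time, so "f10.txt" sorts before "f2.txt" because '1' < '2'. Lists, on the other hand, are compared element by element. Once the digit groups are integers, 7 < 123 decides the order. The small sketch below builds the token lists by hand, the same ones as in the examples above, and compares them. It only illustrates the comparison and is not the key function itself.

```python
# Plain string comparison goes character by character,
# so '1' < '2' puts "f10.txt" before "f2.txt".
assert "f10.txt" < "f2.txt"

# Token lists as in the examples above: text pieces stay strings,
# digit groups become integers.
file123 = ["File", 123, ".txt"]
file7 = ["File", 7, ".txt"]

# Lists compare element by element: "File" == "File", then 7 < 123.
assert file7 < file123

# In a longer path, the first differing element decides the order.
folder5_file7 = ["Folder", 5, "/File", 7, "of", 9, ".txt"]
folder5_file10 = ["Folder", 5, "/File", 10, "of", 9, ".txt"]
assert folder5_file7 < folder5_file10

print("all comparisons hold")
```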

The Algorithm

The first algorithm I wrote (and published) to solve this problem worked like a charm. But as always, others had already done it before me, and done it a little better ;-)

I rewrote my example based on the ideas and code found at these locations:

While experimenting with the code from the locations mentioned above, I found that (on my computer at least) defining the key function with a lambda gave slightly better performance than defining it as a normal function with def.

I have absolutely no idea whether that holds on other platforms.
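If you want to check this on your own machine, you can time both versions with the standard timeit module. Below is a rough sketch. The test data and the def-based name alphanum_key are my own, not from the original sources. For what it's worth, in CPython a lambda and a def both create the same kind of function object, so I'd expect any difference to be small and possibly within measurement noise.

```python
import re
import timeit

RE_DIGIT = re.compile(r'(\d+)')

# The key written as a lambda, as in this article.
ALPHANUM_KEY = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

# The same key written as a normal function (hypothetical name).
def alphanum_key(s):
    return [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

# Made-up test data: 400 names like "folder3/file17.txt".
names = ["folder%d/file%d.txt" % (i, j) for i in range(20) for j in range(20)]

# Both versions must produce identical keys for the timing to be meaningful.
assert [ALPHANUM_KEY(n) for n in names] == [alphanum_key(n) for n in names]

# Time 100 sorts with each key; taking the best of 5 runs reduces noise.
lambda_time = min(timeit.repeat(lambda: sorted(names, key=ALPHANUM_KEY),
                                number=100, repeat=5))
def_time = min(timeit.repeat(lambda: sorted(names, key=alphanum_key),
                             number=100, repeat=5))
print("lambda: %.4fs  def: %.4fs" % (lambda_time, def_time))
```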

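import re

# The capturing group makes re.split() keep the digit groups in its result.
RE_DIGIT = re.compile(r'(\d+)')
# Split a name into text and digit groups; digit groups become ints.
ALPHANUM_KEY = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]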
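One caveat if you run this under Python 3: str.isdigit() also returns True for characters such as the superscript '²', which \d does not match and int() cannot parse. A name like "x1²" is split into ['x', '1', '²'], and ALPHANUM_KEY then raises ValueError on the last piece. Below is a minimal sketch of a variant that relies on position instead of isdigit(). It assumes the pattern has exactly one capturing group, which guarantees re.split() alternates text and digit pieces. The name alphanum_key_parity is my own.

```python
import re

RE_DIGIT = re.compile(r'(\d+)')

def alphanum_key_parity(s):
    # With one capturing group, re.split() returns
    # [text, digits, text, digits, ..., text]: text pieces (possibly empty)
    # sit at even indexes and digit groups at odd ones. Converting by
    # position means every key alternates str/int in the same way,
    # so a str is never compared with an int.
    return [int(g) if i % 2 else g for i, g in enumerate(RE_DIGIT.split(s))]

names = ["chapter10.txt", "chapter2.txt", "chapter1.txt", "x1\u00b2"]
print(sorted(names, key=alphanum_key_parity))
```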

Here is an example of how to use the algorithm.

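import random

# Build a file name for every combination of folder and file number.
values = [1, 2, 10, 20]
file_list = []
for folder_num in values:
    for file_num in values:
        file_list.append("folder%d/file%d.txt" % (folder_num, file_num))

print("*** NORMAL SORT ***")
random.shuffle(file_list)   # start from a random order
file_list.sort()            # plain character-by-character sort
for f in file_list:
    print(f)

print("*** ALPHANUM_KEY SORT ***")
random.shuffle(file_list)
file_list.sort(key=ALPHANUM_KEY)  # digit groups compared as integers
for f in file_list:
    print(f)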

Output from the example:

*** NORMAL SORT ***
folder1/file1.txt
folder1/file10.txt
folder1/file2.txt
folder1/file20.txt
folder10/file1.txt
folder10/file10.txt
folder10/file2.txt
folder10/file20.txt
folder2/file1.txt
folder2/file10.txt
folder2/file2.txt
folder2/file20.txt
folder20/file1.txt
folder20/file10.txt
folder20/file2.txt
folder20/file20.txt
*** ALPHANUM_KEY SORT ***
folder1/file1.txt
folder1/file2.txt
folder1/file10.txt
folder1/file20.txt
folder2/file1.txt
folder2/file2.txt
folder2/file10.txt
folder2/file20.txt
folder10/file1.txt
folder10/file2.txt
folder10/file10.txt
folder10/file20.txt
folder20/file1.txt
folder20/file2.txt
folder20/file10.txt
folder20/file20.txt
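To apply the key to a real directory, you can pass it to sorted() together with the names returned by os.listdir(). Below is a minimal sketch. The helper name sorted_listing is hypothetical, and the key is repeated so that the block runs on its own.

```python
import os
import re
import tempfile

RE_DIGIT = re.compile(r'(\d+)')
ALPHANUM_KEY = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

def sorted_listing(directory):
    """Return the entry names in `directory` in natural (alphanumeric) order."""
    return sorted(os.listdir(directory), key=ALPHANUM_KEY)

if __name__ == "__main__":
    # Demo: create empty files f1.txt ... f10.txt in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for n in range(1, 11):
            open(os.path.join(tmp, "f%d.txt" % n), "w").close()
        print(sorted_listing(tmp))
```

Note that os.listdir() returns names in arbitrary order, so sorting them explicitly is needed in any case.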
