Python 3.0 I/O performance issues

Posted December 10th @ 8:55 am by admin

Python 3.0 (aka Python 3000) was released about a week ago.

In the What’s New document, the Python team claims:

The net result of the 3.0 generalizations is that Python 3.0 runs the pystone benchmark around 10% slower than Python 2.5. Most likely the biggest cause is the removal of special-casing for small integers. There’s room for improvement, but it will happen after 3.0 is released!

This week, however, many people have noticed big I/O-related performance differences between the new version and the old 2.5/2.6 series.

Since read and write performance is very important for my work (data analysis), I ran some tests on my laptop (MacBook Pro, 2.4 GHz).

I installed Python 2.5.2 and Python 3.0 from source. Each pair of equivalent scripts was compared by running it 5 times and averaging the times.

Binary Read Test

Loading a big file (156 MB) into memory in one step.

#!python2.5
import time
 
f = open("bigfile.txt","rb")
start = time.time()
f.read()
stop = time.time()
f.close()
print "%.3f sec" % (stop-start)
#!python3.0
import time
 
f = open("bigfile.txt","rb")
start = time.time()
f.read()
stop = time.time()
f.close()
print("%.3f sec" % (stop-start))
Python 2.5 avg: 0.459 sec; Python 3.0 avg: 0.761 sec (+66%)

Text Read Test

Iterating over a big file (156 MB) line by line.

#!python2.5
import time
 
f = open("bigfile.txt","r")
start = time.time()
for l in f:
	pass
stop = time.time()
f.close()
print "%.3f sec" % (stop-start)
#!python3.0
import time
 
f = open("bigfile.txt","rt")
start = time.time()
for l in f:
	pass
stop = time.time()
f.close()
print("%.3f sec" % (stop-start))
Python 2.5 avg: 0.713 sec; Python 3.0 avg: 35.709 sec (+4,909%)

Binary Write Test

This test measures the time to write 100 MB of binary data.

#!python2.5
import time
 
l = "A" * 1024*1024
 
f = open("writefile.txt","wb")
start = time.time()
for i in xrange(100):
	f.write(l)
stop = time.time()
f.close()
print "%.3f sec" % (stop-start)
#!python3.0
import time
 
l = "A" * 1024*1024
l = l.encode("iso-8859-1")  # binary mode in 3.0 requires bytes, not str
f = open("writefile.txt","wb")
start = time.time()
for i in range(100):
	f.write(l)
stop = time.time()
f.close()
print("%.3f sec" % (stop-start))
Python 2.5 avg: 2.501 sec; Python 3.0 avg: 2.572 sec (+3%)

Text Write Test

Write performance for text files.

#!python2.5
import time
 
l = "*" * 1024 + "\n"
f = open("writefile.txt","w")
start = time.time()
for i in xrange(100000):
	f.write(l)
stop = time.time()
f.close()
print "%.3f sec" % (stop-start)
#!python3.0
import time
 
l = "*" * 1024 + "\n"
f = open("writefile.txt","wt")
start = time.time()
for i in range(100000):
	f.write(l)
stop = time.time()
f.close()
print("%.3f sec" % (stop-start))
Python 2.5 avg: 2.564 sec; Python 3.0 avg: 5.315 sec (+107%)

Print

In Python 3.0, “print” became a function; previously it was a statement. What does this mean? From a syntax point of view, only small changes are needed (for example, adding parentheses around the arguments), as the short illustration below shows. The real bad news is the execution time. In this test I compare the execution of the statement/function called without arguments.
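
A quick syntax illustration (the printed values are just placeholders, not part of the benchmark):

#!python2.5
print "hello", 42      # print statement
#!python3.0
print("hello", 42)     # print() function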

#!python2.5
import timeit
import sys
cmd='for i in xrange(1000000): print '
t=timeit.Timer(cmd).timeit(1)
 
sys.stderr.write("%.3f sec\n" % t)
#!python3.0
import timeit
import sys
cmd='for i in range(1000000): print()'
t=timeit.Timer(cmd).timeit(1)
 
sys.stderr.write("%.3f sec\n" % t)
Python 2.5 avg: 0.230 sec; Python 3.0 avg: 10.956 sec (+4,655%)

Some speed improvements on many of the previous tests are expected in the next Python release (3.0.1). I will redo my tests when the new version becomes available.

6 Comments

  1. Masklinn
    December 10, 2008 at 17:14

    I’m pretty sure opening files in text mode now mandatorily decodes them (or encodes them) to/from the default platform encoding (I think). In Python 2.x on the other hand, opening a file as text is the same (or almost, I think it’s only different under windows) as opening them in binary mode, meaning no decoding overhead, meaning the comparison is apples and oranges for text read and write.

    Which is why there’s very little I/O difference in binary mode, but a huge one in text mode.

    For a fair comparison, you should use the stream factories of the codecs module.
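
    Something like this on the Python 2 side, for example (just a sketch; I’m assuming a UTF-8 platform encoding here):

    import codecs
    import time

    # decode while reading, as Python 3.0's text mode does
    f = codecs.open("bigfile.txt", "r", encoding="utf-8")
    start = time.time()
    for l in f:
        pass
    stop = time.time()
    f.close()
    print "%.3f sec" % (stop - start)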

  2. David
    December 10, 2008 at 18:34

    It is starting to look like the performance enhancements won’t go in until 3.1…there is some debate over whether increased performance is a “feature” or poor performance is a “bug”. In short, I wouldn’t hold my breath if I were you. I’d give it until February for something to move on the performance issue.

  3. Dmitry Chestnykh
    December 10, 2008 at 22:51

    io.open(file[, mode[, buffering[, encoding[, errors[, newline[, closefd=True]]]]]])

    For text mode: What if you specify encoding? Also, try setting line buffering (buffering=1).
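
    For example, for the text read test (a sketch; the encoding here is just a placeholder):

    import io
    import time

    # explicit encoding plus line buffering for the Python 3.0 text read test
    f = io.open("bigfile.txt", "rt", encoding="iso-8859-1", buffering=1)
    start = time.time()
    for l in f:
        pass
    stop = time.time()
    f.close()
    print("%.3f sec" % (stop - start))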

  4. Harry
    December 10, 2008 at 23:02

    Did you mean #!python3.0 in the top right code box?

  5. admin
    December 11, 2008 at 00:07

    Harry wrote: “Did you mean #!python3.0 in the top right code box?”
    Yes, thanks. I corrected this mistake.

    In response to Masklinn: Yes, I used the standard way to open text files in the two versions of Python. To optimize the Py3k version I changed the open mode in the Text Read Test from “rt” to “rb”, and now the script takes 15.47 sec (+2,070%). So the decoding takes about half of the time, but this result is still far from the speed of Python 2.5.
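
    The only change was the open call:

    f = open("bigfile.txt", "rb")   # was "rt"; the rest of the script is identical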

    Changing encoding or buffer policy to line buffering doesn’t affect the results of the original test.

  6. Dmitry Chestnykh
    December 11, 2008 at 07:54

    Here’s this bug in the Python issue tracker: Issue 4561.

