
Python lists: performance or resource usage

Lists are a native part of the Python language, and they make programming easy and fast. But every moon has a dark side, and I would like to shed some light on it. The problem with lists is heavy resource usage, and everyone should keep this in mind while coding. A simple example from the Python tutorial:
myfile = open("myfile.txt")
myfile.readlines()
Python opens the file and creates a list containing every line in it. The simple script below provides some information about execution speed and memory usage:
#!/usr/bin/python
import datetime
import resource

currenttime = datetime.datetime.now()
print "="*20
print "Creating a file "
print "="*20
myfile = open("textfile.txt", "w")
simplerange = xrange(10000000)
try:
    for i in simplerange:
        myfile.write(unicode(datetime.datetime.now()))
        myfile.write('\n')
finally:
    myfile.close()
timespend = datetime.datetime.now()- currenttime
print timespend
print "="*20


print "="*20
print "Open file using readlines"
print "="*20
myfile = open("textfile.txt", "r")
linesinlistfile = open("linesinthelist.txt", "w")
currenttime = datetime.datetime.now()
linesinlist = myfile.readlines()
for currentline in linesinlist:
    linesinlistfile.write(currentline)

myfile.close()
linesinlistfile.close()

timespend = datetime.datetime.now()- currenttime
print timespend
print "="*20
print "openfile using readline"
print "="*20
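myfile = open("textfile.txt", "r")
readonelinefile = open("readonelinefile.txt", "w")
# Restart the timer so this section is measured on its own.
currenttime = datetime.datetime.now()

while True:
    currentline = myfile.readline()
    if not currentline:  # an empty string means end of file
        break
    readonelinefile.write(currentline)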
        
myfile.close()
readonelinefile.close()
timespend = datetime.datetime.now()- currenttime
print timespend
print "="*20
print "Resource usage"
print "="*20
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
This script creates a simple text file filled with timestamp strings, then reads it back using the readline() and readlines() methods. The last part prints the peak memory usage in kilobytes. Because that figure is a peak for the whole process, I commented out the readline() part or the readlines() part in turn to get correct data for each.
The results are below:

measurement          readline()        readlines()
execution time       0:01:10.799743    0:00:04.562637
memory usage (KB)    3620              526464
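The memory figure in the table comes from resource.getrusage(resource.RUSAGE_SELF).ru_maxrss. Below is a rough sketch of how such a measurement could be wrapped into small helpers. It is written for Python 3, while the script above targets Python 2, and the names peak_rss_kb and timed are made up for this illustration. The resource module exists only on Unix-like systems; ru_maxrss is reported in kilobytes on Linux and in bytes on macOS, which the sketch tries to even out.

```python
import resource
import sys
import time


def peak_rss_kb():
    """Return the peak resident set size of this process in kilobytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; replace it with the code you want to measure.
    _, elapsed = timed(sum, range(1000000))
    print("elapsed: %.3f s" % elapsed)
    print("peak RSS: %d KB" % peak_rss_kb())
```

Since ru_maxrss is a high-water mark that never goes down during the life of the process, each variant should be measured in its own run; otherwise the bigger consumer hides the smaller one.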
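When a list is used, performance is good but memory usage is really bad. Is it possible to have both good speed and low memory usage? Let's try. There are two problems:
  1. A big list requires a lot of memory
  2. A solution without a list reads one line at a time with nothing cached, so it is slow
But we can use a small list: pass a size hint to readlines() so that it returns only a portion of the file, work with that list, then take the next portion of the data. Note that the hint is an approximate size in bytes, not a number of lines, so readlines(1000) returns whole lines totalling roughly 1000 bytes (Python may round this up to its internal buffer size).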
while True:
    # The argument is a size hint in bytes, not a line count.
    linesinlist = myfile.readlines(1000)
    if not linesinlist:
        break
    for currentline in linesinlist:
        linesinlistfile.write(currentline)

The result is below:
====================
Open file using readlines
====================
0:00:04.383583
====================
Resource usage
====================
3636
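The same idea can be packed into a reusable generator. This is only a sketch under my own assumptions: it is written for Python 3 (the script above is Python 2), and the names iter_line_chunks, copy_in_chunks and hint_bytes are hypothetical, not part of any library. It relies only on readlines() accepting a size hint and returning an empty list at the end of the file.

```python
import os
import tempfile


def iter_line_chunks(path, hint_bytes=64 * 1024):
    """Yield lists of whole lines, each list totalling roughly hint_bytes."""
    with open(path, "r") as source:
        while True:
            chunk = source.readlines(hint_bytes)
            if not chunk:  # an empty list means end of file
                break
            yield chunk


def copy_in_chunks(src_path, dst_path, hint_bytes=64 * 1024):
    """Copy src_path to dst_path one chunk at a time; return the line count."""
    copied = 0
    with open(dst_path, "w") as target:
        for chunk in iter_line_chunks(src_path, hint_bytes):
            target.writelines(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    # Build a small throwaway file so the example runs on its own.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "textfile.txt")
    dst = os.path.join(workdir, "copy.txt")
    with open(src, "w") as handle:
        for number in range(10000):
            handle.write("line %d\n" % number)
    print(copy_in_chunks(src, dst, hint_bytes=1000))
```

Only one chunk is held in memory at a time, so the peak usage depends on hint_bytes rather than on the size of the file. Iterating over the file object directly (for line in source) is also lazy, but I have not measured it here.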
It is not hard to make a good application; you only have to want to!
