Wednesday, January 16, 2013

Python lists: performance or resource usage

Lists are a native part of the Python language, and they make programming easy and fast. But every moon has a dark side, and I would like to shed some light on it. The problem with lists is heavy resource usage, and everyone should keep this in mind while coding. A simple example from the Python tutorial:
myfile = open("myfile.txt")
myfile.readlines()
Python opens the file and creates a list with one element per line. The simple script below provides some information about execution speed and memory usage:
#!/usr/bin/python
import datetime
import resource

currenttime = datetime.datetime.now()
print "="*20
print "Creating a file "
print "="*20
myfile = open("textfile.txt", "w")
simplerange = xrange(10000000)
try:
    for i in simplerange:
        myfile.write(unicode(datetime.datetime.now()))
        myfile.write('\n')
finally:
    myfile.close()
timespend = datetime.datetime.now()- currenttime
print timespend
print "="*20


print "="*20
print "Open file using readlines"
print "="*20
myfile = open("textfile.txt", "r")
linesinlistfile = open("linesinthelist.txt", "w")
currenttime = datetime.datetime.now()
linesinlist = myfile.readlines()
for currentline in linesinlist:
    linesinlistfile.write(currentline)

myfile.close()
linesinlistfile.close()
timespend = datetime.datetime.now() - currenttime
print timespend
print "="*20
print "Open file using readline"
print "="*20
currenttime = datetime.datetime.now()
myfile = open("textfile.txt", "r")
readonelinefile = open("readonelinefile.txt", "w")

while 1: 
    currentline = myfile.readline()
    if not currentline: break
    readonelinefile.write(currentline)
        
myfile.close()
readonelinefile.close()
timespend = datetime.datetime.now()- currenttime
print timespend
print "="*20
print "Resource usage"
print "="*20
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
This script creates a text file with a timestamp string on each line, then reads it back using the readlines() and readline() functions. The last part prints the peak memory usage (ru_maxrss, reported in kilobytes on Linux). To get clean numbers, I commented out the readline or readlines part depending on which one was being measured.
The results are below:

                 readline()        readlines()
executing time   0:01:10.799743    0:00:04.562637
memory usage     3620 KB           526464 KB
With a list the performance is good, but the memory usage is really bad. Is it possible to get good performance and low memory usage at the same time? Let's try. Two problems are present:
  1. A big list requires a lot of memory.
  2. A solution without a list cannot take advantage of buffering and is slow.
But we can use a small list: read a portion of lines, build a list from them, process it, then take the next portion of the data. Note that the argument to readlines() is a size hint in bytes, not a number of lines, so readlines(1000) reads roughly 1000 bytes' worth of complete lines at a time.
while 1:    
    linesinlist = myfile.readlines(1000)
    if not linesinlist:
        break
    for currentline in linesinlist:
        linesinlistfile.write(currentline)
 
and the result is below:
====================
Open file using readlines
====================
0:00:04.383583
====================
Resource usage
====================
3636
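For completeness: since Python 2.2 a file object is its own iterator, so looping over it directly reads line by line with internal buffering and keeps memory flat, without managing chunked readlines() calls by hand. A minimal sketch, using hypothetical file names and a tiny generated input file so it is self-contained:

```python
# Create a small hypothetical input file so the example is self-contained.
with open("sample_in.txt", "w") as f:
    for i in range(5):
        f.write("line %d\n" % i)

# A file object is its own iterator: this reads one buffered line at a time,
# so memory stays flat no matter how big the file is.
lines_copied = 0
with open("sample_in.txt") as src:
    with open("sample_out.txt", "w") as dst:
        for line in src:
            dst.write(line)
            lines_copied += 1
```

This is usually the most idiomatic option when you do not need the explicit chunking shown above.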
It is not hard to build a well-behaved application; you just have to care about it!

Friday, January 11, 2013

Small python script for monitoring MySQL performance

I have a few services that use MySQL as the database server, and I would like to see the load information as a PNG image or in the Cacti app.
MySQL exposes performance information via the 'SHOW STATUS' command.

The values being monitored are:
 threads_running, threads_connected, threads_cached, slow_queries
 Of course, it is really easy to add more variables.

The connection to MySQL is made with the MySQLdb module. A typical example of usage is below:
import MySQLdb
mydb = MySQLdb.connect(host = 'hostname',
                        user = 'username',
                        passwd = 'secret',
                        db = 'mydatabase'
)
mycursor = mydb.cursor()
mycursor.execute('SQL command')
sqlresult = mycursor.fetchall()
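The rows that fetchall() returns for SHOW STATUS are (name, value) pairs, so it can be handy to turn them into a dict instead of chaining if statements. A sketch with made-up sample rows standing in for the real query result:

```python
# Hypothetical sample rows shaped like SHOW STATUS output: (name, value) pairs.
sample_rows = (
    ("Threads_running", "2"),
    ("Threads_connected", "5"),
    ("Threads_cached", "3"),
    ("Slow_queries", "0"),
    ("Uptime", "99999"),   # variables we are not interested in get filtered out
)

def status_to_dict(rows, wanted):
    """Keep only the wanted status variables in a name -> value dict."""
    return dict((name, value) for name, value in rows if name in wanted)

wanted = ("Threads_running", "Threads_connected", "Threads_cached", "Slow_queries")
status = status_to_dict(sample_rows, wanted)
```

With this, a new monitored variable is just one more entry in the wanted tuple.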

Storing the data in an RRD file is available via the rrdtool package, which is present in both Debian and CentOS. An example of creating a file is below:
import rrdtool
rrdtool.create("myfile.rrd",
    "DS:value1:datatype:heartbeat:lowerlimit:upperlimit",
    "RRA:functionname:percentage:dataset:storedvalues")
These format strings are the interesting part.
"DS:value1:datatype:heartbeat:lowerlimit:upperlimit" means:
  value1 -- the name of the value stored in the RRD
  heartbeat -- how long we wait for an update before setting the data to unknown
  lowerlimit:upperlimit -- the limits for the data
"RRA:functionname:percentage:dataset:storedvalues" means:
  functionname can be:
    AVERAGE -- the average of the data points is stored
    MIN -- the smallest of the data points is stored
    MAX -- the largest of the data points is stored
    LAST -- the last data point is used
  percentage -- how large a fraction of the values may be unknown while the calculation is still performed
  dataset -- how many primary values are consolidated into one stored value
  storedvalues -- how many consolidated values are stored
Example: we store 5-minute data and want to keep one day of it. In this case we need 60/5 * 24 = 288 rows. If we need a week of information at a one-hour interval, then: 24 records per day (one every hour) * 7 days = 168 rows.
Putting it all together (the config parser is trivial, so I skip it):
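The row arithmetic above can be wrapped in a small hypothetical helper so retention windows are easy to check:

```python
def rra_rows(step_seconds, points_per_row, retention_seconds):
    """How many consolidated rows are needed to cover a retention window."""
    return retention_seconds // (step_seconds * points_per_row)

# 5-minute samples, one sample per row, kept for a day: 86400 / 300 = 288
day_rows = rra_rows(300, 1, 24 * 60 * 60)
# 5-minute samples consolidated 12-to-1 into hourly rows, kept for a week: 168
week_rows = rra_rows(300, 12, 7 * 24 * 60 * 60)
```

The helper is just for sizing RRA definitions on paper; rrdtool itself only sees the final numbers.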
#!/usr/bin/python
import MySQLdb
import sys
import rrdtool
from ConfigParser import SafeConfigParser

def main(conf_file="./mysqlmonitor.conf"):

    mydb, rrdfilename  = databaseconnect(conf_file)    

    cur = mydb.cursor()
    
    cur.execute('SHOW STATUS')
    res = cur.fetchall()

    for record in res:

        if record[0] == "Threads_running":
            threads_running = record[1]
            print "Threads_running:", threads_running
        if record[0] == "Threads_connected":
            threads_connected = record[1]
            print "Threads_connected:", threads_connected
        if record[0] == "Threads_cached":
            threads_cached  = record[1]
            print "Threads_cached:", threads_cached
        if record[0] == "Slow_queries":
            slow_queries = record[1]
            print "Slow_queries:", slow_queries
            
                
    mydb.close()
    try:
        with open(rrdfilename) as rrdfile:
            rrdupdate(rrdfilename, threads_running, threads_connected, threads_cached, slow_queries)
    except IOError as e:
        print 'RRD file is not present, creating'
        rrdcreate(rrdfilename)

def rrdcreate(rrdfilename):
    """ function for creating RRD file"""
    ret = rrdtool.create(rrdfilename, "--step", "300", "--start", "0",
    "DS:threads_running:GAUGE:600:U:U",
    "DS:threads_connected:GAUGE:600:U:U",
    "DS:threads_cached:GAUGE:600:U:U",
    "DS:slow_queries:GAUGE:600:U:U",
    "RRA:AVERAGE:0.5:1:600",
    "RRA:AVERAGE:0.5:6:700",
    "RRA:AVERAGE:0.5:24:775",
    "RRA:MAX:0.5:1:600",
    "RRA:MAX:0.5:6:700",
    "RRA:MAX:0.5:444:797")


def rrdupdate(rrdfilename, threads_running, threads_connected, threads_cached, slow_queries):
    """ updating rrd data withnew information"""
    ret = rrdtool.update(rrdfilename, "N:%s:%s:%s:%s" %(threads_running, threads_connected, threads_cached, slow_queries))
    print "Updating"


def databaseconnect(conf_file=""):
    """ Log conf file and connect to database"""
    if not conf_file:
        sys.exit(-1)
    
    config = SafeConfigParser()
    config.read(conf_file)
    mydb = MySQLdb.connect(
        host = config.get('database','host'),
        user = config.get('database','user'),
        passwd = config.get('database','password'),
        db = 'INFORMATION_SCHEMA'
    )
    workingpath = config.get('files', 'rrd')
    return mydb, workingpath

if __name__ == "__main__":
    main()

Not bad for 3 hours of work!