PostgreSQL partitioning (a few simple steps)

History

I created a simple logistics application based on Django, PostgreSQL, and PostGIS. It is a service that receives a truck's position and places it on the map. The service itself is simple and runs on a low-end virtual machine. The main problem with the current setup is the poor performance of this cloud server: most cloud providers attach virtual servers' drives over the network, so disk I/O throughput ends up really low.


root@sergiy:/home/serg# fdisk -l 
Disk /dev/xvda1: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/xvda2: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@sergiy:/home/serg# hdparm -tT /dev/xvda1

/dev/xvda1:
 Timing cached reads:   18876 MB in  1.99 seconds = 9499.04 MB/sec
 Timing buffered disk reads:  98 MB in  3.11 seconds =  31.55 MB/sec
root@sergiy:/home/serg# hdparm -tT /dev/xvda1

/dev/xvda1:
 Timing cached reads:   19510 MB in  1.99 seconds = 9805.60 MB/sec
 Timing buffered disk reads:  58 MB in  3.05 seconds =  19.04 MB/sec


I'm sure the physical drive is fast enough, but the network bottleneck brings it down to 19.04 MB/sec. The simplest way to solve this would be to get a real server or add more resources to this virtual machine, but I'm not ready for that spending spree!
My application appends records to the database over time. Records are never updated, and adding records that belong to the past is a really rare event. Sometimes a user runs a SELECT over a time range tied to a business period (day, week, month). For this access pattern, Postgres table partitioning is a good fit.

Analyzing and preparing the database

The big table with my data is named car_points and has the following structure:


geocar=# \d car_points
                                    Table "public.car_points"
   Column    |           Type           |                        Modifiers                        
-------------+--------------------------+---------------------------------------------------------
 id          | integer                  | not null default nextval('car_points_id_seq'::regclass)
 Speed       | integer                  | not null
 CreatedTime | timestamp with time zone | not null
 Course      | integer                  | not null
 Altitude    | integer                  | not null
 Sat         | integer                  | not null
 Point       | geometry(Point,4326)     | not null
 Car_id      | integer                  | not null
 SensorData  | jsonb                    | not null
Indexes:
    "car_points_pkey" PRIMARY KEY, btree (id)
    "car_points_0d7f622d" btree ("Car_id")
    "car_points_Point_id" gist ("Point")
    "idx_time" btree ("CreatedTime")

The 'id' column is required by Django; I don't know how to remove it. 'CreatedTime' is the column used as the partitioning criterion. The 'Point' column is part of the PostGIS functionality (this gave me some headaches).
General information about partitioning is available here, and the Postgres documentation about partitioning is here. I've decided to use the inheritance feature for my table. It is also good to have a trigger that creates new partitions automatically.
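Just to illustrate the idea, a weekly partition built on inheritance looks roughly like this (example only; the trigger function below creates such tables automatically, and the naming scheme matches it):

-- A child table that inherits all columns from car_points and is
-- constrained to one year/week bucket
CREATE TABLE car_points_2017_43 (
    CHECK (to_char("CreatedTime", 'YYYY_WW') = '2017_43')
) INHERITS (car_points);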
First, and really important: make a backup!

I hope you are making backups on a regular basis. It is better to have two unneeded backups than to have none at the critical moment.
Create a function that inserts each record into the correct partition:

CREATE OR REPLACE FUNCTION createpar() RETURNS trigger
AS
$BODY$
DECLARE
   partition_date TEXT;
   partition TEXT;
BEGIN
   -- Partition name: parent table name plus year and week, e.g. car_points_2017_43
   partition_date := to_char(NEW."CreatedTime", 'YYYY_WW');
   partition := TG_TABLE_NAME || '_' || partition_date;
   -- Create the child table the first time a row for this week arrives
   IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = partition)
   THEN
      RAISE NOTICE 'Creating partition: %', partition;
      -- A proper range check on "CreatedTime" would also enable constraint exclusion
      EXECUTE 'CREATE TABLE ' || partition
           || ' (CHECK (to_char("CreatedTime", ''YYYY_WW'') = ' || quote_literal(partition_date) || '))'
           || ' INHERITS (' || TG_TABLE_NAME || ');';
   END IF;
   -- Insert the row into the partition; quote_literal protects the textual values,
   -- and the geometry is passed as text and cast back with ::geometry
   EXECUTE 'INSERT INTO ' || partition
        || ' (id, "Speed", "CreatedTime", "Course", "Altitude", "Sat", "Point", "Car_id", "SensorData") VALUES ('
        || NEW."id" || ','
        || NEW."Speed" || ','
        || quote_literal(NEW."CreatedTime") || ','
        || NEW."Course" || ','
        || NEW."Altitude" || ','
        || NEW."Sat" || ','
        || quote_literal(NEW."Point") || '::geometry,'
        || NEW."Car_id" || ','
        || quote_literal(NEW."SensorData")
        || ');';
   -- Returning NULL cancels the insert into the parent table
   RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

The source code of this function is here: postgrsfunction
I would like to clarify some of the code:
The partition name is built from the parent table name plus the year and week number (to_char(NEW."CreatedTime", 'YYYY_WW')). The function turns the year and week into a string and checks pg_class to see whether a partition with that name is already present; if there is no partition for the new record yet, it is created with CREATE TABLE ... INHERITS.
After that, the record is inserted into the new partition table. Unfortunately the concatenated EXECUTE statement cannot insert the PostGIS geometry as-is (NEW.* carries a geometry column, but the generated INSERT needs the geometry passed as a string with an explicit ::geometry cast). If you don't have a PostGIS column, you can insert NEW.* directly instead of listing every column.
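For tables without a geometry column, a minimal sketch of that direct variant inside the trigger body could look like this (an illustration only, not the code I actually run; the whole NEW row is passed as a parameter and expanded with ($1).*):

   -- Hypothetical simpler insert when there is no PostGIS column
   EXECUTE 'INSERT INTO ' || partition || ' SELECT ($1).*' USING NEW;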

After this, I've added a trigger that runs this function for every newly inserted record.


geocar=# CREATE TRIGGER createpartitiontrigger BEFORE INSERT ON car_points FOR EACH ROW EXECUTE PROCEDURE createpar();
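
To check that the trigger works, you can run a test insert by hand (all the values below, including the coordinates, are made up purely for illustration; ST_SetSRID and ST_MakePoint are standard PostGIS functions):

INSERT INTO car_points ("Speed", "CreatedTime", "Course", "Altitude", "Sat", "Point", "Car_id", "SensorData")
VALUES (42, now(), 180, 120, 9, ST_SetSRID(ST_MakePoint(30.52, 50.45), 4326), 1, '{}'::jsonb);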
A few minutes later the new partition has been created:

geocar=# \d
                           List of relations
  Schema  |               Name                |   Type   |    Owner    
----------+-----------------------------------+----------+-------------
 public   | car_car                           | table    | cartracking
 public   | car_car_id_seq                    | sequence | cartracking
 public   | car_points                        | table    | cartracking
 public   | car_points_2017_43                | table    | cartracking
 public   | car_points_id_seq                 | sequence | cartracking

Time to run a simple analysis:

geocar=# EXPLAIN ANALYZE select  count(*)  from car_points_2017_43  where "CreatedTime" > '2017-10-26';
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=882.14..882.15 rows=1 width=8) (actual time=89.705..89.706 rows=1 loops=1)
   ->  Seq Scan on car_points_2017_43  (cost=0.00..803.62 rows=31407 width=0) (actual time=7.821..58.318 rows=29811 loops=1)
         Filter: ("CreatedTime" > '2017-10-26 00:00:00+03'::timestamp with time zone)
 Planning time: 0.139 ms
 Execution time: 89.768 ms
(5 rows)

geocar=# EXPLAIN ANALYZE select  count(*)  from car_points  where "CreatedTime" > '2017-10-26';
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13748.21..13748.22 rows=1 width=8) (actual time=655.247..655.249 rows=1 loops=1)
   ->  Append  (cost=0.43..13656.98 rows=36490 width=0) (actual time=24.448..589.689 rows=68910 loops=1)
         ->  Index Only Scan using idx_time on car_points  (cost=0.43..12853.36 rows=5083 width=0) (actual time=24.446..347.378 rows=39099 loops=1)
               Index Cond: ("CreatedTime" > '2017-10-26 00:00:00+03'::timestamp with time zone)
               Heap Fetches: 39099
         ->  Seq Scan on car_points_2017_43  (cost=0.00..803.62 rows=31407 width=0) (actual time=0.040..116.462 rows=29811 loops=1)
               Filter: ("CreatedTime" > '2017-10-26 00:00:00+03'::timestamp with time zone)
 Planning time: 0.309 ms
 Execution time: 655.309 ms
(9 rows)

The SELECT reads the data from our new partition, but the index scan still comes from the parent table. At this point it is worth creating indexes on the partition and moving the old data from the parent table into the partition tables. Of course this requires dumping and reloading the data, but unlike MySQL it does not require stopping the database!
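One way to do the move without an external dump, sketched here for the week 43 partition (the index names are made up, and the WHERE clause simply mirrors the CHECK constraint the trigger creates):

-- Hypothetical indexes on the partition, mirroring the parent's indexes
CREATE INDEX car_points_2017_43_created_time ON car_points_2017_43 ("CreatedTime");
CREATE INDEX car_points_2017_43_point ON car_points_2017_43 USING gist ("Point");

-- Move week 43 rows that still sit in the parent table into the partition;
-- ONLY restricts both statements to the parent table itself
BEGIN;
INSERT INTO car_points_2017_43
   SELECT * FROM ONLY car_points
   WHERE to_char("CreatedTime", 'YYYY_WW') = '2017_43';
DELETE FROM ONLY car_points
   WHERE to_char("CreatedTime", 'YYYY_WW') = '2017_43';
COMMIT;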


I've used an idea from StreamBright: https://medium.com/@StreamBright/creating-partitions-automatically-in-postgresql-7006d68c0fbb
