mongoDB VS PostgreSQL dml performance use python (pymongo & py-postgresql)

本文涉及的产品
云数据库 MongoDB,独享型 2核8GB
推荐场景:
构建全方位客户视图
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介:
前面测试了mongodb和postgresql的插入性能对比, 参考如下 : 
本文将测试对比一下select, update, 以及select, insert, update混合场景的性能.
同样使用并行8个线程测试.
(因为使用的驱动问题, python测试结果性能较差, mongo改用motor驱动的异步调用后, 性能提升明显, 测试结果仅供参考, 如果是PG的话, 建议使用pgbench测试)
mongoDB : 
更新测试 :
# vi test.py
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
for i in range(0,1000000):
  collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.update({'id':random.randrange(0,1000000)}, {'$set': {'age': random.randrange(0,1000000)}})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果379秒 : 
[root@localhost ~]# python test.py
0
TID:0 1423070907.7699816
TID:1 1423070907.770696
TID:2 1423070907.7744105
TID:3 1423070907.7760801
TID:4 1423070907.779555
TID:7 1423070907.78037
TID:6 1423070907.7860947
TID:5 1423070907.78793
TID:6 1423071285.5971715
377.81107687950134
TID:7 1423071286.6500263
378.8696563243866
TID:0 1423071286.9464445
379.17646288871765
TID:1 1423071287.1227949
379.352098941803
TID:3 1423071287.1230247
379.3469445705414
TID:5 1423071287.2262568
379.4383268356323
TID:4 1423071287.2609653
379.4814102649689
TID:2 1423071287.4058232
379.6314127445221


查询测试 : 
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
# db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
# for i in range(0,1000000):
#   collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
# collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.find_one({'id': random.randrange(0,1000000)})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果361秒 : 
[root@localhost ~]# python test.py
2000000
TID:2 1423079875.4572093
TID:3 1423079875.4576375
TID:1 1423079875.4596934
TID:0 1423079875.4600854
TID:4 1423079875.4589622
TID:5 1423079875.4653761
TID:6 1423079875.463017
TID:7 1423079875.4694664
TID:7 1423080235.7239776
360.2545111179352
TID:3 1423080236.109339
360.6517014503479
TID:4 1423080236.1194305
360.66046833992004
TID:1 1423080236.260948
360.8012545108795
TID:2 1423080236.5218844
361.06467509269714
TID:5 1423080236.6404896
361.17511343955994
TID:0 1423080236.6446981
361.1846127510071
TID:6 1423080236.6607506
361.1977336406708


更新, 插入, 查询综合测试 : 
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
# db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
# for i in range(0,1000000):
#   collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
# collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.insert({'id':random.randrange(1000001,2000000), 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
      collection.update({'id':random.randrange(0,1000000)}, {'$set': {'age': random.randrange(0,1000000)}})
      collection.find_one({'id': random.randrange(0,1000000)})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果1150秒 : 
[root@localhost ~]# python test.py
2000000
TID:0 1423080359.006871
TID:1 1423080359.0083587
TID:2 1423080359.009925
TID:4 1423080359.0124109
TID:3 1423080359.015088
TID:5 1423080359.0179524
TID:6 1423080359.0209677
TID:7 1423080359.0235417
TID:4 1423081508.11507
1149.1026592254639
TID:7 1423081508.1888452
1149.1653034687042
TID:2 1423081508.2641344
1149.2542095184326
TID:1 1423081508.3973265
1149.3889677524567
TID:0 1423081508.5400703
1149.5331993103027
TID:6 1423081508.594207
1149.573239326477
TID:5 1423081509.0279126
1150.0099601745605
TID:3 1423081509.0943637
1150.0792756080627


PostgreSQL : 
更新测试 :
import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
db.execute("drop table if exists tt")
db.execute("create table tt(id int, username name, age int2, email text, qq text)")
ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
for i in range(0,1000000):
  ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres", 
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    upd = db.prepare("update tt set age=$1 where id=$2")
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      upd(random.randrange(0,100), random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果244秒 : 
TID:0 1423072792.0481002
TID:1 1423072792.050467
TID:3 1423072792.0514963
TID:2 1423072792.051693
TID:5 1423072792.059382
TID:4 1423072792.0605848
TID:7 1423072792.0643597
TID:6 1423072792.0657377
TID:2 1423073034.8827112
242.8310182094574
TID:5 1423073035.024978
242.96559596061707
TID:4 1423073035.2550452
243.1944603919983
TID:7 1423073035.5245414
243.46018171310425
TID:1 1423073036.0639975
244.0135304927826
TID:3 1423073036.3519847
244.30048847198486
TID:0 1423073036.5292883
244.48118805885315
TID:6 1423073036.5383787
244.47264099121094


查询测试 : 
import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
# db.execute("drop table if exists tt")
# db.execute("create table tt(id int, username name, age int2, email text, qq text)")
# ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
# for i in range(0,1000000):
#   ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
# db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres", 
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    sel = db.prepare("select * from tt where id=$1 limit 1")
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      sel(random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果438秒 : 
postgres@localhost-> python test.py
[(1000000,)]
TID:2 1423081634.6041436
TID:1 1423081634.6072564
TID:5 1423081634.6098883
TID:0 1423081634.6110475
TID:6 1423081634.611464
TID:7 1423081634.6147678
TID:3 1423081634.617597
TID:4 1423081634.6184704
TID:7 1423082070.8112974
436.1965296268463
TID:4 1423082071.5796437
436.96117329597473
TID:5 1423082071.6695313
437.0596430301666
TID:0 1423082071.8521369
437.24108934402466
TID:1 1423082072.5634701
437.95621371269226
TID:2 1423082072.678791
438.0746474266052
TID:3 1423082072.9825838
438.3649866580963
TID:6 1423082072.9963892
438.3849251270294


更新, 插入, 查询综合测试 : 
import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
# db.execute("drop table if exists tt")
# db.execute("create table tt(id int, username name, age int2, email text, qq text)")
# ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
# for i in range(0,1000000):
#   ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
# db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres", 
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
    upd = db.prepare("update tt set age=$1 where id=$2")
    sel = db.prepare("select * from tt where id=$1")

    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      ins(random.randrange(1000001,2000000),'digoal.zhou',32,'digoal@126.com','276732431')
      upd(random.randrange(0,100), random.randrange(0,1000000))
      sel(random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果938秒 : 
postgres@localhost-> python test.py
[(1000000,)]
TID:0 1423083626.888068
TID:2 1423083626.8912995
TID:1 1423083626.8920445
TID:3 1423083626.893638
TID:4 1423083626.8974612
TID:6 1423083626.9039218
TID:5 1423083626.9061637
TID:7 1423083626.908666

TID:5 1423084561.3804135
934.4742498397827
TID:6 1423084563.4344044
936.5304825305939
TID:2 1423084564.1677904
937.2764909267426
TID:0 1423084564.5768228
937.6887547969818
TID:4 1423084564.839536
937.9420747756958
TID:1 1423084564.9242597
938.0322151184082
TID:7 1423084565.0638845
938.1552186012268
TID:3 1423084565.345857
938.4522190093994


PostgreSQL使用pgbench的测试结果 : 
# 初始化数据
psql
truncate tt;
insert into tt select generate_series(1,1000000), 'digoal.zhou',32,'digoal@126.com','276732431';

更新
postgres@localhost-> vi test.sql
\setrandom v_id 0 1000000
\setrandom v_age 0 100
update tt set age=:v_age where id=:v_id;

测试结果32秒 : 
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 31631.177435 (including connections establishing)
tps = 31637.366682 (excluding connections establishing)
statement latencies in milliseconds:
        0.002963        \setrandom v_id 0 1000000
        0.000671        \setrandom v_age 0 100
        0.232106        update tt set age=:v_age where id=:v_id;

查询
postgres@localhost-> vi test.sql
\setrandom v_id 0 1000000
select * from tt where id=:v_id;

测试结果15秒 : 
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 66487.929382 (including connections establishing)
tps = 66514.422913 (excluding connections establishing)
statement latencies in milliseconds:
        0.002007        \setrandom v_id 0 1000000
        0.104391        select * from tt where id=:v_id;

插入, 更新, 查询
postgres@localhost-> vi test.sql
\setrandom v_newid 1000001 2000000
\setrandom v_id 0 1000000
\setrandom v_age 0 100
insert into tt values(:v_newid, 'digoal.zhou',32,'digoal@126.com','276732431');
update tt set age=:v_age where id=:v_id;
select * from tt where id=:v_id;

测试结果耗时69秒 : 
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 14429.720005 (including connections establishing)
tps = 14431.006796 (excluding connections establishing)
statement latencies in milliseconds:
        0.003031        \setrandom v_newid 1000001 2000000
        0.000816        \setrandom v_id 0 1000000
        0.000733        \setrandom v_age 0 100
        0.150249        insert into tt values(:v_newid, 'digoal.zhou',32,'digoal@126.com','276732431');
        0.180603        update tt set age=:v_age where id=:v_id;
        0.190850        select * from tt where id=:v_id;


[小结]
python 测试结果 : 
mongoDB 
插入100W记录耗时 - 364秒
更新100W记录耗时 - 379
100W记录索引检索100W次耗时 - 361
综合测试, 插入 100W记录 , 更新 100W记录 , 查询 100W次耗时 - 1150

PostgreSQL
插入100W记录耗时 - 226秒.
更新100W记录耗时 - 244 秒.
100W记录索引检索100W次耗时 - 438 秒.
综合测试, 插入 100W记录 , 更新 100W记录 , 查询 100W次耗时 - 938 秒.

PostgreSQL 使用pgbench测试结果 : 
插入100W记录耗时 - 16秒
更新100W记录耗时 - 32 秒.
100W记录索引检索100W次耗时 - 15 秒.
综合测试, 插入 100W记录 , 更新 100W记录 , 查询 100W次耗时 - 69 秒.

[参考]
1.  http://blog.163.com/digoal@126/blog/static/16387704020151435825593/
相关实践学习
MongoDB数据库入门
MongoDB数据库入门实验。
快速掌握 MongoDB 数据库
本课程主要讲解MongoDB数据库的基本知识,包括MongoDB数据库的安装、配置、服务的启动、数据的CRUD操作函数使用、MongoDB索引的使用(唯一索引、地理索引、过期索引、全文索引等)、MapReduce操作实现、用户管理、Java对MongoDB的操作支持(基于2.x驱动与3.x驱动的完全讲解)。 通过学习此课程,读者将具备MongoDB数据库的开发能力,并且能够使用MongoDB进行项目开发。   相关的阿里云产品:云数据库 MongoDB版 云数据库MongoDB版支持ReplicaSet和Sharding两种部署架构,具备安全审计,时间点备份等多项企业能力。在互联网、物联网、游戏、金融等领域被广泛采用。 云数据库MongoDB版(ApsaraDB for MongoDB)完全兼容MongoDB协议,基于飞天分布式系统和高可靠存储引擎,提供多节点高可用架构、弹性扩容、容灾、备份回滚、性能优化等解决方案。 产品详情: https://www.aliyun.com/product/mongodb
目录
相关文章
|
1月前
|
数据采集 缓存 Java
Python vs Java:爬虫任务中的效率比较
Python vs Java:爬虫任务中的效率比较
|
1月前
|
存储 关系型数据库 MySQL
一个项目用5款数据库?MySQL、PostgreSQL、ClickHouse、MongoDB区别,适用场景
一个项目用5款数据库?MySQL、PostgreSQL、ClickHouse、MongoDB——特点、性能、扩展性、安全性、适用场景比较
|
22天前
|
存储 关系型数据库 MySQL
MySQL vs. PostgreSQL:选择适合你的开源数据库
在众多开源数据库中,MySQL和PostgreSQL无疑是最受欢迎的两个。它们都有着强大的功能、广泛的社区支持和丰富的生态系统。然而,它们在设计理念、性能特点、功能特性等方面存在着显著的差异。本文将从这三个方面对MySQL和PostgreSQL进行比较,以帮助您选择更适合您需求的开源数据库。
89 4
|
21天前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
Python Web框架比较:Django vs Flask vs Pyramid
28 1
|
1月前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
Python Web框架比较:Django vs Flask vs Pyramid
24 4
|
1月前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
【10月更文挑战第10天】本文比较了Python中三个最受欢迎的Web框架:Django、Flask和Pyramid。Django以功能全面、文档完善著称,适合快速开发;Flask轻量灵活,易于上手;Pyramid介于两者之间,兼顾灵活性和安全性。选择框架时需考虑项目需求和个人偏好。
33 1
|
1月前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
【10月更文挑战第6天】本文比较了Python中三个最受欢迎的Web框架:Django、Flask和Pyramid。Django功能全面,适合快速开发;Flask灵活轻量,易于上手;Pyramid介于两者之间,兼顾灵活性和可扩展性。文章分析了各框架的优缺点,帮助开发者根据项目需求和个人偏好做出合适的选择。
35 4
|
1月前
|
SQL 关系型数据库 数据库
使用 PostgreSQL 和 Python 实现数据库操作
【10月更文挑战第2天】使用 PostgreSQL 和 Python 实现数据库操作
|
1月前
|
关系型数据库 数据库 PostgreSQL
使用 PostgreSQL 和 Python 实现全文搜索
【10月更文挑战第2天】使用 PostgreSQL 和 Python 实现全文搜索
26 1
|
1月前
|
C++ Python
Python Tricks--- Object Comparisons:“is” vs “==”
Python Tricks--- Object Comparisons:“is” vs “==”