MonetDB multiplex-funnel

本文涉及的产品
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介:
本文介绍一下MonetDB的漏斗功能. 此功能也是需要基于MonetDB的discovery和remote database.

m-funnel这个功能有点像plproxy, 多了一个queue/funnel的限制, 并且没有plproxy 的路由算法, 一个纯粹的代理.
plproxy接受客户端请求, 客户端的请求是并行提交给数据节点.
而MonetDB multiplex-funnel则是并行的接收客户端请求, 但是请求最终是一个一个提交给后端的所有数据节点的. 
返回结果的话则是所有数据节点并行执行的结果返回, 如有4个后端monetdbd, 那么count(*)返回4条记录.
另外需要注意的是m-funnel还限制了单条SQL返回的结果集的大小, 默认是80K.
注意使用m-funnel会打乱事务特性, 例如一个客户端的事务请求, 事务的中间可能被其他客户端的SQL插入.
MonetDB multiplex-funnel - 德哥@Digoal - PostgreSQL research

MonetDB的multiplex-funnel举例 : 
使用monetdb创建m-funnel数据库
       -m pattern
              With the -m flag, instead of creating a database, a multiplex-funnel is created.  See section MULTIPLEX-
              FUNNEL in monetdbd(1).  The pattern argument is not fully the same as a pattern for connecting  or  dis-
              covery.   Each  parallel target for the multiplex-funnel is given as username+password@pattern sequence,
              separated by commas.  Here the pattern is an ordinary pattern as would  be  used  for  connecting  to  a
              database, and can hence also be just the name of a database.

注意pattern的格式,    username+password@pattern, 这里的pattern是数据库, 所以数据库需要在同一个广播域. 并且配置好tag(假设有重名的情况下)
MonetDB multiplex-funnel - 德哥@Digoal - PostgreSQL research
这里dbx是边界, 主机属于两个广播域, 所以forward必须=proxy, 因为从db0的主机无法直接连接dbx1和dbx2. 必须通过dbx所在的monetdbd进行代理.

使用举例 : 
配置hostname
150.sky-mobi.com
db-172-16-3-221

配置/etc/sysconfig/network
HOSTNAME=db-172-16-3-221

在所有monetdb节点配置DNS或/etc/hosts, 使解析正常.
本例配置/etc/hosts
172.16.3.221 db-172-16-3-221
172.16.3.150 150.sky-mobi.com

配置防火墙, 让monetdbd之间可以相互通讯
iptables -t filter -I INPUT -s 172.16.0.0/16 -j ACCEPT

创建monetdbd storage
monetdbd create /data01/mdb1

配置discovery
monetdbd set discovery=yes /data01/mdb1

可选, 配置forward = proxy (如果这是个边界m-funnel库的话, 默认就是proxy)
monetdbd set forward=proxy /data01/mdb1

启动monetdbd
monetdbd start /data01/mdb1

创建数据库, 解锁数据库
[root@150 postgresql-9.3.5]# monetdb create db1
created database in maintenance mode: db1
[root@150 postgresql-9.3.5]# monetdb create db2
created database in maintenance mode: db2
[root@150 postgresql-9.3.5]# monetdb release db1
taken database out of maintenance mode: db1
[root@150 postgresql-9.3.5]# monetdb release db2
taken database out of maintenance mode: db2

[root@db-172-16-3-221 ~]# monetdb create dbx1
created database in maintenance mode: dbx1
[root@db-172-16-3-221 ~]# monetdb create dbx2
created database in maintenance mode: dbx2
[root@db-172-16-3-221 ~]# monetdb release dbx1
taken database out of maintenance mode: dbx1
[root@db-172-16-3-221 ~]# monetdb release dbx2
taken database out of maintenance mode: dbx2

配置shared, tag(可选, 仅仅当数据库在广播域有重名是需要使用)
[root@150 postgresql-9.3.5]# monetdb set shared=150 db1
[root@150 postgresql-9.3.5]# monetdb set shared=150 db2
[root@db-172-16-3-221 ~]# monetdb set shared=221 dbx1
[root@db-172-16-3-221 ~]# monetdb set shared=221 dbx2


发现
[root@db-172-16-3-221 ~]# monetdb discover
                    location
mapi:monetdb://150.sky-mobi.com:50000/db1/150
mapi:monetdb://150.sky-mobi.com:50000/db2/150
mapi:monetdb://db-172-16-3-221:50000/dbx1/221
mapi:monetdb://db-172-16-3-221:50000/dbx2/221

[root@150 mdb1]# monetdb discover
                    location
mapi:monetdb://150.sky-mobi.com:50000/db1/150
mapi:monetdb://150.sky-mobi.com:50000/db2/150
mapi:monetdb://db-172-16-3-221:50000/dbx1/221
mapi:monetdb://db-172-16-3-221:50000/dbx2/221


创建第一个m-funnel, 这里假设dbx这个m-funnel是边界funnel.
首先测试一下库名是否正常.
[root@db-172-16-3-221 ~]# mclient -h 127.0.0.1 dbx1/221
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx1'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>\q
[root@db-172-16-3-221 ~]# mclient -h 127.0.0.1 dbx2/221
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx2'
Type \q to quit, \? for a list of available commands
auto commit mode: on

使用这两个库名创建m-funnel, 注意用户密码我没改, 用默认的
[root@db-172-16-3-221 ~]# monetdb create -m monetdb+monetdb@dbx1/221,monetdb+monetdb@dbx2/221 dbx
created multiplex-funnel in maintenance mode: dbx
[root@db-172-16-3-221 ~]# monetdb release dbx
taken database out of maintenance mode: dbx

查看状态
[root@db-172-16-3-221 ~]# monetdb get all dbx
     name          prop     source           value
dbx              name      -        dbx
dbx              type      local    mfunnel
dbx              shared    default  yes
dbx              mfunnel   local    monetdb+monetdb@dbx1/221,monetdb+monetdb@dbx2/221

测试第一个m-funnel是否正常, 可以正常的像两个底层数据库发出SQL.
[root@db-172-16-3-221 ~]# mclient -h 127.0.0.1 dbx
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx2'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>create table abc(id int);
operation successful (22.611ms)
sql>insert into abc values(1);
2 affected row (7.259ms)
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    1 |
+------+
2 tuples (1.854ms)
sql>insert into abc values(2);
2 affected row (5.553ms)
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    2 |
|    1 |
|    2 |
+------+
4 tuples (1.014ms)

已经正常了, 插入和查询都从两个节点正常返回并合并结果.
配置第一个m-funnel的tag(可选, 仅仅当广播域中这个m-funnel的名字有重名才需要)
[root@db-172-16-3-221 ~]# monetdb set shared=221 dbx
[root@db-172-16-3-221 ~]# monetdb get all dbx
     name          prop     source           value
dbx              name      -        dbx
dbx              type      local    mfunnel
dbx              shared    local    221
dbx              mfunnel   local    monetdb+monetdb@dbx1/221,monetdb+monetdb@dbx2/221


创建第二个m-funnel : 
先执行discover得到URI, 从而得知database name.(含shared设置的tag)
[root@150 postgresql-9.3.5]# monetdb discover
                    location
mapi:monetdb://150.sky-mobi.com:50000/db1/150
mapi:monetdb://150.sky-mobi.com:50000/db2/150
mapi:monetdb://db-172-16-3-221:50000/dbx/221
mapi:monetdb://db-172-16-3-221:50000/dbx1/221
mapi:monetdb://db-172-16-3-221:50000/dbx2/221

创建m-funnel,  注意用户密码我没改, 用默认的
这个m-funnel包含3个库, 其中2个是normal 库, 另一个是前面创建的funnel, 所以实际上就是4个normal库, 但是funnel里看到的是3个库.
[root@150 postgresql-9.3.5]# monetdb create -m monetdb+monetdb@db1/150,monetdb+monetdb@db2/150,monetdb+monetdb@dbx/221 db0
created multiplex-funnel in maintenance mode: db0
[root@150 postgresql-9.3.5]# monetdb release db0
taken database out of maintenance mode: db0

查看m-funnel的状态
[root@150 postgresql-9.3.5]# monetdb get all db0
     name          prop     source           value
db0              name      -        db0
db0              type      local    mfunnel
db0              shared    default  yes
db0              mfunnel   local    monetdb+monetdb@db1/150,monetdb+monetdb@db2/150,monetdb+monetdb@dbx/221

测试 :
[root@150 postgresql-9.3.5]# mclient db0
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx2'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>create table abc(id int);  -- 前面测试的时候创建了abc表, 所以这里创建会报错.
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use

-- 特别注意, 虽然报错了, 但是实际上只是2个节点报错, 其他2个节点创建成功了.
-- 这个测试用了auto commit, 所以导致部分提交成功, 部分提交失败了.
-- 要全局一致的话, 需要用start transaction
-- 但是用start transaction的话, 会扰乱, 因为并行的事情. 后面有测试
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    2 |
|    1 |
|    2 |
+------+
4 tuples (1.607ms)
sql>insert into abc values (3);  -- 从这个插入我们就能知道, 上面建表的语句在2个节点执行成果了.
4 affected row (5.192ms)
sql>select * from abc;
+------+
| id   |
+======+
|    3 |
|    3 |
|    1 |
|    2 |
|    3 |
|    1 |
|    2 |
|    3 |
+------+
8 tuples (1.459ms)


假设我们用了事务会怎么样呢?
可能出现部分提交失败, 部分提交成功或者全部提交成功的情况, 回滚的话则全部节点回滚.
所以鉴于此, 如果报错了, 就立刻回滚, 那么是全部回滚的.
sql>create table abc(id int);
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use
sql>select * from abc;
node dbx/221 failed: node dbx1/221 failed: 25005!current transaction is aborted (please ROLLBACK)
sql>commit;  -- 此时commit, 部分节点提交成功, 部分节点提交失败.
node dbx/221 failed: node dbx1/221 failed: 2D000!COMMIT: failed
sql>rollback;  -- 事务已经结束了, 所以不能再回滚.
node db1/150 failed: 2DM30!ROLLBACK: not allowed in auto commit mode
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    2 |
|    1 |
|    2 |
+------+
4 tuples (1.676ms)
sql>insert into abc values (3);  -- 因为部分提交成功, 所以DB1和DB2创建abc表完成.
4 affected row (5.538ms)
sql>select * from abc;
+------+
| id   |
+======+
|    3 |
|    3 |
|    1 |
|    2 |
|    3 |
|    1 |
|    2 |
|    3 |
+------+
8 tuples (1.395ms)


全部回滚的测试 : 
sql>create table abc(id int);
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use
sql>rollback;
auto commit mode: on
sql>\dt
TABLE  sys.abc
TABLE  sys.abc

部分提交成功和失败的测试 : 
sql>create table abc(id int);
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use
sql>\dt
TABLE  sys.abc
TABLE  sys.abc
TABLE  sys.abc
TABLE  sys.abc

全部提交成功的测试 : 
sql>insert into abc values (10);
4 affected row (5.611ms)
sql>select * from abc where id=10;
+------+
| id   |
+======+
|   10 |
|   10 |
|   10 |
|   10 |
+------+
4 tuples (1.906ms)

如果需要全局事务一致的话, 必须使用start trasaction;

使用start transaction带来的困扰, 原因就是漏斗效应, 因为并行转换成了串行来执行, 所以中间会被其他SQL插入, 测试如下.
会话A
sql>start transaction;
auto commit mode: off

会话B
sql>start transaction;  -- 因为在m-funnel 中已经启动了事务, 所以再次启动失败
node db1/150 failed: 25001!START TRANSACTION: cannot start a transaction within a transaction
sql>select * from abc;
node db1/150 failed: 25005!current transaction is aborted (please ROLLBACK)

会话A
sql>select * from abc;  -- 回退
node db1/150 failed: 25005!current transaction is aborted (please ROLLBACK)

-- 任何一个会话执行了rollback都可以, 就结束这个事务了.


[参考]
1. man monetdbd
MULTIPLEX-FUNNELS
       Monetdbd  implements multiplex-funnel capabilities.  As the name suggests two techniques are combined, the mul-
       tiplexer and the funnel.

       The funnel capability limits the access to the database to one client at a time.  That is, if multiple  clients
       connect  to  the  funnel, their queries will be serialized such that they are executed one after the other.  An
       effect of this approach is that clients no longer have an exclusive channel to the database, which  means  that
       individual queries from one client may have been interleaved with queries from others.  This most notably makes
       SQL transaction blocks unreliable with a funnel.  The funnel, hence, is meant to scale down a large  amount  of
       clients that perform short-running (read-only) queries, as typically seen in web-based query loads.

       When a funnel is defined to use multiple databases, the funnel adds a multiplexer to its query channel.  A mul-
       tiplex-funnel sends each query to all of the defined databases.  This behavior can be quite confusing at first,
       but  proves  to be useful in typical sharding configurations, where in particular simple selection queries have
       to be performed on each of the shards.  The multiplexer combines the answers from all defined databases in  one
       single  answer  that it sends back to the client.  However, this combining is without any smart logic, that is,
       the multiplexer does not evaluate the query it is running, but just combines all answers it receives  from  the
       databases.   This  results  in  e.g.  as many return tuples for a SELECT COUNT(*) query, as there are databases
       defined.

       Due to the two above mentioned characteristics, a multiplex-funnel has some limitations.  As mentioned  before,
       transactions over multiple queries are likely not to result in the desired behavior.  This is due to each query
       to the funnel is required to be self-contained.  Further, since for  each  query,  the  results  from  multiple
       servers  have  to  be  combined  into one, that query must only return a single response, i.e.  multi-statement
       queries are most likely causing the funnel to respond with an error, or return garbled results.  Last, the size
       of  each query is limited to currently about 80K.  While this size should be sufficient for most queries, it is
       likely not enough for e.g. COPY INTO statements.  Apart from the data transfer  implications,  such  statements
       should not be used with the funnel, as the results will be undefined due to the limited query buffer.  Applica-
       tions using the funnel should aim for short and single-statement queries that require no transactions.

       See the create command in the monetdb(1) man-page for details on how to setup a multiplex-funnel.

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
8月前
|
存储 SQL 大数据
大数据处理之魔幻的NULL
大数据处理之魔幻的NULL
56 0
|
8月前
|
存储 设计模式 算法
Blink Tree 比 B+Tree 性能猛多少???
Blink 树和 B+ 树都是一种类似于B树的数据结构,用于在磁盘上存储和索引数据以实现高效查找和操作。它们的主要区别在于内部节点和叶子节点的结构以及指针的使用方式。总的来说,Blink 树的特点是将内部节点和叶子节点合并为一个节点,减少了树的高度;而 B+ 树通过叶子节点之间的有序链表提高了范围查询和顺序遍历的性能。Blink tree 真的牛啊!# 如果键已经存在,更新值else:# 如果根节点已满,进行分裂else:# 如果是叶子节点,直接插入index = 0index += 1。
492 1
|
算法
白话Elasticsearch14-深度探秘搜索技术之基于multi_match 使用most_fields策略进行cross-fields search弊端
白话Elasticsearch14-深度探秘搜索技术之基于multi_match 使用most_fields策略进行cross-fields search弊端
107 0
|
自然语言处理 搜索推荐 索引
白话Elasticsearch23-深度探秘搜索技术之通过ngram分词机制实现index-time搜索推荐
白话Elasticsearch23-深度探秘搜索技术之通过ngram分词机制实现index-time搜索推荐
146 0
|
存储 安全 Linux
|
算法 关系型数据库 MySQL
长路漫漫, 从Blink-tree 到Bw-tree (上)
> 天不生我 bw-tree, 索引万古如长夜 > ### 背景 在前面的文章 [路在脚下, 从BTree 到Polar Index](https://zhuanlan.zhihu.com/p/374000358)中提到, 我们已经将InnoDB 里面Btree 替换成Blink Tree, 高并发压力下, 在标准的TPCC 场景中最高能够有239%的性能提升, 然后我们对InnoDB 的fi
429 1
|
安全 开发工具 git
Merging 和 Rebasing 的大比拼
Merging 和 Rebasing 的大比拼
172 0
Merging 和 Rebasing 的大比拼
|
存储 机器学习/深度学习 算法
MonetDB/X100: Hyper-Pipelining Query Execution 论文解读
这是关于MonetDB执行引擎的一篇paper,总体分为了2个部分,第一部分主要讲了下modern cpu的工作原理并给出了一个TPC-H Q1的例子,从这部分中我们便可以清晰的看到为什么向量化的执行方式如此有意义。第二部分则主要介绍了MonetDB X/100这个新的向量化执行引擎。这篇paper被引用的极为广泛,启发了后续很多列存数据库在执行引擎上的设计思路。个人感觉第一部分尤其有意义,如果想入门下列存/向量化执行,看看第一部分应该就有些概念了。
856 0
MonetDB/X100: Hyper-Pipelining Query Execution 论文解读
|
SQL 存储 分布式计算
【详谈 Delta Lake 】系列技术专题 之 特性(Features)
本文翻译自大数据技术公司 Databricks 针对数据湖 Delta Lake 的系列技术文章。众所周知,Databricks 主导着开源大数据社区 Apache Spark、Delta Lake 以及 ML Flow 等众多热门技术,而 Delta Lake 作为数据湖核心存储引擎方案给企业带来诸多的优势。本系列技术文章,将详细展开介绍 Delta Lake。
【详谈 Delta Lake 】系列技术专题 之 特性(Features)
|
SQL 存储 数据采集
【详谈 Delta Lake 】系列技术专题 之 基础和性能(Fundamentals and Performance)
本文翻译自大数据技术公司 Databricks 针对数据湖 Delta Lake 的系列技术文章。众所周知,Databricks 主导着开源大数据社区 Apache Spark、Delta Lake 以及 ML Flow 等众多热门技术,而 Delta Lake 作为数据湖核心存储引擎方案给企业带来诸多的优势。本系列技术文章,将详细展开介绍 Delta Lake。
【详谈 Delta Lake 】系列技术专题 之 基础和性能(Fundamentals and Performance)

热门文章

最新文章