标签
PostgreSQL , ddos , 拒绝服务 , 锁 , SLOT
背景
连接数据库的过程中,需要数据库有足够的SLOT(连接槽,通过max_connections配置),认证。如果把连接槽位占用,或者在认证过程加锁(使得认证过程被锁),则可以制造DDOS攻击。
占用连接槽的攻击与防范方法:
认证过程加锁的攻击方法,本文会提到。
https://paquier.xyz/postgresql-2/postgres-12-dos-prevention/
BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: lalbin(at)scharp(dot)org
Subject: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack
Date: 2018-04-30 20:41:11
Message-ID: 152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-bugs pgsql-hackers
The following bug has been logged on the website:
Bug reference: 15182
Logged by: Lloyd Albin
Email address: lalbin(at)scharp(dot)org
PostgreSQL version: 10.3
Operating system: OpenSUSE
Description:
Over the last several weeks our developers caused a Denial of Service Attack
against ourselves by accident. When looking at the log files, I noticed that
we had authentication timeouts during these time periods. In researching the
problem I found this is due to locks being held on shared system catalog
items, aka system catalog items that are shared between all databases on the
same cluster/server. This can be caused by beginning a long running
transaction that queries pg_stat_activity, pg_roles, pg_database, etc and
then another connection that runs either a REINDEX DATABASE, REINDEX SYSTEM,
or VACUUM FULL. This issue is of particular importance to database resellers
who use the same cluster/server for multiple clients, as two clients can
cause this issue to happen inadvertently or a single client can either cause
it to happen maliciously or inadvertently. Note: The large cloud providers
give each of their clients their own cluster/server so this will not affect
across cloud clients but can affect an individual client. The problem is
that traditional hosting companies will have all clients from one or more
web servers share the same PostgreSQL cluster/server. This means that one or
two clients could inadvertently stop all the other clients from being able
to connect to their databases until the first client does either a COMMIT or
ROLLBACK of their transaction which they could hold open for hours, which is
what happened to us internally.
In Connection 1 we need to BEGIN a transaction and then query a shared
system item; pg_authid, pg_database, etc; or a view that depends on a shared
system item; pg_stat_activity, pg_roles, etc. Our developers were accessing
pg_roles.
Connection 1 (Any database, Any User)
BEGIN;
SELECT * FROM pg_stat_activity;
Connection 2 (Any database will do as long as you are the database owner)
REINDEX DATABASE postgres;
Connection 3 (Any Database, Any User)
psql -h sqltest-alt -d sandbox
All future Connection 3's will hang for however long the transaction in
Connection 1 runs. In our case this was hours and denied everybody else the
ability to log into the server until Connection 1 was committed. psql will
just hang for hours, even overnight in my testing, but our apps would get
the "Canceling authentication due to timeout" after 1 minute.
Connection 2 can also do any of these commands to also cause the same
issue:
REINDEX SYSTEM postgres;
VACUUM FULL pg_authid;
vacuumdb -f -h sqltest-alt -d lloyd -U lalbin
Even worse is that the VACUUM FULL pg_authid; can be started by an
unprivileged user and it will wait for the AccessShareLock by connection 1
to be released before returning the error that you don't have permission to
perform this action, so even an unprivileged user can cause this to happen.
The privilege check needs to happen before the waiting for the
AccessExclusiveLock happens.
This bug report has been simplified and shorted drastically. To read the
full information about this issue please see my blog post:
http://lloyd.thealbins.com/Canceling%20authentication%20due%20to%20timeout
Lloyd Albin
Database Administrator
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Fred Hutchinson Cancer Research Center
复现方法如上。
防范
1、对于连接占用DDOS攻击的防范(1,设置认证超时参数。2,不要在公网监听。3,设置网络层防火墙。)
2、对于锁攻击(通常是无意识攻击),建议在操作大锁的SQL前,加锁超时,或者语句超时(尽量减少等待时长)。 (lock_timeout, statement_timeout都可以)
参考
《PostgreSQL 锁等待监控 珍藏级SQL - 谁堵塞了谁》
《PostgreSQL 设置单条SQL的执行超时 - 防雪崩》
https://paquier.xyz/postgresql-2/postgres-12-dos-prevention/