The PostgreSQL rename code-fix controversy

Author

digoal

Date

2016-08-30

Tags

PostgreSQL, rename, recovery.conf


Background

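The recommended permission for the PostgreSQL data directory is 700 (the files inside are normally 600), and everything in it should be owned by the operating-system user that starts the database cluster.

If the permissions or the owner are wrong, opening a file can fail. Wrong permissions are also a security risk and cause unnecessary trouble.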
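To make the advice above checkable, here is a minimal C sketch of a pre-flight check that reads a file's owner and mode with stat(2). The function check_data_file and its PERM_* flag names are hypothetical, not part of PostgreSQL; the rules encoded are the ones stated above: owned by the effective user, writable by that owner, no group or other access.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>     /* mkstemp(), used in the usage example */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Problems reported by check_data_file (hypothetical flag names). */
enum {
    PERM_OK          = 0,
    PERM_STAT_FAILED = 1 << 0,  /* stat(2) failed, e.g. the file is missing */
    PERM_WRONG_OWNER = 1 << 1,  /* not owned by the effective user */
    PERM_NO_WRITE    = 1 << 2,  /* owner write bit clear: O_RDWR open fails for non-root */
    PERM_GROUP_OTHER = 1 << 3   /* group/other have some access: looser than 600/700 */
};

/* Hypothetical check: inspect owner and mode bits directly instead of
 * calling access(2), so the verdict does not change when run as root. */
int check_data_file(const char *path)
{
    struct stat st;
    int problems = PERM_OK;

    if (stat(path, &st) != 0)
        return PERM_STAT_FAILED;
    if (st.st_uid != geteuid())
        problems |= PERM_WRONG_OWNER;
    if ((st.st_mode & S_IWUSR) == 0)
        problems |= PERM_NO_WRITE;
    if ((st.st_mode & (S_IRWXG | S_IRWXO)) != 0)
        problems |= PERM_GROUP_OTHER;
    return problems;
}
```

Reading st_uid and st_mode rather than asking access(2) matters here because root bypasses permission checks at open time, which would hide exactly the misconfiguration that hurts a non-root postgres user.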

Example

For example, PostgreSQL's fsync_fname_ext opens a regular file read/write (O_RDWR) before calling fsync on it. From man 2 open:

The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively.

    /* Opened with O_RDWR: if the file's permissions are wrong, the open can fail with a permission error */
    /*  
     * Some OSs require directories to be opened read-only whereas other  
     * systems don't allow us to fsync files opened read-only; so we need both  
     * cases here.  Using O_RDWR will cause us to fail to fsync files that are  
     * not writable by our userid, but we assume that's OK.  
     */  
    flags = PG_BINARY;  
    if (!isdir)  
        flags |= O_RDWR;  
    else  
        flags |= O_RDONLY;  
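The flag choice above can be reproduced outside PostgreSQL with plain POSIX calls. In this sketch, open_for_fsync and fsync_path are hypothetical names, and PG_BINARY (which only matters on Windows) is left out. A regular file is opened O_RDWR, so for a non-root user a file whose owner write bit is clear fails with EACCES; root passes the check anyway.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>     /* mkstemp(), used in the usage example */
#include <sys/stat.h>   /* chmod(), used in the usage example */
#include <unistd.h>

/* Same rule as the excerpt: files O_RDWR (some systems refuse fsync on
 * read-only fds), directories O_RDONLY (some systems refuse to open
 * directories for writing).  Returns an fd, or -1 with errno from open(2). */
int open_for_fsync(const char *path, bool isdir)
{
    int flags = isdir ? O_RDONLY : O_RDWR;
    return open(path, flags, 0);
}

/* Open, fsync and close; returns 0 on success, -1 with errno set on failure. */
int fsync_path(const char *path, bool isdir)
{
    int fd = open_for_fsync(path, isdir);
    if (fd < 0)
        return -1;              /* e.g. EACCES on a read-only file */
    int rc = fsync(fd);
    int saved = errno;          /* close() must not clobber fsync's errno */
    close(fd);
    errno = saved;
    return rc;
}
```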

Security risk

For example, when a standby is promoted, recovery.conf has to be renamed to recovery.done.

Relevant code

PostgreSQL's rename wrapper, durable_rename, first opens the file to be renamed in O_RDWR mode (via fsync_fname_ext) before it calls rename(2):

/*  
 * durable_rename -- rename(2) wrapper, issuing fsyncs required for durability  
 *  
 * This routine ensures that, after returning, the effect of renaming file  
 * persists in case of a crash. A crash while this routine is running will  
 * leave you with either the pre-existing or the moved file in place of the  
 * new file; no mixed state or truncated files are possible.  
 *  
 * It does so by using fsync on the old filename and the possibly existing  
 * target filename before the rename, and the target file and directory after.  
 *  
 * Note that rename() cannot be used across arbitrary directories, as they  
 * might not be on the same filesystem. Therefore this routine does not  
 * support renaming across directories.  
 *  
 * Log errors with the caller specified severity.  
 *  
 * Returns 0 if the operation succeeded, -1 otherwise. Note that errno is not  
 * valid upon return.  
 */  
int  
durable_rename(const char *oldfile, const char *newfile, int elevel)  
{  
        int                     fd;  

        /*  
         * First fsync the old and target path (if it exists), to ensure that they  
         * are properly persistent on disk. Syncing the target file is not  
         * strictly necessary, but it makes it easier to reason about crashes;  
         * because it's then guaranteed that either source or target file exists  
         * after a crash.  
         */  
        if (fsync_fname_ext(oldfile, false, false, elevel) != 0)  
                return -1;  

        fd = OpenTransientFile((char *) newfile, PG_BINARY | O_RDWR, 0);  
        if (fd < 0)  
        {  
                if (errno != ENOENT)  
                {  
                        ereport(elevel,  
                                        (errcode_for_file_access(),  
                                         errmsg("could not open file \"%s\": %m", newfile)));  
                        return -1;  
                }  
        }  

...  

        /* Time to do the real deal... */  
        if (rename(oldfile, newfile) < 0)  
        {  
                ereport(elevel,  
                                (errcode_for_file_access(),  
                                 errmsg("could not rename file \"%s\" to \"%s\": %m",  
                                                oldfile, newfile)));  
                return -1;  
        }  
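A minimal POSIX sketch of the sequence the durable_rename comment describes, with stated simplifications: the hypothetical durable_rename_sketch does not fsync a pre-existing target, reports failures through errno instead of ereport, and, like PostgreSQL, opens the source file O_RDWR. That open is the step that needs write permission on the file itself, not just on the directory.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>      /* rename(); snprintf() in the usage example */
#include <stdlib.h>     /* mkdtemp(), used in the usage example */
#include <string.h>
#include <unistd.h>

/* open + fsync + close with the given access mode; 0 or -1 (errno set). */
static int fsync_one(const char *path, int flags)
{
    int fd = open(path, flags, 0);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}

/* Hypothetical sketch; both paths must be in the same directory. */
int durable_rename_sketch(const char *oldpath, const char *newpath)
{
    char dir[PATH_MAX];
    const char *slash;

    if (fsync_one(oldpath, O_RDWR) != 0)   /* 1. needs write permission on the file */
        return -1;
    if (rename(oldpath, newpath) != 0)     /* 2. needs permission on the directory only */
        return -1;
    if (fsync_one(newpath, O_RDWR) != 0)   /* 3. persist the file under its new name */
        return -1;

    /* 4. fsync the parent directory so the new directory entry persists */
    slash = strrchr(newpath, '/');
    if (slash == NULL) {
        strcpy(dir, ".");
    } else if (slash == newpath) {
        strcpy(dir, "/");
    } else {
        size_t len = (size_t) (slash - newpath);
        if (len >= sizeof dir) {
            errno = ENAMETOOLONG;
            return -1;
        }
        memcpy(dir, newpath, len);
        dir[len] = '\0';
    }
    return fsync_one(dir, O_RDONLY);
}
```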

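Renaming recovery.conf at the end of recovery calls this wrapper, durable_rename, with elevel FATAL, so any failure inside it, including a failed O_RDWR open of recovery.conf, is reported at FATAL level: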

#define RECOVERY_COMMAND_FILE   "recovery.conf"  
#define RECOVERY_COMMAND_DONE   "recovery.done"  


        /*  
         * Rename the config file out of the way, so that we don't accidentally  
         * re-enter archive recovery mode in a subsequent crash.  
         */  
        unlink(RECOVERY_COMMAND_DONE);  
        durable_rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE, FATAL);  
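The effect on promotion can be reproduced in a scratch directory. finish_recovery_rename below is a hypothetical helper, not PostgreSQL code: with durable set it opens recovery.conf O_RDWR and fsyncs it before rename(2), approximating the extra step durable_rename adds; without it, it calls rename(2) alone. For a non-root user and a recovery.conf with mode 0444 in a directory the user owns, only the plain rename succeeds.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>     /* mkdtemp(), used in the usage example */
#include <unistd.h>

/* Hypothetical re-enactment of unlink(recovery.done) + rename inside datadir.
 * Returns 0 on success, -1 with errno set. */
int finish_recovery_rename(const char *datadir, int durable)
{
    char conf[4096], done[4096];
    int n1 = snprintf(conf, sizeof conf, "%s/recovery.conf", datadir);
    int n2 = snprintf(done, sizeof done, "%s/recovery.done", datadir);

    if (n1 < 0 || n2 < 0 || (size_t) n1 >= sizeof conf || (size_t) n2 >= sizeof done) {
        errno = ENAMETOOLONG;
        return -1;
    }

    unlink(done);                      /* result ignored, as in the quoted code */

    if (durable) {
        int fd = open(conf, O_RDWR);   /* needs write permission on the file */
        if (fd < 0)
            return -1;
        if (fsync(fd) != 0) {
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        close(fd);
    }
    return rename(conf, done);         /* needs write + search on datadir only */
}
```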

fsync_fname_ext makes the same choice: if the path is a regular file (isdir is false), it is opened with O_RDWR.

/*
 * fsync_fname_ext -- Try to fsync a file or directory
 *
 * If ignore_perm is true, ignore errors upon trying to open unreadable
 * files. Logs other errors at a caller-specified level.
 *
 * Returns 0 if the operation succeeded, -1 otherwise.
 */
static int
fsync_fname_ext(const char *fname, bool isdir, bool ignore_perm, int elevel)
{
    int         fd;
    int         flags;
    int         returncode;

    /* Opened with O_RDWR here: if the file is not writable by the server's
     * OS user, the open fails with a permission error */
    /*
     * Some OSs require directories to be opened read-only whereas other
     * systems don't allow us to fsync files opened read-only; so we need both
     * cases here.  Using O_RDWR will cause us to fail to fsync files that are
     * not writable by our userid, but we assume that's OK.
     */
    flags = PG_BINARY;
    if (!isdir)
        flags |= O_RDWR;
    else
        flags |= O_RDONLY;

    fd = OpenTransientFile((char *) fname, flags, 0);

    /*
     * Some OSs don't allow us to open directories at all (Windows returns
     * EACCES), just ignore the error in that case.  If desired also silently
     * ignoring errors about unreadable files. Log others.
     */
    if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES))
        return 0;
    else if (fd < 0 && ignore_perm && errno == EACCES)
        return 0;
    else if (fd < 0)
    {
        /* durable_rename passes ignore_perm = false: a permission error is reported here */
        ereport(elevel,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", fname)));
        return -1;
    }

    returncode = pg_fsync(fd);

    /*
     * Some OSes don't allow us to fsync directories at all, so we can ignore
     * those errors. Anything else needs to be logged.
     */
    if (returncode != 0 && !(isdir && errno == EBADF))
    {
        int         save_errno;

        /* close file upon error, might not be in transaction context */
        save_errno = errno;
        (void) CloseTransientFile(fd);
        errno = save_errno;

        ereport(elevel,
                (errcode_for_file_access(),
                 errmsg("could not fsync file \"%s\": %m", fname)));
        return -1;
    }

    (void) CloseTransientFile(fd);

    return 0;
}
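durable_rename calls fsync_fname_ext with four arguments, fsync_fname_ext(oldfile, false, false, elevel); in the releases that have durable_rename the third argument is assumed here to be ignore_perm (check the fd.c of your version). The hypothetical function below models only how a failed open() is classified under that assumption: with ignore_perm false, as durable_rename passes, EACCES on a regular file is reported rather than ignored, which for recovery.conf means FATAL.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdbool.h>

/* Returns true when a failed open() is silently ignored, false when it is
 * reported at the caller's elevel (hypothetical model, not PostgreSQL code). */
bool open_error_ignored(bool isdir, bool ignore_perm, int err)
{
    /* Some platforms cannot open directories at all. */
    if (isdir && (err == EISDIR || err == EACCES))
        return true;
    /* Callers may opt in to skipping files they lack permission for. */
    if (ignore_perm && err == EACCES)
        return true;
    /* Everything else, including EACCES on a file when ignore_perm is false. */
    return false;
}
```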

OpenTransientFile is a wrapper around open(2); the flags durable_rename passes to it (PG_BINARY | O_RDWR) also include O_RDWR.

/*  
 * Like AllocateFile, but returns an unbuffered fd like open(2)  
 */  
int  
OpenTransientFile(FileName fileName, int fileFlags, int fileMode)  
{  
        int                     fd;  

        DO_DB(elog(LOG, "OpenTransientFile: Allocated %d (%s)",  
                           numAllocatedDescs, fileName));  

        /* Can we allocate another non-virtual FD? */  
        if (!reserveAllocatedDesc())  
                ereport(ERROR,  
                                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),  
                                 errmsg("exceeded maxAllocatedDescs (%d) while trying to open file \"%s\"",  
                                                maxAllocatedDescs, fileName)));  

        /* Close excess kernel FDs. */  
        ReleaseLruFiles();  

        fd = BasicOpenFile(fileName, fileFlags, fileMode);  

        if (fd >= 0)  
        {  
                AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];  

                desc->kind = AllocateDescRawFD;  
                desc->desc.fd = fd;  
                desc->create_subid = GetCurrentSubTransactionId();  
                numAllocatedDescs++;  

                return fd;  
        }  

        return -1;                                      /* failure */  
}  
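The bookkeeping in OpenTransientFile (check for a free slot, open, record the fd so it can be released later) can be sketched with a fixed array. MAX_TRANSIENT, open_transient and close_transient are hypothetical stand-ins for maxAllocatedDescs, OpenTransientFile and CloseTransientFile; unlike PostgreSQL, running out of slots returns -1 with errno EMFILE instead of raising ERROR.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRANSIENT 4    /* hypothetical cap, stands in for maxAllocatedDescs */

static int transient_fds[MAX_TRANSIENT];
static int num_transient = 0;

/* Refuse when all slots are used; otherwise open(2) and remember the fd. */
int open_transient(const char *path, int flags, mode_t mode)
{
    if (num_transient >= MAX_TRANSIENT) {
        errno = EMFILE;            /* PostgreSQL raises ERROR here instead */
        return -1;
    }
    int fd = open(path, flags, mode);
    if (fd < 0)
        return -1;                 /* errno from open(2), e.g. EACCES for O_RDWR */
    transient_fds[num_transient++] = fd;
    return fd;
}

/* Close an fd obtained from open_transient and free its slot. */
int close_transient(int fd)
{
    for (int i = 0; i < num_transient; i++) {
        if (transient_fds[i] == fd) {
            transient_fds[i] = transient_fds[--num_transient];  /* move last into hole */
            return close(fd);
        }
    }
    errno = EBADF;                 /* not one of ours */
    return -1;
}
```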

BasicOpenFile, called underneath OpenTransientFile, opens the file with a plain open(2):

/*  
 * BasicOpenFile --- same as open(2) except can free other FDs if needed  
 *  
 * This is exported for use by places that really want a plain kernel FD,  
 * but need to be proof against running out of FDs.  Once an FD has been  
 * successfully returned, it is the caller's responsibility to ensure that  
 * it will not be leaked on ereport()!  Most users should *not* call this  
 * routine directly, but instead use the VFD abstraction level, which  
 * provides protection against descriptor leaks as well as management of  
 * files that need to be open for more than a short period of time.  
 *  
 * Ideally this should be the *only* direct call of open() in the backend.  
 * In practice, the postmaster calls open() directly, and there are some  
 * direct open() calls done early in backend startup.  Those are OK since  
 * this module wouldn't have any open files to close at that point anyway.  
 */  
int  
BasicOpenFile(FileName fileName, int fileFlags, int fileMode)  
{  
        int                     fd;  

tryAgain:  
        fd = open(fileName, fileFlags, fileMode);  

        if (fd >= 0)  
                return fd;                              /* success! */  

        if (errno == EMFILE || errno == ENFILE)  
        {  
                int                     save_errno = errno;  

                ereport(LOG,  
                                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),  
                                 errmsg("out of file descriptors: %m; release and retry")));  
                errno = 0;  
                if (ReleaseLruFile())  
                        goto tryAgain;  
                errno = save_errno;  
        }  

        return -1;                                      /* failure */  
}  
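The retry loop in BasicOpenFile can be written without goto, as in this sketch. The release hook stands in for ReleaseLruFile and its signature is an assumption. Only EMFILE and ENFILE trigger a retry, so a permission error such as EACCES from an O_RDWR open goes straight back to the caller.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hook that closes one cached descriptor; returns true if it freed one
 * (hypothetical stand-in for ReleaseLruFile). */
typedef bool (*release_fn)(void);

/* open(2), retrying after release() while the process/system is out of fds. */
int open_with_retry(const char *path, int flags, mode_t mode, release_fn release)
{
    for (;;) {
        int fd = open(path, flags, mode);
        if (fd >= 0)
            return fd;                           /* success */
        if ((errno != EMFILE && errno != ENFILE) || release == NULL)
            return -1;                           /* EACCES, ENOENT, ...: give up */
        int saved = errno;
        fprintf(stderr, "out of file descriptors; release and retry\n");
        if (!release()) {
            errno = saved;                       /* nothing left to free */
            return -1;
        }
    }
}
```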

A questionable design in rename?

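rename(2) does not look at the owner or the mode of the file being renamed. What it checks is the directory: a user with write and search permission on the directory (for example, its owner) can rename any file in it. The exception is a directory with the sticky bit set, such as /tmp (mode 1777), where only the owner of the file, the owner of the directory, or a privileged user may rename or delete an entry. This is why, in the example below, digoal gets "Operation not permitted" in /tmp but can rename a root-owned file with mode 600 in his own home directory.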

The man page of the util-linux rename command (man 1 rename) warns:

The renaming has no safeguards.
If the user has permission to rewrite file names, the command will perform the action without any questions.
For example, the result can be quite drastic when the command is run as root in the /lib directory.
Always make a backup before running the command, unless you truly know what you are doing.

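Example: an ordinary user (digoal) renames a file created by the superuser, first in /tmp and then in digoal's home directory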

[root@   ~]# cd /tmp  
[root@   tmp]# touch abc  
[root@   tmp]# chmod 600 abc  
[root@   tmp]# ll abc  
-rw------- 1 root root 0 Aug 29 23:33 abc  
[root@   tmp]# su - digoal  
Last login: Mon Aug 29 23:18:41 CST 2016 on pts/1  
[digoal@   ~]$ cd /tmp  
[digoal@   tmp]$ ll abc  
-rw------- 1 root root 0 Aug 29 23:33 abc  
[digoal@   tmp]$ mv abc d  
mv: cannot move ‘abc’ to ‘d’: Operation not permitted  
[digoal@   tmp]$ mv abc e  
mv: cannot move ‘abc’ to ‘e’: Operation not permitted  
[digoal@   tmp]$ mv abc a  
mv: cannot move ‘abc’ to ‘a’: Operation not permitted  
[digoal@   tmp]$ exit  
logout  

[root@   tmp]# cd /home/digoal  
[root@   digoal]# touch abc  
[root@   digoal]# chmod 600 abc  
[root@   digoal]# ll abc  
-rw------- 1 root root 0 Aug 29 23:33 abc  
[root@   digoal]# su - digoal  
Last login: Mon Aug 29 23:33:04 CST 2016 on pts/1  
[digoal@   ~]$ ll abc  
-rw------- 1 root root 0 Aug 29 23:33 abc  
[digoal@   ~]$ mv abc abcd  
[digoal@   ~]$ ll abcd  
-rw------- 1 root root 0 Aug 29 23:33 abcd  
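The two outcomes above follow from the directory checks in rename(2). Below is a simplified model: rename_allowed is hypothetical, it ignores ACLs, the group permission class, capabilities other than root, and the destination directory, and UID 1000 for digoal is an assumption. The renamed file's own mode bits never appear in it.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified model of rename(2)'s checks on the source directory.
 *   dir_mode, dir_uid: mode and owner of the directory holding the file
 *   file_uid:          owner of the file being renamed
 *   caller_uid:        effective UID of the caller (0 treated as privileged) */
bool rename_allowed(mode_t dir_mode, uid_t dir_uid, uid_t file_uid, uid_t caller_uid)
{
    if (caller_uid == 0)
        return true;

    /* Write + search permission on the directory (owner or "other" class only). */
    mode_t need = (caller_uid == dir_uid) ? (S_IWUSR | S_IXUSR) : (S_IWOTH | S_IXOTH);
    if ((dir_mode & need) != need)
        return false;

    /* Sticky bit (e.g. /tmp): only the file's owner or the directory's owner. */
    if ((dir_mode & S_ISVTX) && caller_uid != file_uid && caller_uid != dir_uid)
        return false;

    return true;
}
```

Plugging in the transcript: /tmp (mode 1777, owned by root) with root-owned abc and caller digoal gives false, matching "Operation not permitted"; /home/digoal (owned by digoal, assumed mode 700, no sticky bit) gives true even though abc is owned by root with mode 600.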
