Comparing Differences Between Files and Directories - Alibaba Cloud Developer Community

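Environment:

Python 2.6.6

Linux

Module used: filecmp

filecmp offers three kinds of comparison: single-file comparison, multi-file comparison and directory comparison.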

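Single-file comparison: filecmp.cmp(f1, f2[, shallow]) compares the files named f1 and f2 and returns True if they are considered equal, False otherwise. shallow defaults to True: if the os.stat() signatures of the two files (file type, size and modification time) are identical, the files are taken to be equal without their contents being read; if the signatures differ, the contents are compared. Access time and status-change time play no part. When shallow is False, the contents are always compared (files of different sizes are reported as different straight away).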

Example:

    >>> import filecmp
    >>> filecmp.cmp("/root/dir1/f1", "/root/dir2/f1")
    True
    >>> filecmp.cmp("/root/dir1/f1", "/root/dir2/f5")
    False
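
To see what shallow means in practice, this Python 3 sketch (temporary files with made-up names and contents) creates two files of equal size but different contents and gives them the same modification time with os.utime. The default shallow comparison trusts the matching os.stat() signatures, while shallow=False reads the bytes:

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Compare two same-size files with different contents and the same
    modification time, first shallowly (the default) and then deeply."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        with open(a, "w") as f:
            f.write("hello")  # 5 bytes
        with open(b, "w") as f:
            f.write("world")  # also 5 bytes, different content
        # Same atime/mtime for both files, so their os.stat() signatures
        # (file type, size, mtime) are identical.
        os.utime(a, (1000000000, 1000000000))
        os.utime(b, (1000000000, 1000000000))
        shallow = filecmp.cmp(a, b)              # trusts the matching signatures
        deep = filecmp.cmp(a, b, shallow=False)  # reads and compares the bytes
        return shallow, deep


print(shallow_vs_deep())  # (True, False)
```

A backup check that must catch every change is therefore safer with shallow=False, at the cost of reading both files.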

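Multi-file comparison: filecmp.cmpfiles(dir1, dir2, common[, shallow]) compares the files named in the list common in directories dir1 and dir2. It returns a tuple of three lists of file names: match, mismatch and errors. match holds the files that compare equal, mismatch those that differ, and errors those that could not be compared, because they do not exist in one of the directories, the user lacks read permission, or for some other reason.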

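Example:

First create some test files: /root/dir1 contains f1 to f4, and /root/dir2 contains identical copies of f1 to f4 plus an extra file f5.

Code:

    >>> filecmp.cmpfiles("/root/dir1", "/root/dir2", ['f1', 'f2', 'f3', 'f4', 'f5'])
    (['f1', 'f2', 'f3', 'f4'], [], ['f5'])

f5 ends up in the error list because /root/dir1/f5 does not exist.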
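
A Python 3 sketch that builds a layout like the one used above in a temporary directory and runs cmpfiles on it. The directories dir1 and dir2 stand in for /root/dir1 and /root/dir2, and the file contents and the function name demo_cmpfiles are made up:

```python
import filecmp
import os
import tempfile


def demo_cmpfiles():
    """Rebuild a dir1/dir2 layout like the article's and run cmpfiles on it."""
    with tempfile.TemporaryDirectory() as root:
        dir1 = os.path.join(root, "dir1")
        dir2 = os.path.join(root, "dir2")
        os.mkdir(dir1)
        os.mkdir(dir2)
        # f1 to f4 exist in both directories with identical contents
        for name in ["f1", "f2", "f3", "f4"]:
            for d in (dir1, dir2):
                with open(os.path.join(d, name), "w") as f:
                    f.write("content of " + name + "\n")
        # f5 exists only in dir2
        with open(os.path.join(dir2, "f5"), "w") as f:
            f.write("only in dir2\n")
        names = ["f1", "f2", "f3", "f4", "f5"]
        # shallow=False compares contents instead of trusting os.stat() signatures
        return filecmp.cmpfiles(dir1, dir2, names, shallow=False)


match, mismatch, errors = demo_cmpfiles()
print(match, mismatch, errors)  # ['f1', 'f2', 'f3', 'f4'] [] ['f5']
```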
       
 
    
 
   
 

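Directory comparison: the class dircmp(a, b[, ignore[, hide]]) creates a directory-comparison object, where a and b are the names of the directories to compare. ignore is a list of names to ignore; in Python 2.6 it defaults to ['RCS', 'CVS', 'tags'] (Python 3.4 and later default to filecmp.DEFAULT_IGNORES, a longer list that also includes names such as '.git' and '__pycache__'). hide is a list of names to hide and defaults to [os.curdir, os.pardir]. A dircmp object gives detailed comparison results, such as the files that exist only in a, the subdirectories present in both a and b, and the files that match, and it supports recursive comparison.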

Example:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # 2017-09-07

    import filecmp

    a = "/root/dir1"  # left directory
    b = "/root/dir2"  # right directory

    dirobj = filecmp.dircmp(a, b, ['1.py'])  # compare the directories, ignoring 1.py

    dirobj.report()                  # compare the two given directories
    dirobj.report_full_closure()     # compare them and all common subdirectories, recursively
    dirobj.report_partial_closure()  # compare them and their immediate common subdirectories

    print("_" * 50)
    print("left_list:" + str(dirobj.left_list))        # entries in the left directory
    print("_" * 50)
    print("right_list:" + str(dirobj.right_list))      # entries in the right directory
    print("_" * 50)
    print("common:" + str(dirobj.common))              # files and subdirectories present in both
    print("_" * 50)
    print("left_only:" + str(dirobj.left_only))        # files or directories only in the left directory
    print("_" * 50)
    print("right_only:" + str(dirobj.right_only))      # files or directories only in the right directory
    print("_" * 50)
    print("common_dirs:" + str(dirobj.common_dirs))    # subdirectories present in both
    print("_" * 50)
    print("common_files:" + str(dirobj.common_files))  # files present in both
    print("_" * 50)
    print("common_funny:" + str(dirobj.common_funny))  # names in both whose type differs, or that os.stat() fails on
    print("_" * 50)
    print("same_files:" + str(dirobj.same_files))      # files that compare equal
    print("_" * 50)
    print("diff_files:" + str(dirobj.diff_files))      # files that differ
    print("_" * 50)
    print("funny_files:" + str(dirobj.funny_files))    # files in both directories that could not be compared

Output (there are no subdirectories, so report(), report_full_closure() and report_partial_closure() all print the same thing):

    diff /root/dir1 /root/dir2
    Only in /root/dir2 : ['f5']
    Identical files : ['f1', 'f2', 'f3', 'f4']
    diff /root/dir1 /root/dir2
    Only in /root/dir2 : ['f5']
    Identical files : ['f1', 'f2', 'f3', 'f4']
    diff /root/dir1 /root/dir2
    Only in /root/dir2 : ['f5']
    Identical files : ['f1', 'f2', 'f3', 'f4']
    __________________________________________________
    left_list:['f1', 'f2', 'f3', 'f4']
    __________________________________________________
    right_list:['f1', 'f2', 'f3', 'f4', 'f5']
    __________________________________________________
    common:['f1', 'f2', 'f3', 'f4']
    __________________________________________________
    left_only:[]
    __________________________________________________
    right_only:['f5']
    __________________________________________________
    common_dirs:[]
    __________________________________________________
    common_files:['f1', 'f2', 'f3', 'f4']
    __________________________________________________
    common_funny:[]
    __________________________________________________
    same_files:['f1', 'f2', 'f3', 'f4']
    __________________________________________________
    diff_files:[]
    __________________________________________________
    funny_files:[]
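
The dircmp attributes above describe only the top level of a and b. Its subdirs attribute maps each common subdirectory name to a nested dircmp object (this is what report_full_closure walks), so collecting differences at every level takes a short recursive function. A Python 3 sketch; collect_differences, write and demo are hypothetical helpers and the sample tree is made up:

```python
import filecmp
import os
import tempfile


def collect_differences(dcmp, prefix=""):
    """Walk a dircmp object and its subdirs; return the relative paths that
    exist only on the left or whose files differ."""
    changed = [os.path.join(prefix, name)
               for name in dcmp.left_only + dcmp.diff_files]
    # subdirs maps each common subdirectory name to a nested dircmp object
    for name, sub in dcmp.subdirs.items():
        changed.extend(collect_differences(sub, os.path.join(prefix, name)))
    return sorted(changed)


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def demo():
    with tempfile.TemporaryDirectory() as root:
        left = os.path.join(root, "left")
        right = os.path.join(root, "right")
        for d in (left, right):
            os.makedirs(os.path.join(d, "sub"))
            write(os.path.join(d, "same.txt"), "same\n")
        write(os.path.join(left, "sub", "a.txt"), "old\n")
        write(os.path.join(right, "sub", "a.txt"), "newer\n")  # different size
        write(os.path.join(left, "sub", "new.txt"), "added\n")  # left side only
        return collect_differences(filecmp.dircmp(left, right))


print(demo())  # ['sub/a.txt', 'sub/new.txt']
```

dircmp compares files shallowly (by os.stat() signature, falling back to the contents when the signatures differ), so the sketch gives the two a.txt files different sizes to be sure they end up in diff_files.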
       
 
    
 
   
 

Practice: verifying the differences between a source directory and its backup

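Source code (it recursively compares a data directory with its backup and copies every file that is new or has changed in the data directory into the backup):

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # 2017-09-07
    import os
    import sys
    import filecmp
    import re
    import shutil

    holderlist = []


    def compareme(dir1, dir2):
        dircomp = filecmp.dircmp(dir1, dir2)
        only_in_one = dircomp.left_only   # new files or directories in the source (only in the left directory)
        diff_in_one = dircomp.diff_files  # mismatched files: the source copy has changed
        # append the absolute paths of new or changed entries to holderlist
        [holderlist.append(os.path.abspath(os.path.join(dir1, x))) for x in only_in_one]
        [holderlist.append(os.path.abspath(os.path.join(dir1, x))) for x in diff_in_one]
        if len(dircomp.common_dirs) > 0:
            for item in dircomp.common_dirs:
                # recurse into subdirectories that exist on both sides
                compareme(os.path.abspath(os.path.join(dir1, item)),
                          os.path.abspath(os.path.join(dir2, item)))
        return holderlist  # function level, not inside the if block


    def main():
        if len(sys.argv) > 2:
            dir1 = sys.argv[1]
            dir2 = sys.argv[2]
        else:
            print("usage: %s datadir backupdir" % sys.argv[0])
            sys.exit(1)

        source_files = compareme(dir1, dir2)  # compare the source directory with the backup
        dir1 = os.path.abspath(dir1)
        dir2 = os.path.abspath(dir2)
        destination_files = []
        createdir_bool = False

        for item in source_files:  # walk the list of differing files or directories
            # map the source path onto the backup directory; re.escape stops
            # characters such as '.' in dir1 from acting as regex syntax
            destination_dir = re.sub('^' + re.escape(dir1), dir2, item)
            destination_files.append(destination_dir)
            if os.path.isdir(item):  # a differing directory missing from the backup is created there
                if not os.path.exists(destination_dir):
                    os.makedirs(destination_dir)
                    createdir_bool = True  # flag that compareme must run again

        if createdir_bool:
            # the new directories exist in the backup but are empty, so compare again;
            # directories nested inside them are only created on a later run
            destination_files = []
            del holderlist[:]  # empty the global list; "source_files = []" alone would not
            source_files = compareme(dir1, dir2)
            for item in source_files:
                destination_dir = re.sub('^' + re.escape(dir1), dir2, item)
                destination_files.append(destination_dir)

        print("update item:")
        print(source_files)  # list of items to update
        copy_pair = zip(source_files, destination_files)  # pair each source path with its backup path
        for item in copy_pair:
            if os.path.isfile(item[0]):  # copy files only
                shutil.copyfile(item[0], item[1])


    if __name__ == '__main__':
        main()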
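
The script above creates missing directories empty, compares a second time, and leaves directories nested inside new ones for a later run. As an alternative (not the author's method), here is a Python 3 sketch that copies left-only directories whole with shutil.copytree and builds backup paths from the dircmp objects' left and right attributes instead of re.sub. The function names sync_to_backup and demo and the sample tree are hypothetical, and like the original it never deletes anything from the backup:

```python
import filecmp
import os
import shutil
import tempfile


def sync_to_backup(src, dst):
    """Copy entries that are new or changed in src into dst and return the
    copied paths relative to src. Nothing is ever deleted from dst."""
    copied = []

    def walk(dcmp):
        for name in dcmp.left_only + dcmp.diff_files:
            s = os.path.join(dcmp.left, name)
            d = os.path.join(dcmp.right, name)
            if os.path.isdir(s):
                shutil.copytree(s, d)  # new directory: copied with everything inside it
            else:
                shutil.copy2(s, d)     # new or changed file; copy2 also keeps the mtime
            copied.append(os.path.relpath(s, src))
        for sub in dcmp.subdirs.values():  # recurse into common subdirectories
            walk(sub)

    walk(filecmp.dircmp(src, dst))
    return sorted(copied)


def demo():
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "data")
        dst = os.path.join(root, "backup")
        os.makedirs(os.path.join(src, "new", "deeper"))  # nested new directories
        os.makedirs(dst)
        with open(os.path.join(src, "new", "deeper", "f.txt"), "w") as f:
            f.write("x\n")
        with open(os.path.join(src, "top.txt"), "w") as f:
            f.write("top\n")
        first = sync_to_backup(src, dst)
        second = sync_to_backup(src, dst)  # backup is now up to date
        nested_ok = os.path.isfile(os.path.join(dst, "new", "deeper", "f.txt"))
        return first, second, nested_ok


print(demo())  # (['new', 'top.txt'], [], True)
```

Because copy2 preserves modification times, copied files have matching os.stat() signatures on the next run, so dircmp's shallow check treats them as identical without rereading them.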


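Summary, notes and further topics:

Summary:

Besides learning how to verify a file backup, I also learned a piece of syntax, the list comprehension:

    [holderlist.append(os.path.abspath(os.path.join(dir1, x))) for x in only_in_one]

    # the usual way to write it
    for x in only_in_one:
        holderlist.append(os.path.abspath(os.path.join(dir1, x)))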
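
The two forms leave holderlist with the same contents, but the comprehension also builds and discards a list of None values, one per call to append (which returns None). A small Python 3 check with made-up file names; list.extend with a generator expression is a third option that avoids the throwaway list:

```python
import os

dir1 = "/root/dir1"             # directory name from the article
only_in_one = ["f5", "newdir"]  # hypothetical left_only result

# 1. list comprehension run only for its side effect
holder_a = []
side_effect = [holder_a.append(os.path.join(dir1, x)) for x in only_in_one]

# 2. the equivalent plain for loop
holder_b = []
for x in only_in_one:
    holder_b.append(os.path.join(dir1, x))

# 3. list.extend with a generator expression, which builds no throwaway list
holder_c = []
holder_c.extend(os.path.join(dir1, x) for x in only_in_one)

print(holder_a == holder_b == holder_c)  # True
print(side_effect)                       # [None, None]
```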

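Notes:

Pay close attention to indentation when writing Python, or errors come easily. I kept getting syntax errors on the line below, and in the end it only worked after someone else typed it and I copied their version (mixing tabs and spaces is a common culprit in cases like this). In compareme the line belongs at function level, aligned with the if statement, not inside the if block:

    return holderlist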
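
Why the placement matters: if return holderlist sits inside the if block, the function reaches its end without a return statement whenever there are no common subdirectories and implicitly returns None, so the caller's for item in source_files loop fails with a TypeError. A toy Python 3 illustration; collect_wrong and collect_right are hypothetical simplifications of compareme:

```python
def collect_wrong(names, subdirs):
    found = list(names)
    if len(subdirs) > 0:
        for item in subdirs:
            found.append(item)  # stands in for the recursive compareme call
        return found  # inside the if: never reached when subdirs is empty


def collect_right(names, subdirs):
    found = list(names)
    if len(subdirs) > 0:
        for item in subdirs:
            found.append(item)  # stands in for the recursive compareme call
    return found  # function level: always reached


print(collect_wrong(["f1"], []))  # None
print(collect_right(["f1"], []))  # ['f1']
```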

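Going further:

re.sub is one of the most important functions in the re module, and a very capable one: it performs regular-expression substitution.

Signature of re.sub:

    re.sub(pattern, repl, string, count=0, flags=0)

Parameters:

pattern: the regular expression, given as a pattern string.
repl: the replacement; it can be a string or a function. A function is called with each match object and must return the replacement string.
string: the string to search and perform the replacement in.
count: the maximum number of matches to replace, counted from left to right; the default 0 replaces all matches.
flags: matching flags; several can be combined with bitwise OR (|), or they can be written inline in the pattern, e.g. (?i). The flags argument was added in Python 2.7 (and 3.1), so it is not available in the Python 2.6.6 used in this article.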

Example:

    >>> import re
    >>> re.sub(r'\w+', '10', "ji 43 af,geq", count=2, flags=re.I)
    '10 10 af,geq'

Only the first two words are replaced because count=2.
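
Two more re.sub features that matter for the backup script, sketched in Python 3 with made-up strings and paths: repl given as a function, and re.escape, which makes a directory path match literally when it is used as the pattern (otherwise re.sub treats the path as a regular expression, so a '.' in it would match any character):

```python
import re

# repl as a function: it receives each match object and returns the replacement
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "ji 43 af 7")
print(doubled)  # ji 86 af 14

# count=1: only the first match (scanning left to right) is replaced
first_only = re.sub(r"\w+", "10", "ji 43 af,geq", count=1)
print(first_only)  # 10 43 af,geq

# Mapping a source path onto a backup path: ^ anchors the match at the start
# and re.escape makes the '.' in the directory name match only a literal dot.
src = "/data/app.v1"
dst = "/backup/app.v1"
mapped = re.sub("^" + re.escape(src), dst, "/data/app.v1/conf/app.v1.ini")
print(mapped)  # /backup/app.v1/conf/app.v1.ini
```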