File Comparison Shell Script in Practice (Multi-threaded Concurrent Shell) - Alibaba Cloud Developer Community

Requirements

1. Extract a given keyword from the logs of two APs, A and B, and output the comparison result.

Input:

A log

2023-02-01 17:13:51.988  INFO 48500 --- [pool-1-thread-1707] c.n.fileloader.service.RabbitMQService   : [PARAM-PRINT] 文件名:A1450AOIH05.TXT 开始行:1007  剩余解析数量:0
2023-02-01 17:13:51.997  INFO 48500 --- [pool-1-thread-2942] c.n.fileloader.service.RabbitMQService   : [PARAM-PRINT] 文件名:L2111SEAI01.TXT 开始行:7267  剩余解析数量:1

B log

2023-02-01 17:13:51.988  INFO 48500 --- [pool-1-thread-1707] c.n.fileloader.service.RabbitMQService   : [PARAM-PRINT] 文件名:A1450AOIH05.TXT 开始行:1008  剩余解析数量:0
2023-02-01 17:13:51.997  INFO 48500 --- [pool-1-thread-2942] c.n.fileloader.service.RabbitMQService   : [PARAM-PRINT] 文件名:L2111SEAI01.TXT 开始行:7267  剩余解析数量:1

Output

If the value of the specified key is the same in both log files the result is true, otherwise it is false:

A1450AOIH05 1007 1008 false
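As an illustration of requirement 1 only (a hypothetical helper, not the author's script, which follows later), a single awk pass can pull the file name after "文件名:" and the number after "开始行:" out of every [PARAM-PRINT] line and compare the two logs directly. It assumes the field layout of the sample lines above and keeps the last entry per file.

```shell
#!/usr/bin/env bash
# compare_start_lines OLD_LOG NEW_LOG  (hypothetical helper)
# Prints "name,oldStart,newStart,true|false" for every file name that has a
# [PARAM-PRINT] entry in either log; the last entry per file wins and a
# missing side is reported as NA. Assumes the field layout of the samples.
compare_start_lines() {
  awk '
    # text after "prefix" when field f starts with it, otherwise ""
    function after(f, prefix) {
      return index(f, prefix) == 1 ? substr(f, length(prefix) + 1) : ""
    }
    /\[PARAM-PRINT\]/ {
      file = ""; start = ""
      for (i = 1; i <= NF; i++) {
        if (after($i, "文件名:") != "") file = after($i, "文件名:")
        if (after($i, "开始行:") != "") start = after($i, "开始行:")
      }
      if (file == "") next
      if (FILENAME == ARGV[1]) a[file] = start; else b[file] = start
      seen[file] = 1
    }
    END {
      for (f in seen) {
        va = (f in a) ? a[f] : "NA"
        vb = (f in b) ? b[f] : "NA"
        print f "," va "," vb "," (va == vb ? "true" : "false")
      }
    }
  ' "$1" "$2"
}
```

Output order follows awk's array iteration, so pipe through sort if a stable order is needed.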

2. Whether A and B contain the keyword "开始解析" ("start parsing")

            A B 
C1310MACR02 0 0
C1310MACR02 0 1
C1310MACR02 1 1
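A minimal sketch for requirement 2, assuming the names following "开始解析:" have already been extracted from each log into one-name-per-line files: it emits a row for every name found in either list, so a name that appears only in B also gets its "0 1" row. The function name presence_flags is hypothetical.

```shell
#!/usr/bin/env bash
# presence_flags LIST_A LIST_B  (hypothetical helper)
# Each input holds one name per line. Prints "name,inA,inB" with 1/0 flags
# for every name that occurs in at least one of the two lists.
presence_flags() {
  awk -v OFS=, '
    $0 == "" { next }                                   # ignore blank lines
    FILENAME == ARGV[1] { a[$0] = 1; all[$0] = 1; next } # names from A
    { b[$0] = 1; all[$0] = 1 }                          # names from B
    END {
      for (k in all) {
        fa = (k in a) ? 1 : 0
        fb = (k in b) ? 1 : 0
        print k, fa, fb
      }
    }
  ' "$1" "$2"
}
```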

3. Whether B contains the keyword "本次解析结束" ("this parsing run has finished")

          B 
C1310MACR02 0
indexname 1
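Requirement 3 can be sketched the same way, assuming a file with one name per line for each "本次解析结束" entry in B (the script below collects such names in /tmp/parseend-tar): for each index name, print 1 if B reported it as finished and 0 otherwise. finished_flags is a hypothetical name.

```shell
#!/usr/bin/env bash
# finished_flags FINISHED_LIST INDEX_FILE  (hypothetical helper)
# Prints "name,1" or "name,0" for every non-blank name in INDEX_FILE,
# in index order, depending on whether FINISHED_LIST contains it.
finished_flags() {
  awk -v OFS=, '
    FILENAME == ARGV[1] { done[$0] = 1; next }      # names B reported as finished
    $0 != "" { f = ($0 in done) ? 1 : 0; print $0, f }
  ' "$1" "$2"
}
```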

Script to generate the index

#!/bin/bash
# author: ninesun
# date: 2023-02-08 08:55:21
# desc: generate fl redis key
indexprefix=${1-'/202302/08'}
# truncate the output files (": >" leaves no blank first line)
: > /tmp/indexfile
: > /tmp/startline.txt
: > /tmp/lineurl.txt
echo "start generate ..."
# acf: index files whose names start with A or C
cd "/dfs/acf/INDEX/${indexprefix}" || exit 1
for i in $(ls | grep -E "^A|^C"); do echo "START_LINE:/INDEX${indexprefix}/$i" >> /tmp/startline.txt; done
for i in $(ls | grep -E "^A|^C"); do echo "LINE_URL:/INDEX${indexprefix}/$i" >> /tmp/lineurl.txt; done
# oc: index files whose names start with L
cd "/dfs/oc/INDEX/${indexprefix}" || exit 1
for i in $(ls | grep -E "^L"); do echo "LINE_URL:/INDEX${indexprefix}/$i" >> /tmp/lineurl.txt; done
for i in $(ls | grep -E "^L"); do echo "START_LINE:/INDEX${indexprefix}/$i" >> /tmp/startline.txt; done
# keep only the file name: field 5 of "LINE_URL:/INDEX/YYYYMM/DD/<name>"
cut -d / -f5 /tmp/lineurl.txt | grep -Ev '^$' >> /tmp/indexfile
qty=$(wc -l < /tmp/indexfile)
echo "end generate ...,total index: ${qty}"

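The generated /tmp/indexfile contains one index file name per line; these are the names the comparison script looks up after "文件名:" in the logs.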
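The generator selects files by parsing ls output. A glob makes the same selection without depending on how ls formats names; the sketch below assumes bash, and list_index_names is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_index_names DIR PATTERN  (hypothetical helper)
# Prints the names of regular files in DIR matching the glob PATTERN,
# e.g. '[AC]*' for the acf directory or 'L*' for oc, one per line.
# The body runs in a subshell so the shopt change does not leak out.
list_index_names() (
  dir=$1 pattern=$2
  shopt -s nullglob            # no match -> empty loop, not the literal pattern
  for f in "$dir"/$pattern; do # $pattern unquoted on purpose: it is the glob
    if [[ -f $f ]]; then printf '%s\n' "${f##*/}"; fi
  done
)
```

For example, list_index_names "/dfs/acf/INDEX/${indexprefix}" '[AC]*' would replace the ls | grep -E "^A|^C" pipeline.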

Shell script implementation

Compare the old and new logs, index by index:

#!/bin/bash
# date: 2023-02-01 17:48:37
# author: ninesun
# params: 1. old-fl.log  2. new-fl.log  3. indexFile
set -e
pushd "$(dirname "$0")" > /dev/null
SCRIPT_PATH=$(pwd -P)
popd > /dev/null
SCRIPT_FILE=$(basename "$0")
if [ $# -eq 3 ]; then
    oldLog=$1
    newLog=$2
    indexFile=$3
else
    echo "Invalid arguments. Usage: bash ${SCRIPT_FILE} <old-log> <new-log> <index-file>"
    exit 8
fi
oldtar=/tmp/old-filter
newtar=/tmp/new-filter
startparsekey="开始解析:"
parseendkey="本次解析结束"
part1key="文件名:"
part1key2=" 开始行"
strparseoldtar=/tmp/old-strparse
strparsenewtar=/tmp/new-strparse
parseendtar=/tmp/parseend-tar
parseendresult=/tmp/parseend-result
# truncate the intermediate and result files (": >" leaves no blank first line)
: > ${oldtar}
: > ${newtar}
: > ${strparsenewtar}
: > ${strparseoldtar}
: > ${parseendtar}
: > /tmp/fl-com-result
: > /tmp/fl-com-result-parse
: > ${parseendresult}
echo "begin filter" 
date +"%F %T"
while IFS= read -r line; do
    # skip blank lines in the index file
    [[ -z ${line} ]] && continue
    # A (old log): start line of the last "文件名:<name> 开始行:<n>" entry, NA if none
    oldv=$(grep -F "${part1key}${line}${part1key2}" "${oldLog}" | tail -n1 | awk '{split($(NF-1), kv, ":"); print kv[2]}')
    echo "${line},${oldv:-NA}" >> ${oldtar}
    # B (new log): same lookup
    newv=$(grep -F "${part1key}${line}${part1key2}" "${newLog}" | tail -n1 | awk '{split($(NF-1), kv, ":"); print kv[2]}')
    echo "${line},${newv:-NA}" >> ${newtar}
done < "${indexFile}"
date +"%F %T"
echo "END filter"
function merge () {
    for old in `cat ${oldtar}`;do
        for new in `cat ${newtar}`;do
            #echo "old:$old new:$new"
            indexOld=`echo ${old} | cut -d , -f1`
            indexOldVal=`echo ${old} | cut -d , -f2`
            indexNew=`echo ${new} | cut -d , -f1`
            indexNewVal=`echo ${new} | cut -d , -f2`
            #echo "${indexOld} ${indexOldVal} ${indexNew} ${indexNewVal}"
             if [[ ${indexOld} == ${indexNew} && ${indexOldVal} == ${indexNewVal} ]];then
                echo "${indexOld},${indexOldVal},${indexNewVal},true"  >>/tmp/fl-com-result
                break;
             fi                
             if [[ ${indexOld} == ${indexNew} && ${indexOldVal} != ${indexNewVal} ]];then
                 echo "${indexOld},${indexOldVal},${indexNewVal},false"  >>/tmp/fl-com-result
                 break;
             fi
        done
    done
}
function begin-parse () {
    grep -E ${startparsekey} ${oldLog}  | awk '{print $NF}'  | awk -F : '{print $NF}' >> ${strparseoldtar}
    grep -E ${startparsekey} ${newLog}  | awk '{print $NF}'  | awk -F : '{print $NF}' >> ${strparsenewtar} 
    for old in `cat ${strparseoldtar}`;do
         #echo "old:$old"
         # grep ${old} ${strparsenewtar} |wc -l
         if grep -Fxq -- "${old}" "${strparsenewtar}"; then
            echo "${old},1,1"  >>/tmp/fl-com-result-parse
         else
             echo "${old},1,0"  >>/tmp/fl-com-result-parse
         fi
    done
}
function parseend(){
    grep -E ${parseendkey} ${newLog}  | awk '{print $NF}'  | awk -F / '{print $NF}' >> ${parseendtar}   
    for line in `cat ${parseendtar}`;do 
       #echo $line
      if [[ `grep ${line} ${indexFile} |wc -l` -eq 1 ]];then
        echo "${line},1" >> ${parseendresult}
      else
        echo "${line},0" >> ${parseendresult}
      fi   
    done  
}
echo "----------------------------------------------"
echo "part1 begin"
date +"%F %T"
merge
date +"%F %T"
echo "part1 end,please check,path is : /tmp/fl-com-result"
echo "----------------------------------------------"
echo "part2 begin"
date +"%F %T"
begin-parse
date +"%F %T"
echo "part2 end,please check,path is : /tmp/fl-com-result-parse"
echo "----------------------------------------------"
echo "part3 begin"
date +"%F %T"
parseend
date +"%F %T"
echo "part3 end,please check,path is : /tmp/parseend-result"

Output fields

index, start line in log A, start line in log B, whether they match
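A hypothetical helper for reading the part-1 result, assuming the comma-separated fields just listed: it prints each mismatching index and then counts the true and false rows.

```shell
#!/usr/bin/env bash
# summarize_result RESULT_FILE  (hypothetical helper)
# RESULT_FILE lines look like "name,oldLine,newLine,true|false".
summarize_result() {
  awk -F, '
    $4 == "true"  { ok++ }
    $4 == "false" { bad++; print "mismatch: " $1 " old=" $2 " new=" $3 }
    END { printf "true=%d false=%d\n", ok, bad }
  ' "$1"
}
```

Typical use: summarize_result /tmp/fl-com-result.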

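Running the script:

Arguments 1 and 2 are the two AP logs to compare (old, then new).

Argument 3 is the index list to iterate over (think of each line as one keyword).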

bash fl-compare.sh info.log.2023-02-02.12.log info.log.2023-02-02.0.log indexfile

begin filter
2023-02-06 15:07:13
2023-02-06 15:11:31
END filter
----------------------------------------------
part1 begin
2023-02-06 15:11:31
2023-02-06 16:21:23
part1 end,please check,path is : /tmp/fl-com-result
----------------------------------------------
part2 begin
2023-02-06 16:21:23
2023-02-06 16:22:26
part2 end,please check,path is : /tmp/fl-com-result-parse
----------------------------------------------
part3 begin
2023-02-06 16:22:26
2023-02-06 16:22:37
part3 end,please check,path is : /tmp/parseend-result
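In this run almost all the time goes to part 1 (15:11:31 to 16:21:23). merge() compares every old entry with every new entry and starts several echo | cut subprocesses per pair, so the work grows roughly with the square of the index count. As a swapped-in technique (an awk hash join, not the author's method), the same two filter files can be joined in one pass. merge_awk is a hypothetical name, and the sketch assumes each name appears at most once per file.

```shell
#!/usr/bin/env bash
# merge_awk OLD_FILTER NEW_FILTER  (hypothetical replacement for merge)
# Both files hold "name,value" lines such as /tmp/old-filter and the
# new-log filter file. Prints "name,oldValue,newValue,true|false" for
# every name present in both files.
merge_awk() {
  awk -F, -v OFS=, '
    FILENAME == ARGV[1] { if ($1 != "") old[$1] = $2; next }  # load old values
    ($1 in old) { print $1, old[$1], $2, (old[$1] == $2 ? "true" : "false") }
  ' "$1" "$2"
}
```

Using FILENAME == ARGV[1] instead of the common NR == FNR test keeps the logic correct even when the first file is empty.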

Multi-threaded concurrent shell implementation

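The test logs were only about 100 lines, while production logs run to about 1 million lines or more.

In production a single log is about 1.58 million lines, so comparing two of them means processing roughly 3 million lines of text.

On a multi-core machine, concurrent processing can make use of the extra cores to speed this up considerably. (Each "thread" here is really a background subshell process; a FIFO acts as a token pool that caps how many run at once.)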
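The concurrent script throttles its background jobs with a FIFO used as a token pool: N tokens are written into a file descriptor, each job reads one before it starts and writes it back when it finishes, so at most N jobs run at once. A stripped-down sketch of that pattern; run_limited and the worker argument are hypothetical names.

```shell
#!/usr/bin/env bash
# run_limited WORKER LIMIT  (hypothetical helper)
# Calls "WORKER <line>" once per stdin line, at most LIMIT at a time.
run_limited() {
  local worker=$1 limit=$2 fifo i
  fifo=$(mktemp -u) || return 1
  mkfifo "$fifo" || return 1
  exec 9<>"$fifo"   # open read-write so neither end blocks
  rm -f "$fifo"     # fd 9 stays valid after the name is removed
  for ((i = 0; i < limit; i++)); do echo >&9; done   # LIMIT tokens in the pool
  while IFS= read -r line; do
    read -r -u 9                          # take a token; blocks at LIMIT jobs
    { "$worker" "$line"; echo >&9; } &    # give the token back when done
  done
  wait        # wait for the last jobs
  exec 9>&-   # close the pool
}
```

A job that exits before writing its token back would shrink the pool, which is why the token is returned unconditionally after the worker.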

The optimized concurrent script:

#!/bin/bash
# date: 2023-02-01 17:48:37
# author: ninesun
# params: 1. old-fl.log  2. new-fl.log  3. indexFile  4. threadCount (optional, default 5)
set -e
pushd "$(dirname "$0")" > /dev/null
SCRIPT_PATH=$(pwd -P)
popd > /dev/null
SCRIPT_FILE=$(basename "$0")
if [ $# -eq 3 ] || [ $# -eq 4 ]; then
    oldLog=$1
    newLog=$2
    indexFile=$3
    threadCount=${4-5} # default: 5 concurrent jobs
else
    echo "Invalid arguments. Usage: bash ${SCRIPT_FILE} <old-log> <new-log> <index-file> [threadCount]"
    exit 8
fi
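# Token pool: a FIFO opened read-write on fd 8 holds ${threadCount} tokens (one per line)
fifoname=/tmp/fl.fifo
rm -f ${fifoname}
mkfifo ${fifoname}
exec 8<> ${fifoname}
rm -f ${fifoname}   # fd 8 stays open after the name is removed
echo "thread count is total ${threadCount}"
for line in $(seq ${threadCount}); do
    echo >&8
done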
oldtar=/tmp/old-filter
newtar=/tmp/new-filter
startparsekey="开始解析:"
parseendkey="本次解析结束"
part1key="文件名:"
part1key2=" 开始行"
strparseoldtar=/tmp/old-strparse
strparsenewtar=/tmp/new-strparse
parseendtar=/tmp/parseend-tar
parseendresult=/tmp/parseend-result
# truncate the intermediate and result files (": >" leaves no blank first line)
: > ${oldtar}
: > ${newtar}
: > ${strparsenewtar}
: > ${strparseoldtar}
: > ${parseendtar}
: > /tmp/fl-com-result
: > /tmp/fl-com-result-parse
: > ${parseendresult}
echo "begin filter" 
date +"%F %T"
while IFS= read -r line; do
     [[ -z ${line} ]] && continue # skip blank lines in the index file
     read -u 8 # take a token from fd 8; blocks while ${threadCount} jobs are running
     {
        # A (old log): start line of the last "文件名:<name> 开始行:<n>" entry, NA if none
        oldv=$(grep -F "${part1key}${line}${part1key2}" "${oldLog}" | tail -n1 | awk '{split($(NF-1), kv, ":"); print kv[2]}')
        echo "${line},${oldv:-NA}" >> ${oldtar}
        # B (new log): same lookup
        newv=$(grep -F "${part1key}${line}${part1key2}" "${newLog}" | tail -n1 | awk '{split($(NF-1), kv, ":"); print kv[2]}')
        echo "${line},${newv:-NA}" >> ${newtar}
        echo >&8 # return the token
     } &
done < "${indexFile}"
# wait for all background lookups; fd 8 stays open because merge reuses it
wait
date +"%F %T"
echo "END filter"
function merge () {
    for old in `cat ${oldtar}`;do
        read -u 8
        {
        for new in `cat ${newtar}`;do
                    #echo "old:$old new:$new"
                    indexOld=`echo ${old} | cut -d , -f1`
                    indexOldVal=`echo ${old} | cut -d , -f2`
                    indexNew=`echo ${new} | cut -d , -f1`
                    indexNewVal=`echo ${new} | cut -d , -f2`
                    #echo "${indexOld} ${indexOldVal} ${indexNew} ${indexNewVal}"
                    if [[ ${indexOld} == ${indexNew} && ${indexOldVal} == ${indexNewVal} ]];then
                        echo "${indexOld},${indexOldVal},${indexNewVal},true"  >>/tmp/fl-com-result
                        break;
                    fi                
                    if [[ ${indexOld} == ${indexNew} && ${indexOldVal} != ${indexNewVal} ]];then
                        echo "${indexOld},${indexOldVal},${indexNewVal},false"  >>/tmp/fl-com-result
                        break;
                    fi
                done
            echo >&8                
        }&        
    done
    exec 8>&-
    wait
}
function begin-parse () {
    grep -E ${startparsekey} ${oldLog}  | awk '{print $NF}'  | awk -F : '{print $NF}' >> ${strparseoldtar}
    grep -E ${startparsekey} ${newLog}  | awk '{print $NF}'  | awk -F : '{print $NF}' >> ${strparsenewtar} 
    for old in `cat ${strparseoldtar}`;do
         #echo "old:$old"
         # grep ${old} ${strparsenewtar} |wc -l
         if grep -Fxq -- "${old}" "${strparsenewtar}"; then
            echo "${old},1,1"  >>/tmp/fl-com-result-parse
         else
             echo "${old},1,0"  >>/tmp/fl-com-result-parse
         fi
    done
}
function parseend(){
    grep -E ${parseendkey} ${newLog}  | awk '{print $NF}'  | awk -F / '{print $NF}' >> ${parseendtar}   
    for line in `cat ${parseendtar}`;do 
       #echo $line
      if [[ `grep ${line} ${indexFile} |wc -l` -eq 1 ]];then
        echo "${line},1" >> ${parseendresult}
      else
        echo "${line},0" >> ${parseendresult}
      fi   
    done  
}
echo "----------------------------------------------"
echo "part1 begin"
date +"%F %T"
merge
date +"%F %T"
echo "part1 end,please check,path is : /tmp/fl-com-result"
echo "----------------------------------------------"
echo "part2 begin"
date +"%F %T"
begin-parse
date +"%F %T"
echo "part2 end,please check,path is : /tmp/fl-com-result-parse"
echo "----------------------------------------------"
echo "part3 begin"
date +"%F %T"
parseend
date +"%F %T"
echo "part3 end,please check,path is : /tmp/parseend-result"
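As an alternative to the FIFO (not what the author used), GNU xargs -P gives the same bounded concurrency for the filter step. This sketch assumes GNU xargs (-0, -P and -r are not POSIX) and bash for export -f; lookup_one and filter_parallel are hypothetical names, and the lookup uses the last "文件名:<name> 开始行:<n>" entry per log, like the filter loop above.

```shell
#!/usr/bin/env bash
# lookup_one NAME OLD_LOG NEW_LOG  (hypothetical helper)
# Prints "name,oldStart,newStart", using NA when a log has no entry.
lookup_one() {
  local name=$1 old=$2 new=$3 o n
  o=$(grep -F "文件名:${name} 开始行" "$old" | tail -n1 | awk '{split($(NF-1), kv, ":"); print kv[2]}')
  n=$(grep -F "文件名:${name} 开始行" "$new" | tail -n1 | awk '{split($(NF-1), kv, ":"); print kv[2]}')
  printf '%s,%s,%s\n' "$name" "${o:-NA}" "${n:-NA}"
}
export -f lookup_one   # make the function visible to the bash -c children

# filter_parallel OLD_LOG NEW_LOG INDEX_FILE [JOBS]  (hypothetical helper)
# Runs lookup_one for every non-blank index name, JOBS (default 5) at a time.
filter_parallel() {
  grep -v '^$' "$3" | tr '\n' '\0' |
    # xargs appends each name as $3 of the inline script
    xargs -0 -r -P "${4:-5}" -n 1 bash -c 'lookup_one "$3" "$1" "$2"' _ "$1" "$2"
}
```

Results arrive in completion order, not index order; each result is one short line, so lines from different jobs do not interleave in practice.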

Performance comparison

Test machine: 48 cores, 256 GB of RAM

The following test uses a concurrency of 20 (the fourth argument, 20, is the number of concurrent jobs):

bash fl-compare-multiprocess.sh info.log.2023-02-02.0.log info.log.2023-02-02.12.log indexfile 20

CPU utilization reaches roughly 50%.

Run output:

thread count is total 20
begin filter
2023-02-06 15:35:07
2023-02-06 15:35:23
END filter
----------------------------------------------
part1 begin
2023-02-06 15:35:23
2023-02-06 15:41:07
part1 end,please check,path is : /tmp/fl-com-result
----------------------------------------------
part2 begin
2023-02-06 15:41:07
2023-02-06 15:42:25
part2 end,please check,path is : /tmp/fl-com-result-parse
----------------------------------------------
part3 begin
2023-02-06 15:42:25
2023-02-06 15:42:25
part3 end,please check,path is : /tmp/parseend-result

Test with a concurrency of 40:

bash fl-compare-multiprocess.sh info.log.2023-02-02.0.log info.log.2023-02-02.12.log indexfile 40

CPU utilization rises above roughly 90%, as seen in Prometheus monitoring.

Open another terminal window and watch how the number of processes changes:

while true; do ps -ef | grep "fl-compare-multiprocess.sh info" | grep -v grep | wc -l; sleep 1s; done
21
20
21
21
21
21
21
21
21
21
21
21
21
21
21
21
21
21
21
21
21
....

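Before/after performance comparison

With 20 concurrent jobs the speedup is roughly 10x (compare the first and last timestamps of the sequential run and the 20-job run above).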
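The speedup can be checked from the timestamps each run prints (its first and last date lines). A small helper, assuming GNU date for -d; elapsed is a hypothetical name.

```shell
#!/usr/bin/env bash
# elapsed START END  (hypothetical helper)
# Seconds between two "YYYY-MM-DD HH:MM:SS" timestamps, the format the
# scripts print with date +"%F %T". Assumes GNU date (-d).
elapsed() {
  local s e
  s=$(date -d "$1" +%s) || return 1
  e=$(date -d "$2" +%s) || return 1
  echo $(( e - s ))
}
```

For example, elapsed '2023-02-06 15:07:13' '2023-02-06 16:22:37' prints 4524 for the sequential run and elapsed '2023-02-06 15:35:07' '2023-02-06 15:42:25' prints 438 for the 20-job run; 4524 / 438 is about 10.3.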

References

The parallel tool
