Sensitive Word Filter (Part 1)

Summary: a sensitive word filter

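The black-haired youth does not know to study early; the white-haired elder regrets having studied too late. (Yan Zhenqing)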

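We may want to handle sensitive words globally, using a locally configured word list together with a servlet filter.

This article covers three ways a parameter can arrive:

1. requestParam: http://localhost:8080/test?keywords=屏蔽词2号

2. requestBody: JSON data in the request body, with the request header Content-Type: application/json

3. pathvariable: http://localhost:8080/test/屏蔽词3号

Together these three cover the vast majority of ways parameters are passed and received.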
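To make the three styles concrete, the sketch below builds (without sending) one request of each kind with the JDK 11 `java.net.http` API. The endpoint and the `keywords` query parameter come from the URLs above; the class name `ThreeParamStyles` and the `keywords` field in the JSON body are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds one request per parameter style, nothing is sent.
class ThreeParamStyles {
    static final String BASE = "http://localhost:8080/test";

    // URLEncoder targets form data; it is adequate here only because the
    // sample text contains no spaces or reserved characters.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // 1. request parameter: /test?keywords=...
    static HttpRequest requestParam(String text) {
        return HttpRequest.newBuilder(URI.create(BASE + "?keywords=" + enc(text)))
                .GET()
                .build();
    }

    // 2. request body: JSON payload with Content-Type: application/json.
    //    The "keywords" field name is an assumption; the text is not JSON-escaped.
    static HttpRequest requestBody(String text) {
        String json = "{\"keywords\":\"" + text + "\"}";
        return HttpRequest.newBuilder(URI.create(BASE))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    // 3. path variable: /test/...
    static HttpRequest pathVariable(String text) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + enc(text)))
                .GET()
                .build();
    }
}
```

In the first and third styles the text travels percent-encoded inside the URL, while in the second it sits in the body, which is why a filter has to look in more than one place.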

The code is as follows.

First, the filter:

package com.ruben.simplescaffold.filter;
import com.alibaba.fastjson.JSON;
import com.ruben.simplescaffold.filter.wrappers.RequestWrapper;
import com.ruben.simplescaffold.filter.wrappers.ResponseWrapper;
import com.ruben.simplescaffold.utils.sensitive.SensitiveWordUtils;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import javax.annotation.Resource;
import javax.servlet.*;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
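/**
 * Sensitive word filter
 *
 * @author <achao1441470436@gmail.com>
 * @since 2021/8/6 17:10
 */
@Slf4j
@Component
// Servlet URL patterns use "/*" to match everything; "/**" is Spring's Ant-style syntax
@WebFilter(filterName = "SensitiveWordFilter", urlPatterns = "/*")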
public class SensitiveWordFilter implements Filter {
    @Resource
    private SensitiveWordUtils sensitiveWordUtils;
    /**
     * The <code>doFilter</code> method of the Filter is called by the container
     * each time a request/response pair is passed through the chain due to a
     * client request for a resource at the end of the chain. The FilterChain
     * passed in to this method allows the Filter to pass on the request and
     * response to the next entity in the chain.
     * <p>
     * A typical implementation of this method would follow the following
     * pattern:- <br>
     * 1. Examine the request<br>
     * 2. Optionally wrap the request object with a custom implementation to
     * filter content or headers for input filtering <br>
     * 3. Optionally wrap the response object with a custom implementation to
     * filter content or headers for output filtering <br>
     * 4. a) <strong>Either</strong> invoke the next entity in the chain using
     * the FilterChain object (<code>chain.doFilter()</code>), <br>
     * 4. b) <strong>or</strong> not pass on the request/response pair to the
     * next entity in the filter chain to block the request processing<br>
     * 5. Directly set headers on the response after invocation of the next
     * entity in the filter chain.
     *
     * @param request  The request to process
     * @param response The response associated with the request
     * @param chain    Provides access to the next filter in the chain for this
     *                 filter to pass the request and response to for further
     *                 processing
     * @throws IOException      if an I/O error occurs during this filter's
     *                          processing of the request
     * @throws ServletException if the processing fails for any other reason
     */
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        ResponseWrapper responseWrapper = new ResponseWrapper((HttpServletResponse) response);
        HttpServletRequest httpServletRequest = (HttpServletRequest) request;
        long startTime = System.nanoTime();
        // Filter: wrap the request together with the sensitive word utility
        RequestWrapper requestWrapper = new RequestWrapper(httpServletRequest, sensitiveWordUtils, null, null);
        String bodyString = RequestWrapper.BodyHelper.getBodyString(requestWrapper);
        // getRequestURI() is not decoded by the container, so decode it before checking
        String uri = URLDecoder.decode(requestWrapper.getRequestURI(), StandardCharsets.UTF_8.name());
        if (sensitiveWordUtils.isContainSensitiveWord(uri, SensitiveWordUtils.MAX_MATCH_TYPE)) {
            // Forward to the URI with every sensitive word replaced by "0"
            requestWrapper.getRequestDispatcher(sensitiveWordUtils.replaceSensitiveWord(uri, SensitiveWordUtils.MAX_MATCH_TYPE, "0")).forward(requestWrapper, response);
            return;
        }
        // Continue down the chain
        chain.doFilter(requestWrapper, responseWrapper);
        // Read what the chain wrote and write it back to the real response
        String result = responseWrapper.getResponseData(response.getCharacterEncoding());
        // Encode with the response charset rather than the platform default
        response.getOutputStream().write(result.getBytes(response.getCharacterEncoding()));
        log.info("method:{}", requestWrapper.getMethod());
        log.info("uri:{}", requestWrapper.getRequestURI());
        log.info("parameterMap:{}", JSON.toJSONString(requestWrapper.getParameterMap()));
        log.info("bodyString:{}", bodyString);
        log.info("responseCode:{}", responseWrapper.getStatus());
        log.info("result:{}", result);
        log.info("timeCost:{}", (System.nanoTime() - startTime) / (1000.0 * 1000.0) + "ms");
    }
}
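The filter decodes the request URI before checking it because `getRequestURI()` returns the path as the client sent it, still percent-encoded. A minimal sketch of why that step matters for the path-variable case follows; the class `UriDecodeSketch` and its methods are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the decode step in the filter above.
class UriDecodeSketch {
    // Decodes a percent-encoded URI as UTF-8, the same call the filter makes.
    static String decode(String rawUri) {
        try {
            return URLDecoder.decode(rawUri, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // A plain substring check stands in for the dictionary lookup.
    static boolean containsWord(String rawUri, String word) {
        return decode(rawUri).contains(word);
    }
}
```

A request for http://localhost:8080/test/屏蔽词3号 reaches the server as `/test/%E5%B1%8F%E8%94%BD%E8%AF%8D3%E5%8F%B7`; checking that raw string for 屏蔽词 finds nothing, while the decoded form contains it.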

The filter relies on a few other classes.

The first is the utility class that does the sensitive word processing:

package com.ruben.simplescaffold.utils.sensitive;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
/**
 * Sensitive word filtering
 *
 * @author chenming
 * @version 1.0
 * @since 2014-04-20 16:17:15
 */
@Slf4j
@Component
public class SensitiveWordUtils {
    public static final String REPLACE_CHAR = "*"; // default replacement character
    public static final int MIN_MATCH_TYPE = 1;      // minimum match rule
    public static final int MAX_MATCH_TYPE = 2;      // maximum match rule
    @SuppressWarnings("rawtypes")
    private static Map sensitiveWordMap = null;
    @Value("${sensitive.path}")
    private String sensitivePath;
    public SensitiveWordUtils() {
    }
    /**
     * Initialize the sensitive word dictionary
     */
    @PostConstruct
    void init() {
        sensitiveWordMap = SensitiveWordInit.getInstance().initKeyWord(sensitivePath);
    }
    /**
     * Check whether the text contains a sensitive word
     *
     * @param txt       the text
     * @param matchType match rule: 1 = minimum match, 2 = maximum match
     * @return true if the text contains a sensitive word, otherwise false
     * @author chenming
     * @since 2014-04-20 16:28:30
     */
    public boolean isContainSensitiveWord(String txt, int matchType) {
        for (int i = 0; i < txt.length(); i++) {
            // A positive length means a sensitive word starts at position i
            if (this.CheckSensitiveWord(txt, i, matchType) > 0) {
                return true;    // no need to scan the rest of the text
            }
        }
        return false;
    }
    /**
     * Get the sensitive words found in the text
     *
     * @param txt       the text
     * @param matchType match rule: 1 = minimum match, 2 = maximum match
     * @return the sensitive words that matched
     * @author chenming
     * @since 2014-04-20 17:10:52
     */
    public Set<String> getSensitiveWord(String txt, int matchType) {
        Set<String> sensitiveWordList = new HashSet<>();
        for (int i = 0; i < txt.length(); i++) {
            int length = CheckSensitiveWord(txt, i, matchType);    // length of the sensitive word starting at i, or 0
            if (length > 0) {    // found one, add it to the set
                sensitiveWordList.add(txt.substring(i, i + length));
                i = i + length - 1;    // minus 1 because the for loop increments i
            }
        }
        return sensitiveWordList;
    }
    /**
     * Replace the sensitive words in the text
     *
     * @param txt         the text
     * @param matchType   match rule
     * @param replaceChar replacement character, * by default
     * @return the text with each sensitive word replaced by a run of replaceChar of the same length
     * @author chenming
     * @since 2014-04-20 17:12:07
     */
    public String replaceSensitiveWord(String txt, int matchType, String replaceChar) {
        String resultTxt = txt;
        Set<String> set = getSensitiveWord(txt, matchType);     // collect all sensitive words
        Iterator<String> iterator = set.iterator();
        String word;
        String replaceString;
        while (iterator.hasNext()) {
            word = iterator.next();
            replaceString = getReplaceChars(replaceChar, word.length());
            // replace() is literal; replaceAll() would treat the word as a regular expression
            resultTxt = resultTxt.replace(word, replaceString);
        }
        return resultTxt;
    }
    /**
     * Build the replacement string
     *
     * @param replaceChar the replacement string
     * @param length      number of repetitions
     * @return replaceChar repeated length times
     * @author chenming
     * @since 2014-04-20 17:21:19
     */
    private String getReplaceChars(String replaceChar, int length) {
        StringBuilder resultReplace = new StringBuilder(replaceChar);
        for (int i = 1; i < length; i++) {
            resultReplace.append(replaceChar);
        }
        return resultReplace.toString();
    }
    /**
     * Check whether a sensitive word starts at the given index
     *
     * @param txt        the text to check
     * @param beginIndex index to start from
     * @param matchType  match rule
     * @return the length of the sensitive word if one starts here, otherwise 0
     * @author chenming
     * @since 2014-04-20 16:31:03
     */
    @SuppressWarnings({"rawtypes"})
    public int CheckSensitiveWord(String txt, int beginIndex, int matchType) {
        int matchFlag = 0;       // number of characters walked so far
        int matchedLength = 0;   // length of the longest complete word seen so far
        char word;
        Map nowMap = sensitiveWordMap;
        for (int i = beginIndex; i < txt.length(); i++) {
            word = txt.charAt(i);
            nowMap = (Map) nowMap.get(word);     // look up the next state
            if (nowMap == null) {     // no such transition, stop
                break;
            }
            matchFlag++;
            if ("1".equals(nowMap.get("isEnd"))) {       // a complete word ends here
                // Record the length only at a word end, so a partial prefix of a
                // longer word is never counted as part of the match
                matchedLength = matchFlag;
                if (SensitiveWordUtils.MIN_MATCH_TYPE == matchType) {    // minimum rule stops at the first word; maximum rule keeps looking
                    break;
                }
            }
        }
        // As in the original rule, a match must be a complete word of at least two characters
        return matchedLength < 2 ? 0 : matchedLength;
    }
}
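The difference between the minimum and maximum match rules is easiest to see on a small dictionary. The following is a minimal, self-contained sketch and not the article's class: `MatchTypeSketch`, its typed `Node` trie and the position-based `mask` method are assumptions made for illustration, and unlike the utility above it also accepts single-character words.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of minimum vs. maximum matching on a trie.
class MatchTypeSketch {
    static final int MIN_MATCH = 1;
    static final int MAX_MATCH = 2;

    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end; // true when a dictionary word ends at this node
    }

    final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.end = true;
    }

    // Length of the dictionary word starting at 'begin', or 0 if none starts there.
    int matchLength(String text, int begin, int matchType) {
        Node node = root;
        int matched = 0;
        for (int i = begin; i < text.length(); i++) {
            node = node.next.get(text.charAt(i));
            if (node == null) {
                break;                       // no transition: stop walking
            }
            if (node.end) {
                matched = i - begin + 1;     // remember the last complete word
                if (matchType == MIN_MATCH) {
                    break;                   // minimum rule: first word wins
                }
            }
        }
        return matched;
    }

    // Overwrites every matched position with the replacement character.
    String mask(String text, int matchType, char replacement) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i, matchType);
            if (len == 0) {
                i++;
                continue;
            }
            for (int j = i; j < i + len; j++) {
                out.setCharAt(j, replacement);
            }
            i += len;
        }
        return out.toString();
    }
}
```

With the words `bad` and `badword` loaded, masking `a badword here` gives `a ***word here` under the minimum rule and `a ******* here` under the maximum rule. Masking `badwo` under the maximum rule gives `***wo`, because the walk past `bad` never reaches another word end.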

This utility class has to load the word list when it is initialized, so there is also an initialization class:

package com.ruben.simplescaffold.utils.sensitive;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
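/**
 * Initializes the sensitive word dictionary: adds the words to a HashMap to build the DFA model
 *
 * @author : chenming
 * @since 2014-04-20 14:27:06
 */
public class SensitiveWordInit {
    private volatile static SensitiveWordInit lazyMan; // volatile prevents instruction reordering
    private static boolean SINGLE_SIGN = false; // "traffic light" flag against creating a second instance via reflection (decompiling can still defeat it)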
    @SuppressWarnings("rawtypes")
    public HashMap sensitiveWordMap;
    private SensitiveWordInit() {
        synchronized (SensitiveWordInit.class) {
            if (!SINGLE_SIGN) {
                SINGLE_SIGN = true;
            } else {
                throw new RuntimeException("Do not try to break the singleton with reflection");
            }
        }
    }
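    // Lazy singleton using double-checked locking (DCL)
    public static SensitiveWordInit getInstance() {
        if (lazyMan == null) {
            synchronized (SensitiveWordInit.class) {
                if (lazyMan == null) {
                    lazyMan = new SensitiveWordInit(); // not an atomic operation, hence the lock and volatile
                }
            }
        }
        return lazyMan;
    }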
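    /**
     * Reads the word file and builds the sensitive word map
     *
     * @param filePath path of the sensitive word file
     * @return the sensitive word map, or an empty map if loading fails
     * @author chenming
     * @since 2014-04-20 14:28:32
     */
    @SuppressWarnings("rawtypes")
    public Map initKeyWord(String filePath) {
        try {
            // read the sensitive word file
            Set<String> keyWordSet = readSensitiveWordFile(filePath);
            // add the sensitive words to the HashMap
            addSensitiveWordToHashMap(keyWordSet);
            // with Spring you could obtain the application and call application.setAttribute("sensitiveWordMap",sensitiveWordMap);
        } catch (Exception e) {
            e.printStackTrace();
            return Collections.emptyMap();
        }
        return sensitiveWordMap;
    }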
    /**
     * Adds the sensitive words to the HashMap, building a DFA model as nested maps:
     * <pre>
     * 中 = {
     *     isEnd = 0
     *     国 = {
     *         isEnd = 1
     *         人 = {
     *             isEnd = 0
     *             民 = {isEnd = 1}
     *         }
     *         男 = {
     *             isEnd = 0
     *             人 = {isEnd = 1}
     *         }
     *     }
     * }
     * 五 = {
     *     isEnd = 0
     *     星 = {
     *         isEnd = 0
     *         红 = {
     *             isEnd = 0
     *             旗 = {isEnd = 1}
     *         }
     *     }
     * }
     * </pre>
     *
     * @param keyWordSet the sensitive words
     * @author chenming
     * @since 2014-04-20 15:04:20
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    private void addSensitiveWordToHashMap(Set<String> keyWordSet) {
        sensitiveWordMap = new HashMap(keyWordSet.size());     // size the container up front to reduce resizing
        String key;
        Map nowMap;
        Map<String, String> newWordMap;
        // iterate over keyWordSet
        for (String s : keyWordSet) {
            key = s;    // the word
            nowMap = sensitiveWordMap;
            for (int i = 0; i < key.length(); i++) {
                char keyChar = key.charAt(i);       // current character
                Object wordMap = nowMap.get(keyChar);       // look it up
                if (wordMap != null) {        // the key exists, descend into it
                    nowMap = (Map) wordMap;
                } else {     // otherwise create a map with isEnd = 0, since it is not yet a word end
                    newWordMap = new HashMap<>();
                    newWordMap.put("isEnd", "0");     // not the last character
                    nowMap.put(keyChar, newWordMap);
                    nowMap = newWordMap;
                }
                if (i == key.length() - 1) {
                    nowMap.put("isEnd", "1");    // last character of the word
                }
            }
        }
    }
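    /**
     * Reads the sensitive word file and adds each line to a set
     *
     * @param filePath path of the word file; sensitive.txt on the classpath is used when the file is not found
     * @return the set of sensitive words
     * @throws Exception if neither source can be read, e.g. a NullPointerException when the fallback resource is missing
     * @author chenming
     * @since 2014-04-20 14:31:18
     */
    private Set<String> readSensitiveWordFile(String filePath) throws Exception {
        Set<String> set;
        InputStream inputStream;
        try {
            inputStream = new FileInputStream(filePath);
        } catch (FileNotFoundException e) {
            // Use this class's own loader so the resource is also found when the
            // system class loader cannot see it (for example in a packaged application)
            inputStream = Objects.requireNonNull(SensitiveWordInit.class.getClassLoader().getResourceAsStream("sensitive.txt"));
        }
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            set = new HashSet<>();
            String txt;
            while ((txt = bufferedReader.readLine()) != null) {    // read the file line by line into the set
                set.add(txt);
            }
        }
        return set;
    }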
}
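To see what the nested-map model looks like for a concrete word list, here is a minimal sketch of the same construction. It is an illustration under assumptions rather than the article's code: the class `NestedMapSketch` is a hypothetical name, and it uses `LinkedHashMap` instead of `HashMap` only so that the printed structure has a stable order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds the nested-map model and lets you print it.
class NestedMapSketch {
    @SuppressWarnings("unchecked")
    static Map<Object, Object> build(List<String> words) {
        Map<Object, Object> root = new LinkedHashMap<>();
        for (String word : words) {
            Map<Object, Object> now = root;
            for (int i = 0; i < word.length(); i++) {
                Character c = word.charAt(i);
                Map<Object, Object> child = (Map<Object, Object>) now.get(c);
                if (child == null) {
                    // New state: not a word end until proven otherwise
                    child = new LinkedHashMap<>();
                    child.put("isEnd", "0");
                    now.put(c, child);
                }
                now = child;
                if (i == word.length() - 1) {
                    now.put("isEnd", "1"); // a word ends at this state
                }
            }
        }
        return root;
    }
}
```

Calling `NestedMapSketch.build(java.util.Arrays.asList("ab", "abc", "ad"))` and printing the result gives `{a={isEnd=0, b={isEnd=1, c={isEnd=1}}, d={isEnd=1}}}`: `ab` and `abc` share the `a` and `b` states, and `b` is marked as a word end even though `abc` continues past it.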

Note that the file configured as sensitive.path in the configuration file is tried first.

If it cannot be found, sensitive.txt under resources is loaded instead.
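The lookup order described above (configured path first, bundled resource second) can be sketched on its own as follows. This is a minimal illustration under assumptions: `WordListLoader` and its method names are hypothetical, and unlike the article's reader it trims each line and skips blank ones.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader: file system first, classpath resource as the fallback.
class WordListLoader {
    static Set<String> load(String configuredPath, String classpathName) throws IOException {
        Path path = configuredPath == null ? null : Paths.get(configuredPath);
        InputStream in;
        if (path != null && Files.isRegularFile(path)) {
            in = Files.newInputStream(path);            // 1. the configured file
        } else {
            in = WordListLoader.class.getClassLoader()
                    .getResourceAsStream(classpathName); // 2. the bundled fallback
            if (in == null) {
                throw new IOException("word list not found: " + classpathName);
            }
        }
        return read(in);
    }

    // Reads one word per line as UTF-8, trimming whitespace and skipping blank lines.
    static Set<String> read(InputStream in) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }
}
```

Failing with an explicit `IOException` when neither source exists keeps the cause visible, whereas returning an empty map would silently disable filtering.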

Sensitive Word Filter (Part 2): https://developer.aliyun.com/article/1379262
