FilterManager: the Adblock Plus rule management class


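The Adblock Plus documentation at
http://adblockplus.org/en/documentation
covers a lot of ground. In particular,
http://adblockplus.org/en/faq_internal#filters
explains how rules can be looked up quickly. Following that approach, I implemented a HashMap to manage these rules: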

#ifndef FILTERMANAGER_H

#define FILTERMANAGER_H

#include "PlatformString.h"

#include <wtf/Vector.h>

#include "StringHash.h"

#include <wtf/HashMap.h>

#include <wtf/HashSet.h>

#include "KURL.h"

//#define ADB_NO_QT_DEBUG

namespace WebCore {

       /*

     Match types. For now only script, image, stylesheet, and third_party are supported.

        */

       #define FILTER_TYPE_SCRIPT 0x0001

       #define FILTER_TYPE_IMAGE 0X0002

       #define FILTER_TYPE_BACKGROUND 0x0004

       #define FILTER_TYPE_STYLESHEET 0X0008

       #define FILTER_TYPE_OBJECT 0X0010

       #define FILTER_TYPE_XBL 0X0020 // will not be supported

       #define FILTER_TYPE_PING 0X0040

       #define FILTER_TYPE_XMLHTTPREQUEST 0x0080

       #define FILTER_TYPE_OBJECT_SUBREQUEST 0X0100

       #define FILTER_TYPE_DTD 0X0200

       #define FILTER_TYPE_SUBDOCUMENT 0X0400

       #define FILTER_TYPE_DOCUMENT 0X0800

       #define FILTER_TYPE_ELEMHIDE 0X1000

       #define FILTER_TYPE_THIRD_PARTY 0x2000

//     #define FILTER_TYPE_DOMAIN 0X4000

//     #define FILTER_TYPE_MATCH_CASE 0X8000

//     #define FILTER_TYPE_COLLAPSE 0x10000

       typedef unsigned int FilterType;

       typedef Vector<String> StringVector;

       class FilterRule;

       class HideRule;

       class FilterRuleList;

       class HideRuleList;

       // There should be only one instance of this class.

       /*

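        The point to consider here is making this class thread-safe. Ordinary lookups are safe;

        the open question is how to stay thread-safe when rules are added or removed dynamically.

        Internally the rules are managed with a map, or with a hash.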

        */

       class FilterManager {

              //typedef HashMap<String,FilterRuleList* , CaseFoldingHash > FilterRuleMap;

              typedef HashMap<String,HideRuleList* ,CaseFoldingHash> HideRuleMap;

              typedef Vector<FilterRule *> FilterRuleVector;

              class FilterRuleMap: public HashMap<String,FilterRuleList* , CaseFoldingHash > {

            HashSet<unsigned int > unMatchRules;

              public:

                     ~FilterRuleMap();

             //prepare to start find

            inline void prepareStartFind() { this->unMatchRules.clear();}

            // release resource

            //inline void endFind() {}

            bool doFilter(const KURL & mainURL,const String & key,const KURL & url,FilterType t);

              };

       private:

              HideRuleMap hiderules;

              FilterRuleMap m_ShortcutWhiteRules; //white list, can use shortcut

              FilterRuleVector m_UnshortcutWhiteRules;

              FilterRuleMap m_ShortcutFilterRules;

              FilterRuleVector m_UnshortcutFilterRules;

              FilterRuleVector m_AllFilterRules;

              Vector<HideRule * > m_AllHideRules;

       private:

              /*

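               Read the rules from a file. It would be nice if the string had Qt-style implicit sharing;

               the String used by WebKit is implicitly shared, so it can be passed by value directly.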

               */

              FilterManager(const String & filename);

              // Build from a collection of rules

              FilterManager(const StringVector & rules);

       public:

              static FilterManager* getManager(const String & filename);

              static FilterManager * getManager(const StringVector & rules);

              ~FilterManager();

              bool addRule(String rule);

              // id selects the rule. At run time a rule cannot be hidden, only deleted.

              bool hideRule(int id);

              /*

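               Whether the request should be filtered.

               Type matching is not considered for now, because the type information is not available.

               For many rules the type cannot be determined reliably: background, for example, must be a request that originates from CSS, and there is currently no way to establish that.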

               */

              /*

               * Besides of translating filters into regular expressions Adblock Plus also

tries to extract text information from them. What it needs is a unique

string of eight characters (a “shortcut”) that must be present in every

address matched by the filter (the length is arbitrary, eight just seems

reasonable here). For example, if you have a filter |http://ad.* then

Adblock Plus has the choice between “http://a”, “ttp://ad” and “tp://ad.”,

any of these strings will always be present in whatever this filter will

match. Unfortunately finding a shortcut for filters that simply don’t have

eight characters unbroken by wildcards or for filters that have been

specified as regular expressions is impossible.

All shortcuts are put into a lookup table, Adblock Plus can find the filter

by its shortcut very efficiently. Then, when a specific address has to be

tested Adblock Plus will first look for known shortcuts there (this can be

done very fast, the time needed is almost independent from the number of

shortcuts). Only when a shortcut is found the string will be tested against

the regular expression of the corresponding filter. However, filters

without a shortcut still have to be tested one after another which is slow.

To sum up: which filters should be used to make a filter list fast? You

should use as few regular expressions as possible, those are always slow.

You also should make sure that simple filters have at least eight

characters of unbroken text (meaning that these don’t contain any

characters with a special meaning like *), otherwise they will be just as

slow as regular expressions. But with filters that qualify it doesn’t

matter how many filters you have, the processing time is always the same.

That means that if you need 20 simple filters to replace one regular

expression then it is still worth it. Speaking of which — the deregifier is

very recommendable.

               */

        bool shouldFilter(const KURL & mainURL,const KURL & url, FilterType t=0);

              // Use WebKit's internal pointer management for the return value?

              // Determine the applicable CSS rules by domain; unsupported CSS rules are ignored for now.

              String cssrules(const String & domain);

       private:

              void addRule(FilterRule * r);

              void addRule(HideRule * r);

       };

}

#endif // FILTERMANAGER_H
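
The shortcut lookup quoted in the comment above shouldFilter can be illustrated with a small stand-alone sketch. This is not the FilterManager implementation (the header does not show it). It assumes plain substring rules with no wildcards or regular expressions, uses std::unordered_map in place of WTF's HashMap, is case-sensitive (the header uses CaseFoldingHash), and always takes the first eight characters of a pattern as its shortcut. The names SimpleRule, ShortcutIndex and kShortcutLen are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical simplified rule: a plain substring pattern, no wildcards or regex.
struct SimpleRule {
    std::string pattern;
};

// Shortcut length; the FAQ text uses eight characters.
constexpr std::size_t kShortcutLen = 8;

class ShortcutIndex {
public:
    // Rules with at least kShortcutLen characters are indexed by their first
    // kShortcutLen characters; shorter rules go to the slow list.
    void add(const SimpleRule& rule) {
        const std::size_t id = rules_.size();
        rules_.push_back(rule);
        if (rule.pattern.size() < kShortcutLen) {
            slow_.push_back(id);
            return;
        }
        byShortcut_[rule.pattern.substr(0, kShortcutLen)].push_back(id);
    }

    // Returns true if any rule pattern occurs in the URL.
    bool matches(const std::string& url) const {
        // Slide an 8-character window over the URL; only rules whose
        // shortcut equals the window are checked in full.
        for (std::size_t i = 0; i + kShortcutLen <= url.size(); ++i) {
            const auto it = byShortcut_.find(url.substr(i, kShortcutLen));
            if (it == byShortcut_.end())
                continue;
            for (std::size_t id : it->second) {
                if (url.find(rules_[id].pattern) != std::string::npos)
                    return true;
            }
        }
        // Rules without a shortcut still have to be tested one by one.
        for (std::size_t id : slow_) {
            if (url.find(rules_[id].pattern) != std::string::npos)
                return true;
        }
        return false;
    }

private:
    std::vector<SimpleRule> rules_;
    std::unordered_map<std::string, std::vector<std::size_t>> byShortcut_;
    std::vector<std::size_t> slow_;
};
```

In this sketch, adding the rules "ads.example.com/banner" and "/ad/" puts the first into the hash table and the second into the slow list. The number of hash probes per URL depends on the URL length rather than on the number of indexed rules, which is the property the FAQ text describes.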

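
The FILTER_TYPE_* values in the header are single-bit flags, so the types a rule applies to can be combined into one FilterType mask and tested with a bitwise AND. The following is a sketch under assumptions, not code from FilterManager: the helper typeApplies is hypothetical, and treating a mask of 0 as "no type restriction" is my assumption, suggested only by the default argument t=0 of shouldFilter.

```cpp
#include <cassert>

typedef unsigned int FilterType;

// Same bit values as the corresponding FILTER_TYPE_* macros in the header.
constexpr FilterType kTypeScript     = 0x0001;
constexpr FilterType kTypeImage      = 0x0002;
constexpr FilterType kTypeStylesheet = 0x0008;

// Hypothetical helper. Assumption: a mask of 0 on either side means
// "no type restriction", so the rule is treated as applicable.
constexpr bool typeApplies(FilterType ruleMask, FilterType requestType) {
    return ruleMask == 0 || requestType == 0 || (ruleMask & requestType) != 0;
}
```

A rule restricted to scripts and images would carry the mask kTypeScript | kTypeImage; an image request passes the check and a stylesheet request does not.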
  
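
The thread-safety question raised in the class comment (lookups are frequent, additions and removals are rare) is a typical fit for a reader/writer lock. Below is a minimal sketch using C++17's std::shared_mutex; this is a swapped-in standard-library technique, not WebKit's threading primitives and not what FilterManager does. GuardedRuleSet is a hypothetical stand-in that stores rule strings only. Note also that FilterRuleMap::prepareStartFind() clears unMatchRules, so if a lookup writes to that set, concurrent lookups on the real class would need protection as well.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a rule container guarded by a reader/writer lock.
class GuardedRuleSet {
public:
    // Writers take the lock exclusively.
    void add(const std::string& rule) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        rules_.insert(rule);
    }

    // Returns true if the rule was present and has been removed.
    bool remove(const std::string& rule) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return rules_.erase(rule) > 0;
    }

    // Readers share the lock, so lookups can run in parallel.
    bool contains(const std::string& rule) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return rules_.count(rule) > 0;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_set<std::string> rules_;
};
```

Lookups only block while a writer holds the lock, so the common query path stays cheap as long as rules change rarely.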
