[Robust] Robust Ensemble Clustering Using Probability Trajectories (Matlab Implementation)


📋📋📋Contents:🎁🎁🎁


💥1 Overview


📚2 Results


🎉3 References


🌈4 Matlab Implementation


💥1 Overview

Abstract (from the original paper):

Although many successful ensemble clustering approaches have been developed in recent years, there are still two limitations to most of the existing approaches. First, they mostly overlook the issue of uncertain links, which may mislead the overall consensus process. Second, they generally lack the ability to incorporate global information to refine the local links. To address these two limitations, in this paper, we propose a novel ensemble clustering approach based on sparse graph representation and probability trajectory analysis. In particular, we present the elite neighbor selection strategy to identify the uncertain links by locally adaptive thresholds and build a sparse graph with a small number of probably reliable links. We argue that a small number of probably reliable links can lead to significantly better consensus results than using all graph links regardless of their reliability. The random walk process driven by a new transition probability matrix is utilized to explore the global information in the graph. We derive a novel and dense similarity measure from the sparse graph by analyzing the probability trajectories of the random walkers, based on which two consensus functions are further proposed. Experimental results on multiple real-world datasets demonstrate the effectiveness and efficiency of our approach.
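
The pipeline described in the abstract (keep a few reliable links, run a random walk on the sparse graph, compare probability trajectories) can be sketched on a toy graph. The snippet below is a simplified illustration and not the authors' `computePTS_fast`: the toy matrix `S`, the values of `K` and `T`, plain row normalization as the transition matrix, and cosine similarity between trajectories are all assumptions made for this example.

```matlab
% Toy symmetric similarity matrix for 6 nodes (made-up data).
S = [0   .9  .8  .1  0   0;
     .9  0   .7  0   .1  0;
     .8  .7  0   .2  0   0;
     .1  0   .2  0   .9  .8;
     0   .1  0   .9  0   .7;
     0   0   0   .8  .7  0];
n = size(S, 1);
K = 2;   % number of elite neighbors kept per node (assumed value)
T = 3;   % length of the random-walk trajectory (assumed value)

% 1) Elite neighbor selection: keep link (i,j) only if j is among the K
%    strongest neighbors of i, or i is among the K strongest neighbors of j.
keep = false(n);
for i = 1:n
    [~, order] = sort(S(i, :), 'descend');
    keep(i, order(1:K)) = true;
end
keep = keep | keep';          % symmetrize the selection
W = S .* keep;                % sparse graph with the retained links only

% 2) Transition probabilities: plain row normalization (a stand-in for the
%    transition matrix defined in the paper).
P = bsxfun(@rdivide, W, sum(W, 2));

% 3) Probability trajectories: for each start node, concatenate the
%    1-step, 2-step, ..., T-step visiting distributions.
traj = zeros(n, n * T);
Pt = eye(n);
for t = 1:T
    Pt = Pt * P;                              % t-step transition matrix
    traj(:, (t-1)*n + 1 : t*n) = Pt;
end

% 4) Dense similarity: cosine similarity between trajectories (assumption).
nrm = sqrt(sum(traj.^2, 2));
PTS = (traj * traj') ./ (nrm * nrm');
disp(PTS);
```

In this toy case the weak cross-group links (weights 0.1 and 0.2) are pruned, so the sparse graph splits into the groups {1,2,3} and {4,5,6}. `PTS(1,5)` is therefore exactly zero, while every pair inside a group gets a nonzero similarity even though each node keeps only two links.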


📚2 Results




Main script:

clear all;
close all;
clc;
%% Load the base clustering pool.
% We have generated a pool of 200 candidate base clusterings for each dataset. 
% Please uncomment the dataset that you want to use and comment the other ones.
dataName = 'MF';
% dataName = 'IS';
% dataName = 'MNIST';
% dataName = 'ODR';
% dataName = 'LS';
% dataName = 'PD';
% dataName = 'USPS';
% dataName = 'FC';
% dataName = 'KDD99_10P';
% dataName = 'KDD99';
members = [];
gt = [];
load(fullfile('..','data',['bc_pool_',dataName,'.mat']),'members','gt');
[N, poolSize] = size(members);
trueK = numel(unique(gt));
%% Settings
% Ensemble size M
M = 10;
% How many times the PTA and PTGP algorithms will be run.
cntTimes = 20; 
% You can set cntTimes to a greater (or smaller) integer if you want to run
% the algorithms more (or less) times.
% For each run, M base clusterings will be randomly drawn from the pool.
% Each row in bcIdx corresponds to an ensemble of M base clusterings.
bcIdx = zeros(cntTimes, M);
for i = 1:cntTimes
    tmp = randperm(poolSize);
    bcIdx(i,:) = tmp(1:M);
end
%% Run PTA and PTGP repeatedly.
% Test different numbers of clusters.
clsNums = [2:20, 25:5:50];
clsNums = unique([clsNums,trueK]);
% In general, you can also simply set the number of clusters to the true number of classes.
% clsNums = trueK;
% Scores
outDir = fullfile('..','results');
if ~exist(outDir, 'dir'), mkdir(outDir); end  % avoid a warning when the folder already exists
nmiScoresBestK_PTA = zeros(cntTimes, 3);
nmiScoresTrueK_PTA = zeros(cntTimes, 3);
nmiScoresBestK_PTGP = zeros(cntTimes, 1);
nmiScoresTrueK_PTGP = zeros(cntTimes, 1);
for runIdx = 1:cntTimes
    disp('**************************************************************');
    disp(['Run ', num2str(runIdx),':']);
    disp('**************************************************************');
    %% Construct the ensemble of M base clusterings
    % baseCls is an N x M matrix, each column being a base clustering.
    baseCls = members(:,bcIdx(runIdx,:));
    %% Produce microclusters
    disp('Produce microclusters ... ');
    tic; [mcBaseCls, mcLabels] = computeMicroclusters(baseCls); toc;
    tilde_N = size(mcBaseCls,1);
    disp('--------------------------------------------------------------');
    %% Compute the microcluster based co-association matrix.
    disp('Compute the MCA matrix ... ');
    tic; MCA = computeMCA(mcBaseCls); toc;
    disp('--------------------------------------------------------------');
    %% Set parameters K and T.
    para.K = floor(sqrt(tilde_N)/2);
    para.T = floor(sqrt(tilde_N)/2);
    %% Compute PTS
    disp('Compute PTS ... ');
    tic; PTS = computePTS_fast(MCA,mcLabels,para); toc;
    disp('--------------------------------------------------------------');
    %% Perform PTA
    disp('Run the PTA algorithm ... '); 
    [mcResultsAL,mcResultsCL,mcResultsSL] = runPTA(PTS, clsNums);
    % The i-th column in mcResultsAL/mcResultsCL/mcResultsSL represents the
    % consensus clustering with clsNums(i) clusters by PTA-AL/CL/SL.
    disp('--------------------------------------------------------------');
    %% Perform PTGP 
    disp('Run the PTGP algorithm ... '); 
    mcResultsPTGP = runPTGP(mcBaseCls, PTS, clsNums);     
    disp('--------------------------------------------------------------'); 
    %% Display the clustering results.
    disp('Map microclusters back to objects ... '); tic;
    resultsAL = mapMicroclustersBackToObjects(mcResultsAL, mcLabels);
    resultsCL = mapMicroclustersBackToObjects(mcResultsCL, mcLabels);
    resultsSL = mapMicroclustersBackToObjects(mcResultsSL, mcLabels);
    resultsPTGP = mapMicroclustersBackToObjects(mcResultsPTGP, mcLabels);toc;
    disp('--------------------------------------------------------------');
    disp('##############################################################'); 
    scoresAL = computeNMI(resultsAL,gt);
    scoresCL = computeNMI(resultsCL,gt);
    scoresSL = computeNMI(resultsSL,gt);
    scoresPTGP = computeNMI(resultsPTGP,gt);
    trueKidx = find(clsNums==trueK);
    nmiScoresBestK_PTA(runIdx,:) = [max(scoresAL),max(scoresCL),max(scoresSL)];
    nmiScoresTrueK_PTA(runIdx,:) = [scoresAL(trueKidx),scoresCL(trueKidx),scoresSL(trueKidx)];
    nmiScoresBestK_PTGP(runIdx) = max(scoresPTGP);
    nmiScoresTrueK_PTGP(runIdx) = scoresPTGP(trueKidx);
    disp(['The Scores at Run ',num2str(runIdx)]);
    disp('    ---------- The NMI scores w.r.t. best-k: ----------    ');
    disp(['PTA-AL : ',num2str(nmiScoresBestK_PTA(runIdx,1))]);
    disp(['PTA-CL : ',num2str(nmiScoresBestK_PTA(runIdx,2))]);
    disp(['PTA-SL : ',num2str(nmiScoresBestK_PTA(runIdx,3))]);
    disp(['PTGP   : ',num2str(nmiScoresBestK_PTGP(runIdx))]);
    disp('    ---------- The NMI scores w.r.t. true-k: ----------    ');
    disp(['PTA-AL : ',num2str(nmiScoresTrueK_PTA(runIdx,1))]);
    disp(['PTA-CL : ',num2str(nmiScoresTrueK_PTA(runIdx,2))]);
    disp(['PTA-SL : ',num2str(nmiScoresTrueK_PTA(runIdx,3))]);
    disp(['PTGP   : ',num2str(nmiScoresTrueK_PTGP(runIdx))]);
    disp('##############################################################'); 
    %% Save results
    save(fullfile(outDir,['results_',dataName,'_M',num2str(M),'_',num2str(cntTimes),'runs.mat']),'bcIdx','nmiScoresBestK_PTA','nmiScoresTrueK_PTA','nmiScoresBestK_PTGP','nmiScoresTrueK_PTGP');  
end
disp('**************************************************************');
disp(['   ** Average Performance over ',num2str(cntTimes),' runs on the ',dataName,' dataset **']);
disp(['Data size:     ', num2str(N)]);
disp(['Ensemble size: ', num2str(M)]);
disp('   ---------- Average NMI scores w.r.t. best-k: ----------   ');
disp(['PTA-AL : ',num2str(mean(nmiScoresBestK_PTA(:,1)))]);
disp(['PTA-CL : ',num2str(mean(nmiScoresBestK_PTA(:,2)))]);
disp(['PTA-SL : ',num2str(mean(nmiScoresBestK_PTA(:,3)))]);
disp(['PTGP   : ',num2str(mean(nmiScoresBestK_PTGP))]);
disp('   ---------- Average NMI scores w.r.t. true-k: ----------   ');
disp(['PTA-AL : ',num2str(mean(nmiScoresTrueK_PTA(:,1)))]);
disp(['PTA-CL : ',num2str(mean(nmiScoresTrueK_PTA(:,2)))]);
disp(['PTA-SL : ',num2str(mean(nmiScoresTrueK_PTA(:,3)))]);
disp(['PTGP   : ',num2str(mean(nmiScoresTrueK_PTGP))]);
disp('**************************************************************');
disp('**************************************************************');
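
The script relies on helper functions (`computeMicroclusters`, `computeMCA`, `mapMicroclustersBackToObjects`) that are not listed in this post. As a rough idea of what these steps do, here is a minimal sketch. It assumes that a microcluster is a group of objects receiving identical labels in every base clustering, and that the MCA entry for two microclusters is the fraction of base clusterings that put them in the same cluster. The toy `baseCls` and the consensus labels `mcResult` are made up, and the real helpers may differ in signature and detail.

```matlab
% Toy ensemble: 6 objects (rows), 3 base clusterings (columns). Made-up data.
baseCls = [1 1 2;
           1 1 2;
           1 2 2;
           2 2 1;
           2 2 1;
           2 2 1];

% Microclusters: objects with identical label rows are merged.
% mcBaseCls holds one row per microcluster, mcLabels maps objects to them.
[mcBaseCls, ~, mcLabels] = unique(baseCls, 'rows');
tildeN = size(mcBaseCls, 1);

% Microcluster-based co-association: fraction of base clusterings in which
% two microclusters carry the same label (not weighted by microcluster size).
numBase = size(mcBaseCls, 2);
MCA = zeros(tildeN);
for m = 1:numBase
    col = mcBaseCls(:, m);
    MCA = MCA + double(bsxfun(@eq, col, col'));
end
MCA = MCA / numBase;

% Mapping back: every object inherits the label of its microcluster.
mcResult = [1; 1; 2];              % hypothetical consensus labels per microcluster
objResult = mcResult(mcLabels);

disp(mcLabels');
disp(MCA);
disp(objResult');
```

Working on `tildeN` microclusters instead of `N` objects is what keeps the later similarity computation small, which is presumably why the script sets `para.K` and `para.T` from `tilde_N` rather than from `N`.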


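The scores reported by the script come from `computeNMI`, which is also not shown here. The sketch below computes normalized mutual information between two label vectors using the square-root normalization, NMI = I(A;B) / sqrt(H(A) H(B)). Whether the authors' function uses this exact normalization is an assumption, and the label vectors `a` and `b` are made up.

```matlab
% Two labelings of the same 6 objects (made-up data).
a = [1 1 1 2 2 2]';
b = [1 1 2 2 3 3]';
n = numel(a);

% Re-index labels to 1..k so they can be used as subscripts.
[~, ~, ia] = unique(a);
[~, ~, ib] = unique(b);

% Contingency table and the joint / marginal distributions.
C   = accumarray([ia ib], 1);
Pab = C / n;
Pa  = sum(Pab, 2);
Pb  = sum(Pab, 1);

% Mutual information, skipping empty cells to avoid log(0).
PaPb = Pa * Pb;
nz   = Pab > 0;
MI   = sum(Pab(nz) .* log(Pab(nz) ./ PaPb(nz)));

% Entropies of the two labelings.
Ha = -sum(Pa .* log(Pa));
Hb = -sum(Pb .* log(Pb));

% Undefined (NaN) if either labeling has a single cluster.
nmi = MI / sqrt(Ha * Hb);
disp(nmi);
```

The score is invariant to how the clusters are numbered, which is why consensus labels can be compared with `gt` directly without first matching label ids.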

🎉3 References

Part of the theory is drawn from online sources; please contact me for removal in case of infringement.


[1] Dong Huang, Jian-Huang Lai, Chang-Dong Wang. Robust Ensemble Clustering Using Probability Trajectories. IEEE Transactions on Knowledge and Data Engineering, 2016, vol. 28, no. 5, pp. 1312-1326.


🌈4 Matlab Implementation

