[Robust] Robust Ensemble Clustering Using Probability Trajectories (Matlab Code Implementation)




📋📋📋Contents of this post:🎁🎁🎁


💥1 Overview


📚2 Results


🎉3 References


🌈4 Matlab Code Implementation


💥1 Overview

Abstract (from the original paper):

Although many successful ensemble clustering approaches have been developed in recent years, there are still two limitations to most of the existing approaches. First, they mostly overlook the issue of uncertain links, which may mislead the overall consensus process. Second, they generally lack the ability to incorporate global information to refine the local links. To address these two limitations, in this paper, we propose a novel ensemble clustering approach based on sparse graph representation and probability trajectory analysis. In particular, we present the elite neighbor selection strategy to identify the uncertain links by locally adaptive thresholds and build a sparse graph with a small number of probably reliable links. We argue that a small number of probably reliable links can lead to significantly better consensus results than using all graph links regardless of their reliability. The random walk process driven by a new transition probability matrix is utilized to explore the global information in the graph. We derive a novel and dense similarity measure from the sparse graph by analyzing the probability trajectories of the random walkers, based on which two consensus functions are further proposed. Experimental results on multiple real-world datasets demonstrate the effectiveness and efficiency of our approach.
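
To make the pipeline in the abstract more concrete, here is a minimal Matlab sketch of its three ideas: keep only each node's strongest links, run a random walk on the resulting sparse graph, and compare nodes by the cosine similarity of their probability trajectories. The toy similarity matrix `S`, the values of `K` and `T`, the "top-K of either endpoint" rule, and the plain row-normalized transition matrix are all illustrative assumptions. The paper defines its own elite neighbor selection and transition probability matrix, which this sketch does not attempt to reproduce.

```matlab
% Toy similarity matrix over 6 nodes: groups {1,2,3} and {4,5,6}
% joined by a few weak (possibly unreliable) links. Values are made up.
S = [0   .9  .8  .1  0   .1;
     .9  0   .7  0   .1  0 ;
     .8  .7  0   .2  0   0 ;
     .1  0   .2  0   .9  .8;
     0   .1  0   .9  0   .7;
     .1  0   0   .8  .7  0 ];
K = 2;   % neighbors kept per node (assumed value)
T = 3;   % random-walk length (assumed value)
n = size(S, 1);

% Step 1: sparsify. Keep a link if it is among the K strongest links
% of either endpoint, so the sparse graph stays symmetric.
keep = false(n);
for i = 1:n
    [~, order] = sort(S(i, :), 'descend');
    keep(i, order(1:K)) = true;
end
keep = keep | keep';
W = S .* keep;

% Step 2: row-normalize the sparse weights into transition probabilities.
P = bsxfun(@rdivide, W, sum(W, 2));

% Step 3: the trajectory of node i stacks row i of P, P^2, ..., P^T,
% i.e. where a walker starting at i is likely to be after each step.
traj = zeros(n, n * T);
Pt = eye(n);
for t = 1:T
    Pt = Pt * P;
    traj(:, (t-1)*n + 1 : t*n) = Pt;
end

% Step 4: cosine similarity of trajectories gives a dense similarity
% matrix even though the graph itself is sparse.
nrm = sqrt(sum(traj.^2, 2));
PTS = (traj * traj') ./ (nrm * nrm');
disp(round(PTS * 100) / 100);
```

In this toy graph the weak cross-group links are all dropped in step 1, so walkers never leave their own group and the similarity between the two groups comes out as zero, while every pair inside a group gets a positive score.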


📚2 Results




Main script:

clear all;
close all;
clc;
%% Load the base clustering pool.
% We have generated a pool of 200 candidate base clusterings for each dataset. 
% Please uncomment the dataset that you want to use and comment the other ones.
dataName = 'MF';
% dataName = 'IS';
% dataName = 'MNIST';
% dataName = 'ODR';
% dataName = 'LS';
% dataName = 'PD';
% dataName = 'USPS';
% dataName = 'FC';
% dataName = 'KDD99_10P';
% dataName = 'KDD99';
members = [];
gt = [];
load(fullfile('..','data',['bc_pool_',dataName,'.mat']),'members','gt');
[N, poolSize] = size(members);
trueK = numel(unique(gt));
%% Settings
% Ensemble size M
M = 10;
% How many times the PTA and PTGP algorithms will be run.
cntTimes = 20; 
% You can set cntTimes to a greater (or smaller) integer if you want to run
% the algorithms more (or less) times.
% For each run, M base clusterings will be randomly drawn from the pool.
% Each row in bcIdx corresponds to an ensemble of M base clusterings.
bcIdx = zeros(cntTimes, M);
for i = 1:cntTimes
    tmp = randperm(poolSize);
    bcIdx(i,:) = tmp(1:M);
end
%% Run PTA and PTGP repeatedly.
% Test different numbers of clusters.
clsNums = [2:20, 25:5:50];
clsNums = unique([clsNums,trueK]);
% In general, you can also simply set the number of clusters to the true number of classes.
% clsNums = trueK;
% Scores
outDir = fullfile('..','results');
if ~exist(outDir,'dir')
    mkdir(outDir);   % create the output folder only if it is missing
end
nmiScoresBestK_PTA = zeros(cntTimes, 3);
nmiScoresTrueK_PTA = zeros(cntTimes, 3);
nmiScoresBestK_PTGP = zeros(cntTimes, 1);
nmiScoresTrueK_PTGP = zeros(cntTimes, 1);
for runIdx = 1:cntTimes
    disp('**************************************************************');
    disp(['Run ', num2str(runIdx),':']);
    disp('**************************************************************');
    %% Construct the ensemble of M base clusterings
    % baseCls is an N x M matrix, each column being a base clustering.
    baseCls = members(:,bcIdx(runIdx,:));
    %% Produce microclusters
    disp('Produce microclusters ... ');
    tic; [mcBaseCls, mcLabels] = computeMicroclusters(baseCls); toc;
    tilde_N = size(mcBaseCls,1);
    disp('--------------------------------------------------------------');
    %% Compute the microcluster based co-association matrix.
    disp('Compute the MCA matrix ... ');
    tic; MCA = computeMCA(mcBaseCls); toc;
    disp('--------------------------------------------------------------');
    %% Set parameters K and T.
    para.K = floor(sqrt(tilde_N)/2);
    para.T = floor(sqrt(tilde_N)/2);
    %% Compute PTS
    disp('Compute PTS ... ');
    tic; PTS = computePTS_fast(MCA,mcLabels,para); toc;
    disp('--------------------------------------------------------------');
    %% Perform PTA
    disp('Run the PTA algorithm ... '); 
    [mcResultsAL,mcResultsCL,mcResultsSL] = runPTA(PTS, clsNums);
    % The i-th column in mcResultsAL\mcResultsCL\mcResultsSL represents the
    % consensus clustering (at the microcluster level) with clsNums(i)
    % clusters by PTA-AL\CL\SL.
    disp('--------------------------------------------------------------');
    %% Perform PTGP 
    disp('Run the PTGP algorithm ... '); 
    mcResultsPTGP = runPTGP(mcBaseCls, PTS, clsNums);     
    disp('--------------------------------------------------------------'); 
    %% Map the microcluster-level results back to objects and display the scores.
    disp('Map microclusters back to objects ... '); tic;
    resultsAL = mapMicroclustersBackToObjects(mcResultsAL, mcLabels);
    resultsCL = mapMicroclustersBackToObjects(mcResultsCL, mcLabels);
    resultsSL = mapMicroclustersBackToObjects(mcResultsSL, mcLabels);
    resultsPTGP = mapMicroclustersBackToObjects(mcResultsPTGP, mcLabels);toc;
    disp('--------------------------------------------------------------');
    disp('##############################################################'); 
    scoresAL = computeNMI(resultsAL,gt);
    scoresCL = computeNMI(resultsCL,gt);
    scoresSL = computeNMI(resultsSL,gt);
    scoresPTGP = computeNMI(resultsPTGP,gt);
    trueKidx = find(clsNums==trueK);
    nmiScoresBestK_PTA(runIdx,:) = [max(scoresAL),max(scoresCL),max(scoresSL)];
    nmiScoresTrueK_PTA(runIdx,:) = [scoresAL(trueKidx),scoresCL(trueKidx),scoresSL(trueKidx)];
    nmiScoresBestK_PTGP(runIdx) = max(scoresPTGP);
    nmiScoresTrueK_PTGP(runIdx) = scoresPTGP(trueKidx);
    disp(['The Scores at Run ',num2str(runIdx)]);
    disp('    ---------- The NMI scores w.r.t. best-k: ----------    ');
    disp(['PTA-AL : ',num2str(nmiScoresBestK_PTA(runIdx,1))]);
    disp(['PTA-CL : ',num2str(nmiScoresBestK_PTA(runIdx,2))]);
    disp(['PTA-SL : ',num2str(nmiScoresBestK_PTA(runIdx,3))]);
    disp(['PTGP   : ',num2str(nmiScoresBestK_PTGP(runIdx))]);
    disp('    ---------- The NMI scores w.r.t. true-k: ----------    ');
    disp(['PTA-AL : ',num2str(nmiScoresTrueK_PTA(runIdx,1))]);
    disp(['PTA-CL : ',num2str(nmiScoresTrueK_PTA(runIdx,2))]);
    disp(['PTA-SL : ',num2str(nmiScoresTrueK_PTA(runIdx,3))]);
    disp(['PTGP   : ',num2str(nmiScoresTrueK_PTGP(runIdx))]);
    disp('##############################################################'); 
    %% Save results
    save(fullfile(outDir,['results_',dataName,'_M',num2str(M),'_',num2str(cntTimes),'runs.mat']), ...
        'bcIdx','nmiScoresBestK_PTA','nmiScoresTrueK_PTA', ...
        'nmiScoresBestK_PTGP','nmiScoresTrueK_PTGP');
end
disp('**************************************************************');
disp(['   ** Average Performance over ',num2str(cntTimes),' runs on the ',dataName,' dataset **']);
disp(['Data size:     ', num2str(N)]);
disp(['Ensemble size: ', num2str(M)]);
disp('   ---------- Average NMI scores w.r.t. best-k: ----------   ');
disp(['PTA-AL : ',num2str(mean(nmiScoresBestK_PTA(:,1)))]);
disp(['PTA-CL : ',num2str(mean(nmiScoresBestK_PTA(:,2)))]);
disp(['PTA-SL : ',num2str(mean(nmiScoresBestK_PTA(:,3)))]);
disp(['PTGP   : ',num2str(mean(nmiScoresBestK_PTGP))]);
disp('   ---------- Average NMI scores w.r.t. true-k: ----------   ');
disp(['PTA-AL : ',num2str(mean(nmiScoresTrueK_PTA(:,1)))]);
disp(['PTA-CL : ',num2str(mean(nmiScoresTrueK_PTA(:,2)))]);
disp(['PTA-SL : ',num2str(mean(nmiScoresTrueK_PTA(:,3)))]);
disp(['PTGP   : ',num2str(mean(nmiScoresTrueK_PTGP))]);
disp('**************************************************************');
disp('**************************************************************');
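
The helper functions called by the script (`computeMicroclusters`, `computeMCA`, `mapMicroclustersBackToObjects`, and so on) are not listed in this post. As a rough illustration of what the first steps might do, the sketch below merges objects that receive identical labels in every base clustering into microclusters, builds a co-association matrix over those microclusters as the fraction of base clusterings that put two microclusters in the same cluster, and maps a microcluster-level result back to the objects. This is a guess based on the function names and the comments in the script, not the authors' code. The variable names mirror the script only for readability, and the toy data is made up.

```matlab
% Toy ensemble: 6 objects (rows), 3 base clusterings (columns).
baseCls = [1 1 2;
           1 1 2;
           1 2 2;
           2 2 1;
           2 2 1;
           2 3 1];

% Objects with identical rows always co-occur, so they can be merged.
% mcBaseCls: one row per microcluster.
% mcLabels(i): index of the microcluster that object i belongs to.
[mcBaseCls, ~, mcLabels] = unique(baseCls, 'rows');
tilde_N = size(mcBaseCls, 1);
M = size(mcBaseCls, 2);

% Co-association between microclusters: fraction of base clusterings
% in which the two microclusters share a cluster label.
MCA = zeros(tilde_N);
for m = 1:M
    col = mcBaseCls(:, m);
    MCA = MCA + bsxfun(@eq, col, col');
end
MCA = MCA / M;

% A clustering of the microclusters maps back to objects by indexing.
mcResult = [1; 1; 2; 2];          % hypothetical labels for 4 microclusters
objResult = mcResult(mcLabels);   % labels for the 6 original objects

disp(MCA);
disp(objResult(:)');
```

Working on microclusters instead of objects shrinks the matrices from N x N to tilde_N x tilde_N, which is presumably why the script derives `para.K` and `para.T` from `tilde_N` rather than `N`.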

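The script reports all results as NMI (normalized mutual information) scores through a `computeNMI` helper that is also not shown here. The sketch below computes NMI for a single pair of label vectors on made-up data. Dividing by the geometric mean of the two entropies is one common normalization and is an assumption here; the actual helper may normalize differently, and it is evidently called with a whole matrix of clusterings rather than one vector.

```matlab
% Toy labels: a predicted clustering and a ground truth over 6 objects.
pred = [1 1 1 2 2 2]';
gt   = [1 1 2 2 3 3]';

% Contingency table of joint label counts, turned into probabilities.
[~, ~, a] = unique(pred);
[~, ~, b] = unique(gt);
Pab = accumarray([a(:) b(:)], 1) / numel(a);
Pa = sum(Pab, 2);     % marginal distribution of pred (column vector)
Pb = sum(Pab, 1);     % marginal distribution of gt (row vector)

% Mutual information and entropies; empty cells are skipped so that
% 0*log(0) is treated as 0.
PaPb = Pa * Pb;
nz = Pab > 0;
MI = sum(Pab(nz) .* log(Pab(nz) ./ PaPb(nz)));
Ha = -sum(Pa(Pa > 0) .* log(Pa(Pa > 0)));
Hb = -sum(Pb(Pb > 0) .* log(Pb(Pb > 0)));

% Normalize by the geometric mean of the entropies (assumed convention).
% This is undefined when either labeling has a single cluster.
nmi = MI / sqrt(Ha * Hb);
fprintf('NMI = %.4f\n', nmi);
```

A score of 1 means the two labelings agree up to a renaming of the clusters, and a score near 0 means they are close to independent.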

🎉3 References

Some of the theory here is drawn from online sources; please contact me for removal in case of infringement.


[1] Dong Huang, Jian-Huang Lai, Chang-Dong Wang. Robust Ensemble Clustering Using Probability Trajectories. IEEE Transactions on Knowledge and Data Engineering, 2016, vol. 28, no. 5, pp. 1312-1326.


🌈4 Matlab Code Implementation

