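ChiMerge: A Discretization Algorithm for Continuous Data [MATLAB Code]
The ChiMerge algorithm [1] discretizes continuous data for downstream processing; for example, some decision-tree algorithms, such as ID3, require discretized inputs. Below is MATLAB code I wrote a long time ago; I hope it is useful.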
% Author: FesianXu @ UESTC
% Description: Data discretization method based on the chi-square test
% Date: 2017/4/3
%%%
clc
clear all
close all
%% read IRIS datasets
path = 'G:\数据分析集合\Iris dataset\iris.data' ;
[attrib1, attrib2, attrib3, attrib4, class_str] = textread(path, '%f%f%f%f%s', 'delimiter', ',') ;
class_int = zeros(length(attrib1),1) ;
class_int(strcmp(class_str, 'Iris-setosa')) = 1 ;
class_int(strcmp(class_str, 'Iris-versicolor')) = 2 ;
class_int(strcmp(class_str, 'Iris-virginica')) = 3 ;
attrib = [attrib1'; attrib2'; attrib3'; attrib4'; class_int'] ;
attrib = attrib' ;
clear attrib1 attrib2 attrib3 attrib4 class_int class_str path
len = length(attrib(:,1)) ;
att_map = cell(4,1) ;
%% build the per-value class frequency table for each of the four attributes
for i = 1:4
    pair_att_label = [attrib(:,i), attrib(:,5)] ;
    pair_att_label = sortrows(pair_att_label, 1) ;
    inner = 1 ;
    tmp_map = zeros(1,4) ; % att_value, label1_fre, label2_fre, label3_fre
    for j = 1:len
        num = pair_att_label(j, 1) ;
        loc = find(tmp_map(:,1) == num) ;
        if isempty(loc)
            tmp_map(inner, 1) = num ;
            tmp_map(inner, pair_att_label(j, 2)+1) = tmp_map(inner, pair_att_label(j, 2)+1)+1 ;
            inner = inner+1 ;
        else
            tmp_map(loc, pair_att_label(j, 2)+1) = tmp_map(loc, pair_att_label(j, 2)+1)+1 ;
        end
    end
    att_map{i} = tmp_map ;
end
clear num loc len pair_att_label inner i j tmp_map attrib
%% merge the adjacent intervals with the smallest chi2 value until max_interval remain
max_interval = 6 ;
for i = 1:4
    data = att_map{i} ;
    current_interval = length(data(:,1)) ;
    while current_interval > max_interval
        chi2_mat = zeros(current_interval-1, 1) ;
        for j = 1:current_interval-1
            chi2 = chi2test(data(j,2:4), data(j+1,2:4)) ;
            chi2_mat(j) = chi2 ;
        end
        [minv, index] = min(chi2_mat) ;
        merge_loc_1 = index ;
        merge_loc_2 = index+1 ;
        data(merge_loc_1, 2:4) = data(merge_loc_1, 2:4)+data(merge_loc_2, 2:4) ; % merge the two intervals
        data(merge_loc_2, :) = [] ; % remove the row that was merged
        current_interval = current_interval-1 ;
    end
    fprintf('Split points for attribute %d:\r\n', i)
    % iterate over the rows that remain instead of a hard-coded 6
    for loop = 1:size(data, 1)
        fprintf('No %d: %f\r\n', loop, data(loop, 1))
    end
end
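Each printed split point is the smallest attribute value in its merged interval, because a merge keeps the first row of the pair. Under that reading, a value can be mapped to its interval by counting how many lower bounds it reaches. The following is only a minimal sketch: the names cuts, x, and bin and all numeric values are hypothetical illustrations, not output of the script above.

```matlab
% Hypothetical lower bounds of the merged intervals, in ascending order
% (illustrative values, not results produced by the script above).
cuts = [4.3, 5.5, 6.1] ;
% Hypothetical attribute values to discretize (column vector).
x = [4.3; 5.4; 5.5; 7.9] ;
% Interval index = number of lower bounds that are <= the value.
% A value below cuts(1) gets index 0, i.e. it lies below every interval.
bin = sum(bsxfun(@ge, x, cuts), 2) ;
disp(bin')
```

With these inputs, bin is [1; 1; 2; 3]. bsxfun is used rather than implicit expansion so that the sketch does not depend on a recent MATLAB release.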
%%%
% Author: FesianXu @ UESTC
% Description: compute the chi2 value
% Date: 2017/4/3
%%%
function chi2 = chi2test(u, v)
len = length(u) ;
N = sum(u)+sum(v) ;
chi2 = 0 ;
box = [u; v] ;
for i = 1:2
    for j = 1:len
        Eij = sum(box(i,:))*sum(box(:, j))/N ;
        if Eij == 0
            tmp_chi = 0 ;
        else
            tmp_chi = (box(i,j)-Eij)^2/Eij ;
        end
        chi2 = chi2 + tmp_chi ;
    end
end
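For reference, the statistic that chi2test computes can be written out as below. The symbols are notation assumed here rather than taken from the code: A_{ij} is box(i,j), R_i is the sum of row i, C_j is the sum of column j, k is the number of classes (len in the code), and N is the total count. As in the code above, any term with E_{ij} = 0 contributes 0. The worked example uses made-up counts purely to illustrate the arithmetic.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N}

% Worked example with hypothetical counts u = (4, 0, 0), v = (1, 3, 0):
% R = (4, 4), C = (5, 3, 0), N = 8
% E_{i1} = 2.5, E_{i2} = 1.5, E_{i3} = 0 for both rows; the E = 0 terms are skipped
\chi^2 = \frac{(4 - 2.5)^2}{2.5} + \frac{(0 - 1.5)^2}{1.5}
       + \frac{(1 - 2.5)^2}{2.5} + \frac{(3 - 1.5)^2}{1.5}
       = 0.9 + 1.5 + 0.9 + 1.5 = 4.8
```

A small value means the two adjacent intervals have similar class distributions, which is why the main loop merges the pair with the minimum chi2.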
Reference
[1]. Kerber, R., 1992, July. ChiMerge: Discretization of numeric attributes. In Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 123-128). AAAI Press.
