Abstract: Existing measures of dependence between variables, such as the Maximal Information Coefficient (MIC), are mostly applied to pairs of variables and are rarely used to detect correlations among triplets or higher-dimensional variable sets. To address this gap, this paper proposes the Maximal Information Entropy (MIE), a new measure for detecting correlation among multiple variables. For k variables, a maximal information matrix is first constructed from the MIC scores of all pairs of variables; the maximal information entropy, which measures the degree of correlation among the k variables, is then computed from this matrix. Simulation results show that MIE can detect one-dimensional manifold dependence among triplets of variables, and experiments on real datasets further verify its practicality.
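The two-step construction (pairwise matrix first, then an entropy computed from it) can be sketched in Python. The abstract does not state how the entropy is obtained from the matrix, so this sketch assumes an eigenvalue-based form, 1 + Σ (λ_i / k) log_k (λ_i / k), where λ_i are the eigenvalues of the k × k matrix; this should be read as an illustrative assumption, not as the paper's definition. Absolute Pearson correlation stands in for MIC only to keep the example dependency-free (it is not MIC and captures linear dependence only), and all function names here are hypothetical.

```python
import math


def abs_pearson(x, y):
    """Stand-in pairwise score in [0, 1]; NOT MIC (linear dependence only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def dependence_matrix(columns, score):
    """Symmetric k x k matrix of pairwise scores with ones on the diagonal."""
    k = len(columns)
    m = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            m[i][j] = m[j][i] = score(columns[i], columns[j])
    return m


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                y = 2.0 * a[p][q]
                if abs(y) < 1e-15:
                    continue
                x = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], kept within +/- pi/4.
                if x == 0.0:
                    theta = math.copysign(math.pi / 4, y)
                else:
                    theta = 0.5 * math.atan(y / x)
                c, s = math.cos(theta), math.sin(theta)
                for r in range(n):  # update columns p and q
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                for r in range(n):  # update rows p and q
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
    return [a[i][i] for i in range(n)]


def entropy_score(matrix):
    """Assumed form: 1 + sum_i (lam_i / k) * log_k(lam_i / k)."""
    k = len(matrix)
    total = 0.0
    for lam in symmetric_eigenvalues(matrix):
        p = lam / k
        if p > 1e-12:  # skip zero (or slightly negative) eigenvalues
            total += p * math.log(p, k)
    return 1.0 + total


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    columns = [x, [2 * v for v in x], [1 - v for v in x]]
    print(entropy_score(dependence_matrix(columns, abs_pearson)))
```

Under this assumed form, a matrix whose off-diagonal entries are all zero (no pairwise dependence) gives a score near 0, and a matrix of all ones (perfect pairwise dependence) gives a score near 1. Passing a MIC implementation as the `score` argument would bring the sketch closer to the construction described in the abstract.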