13th Technical Meeting on Plasma Control Systems, Data Management and Remote Experiments in Fusion Research

Name: 13th Technical Meeting on Plasma Control Systems, Data Management and Remote Experiments in Fusion Research
Start: 2021-07-05T08:30:00+02:00
End: 2021-07-08T17:40:00+02:00
Location: No location set

5–8 Jul 2021

Europe/Vienna timezone

The meeting will take place virtually. Information on remote participation will be sent to all in due time.

Contact

M.Barbarino@iaea.org

Correlation based method for sorting and filtering relevant features for unsupervised machine learning

7 Jul 2021, 16:10

10m

Oral Data Acquisition and Signal Processing Data Acquisition and signal processing 1

Dr Felix Hernandez-del-Olmo (UNED)

The operation of fusion devices produces huge amounts of data with high dimensionality that allows developing sophisticated machine learning models to tackle specific problems. In such high-dimensional data space, the feature selection plays an important role to extract useful information.

A high number of features in the data requires dimensionality-reduction techniques before the successful application of data-driven methods. Usually, feature selection techniques are used to select the number of input variables and to identify irrelevant and redundant attributes from data. The right choice of inputs variables is an essential issue before developing a model. It helps in a double sense. On the one hand, it reduces the computational cost of modeling. On the other hand, it improves its performance, efficiency and understanding.

In this paper, a new automatic method to extract the main features in a very high dimensional input space is proposed. Our method is based on correlation measures. It allows reducing the number of features, finding out the most relevant ones. We have simulated series data with 10000 points. The points correspond to random samples of different Gaussian distributions. A 30-dimensional space is simulated whose first 10 components are 10 time series N(µ,σ) with a range of values of µ from [-1000, 1000] and a range of values of σ from [1, 1000], the second 10 components are linear combinations of the previous 10 and the last 10 components are non-linear combinations of the first 10 ones. In this way, signals of 10000 samples within a feature space of dimension 30 are generated. A total number of 500 resamples of these signals have been created to test the method.

The objective of this work is develop a method that sorts, in an automatic and unsupervised way, the most relevant features from the original set of features. To this end, the method computes the correlations among the 30 components of the signals in order to sort them from the less correlated features to the most correlated ones. Once the features are ordered according to their correlations, it is possible to filter out the most correlated dimensions while keeping just a few of the less correlated features.

Speaker's Affiliation	Felix Hernandez-del-Olmo, Department of Artificial Intelligence, UNED, Madrid, Spain
Member State or IGO	Spain

Dr Felix Hernandez-del-Olmo (UNED) Dr Natividad Duro (UNED) Dr Elena Gaudioso (UNED) Dr Raquel Dormido (UNED) Jesús Vega (CIEMAT)

There are no materials yet.

13th Technical Meeting on Plasma Control Systems, Data Management and Remote Experiments in Fusion Research

Contact

Correlation based method for sorting and filtering relevant features for unsupervised machine learning

Speaker

Description

Authors

Presentation materials

Choose timezone

13th Technical Meeting on Plasma Control Systems, Data Management and Remote Experiments in Fusion Research

Contact

Speaker

Description

Authors

Presentation materials