|4|Magnetosphere|14.1|
  
The boundaries of critical interest - bow shock and magnetopause - are minorities with only 3.7% and 2.3% representation. The table highlights the **[[:glossary#data_imbalance|data imbalance]] issue**, which requires investigating special techniques to ensure the [[:glossary#predictor|predictor]] is not biased towards the overrepresented classes.
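As an illustration of one common way to counteract such imbalance (not necessarily the technique used in this pipeline), inverse-frequency class weights can be derived from the label distribution so that rare crossing classes contribute more to the training loss. The label values below are purely hypothetical.

<code python>
# Minimal sketch: inverse-frequency class weights for an imbalanced label set.
# This is an illustrative example, not the weighting scheme of the published pipeline.
import numpy as np

def inverse_frequency_weights(labels):
    """Return a {class: weight} dict that weights rare classes more heavily."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = counts.sum() / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Hypothetical labels: a majority class plus two rare crossing classes.
labels = np.array([0] * 900 + [1] * 37 + [2] * 23)
print(inverse_frequency_weights(labels))  # rare classes receive the largest weights
</code>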
  
As a first step in pre-processing, [[:glossary#feature_selection|feature selection]] was performed to assess the contribution of available [[:glossary#feature|features]] in the estimation of the output. Based on statistical correlations, the magnetic flux features (BX_MSO, BY_MSO, BZ_MSO), spacecraft position coordinates (X_MSO, Y_MSO, Z_MSO) and planetary velocity components (VX, VY, VZ) were found to be most informative. In addition, three meta features, namely EXTREMA, COSALPHA and RHO_DIPOLE, were selected.
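A correlation-based screening of this kind can be sketched as follows, assuming the data sit in a pandas DataFrame with a numeric ''label'' column; the file name and column layout are only illustrative of the MESSENGER features listed above.

<code python>
# Sketch of correlation-based feature screening (illustrative, not the exact procedure used).
import pandas as pd

def rank_features_by_correlation(df: pd.DataFrame, target: str = "label") -> pd.Series:
    """Rank feature columns by absolute Pearson correlation with the target column."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# Usage (hypothetical file name):
# df = pd.read_csv("messenger_orbit.csv")
# print(rank_features_by_correlation(df).head(12))
</code>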
  
In the feature preparation stage, sliding windows of variable size (3 seconds to 3 minutes) with a hop size of 1 second were computed over the time series signal to obtain [[:glossary#feature_vector|feature vectors]]. Finally, the [[:glossary#feature|features]] were normalised to have a mean of 0 and a standard deviation of 1. No other pre-processing or [[:glossary#feature_engineering|engineering]] was applied in order to allow the [[:glossary#deep_learning|deep learning]] model to [[:glossary#feature_engineering|engineer features]] implicitly.
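A minimal sketch of this windowing and normalisation, assuming a regular 1 Hz time series stored as a NumPy array of shape ''(n_samples, n_features)''; the 30 s window and 1 s hop mirror the values in the text, everything else is illustrative.

<code python>
# Sliding-window feature preparation with zero-mean, unit-variance normalisation.
import numpy as np

def standardise(x: np.ndarray) -> np.ndarray:
    """Normalise each feature column to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def sliding_windows(x: np.ndarray, window: int = 30, hop: int = 1) -> np.ndarray:
    """Cut (n_samples, n_features) into overlapping windows of shape (window, n_features)."""
    starts = range(0, len(x) - window + 1, hop)
    return np.stack([x[s:s + window] for s in starts])

# windows = sliding_windows(standardise(raw_series))  # -> (n_windows, 30, n_features)
</code>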
  
The windowed features are first fed into a block of 3 [[:glossary#convolutional_layer|convolutional layers]] with 1D filters, each followed by [[:glossary#batch_normalisation|batch normalisation]] and [[:glossary#rectified_linear_unit_relu|Rectified Linear Unit (ReLU) activations]]. The [[:glossary#activation_function|activations]] obtained at the end of the [[:glossary#convolution_neural_network_cnn|CNN]] block are then passed to the recurrent block with two layers of [[:glossary#long_short-term_memory_lstm|LSTMs]]. The final activations are passed to a fully connected layer with softmax activations. The objective function used for training is categorical cross-entropy, with the Adam optimizer.
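A hedged Keras sketch of this architecture (3 x Conv1D + batch normalisation + ReLU, two LSTM layers, dense softmax head, categorical cross-entropy with Adam) is shown below. The filter counts, kernel sizes, LSTM units and class count are illustrative choices, not the values used in the published pipeline; the number of input features (12) follows from the nine signal features and three meta features listed above.

<code python>
# Illustrative CNN-LSTM classifier in Keras; hyperparameters are assumptions, not published values.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(window: int = 30, n_features: int = 12, n_classes: int = 5) -> tf.keras.Model:
    inputs = layers.Input(shape=(window, n_features))
    x = inputs
    for filters in (64, 64, 64):                   # CNN block: Conv1D -> BatchNorm -> ReLU
        x = layers.Conv1D(filters, kernel_size=3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    x = layers.LSTM(64, return_sequences=True)(x)  # recurrent block: two LSTM layers
    x = layers.LSTM(64)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
</code>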
  
The window size used in these experiments is 30 seconds. Overall, the [[:glossary#predictor|predictor]] **achieves a macro [[:glossary#f1|F1 score]] of about 80%** on the bow shock and the magnetopause crossings on a randomly sampled test set of 300 orbits. None of the orbits overlap between the train and test sets.
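The macro F1 score averages per-class F1 values with equal weight, so performance on the rare bow shock and magnetopause classes is not drowned out by the majority classes. A quick illustration with purely synthetic labels:

<code python>
# Macro F1 = unweighted mean of per-class F1 scores (synthetic example, not project data).
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 0, 1, 0, 1, 1, 2, 0, 2, 0]
print(f1_score(y_true, y_pred, average="macro"))
</code>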
  
Results of this science case were presented at the {{:wiki:egu2021-lavrukhin_etal.pdf|EGU21}} as well as at {{:wiki:epsc2021-mercuryboundaries.pdf|EPSC2021}}. This ML pipeline was presented in a [[https://github.com/epn-ml/EPSC2021-MercuryBoundaries-workshop|workshop at the EPSC2021]] and is available on [[https://github.com/epn-ml/LMSU-Mercury_boundaries|our GitHub repository]]. This work was submitted to and accepted by the ECML PKDD 2022 conference and will be published in the proceedings.