====== Glossary of machine learning terms ======

===== A =====

==== accuracy ====
Percentage of correct [[:glossary#prediction|predictions]] by a [[:glossary#classification|classification]] model. It is defined as $$ \frac{TP + TN}{TP + FP + FN + TN}\,.$$ TP…[[:glossary#true_positive_tp|true positive]], TN…[[:glossary#true_negative_tn|true negative]], FP…[[:glossary#false_positive_fp|false positive]], FN…[[:glossary#false_negative_fn|false negative]]

==== activation function ====
A function that defines the output of a layer in a [[:glossary#neural_network|neural network]] given an input from the previous layer (e.g. [[:glossary#rectified_linear_unit_relu|ReLU]]).

==== active learning ====
An ML approach in which the algorithm chooses the data it learns from. Active learning is particularly useful when there is a lot of unlabeled data and manual labeling is very expensive. Often, fewer labeled examples are needed than when blindly collecting a diverse range of labeled examples for normal [[:glossary#supervised_learning|supervised learning]].

===== B =====

==== batch normalisation ====
A method that makes the training of a [[:glossary#deep_neural_network|deep neural network]] faster and more stable. It consists of normalising the input and output of an [[:glossary#activation_function|activation function]] in a [[:glossary#hidden_layer|hidden layer]].

===== C =====

==== class ====
One of a set of target values for a [[:glossary#label|label]].

==== classification ====
The [[:glossary#prediction|prediction]] of a model is a category, i.e. a discrete [[:glossary#class|class]].

==== clustering ====
Grouping of data, particularly during [[:glossary#unsupervised_learning|unsupervised learning]]. Many clustering algorithms exist.

==== convolutional layer ====
A layer in a [[:glossary#deep_neural_network|deep neural network]] in which a convolutional filter passes over the input matrix.
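The sliding-filter operation described under //convolutional layer// can be sketched in plain NumPy. This is a minimal illustration with a single filter, valid padding and stride 1; the function name ''conv2d'' and the toy image and kernel are assumptions for the example, not part of the glossary:

<code python>
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over a 2D input (valid padding, stride 1),
    as done inside a convolutional layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height
    ow = image.shape[1] - kw + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # element-wise product of the filter with the current patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input matrix
kernel = np.ones((2, 2)) / 4.0                     # 2x2 averaging filter
feature_map = conv2d(image, kernel)                # 3x3 output map
</code>

A 4x4 input convolved with a 2x2 filter yields a 3x3 feature map; real CNN layers apply many such filters in parallel and learn the filter weights during training.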
==== convolutional neural network (CNN) ====
A neural network in which at least one layer is a [[:glossary#convolutional_layer|convolutional layer]].

==== cross-validation ====
A method to estimate how well a model will generalise to new data. In cross-validation, the model is trained on a subset of the data and then validated on the remaining, non-overlapping subset, e.g. [[:glossary#k-fold_cross-validation|k-fold cross-validation]].

===== D =====

==== data imbalance ====
When the [[:glossary#label|labels]] of the different [[:glossary#class|classes]] occur with significantly different frequencies in the data set. Also termed a class-imbalanced data set.

==== deep learning ====
A family of ML methods based on [[:glossary#deep_neural_network|deep neural networks]].

==== deep neural network ====
A type of [[:glossary#neural_network|neural network]] containing multiple [[:glossary#hidden_layer|hidden layers]].

===== E =====

==== early stopping ====
A regularisation method that ends training when the performance on a validation set stops improving, in order to avoid overfitting.

==== epoch ====
One full pass of the training algorithm over the entire data set; the number of epochs describes how many times the algorithm sees the whole data set.

===== F =====

==== F1 ====
The harmonic mean of [[:glossary#precision|precision]] and [[:glossary#recall|recall]], defined as $$ \frac{2\,TP}{2\,TP + FP + FN}\,.$$

==== false negative (FN) ====
An example for which the model mistakenly predicts the negative [[:glossary#class|class]].

==== false positive (FP) ====
An example for which the model mistakenly predicts the positive [[:glossary#class|class]].

==== false positive rate (FPR) ====
The fraction of actual negatives that are mistakenly predicted as positive: $$ \frac{FP}{FP + TN}\,.$$

==== feature ====
An input variable for making [[:glossary#prediction|predictions]].

==== feature engineering ====
The process of converting data into useful [[:glossary#feature|features]] for training a model.

==== feature selection ====
The process of selecting relevant [[:glossary#feature|features]] from a data set.

==== feature vector ====
A list of [[:glossary#feature|features]] passed into a model.

===== G =====

===== H =====

==== hidden layer ====
Artificial layer in a [[:glossary#neural_network|neural network]] between the input and output layers. Typically, hidden layers contain [[:glossary#activation_function|activation functions]].

==== hierarchical agglomerative clustering ====
A clustering approach that creates a tree of clusters, specifically well-suited for hierarchically organised data. In a first step, the algorithm assigns a cluster to each example.
In a second step, it iteratively merges the closest clusters to create a hierarchical tree.

==== hyperparameters ====
Higher-level properties of a model, such as the learning rate (how fast it can learn) or the number of [[:glossary#hidden_layer|hidden layers]].

===== I =====

===== J =====

===== K =====

==== k-fold cross-validation ====
The training set is split into k smaller, non-overlapping subsets (folds). The model is trained on (k-1) of the folds and validated on the remaining fold. This is repeated k times, so that each fold serves exactly once as the validation set. The performance measure reported by k-fold cross-validation is the average of the results over all k folds.

===== L =====

==== label ====
The target value of an example, i.e. the "answer" a model should learn to predict in [[:glossary#supervised_learning|supervised learning]].

==== long short-term memory (LSTM) ====
A type of recurrent neural network unit with gates that control which information is kept, used and forgotten, allowing the network to learn dependencies over long sequences.

==== loss ====
A measure of how far a model's [[:glossary#prediction|predictions]] are from the [[:glossary#label|labels]]; training aims to minimise this quantity.

===== M =====

==== model ====
The representation of what has been learned from the training data; it maps an input [[:glossary#feature_vector|feature vector]] to a [[:glossary#prediction|prediction]].

==== multi-class classification ====
A [[:glossary#classification|classification]] task with more than two [[:glossary#class|classes]].

===== N =====

==== neural network ====
A model consisting of layers of connected units (neurons), where each unit applies an [[:glossary#activation_function|activation function]] to a weighted sum of its inputs.

===== P =====

==== precision ====
The fraction of positive [[:glossary#prediction|predictions]] that are correct: $$ \frac{TP}{TP + FP}\,.$$

==== prediction ====
Output of a [[:glossary#model|model]].

==== predictor ====
An input variable used for making [[:glossary#prediction|predictions]]; often used as a synonym for [[:glossary#feature|feature]].

===== R =====

==== recall ====
The fraction of actual positives that are correctly predicted: $$ \frac{TP}{TP + FN}\,.$$

==== rectified linear unit (ReLU) ====
An [[:glossary#activation_function|activation function]] defined as follows:
  * If the input is negative or zero, the output is zero.
  * If the input is positive, the output is equal to the input.

===== S =====

==== supervised learning ====
A [[:glossary#label|labeled data set]] is used to train a [[:glossary#model|model]].

===== T =====

==== true negative (TN) ====
An example for which the model correctly predicts the negative [[:glossary#class|class]].

==== true positive (TP) ====
An example for which the model correctly predicts the positive [[:glossary#class|class]].

===== U =====

==== unsupervised learning ====
A model is trained on unlabeled data to find patterns or structure, e.g. by [[:glossary#clustering|clustering]].
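The k-fold cross-validation procedure described above can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration: the helper ''cross_validate'' and the toy mean-predicting model are assumptions made for the example, not part of the glossary (in practice a library such as scikit-learn would be used).

<code python>
import numpy as np

def cross_validate(X, y, k, fit, score, seed=0):
    """Train on (k-1) folds, validate on the held-out fold,
    and average the k validation scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)  # k non-overlapping folds
    scores = []
    for i in range(k):
        val = folds[i]                                   # held-out validation fold
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])                  # train on the other k-1 folds
        scores.append(score(model, X[val], y[val]))
    return float(np.mean(scores))                        # average over all k folds

# Toy "model": always predict the mean label seen during training.
fit = lambda X, y: y.mean()
score = lambda m, X, y: -np.mean((y - m) ** 2)  # negative mean squared error

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
avg_score = cross_validate(X, y, k=5, fit=fit, score=score)
</code>

Each example lands in exactly one validation fold, so every data point is used for validation once and for training (k-1) times.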