Universitat Autònoma de Barcelona. Doctoral Programme in Computer Science
New approaches are necessary to generate performance models for current systems because of the heterogeneity found in them. An alternative to traditional analytical models is the use of machine learning algorithms, which may help to automatically create performance models that predict the correct configuration for one or more of an application's parameters. To build performance models, metrics are used as inputs to calculate or select the proper values for one or more parameters that can impact performance. Selecting the correct metrics is important, as the information they carry can be redundant or insufficient. In addition, multiple scenarios, such as different problem sizes, should be taken into consideration when generating models in order to capture the behaviour under different conditions; this makes it possible to generalize the relationships between metrics and avoid relationships tailored to only one scenario. In this thesis we tackle these two problems for multi-threaded applications using OpenMP by developing two methodologies. First, a methodology is developed to find the proper set of metrics for characterizing the behaviour of a parallel code region. This methodology reduces the number of metrics necessary to correctly characterize an application or a code region, decreasing the overhead of measuring them. We use hardware performance counters as the metrics that characterize the execution of OpenMP parallel regions. Using this methodology, the number of hardware performance counters was reduced to less than half of the available general-purpose counters while avoiding loss of information. The second methodology is developed to build a representative and balanced dataset of patterns found in parallel applications. 
Given a set of candidate parallel regions to be included in a dataset for performance tuning, each candidate is compared against the patterns already included in the dataset to determine whether it covers a different region of the search space. This comparison is based on a correlation analysis of the metrics measured for the candidate. For example, in one of the tested systems, a dataset was generated with only 8 patterns from 33 parallel kernels extracted from the STREAM and PolyBench benchmarks. The dataset generated in this way becomes imbalanced when used for performance tuning because, in a given system, some parameter values generally provide better performance than others. Consequently, machine learning algorithms may underperform due to underrepresented cases, and techniques to counter this natural imbalance are necessary. An initial study is provided to find which machine learning algorithms provide better accuracy for tuning the number of threads. This study includes: a) data methods, which balance the dataset for the target parameter; b) algorithmic methods, which modify how the error is calculated; and c) ensemble methods, which combine multiple models into a bigger one, providing a general hypothesis from the individual models.
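The correlation-based comparison of candidate regions against the dataset can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the Pearson coefficient, the 0.9 threshold, the function names `pearson` and `select_patterns`, and the metric values are all assumptions made for illustration. The kernel names are borrowed from STREAM and PolyBench, but the numbers are not measurements.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long metric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / sqrt(var_x * var_y)


def select_patterns(candidates, threshold=0.9):
    """Keep a candidate only if no pattern already kept correlates with it.

    `threshold` is a hypothetical cut-off, not a value from the thesis.
    """
    selected = {}
    for name, metrics in candidates.items():
        covered = any(
            pearson(metrics, kept) >= threshold for kept in selected.values()
        )
        if not covered:
            selected[name] = metrics
    return selected


# Illustrative (made-up) metric vectors, one value per hardware counter.
candidates = {
    "copy": [1.0, 2.0, 3.0, 4.0],
    "scale": [2.0, 4.0, 6.0, 8.0],  # same shape as "copy", so it is redundant
    "gemm": [4.0, 1.0, 3.0, 2.0],
}

dataset = select_patterns(candidates)
print(sorted(dataset))
```

In this sketch a candidate whose metric vector correlates strongly with an already selected pattern is treated as covering the same region of the search space and is skipped, which is how a large set of kernels can shrink to a small set of representative patterns.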
Machine learning; Performance tools; High performance computing
004 - Computer science
Experimental Sciences