Reinforcement Learning for the Control of Mechatronic Systems

- The objective of this thesis, carried out in collaboration with Siemens Digital Industry Software, is to investigate whether Reinforcement Learning techniques can be applied to the control of complex systems, and to understand how the choice of algorithm and hyperparameters affects control performance. Because this is a relatively new application of the technology, there is no a priori guarantee that it will work as expected, nor an established method for optimizing the algorithms.

- The first part of the thesis presents, without going into great depth, the theoretical elements needed to understand how Reinforcement Learning works, together with the differences between the three algorithms used most in this work. Next, the main tools used during the activities are described. Simcenter Amesim, a mechatronic system development software, is used to model the physical systems to be controlled. Simcenter Studio, a generative design software, is used to write and run Python code that, with the help of dedicated Reinforcement Learning libraries, runs interoperable simulations with Amesim to evaluate the effect of applying any control algorithm to a 1D physical model. The results of several simulation studies are then presented, in particular:
• a first study on a model of an inverted pendulum on a cart, in which the performance of a classic PID feedback controller is compared with that of a Q-Learning algorithm;
• a second study on a model of a passenger elevator, in which the effectiveness of two other Reinforcement Learning algorithms, Deep Q-Learning (DQN) and DDPG, is compared.
In practice, after carefully preparing the Python code for the desired simulations, the goal is to analyze how well the control system tracks its setpoints as the algorithm, and the parameters it uses during learning, are varied. Several combinations are tested, with the aim of determining experimentally the effect of each parameter on control quality and of identifying the best compromise for meeting the stated objectives.
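As a rough illustration of the first comparison, the sketch below shows the core update of each controller in plain Python: the tabular Q-Learning rule with ε-greedy exploration, and a discrete-time PID step. It is a minimal sketch under stated assumptions, not the code used in the thesis: the discretized state and action indices, the names `q_learning_update`, `epsilon_greedy` and `PID`, and the hyperparameters `alpha`, `gamma` and `epsilon` are all hypothetical, and the Simcenter Studio/Amesim co-simulation interface is omitted entirely.

```python
import random

# Hypothetical helpers for illustration only; not the thesis implementation.

def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-Learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    # Bootstrapped target; no future value is added after a terminal transition.
    target = reward if done else reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    n_actions = len(q_table[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = q_table[state]
    return max(range(n_actions), key=row.__getitem__)


class PID:
    """Discrete-time PID controller (parallel form, no anti-windup or filtering)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # No derivative term on the first call, when no previous error exists.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

The contrast is the point of the comparison: the PID needs only its three gains tuned and acts on the error directly, while Q-Learning must learn a value for every (state, action) pair, and its behaviour depends on learning parameters such as `alpha`, `gamma` and `epsilon`, which are typical examples of the kind of parameters whose effect is explored experimentally.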
