Machine-Learning Applied Methods

Author

Mauricio Palacio, Sebastián

Director

Borrell, Joan-Ramon

Tutor

Guillén, Montserrat

Date of defense

2020-07-23

Pages

210 p.



Department/Institute

Universitat de Barcelona. Facultat d'Economia i Empresa

Abstract

This thesis covered several topics: each chapter introduced an economic prediction problem and showed how traditional approaches can be complemented with new techniques such as machine learning (ML) and deep learning (DL). These powerful tools, combined with principles of economic theory, are greatly expanding the scope for empiricists. Chapter 3 addressed this discussion by progressively moving from Ordinary Least Squares, Penalized Linear Regressions and Binary Trees to advanced ensemble trees. The results showed that ML algorithms significantly outperform statistical models in terms of predictive accuracy; specifically, ML models perform 49-100% better than unbiased methods. However, we cannot rely on their parameter estimates. For example, Chapter 4 introduced a pure prediction problem regarding fraudulent property claims in insurance. Although we obtained extraordinary results in terms of predictive power, the complexity of the problem prevented us from gaining behavioral insight. By contrast, statistical models are easily interpretable: coefficients give us the sign, the magnitude and the statistical significance, and we can learn about behavior from marginal effects and elasticities. Chapter 5 analyzed another prediction problem in the insurance market, namely how combining self-reported data with risk categorization could improve the detection of risky prospective customers. The results were also impressive in terms of prediction, but again, we learned nothing about the direction or the magnitude of the features' effects. However, by using a Probit model, we showed the benefits of combining statistical models with ML and DL models: the Probit model gave us generalizable insights into which types of customers are likely to misreport, enhancing our results. Likewise, Chapter 2 is a clear example of how causal inference can benefit from ML and DL methods.
These techniques allowed us to detect abnormal behavior in daily prices 70 days before each auction. Having identified this, we could apply a solid statistical model and precisely estimate the net effect of the mandated auctions in Spain. This thesis aims to combine the advantages of both methodologies, machine learning and econometrics, boosting their strengths and attenuating their weaknesses. Thus, we used ML and statistical methods side by side, exploring both predictive performance and interpretability. Several conclusions can be drawn from the nature of both approaches. First, as observed throughout the chapters, ML and traditional econometric approaches solve fundamentally different problems. We use ML and DL techniques to predict, not in the sense of traditional forecasting, but by making our models generalizable to unseen data. Traditional econometrics, on the other hand, has focused on causal inference and parameter estimation. Therefore, ML is not replacing traditional techniques but complementing them. Second, ML methods focus on out-of-sample performance, whereas statistical models typically focus on in-sample goodness-of-fit. It is therefore not surprising that ML techniques consistently outperformed traditional techniques in terms of predictive accuracy; the cost is biased estimators. Third, the tradition in economics has been to choose a single model based on theoretical principles and fit it to the full dataset, thereby obtaining unbiased estimators and their confidence intervals. ML, by contrast, relies on data-driven model selection and does not address causal inference: instead of the researcher choosing the covariates manually, the data determine the functional form. This is also the main weakness of ML, namely its inability to support inference about the underlying data-generating process; that is, we cannot derive economically meaningful conclusions from the coefficients.
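A toy illustration of the second point, written in Python as a sketch rather than any model used in the thesis: the synthetic linear data, the one-regressor OLS fit and the 1-nearest-neighbour regressor (standing in for a flexible ML learner) are all assumptions chosen for brevity. A model that memorizes its training data attains a perfect in-sample fit, and only a held-out test set reveals how well it generalizes.

```python
import random


def make_data(n, rng):
    # Hypothetical linear data-generating process: y = 2x + 1 + Gaussian noise (sd = 1)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    # Closed-form simple OLS: slope = cov(x, y) / var(x)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_1nn(xs, ys):
    # 1-nearest-neighbour regressor: it memorizes the training set,
    # so every training point is predicted by its own label
    train = list(zip(xs, ys))

    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]

    return predict


def mse(model, xs, ys):
    # Mean squared prediction error of `model` on (xs, ys)
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(seed=0, n_train=500, n_test=500):
    # Fit both models on the training split, then score in-sample and out-of-sample
    rng = random.Random(seed)
    x_tr, y_tr = make_data(n_train, rng)
    x_te, y_te = make_data(n_test, rng)
    results = {}
    for name, fit in (("ols", fit_ols), ("1nn", fit_1nn)):
        model = fit(x_tr, y_tr)
        results[name] = {
            "in_sample": mse(model, x_tr, y_tr),
            "out_of_sample": mse(model, x_te, y_te),
        }
    return results


if __name__ == "__main__":
    for name, r in compare().items():
        print(f"{name}: in-sample MSE={r['in_sample']:.3f}, "
              f"out-of-sample MSE={r['out_of_sample']:.3f}")
```

Under these assumptions the 1-nearest-neighbour model shows zero in-sample error but a larger out-of-sample error than OLS, because the simulated relationship is linear; with a strongly nonlinear data-generating process the ranking could reverse, which is why held-out error, not in-sample fit, is the relevant yardstick for prediction.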
Focusing on out-of-sample performance comes at the expense of the ability to infer causal effects, owing to the lack of standard errors on the coefficients. Predictors are therefore typically biased, and estimators may not be normally distributed. We can thus conclude that, in terms of out-of-sample performance, it is hard to compete with ML models. However, ML cannot match the powerful insights that causal inference analysis provides, which allow us not only to identify the most important variables and their magnitudes but also to understand economic behavior.
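The interpretability attributed above to statistical models can be made concrete with a Probit model, where P(y = 1 | x) = Φ(x'β) and the marginal effect of covariate j is φ(x'β)·β_j, so its sign always matches that of β_j. The following Python sketch is illustrative only: the helper functions are hand-written rather than a library API, and the coefficient values and sample rows in the usage section are hypothetical, not estimates from the thesis.

```python
import math


def std_normal_pdf(z):
    # Standard normal density phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    # Standard normal CDF Phi(z), via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_index(beta, x):
    # x'beta; x is assumed to include a leading 1.0 for the intercept
    return sum(b * xi for b, xi in zip(beta, x))


def probit_prob(beta, x):
    # P(y = 1 | x) = Phi(x'beta)
    return std_normal_cdf(linear_index(beta, x))


def marginal_effects(beta, x):
    # dP/dx_j = phi(x'beta) * beta_j at a single covariate vector x.
    # Entry 0 belongs to the intercept and is usually not reported.
    density = std_normal_pdf(linear_index(beta, x))
    return [density * b for b in beta]


def average_marginal_effects(beta, rows):
    # Average of the point-wise marginal effects over the sample rows
    per_row = [marginal_effects(beta, x) for x in rows]
    return [sum(me[j] for me in per_row) / len(rows) for j in range(len(beta))]


if __name__ == "__main__":
    # Hypothetical coefficients: intercept, x1, x2
    beta = [-0.5, 0.8, -0.3]
    sample = [[1.0, 0.2, 1.5], [1.0, 1.0, 0.0], [1.0, -0.4, 2.0]]
    ame = average_marginal_effects(beta, sample)
    print("AME of x1:", round(ame[1], 4), "AME of x2:", round(ame[2], 4))
```

Averaging the point-wise effects over the sample (average marginal effects) is one common convention; evaluating them at the sample means is another, and the two generally differ because φ is nonlinear.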

Keywords

Machine learning; Economic theory

Subjects

33 - Economics. Economic science

Knowledge Area

Legal, Economic and Social Sciences

Note

Doctoral Programme in Economics

Documents

SMP_PhD_THESIS.pdf

4.871Mb


Rights

NOTICE. All rights reserved. Access to the contents of this doctoral thesis and their use must respect the rights of the author. It may be used for personal consultation or study, as well as in research or teaching activities or materials, under the terms established in Article 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site other than the TDX service. Presentation of its content in a window or frame external to TDX (framing) is not authorized either. This reservation of rights applies both to the contents of the thesis and to its abstracts and indexes.
