Modeling and analyzing opinions from customer reviews

Author

García Moya, Lisette

Director

Berlanga Llavorí, Rafael

Date of defense

2016-01-11

Pages

128 p.



Department/Institute

Universitat Jaume I. Departament de Llenguatges i Sistemes Informàtics

Abstract

The main motivation behind this thesis is the problem of aspect-based sentiment summarization and its application to Business Intelligence (BI). Given a collection of opinion posts, aspect-based summarization consists of extracting the most relevant opined aspects (also called features) from the collection, together with their associated sentiment information (usually an opinion word and/or a polarity score that expresses the sentiment orientation of the opinion). In the current e-commerce landscape, we assume that BI can rely on knowledge extracted from reviews available on the Web to analyze recent trends, as well as customer satisfaction and behavior, and to prepare strategic plans accordingly. Specifically, this thesis proposes new methodologies to:

- model and extract opinions and their respective targets (i.e., aspects or features) from collections of opinion posts, and
- integrate the extracted sentiment data into a traditional corporate data warehouse to enable BI.

The modeling of opinions and their targets takes place within the general framework of statistical language modeling. The hypothesis is that there exists a language model of opinion words able to model the opinion lexicon of a domain, and that a language model of aspects can in turn be learned from the model of opinions. Both the learning of the models and the extraction of the sentiment data (i.e., feature-opinion tuples) are implemented with unsupervised approaches that require no exhaustive natural language processing beyond POS tagging and lemmatization. The resulting methodologies can be applied to any language and domain, given a seed set of general-domain opinion words.
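To make the idea of learning an aspect model from opinion words more concrete, here is a minimal sketch, not the thesis's actual estimation procedure. It assumes sentences are already POS-tagged and lemmatized into `(lemma, pos)` pairs with a simplified, hypothetical tag set (`"NOUN"`, `"ADJ"`, ...), and it uses the seed opinion words directly instead of a learned opinion model. The function names `aspect_model` and `extract_pairs`, the co-occurrence window, and the nearest-noun pairing rule are all illustrative assumptions.

```python
from collections import Counter
from typing import Iterable

# A sentence is a list of (lemma, pos) pairs; the tag set is a simplified,
# hypothetical one ("NOUN", "ADJ", "DET", ...).
Sentence = list[tuple[str, str]]


def aspect_model(sentences: Iterable[Sentence], opinion_words: set[str],
                 window: int = 3) -> dict[str, float]:
    """Unigram distribution over nouns that occur near a known opinion word."""
    counts: Counter[str] = Counter()
    for sent in sentences:
        for i, (lemma, _) in enumerate(sent):
            if lemma not in opinion_words:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                noun, pos = sent[j]
                if j != i and pos == "NOUN":
                    counts[noun] += 1
    total = sum(counts.values())
    # Relative frequencies: P(noun | near an opinion word).
    return {w: c / total for w, c in counts.items()} if total else {}


def extract_pairs(sent: Sentence, opinion_words: set[str],
                  aspects: dict[str, float], min_prob: float = 0.0,
                  window: int = 3) -> list[tuple[str, str]]:
    """Pair each opinion word with the nearest noun the aspect model accepts."""
    pairs = []
    for i, (lemma, _) in enumerate(sent):
        if lemma not in opinion_words:
            continue
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        # Keep only nouns whose aspect probability exceeds the threshold.
        candidates = [j for j in range(lo, hi)
                      if j != i and sent[j][1] == "NOUN"
                      and aspects.get(sent[j][0], 0.0) > min_prob]
        if candidates:
            nearest = min(candidates, key=lambda j: abs(j - i))
            pairs.append((sent[nearest][0], lemma))  # (feature, opinion)
    return pairs


# Usage on a toy, hand-tagged corpus.
corpus = [
    [("the", "DET"), ("screen", "NOUN"), ("be", "VERB"), ("great", "ADJ")],
    [("great", "ADJ"), ("battery", "NOUN"), ("but", "CONJ"),
     ("poor", "ADJ"), ("screen", "NOUN")],
    [("the", "DET"), ("screen", "NOUN"), ("be", "VERB"), ("bright", "ADJ")],
]
seeds = {"great", "poor"}
model = aspect_model(corpus, seeds)
print(extract_pairs(corpus[1], seeds, model))
```

On this toy corpus the model assigns equal probability to `screen` and `battery`, and the second sentence yields the feature-opinion tuples `("battery", "great")` and `("screen", "poor")`. Using the aspect model as a filter and proximity as the pairing rule keeps the sketch simple; a real system would also expand the opinion lexicon beyond the seeds.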
For the integration of sentiment data with traditional corporate data, two scenarios are considered: a static one, in which both the data sources and the user requirements are fixed and known in advance, and a dynamic one, based on an open data infrastructure in which BI data can be linked to external sources on demand, without being tied to predefined (rigid) data structures or multidimensional schemas. We demonstrate our proposal on datasets of real opinions available on the Web. The results corroborate the thesis claims and show that the proposed method is effective as a BI analysis tool.
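As a rough illustration of the static scenario, the sketch below joins extracted sentiment tuples with a corporate product dimension and rolls them up to a (category, aspect) level, as a fact table in a data warehouse might. It is an assumption-laden stand-in, not the schema used in the thesis: the `SentimentFact` record, the `product_dim` mapping, the `roll_up` function, and the polarity range of [-1, 1] are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class SentimentFact:
    """One extracted feature-opinion tuple linked to a product (hypothetical)."""
    product_id: str
    aspect: str
    opinion: str
    polarity: float  # assumed to lie in [-1, 1]


def roll_up(facts: Iterable[SentimentFact],
            product_dim: Mapping[str, str]) -> dict[tuple[str, str], tuple[int, float]]:
    """Aggregate facts to (category, aspect) -> (count, mean polarity)."""
    sums: defaultdict[tuple[str, str], float] = defaultdict(float)
    counts: defaultdict[tuple[str, str], int] = defaultdict(int)
    for f in facts:
        # Join with the corporate dimension; unmatched products go to "unknown".
        category = product_dim.get(f.product_id, "unknown")
        key = (category, f.aspect)
        sums[key] += f.polarity
        counts[key] += 1
    return {k: (counts[k], sums[k] / counts[k]) for k in counts}


# Usage with toy data.
product_dim = {"p1": "phones", "p2": "phones", "p3": "laptops"}
facts = [
    SentimentFact("p1", "screen", "great", 0.75),
    SentimentFact("p2", "screen", "poor", -0.25),
    SentimentFact("p3", "battery", "long", 0.5),
    SentimentFact("p1", "battery", "short", -0.4),
]
cube = roll_up(facts, product_dim)
for (category, aspect), (n, mean) in sorted(cube.items()):
    print(f"{category:8} {aspect:8} n={n} mean_polarity={mean:+.2f}")
```

Here the two phone screen opinions aggregate to a count of 2 with mean polarity 0.25. The dynamic scenario described above would instead link such tuples to external sources on demand rather than fixing the dimensions in advance.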

Keywords

Computer science

Subjects

004 - Computer science and technology. Computing. Data processing

Documents

2016_Tesis_García Moya_Lisette.pdf

3.486 MB


Rights

WARNING. All rights reserved. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research or teaching activities or materials under the terms established in Article 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site outside the TDX service. Presenting its content in a window or frame outside TDX (framing) is also not authorized. This reservation of rights applies to the contents of the thesis as well as to its abstracts and indexes.
