The use of Java in large scientific applications in HPC environments

dc.contributor
Universitat de Barcelona. Departament d'Astronomia i Meteorologia
dc.contributor.author
Fries, Aidan
dc.date.accessioned
2013-01-24T11:23:45Z
dc.date.available
2013-01-24T11:23:45Z
dc.date.issued
2013-01-21
dc.identifier.uri
http://hdl.handle.net/10803/98405
dc.description.abstract
Java is a widely used programming language, although its use amongst the scientific and High Performance Computing (HPC) communities remains relatively low. In this thesis, the option of using Java for developing scientific applications intended for execution in HPC environments is investigated. The data reduction pipeline for the Gaia space astronomy mission is an example of a large software project that has been written in Java and will run in HPC environments. The efficient execution of the Gaia data reduction pipeline was one of the main motivations behind this thesis, although this thesis largely remains a general investigation into the use of Java in HPC. HPC is a fast-changing field in terms of hardware, software, and the scale of the problems being tackled. Amongst the most significant trends in HPC in recent years have been the increase in the number of cores per computing node and the increase in the size of the datasets that must be processed. A significant challenge in HPC is ensuring that data is available on a particular node when a core is ready to process it, thereby avoiding dead time and providing high throughput. One danger to throughput is a decrease in the performance of shared storage devices as the number of concurrent processes accessing those devices increases. Given the trends mentioned above, efficient data communication is very important for many applications running in HPC environments. In this thesis, we present an investigation into the current options for providing efficient data communication to Java applications in HPC environments. We investigate a number of implementations of Message Passing in Java (MPJ) and compare their performance. We present a new communication middleware application, called MPJ-Cache. This middleware makes use of an underlying MPJ implementation and adds prefetching, caching, and file-splitting functionality.
It presents application developers with a high-level API, thus providing high performance while also enabling high productivity. We compare the aggregate data rate that can be achieved through the use of this middleware against that which can be achieved through direct access to a high-performance shared storage device (GPFS) while distributing data amongst the nodes of a computer cluster. The use of MPJ-Cache has been shown to provide an aggregate data rate of up to 103 Gbps. Java applications are executed within a Java Virtual Machine (JVM), which is a managed runtime environment. The execution of applications within such a runtime environment is very different from the execution of native code that was compiled ahead of time. The Java runtime environment consists of several sophisticated components, including the core runtime system, a garbage collector, and a Just-In-Time (JIT) compiler. Modern JVMs strive to provide high performance out of the box; however, in some situations users may want to tune the JVM to better suit the behaviour and needs of a particular application. In order to do this, a profile of the target application should be obtained.
cat
dc.format.extent
240 p.
cat
dc.format.mimetype
application/pdf
dc.language.iso
eng
cat
dc.publisher
Universitat de Barcelona
dc.rights.license
WARNING. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of exploitation for profit are not authorized, nor is public communication of the thesis from a site outside the TDX service. Presentation of its content in a window or frame outside TDX (framing) is also not authorized. This reservation of rights applies both to the contents of the thesis and to its abstracts and indexes.
dc.source
TDX (Tesis Doctorals en Xarxa)
dc.subject
Java (Llenguatge de programació)
cat
dc.subject
Java (Lenguaje de programación)
cat
dc.subject
Java (Computer program language)
cat
dc.subject
HPC
cat
dc.subject
MPJ
cat
dc.subject
Gaia (Space astronomy mission)
cat
dc.subject
Gaia (Misión astronómica espacial)
cat
dc.subject
Gaia (Missió astronòmica espacial)
cat
dc.subject.other
Ciències Experimentals i Matemàtiques
cat
dc.title
The use of Java in large scientific applications in HPC environments
cat
dc.type
info:eu-repo/semantics/doctoralThesis
dc.type
info:eu-repo/semantics/publishedVersion
dc.subject.udc
52
cat
dc.contributor.director
Portell de Mora, Jordi
dc.contributor.director
Sirvent Pardell, Raül
dc.contributor.tutor
Luri Carrascoso, Xavier
dc.embargo.terms
none
cat
dc.rights.accessLevel
info:eu-repo/semantics/openAccess
dc.identifier.dl
B. 3746-2013
cat


Documents

FRIES_PhD_THESIS.pdf

3.959Mb PDF
