In scientific computing, for example in bioinformatics, we observe a dramatic increase in the amount of data that is produced. Even with modern high-performance computers, storage capabilities often form the bottleneck in computing more detailed results. We focus our research on parallel file systems as found in cluster environments.
The critical issue here is the overall performance that the system delivers. Scalability is a problem because metadata operations pass through sequential parts of the parallel file system's code. What is needed are tools that give insight into the internal behavior of parallel file systems and relate this information to the user level. With such tools we can see which activity in the user program triggers which low-level read/write operations.
The talk will present an enhanced tool environment that is based on PVFS2 and MPICH2. We add tracing facilities to the parallel file system and can thus investigate its behavior and relate it to the parallel user program.
The second part of the talk will present results from a test on a cluster at the German Cancer Research Center, where we investigate different parallel file systems for image processing. It is interesting to see which aspects have to be considered when dealing with tens of millions of individual files. Even simple ls operations take a long time to complete, and with parallel file systems things can get even worse.
Thomas Ludwig received his habilitation degree from Technische Universitaet Muenchen, in Munich, Germany, where he worked for 13 years in the field of parallel computing with a focus on load balancing, development tools, and cluster and tool infrastructures.
He also conducted research in the field of parallel programming, notably in computed tomography and bioinformatics. Since 2001 he has been a professor of computer science at the Ruprecht-Karls-Universitaet Heidelberg in Heidelberg, Germany.
His current research focuses on high-performance parallel input/output systems for cluster environments.