Evaluating big data efficiently


The German Research Foundation is funding a new research unit with RUB participation.

Resolving the contradiction between statistical optimality and computational tractability in the analysis of Big Data is the goal of Research Unit 5381 "Mathematical Statistics in the Information Age - Statistical Efficiency and Computational Tractability". It will be funded with 2.1 million euros for four years starting in 2022. Prof. Dr. Holger Dette from the Faculty of Mathematics heads the research unit together with former RUB professor Dr. Angelika Rohde (now University of Freiburg) and is involved in two of its five subprojects. The coordinating university is the Albert Ludwigs University of Freiburg. The University of Potsdam, the University of Vienna (Austria), the University of Rostock, the Georg August University of Göttingen and the Humboldt University of Berlin are also members of the research unit.

In the Big Data age, data is ubiquitous and often generated automatically, for example in medical diagnostics, in the EU's Copernicus earth observation program or in social networks. Its correct analysis provides important insights into medical, scientific, ecological or economic relationships. Mathematical statistics has developed efficient methods for such evaluations. However, these statistically optimal methods cannot be applied to very large data volumes: even on high-performance computers, they take too long to deliver results.

New statistical methods

The goal of the research unit is to ensure that large amounts of data can be analyzed in a reasonable time using the best possible methods, and that the resulting findings can be used reliably and quickly. To this end, the team studies all sequential data-processing steps jointly, in order to obtain the strongest possible statistical statements at each sub-step.

Subprojects with RUB participation

Subproject 1 "Practically computable bootstrap methods for high-dimensional data" deals with quantifying the uncertainty of estimates computed from high-dimensional data, in which the relation between dimension and sample size typically plays a central role. In high dimensions, the classical computer-intensive resampling methods (the bootstrap) are neither statistically consistent, meaning they yield incorrect results, nor computationally feasible. The project therefore develops alternative, implementable methods that allow a valid quantification of uncertainty, for example via confidence intervals.
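To make the computational issue concrete, here is a minimal, self-contained Python sketch of a plain nonparametric percentile bootstrap. It is an illustration only, not the subproject's method: the statistic (the largest absolute coordinate of the sample mean, a quantity typical of high-dimensional inference) and all parameter values are assumptions for the example. The resampling loop makes visible why the cost grows with both the number of replicates B and the dimension p.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(X, stat, B=1000, alpha=0.05):
    """Percentile bootstrap interval for stat(X); X has shape (n, p)."""
    n = X.shape[0]
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        reps[b] = stat(X[idx])            # recompute the statistic B times
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

# Hypothetical example: max absolute coordinate of the sample mean,
# with many more dimensions (p) than observations (n).
X = rng.normal(size=(200, 1000))
lo, hi = bootstrap_ci(X, lambda A: np.abs(A.mean(axis=0)).max())
print(f"95% bootstrap interval: [{lo:.4f}, {hi:.4f}]")
```

Each of the B iterations touches all n times p entries, so for truly high-dimensional data this brute-force approach quickly becomes infeasible; closing that gap with valid, implementable alternatives is exactly what the subproject targets.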

Subproject 4 "Sublinear methods with statistical guarantees" addresses the question of whether informative sub-samples can be identified within large data sets, such that the relevant statistical information can be extracted from them in acceptable computing time and with approximately the same accuracy as from the full sample, which itself cannot be analyzed because the algorithms' running times would be too long. A second focus of the subproject is new, efficient methods for detecting changes in signals quickly and reliably.
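As a rough illustration of the subsampling idea, the following Python snippet fits a linear regression on a small, uniformly drawn sub-sample and compares the result with the full-sample least-squares fit. This is a minimal sketch under assumed conditions (the linear model, the uniform sampling scheme and all sizes are invented for the example), not the subproject's method:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, m = 1_000_000, 10, 5_000   # full sample size, dimension, sub-sample size
beta_true = rng.normal(size=p)   # hypothetical true coefficients
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Full-sample least squares: the cost grows at least linearly in n.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sublinear alternative: the same fit on m << n uniformly drawn rows.
idx = rng.choice(n, size=m, replace=False)
beta_sub, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

print("estimation error, full sample:", np.linalg.norm(beta_full - beta_true))
print("estimation error, sub-sample :", np.linalg.norm(beta_sub - beta_true))
```

In this benign setting the sub-sample estimate is only slightly less accurate while using a fraction of the data; the hard part, and the subject of the subproject, is identifying informative sub-samples and proving statistical guarantees for the resulting estimates.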


The original article appeared in the RUB News.

[Symbolic image: RAM. © RUB, Marquard]