In the machine learning research line we deal with data problems arising in different domains: industry, biosciences, health, economics, etc. We pursue the development of new machine learning algorithms that can efficiently tackle these problems. In particular, we consider problems involving a variety of data types, from time series and streaming data to images and speech, and a wide range of modeling techniques and mathematical formalisms, such as probabilistic graphical models, Bayesian approaches and deep learning.
Our goal is to develop novel and efficient machine learning algorithms able to deal with new data-related practical problems. We also pursue the mathematical modeling of these algorithms in order to provide theoretical guarantees of their performance.
Machine Learning DS 003
The research carried out in the machine learning line is inspired by problems that arise in other scientific, technological or economic disciplines. To solve these problems, we develop new machine learning methods and algorithms for the main data analysis tasks, such as clustering, supervised classification and feature subset selection. Based on the specific characteristics of the problem at hand, we design tailored yet general algorithms that extract as much information as possible from the available data, yielding efficient machine learning models that solve the problem.
In addition, we develop mathematical tools to model the behavior and performance of the algorithms: their convergence, the estimation of their performance, and their requirements in terms of computational time and memory, among others.
In recent years, the machine learning line has worked on a range of machine learning problems and algorithms. We highlight in particular the work on time series mining and data streams, the adaptation of classical clustering algorithms such as k-means and k-medoids to massive data environments, the probabilistic modeling of permutations and ranked data, developments in anomaly detection, and the analysis of crowd learning environments.
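As one concrete example of a probabilistic model over permutations, the sketch below implements the Mallows model with Kendall's tau distance, P(sigma) = exp(-theta * d(sigma, sigma0)) / psi(theta), where sigma0 is the central ranking and theta > 0 controls how concentrated the distribution is. The choice of this particular model, and all function names, are assumptions made for illustration; the text does not say which permutation models the group uses.

```python
import math
from itertools import permutations


def kendall_distance(sigma, pi):
    """Number of item pairs that the rankings sigma and pi order differently.

    Both arguments are orderings of the same items (first = most preferred).
    """
    pos = {item: i for i, item in enumerate(pi)}
    s = [pos[item] for item in sigma]  # sigma expressed in pi's positions
    n = len(s)
    # Kendall's tau distance = number of inversions of s
    return sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])


def mallows_normalizer(n, theta):
    """Closed-form normalizing constant psi(theta) for Kendall's distance.

    psi(theta) = prod_{k=1}^{n} (1 - q**k) / (1 - q), with q = exp(-theta) and theta > 0.
    """
    q = math.exp(-theta)
    z = 1.0
    for k in range(1, n + 1):
        z *= (1 - q**k) / (1 - q)
    return z


def mallows_pmf(sigma, central, theta):
    """Probability of ranking sigma under a Mallows model centred at `central`."""
    d = kendall_distance(sigma, central)
    return math.exp(-theta * d) / mallows_normalizer(len(central), theta)


if __name__ == "__main__":
    central = ("a", "b", "c", "d")
    # Brute-force check: probabilities over all 24 rankings should sum to 1
    total = sum(mallows_pmf(p, central, 0.7) for p in permutations(central))
    print(round(total, 10))
```

The closed-form normalizer avoids summing over all n! rankings, which is what makes this model usable beyond toy sizes; larger theta concentrates the mass around the central ranking.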
In terms of formalisms, we rely strongly on probabilistic modeling, using tools and techniques such as probabilistic graphical models and Gaussian processes, to name a few, which in most cases are learned from a Bayesian perspective. We also use deep learning when we consider it the most appropriate technique for the problem at hand.
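To illustrate the Gaussian process side of this toolkit, here is a minimal sketch of Gaussian process regression with a squared-exponential kernel and fixed hyperparameters. A fully Bayesian treatment would also place priors on the hyperparameters (length scale, variance, noise), which this sketch omits; the function names and default values are illustrative assumptions, not the group's code.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kern):
    """Posterior mean and covariance of a zero-mean GP at x_test.

    `noise` is the variance of Gaussian observation noise added to the diagonal.
    """
    K = rbf_kernel(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kern)
    K_ss = rbf_kernel(x_test, x_test, **kern)
    # Cholesky factorisation for a numerically stable solve of K^{-1} y
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov


if __name__ == "__main__":
    x = np.linspace(0.0, 5.0, 8)
    y = np.sin(x)
    mean, cov = gp_posterior(x, y, np.array([2.5, 10.0]), length_scale=1.0)
    print(mean, np.sqrt(np.diag(cov)))
```

Far from the training inputs the posterior standard deviation reverts to the prior one, which is the kind of calibrated uncertainty that motivates a probabilistic approach.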
The aim of the research in Applied Statistics is to consolidate BCAM as a reference in areas such as biostatistics, demography, environmental modeling, medical statistics, epidemiology, business analytics, and biomedical research applications involving data-driven mathematical and statistical tools. We aim to seize opportunities and address challenges by strengthening collaboration with other research areas and groups (other BERC centers, business collaborators, Public Health institutions, government organizations, and universities) in accessing, managing, integrating, analyzing and modeling datasets of diverse nature and complexity.

The aim of the research line in Applied Statistics is to create innovative statistical models, inference methods, computational algorithms and visualization tools for analyzing complex data sets from diverse sources.
Computational and Applied Statistics DS 002
The Applied Statistics research line at BCAM will help create synergies between researchers from national and international institutions in different fields that require statistical techniques for data modeling.
Our research is related to semi-parametric regression, multidimensional smoothing, (Bayesian) hierarchical models, random-effects models, longitudinal data, spatial and spatio-temporal modeling, functional data analysis, computational statistics, and data visualization tools and methods.
In particular, in the biomedical area, “Biostatistics” uses data to measure, understand and ultimately solve medical problems by means of statistical models and theory. Biostatistics is an exciting and versatile discipline that contributes to all fields of medical research, evidence-based health care and decision-making. The increasing need for biostatistical support in the Basque Public Health Institutions demands biostatisticians who not only support other researchers in biomedical and related sciences through statistical analyses and scientific advice, but, above all, contribute to high-impact research, excellence, innovation and training in statistical modeling.
The research line contributes to the Spanish National Network of Biostatistics (BIOSTATNET), a pioneering network led by applied statisticians from different institutions with their own research projects and teaching experience in Biostatistics, working closely with biomedical researchers. We also actively collaborate with the Biostatistics group at the University of the Basque Country (UPV/EHU) and other national and international institutions to address issues of mathematical and statistical theory and methodology that improve decision-making processes. We aim to highlight and increase the role of Statistics, foster collaboration with our partners, and promote professional development and training in Applied Statistics.
The statistical modeling methodology developed by the group deals with those aspects of data analysis that are not highly specific to particular fields of study. Therefore, our research provides concepts and methods that, with suitable modification, are applicable in many fields (e.g. Economics, Business, Engineering, Demography, etc.) that demand a wide variety of data modeling and computational tools for analyzing complex problems, particularly when huge amounts of data are collected.
The R package npROCRegression implements several nonparametric regression approaches for including covariate information in the receiver operating characteristic (ROC) framework.
Download from:
https://CRAN.R-project.org/package=npROCRegression
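For context, the ROC curve that these regression approaches extend can be computed empirically when no covariates are involved. The sketch below assumes that larger marker values indicate disease and computes the empirical ROC curve and its area (AUC) via the Mann-Whitney statistic. It covers only the unconditional case and does not implement the package's covariate-adjusted methods; the function names are hypothetical.

```python
import numpy as np


def empirical_roc(healthy, diseased):
    """Empirical ROC curve, assuming larger marker values indicate disease.

    A subject is classified positive when its marker is >= threshold c.
    Returns arrays of false-positive and true-positive rates, starting at (0, 0).
    """
    healthy = np.asarray(healthy, dtype=float)
    diseased = np.asarray(diseased, dtype=float)
    # Use every observed marker value as a threshold, from highest to lowest
    thresholds = np.unique(np.concatenate([healthy, diseased]))[::-1]
    fpr = [0.0]
    tpr = [0.0]
    for c in thresholds:
        fpr.append(np.mean(healthy >= c))   # 1 - specificity at threshold c
        tpr.append(np.mean(diseased >= c))  # sensitivity at threshold c
    return np.array(fpr), np.array(tpr)


def empirical_auc(healthy, diseased):
    """AUC as the Mann-Whitney probability P(D > H), counting ties as 1/2."""
    healthy = np.asarray(healthy, dtype=float)
    diseased = np.asarray(diseased, dtype=float)
    diff = diseased[:, None] - healthy[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)
```

The trapezoidal area under the points returned by `empirical_roc` equals `empirical_auc`. ROC regression, in contrast, lets the curve itself change with covariate values rather than pooling all subjects into a single curve.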
The R package PROreg offers a variety of tools, such as specific plots and regression modeling approaches, for analyzing different patient-reported questionnaires. In particular, it implements mixed-effects models based on the beta-binomial distribution to deal with binomial data with over-dispersion (see Najera-Zuloaga, Lee and Arostegui, 2017).
Download from:
https://cran.r-project.org/package=PROreg
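The beta-binomial distribution captures over-dispersion by letting the binomial success probability vary according to a Beta(a, b) law, which inflates the variance by a factor 1 + (n - 1) * rho with rho = 1 / (a + b + 1). The sketch below computes its probability mass function and moments using only the Python standard library. The (a, b) parametrization and the function names are assumptions for illustration and may differ from those used in PROreg; the mixed-effects regression layer is not shown.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(Y = k) = C(n, k) * B(k + a, n - k + b) / B(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def beta_binomial_moments(n, a, b):
    """Mean and variance of a Beta-Binomial(n, a, b) variable."""
    p = a / (a + b)            # mean success probability
    rho = 1.0 / (a + b + 1.0)  # intra-class correlation driving over-dispersion
    mean = n * p
    var = n * p * (1 - p) * (1 + (n - 1) * rho)
    return mean, var


if __name__ == "__main__":
    n, a, b = 10, 2.0, 3.0
    mean, var = beta_binomial_moments(n, a, b)
    binomial_var = n * (a / (a + b)) * (b / (a + b))
    print(mean, var, binomial_var)
```

With the same mean success probability, the beta-binomial variance exceeds the binomial one whenever n > 1, which is why it suits questionnaire scores that are more spread out than a plain binomial model allows.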
Allows for the use of two-dimensional (2D) penalised splines (P-splines) in the context of agricultural field trials. Traditionally, the spatial or environmental effect on the expression of phenotypes has been modelled by assuming correlated random noise (Gilmour et al., 1997). We instead propose to model the spatial variation explicitly using 2D P-splines (Rodriguez-Alvarez et al., 2016; arXiv:1607.08255). In addition to the availability of fast and stable estimation algorithms (Rodriguez-Alvarez et al., 2015; Lee et al., 2013), the direct and intuitive interpretation of the spatial trend provided by this approach makes it attractive for the analysis of field experiments.
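A minimal one-dimensional sketch of the P-spline idea: a cubic B-spline basis on equally spaced knots, combined with a difference penalty on adjacent coefficients whose strength is set by lambda. The function names, number of segments and penalty order are illustrative assumptions, and this is not code from the package; a 2D version such as the one described above would typically combine two such marginal bases through a tensor product, with a penalty in each direction (not shown here).

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg=10, deg=3):
    """B-spline basis on nseg equal segments of [xl, xr] (Cox-de Boor recursion)."""
    dx = (xr - xl) / nseg
    # Equally spaced knots, extended deg segments beyond each end of [xl, xr]
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # Degree-0 basis: indicator of each knot interval [t_i, t_{i+1})
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, deg + 1):
        left = (x[:, None] - knots[None, :-(d + 1)]) / (knots[None, d:-1] - knots[None, :-(d + 1)])
        right = (knots[None, d + 1:] - x[:, None]) / (knots[None, d + 1:] - knots[None, 1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), nseg + deg)


def pspline_fit(x, y, lam=1.0, nseg=10, deg=3, pord=2):
    """Penalised least squares: minimise |y - B c|^2 + lam * |D c|^2."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    # pord-th order difference matrix acting on the spline coefficients
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 100)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
    smooth = pspline_fit(x, y, lam=1.0)
    print(np.round(smooth[:5], 3))
```

Because the penalty acts on second-order differences, a straight-line trend is left unpenalised for any lambda, while larger lambda values pull the fit towards that line; this is the trade-off between smoothness and fidelity that the estimation algorithms cited above have to tune.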
Download from: