Structural Biology Programme

Structural Computational Biology Group

Group Leader: vacancy

Research highlights

The Group has contributed to several community efforts in different areas:

  • Epigenomics with the BLUEPRINT EU flagship project, which is part of the iHEC consortium; the results from this work were published at the end of 2016.
  • The Pancancer Analysis of Whole Genomes (PCAWG), a global analysis of 2,500 complete cancer genomes; these results will be published in 2017.
  • The BioCreative text mining challenge on chemical compounds, which resulted in a number of resources and publications that appeared throughout 2016.

We have introduced a new computational method for the prediction of pairs of residues in protein interfaces. This method can help in the analysis of cancer related mutations.

We have also introduced new methods for the analysis of epigenomes at the linear two-dimensional level (chromatin states) and at the three-dimensional level (chromatin structure in the nucleus).

The cancer genome analysis system

Our Group is deeply involved in the development of a computational framework for the analysis of human genomes with specific application to the analysis of cancer genomes. Over the years, this framework has been applied to a number of collaborative cancer projects, and it has been particularly instrumental in the CLL-ICGC project.

We have now moved on to a new phase in which the framework is used for the analysis of the large set of full cancer genomes of the Pancancer Analysis of Whole Genomes (PCAWG); it is one of the four frameworks for data organisation, analysis and exploration used by the consortium.

Looking ahead, given the framework's modular structure, its capacity to integrate new methods into working pipelines, and its ease of installation (e.g. adoption of Docker and cloud technologies), we consider that it can be the seed of new developments in the overarching analysis of human disease genomes.

Protein structure prediction and cancer genomes

In the context of cancer genome analysis, and as part of the Pan Cancer global effort, we have developed a set of methods to analyse the consequences of mutations in protein interfaces. The underlying logic is that cellular functions are governed by signals transmitted via protein interactions and protein complexes. In these interactions, the amino acids located at the interacting surfaces determine the strength of the interactions and, very importantly, their specificity. The correct functioning of cellular systems depends critically on each protein pairing with its correct partners, and the accuracy of these interactions depends on the correct formation of residue pairs between the two proteins at the interface.

We have shown that cancer-associated mutations tend to accumulate in protein interfaces to the point that, with the information available, it is possible to say that cancer-related mutations specifically target protein interfaces. Therefore, understanding the nature of protein-protein interactions is important for understanding the impact of cancer mutations.

We have developed a new methodology able to predict, with high accuracy, a small set of pairs of residues located in the interface of interacting human proteins. The new methodology, based on the study of the co-evolution of the corresponding protein families, does not require any information about the corresponding structures, and it is applicable to many human protein complexes for which no other information is available. Furthermore, we have shown that the pairs of residues predicted to interact are highly conserved in structural terms (they occupy the same position in space over long evolutionary periods), which is indicative of their importance in the organisation of the corresponding interfaces.

Based on these results, we are now exploring the use of the newly developed computational methods as an alternative approach for the interpretation of the consequences of cancer related mutations.
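
As a rough illustration of the co-evolution idea behind the interface predictions described above, the sketch below scores every pair of alignment columns from two interacting protein families by mutual information and reports the highest-scoring pairs. Mutual information is a simple, generic covariation measure used here only for illustration; it is not the Group's methodology. The function names (`mutual_information`, `top_coevolving_pairs`) and the toy alignments are hypothetical, and the sketch assumes that the two alignments are paired, that is, row i of each alignment comes from the same species.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def top_coevolving_pairs(msa_a, msa_b, k=3):
    """Return the k inter-protein column pairs (i, j, score) with the highest score.

    msa_a and msa_b are lists of aligned sequences (hypothetical toy input);
    row r of msa_a and row r of msa_b are assumed to come from the same species.
    """
    if len(msa_a) != len(msa_b):
        raise ValueError("alignments must contain the same number of paired rows")
    scored = []
    for i in range(len(msa_a[0])):
        col_a = [seq[i] for seq in msa_a]
        for j in range(len(msa_b[0])):
            col_b = [seq[j] for seq in msa_b]
            scored.append((i, j, mutual_information(col_a, col_b)))
    # Highest score first; ties broken by column indices for a stable order.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored[:k]


# Toy paired alignments: column 1 of family A co-varies with column 2 of family B.
family_a = ["AKL", "AKV", "ARL", "ARV"]
family_b = ["GSE", "GSE", "GSD", "GSD"]

for i, j, score in top_coevolving_pairs(family_a, family_b, k=1):
    print(f"column {i} of A, column {j} of B: {score:.2f} bits")
```

In this toy input, only the K/R column of family A and the E/D column of family B change together, so that pair ranks first. Raw mutual information on real alignments is affected by phylogenetic bias and by indirect correlations, so it is not suitable for real predictions without corrections for both.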

Epigenome analysis infrastructure and portal

In the context of the BLUEPRINT iHEC project we have designed a system for the comparative analysis of epigenetic data (the Blueprint Analysis Portal, http://blueprint-data.bsc.es/release_2016-08/, developed in collaboration with the BSC-CNS and EBI-EMBL). This portal is now the main point of access for the project’s results (e.g. chromatin states, ChIP-Seq positions of histone modifications), enabling the direct comparison of the epigenetic structure of different cell types.

Based on the information provided by the Blueprint Analysis Portal, we have developed a methodology to compare epigenomes at the level of their organisation in functional segments (chromatin states). The initial results show that the system is able not only to reproduce the structure of lineage differentiation during haematopoiesis, but also to identify the main potential epigenetic drivers of that differentiation. The method, initially developed for the Blueprint data sets, is now being extended to other data types provided by the iHEC consortium.
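
To make the idea of comparing epigenomes at the level of chromatin states more concrete, the following sketch computes a simple distance between cell types as the fraction of genomic bins whose chromatin state differs. It is only an illustrative stand-in under stated assumptions, not the methodology developed by the Group. The state labels, cell-type names and function names (`state_distance`, `pairwise_distances`) are hypothetical, and each cell type is assumed to be segmented over the same ordered set of bins.

```python
from itertools import combinations


def state_distance(seg_a, seg_b):
    """Fraction of bins whose chromatin-state label differs between two segmentations."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must cover the same bins")
    if not seg_a:
        raise ValueError("segmentations must not be empty")
    differing = sum(1 for a, b in zip(seg_a, seg_b) if a != b)
    return differing / len(seg_a)


def pairwise_distances(segmentations):
    """Distance for every unordered pair of cell types, keyed by a sorted name pair."""
    return {
        (x, y): state_distance(segmentations[x], segmentations[y])
        for x, y in combinations(sorted(segmentations), 2)
    }


# Hypothetical toy segmentations over six genomic bins.
segmentations = {
    "monocyte":   ["Prom", "Enh", "Enh", "Het", "Tx", "Tx"],
    "macrophage": ["Prom", "Enh", "Enh", "Het", "Tx", "Het"],
    "t_cell":     ["Prom", "Het", "Het", "Enh", "Tx", "Tx"],
}

distances = pairwise_distances(segmentations)
closest = min(distances, key=distances.get)
print("closest pair:", closest, "distance:", round(distances[closest], 3))
```

A distance matrix of this kind could be passed to a hierarchical clustering step to check whether related cell types group together, and the bins that differ between groups are natural candidates for closer inspection.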

Alternative splicing at the protein level
In 2016, we continued our work on alternative splicing in the context of the NIH-funded GENCODE project. Our results, summarised in a review published in TIBS (Tress et al., 2016), show that, in light of combined approaches including protein modelling, proteomics and evolutionary analysis, there is little evidence that alternative isoforms are expressed at the protein level in detectable quantities. In other words, the available evidence indicates that normal proteins are coded by the principal isoform of each gene and not by any of the potential alternative forms that are undoubtedly produced at the mRNA level. Although this observation is in line with recent results of the large-scale analysis of gene expression in human tissues (publications of ENCODE/GTEx, www.gtexportal.org), it is still somewhat controversial, since it indicates a large unexplained discrepancy between the results obtained at the level of gene expression and those obtained at the level of protein expression; a discrepancy that might have profound implications for our understanding of the role of mRNA in cells and of the biological function of processed RNAs.


Biological Text Mining

Text mining, an important part of the Group’s activity, has broad implications in Biomedicine. In 2016, we completed an exhaustive review of the application of text mining to the area of chemistry (Krallinger et al., submitted to Chem Rev); this review was based on our experience in the analysis of text mining systems and on the results obtained in the context of the 2015 BioCreative Chemdner challenge (http://www.biocreative.org/tasks/biocreative-iv/chemdner/).

During 2016, we reached an agreement with the Ministerio de Energía, Turismo y Agenda Digital for the implementation of a biological text mining platform in the framework of the ‘Plan de Impulso de las Tecnologías del Lenguaje’; the project will develop tools and procedures in line with the recommendations of OpenMinted, the European e-Infrastructure for text mining, in which we also participate.