We have continued to develop our Firestar and FireDB methods for the prediction of protein binding sites, and to use them for the evaluation of prediction methods in the context of the “Critical Assessment of techniques for protein Structure Prediction” (CASP) (see Proteins CASP09 Special issue). Our work on the CASP evaluation of prediction methods opens new avenues for the integration of methods; one example is the recent demonstration that contact prediction methods can be used to screen and validate structural models.
Combining the results from those prediction methods, the available protein structures and published experimental data on mutations and interactions, we have proposed new models for the specific interactions of ras-p21 and its main effectors.
On the methodological side, we have developed a new high-throughput approach for the systematic prediction of protein interactions based on the physical characteristics of the surfaces of interacting proteins (Figure 1).
In the field of biological text-mining, we organised the BC II.5 BioCreative Challenge, a community-wide effort to evaluate information extraction systems applied to biological problems. BC II.5 is dedicated to assisting authors in generating Structured Digital Abstracts through the application of text-mining methodology, in accordance with the model implemented by FEBS Letters in collaboration with the Molecular INTeraction database (MINT). This initiative will be followed by discussions with publishers and databases on the practical integration of text-mining methodology, so that the information highlighted by authors in their own papers can be linked directly to the corresponding database information. BC III will be held in Washington DC in September 2010.
During 2009 we developed specific text-mining applications for the prediction and classification of proteins involved in spindle formation, cell-cycle control, and chromosome condensation. These developments are a result of our participation in the Experimental Network for Functional INtegration (ENFIN) Network of Excellence.
Combining our efforts in protein structure analysis and text-mining has resulted in an analysis of the distribution of cancer-related mutations in protein kinases, work partially carried out in collaboration with C. Orengo’s Group at University College London (UCL), UK. By extracting mutations directly from the original references, we have been able to double the number of mutations (including artificially introduced ones) and natural variants characterised in databases (Figure 2).
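As a rough illustration of what extracting mutation mentions from text can involve, the following minimal Python sketch finds single-letter point-mutation mentions of the form wild-type residue, position, mutant residue. It is not the Group's method: the function name `find_point_mutations`, the regular expression and the example sentence are all assumptions made for illustration, and a real system would also need to handle three-letter codes, free-text descriptions and the mapping of each mention to the correct protein and sequence position.

```python
import re

# Hypothetical pattern (an assumption, not the Group's system): one of the 20
# standard amino acids, a residue position, and a second amino acid, e.g. "V600E".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(
    rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for each mention in text."""
    return [
        (wild, int(position), mutant)
        for wild, position, mutant in POINT_MUTATION.findall(text)
    ]


if __name__ == "__main__":
    # Illustrative sentence written for this sketch, not taken from a reference.
    sentence = "BRAF V600E and EGFR L858R were reported."
    for mention in find_point_mutations(sentence):
        print(mention)
```

A pattern this simple will also match unrelated tokens that happen to have the same shape, which is one reason mentions need to be validated against the protein sequence before being counted.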
We have continued to work on protein interaction networks with the aim of using molecular networks as the framework for analysing cancer genome data, serving as links between molecular/genomic cancer data and clinical/phenotypic information. We will develop this idea further in the context of the Innovative Medicines Initiative (IMI) project in toxicogenomics (e-TOX), combining genomic and phenotypic (toxicology) information by applying bioinformatics, systems biology and text-mining technology.
As a first application of this principle, we have carried out a systematic comparison of cancer genome data associated with specific tissues, using as the metric the similarity of the distribution of related genes in the network of known protein interactions. As expected, genes and pathways commonly implicated in cancer are clearly detected, but the analysis also highlights interesting novel associations between cancer types and specific pathways that accumulate a significant number of mutated genes.
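The idea of comparing tissues through the position of their mutated genes in an interaction network can be sketched as follows. This is a minimal illustration under stated assumptions, not the metric actually used: here each tissue's gene set is expanded to include direct interaction partners, and tissues are compared by the Jaccard index of those expanded sets. The function names, gene labels and tissue labels are all hypothetical.

```python
from itertools import combinations


def neighbourhood(genes, network):
    """Return the given genes plus their direct interaction partners."""
    expanded = set(genes)
    for gene in genes:
        expanded |= network.get(gene, set())
    return expanded


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tissue_similarity(mutated, network):
    """Compare every pair of tissues by their network neighbourhoods."""
    profiles = {t: neighbourhood(g, network) for t, g in mutated.items()}
    return {
        (t1, t2): jaccard(profiles[t1], profiles[t2])
        for t1, t2 in combinations(sorted(profiles), 2)
    }


# Toy, made-up data: an undirected network stored as an adjacency dict.
toy_network = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
}
toy_mutated = {"tissue_x": {"A"}, "tissue_y": {"C"}, "tissue_z": {"D"}}

if __name__ == "__main__":
    for pair, score in tissue_similarity(toy_mutated, toy_network).items():
        print(pair, round(score, 3))
```

In the toy data, tissue_x and tissue_y share no mutated gene, yet they receive a non-zero similarity because their genes have a common interaction partner. This is the kind of association that a direct comparison of gene lists would miss.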
The Group is involved in the analysis of a variety of cancer genome data, including various platforms and systems (SNP arrays, CGH arrays, expression data, exon sequencing, and DNA methylation data), as well as next-generation sequencing data (ChIP-seq and exon sequencing).
We are also developing the infrastructure for analysing the results of the Spanish initiative in the context of the International Cancer Genome Consortium.
We have developed new methods for the statistical analysis of CGH-arrays and for the selection of candidate genes/functions from high-throughput genomic information. We are particularly interested in the study of genetic interactions using our basic ideas on gene/protein interactions and networks.