INSERM
Alexandre G. de Brevern

Alexandre G. de Brevern's papers


Definition of Protein Blocks and local prediction

de Brevern A.G., Etchebest C. & Hazout S. (2000), Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks, Proteins: Structure, Function, and Genetics, 41(3):271-287.
Using an unsupervised cluster analyser, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation between protein blocks and amino acid propensities is used for prediction and leads to a success rate close to 35 %. Splitting the sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6 %. The prediction accuracy exceeds 75 % when the four best-ranked predicted protein blocks are kept at each site of the protein.
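
The Bayesian ranking of protein blocks from a sequence window can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the amino acids at the window positions are conditionally independent given the block (a naive Bayes simplification) and uses add-one smoothing. The block labels `pb1` and `pb2`, the 3-residue toy windows and the function names `train` and `rank_blocks` are hypothetical.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy training set (hypothetical): sequence windows paired with a block label.
EXAMPLES = [
    ("AEL", "pb1"), ("ALL", "pb1"), ("AEK", "pb1"),
    ("VTV", "pb2"), ("VIV", "pb2"),
]


def train(examples, window, pseudocount=1.0):
    """Estimate log P(block) and log P(amino acid | block, position)."""
    counts = defaultdict(lambda: [defaultdict(float) for _ in range(window)])
    block_totals = defaultdict(float)
    for seq, block in examples:
        block_totals[block] += 1
        for pos, aa in enumerate(seq):
            counts[block][pos][aa] += 1
    n = sum(block_totals.values())
    log_prior = {b: math.log(c / n) for b, c in block_totals.items()}
    log_like = {}
    for b, total in block_totals.items():
        # Smoothed per-position amino acid frequencies for this block.
        denom = total + pseudocount * len(AMINO_ACIDS)
        log_like[b] = [
            {aa: math.log((counts[b][pos][aa] + pseudocount) / denom)
             for aa in AMINO_ACIDS}
            for pos in range(window)
        ]
    return log_prior, log_like


def rank_blocks(seq, log_prior, log_like):
    """Return block labels sorted by decreasing posterior score (Bayes' rule)."""
    scores = {
        b: log_prior[b] + sum(log_like[b][pos][aa] for pos, aa in enumerate(seq))
        for b in log_prior
    }
    return sorted(scores, key=scores.get, reverse=True)


log_prior, log_like = train(EXAMPLES, window=3)
ranking = rank_blocks("AEL", log_prior, log_like)
```

With a full 16-block alphabet, `ranking[:4]` would correspond to keeping the four best-ranked blocks at a site.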


Use of Protein Blocks to understand the sequence - structure relationship

de Brevern A.G. & Hazout S. (2000), Hybrid Protein Model (HPM): a method to compact protein 3D-structures information and physicochemical properties, IEEE - Computer Society, S1:49-54.
The transformation of a protein 1D sequence into a protein 3D structure is one of the main difficulties of structural biology. A structural alphabet was previously defined with an unsupervised classifier, using the dihedral angles of the protein backbone as structural information. The 16 Protein Blocks (PBs), the basic elements of this structural alphabet, allow a correct approximation of 3D structures. Local prediction was assessed with a Bayesian approach and showed that sequence information strongly determines the local fold, although the prediction remains coarse (prediction rate of 40.7 % with one PB, 75.8 % with the four most probable PBs). The Hybrid Protein Model presented in this study learns both the sequence and the structure of proteins. The analysis performed along the hybrid protein made it possible to assess more precisely the spatial location of some types of amino acid residues in the secondary structures and their flanking regions. This study leads to a fuzzy model of the dependence between sequence and structure.


Protein Blocks and similar local structures

de Brevern A.G. & Hazout S. (2001), Compacting local protein folds with a Hybrid Protein, Theoretical Chemistry Accounts, 106(1/2):36-47.
The "Hybrid Protein Model" (HPM) is a fuzzy model for compacting local protein structures. It learns a non-redundant database encoded in a previously defined structural alphabet composed of 16 protein blocks (PBs). The hybrid protein is composed of a series of probability distributions of observing the PBs. The training is an iterative unsupervised process that, for every fold to be learnt, consists of looking for the most similar pattern present in the hybrid protein and modifying it slightly. In the end, each position of the hybrid protein corresponds to a set of similar local structures. Superimposing those local structures yields an average root mean square deviation of 3.14 Å. The significant amino acid characteristics related to the local structures are determined. The use of this model is illustrated by finding the most similar folds between two cytochromes P450.
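
The training loop described in this abstract (find the most similar pattern, then modify it slightly) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published algorithm: similarity is scored here as the log-probability of the fragment under a window of the hybrid protein, the update is a simple move of each distribution towards the observed block with a hypothetical learning rate `rate`, and the names `make_hybrid`, `best_window` and `learn` are invented for the example. Blocks are encoded as integers 0 to 15.

```python
import math

N_BLOCKS = 16  # size of the structural alphabet


def make_hybrid(length):
    """Start with a uniform distribution over the 16 blocks at every position."""
    return [[1.0 / N_BLOCKS] * N_BLOCKS for _ in range(length)]


def best_window(hybrid, fragment):
    """Return the start of the window under which the fragment is most probable."""
    best_start, best_score = 0, -math.inf
    for start in range(len(hybrid) - len(fragment) + 1):
        score = sum(math.log(hybrid[start + i][pb])
                    for i, pb in enumerate(fragment))
        if score > best_score:  # strict comparison keeps the first best window
            best_start, best_score = start, score
    return best_start


def learn(hybrid, fragment, rate=0.1):
    """Shift the best-matching window slightly towards the observed fragment."""
    start = best_window(hybrid, fragment)
    for i, pb in enumerate(fragment):
        dist = hybrid[start + i]
        for k in range(N_BLOCKS):
            target = 1.0 if k == pb else 0.0
            # Convex update: each distribution still sums to 1 afterwards.
            dist[k] += rate * (target - dist[k])
    return start


hybrid = make_hybrid(10)
first = learn(hybrid, [0, 1, 2])   # all windows tie, so the first one is used
second = learn(hybrid, [5, 6, 7])  # avoids the positions already specialised
```

Because each fragment is attracted to the window it already resembles most, repeated passes over a database make every position of the hybrid protein specialise in a set of similar local structures.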


Use of a structural alphabet for predicting the loops

Camproux A.C., de Brevern A.G., Hazout S. & Tuffery P. (2001), Exploring the use of a structural alphabet for a structural prediction of protein loops, Theoretical Chemistry Accounts, 106(1/2):28-35.
The prediction of loop conformations is one of the challenging problems of homology modeling, due to the large sequence variability associated with these parts of protein structures. In the present study, we introduce a search procedure that operates in a structural alphabet space deduced from a hidden Markov model to simplify the structural information. It uses a Bayesian criterion to predict, from the amino acid sequence of a loop region, its corresponding word in the structural alphabet space. Results show that our approach ranks 30 % of the target words with the best score, and 50 % within the 5 best scores. Interestingly, our approach can also be used to accept or reject the prediction performed. This makes it possible to rank 57 % of the target words with the best score, and 67 % within the 5 best scores, accepting 16 % of learned words and rejecting 93 % of unknown words.


PhD thesis (in French)

de Brevern A.G. (2001), Nouvelles stratégies d'analyses et de prédiction des structures tridimensionnelles des protéines [New strategies for the analysis and prediction of three-dimensional protein structures], doctoral thesis, Université Paris 7, speciality: Genome Analysis and Molecular Modelling, 208 p.
Résumé (translated from French): Characterising the three-dimensional structure of proteins with the classical secondary structures alone is structurally rather poor. We therefore developed a new methodology to design sets of small average prototypes, named Protein Blocks (PBs), which allow a good approximation of protein structures. Analysis of the protein blocks showed their stability and their structural specificity. The final choice of the number of PBs is associated with a correct local prediction.
This prediction relies on a Bayesian method that makes it possible to understand the importance of the amino acids in a simple way. To improve this prediction, we relied on two concepts: (i) 1 local fold -> n sequences and (ii) 1 sequence -> n folds. The first concept means that several types of sequences can be associated with the same structure, and the second that one sequence can be associated with several types of folds. Both aspects are developed through the search for a reliability index tied to the local prediction, in order to find regions of high probability. Some words, i.e. successions of protein blocks, appear more frequently than others. We therefore characterised the architecture of these successions and the links between these different words.
Because of this redundancy that can appear in protein structures, a compaction method was developed that associates structures that are close at the local level. This approach, called the "hybrid protein", is simple in design and classifies all the structures of the protein database into "structurally dependent" classes. Beyond compaction, this approach can also be used for a different purpose: the search for structural homology and the characterisation of dependencies between structures and sequences.

Abstract: Secondary structures approximate 3D protein structures poorly. A new method has been developed to create a more complex and precise structural alphabet (called Protein Blocks) that can be used in a local prediction method. The specificity and stability of this alphabet have been analysed. The alphabet is composed of 16 prototypes, each 5 Cα long. The local prediction of PBs from the sequence gives correct results.
The prediction is based on Bayesian statistics, which make it easy to understand the meaning and influence of each amino acid. To improve this prediction, we have used two concepts: (i) 1 local fold is associated with a set of sequences and (ii) 1 sequence can give different folds. The first point is implemented by splitting the occurrence matrices associated with the most common PBs. The second point is based on a confidence index and allows well-predicted residues to be located. Some successions of Protein Blocks are over-represented. We have therefore defined a network describing most of the protein topology.
Finally, we propose a method called the Hybrid Protein Model, which compacts successions of Protein Blocks in a fuzzy manner and creates structurally dependent clusters. This approach has been extended to the search for structural homology.


Structural alphabet : A review

de Brevern A.G., Camproux A.C., Hazout S., Etchebest C., and Tuffery P. (2001), Protein structural alphabets: beyond the secondary structure description, Recent Adv. In Prot. Eng., 1:319-331, Sangadai SG ed. Research signpost, Trivandrum, India.
The considerable growth of the protein structure database makes it possible to move beyond the classical secondary structure description of proteins. While still confronted with numerous problems, defining structural alphabets is an emerging concept in the field of protein structure analysis. It is an attempt to objectively classify the whole set of conformations occurring in protein structures described by small overlapping fragments. It is expected to lead to a better understanding of protein architecture and to open new opportunities for protein structure prediction.


Genomic Compartimentation

de Brevern A.G., Loirat F., Badel-Chagnon A., André C., Vincens P., and Hazout S. (2002), Genome compartimentation by a Hybrid Chromosome Model (HχM). Application to Saccharomyces cerevisae subtelomeres, Computers and Chemistry, 26:437-445.
The aim of this paper is to present a new approach, called "Hybrid Chromosome Model" (HχM), which allows both the extraction of regions of similarity between two sequences and the compartimentation of a set of DNA sequences. The principle of the method consists in compacting a set of sequences (split into fragments of fixed length) into a "hybrid chromosome", which results from the stacking of all the sequence fragments. We have illustrated our approach on the 32 subtelomeres of Saccharomyces cerevisiae. The compartimentation of these chromosome extremities into common regions of similarity has been carried out. The HχM approach is a fast and efficient tool for mapping entire genomes and for extracting ancient duplications within or between genomes.


Large-scale compartimentation (in French)

de Brevern A.G. (2002), Compartimentation chromosomique, Biofutur, 225:20-22.
Given the large number of complete genomes now available, it has become possible to compare them directly. The hybrid chromosome method (HχM) enables this comparison by splitting these genomes into long similar regions.


The Words of the structural alphabet

de Brevern A.G., Valadié H., Hazout S. & Etchebest C. (2002), Extension of a local backbone description using a structural alphabet. A new approach to the sequence-structure relationship, Protein Science, 11(12):2871-2886.
Protein Blocks (PBs) comprise a structural alphabet of 16 protein fragments, each 5 C alpha long. They make it possible to approximate and correctly predict local protein 3-dimensional (3-D) structures (de Brevern et al., 2000). We have selected the 72 most frequent sequences of 5 PBs, which we call Structural Words (SWs). Analysis of 4 different protein databanks shows that SWs cover 92% of the amino acids in them and provide a good structural approximation for residues, that is, sequences, 9 C alpha long. We present most of them in a simple network that describes 90% of the overall residues and, interestingly, includes more than 80% of the amino acids present in coils. Analysis of the network shows the specificity and quality of the 3D descriptions as well as a new type of relation between local folds and amino acid distribution. The results show that the 3D structure of these protein databanks can be easily described by a combination of subgraphs included in the network. Finally, a Bayesian probabilistic approach improved the prediction rate by 4%.
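
The selection of frequent Structural Words and the measurement of their coverage can be sketched as follows. This is a minimal sketch under stated assumptions rather than the procedure used in the paper: protein structures are assumed to be already encoded as strings of PB letters, a residue counts as covered if it lies inside at least one occurrence of a selected word, and the toy strings and the function names `count_words` and `coverage` are hypothetical.

```python
from collections import Counter


def count_words(pb_sequences, word_len=5):
    """Count every overlapping word of `word_len` consecutive PBs."""
    counts = Counter()
    for seq in pb_sequences:
        for i in range(len(seq) - word_len + 1):
            counts[seq[i:i + word_len]] += 1
    return counts


def coverage(pb_sequences, words, word_len=5):
    """Fraction of positions lying inside at least one selected word."""
    covered = total = 0
    for seq in pb_sequences:
        mask = [False] * len(seq)
        for i in range(len(seq) - word_len + 1):
            if seq[i:i + word_len] in words:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(seq)
    return covered / total if total else 0.0


# Toy PB-encoded structures (hypothetical data).
sequences = ["mmmmmmm", "ddddd", "abcde"]
counts = count_words(sequences)
top_words = {word for word, _ in counts.most_common(1)}
fraction = coverage(sequences, top_words)
```

On a real databank, `counts.most_common(72)` would select the 72 most frequent words, mirroring the number of Structural Words reported above.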


Improvement of Hybrid Protein Model

de Brevern A.G. & Hazout S. (2003), A "Hybrid Protein Model" for defining optimally a repertory of contiguous 3D protein structure fragments, Bioinformatics, 19:345-353. [color figures].
Motivation: Our aim is to define automatically a repertory of contiguous 3D protein structure fragments. We present the improvements of a methodology, the "Hybrid Protein Model" (de Brevern and Hazout, 2001). The hybrid protein aims to learn a non-redundant database encoded in a previously defined structural alphabet composed of 16 Protein Blocks (PBs) (de Brevern et al., 2000). The hybrid protein is composed of a series of probability distributions of observing the PBs. Training consists in learning every local fold by looking for the most similar pattern present in the hybrid protein and modifying it slightly. In the end, each position corresponds to a set of similar local structures.

Results: In this paper, we present the strategy for defining the hybrid protein optimally. The strategy relies on "baby training", which consists in introducing large structure fragments and progressively reducing their sizes, and on the deletion of redundancy in the hybrid protein. These two improvements are assessed through a description of the repertory.


Improvement of the Hybrid Protein Model

Benros C., de Brevern A.G. & Hazout S. (2003), Hybrid Protein Model (HPM) : A Method For Building A Library Of Overlapping Local Structural Prototypes. Sensitivity Study And Improvements Of The Training, IEEE NNSP, 1:53-72.
Predicting protein structure from amino acid sequence is one of the main challenges of genomics. Various computational methods have been developed during the last decade to reach this goal. However, the problem of structure prediction remains difficult. Before facing this complex problem, our goal is to focus on the accurate analysis of protein structures at a local level. In our study, we present an approach called "Hybrid Protein Model" (HPM), which uses a training procedure similar to that of Self-Organizing Maps. It allows the compression of a non-redundant protein structure databank into a library of overlapping 3D structural fragments. The "Hybrid Protein Model" carries out a multiple alignment of structural fragments. We present in this study an improvement of this strategy by introducing gaps in the local structures, and a sensitivity study of the training according to the control parameters. The library obtained is composed of a finite number of structural classes, each class including fragments sharing similar local structures. These classes are representative of the structural motifs found in the protein structures from the databank. Thus, this library constitutes an efficient tool for determining structural similarities between proteins and especially for predicting the local protein structure from the amino acid sequence.


For more information about these works, e-mail debrevern@ebgm.jussieu.fr.

De BREVERN Alexandre
Equipe de Bioinformatique Génomique & Moléculaire du professeur Serge Hazout (EBGM)
Unité INSERM E0346
Université Paris VII, case 7113
2, place Jussieu
75251 Paris Cedex 05
To send an e-mail, use the Subject: Protein Blocks.