Marten van Dijk

      Consultant, Inventor, Researcher, Applied Mathematician, & Computer Scientist

 


   Protein Folding:
   
    Accurately predicting the secondary structure of proteins from their amino acid sequence is an important
    problem. The identification of basic secondary structure elements (alpha helices, beta strands, and coils)
    is a critical prerequisite for many tertiary structure predictors, which consider the complete
    three-dimensional protein structure. A broad array of approaches to secondary structure prediction exists,
    including statistical techniques, neural networks, Hidden Markov Models (HMMs), Support Vector Machines
    (SVMs), nearest neighbor methods, and energy minimization. Neural networks are among the most popular
    methods in use today and among the most accurate, delivering a pointwise prediction accuracy (Q_3) of
    about 77% and a segment overlap measure (SOV) of about 74%.
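
    Q_3 is the percentage of residues whose predicted state matches the observed one. The sketch below
    assumes both structures are given as equal-length strings over a three-letter alphabet (H = helix,
    E = strand, C = coil); the example strings are made up for illustration. SOV, which scores overlapping
    segments rather than individual residues, is not shown.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H/E/C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("structures must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical observed and predicted structures (not real data).
observed = "CCHHHHHHCCEEEECC"
predicted = "CCCHHHHHCCEEECCC"
print(f"Q_3 = {q3(predicted, observed):.1f}%")  # Q_3 = 87.5%
```

    Here 14 of the 16 residues agree, giving 87.5%.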
 
    To improve the long-term performance of secondary structure prediction, it will likely be necessary to
    develop a cost model that mirrors the underlying biological constraints. While neural networks offer good
    performance today, their operation is largely opaque. Often containing up to 10,000 parameters and relying
    on complex layers of non-linear perceptrons, neural networks offer little insight into the patterns they
    learn. Moreover, they mask the shortcomings of the underlying models, so that improving those models is a
    tedious, ad hoc process. The largest improvements in neural network prediction accuracy have come from the
    integration of homologous sequence alignments rather than from specific changes to the underlying cost model.

    Of the approaches developed to date, Hidden Markov Models (HMMs) offer perhaps the most natural 
    representation of protein secondary structure. An HMM consists of a finite set of states with learned
    transition probabilities between states. In biological terms, each transition corresponds to a local folding 
    event, with the most likely sequence of states corresponding to the lowest-energy protein structure. HMMs 
    generally contain hundreds of parameters, one to two orders of magnitude fewer than neural networks. In
    addition to providing a tractable model that can be reasoned about, the smaller number of parameters lessens
    the risk of overfitting. However, the leading HMM methods to date have not exceeded a Q_3 value of 75%,
    and their SOV scores are often unreported.
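
    Finding "the most likely sequence of states" in an HMM is the job of the Viterbi algorithm, a dynamic
    program over positions and states. The sketch below works in log space; the two-state helix/coil model,
    the four-letter residue alphabet, and all probabilities in the usage example are hypothetical toy values,
    not parameters of any published predictor.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for obs and its log-probability."""
    # best[s]: log-probability of the best path ending in state s at the current position
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: predecessor of s on the best path reaching s at position t + 1
    for symbol in obs[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # pick the predecessor state that maximizes the path score into s
            p = max(states, key=lambda r: prev_best[r] + log_trans[r][s])
            best[s] = prev_best[p] + log_trans[p][s] + log_emit[s][symbol]
            pointers[s] = p
        back.append(pointers)
    # trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical toy model: H = helix, C = coil; staying in a state is favored.
states = ("H", "C")
log_start = {s: math.log(0.5) for s in states}
log_trans = {a: {b: math.log(0.9 if a == b else 0.1) for b in states} for a in states}
emit = {"H": {"A": 0.4, "L": 0.4, "G": 0.1, "P": 0.1},
        "C": {"A": 0.1, "L": 0.1, "G": 0.4, "P": 0.4}}
log_emit = {s: {aa: math.log(p) for aa, p in emit[s].items()} for s in states}

path, log_p = viterbi("GPALLAAGP", states, log_start, log_trans, log_emit)
print("".join(path))  # CCHHHHHCC
```

    The helix-favoring run ALLAA outweighs the two state switches needed to enter and leave the helix, so
    the decoded path labels it H.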

    In [1,2], we focus on improving the prediction accuracy of HMM-based methods, thereby advancing the goal 
    of achieving a state-of-the-art predictor while maintaining an intuitive and biophysically motivated cost
    model. Our technique relies on Hidden Markov Support Vector Machines (HM-SVMs), a recent innovation 
    in the field of machine learning. While HM-SVMs share the prediction structure of HMMs, the learning 
    algorithm is more powerful. Unlike the expectation-maximization algorithms typically used to train HMMs, 
    training with an SVM allows for a discriminative learning function, a soft margin criterion, and 
    bi-directional influence of features on parameters.
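
    The soft-margin idea can be illustrated for a single training example. This minimal sketch assumes an
    HMM-style joint feature map (counts of emission and transition pairs), a Hamming loss, and margin-rescaled
    slack as used in structural SVMs; the feature names, the two-label H/C alphabet, and the toy weights are
    hypothetical. It only evaluates the hinge term, by brute force over all labelings. A real trainer would
    find the maximizing labeling with a Viterbi-style dynamic program and optimize the weights with a
    quadratic-program or cutting-plane solver, neither of which is shown.

```python
from collections import Counter
from itertools import product


def features(residues, labels):
    """Joint feature map Phi(x, y): counts of (label, residue) emission pairs
    and (label, next label) transition pairs, as in an HMM."""
    phi = Counter(("emit", y, x) for x, y in zip(residues, labels))
    phi.update(("trans", a, b) for a, b in zip(labels, labels[1:]))
    return phi


def score(w, residues, labels):
    """Linear score w . Phi(x, y); features without a weight count as 0."""
    return sum(w.get(f, 0.0) * n for f, n in features(residues, labels).items())


def hamming(y_true, y):
    """Number of positions at which two labelings disagree."""
    return sum(a != b for a, b in zip(y_true, y))


def structured_hinge(w, residues, y_true, labels="HC"):
    """Slack of one example under margin rescaling:
    max_y [Delta(y_true, y) + w.Phi(x, y)] - w.Phi(x, y_true).
    Zero iff y_true outscores every labeling y by at least Delta(y_true, y).
    Brute-force enumeration: only feasible for toy-length sequences."""
    worst = max(hamming(y_true, y) + score(w, residues, y)
                for y in product(labels, repeat=len(residues)))
    return worst - score(w, residues, y_true)


# Hypothetical weights: Leu favors helix (H), Gly favors coil (C).
w = {("emit", "H", "L"): 2.0, ("emit", "C", "G"): 2.0}
print(structured_hinge(w, "GLLG", "CHHC"))       # 0.0: margin satisfied
w_weak = {k: 0.5 for k in w}
print(structured_hinge(w_weak, "GLLG", "CHHC"))  # 2.0: margin violated
```

    With the stronger weights every mislabeled residue costs more score than it gains in loss, so the true
    labeling wins by the required margin; with the weaker weights the all-wrong labeling HCCH violates it.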

    Using the HM-SVM approach, we develop a simple 7-state HMM for predicting alpha helices and coils. The 
    HMM contains 302 parameters, representing the energetic benefit of each residue being in the middle of a
    helix or at a specific position relative to the N- or C-cap. Our technique does not depend on any
    homologous sequence alignments. Applied to a database of all-alpha proteins, our predictor achieves a 
    Q_alpha value of 77.6% and an SOV_alpha score of 73.4%. These performance numbers are among the best 
    for techniques that do not rely on multiple sequence alignments.  
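
    To make the shape of such a cost model concrete, the sketch below scores one candidate helix as a sum of
    per-residue terms: a cap-specific term for residues at a given offset from the N- or C-cap, and an interior
    term otherwise. The weight tables, the offset convention (offset 0 is the first or last helix residue), and
    the brute-force search over single segments are all hypothetical. In the approach described above, such
    terms are parameters of the 7-state HMM and are learned with the HM-SVM rather than set by hand.

```python
def helix_score(seq, start, end, interior, n_cap, c_cap):
    """Score one candidate helix seq[start:end] (end exclusive).

    Hypothetical layout: n_cap[k][aa] is the benefit of residue aa at offset k
    from the first helix residue, c_cap[k][aa] at offset k from the last helix
    residue (offsets may reach outside the helix). Residues not covered by a
    cap term get interior[aa] if they lie inside the helix. Unlisted residues
    contribute 0.
    """
    cap_positions = [start + k for k in n_cap] + [end - 1 + k for k in c_cap]
    lo = min([start] + cap_positions)
    hi = max([end - 1] + cap_positions)
    total = 0.0
    for i in range(max(0, lo), min(len(seq), hi + 1)):
        aa = seq[i]
        if i - start in n_cap:          # N-cap-relative term takes precedence
            total += n_cap[i - start].get(aa, 0.0)
        elif i - (end - 1) in c_cap:    # then the C-cap-relative term
            total += c_cap[i - (end - 1)].get(aa, 0.0)
        elif start <= i < end:          # otherwise an interior helix residue
            total += interior.get(aa, 0.0)
    return total


def best_helix(seq, min_len, interior, n_cap, c_cap):
    """Brute-force search for the single highest-scoring helix segment."""
    segments = [(s, e) for s in range(len(seq))
                for e in range(s + min_len, len(seq) + 1)]
    return max(segments,
               key=lambda se: helix_score(seq, se[0], se[1], interior, n_cap, c_cap))


# Hypothetical toy weights, not learned parameters.
interior = {"A": 1.0, "L": 1.0, "G": -1.0, "P": -2.0}
n_cap = {0: {"S": 1.0}}  # toy preference for Ser at the N-cap
c_cap = {0: {"G": 1.0}}  # toy preference for Gly at the C-cap
seq = "GPSAALLAGP"
print(best_helix(seq, 4, interior, n_cap, c_cap))  # (2, 9)
```

    The winning segment starts at the Ser (index 2), ends at the Gly (index 8), and collects the interior
    benefit of AALLA in between.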

    [1] B. Gassend, C.W. O'Donnell, W. Thies, A. Lee, M. van Dijk, and S. Devadas, Predicting secondary 
    structure of all-helical proteins using hidden Markov support vector machines, PRIB 2006, pp. 93-104, 2006.

    [2] B. Gassend, C.W. O'Donnell, G.E. Suh, W. Thies, A. Lee, M. van Dijk, and S. Devadas, Learning 
    biophysically-motivated parameters for alpha helix prediction, BMC Bioinformatics 8(5), p. S3, 2007. 
    Poster at 10th Annual International Conference on Research in Computational Molecular Biology 
    (RECOMB 2006), 2006.
   
 


© 2009 Marten van Dijk. All rights reserved.