NMR Resonance Assignment and Microarray Data Analysis (May 2001)
The Computational Protein Structure Group in the Computational
Biology Section of ORNL's Life Sciences Division has made significant
progress in the research areas of NMR resonance assignment and
microarray data analysis.
We developed a computer program for NMR resonance assignment, one of
the key steps in solving a protein structure by NMR, and submitted a
paper describing the work. The assignment process links
resonance peaks to individual residues of the target protein
sequence, providing the prerequisite for establishing intra- and
inter-residue spatial relationships between atoms. Assignment is
tedious and time-consuming and can take many weeks. Although a number
of computer programs exist to assist with it, many NMR labs still do
their assignments manually or semi-manually for quality reasons. We
designed a new
computational framework for automating the assignment process,
particularly for backbone resonance peak assignment. We formulate the
assignment problem as a constrained weighted bipartite matching
problem. The formulation provides a natural framework for
incorporating all available information into the assignment process.
We have implemented the algorithm and tested it on four proteins with
both real and simulated NMR peaks. The results are promising and
suggest that the method is a step toward fully automated peak
assignment. This work was submitted at the end of May 2001 to the
special issue on "Intelligent Systems in Biology" of IEEE Intelligent
Systems & Their Applications.
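
The constrained weighted bipartite matching formulation described
above can be illustrated with a small sketch. This is not the group's
program or algorithm: it is an exhaustive search that is only
practical for toy inputs. The function name best_assignment, the cost
matrix, and the form of the adjacency constraint (spin system b must
sit on the residue immediately after spin system a) are all
assumptions made for illustration.

```python
from itertools import permutations

def best_assignment(cost, adjacent_pairs):
    """Return (assignment, total_cost) with minimum total cost.

    cost[i][j] is a hypothetical cost of assigning spin system i to
    residue j (lower is better). adjacent_pairs is a set of (a, b)
    tuples; each one requires spin system b to be assigned to the
    residue immediately following the residue of spin system a.
    Brute force over all matchings: illustration only.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):  # perm[i] = residue of spin system i
        # Discard matchings that violate any adjacency constraint.
        if any(perm[b] != perm[a] + 1 for a, b in adjacent_pairs):
            continue
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# Made-up costs for three spin systems and three residues.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

unconstrained = best_assignment(cost, set())
constrained = best_assignment(cost, {(0, 1)})
```

In this toy case the unconstrained optimum places spin system 0 on
residue 1 and spin system 1 on residue 0. Adding the adjacency
constraint rules that matching out and forces a different, slightly
more costly assignment, which is how extra information can change the
result.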
We developed a computer program to analyze microarray data and
applied it to study the gene expression response to chitin in
Arabidopsis. The DNA microarray is one of the most powerful tools for
investigating gene function and pathways, and massive amounts of
microarray gene expression data have been generated. Although some
computational
tools have been developed to analyze the data, they generally do not
meet the needs of experimentalists well due to their limitations in
algorithms, utilities, and software interfaces. To address these
limitations, we have developed a computer program for clustering gene
expression profiles of microarray data using a new approach based on
the minimum spanning tree algorithm. Our preliminary studies show
that the program produces good results in benchmark tests. It is also
fast, taking only seconds to analyze a set of microarray data. The
program offers a variety of options in addition to an easy-to-use
default setting, and it is integrated with a user-friendly Java
interface. We have applied this program to a project in
collaboration with Gary Stacey's group at the University of Tennessee
and Shauna Somerville's group at the Carnegie Institution on the analysis
of gene expression and upstream regulatory regions of plants. We
clustered the chitin-responsive ESTs according to their expression
profiles. We also searched the upstream regions of the corresponding
genes, and carried out a comparative promoter analysis of the genes
in the same cluster. We identified some interesting conserved
motifs, including several known binding motifs such as W-boxes.
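
As a rough illustration of clustering expression profiles with a
minimum spanning tree, the sketch below builds the tree with Prim's
algorithm and then removes the k-1 longest edges so that k connected
groups remain. The report does not describe the program's actual
clustering rule, so the Euclidean distance, the edge-cutting rule,
the function name mst_clusters, and the sample profiles are all
assumptions.

```python
import math

def mst_clusters(profiles, k):
    """Cluster expression profiles into k groups (1 <= k <= len(profiles)).

    Builds a minimum spanning tree over Euclidean distances (Prim's
    algorithm), keeps its n-k shortest edges, and returns the
    connected components as sorted lists of profile indices.
    The cutting rule is an assumption, not the program's documented one.
    """
    n = len(profiles)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(profiles[u], profiles[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    # Dropping the k-1 longest tree edges leaves k components.
    edges.sort()
    kept = edges[: n - k]

    # Union-find to collect the components.
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x
    for _, a, b in kept:
        root[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Made-up expression profiles (one row per gene, one column per condition).
profiles = [
    [0.0, 0.1, 0.2],
    [0.1, 0.1, 0.3],
    [5.0, 5.2, 4.9],
    [5.1, 5.0, 5.0],
    [10.0, 0.0, 10.0],
]
clusters = mst_clusters(profiles, 3)
```

With these sample values the two pairs of near-identical profiles end
up in their own clusters and the outlying profile forms a cluster by
itself.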
The study of NMR resonance assignment is funded by DOE OBER (a
project titled "Structures of DNA-Repair Proteins: A New Approach
Combining Computational Modeling and NMR Spectroscopy"). Microarray
data analysis is funded by LDRD at ORNL (a project titled
"Computational Inference of Regulatory and Metabolic Networks").
Contact: Ying Xu, 865-574-7263 or xuy1@ornl.gov
Funding Source: DOE-OBER (KP)