本次Java代写是使用Java解决计算生物学研究中的现实问题

CSCI 4470 Project

### Fall 2019

Due Friday November 22, 2019

This programming assignment is to develop a deliverable program that solves a realistic problem in computational biology research. Only dynamic programming (and no biological background) is needed to accomplish such a project.

# 1 Structure Prediction

This project is to compute the maximum number of permissible letter pairs from the input sequence of 4 letters (A, C, G, U). For these letters, legal pairs are A-U, U-A, G-C, C-G while others (such a G-A) are illegal. In addition, a pairing between two letters is exclusive, e.g., if a letter U on the sequence that is paired with a letter A, that letter U cannot be paired with another letter A. This problem is called **Structure Prediction **because it is a meaningful abstraction of the significant problem *RNA secondary structure prediction *in computational biology, where A, C, G, and U are nucleotide bases that constitute RNA sequences. A nucleotide base pair brings two bases physically together, contributing to the structure into which an RNA molecule may form. The maximization of such base pairs corresponds to the minimization of free energy that may stabilize the formed structure. The reference paper by Eddy gives more background to interested readers.

Problem **Structure Prediction **resembles the homework question Max Parenthesization (see Assignment #4), where the number of exclusive bracket pairs () and [] are to be maximized and can be solved with an “inside” dynamic programming method. But **Structure Prediction **prob- lem permits pairs )( and ][ as well as () and []. Nevertheless, pairings still have to be in nested and/or parallel fashions. For example, given input sequence AUGUCAGCGUU, the following structures are all legal:

AUGUCAGCGUU | AUGUCAGCGUU | AUGUCAGCGUU |

{}{.}{{}.}. | {..}{.}{}.. | {{{.}}.{}.} |

where the denotational symbols *{ *and *} *underlining the sequence indicate which two letters are predicted to be paired and the dot symbol ’*.*’ indicates a letter predicted to be not paired with any other.

## 1.1 Input Data

Input data to your program are in a plain text file, named sequence.txt. It contains an RNA sequence in the format:

>name of the sequence AUGUCAGCGUU

## 1.2 Output

The output may be saved to a plain text file and should contain the max- imum count of base pairs and the input sequence underlined with denota- tional symbols *{*, *} *and dot ’.’, in the format:

>AUGUCAGCGUU

{{{.}}.{}.}

>max count of pairs 4

## 1.3 Reference

Sean Eddy, How do RNA Folding Algorithms Work? *Nature Biotechnology*, Vol 22, No. 7, July 2004.

## 1.4 Further Notes

It needs to be pointed out that answering the question posed in the **Struc- ture Prediction **problem only offers a remote approximation to solving the underlying science problem. To make your program more realistically applicable, there are a few considerations that can be incorporated into your algorithm design. **These are optional, however.**

** **

- A nucleotide base pair, g., A-U, can only be biologically plausible if there are at least 3 other bases between them. How would your dynamic programming handle this restriction?
- Since base pairs reduce free energy to achieve structural stability, con- tributions for different base pairs may be different, instead of being uniformly counted. In particular, your program can use a base pair
*score**function γ*to differentiate score contributions from different base pairs, g.,*γ*(A*,*U) =*γ*(U*,*A) = 2, and*γ*(G*,*C) =*γ*(C*,*G) = 3, due to dif- ferent numbers of hydrogen bonds needed for these base pairs. Then the objective function of the**Structure Prediction**is the maximize the total score contribution from base pairs. How would you modify your algorithm accordingly? - In light of (2), there is another type base pairs G-U and U-G that can be They are weaker than A-U, U-A, G-C, and C-G and have been omitted from our original problem description. We may

assume that *γ*(G*,* U) = *γ*(U*,* G) = 1. More realistically, the contribution function *γ *often comes in the form of (negative) *energy function *on these base pairs. Consequently, the objective function is to minimize the total energy contribution from involved base pairs. How would you modify your algorithm to include predictions of G-U and U-G pairs?

- Structural stability is not only determined by individual base pairs but also how the base pairs are organized together. While base pairs in a structure exist individually, “stacked” nested base pairs are bio- logically preferred over those non-stacked ones. For example, in the following for sequence GCGAAACCGC, the first structure would be pre- ferred over the second:

GCGAAACCGC GCGAAACCGC

{{{….}}} {{{…}.}}

To allow structure prediction to prefer stacked base pairs, contribution score can be made more meaningful with consideration of every two stacked, nested base pairs, e.g., for the first structure predicted in the above example, the total score is contributed from the outermost pair stacked with the second outermost pair, i.e, GC-GC and from the second outermost pair stacked with the innermost pair CG-CG, etc.. You may assume that there is a predefined *stacking score scheme*, *δ*(*x, u, v, y*), in the forma of 6 *× *6 matrix, where both (*x, y*) and (*u, v*) can be one of the 6 base pairs A-U, U-A, G-C, C-G, G-U and U-G. Under this assumption, only stacked base pairs will be predicted. That is, an isolated, single base pair will not be a part of the predicted structure.

Accordingly, recursive formulation for the objective function needs to be slightly adjusted to accommodate stacked, instead of individual, base pairs. How would you accomplish this?

# 2 Requirements

For your project, you need to accomplish the following tasks:

- Design an objective function associated with an optimal solution and formulate recursive solution for the objective function;
- Write pseudocode for computing the objective function and pseu- docode for traceback an optimal solution;
- Implement pseudocode in (2) into a program in any chosen program- ming language;

#### (4) Submit a hard-copy report by due date to the instructor; it should include your completed work for parts (1) and (2);

**Email to the TA your program in a sin- gle zipped file**by the due date, which should include:*source**code*,*executable code*, and*any auxiliary programs*;- Schedule a demo date/time with and demo your program to the TA by the end of

#### Project report and demo that deviate from the requirements will not be graded.

**程序代写代做C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB**

本网站支持淘宝 支付宝 微信支付 paypal等等交易。如果不放心可以用淘宝交易！

**E-mail:** [email protected] **微信:**itcsdx

如果您使用手机请先保存二维码，微信识别。如果用电脑，直接掏出手机果断扫描。