Python assignment writing | BINF6111 – Protein sequence evolution and alignment

This Python assignment is to complete a protein sequence evolution simulation program.

BINF6111 – Assignment 1
Assignment 1 – Protein sequence evolution and alignment
Part 1
implement a protein sequence evolution simulator [35% of assignment mark]
Part 2 (alignment)
implement optimal pairwise alignment by dynamic programming [45% of assignment mark]
Part 2 (evaluation and report)
run your software from Part 1 to simulate the evolution of a protein sequence over increasing evolutionary time, and run your
alignment software from Part 2 (alignment) to align the mutated sequences with the original, then write a report that includes a plot of
percentage identity against evolutionary time and discusses your results. [20% of assignment mark]
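As an illustration of the quantity to be plotted, the following is a minimal sketch of a percentage identity calculation over two already-aligned sequences. The function name percent_identity, the use of "-" as the gap character, and the choice of alignment length as the denominator are assumptions made for this example; the specification does not fix a definition, and other denominators (for instance the length of the shorter sequence) are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both rows hold the same residue.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A column counts as a match only when the residues agree and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)
```

For example, percent_identity("AC-E", "ACDE") counts three matching columns out of four.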
Each part builds on the previous part and you should complete each part before moving on to the next.
The marking will be based on having a running program for each part and writing a proper report. Marks will also be given for good
programming and writing style. Marks will be deducted for program bugs and other errors.
The standard warnings against plagiarism apply and will be enforced. In particular, you must not re-use or copy software from another
student, or that you found on the Internet, in a textbook, or elsewhere to implement either Part 1 or Part 2 of this assignment. Doing so will
simply deprive you of the essential learning experience that can only come from solving a programming problem and building working
code by yourself to implement your solution.
Late submission of an assignment part reduces the maximum possible mark for that part by 1 mark per day, up to the number of marks
for that part.
Part 1
Description of the task:
You should download the file of sample amino acid sequences to help in development of your program. These sequences should be in
separate files in FASTA format (see below). You also need to download the amino acid mutation matrix (see below) and add it to your
code. This is an asymmetric 20×20 matrix. A matrix entry M_i,j in column i and row j denotes the probability that the column amino
acid will mutate to the row amino acid. For ease of handling, the probabilities are expressed as counts per 10000, so an entry of “56”
means a probability of 0.0056, and each column sums to 10000 (the final, unlabelled row of the matrix lists these column totals).
The columns and rows are labelled with a single-letter amino acid code, as follows:
Code Abbreviation Amino acid
A Ala alanine
R Arg arginine
N Asn asparagine
D Asp aspartate
C Cys cysteine
Q Gln glutamine
E Glu glutamate
G Gly glycine
H His histidine
I Ile isoleucine
L Leu leucine
K Lys lysine
M Met methionine
F Phe phenylalanine
P Pro proline
S Ser serine
T Thr threonine
W Trp tryptophan
Y Tyr tyrosine
V Val valine
N.B. this is a subset of the FASTA amino acid code letter set (it does not include B,U,Z,X,*,-).
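One possible way to read a sample sequence and check it against the 20-letter code above is sketched below. It assumes the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines) and a single record per file; the names AMINO_ACIDS and parse_fasta are hypothetical and not part of the specification.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 single-letter codes listed above


def parse_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for FASTA text assumed to hold one record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    # Sequence data may be wrapped over several lines; join them together.
    sequence = "".join(lines[1:]).upper()
    unexpected = sorted(set(sequence) - set(AMINO_ACIDS))
    if unexpected:
        raise ValueError(f"residues outside the 20-letter code: {unexpected}")
    return header, sequence
```

Rejecting letters such as B, U, Z and X up front keeps the later matrix lookup simple, since every residue is then guaranteed to have a column in the mutation matrix.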
Amino acid mutation matrix:
,A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V
A,9867,2,9,10,3,8,17,21,2,6,4,2,6,2,22,35,32,0,2,18
R,1,9914,1,0,1,10,0,0,10,3,1,19,4,1,4,6,1,8,0,1
N,4,1,9822,36,0,4,6,6,21,3,1,13,0,1,2,20,9,1,4,1
D,6,0,42,9859,0,6,53,6,4,1,0,3,0,0,1,5,3,0,0,1
C,1,1,0,0,9973,0,0,0,1,1,0,0,0,0,1,5,1,0,3,2
Q,3,9,4,5,0,9876,27,1,23,1,3,6,4,0,6,2,2,0,0,1
E,10,0,7,56,0,35,9865,4,2,3,1,4,1,0,3,4,2,0,1,2
G,21,1,12,11,1,3,7,9935,1,0,1,2,1,1,3,21,3,0,0,5
H,1,8,18,3,1,20,1,0,9913,0,1,1,0,2,3,1,1,1,4,1
I,2,2,3,1,2,1,2,0,0,9871,9,2,12,7,0,1,7,0,1,33
L,3,1,3,0,0,6,1,1,4,22,9947,2,45,13,3,1,3,4,2,15
K,2,37,25,6,0,12,7,2,2,4,1,9924,20,0,3,8,11,0,1,1
M,1,1,0,0,0,2,0,0,0,5,8,4,9875,1,0,1,2,0,0,4
F,1,1,1,0,0,0,0,1,2,8,6,0,4,9944,0,2,1,3,28,0
P,13,5,2,1,1,8,3,2,5,1,2,2,1,1,9924,12,4,0,0,2
S,28,11,34,7,11,4,6,16,2,2,1,7,4,3,17,9840,38,5,2,2
T,22,2,13,4,1,3,2,2,1,11,2,8,6,1,5,32,9869,0,2,9
W,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,9976,1,0
Y,1,0,3,0,3,0,1,0,4,1,1,0,0,21,0,1,1,2,9947,1
V,13,2,1,1,3,2,2,3,3,57,11,1,17,1,3,2,10,0,2,9901
,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000,10000
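The matrix above can be used directly as a set of per-column sampling weights. The sketch below is one way to do this, assuming the matrix is available as comma-separated text in exactly the layout shown (header row with a blank first cell, one labelled row per amino acid, and an unlabelled totals row). The names load_matrix and mutate are hypothetical, and treating one pass over the sequence as one unit of evolutionary time is an assumption of this example, not a statement of the required design.

```python
import csv
import io
import random


def load_matrix(csv_text: str) -> dict[str, tuple[list[str], list[int]]]:
    """Map each column amino acid to (row amino acids, counts per 10000)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    columns = rows[0][1:]                       # header row: first cell is blank
    body = [row for row in rows[1:] if row[0]]  # skip the unlabelled totals row
    targets = [row[0] for row in body]
    matrix = {}
    for index, source in enumerate(columns, start=1):
        counts = [int(row[index]) for row in body]
        if sum(counts) != 10000:
            raise ValueError(f"column {source} sums to {sum(counts)}, not 10000")
        matrix[source] = (targets, counts)
    return matrix


def mutate(sequence: str, matrix, rng: random.Random) -> str:
    """Apply one mutation step: redraw each residue from its own column."""
    result = []
    for residue in sequence:
        targets, counts = matrix[residue]
        # The counts act as relative weights, so no division by 10000 is needed.
        result.append(rng.choices(targets, weights=counts)[0])
    return "".join(result)
```

Passing an explicit random.Random instance (for example random.Random(42)) makes a simulation run reproducible, which can help when regenerating the plot for the report.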