BLAStoise

19 bytes removed, 19:55, 12 February 2017
Project Name Goes here
----
<h4>DNA Edit Distance and Sequence Alignment Tool - Analysis by Jonathan D.</h4>
This program is a bioinformatics tool that calculates the edit distance between two sequences. If the corresponding option is selected, it also performs a sequence alignment of the two sequences. Both sequences must be given in FASTA format; ready-to-use sequences are included in the project's test directory.
The original source code can be found [https://github.com/kbiscanic/bioinformatics here].
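For readers unfamiliar with edit distance, the following is a minimal sketch of the textbook dynamic-programming formulation with unit costs for insertion, deletion and substitution. It is not taken from the project: the function name <code>editDistance</code> is an assumption made for illustration, and the project's own solver may be organised quite differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Textbook unit-cost edit distance, filled row by row.
// Illustrative only: editDistance is a hypothetical name, not the project's code.
int editDistance(const std::string& a, const std::string& b) {
    // prev[j] is the distance between the first i-1 characters of a and the
    // first j characters of b; curr is the row currently being filled.
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = static_cast<int>(j);  // turn an empty prefix into b[0..j)
    }

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);  // delete the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int deletion = prev[j] + 1;
            int insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Calling <code>editDistance("ACGT", "AGT")</code> returns 1, since deleting the C is enough to turn the first string into the second. Only two rows are kept in memory, which suffices when the distance alone is needed and no alignment has to be recovered.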
<h5>Compilation</h5>
To compile the program, navigate to the project’s src directory and run the following command:
$ g++ -std=c++11 -g -pg -O2 BasicEditDistance.cpp main.cpp Parser.cpp Result.cpp Sequence.cpp Solver.cpp SubmatrixCalculator.cpp Writer.cpp
<h5>Running the Program</h5>
There are three ways to run the program, as noted in the GitHub readme:
Testing was done with the "a" option, which uses the Masek-Paterson algorithm to calculate the edit distance and perform sequence alignment on sequences of at most 100 nucleotides. (The FASTA files are included in the project under the test directory.)
<h6>INPUT FILE (test-100.fa):</h6>
>Chromosome_3565586_3565685_0:0:0_0:0:0_0/1
$ ./a.out a ../test/data/test-100.fa test.maf
<h6>SCREEN OUTPUT:</h6>
<pre>
Submatrix dimension: 1
</pre>
<h6>TEXT OUTPUT (test.maf)</h6>
a score=58
As the output shows, the two sequences are now aligned, which reveals the regions of nucleotides conserved between them.
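To show how an alignment can be recovered from an edit matrix, here is a minimal sketch that fills a full unit-cost matrix and then traces back from the bottom-right corner, writing <code>-</code> wherever a gap is needed. It uses the plain quadratic method rather than the Masek-Paterson approach the program applies, and the function name <code>alignSequences</code> is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Global alignment by traceback through a full unit-cost edit matrix.
// Hypothetical helper for illustration, not the project's implementation.
std::pair<std::string, std::string> alignSequences(const std::string& a,
                                                   const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // d[i][j] is the edit distance between a[0..i) and b[0..j).
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            int diagonal = d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            d[i][j] = std::min({diagonal, d[i - 1][j] + 1, d[i][j - 1] + 1});
        }
    }

    // Walk back from d[n][m], preferring the diagonal move when it is optimal.
    std::string top, bottom;
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            d[i][j] == d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)) {
            top += a[i - 1];      // match or substitution
            bottom += b[j - 1];
            --i;
            --j;
        } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            top += a[i - 1];      // character of a aligned to a gap
            bottom += '-';
            --i;
        } else {
            top += '-';           // character of b aligned to a gap
            bottom += b[j - 1];
            --j;
        }
    }
    std::reverse(top.begin(), top.end());
    std::reverse(bottom.begin(), bottom.end());
    return {top, bottom};
}
```

For the inputs <code>"ACGT"</code> and <code>"AGT"</code> this returns the pair <code>"ACGT"</code> and <code>"A-GT"</code>. Several alignments can share the same optimal score, so the tie-breaking order in the traceback determines which one is reported.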
<h5>Analysis</h5>
To perform the gprof analysis, a sequence length of 10000 was used:
At 50000 nucleotides, it is even clearer that the Solver::fill_edit_matrix() function dominates execution, taking up 85% of the elapsed time.
<h5>Code Snippet</h5>
<pre>