Question
Write a Perl program that generates a random DNA sequence (DNAR) of 400 nucleotides.
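A minimal, non-interactive sketch of DNAR, assuming every base is drawn independently with probability 0.25 (a uniform model; the interactive program in the answer lets you set other probabilities). The subroutine name `random_dna` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a DNA string of $len nucleotides, each base
# drawn independently with probability 0.25 (uniform model assumed).
sub random_dna {
    my ($len) = @_;
    my @bases = qw(A C G T);
    return join '', map { $bases[ int rand @bases ] } 1 .. $len;
}

my $dnar = random_dna(400);
print length($dnar), "\n";    # prints 400
print "$dnar\n";
```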
A.) Using BLAST, explore how well DNAR aligns with DNA sequences of known organisms. As
DNAR is random, it would come as no surprise if none of the alignments were particularly
good and all yielded low similarity scores.
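Both the NCBI BLAST web form and the BLAST+ command-line tools accept query sequences in FASTA format. A small sketch, assuming a header named `DNAR` and 60 bases per line (a common convention, not a requirement); the subroutine name `to_fasta` is hypothetical. It prints the record to STDOUT, so you can redirect it to a file and paste or upload it as a blastn query.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one sequence as a FASTA record, i.e. a ">"
# header line followed by the sequence wrapped at $width characters per
# line (60 by default here).
sub to_fasta {
    my ($header, $seq, $width) = @_;
    $width ||= 60;
    my $record = ">$header\n";
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        $record .= substr($seq, $pos, $width) . "\n";
    }
    return $record;
}

# Hypothetical DNAR: 400 bases drawn uniformly from A, C, G and T.
my @bases = qw(A C G T);
my $dnar  = join '', map { $bases[ int rand @bases ] } 1 .. 400;

# Print to STDOUT; redirect to a file (e.g. > dnar.fa) to get a BLAST query.
my $fasta = to_fasta('DNAR', $dnar);
print $fasta;
```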
B.) Next, choose three (3) different DNA sequences (DNA-1,
DNA-2, and DNA-3), each 400 nucleotides long, from different existing organisms and use
BLAST to explore their alignments with known sequences. Unsurprisingly, their alignments yield
much higher similarity scores. We would expect their alignments to yield higher
similarity scores even when compared with sequences from other existing organisms such as human, cow, and cat.
WHY??
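To compare DNA-1, DNA-2 and DNA-3 with DNAR, the sample sequences first have to be loaded. NCBI Nucleotide records can be downloaded in FASTA format; the sketch below reads a multi-FASTA file into [header, sequence] pairs and reports each length so you can confirm the 400-nucleotide requirement. The file name `samples.fa` and the subroutine name `read_fasta` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read every record from a (multi-)FASTA filehandle.
# Returns a list of [header, sequence] pairs; sequence lines are joined,
# internal whitespace is removed and bases are upper-cased.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip trailing whitespace (LF or CRLF)
        next if $line eq '';
        if ($line =~ /^>(.*)/) {
            push @records, [ $1, '' ];   # start a new record
        } elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= uc $line; # append to the current sequence
        }
    }
    return @records;
}

# Usage sketch: perl read_fasta.pl samples.fa
# (samples.fa is a hypothetical file holding DNA-1, DNA-2 and DNA-3)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    for my $rec (read_fasta($in)) {
        my ($name, $seq) = @$rec;
        printf "%s\t%d nt\n", $name, length($seq);
    }
    close $in;
}
```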
Your task is to explore how DNAR is structurally different from the three sample human, cat, and cow DNAs. For
example, you may conduct a statistical analysis of the occurrence of the four bases A, T, C, and G, or
the likelihood of certain patterns. Use your knowledge of Perl to develop methods that will
facilitate your analysis.
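One possible starting point for the statistical comparison is sketched below; all subroutine names are hypothetical. It counts single bases and overlapping dinucleotides, then reports GC content, the observed/expected ratio of a dinucleotide, and a chi-square statistic against the uniform 25% model. For a uniformly random DNAR, the ratios should hover around 1 and the chi-square value should usually stay below the 5% critical value of about 7.81 (3 degrees of freedom). Vertebrate genomic DNA such as human, cow or cat is known to be depleted in CpG, so the CG observed/expected ratio is often well below 1 there; with only 400 nucleotides per sample, expect noticeable sampling noise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count single bases and overlapping dinucleotides in a DNA string.
sub composition {
    my ($seq) = @_;
    $seq = uc $seq;
    my (%mono, %di);
    $mono{$_}++ for split //, $seq;
    $di{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return (\%mono, \%di);
}

# Observed/expected ratio of dinucleotide XY:
#   (count(XY) / (L - 1)) / ((count(X) / L) * (count(Y) / L))
# A value near 1 means XY occurs about as often as the single-base
# frequencies alone predict, which is what a random sequence should show.
sub obs_exp {
    my ($seq, $pair) = @_;
    my $len = length $seq;
    return if $len < 2;
    my ($mono, $di) = composition($seq);
    my ($x, $y) = split //, uc $pair;
    my $px = ($mono->{$x} // 0) / $len;
    my $py = ($mono->{$y} // 0) / $len;
    return unless $px && $py;    # undefined when X or Y never occurs
    return (($di->{ uc $pair } // 0) / ($len - 1)) / ($px * $py);
}

# Chi-square statistic of the base counts against a uniform 25% model.
# With 3 degrees of freedom the 5% critical value is about 7.81.
sub chi_square_uniform {
    my ($seq) = @_;
    my ($mono) = composition($seq);
    my $expected = length($seq) / 4;
    die "Empty sequence\n" unless $expected;
    my $chi2 = 0;
    $chi2 += (($mono->{$_} // 0) - $expected) ** 2 / $expected for qw(A C G T);
    return $chi2;
}

# Demo on a freshly generated 400-nt random sequence.
my @bases = qw(A C G T);
my $dnar  = join '', map { $bases[ int rand @bases ] } 1 .. 400;
my ($mono) = composition($dnar);
printf "%s: %d\n", $_, $mono->{$_} // 0 for qw(A C G T);
printf "GC content: %.3f\n", (($mono->{G} // 0) + ($mono->{C} // 0)) / length($dnar);
printf "CpG obs/exp: %.3f\n", obs_exp($dnar, 'CG') // 0;
printf "Chi-square vs uniform: %.2f\n", chi_square_uniform($dnar);
```

Running the same subroutines on DNA-1, DNA-2 and DNA-3 and placing the numbers next to DNAR's gives a first, rough picture of how the real sequences depart from the random model.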
Explanation / Answer
Working Perl code that generates random DNA sequences in FASTA format:
#!/usr/bin/perl
use warnings;
use strict;
# Seed the random number generator (optional: Perl seeds automatically on the first call to rand)
srand(time|$$);
print "1) Please type the number of iterations (how many random sequences do you want):\nEXAMPLE: \"10\"\n";
my $iterations = <STDIN>;
chomp $iterations;
print "\n2) Please type the length of the random DNA strings (how many nucleotides long):\nEXAMPLE: \"50\"\n";
my $length = <STDIN>;
chomp $length;
print "\n3) Please type the probability distribution of A content:\n",
      "REMEMBER THAT THE SUM OF THE FOUR PROBABILITIES MUST BE EQUAL TO \"1.00\"\n",
      "EXAMPLE: \"0.25\"\n";
my $A_content = <STDIN>;
print "\n########################################################################\n",
      "# From a value of \"1.00\" as total probability, there are: ", (1 - $A_content), " available\n",
      "########################################################################\n";
print "\n4) Please type the probability distribution of T content:\n",
      "REMEMBER THAT THE SUM OF THE FOUR PROBABILITIES MUST BE EQUAL TO \"1.00\"\n",
      "EXAMPLE: \"0.25\"\n";
my $T_content = <STDIN>;
print "\n########################################################################\n",
      "# From a value of \"1.00\" as total probability, there are: ", (1 - ($A_content + $T_content)), " available\n",
      "########################################################################\n";
print "\n5) Please type the probability distribution of G content:\n",
      "REMEMBER THAT THE SUM OF THE FOUR PROBABILITIES MUST BE EQUAL TO \"1.00\"\n",
      "EXAMPLE: \"0.25\"\n";
my $G_content = <STDIN>;
my $C_content = 1 - ($A_content + $T_content + $G_content);
print "\n########################################################################\n",
      "# From a value of \"1.00\" as total probability, there are: ", $C_content, " available\n",
      "########################################################################\n";
print "\n6) Setting the probability distribution of C content to: ";
print $C_content, "\n";
#### Ask the user for the name of the fasta header
print "\n7) Please type the name of the FASTA header for each sequence (the > is not needed):\nEXAMPLE: \"random_seq\"\n";
my $fasta_header_name =<STDIN>;
print "\n8) Please type the name of the output file:\nEXAMPLE: \"random_sequences_set.fa\"\n";
my $output_file_name =<STDIN>;
chomp ($A_content,$T_content,$G_content,$C_content,$fasta_header_name,$output_file_name);
my @distribution = ($A_content,$T_content,$G_content,$C_content);
print "
------------------------------ RESULTS SUMMARY ------------------------------
SUCCESS: Generated $iterations random DNA sequences of $length nucleotides each,
in FASTA format, with base probabilities:
A = $A_content
T = $T_content
G = $G_content
C = $C_content
EXPORTED TO THE FILE: \"$output_file_name\"
-----------------------------------------------------------------------------\n";
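# Name of the output file
my $output_file = $output_file_name;
# Open the file handle OUTPUT_SEQ for writing (three-argument open); stop if it fails.
open(OUTPUT_SEQ, '>', $output_file) or die "Cannot open '$output_file' for writing: $!\n";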
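for (my $k = 0; $k < $iterations; $k++) {
    # FASTA header line, e.g. >random_seq_1
    print OUTPUT_SEQ ">", $fasta_header_name, "_", ($k + 1), "\n";
    for (my $i = 0; $i < $length; $i++) {
        print OUTPUT_SEQ distribution(@distribution);
    }
    # End the sequence line so the next header starts on its own line
    print OUTPUT_SEQ "\n";
}
close(OUTPUT_SEQ) or die "Cannot close '$output_file': $!\n";
exit;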
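sub distribution {
    my @probability = @_;    # order: A, T, G, C (same order as @distribution)
    my $sum = $probability[0] + $probability[1] + $probability[2] + $probability[3];
    # Compare with a tolerance: floating-point sums are rarely exactly 1.
    # Also reject negative values, which a sum check alone cannot catch.
    if (abs($sum - 1) > 1e-9 or grep { $_ < 0 } @probability) {
        die "Probabilities must be non-negative and sum to \"1.0\"!\n";
    }
    my $randnum = rand(1);
    if ($randnum < $probability[0]) {
        return 'A';
    } elsif ($randnum < $probability[0] + $probability[1]) {
        return 'T';
    } elsif ($randnum < $probability[0] + $probability[1] + $probability[2]) {
        return 'G';
    } else {
        return 'C';
    }
}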
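The `distribution` subroutine picks a base by comparing one uniform random number with cumulative probabilities (a roulette-wheel, or inverse-CDF, draw). The hypothetical `make_sampler` below sketches the same technique with two changes: the probabilities are checked once, with a tolerance and a non-negativity test, instead of an exact `== 1` comparison on every call, and the result is a reusable closure. The demo draws 10,000 bases as a rough empirical check that the output frequencies track the requested ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a sampler from a base => probability hash.
# Probabilities are validated once (non-negative, summing to 1 within a
# small tolerance) instead of on every draw.
sub make_sampler {
    my (%prob) = @_;
    my @bases = sort keys %prob;
    my $sum   = 0;
    for my $base (@bases) {
        die "Probability of $base must be >= 0\n" if $prob{$base} < 0;
        $sum += $prob{$base};
    }
    die "Probabilities sum to $sum, not 1\n" if abs($sum - 1) > 1e-9;

    # Cumulative boundaries in sorted base order (A, C, G, T); with all
    # four at 0.25 they are 0.25, 0.50, 0.75 and 1.00.
    my @cum;
    my $running = 0;
    for my $base (@bases) {
        $running += $prob{$base};
        push @cum, $running;
    }

    return sub {
        # Scale by the actual total so tiny rounding errors cannot leave a gap.
        my $r = rand($running);
        for my $i (0 .. $#bases) {
            return $bases[$i] if $r < $cum[$i];
        }
        return $bases[-1];    # defensive fallback; not expected to be reached
    };
}

# Empirical check: draw many bases and compare frequencies with the input.
my $sample = make_sampler(A => 0.4, T => 0.4, G => 0.1, C => 0.1);
my $seq    = join '', map { $sample->() } 1 .. 10_000;
my %count;
$count{$_}++ for split //, $seq;
printf "%s %.3f\n", $_, ($count{$_} // 0) / length($seq) for qw(A T G C);
```

To use it in the program above, one option would be to build `my $sample = make_sampler(A => $A_content, T => $T_content, G => $G_content, C => $C_content);` once after the `chomp` line and print `$sample->()` inside the inner loop instead of `distribution(@distribution)`; this wiring is a suggestion, not part of the original answer.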