Web Analytics Made Easy -
StatCounter help with de-bugging perl script - CodingForum

Announcement

Collapse
No announcement yet.

help with de-bugging perl script

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • help with de-bugging perl script

    I have written a perl script, and need help to debug it. I am pretty new to Perl, and therefore do not know quite where I have gone wrong. Any help would be MUCH appreciated. The code will be put in below.

    Essentially, it will first read in two protein sequences from an alignment file. It will then go to a subroutine "sub readBlosum", which will read in a scoring matrix from a file, with Z being one axis and X being the other, to get a value for each pair of protein residues. This subroutine will then score the sequence against the matrix. Finally, a bunch of calculations will be done, and then it should spit out a number at the end. However, my Perl debugger is giving me errors, and as I am new to this, I am not quite sure what is wrong.

    Code:
    #!/usr/bin/perl -w
    
    use warnings;
    use strict;
    
    my $protein1 = ("");
    my $protein2 = ("");
    my @seq1 = ("");
    my @seq2 = ("");
    my @line = ("");
    my @seq_line= ("");
    my $readBLOSUM = "blosum62.txt";
    #parse sequences by residues
    while (<STDIN>)
    
           { if ($_ =~ m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g)
             {$protein1=$2;}
            elsif ($_ =~ m/^(Hs[1-9]{8})(\S+)(.*)/g)
                {$protein2 = $2;}       
                
                
                    
            if ($_ =~ /(^\|*|.|:)/n)
                    {null;}
            else
                    {
                    @line = split($_,/\s+/);                  # split the sequence and sequence length
    #number parts of the line to an array with two entries
                    @seq_line = split($line[1],//);     # put sequence to an array by character
                    }
           
           if ($line[0]=m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g)
                {@seq_line-> @seq1}
             elsif ($line[0] =  m/^(Hs[1-9]{8})(\S+)(.*)/g)
                {@seq_line-> @seq2}
                
                
                
    readBlosum;            
        
        
        sub readBlosum{
    open BLOSUM , "blosum62" or die "Unable to open inputFile.";
    my $line = ;
    my @acidIndices = split(/ /, $line);
    my %blosumMatrix = ();
    
    while () {
    my($line) = $_;
    my @scores = split(/ /, $line);
    my $currAcid = $scores[0];
    for ( my $i = 1; $i < scalar @scores; $i++) {
    $blosumMatrix{$currAcid}{$acidIndices[$i]} = $scores[$i];
    
    }
    
    close BLOSUM;
    return \%blosumMatrix;
    }
    
    my $blosumMat = &readBlosum();
    $blosumMat->{”Z”}->{”X”};
    
    #calculate weights of sequence 1/sequence 2 alignment and sum
    #S1 is the similarity score
    
     total;
     lengthseq;
    expscore;
    s1norm;
    unrelscore;
    unrelnorm;
    rawdist;
    scoredist;
    
    
    
    $scoremat=$blosumMat;
    
    sub total {my ($s1, $scoremat);
    		my $s1 = 0;
    		foreach $scoremat {
    		$s1+= $scoremat;}
    		return ($s1);}
    				
    			   
    
    #calculate length
    sub lengthseq{
    	$length = @seq1 ;
    	$len=length($length);}
    	
    
    #calculate expected score
    sub expscore {$expscore= (-0.52) * $len}
    
    #calculate Normalized score
    sub s1norm {$s1norm= $s1 - $expscore}
    
    #calculate unrelated score
    sub unrelscore {$unrelscore=(([7769 + 9529)/2)}
    
    #calculate normalized unrelated score
    sub unrelnorm {$unrelnorm=$unrelscore-$s1norm}
    
    #calculate raw distance
    sub rawdist {$rawdist=-ln($s1norm/$unrelscore)*100}
    
    #calculate scoredist estimate
    sub scoredist {$scoredist=1.1775*$rawdist}
    print $scoredist;

  • #2
    there are few problems but first you don't close some '}', one to the while and one when you define readBlosum sub.
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my $readBLOSUM = "blosum62.txt";
    
    my $protein1 = ("");
    my $protein2 = ("");
    my @seq1 = ("");
    my @seq2 = ("");
    my @line = ("");
    my @seq_line= ("");
    
    #parse sequences by residues
    while(<STDIN>){
        if($_ =~ m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g){
    	$protein1=$2;
        }elsif($_ =~ m/^(Hs[1-9]{8})(\S+)(.*)/g){
    	$protein2 = $2;
        }
    
        if($_ =~ /(^\|*|.|:)/n){
    	null;
        }else{
    	@line = split($_,/\s+/);                  # split the sequence and sequence length
    #number parts of the line to an array with two entries
    	@seq_line = split($line[1],//);     # put sequence to an array by character
        }
    
        if($line[0]=m/^(At[1-5][g][0-9]{5})(\S+)(.*)/g){
    	@seq_line-> @seq1
        }elsif($line[0] =  m/^(Hs[1-9]{8})(\S+)(.*)/g){
    	@seq_line-> @seq2
        }
    }
    
    readBlosum;
    
    sub readBlosum{
        open BLOSUM , "blosum62" or die "Unable to open inputFile.";
        my $line = ;
        my @acidIndices = split(/ /, $line);
        my %blosumMatrix = ();
    
        while(){
    	my($line) = $_;
    	my @scores = split(/ /, $line);
    	my $currAcid = $scores[0];
    	for(my $i = 1; $i < scalar @scores; $i++){
    	    $blosumMatrix{$currAcid}{$acidIndices[$i]} = $scores[$i];
    	}
    
    	close BLOSUM;
    	return \%blosumMatrix;
        }
    }
    
    my $blosumMat = &readBlosum();
    $blosumMat->{’،ةZ’،ة}->{’،ةX’،ة};
    
    #calculate weights of sequence 1/sequence 2 alignment and sum
    #S1 is the similarity score
    
    total;
    lengthseq;
    expscore;
    s1norm;
    unrelscore;
    unrelnorm;
    rawdist;
    scoredist;
    
    $scoremat=$blosumMat;
    
    sub total {
        my ($s1, $scoremat);
        my $s1 = 0;
        foreach $scoremat {
    	$s1+= $scoremat;
        }
        return ($s1);
    }
    
    
    #calculate length
    sub lengthseq {
        $length = @seq1 ;
        $len=length($length);
    }
    
    #calculate expected score
    sub expscore {
        $expscore= (-0.52) * $len;
    }
    
    #calculate Normalized score
    sub s1norm {
        $s1norm= $s1 - $expscore;
    }
    
    #calculate unrelated score
    sub unrelscore {
        $unrelscore = ((7769 + 9529)/2);
    }
    
    #calculate normalized unrelated score
    sub unrelnorm {
        $unrelnorm=$unrelscore-$s1norm;
    }
    
    #calculate raw distance
    sub rawdist {
        $rawdist=-ln($s1norm/$unrelscore)*100;
    }
    
    #calculate scoredist estimate
    sub scoredist {
        $scoredist=1.1775*$rawdist;
    }
    
    print $scoredist;
    I didn't run this script, just indented, please post blosum62.txt or some similar data.

    best regards

    Comment


    • #3
      .....
      Last edited by roro; Apr 16, 2009, 06:27 PM.

      Comment


      • #4
        Originally posted by roro View Post
        Attached is BLOSUM62.txt. I would really appreciate it if you could help me get this running.

        Thanks a lot.
        you get data from two sources or I'm wrong. First is from stdin and second from blosum62.txt which is a matrix.
        I'm not familiar with this domain, can you also provide some input example from stdin?

        best regards

        Comment


        • #5
          .....
          Last edited by roro; Apr 16, 2009, 06:26 PM.

          Comment


          • #6
            Originally posted by roro View Post
            Sorry, here is the sequence file! stdin would be this file being read into the script from command line.
            I try to see how it work and come back with results.
            I hope this is not a homework assignment,

            best regards

            Comment


            • #7
              .....
              Last edited by roro; Apr 16, 2009, 06:24 PM.

              Comment


              • #8
                Did anything ever happen with this?

                Getting the input working is easy, but I never could figure out how the calculations were intended to work.

                Comment

                Working...
                X