Parsing problem.. need help asap! - CodingForum


  • Parsing problem.. need help asap!

    I'm trying to parse all the At3g57520 lines from the sample file below.
    I have an if statement that recognizes At3g57520, and then, to get only the sequence (the letters), I run this series of substitutions:

    $oseq =~ s/$val//g;
    $oseq =~ s/\-//g;
    $oseq =~ s/(\n+)//g;
    $oseq =~ s/(\s+)//g;
    $oseq =~ s/\n+//;
    $sequence = $oseq;

    It does not work properly. $sequence ends up with many newlines and a lot of junk, and the sequence is not saved as one continuous string. I need $sequence to be: "TTCTTAACGTTGTTGGTTTATCCCTTGTGATCTAGAAGCGGTGTTGAGAAGATGACGATTA".

    What am I doing wrong?
    =======================
    >Header Information, Some Text >
    At5g40390 ------------------------------------------------------------
    At4g01970 ------------------------------------------------------------
    At5g20250 ------------------------------------------------------------
    At3g57520 TTCTTAACGTTGTTGGTTTATCCCTTGTGAT
    At1g55740 ----GTCTATGAATAATTATGTCAACTATTCAG
    At5g13420 ------------------------------------------------------------
    At5g08380 ------------------------------------------------------------


    At5g40390 ------------------------------------------------------------
    At4g01970 ------------------------------------------------------------
    At5g20250 ------------------------------------------------------------
    At3g57520 CTAGAAGCGGTGTTGAGAAGATGACGATTA
    At1g55740 TCTAATCATAATATTGGTTACAAGAAATAGA
    At5g13420 ------------------------------------------------------------
    At5g08380 ------------------------------------------------------------

  • #2
    See if this helps.
    Code:
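    #!/usr/bin/perl
    use strict;
    use warnings;

    my $sequence = '';
    while (<DATA>) {
        chomp;
        # Append the text after the ID on every At3g57520 line.
        $sequence .= $1 if /^At3g57520 (.+)$/;
    }
    print "$sequence\n";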
    
    
    __DATA__
    At5g40390 ------------------------------------------------------------
    At4g01970 ------------------------------------------------------------
    At5g20250 ------------------------------------------------------------
    At3g57520 TTCTTAACGTTGTTGGTTTATCCCTTGTGAT
    At1g55740 ----GTCTATGAATAATTATGTCAACTATTCAG
    At5g13420 ------------------------------------------------------------
    At5g08380 ------------------------------------------------------------
    
    
    At5g40390 ------------------------------------------------------------
    At4g01970 ------------------------------------------------------------
    At5g20250 ------------------------------------------------------------
    At3g57520 CTAGAAGCGGTGTTGAGAAGATGACGATTA
    At1g55740 TCTAATCATAATATTGGTTACAAGAAATAGA
    At5g13420 ------------------------------------------------------------
    Last edited by FishMonger; Oct 3, 2006, 09:19 PM.
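
    If the ID needs to vary, or the rows contain "-" gap characters mixed in with the letters (like the At1g55740 row in the sample), the same line-by-line idea can be wrapped in a small sub. This is only a sketch under some assumptions: extract_sequence is a made-up helper name, the input is assumed to have one "ID sequence" pair per line as in the sample, and the in-memory filehandle stands in for a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every alignment chunk for one ID from an open filehandle.
# extract_sequence is a hypothetical helper name, not from the thread.
sub extract_sequence {
    my ($id, $fh) = @_;
    my $sequence = '';
    while (my $line = <$fh>) {
        # \Q...\E quotes the ID so it is matched literally.
        next unless $line =~ /^\s*\Q$id\E\s+(\S+)/;
        my $chunk = $1;
        $chunk =~ tr/-//d;    # drop alignment gap characters
        $sequence .= $chunk;  # append, so earlier chunks are kept
    }
    return $sequence;
}

# Usage with an in-memory copy of two rows per block from the sample.
my $sample = <<'END';
>Header Information, Some Text >
At3g57520 TTCTTAACGTTGTTGGTTTATCCCTTGTGAT
At1g55740 ----GTCTATGAATAATTATGTCAACTATTCAG

At3g57520 CTAGAAGCGGTGTTGAGAAGATGACGATTA
At1g55740 TCTAATCATAATATTGGTTACAAGAAATAGA
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print extract_sequence('At3g57520', $fh), "\n";
close $fh;
```

    Because (\S+) stops at the first whitespace, no newline ever gets into the captured chunk, and using .= instead of = keeps the chunks from earlier blocks. For the sample above this prints the continuous string asked for in the first post.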
