Thursday, June 12, 2014

Read a FASTA file in C#

A FASTA file containing several DNA sequences looks like this:
>1
taatgtttgtgctggtTTTTGTGGCATCGGGCGAGAATagcgcgtggtgtgaaagactgtTTTTTTGATCGTTTTCACAAAAatggaagtccacagtcttgacag
>2
gacaaaaacgcgtaacAAAAGTGTCTATAATCACGGCAgaaaagtccacattgaTTATTTGCACGGCGTCACACTTtgctatgccatagcatttttatccataag
>3
acaaatcccaataacttaattattgggatttgttatatataactttataaattcctaaaattacacaaagttaatAACTGTGAGCATGGTCATATTTttatcaat

Split the file on newlines, and look for a leading > character to identify the name (header) line of each record.

If the sequence data for each record is all on one line (no line breaks), we can simply store each line as we read it, like:
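// Requires: using System.IO;
// StoreProteinName and StoreSequence are placeholders for your own storage code.
// The verbatim string (@"...") keeps the backslash in the path from being read as an escape.
using (var reader = new StreamReader(@"C:\myfile.fasta"))
{
    string line;
    // ReadLine returns null at the end of the file.
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length == 0)
            continue; // skip blank lines instead of stopping at the first one
        if (line.StartsWith(">"))
            StoreProteinName(line); // header line, e.g. ">1"
        else
            StoreSequence(line);    // sequence data for the current header
    }
}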
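
If the sequence data is wrapped across several lines, storing each line on its own is no longer enough: the lines between two > headers have to be joined. Here is a minimal sketch of that case. The class name FastaSketch and the method Parse are made up for illustration (they are not from the Stack Overflow answer), and it assumes that everything after the > on a header line is the record name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FastaSketch
{
    // Reads FASTA text and returns (name, sequence) pairs in file order.
    // Sequence lines that belong to the same record are concatenated.
    public static List<KeyValuePair<string, string>> Parse(TextReader reader)
    {
        var records = new List<KeyValuePair<string, string>>();
        string name = null;
        var sequence = new StringBuilder();
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0)
                continue; // ignore blank lines

            if (line[0] == '>')
            {
                // A new header closes the previous record, if there was one.
                if (name != null)
                    records.Add(new KeyValuePair<string, string>(name, sequence.ToString()));
                name = line.Substring(1).Trim(); // assumption: the name is the text after '>'
                sequence.Clear();
            }
            else if (name != null)
            {
                sequence.Append(line); // join wrapped sequence lines
            }
            // Sequence lines that appear before any header are ignored.
        }

        // Store the last record, which has no following header to close it.
        if (name != null)
            records.Add(new KeyValuePair<string, string>(name, sequence.ToString()));

        return records;
    }
}
```

To read from disk, pass a StreamReader opened on the file, for example FastaSketch.Parse(new StreamReader(@"C:\myfile.fasta")), ideally inside a using block. A list of pairs is used rather than a dictionary so that duplicate names in the file do not throw.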

Reference:
http://stackoverflow.com/questions/3097051/best-way-to-read-a-fasta-file-in-c-sharp