 2 years ago
source link: https://www.codesd.com/item/divide-a-multiple-fasta-file-into-separate-files-by-retaining-their-original-names.html
Divide a multiple FASTA file into separate files by retaining their original names

I am trying to use an AWK script that was posted earlier on this forum to split a large FASTA file containing multiple DNA sequences into separate FASTA files. Each sequence needs to go into its own FASTA file, and each new file must be named after the DNA sequence as it appears in the original multi-FASTA file (all the characters after the >).

I tried this script, which I found here on Stack Overflow:

awk '/^>chr/ {OUT=substr($0,2) ".fa"}; OUT {print >OUT}' your_input

It works well, but the DNA sequence begins directly after the name, with no space between them. The DNA sequence needs to begin on a new line (regular FASTA format).

I would appreciate any help to solve this. Thank you!!


Do you mean something like this?

awk '/^>chr/ {OUT=substr($0,2) ".fa";print " ">OUT}; OUT{print >OUT}' your_input

where each new file that is created for a "chromosome/sequence/thing" starts with an extra line (containing a single space) before the header?
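
If the goal is simply one file per sequence, with the header on its own line and the sequence starting on the next line, a minimal sketch along the following lines may help. It makes several assumptions that the question does not confirm: every header starts with `>` (not only `>chr`), sequence names are safe to use as file names, and the input may carry Windows-style carriage returns, which the script strips as a precaution. The input file name `multi.fasta` and its contents are made up for illustration.

```shell
# Work in a scratch directory so the example does not clutter the current one.
cd "$(mktemp -d)"

# Hypothetical sample input with two records (names and sequences are invented).
cat > multi.fasta <<'EOF'
>chr1
ACGTACGT
TTGA
>chr2
GGCC
EOF

awk '
/^>/ {
    if (out != "") close(out)   # close the previous file to limit open handles
    name = substr($0, 2)        # all characters after ">"
    sub(/\r$/, "", name)        # drop a trailing carriage return, if any
    out = name ".fa"
}
out != "" {
    sub(/\r$/, "")              # same guard for every line that is written
    print > out                 # header line first, then the sequence lines
}
' multi.fasta

# Inspect one of the resulting files.
cat chr1.fa
```

Calling `close(out)` before switching files is optional for small inputs, but some awk implementations limit the number of simultaneously open files, so it is a cheap safeguard for a large multi-FASTA file.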

