The file has millions of lines, arranged in 2-line blocks: the first line of each block is a header, marked with a leading >, and the second is a single line of letter characters.
In Linux and/or bash, how can I split the file into smaller ones while preserving the 2-line block structure? Ideally with some flexibility over the output, so that I can specify either the number of output files or the number of blocks per smaller file.
Short example:
>k99_12
CCTTCTTCAATGCCAATACCCTCGAAGAATTGCACCGCCTCGAACAAACCACATGACACACCCACCGTCACTGGCTGACATTGCCGCACAACTTGAAGCCTATGACCCGCAGGCACTACCTGCCAATGAGGTCTTGAATTTTCTGGATCACTTGGTGACGCCAGTGCACGACTCAGAATCAGTCGACATCTTTGCAGCATTGGGCAGAGTCACTGCGCAA
>k99_27
ATCCAAGCCAGAGAATATGCCTACCCGCATCGCACCGCGATCTCGCAAATGTCGTGTAATCGCGCGGGTATCAACACCCTGAATGCCAATGATTCCTTGCTCAATCAATTCCGCCTCAAGATCATTTTGTGCGCGCCAATTTGATGTCAAAGAGGATGGGTTTCTAACAACAAACCCTGCCACCCAAATCTTTGATGACTCATTATCTAA
>k99_31
CCATTGCGCAAACGGACTGCCGGACACCAAGTGCACCTCGTGCGACAGACCATACTGGTCGTCATAAGAGAGTTCAGGGTTTTCGCGGTGGTCGGCCATGCCGTCCACATTGTGCACCTGCTGGTGCAGCGAACCGCCCATGGCCACATTGATTTCCTGGAAGCCACGGCAAATCCCCAGCAGGGGCACACCCTGCGCGACGCAGGCGCGTACCAAGGGCAAGGTCAGGCTGTCGCGGTGCGGATCCAGCGGCAGACGCGGA
>k99_35
AAAATTGAGTTTGAAGGAATTTCGCATTTCATCAAAAATCAACACGACGAGAGTGGTTCAACAACTATAAAACGTTGGGCAAAGGAATTTATGGACGAAATAAATTGTCCTGTTTGCGAAGGTTCACGATTAAAAAAAGAAGCTTTATTTTTCAAAATTAATGGAAAAAACATCACTGAATTATGCAATATGGATATTTCGGATGTCACGGCTTGGTTTTTGGAATTGAACACCCATTTATCAGATAAACAAAAGACTATAGCGACGGAGGTTATCAAGGAAATAAAAGATCGATTGGCCTTTTTAATGAATGTAGGTTTGGATTATTT
>k99_40
GAGGCCGGCGAAGGCGCGGTGATCGACGAGGAGGACGACGACGCTGGCGCGGGCGAGCGCGTCGGCGAGGGAAACGAGTTCCGCCCTGCCCTGCAGCGCTTTCGGCAAGGCGGTCACGTGCGGCTCGACGGCCAGGACCTTCAGGCCGGCGTCGGCGAGGTGCGCGGCGATCTCGACGGCGGGAGATTCACGGAGGTCGTCGACGTTCGCCTTGAAGGCGAGGCCGAGGCAGGCGACGGCGGCGCCAGAA
>k99_42
AGACCAAATCGCACGGCTAGCAGGATCAAAACGCAAGATGCGCGGGTCTCTTACTTCATCGCGCAGAGTAGGGCGCATCAGCGCGACTTTTTCGCGCACGTCATCGGCGCCTTTGCGGCCGTCTATGTTGAGGTCAAACTCAACCACCACCACCGACACGCCTTCATAACTGCGAGATGTGAGGGCATTGATACCGGCAATGGAATTGACTGCTTCTTCCACTTTTTTAGTCACCTCGCTCTCGACAATTTCAGGAGAGGCGCCTGGATATTCGGTGCTCACGACAACGACGGGCAAATCAATATTAGGAAACTGGTCGATCTTGAGGCGCTGATAAGAGAACAAGCCCAGCACCACAAAGGCAAGCATCACCATCGTTGCGAAGACGGGGTTTTTGAGGCTGACTTTGGTGA
>k99_75
AAAGGTAGCATTGAAGATTATACGCAGTTGTTTCAGGCAGCAGCACAAATTGCGAATGAATCGGCACATATGCAACTCGATATAGATGTCGAGGGATTCAACGAATTTGCTACGGCGGCGGACGACCTCAGTAAGTTATTCACTGGTTTCATTTTGAAGTTGGAGAATGTGAGTATCATCGACGATACTGTATTTTTGACTGCGGTGGCAAATGCTCTCTCGAAGATAAGCAATTTGTCGAAAGTGTTTGGTAAGTTCAAAGAAACTATATTGGGCACTTCGACAATTCGTTTGCCCAAATCCGCACATGATGCATCGGTTATACTGAAAGATGTGGTTGGGCAAATCAATTGTGCAATGACGTATATAAACCATTTTGTCGATTCGAGTGTTCCCGCACCAAGTGTTGCGGAATTATCGAAAGAAGAGAAGAATATAATCGACGCTGCGGTGACAACCATTCACAATTGGAATACATTGTGTGACCAAGGAGTTAGTATTGCCATGTCAAGCGACCCAGATATTCAATTTGTTAGTAATGCGAATCAATCGCT
>k99_76
TCGTAAGCTAACTAAATCAACTGAACAATCTATCACCAATAGTATGTAATCAGAAATCAACTTAAATCTCATATATTAATGAAAGTTTTATCAATTGTTGGAACAAGGCCGGAAATAATTAAGTTATCAAGAGTGTTTCATGAACTTGAAAAATATACTGAACACATTTTAGTACATACAGGTCAAAACTTTGATTATGAACTAAATGAAATATTTTTCAATGATCTTAAAATTAAGAAACCTGATTTTTTTTTAAATGTTGTTGGCGAATCTTTAGCTGATACTATTGCAAACATAATTTCCAAATCCGATAAAGTTCTAGAAAAAATAAAACCA
>k99_79
GATGTACTGGTACTCGTTGTAGGTCGTCGTCTTGCTACCTCTGCTGCTGTCGTTCGTGGCCTCGTTGCGGTGGTCGTAGTTGTTGTGGTCGCTCTCGCAGGCCCGCCGCTCAGAGCTTGGAACGAGTTCTTGGAGACGAAGTCTCCCAGCGTTGCGCCGCGAGGCGTCGGGCGAGGTCGAGCTGCGACCTTCGCCTGGACAAAGCCGTCCTGGACCAGAGAGATGTCCATCCGCTGCGGCGGCCCCTCTTCGACGCTCCTGACGGCGCCTGTCGTTGGCCTCTGCGGGCAGGCTCGGGAGGAGTGACCGGTCTTTTTGCAGATCCAGCACTTCCGGAGCTCCCGTGCGACCTCAGGCAGGGGGCACTTGATGGCGGCGTGCGACTCGCC
>k99_83
CCCGAACACAATCGCTTTAGTCGAGCGGGAAACGCGGTGGGATTATGCGGACCCAGCCTTTACGAACGGGATCGCGGAAGACTTCTCCATCGACCAGTCTACTCACTCGCTCTTCGGCGCCTCGAAGGTTGCCGCCGACGTTTTGGTGCAGGAATACGGCCGCTATTTTGGAATGCCTACTTGCGTGCTGCGCGGCGGCTGCCTCACCGGCCCGAATCACAGCGGCGTCCAG
>k99_90
GGCTGACGTACAAGATGCGCCGTCCGTGGTCACGCGGCACGCTGGGCGTGGTGTTCAACGCGTTGTATGCCGTCATGTTCCTGTTCACGATCACGGTGATCGCGTCGATTCTCCACTCGTTCGAGTTCAACGGGCTATCCATCTTCTTCTTCCTGTTCTTCCTGTCGCTCGTGACCTTCTTCGGCCTGAAGATTCGCAATACGCGCCGCGAGCTGATGGTGGTAGAGGCGCGCGTCGGCATCGTCGGCACGATCGCGGACATCCTGTTTCTCCCCATGATACGCGCCGGCCGCTGGGTCGCGCTCCGGGCGCCGCGGGCCATCGCCACGCGGCCGGTCCGGACCATTTCCATGATCCCGTACGGCCGAAGCACCTCGAGCAGACCGTCAATCTTGTCTTCCGTACCGGTGATCTCGATGATCAGCGAATCCACCGCCACGTCGATCACCCGCGCGCGGAACACCTCGGCGAGCTGCATGACGTGCGGCCTGGATTCCGCCGACGCGGCAAC
>k99_100
AAAATACAGGTCTTTCAATGATGAAAGAAATGGATGATGCAAAAAATCTCGTTGGAATTGATTATACGAAGCATTTTGCTGATTTGGTAGAGAAAGCAGATCCTTTTGGTTCTAAAGCAGCGTTTATGCCAATGAAAGTAATTACTGCTTTGGCTTTGTTTGGTGAAAACGGCTCAACGAAAGCATTGGAAAGCTCATTAAAAAGAGGTGGAAGTGAAGAAAATTTAAACGATCTTTATTTAAACAGAGTAGGTGAGTACAAATGGAATGGTAAAACCTGGATTAAAAATAAAGAAGTTAAAGATAAAATTATTTTACGCTTTCCATCTTCTAATGCTAAAACTGTAAATAACGCTTCTTATGAAATTTCATTTGTGAACTATGCTGGAGCAGGTTTGCCTGATGA
>k99_104
GGTTCCATACATGTAACGCCAGGAATAGTGGACAACATTTGGTGCATCAGTGCGCCACGACGAGCAAATGCCTCACGCATCATGTGCACCGCTGATAAATCACCACTGACTGCAGCAAGTGCTGCAACCTGTGACACGTTGGCAACGTTTGACGTGGCATGCGATTGGAAGTTTGTTGAAGCCTTCATGATGTCTTTCGGTCCA
>k99_108
CCGCAGCATCTGACCGAGATCGAAGGGGCGGCCGTAGGGGCGCCGGCTGCTGTGCTGGCGCGCTGGACGGCGGCGGGCATGGCGCCGGCAGTCGTCATCGGCGACGGCGCGTTGGCCTTCGAGTCCCTCCTCGCCGGAGAGGCCCGCGTGTGTGGCGCGCAGCCGCTCGCCGGGACAATCGGACGAATCGCGGCGATCCGCGCGGATCGGGGAGAAGCGGTGGAGCCACACGCCGTGCGCGCGCTGTACGTCCGGCGTTCTGACGCGGAGGTCGAGAGGGACCGTGCCCGCTGATTCGAATGGTGCGGCGCCCCTCGCGCTGACGGTCGATCTCTTGTCGTCACTCGACGAACTGGACGAGGTGATGGCGGTCGA
>k99_112
ATGTCGAGCGCCAGCATTAGCGGGCGGGCGGACAAGGATGTTGATGCGCGCTCAATCGCTTTGGTGAAAACCGGTGACGAAGAAGCCAGCGCTGGCCGCGTCGATAGCGCAATTGGCTGGTATGAAACTGCGCTCGCGGTCGACCCGCGCAACCGTGCCGCTTATGTCGCCATGGCGCGCGCCGTAAAATCCCAGGGGTTG
>k99_115
GCAGTGGATGCCATACCAGAAAAAGTCGGGATGGTGCGGCTCGAATTCGGCGGGCCCGATGGAGTAGACGGAGGTGATGTCACCGATCTTGGCGAACTTTGAGGCGAAGGTTTCAGGTGGATACCTAAGGGACGAAGAACTGAAGAGTGGGATTTTGTGTTCGCGGGCGAGGCGGACGATTTCGACGGCGTCTGTAAGAGAGCCGGCGAGGGGCTTGTCGATGAAGACGGGTTTTTTCGCGGCGAGGGTCTGGCGGAATTGTTCGAGGTGGGGGCGGCCGTCGACGCTTTCGATCAGA
>k99_117
CGTCTCTGAGCTTTTCAGCTTCCATCAACTTGGCTTTTCCTATGGCCGCACTGTCGGAAACAATCGCAATCGGCCCTAACCTGGCTCCCAAGCAAGCCATGTGCATCGCCACGTGCTCAGCGGCCCCGGACGAGATTCGATATCCCACCGGAATGGCGACCGCCATGGATCTGACACAAACCTTCCCGTCCTTTCGCCATACCGTCCGTCGCTTCGTACGTCGTCTGGTGTGGCGTCGCTGTCCTCCGTGGTGCTGTAGACGTTCTCGATGGGGTCGTCGCCCTGGAAGTACTGGAGGTGGCTGTAGTCGCGCGGGAC
>k99_121
ATGAGTACAACAGTCAGTCATAACTGCGTAAGGGGCACCTGTAAATCTAGCCAATGCATGTTCAAATTCTAGTATTTTCTCAAACATTTTCGCTCAAGTGATCTTGTTTAATTTCTCGCACTGGGCAATTTAGTAATTCTGCTATAGTATTTTTAACTGCTATTCTTTTATTATTCCAATTTCTTATTAGTATAGCACGTCGTCCAATTTCTTCCAAACTTAATTCTTCTTCGCAACCACTTTTTAATTCAGCTTCTAATGTCCAGATAGTATCATGTATTCTTTTGAGCTCATCAAAACACGTTTTAACAAGAGATAAATCGAATTGTGAAGTTTGATCTTGATACCAATTAAGTTCCTCTTGATTGCTGTGTGTCCGATCCCACTTAACTTCGGCTATGGCTAATCTATCAAATAGTTCAATTACTGGAAAGTGGTAACTCATAGATATAGTCCTTCAATTTTTTCTGGA
>k99_135
AAAAGACTGTTGGCTTCTCCCAAAAAATTTACTTAAAAAATAATATTCAGACAACAATTCTTGAAAGTGCTATGCTTTGAAAGTTGTGTTTTTTTTAATTATGGCCAAAGAAAAAACAATACACACAAAAAAAGTTTGAAACATGGCCGATTTTCGTTTTAACGTGAAAGCTGATACCACAGATTAGATATAGAATAGATAGAGGCTTCCTAAATATCAGTAGTTCCCGGTCAAAGGGGCAGGATCAAGAGGGTTGCGGGGTTTCCTCTCTTCACATTGTACATTGTACACCTTGGTTGTAATAATAGAATATGTAACACCTTGT
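
For reference, one possible approach is sketched below. It assumes every block is exactly 2 lines (one header line, one sequence line), as in the example above, so cutting at an even number of lines never separates a header from its sequence. The function names `split_by_blocks` and `split_into_files` and the output prefix are made up for illustration; only the POSIX `split -l` and `wc -l` behaviour is relied on.

```shell
# Sketch only: assumes strict 2-line blocks (header + one sequence line).
# Function names are hypothetical helpers, not part of any tool.

# usage: split_by_blocks INPUT BLOCKS_PER_FILE PREFIX
# Writes PREFIXaa, PREFIXab, ... each holding BLOCKS_PER_FILE blocks
# (the last file may hold fewer).
split_by_blocks() {
    # Twice the block count is always an even number of lines,
    # so a cut can never fall between a header and its sequence.
    split -l "$(( $2 * 2 ))" "$1" "$3"
}

# usage: split_into_files INPUT NUM_FILES PREFIX
# Produces at most NUM_FILES files of roughly equal size.
split_into_files() {
    local blocks per_file
    blocks=$(( $(wc -l < "$1") / 2 ))          # total number of blocks
    per_file=$(( (blocks + $2 - 1) / $2 ))     # ceiling division
    split_by_blocks "$1" "$per_file" "$3"
}
```

For example, `split_by_blocks contigs.fa 100000 part_` would write 100000 blocks (200000 lines) per file, and `split_into_files contigs.fa 8 part_` would aim for 8 files (`contigs.fa` is a placeholder name). With the default two-letter suffix, `split` can create at most 676 files; the `-a` option lengthens the suffix if more are needed. If some sequences wrap over several lines, this line-count approach no longer applies and the split would have to be keyed on the > header lines instead.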