SSR mining
Perfect, imperfect and compound SSRs were mined in silico using the SSR-search module of SciRoKo (http://kofler.or.at/bioinformatics/SciRoKo). A minimum of four repeats and a minimum length of 15 nt were required, so a sequence was considered a perfect SSR when its motif was repeated at least 15 times (1 nt motif), eight times (2 nt), five times (3 nt) or four times (4-6 nt), allowing for only one mismatch. For compound repeats, the default maximum interruption (spacer) length was set at 100 bp.
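The repeat-count thresholds above can be illustrated with a minimal sketch. This is not the SciRoKo algorithm: it is a simple regular-expression scan that reports exact tandem repeats only, ignoring the one-mismatch allowance as well as imperfect and compound SSRs. The function names (`find_perfect_ssrs`, `is_primitive`) are hypothetical and introduced here purely for illustration.

```python
import re

# Minimum repeat counts per motif length (nt), taken from the thresholds in the text.
MIN_REPEATS = {1: 15, 2: 8, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """Return True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_perfect_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples for exact tandem repeats.

    Assumption: exact matching only; no mismatches are tolerated here.
    Coordinates are 0-based, with an exclusive end.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One copy of the motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'ATAT' repeats, which are already reported as 'AT' repeats.
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // size))
    return sorted(hits)
```

For example, calling `find_perfect_ssrs` on a sequence containing eight `AT` units reports that locus, whereas four `ACG` units fall below the five-repeat threshold for tri-nucleotides and are not reported.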
SSR motif frequency and distribution
In total, about 403,000 perfect SSR motifs were identified, including 42,000 compound SSRs. The SSR loci were classified by repeat motif and by number of repeat units. Di-nucleotides are the most frequent (60.0%), followed by tri- (23.7%), tetra- (6.8%) and mono-nucleotides (3.4%); penta- and hexa-nucleotides are rare (2.7% and 3.3%, respectively). In addition, more than 189,000 imperfect SSR motifs were identified.
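As a rough illustration of this classification, the sketch below groups motifs by length class and collapses each motif with its rotations and its reverse complement into one representative, in the spirit of counting motifs with sequence complementarity taken into account. Choosing the alphabetically smallest form as the representative is an assumption made here, not necessarily the convention used in the study, and the names `canonical_motif` and `class_percentages` are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLASS_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def canonical_motif(motif):
    """Return the alphabetically smallest rotation of the motif or its reverse complement.

    Assumption: AG, GA, CT and TC all describe the same repeat on a
    double-stranded sequence and are therefore counted together.
    """
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, revcomp) for i in range(len(s))]
    return min(rotations)


def class_percentages(motifs):
    """Return the percentage of loci in each motif-length class (mono- to hexa-)."""
    counts = Counter(CLASS_NAMES[len(m)] for m in motifs)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}
```

With this grouping, `canonical_motif("CT")` and `canonical_motif("GA")` both map to the same representative, so the two are tallied as a single motif type.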
Genomic distribution of perfect SSRs.
Frequency of the main SSR motifs identified (considering sequence complementarity).