Different types of input can be provided: fastq, sra, read/count and fasta format.
The files can be provided as plain text (fastq, read/count and fasta) or compressed with gzip (*.gz extension). Whenever possible, please upload *.gz files.
Please note that sRNAbench infers the file format from the file extension. Unknown extensions are treated as read/count format. For example, sample.gz would be treated as a read/count format file. The program will fail if the file format is inferred incorrectly.
The recognized extensions are:
fasta: fa, fasta, fa.gz, fasta.gz
read/count: rc, rc.gz
fastq: fastq.gz, fastq, fq, fq.gz, FASTQ, FASTQ.gz, fastQ, fastQ.gz
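The extension rule described above can be sketched in a few lines of Python. This is only an illustration and not sRNAbench code: the function name infer_format and the dictionary RECOGNIZED are hypothetical, the matching is assumed to be case-sensitive (the list spells out each accepted spelling), and only the extensions listed above are covered.

```python
# Hypothetical helper that mirrors the extension list above; not part of sRNAbench.
RECOGNIZED = {
    "fasta": ("fa", "fasta", "fa.gz", "fasta.gz"),
    "read/count": ("rc", "rc.gz"),
    "fastq": ("fastq.gz", "fastq", "fq", "fq.gz",
              "FASTQ", "FASTQ.gz", "fastQ", "fastQ.gz"),
}


def infer_format(filename):
    """Return the format implied by the file extension.

    Assumption: extensions are compared case-sensitively. Any extension
    that is not listed falls back to read/count, as stated in the text.
    """
    for file_format, extensions in RECOGNIZED.items():
        for extension in extensions:
            if filename.endswith("." + extension):
                return file_format
    return "read/count"


if __name__ == "__main__":
    for name in ("reads.fastq.gz", "mature.fa", "sample.rc", "sample.gz"):
        print(name, "->", infer_format(name))
```

Note that sample.gz falls through to read/count, which is why a gzipped fastq file should be named with a full extension such as sample.fastq.gz.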
File format explanation:
read/count format is tab separated with two columns: the read sequence, followed by the read count (the number of times this read was sequenced)
read	read count
Note that spaces between the read and the read count are also allowed.
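As an illustration of the read/count format, the following minimal Python sketch reads such lines into a dictionary. The function name parse_read_count is hypothetical, and summing the counts of a read that appears more than once is an assumption made here for the example, not documented sRNAbench behaviour.

```python
def parse_read_count(lines):
    """Parse read/count lines into a {read: count} dictionary.

    Each non-empty line holds a read and its count, separated by a tab
    or by spaces. Assumption: repeated reads have their counts summed.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()  # splits on tabs as well as spaces
        if len(fields) != 2:
            raise ValueError("expected two columns, got: %r" % line)
        read, count = fields
        counts[read] = counts.get(read, 0) + int(count)
    return counts


if __name__ == "__main__":
    example = ["TAGCTTATCAGACTGATGTTGA\t1520", "TGAGGTAGTAGGTTGTATAGTT 873"]
    print(parse_read_count(example))
```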
sRNAbench will try to guess the adapter. Briefly, sRNAbench will align the first 250000 reads to the genome using the Bowtie seed function (the adapter bases will not count as mismatches). Then, the adapter sequence is defined as the most frequent 10-mer starting at the first mismatch (default: guessAdapter=false)
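The last step of the adapter guess, picking the most frequent 10-mer that starts at the first mismatch, can be sketched as follows. This is a simplified illustration, not the sRNAbench implementation: the function name guess_adapter is hypothetical, and the position of the first mismatch of each read is assumed to be already known from the genome alignment, so no alignment is performed here.

```python
from collections import Counter


def guess_adapter(reads, first_mismatch_positions, k=10):
    """Return the most frequent k-mer starting at the first mismatch.

    Assumption: first_mismatch_positions[i] is the 0-based position in
    reads[i] at which the seed alignment to the genome first mismatches.
    Reads with fewer than k bases left at that position are ignored.
    """
    kmers = Counter()
    for read, position in zip(reads, first_mismatch_positions):
        kmer = read[position:position + k]
        if len(kmer) == k:
            kmers[kmer] += 1
    if not kmers:
        return None  # no read had k bases after its first mismatch
    return kmers.most_common(1)[0][0]
```

The function would be called with the reads and one mismatch position per read, for example guess_adapter(reads, positions), and returns None if no candidate 10-mer is found.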
Do not map to genome (Library mode)
The input reads are mapped directly against the annotations from the sRNAbenchDB database or against the user-provided libraries. They are not mapped to the genome first.
Minimum Adapter Length
Reads can have both a 5’ barcode and a 3’ adapter sequence. For example, in reads of 36 nt, of which 5 nt correspond to the barcode, at most 31 nt carry ‘useful’ information. In such a case, the default minimum adapter length cannot be used, as this would imply that only small RNAs of 31 nt - 10 nt = 21 nt or shorter can be profiled.
Therefore, in such a case, the minimum adapter length should be set to 6 nt, which allows the profiling of small RNAs of up to 25 nt. Moreover, the maximum number of mismatches allowed in adapter detection should be set to 0, as otherwise the false positive detection of adapters will increase notably (a short sequence of only 6 nt has a much higher probability of occurring by chance alone).
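The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical. The chance estimate assumes that all four bases are equally likely and independent, and it refers to an exact match at one fixed position, so it is only a rough guide.

```python
def max_profilable_length(read_length, barcode_length, min_adapter_length):
    """Longest small RNA that still leaves room for the minimum adapter."""
    return read_length - barcode_length - min_adapter_length


def chance_match_probability(adapter_length):
    """Probability that random bases match the adapter exactly.

    Assumption: uniform, independent bases and one fixed position.
    """
    return 0.25 ** adapter_length


if __name__ == "__main__":
    print(max_profilable_length(36, 5, 10))  # default minimum adapter length
    print(max_profilable_length(36, 5, 6))   # reduced minimum adapter length
    print(chance_match_probability(6) / chance_match_probability(10))
```

Under these assumptions, a 6 nt adapter fragment is 256 times more likely to match by chance than a 10 nt fragment, which is why 0 mismatches are recommended for such short lengths.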
Number of Mismatches in Adapter
Permitting more mismatches between the read and the adapter sequence allows more adapter sequences to be detected and trimmed, but also increases the number of false positive trimmings (especially if the minimum adapter length is decreased!).
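The effect of the mismatch setting can be illustrated with a minimal Python sketch. It is not the sRNAbench detection algorithm: the function find_adapter is hypothetical, and it is assumed here that the first min_length bases of the adapter are compared to every window of the read and that the leftmost acceptable window is reported.

```python
def hamming(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(read, adapter, min_length=10, max_mismatches=1):
    """Return the leftmost adapter start in the read, or -1 if none.

    Assumption: only the first min_length adapter bases are compared,
    and a window is accepted if it has at most max_mismatches.
    """
    seed = adapter[:min_length]
    for start in range(len(read) - min_length + 1):
        if hamming(read[start:start + min_length], seed) <= max_mismatches:
            return start
    return -1


if __name__ == "__main__":
    adapter = "TGGAATTCTCGGGTGCCAAGG"
    # 22 nt insert followed by an adapter with one sequencing error
    read = "TAGCTTATCAGACTGATGTTGA" + "TGGAATTGTCGGGTGCC"
    print(find_adapter(read, adapter, max_mismatches=0))
    print(find_adapter(read, adapter, max_mismatches=1))
```

In this example the adapter carries a single sequencing error, so it is only found when one mismatch is permitted.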
Do not profile other ncRNAs
Only microRNAs are profiled. This option reduces the run time notably. To reduce the run time further, the prediction of novel microRNAs can be deactivated (see Parameters section).
Recursive Adapter Trimming
If the adapter is not found within the complete read sequence using the minimum adapter length, the program will try to detect the adapter at the 3’ end of the read using recursively shorter minimum adapter lengths. For example, if the minimum adapter length is 10, then in the first round the last 9 bases of the read are aligned to the adapter (only to the first 9 bases at the 5’ end of the adapter sequence), in the second round the last 8 bases, and so on. No lower threshold for the minimum adapter length is established, and therefore most trimmings of only the last couple of bases might occur by chance alone.
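The recursive procedure described above can be sketched in Python as follows. This is a simplified illustration rather than the sRNAbench code: the function name trim_recursive is hypothetical, and exact matching is assumed in every round, whereas the actual alignment may tolerate mismatches.

```python
def trim_recursive(read, adapter, min_length=10):
    """Trim a partial adapter from the 3' end of the read.

    Starting at min_length - 1, compare the last `length` bases of the
    read with the first `length` bases of the adapter, shortening by one
    base per round. Assumption: exact matches only, no lower threshold.
    """
    for length in range(min_length - 1, 0, -1):
        if read.endswith(adapter[:length]):
            return read[:-length]
    return read  # no partial adapter found at the 3' end


if __name__ == "__main__":
    adapter = "TGGAATTCTCGGGTGCCAAGG"
    # 22 nt insert followed by the first 7 adapter bases
    print(trim_recursive("TAGCTTATCAGACTGATGTTGA" + "TGGAATT", adapter))
    # a read that merely ends in T loses its last base by chance
    print(trim_recursive("ACGTACGT", adapter))
```

The second call shows the caveat mentioned in the text: since the adapter starts with T, any read ending in T is shortened by one base, even if it contains no adapter at all.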
sRNAbench can profile small RNAs from experiments with genetic material from different organisms. Therefore, different species can be selected by activating the corresponding ‘checkboxes’.
Upload User Annotations for Profiling
The user can upload annotation files for profiling, i.e. the expression values of the small RNAs annotated in these files are determined. Allowed file formats are: fasta, bed or gff.
If the genome is not in our database...
All species contained in miRBase can be used even if the corresponding genome assembly is not in our database (only the microRNA expression profiles would be generated in this case).