April 8, 2021: Biosequence Query Validations and Download Optimization

Biosequence Query Validations

To simplify the input of biosequence queries, STNext is introducing additional support and validation for the EMBL, GenBank, and FASTA file formats.

When the user clicks the Run Search button for a new or saved biosequence search, STNext performs the following validations against the query:

  1. Acceptable Format

  2. Multiple Sequences

  3. Line-Level Validation

1. Acceptable Format

The first validation checks whether the query is in one of the acceptable formats listed below. If it is not, the “Invalid format” error message displays.

Notes:

Plain

MDIAIHHPW IRRPFFPFHS PSRLFDQF FGEHLLE SDLFPAS TSLSPFYLR   

PPSFLRAPS WIDTGLSEMR LEKDRFSV NLDVKHF SPEELKV KVLGDVIEV

EMBL

Sequence data with trailing line numbers, accompanied by <metadata>.

Example:

<meta data>

ID AMU73928

.....

SQ Sequence....

</meta data>

MDIAIHHPW IRRPFFPFHS PSRLFDQF FGEHLLE SDLFPAS TSLSPFYLR   60

PPSFLRAPS WIDTGLSEMR LEKDRFSV NLDVKHF SPEELKV KVLGDVIEV 120

HGKHEERQD EHGFISREFH RKYRIPAD VDPLAIT SSLSSDG VLTVNGPRK 180

//

Sequence data with trailing line numbers, without <metadata>.

Example:

MDIAIHHPW IRRPFFPFHS PSRLFDQF FGEHLLE SDLFPAS TSLSPFYLR    60

PPSFLRAPS WIDTGLSEMR LEKDRFSV NLDVKHF SPEELKV KVLGDVIEV  120

HGKHEERQD EHGFISREFH RKYRIPAD VDPLAIT SSLSSDG VLTVNGPRK 180

GenBank

Sequence data with leading line numbers, accompanied by <metadata>.

Example:

<meta data>

LOCUS SCU49845

....

ORIGIN

</meta data>

    1 MDIAIHHPW IRRPFFPFHS PSRLFDQF FGEHLLE SDLFPAS TSLSPFYLR

  61 PPSFLRAPS WIDTGLSEMR LEKDRFSV NLDVKHF SPEELKV KVLGDVIEV

121 HGKHEERQD EHGFISREFH RKYRIPAD VDPLAIT SSLSSDG VLTVNGPRK

//

Sequence data with leading line numbers, without <metadata>.

Example:

    1 MDIAIHHPW IRRPFFPFHS PSRLFDQF FGEHLLE SDLFPAS TSLSPFYLR

  61 PPSFLRAPS WIDTGLSEMR LEKDRFSV NLDVKHF SPEELKV KVLGDVIEV

121 HGKHEERQD EHGFISREFH RKYRIPAD VDPLAIT SSLSSDG VLTVNGPRK
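
The numbered layouts shown for EMBL (trailing line numbers) and GenBank (leading line numbers) both reduce to the same bare residue string once the counters are removed. The following is a minimal sketch of that normalization, not STNext's actual implementation; the function name strip_line_numbers and the two regular expressions are assumptions made for illustration only.

```python
import re

# Hypothetical helper, not part of STNext.
# Matches a GenBank-style leading counter such as "   61 ".
_LEADING_NUM = re.compile(r"^\s*\d+\s+")
# Matches an EMBL-style trailing counter such as "   120".
_TRAILING_NUM = re.compile(r"\s+\d+\s*$")


def strip_line_numbers(lines):
    """Return the residues from numbered sequence lines as one string."""
    parts = []
    for line in lines:
        line = line.rstrip()
        # Skip blank lines and the "//" record terminator.
        if not line or line == "//":
            continue
        line = _LEADING_NUM.sub("", line)
        line = _TRAILING_NUM.sub("", line)
        # Remove the spaces that separate residue groups.
        parts.append("".join(line.split()))
    return "".join(parts)
```

Under these assumptions, the EMBL line "MDIAIHHPW IRRPFFPFHS   60" and the GenBank line "    1 MDIAIHHPW IRRPFFPFHS" both yield the same residue string.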

FASTA

A single sequence, with the sequence data below the header line that contains the > sign and title text.

Example:

>crab_mouse ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN) (P23).       

tgcaccaaac atgtctaaag ctggaaccaaaa ttactttctttg aagacaaaaactttca

aggccgccac tatgacagcg attgcgactgtg cagatttccaca tgtacctgagccgctg

caactccatc agagtggaag gaggcacctggg ctgtgtatgaaaggcccaattttgctgg

gtacatgtacatcctaccccggggcgagtatcctgagtaccagcactggatgggcctcaa
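
The format check described in this section can be approximated with a few line-shape heuristics. The sketch below is an illustration under stated assumptions and does not reflect how STNext performs the check: the function name guess_format, the returned labels, and the rule that an "ID " or "LOCUS " first line signals a metadata block are all hypothetical.

```python
import re

# Hypothetical line shapes, chosen to match the examples in this section.
LEADING = re.compile(r"^\s*\d+\s+[A-Za-z ]+$")    # GenBank: number, then residues
TRAILING = re.compile(r"^\s*[A-Za-z ]+\s+\d+$")   # EMBL: residues, then number
PLAIN = re.compile(r"^[A-Za-z ]+$")               # Plain: residues only


def guess_format(query):
    """Return 'fasta', 'embl', 'genbank', 'plain', or 'invalid'."""
    lines = [ln.rstrip() for ln in query.splitlines()]
    # Ignore blank lines and the "//" record terminator.
    lines = [ln for ln in lines if ln and ln != "//"]
    if not lines:
        return "invalid"
    first = lines[0].lstrip()
    if first.startswith(">"):
        return "fasta"
    # Assumption: a metadata block is recognized by its first keyword.
    if first.startswith("ID "):
        return "embl"
    if first.startswith("LOCUS "):
        return "genbank"
    if all(LEADING.match(ln) for ln in lines):
        return "genbank"
    if all(TRAILING.match(ln) for ln in lines):
        return "embl"
    if all(PLAIN.match(ln) for ln in lines):
        return "plain"
    return "invalid"
```

A query that returns "invalid" here corresponds to the case in which the “Invalid format” error message would display.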


2. Multiple Sequences

The second validation checks whether a single query field contains multiple sequences (each marked with the > tag). If so, the “Multiple sequences” error message displays, with the affected line numbers shown and highlighted.
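
One way to picture this check is to collect the 1-based line numbers of every line that starts with the > tag and report them when there is more than one. This is a sketch only; the function names and the exact wording of the returned message are assumptions, not STNext behavior.

```python
def find_header_lines(query):
    """Return the 1-based line numbers of lines that start with '>'."""
    return [
        number
        for number, line in enumerate(query.splitlines(), start=1)
        if line.lstrip().startswith(">")
    ]


def check_single_sequence(query):
    """Return an error string if the query holds several sequences, else None."""
    headers = find_header_lines(query)
    if len(headers) > 1:
        # Hypothetical message format; the line numbers identify each '>' line.
        listed = ", ".join(str(n) for n in headers)
        return f"Multiple sequences (lines {listed})"
    return None
```

For a query with > lines on lines 1 and 3, this sketch reports both line numbers; a query with a single > line, or none, passes.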


3. Line-Level Validation

If the query passes the first two validations, the following are checked for every line of the file:


Download Optimization
