The ATtRACT database can be queried through three types of search.
Search Tools
- Search specific entries of the database
- Search motifs
- Search sequences
Database Search
- Official gene name (e.g. "SRSF1") or synonym (e.g. "SFRS1")
- Gene ID (e.g. "ENSG00000136450")
- Minimum and/or maximum length of the motif
- Type of experiment (multiple choice allowed)
- Organism (multiple choice allowed)
- Domain (multiple choice allowed)
Searches can be restricted by combining queries. For example, all human motifs of the "PCBP2" gene that are 6 to 8 nucleotides long can be retrieved by entering "PCBP2" in the gene name field, selecting "Homo sapiens" under the "Organism" checkbox, and typing "6" in the "MINimum length of motif" field and "8" in the "MAXimum length of motif" field.
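The combined query above can be pictured as a filter over motif records in which every filter that is set must hold (logical AND). This is a minimal sketch: the `MotifEntry` record and the `search` helper are hypothetical and do not reflect ATtRACT's internal schema or download format, and the records in the example are toy values, not database entries.

```python
from dataclasses import dataclass

@dataclass
class MotifEntry:
    # Hypothetical record; field names mirror the search form, not an ATtRACT schema.
    gene_name: str
    organism: str
    motif: str

def search(entries, gene_name=None, organisms=None, min_len=None, max_len=None):
    """Return the entries that satisfy every filter that is set (filters are ANDed)."""
    hits = []
    for e in entries:
        if gene_name is not None and e.gene_name.upper() != gene_name.upper():
            continue
        if organisms and e.organism not in organisms:
            continue
        if min_len is not None and len(e.motif) < min_len:
            continue
        if max_len is not None and len(e.motif) > max_len:
            continue
        hits.append(e)
    return hits

# Toy records for illustration only (not real ATtRACT entries).
entries = [
    MotifEntry("PCBP2", "Homo sapiens", "CCUCCC"),
    MotifEntry("PCBP2", "Homo sapiens", "CCCUCCCUCC"),
    MotifEntry("PCBP2", "Mus musculus", "CCUCCC"),
    MotifEntry("SRSF1", "Homo sapiens", "GGAGGA"),
]
hits = search(entries, gene_name="PCBP2", organisms={"Homo sapiens"}, min_len=6, max_len=8)
```

Only the first toy record survives: the second is too long, the third is from another organism and the fourth belongs to another gene. Unlike the web form, this sketch does not match gene-name synonyms.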
Gene ID Search
Users can search ATtRACT entries by gene ID. The following table lists, for each organism, the database from which the gene ID is taken and the format of the ID:
Organism | Database | ID format |
Homo sapiens | Ensembl | ENSG... |
Mus musculus | Ensembl and PDB | Ensembl: ENSMUSG... PDB: PDBID_CHAIN, e.g. 3IVK_H |
Drosophila melanogaster | Ensembl | FBGN... |
Saccharomyces cerevisiae | Ensembl | YLR... |
Caenorhabditis elegans | Ensembl | WBGENE... |
Bos taurus | Ensembl | ENSBTAG... |
Bombyx mori | Ensembl Metazoa | BGIBMGA... |
Aspergillus nidulans | Ensembl Fungi | CADANIAG... |
Danio rerio | Ensembl | ENSDARG... |
Naegleria gruberi | Ensembl Protists | lmjf... |
Plasmodium falciparum | Ensembl Protists | MAL... |
Pongo abelii | Ensembl | ENSPPYG... |
Schizosaccharomyces pombe | Ensembl Fungi | SPAC... |
Tetraodon nigroviridis | Ensembl | ENSTNIG... |
Thalassiosira pseudonana | Ensembl Protists | THAPS... |
Gallus gallus | Ensembl | ENSGALG... |
Xenopus tropicalis | Ensembl | ENSXETG... |
Xenopus laevis | Xenbase | XB-GENE-... |
Chaetomium thermophilum | European Nucleotide Archive | GL... |
Mesocricetus auratus | European Nucleotide Archive | ML... |
Oryzias latipes | European Nucleotide Archive | DQ... |
Vanderwaltozyma polyspora | European Nucleotide Archive | DS... |
Zea mays | European Nucleotide Archive | FJ... |
Arabidopsis thaliana, Cricetulus griseus, Leishmania major, Nematostella vectensis, Neurospora crassa, Ostreococcus tauri, Physcomitrella patens, Phytophthora ramorum, Rhizopus oryzae, Schistosoma mansoni, Trichomonas vaginalis, Trypanosoma brucei | Database not available | Same as gene name |
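Because the ID format in the table identifies the source database, a rough lookup can guess where an ID comes from by its prefix. This sketch is an assumption-laden illustration: `guess_source` is a hypothetical helper, not part of ATtRACT; it covers only some rows of the table; prefixes are matched case-insensitively; and the PDB pattern assumes the `PDBID_CHAIN` form shown for Mus musculus. Since some prefixes nest (ENSG is a prefix of ENSGALG), the longest matching prefix wins.

```python
import re

# Subset of the prefixes in the table above: prefix -> (organism, database).
ID_PREFIXES = {
    "ENSG": ("Homo sapiens", "Ensembl"),
    "ENSMUSG": ("Mus musculus", "Ensembl"),
    "ENSGALG": ("Gallus gallus", "Ensembl"),
    "ENSDARG": ("Danio rerio", "Ensembl"),
    "FBGN": ("Drosophila melanogaster", "Ensembl"),
    "WBGENE": ("Caenorhabditis elegans", "Ensembl"),
    "XB-GENE-": ("Xenopus laevis", "Xenbase"),
}
# ASSUMPTION: PDB IDs are 4 characters starting with a digit, then "_" and a chain ID.
PDB_CHAIN = re.compile(r"^[0-9][0-9A-Z]{3}_[0-9A-Z]+$")

def guess_source(gene_id):
    """Return (organism, database) guessed from the ID format, or (None, None)."""
    gid = gene_id.strip().upper()
    if PDB_CHAIN.match(gid):
        return (None, "PDB")  # the chain ID alone does not identify the organism
    # Try longer prefixes first so that ENSGALG is not mistaken for ENSG.
    for prefix in sorted(ID_PREFIXES, key=len, reverse=True):
        if gid.startswith(prefix):
            return ID_PREFIXES[prefix]
    return (None, None)
```

For instance, `guess_source("ENSG00000136450")` gives `("Homo sapiens", "Ensembl")` and `guess_source("3IVK_H")` gives `(None, "PDB")`. For organisms without a source database, where the ID equals the gene name, no guess is possible.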
Motif Search
Motif queries can contain the IUPAC nucleotide codes listed below:
Symbol | Description | Bases represented |
A | Adenine | [A] |
C | Cytosine | [C] |
G | Guanine | [G] |
T | Thymine | [T] |
U | Uracil | [U] |
W | Weak | [A,T] |
S | Strong | [C,G] |
M | aMino | [A,C] |
K | Keto | [G,T] |
R | puRine | [A,G] |
Y | pYrimidine | [C,T] |
B | Not A | [C,G,T] |
D | Not C | [A,G,T] |
H | Not G | [A,C,T] |
V | Not T | [A,C,G] |
N or X | aNy base | [A,C,G,T] |
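One way to search with such degenerate motifs is to turn each IUPAC symbol into a regular-expression character class. The sketch below is not ATtRACT's search code. It assumes that T and U should be treated as interchangeable so that the same motif matches RNA and DNA input, and it uses a zero-width lookahead so that overlapping occurrences are all reported. The function names are hypothetical.

```python
import re

# IUPAC codes from the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}

def motif_to_regex(motif):
    """Translate an IUPAC motif into a compiled regex that finds overlapping hits."""
    classes = []
    for symbol in motif.upper():
        bases = set(IUPAC[symbol])  # raises KeyError for symbols outside the table
        if bases & {"T", "U"}:
            bases |= {"T", "U"}  # ASSUMPTION: T and U are interchangeable
        classes.append("[" + "".join(sorted(bases)) + "]")
    # A zero-width lookahead lets successive matches overlap.
    return re.compile("(?=" + "".join(classes) + ")")

def find_motif(motif, sequence):
    """0-based start offsets of every occurrence of motif in sequence."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]
```

Under these assumptions, `find_motif("YGCY", "UGCAUGCUCGCC")` returns `[4, 8]`.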
The results of all queries are displayed as tables. The search results can be downloaded as a file by clicking the Download drop-down menu at the top of the page and choosing the preferred format (CSV or TSV text).
Users can choose how many entries to display by selecting the corresponding number from the drop-down menu.
Users can further filter the table entries through a full-text search using the search input box.
Results can also be copied to the clipboard or printed.
The table can be sorted by any column by clicking its header.
The table columns are described below:
Header | Description |
Gene name | The official gene name |
Gene ID | The official gene ID |
Organism | Organism in which the motif was assessed |
Motif | Sequence of the motif |
Len | Length of the motif |
Pubmed | A link to the PubMed entry of the reference that experimentally supports the binding |
Experiment | Type of experiment used to assess the motif |
Domain | Domain(s) present in the RNA-binding protein (RBP) |
Offset | Distance, in nucleotides, from the beginning of the sequence |
Go Terms | All the GO terms associated with the RBP |
LOGO | A graphical representation of the sequence profile |
Quality score | A numerical representation of the affinity between the RBP and its binding sites |
The GO terms associated with an RNA-binding protein can be inspected by clicking the corresponding button in the Go Terms column; a pop-up window lists all the associated GO terms in a table.
Scan a sequence or a set of sequences
Users can upload a TXT file containing one or more RNA/DNA sequences in FASTA or multi-FASTA format and scan them for occurrences of motifs.
Example: FASTA format
Example: multi-FASTA format
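In the FASTA format each record starts with a header line beginning with ">", followed by one or more sequence lines. A multi-FASTA file simply concatenates several such records. The following minimal parser assumes plain-text input and ignores blank lines. `parse_fasta` is a hypothetical helper, not ATtRACT's upload code.

```python
def parse_fasta(text):
    """Parse FASTA or multi-FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

A single-FASTA file yields a list with one record, so the same function covers both formats.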
The Burrows-Wheeler transform (BWT) is used to speed up the search.
The BWT makes it possible to:
- count the occurrences of a pattern in one or more strings
- locate the offset of a motif in one or more strings
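The following sketch shows how a BWT-based index (an FM-index) answers both questions for a single string through backward search. The width of the resulting suffix-array interval gives the count, and the suffix-array entries in that interval give the offsets. This is a minimal illustration with a naive suffix-array construction, not ATtRACT's implementation. Several sequences can be handled by building one index per sequence. All function names are hypothetical.

```python
def build_fm_index(text):
    """BWT plus the C and Occ tables used by backward search (naive construction)."""
    text = text.upper() + "$"  # unique sentinel that sorts before every base
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0
    alphabet = sorted(set(text))
    C, total = {}, 0  # C[c]: number of characters in text smaller than c
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}  # occ[c][k]: count of c in bwt[:k]
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return {"sa": sa, "C": C, "occ": occ, "n": len(bwt)}

def backward_search(index, pattern):
    """Half-open range [lo, hi) of suffix-array rows whose suffixes start with pattern."""
    lo, hi = 0, index["n"]
    for ch in reversed(pattern.upper()):
        if ch not in index["C"]:
            return 0, 0
        lo = index["C"][ch] + index["occ"][ch][lo]
        hi = index["C"][ch] + index["occ"][ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi

def count_matches(index, pattern):
    lo, hi = backward_search(index, pattern)
    return hi - lo

def locate_matches(index, pattern):
    lo, hi = backward_search(index, pattern)
    return sorted(index["sa"][lo:hi])  # 0-based offsets in the original text
```

For example, with `idx = build_fm_index("UGCAUGCAUGC")`, `count_matches(idx, "GCAUG")` is 2 and `locate_matches(idx, "GCAUG")` is `[1, 5]`. Once the index is built, the search costs time proportional to the pattern length rather than the text length.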
Results are provided both as a table and as a graph.
Table description
As in the search results section, users can choose how many entries to display from the drop-down menu, filter the table entries through a full-text search using the search input box, copy the results to the clipboard or print them, and sort the table by clicking a column header.
Header | Description |
Exon250 | The log-odds ratio of this motif occurring in an exon extended by 250 nucleotides upstream and 250 nucleotides downstream (see Scoring Function) |
CDS | The log-odds ratio of this motif occurring in a coding sequence (see Scoring Function) |
Intron | The log-odds ratio of this motif occurring in an intron sequence (see Scoring Function) |
The GO terms associated with an RNA-binding protein can be inspected by clicking the corresponding button in the Go Terms column; a pop-up window lists all the associated GO terms. A full-text search on all fields of the table can be performed by filling in the corresponding form.
Users can download:
- A file containing all the analyzed sequences, retrieved by clicking the Download drop-down menu in the green stripe at the top of the page and choosing CSV or TSV format. Each analyzed sequence starts with:
Graph Description
The results are also displayed as a graph, whose purpose is to identify peaks where motifs are concentrated.
Header | Description |
Gene name | The official gene name is reported |
Organism | The organism in which the motif was assessed |
Motif | The sequence of the motif starting at this position |
The figure can be zoomed in and out with the mouse wheel.
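The graph lets users spot motif clusters visually. A rough numerical counterpart counts motif start positions in a sliding window and reports runs of windows at or above a threshold. This sketch assumes the motif start offsets are already known. The window size and threshold are arbitrary illustrative parameters, not values used by ATtRACT, and the function names are hypothetical.

```python
def motif_density(starts, seq_len, window=20):
    """Number of motif start positions inside each window [i, i + window)."""
    if seq_len <= 0:
        return []
    window = min(window, seq_len)
    per_position = [0] * seq_len
    for s in starts:
        per_position[s] += 1
    running = sum(per_position[:window])
    density = [running]
    for i in range(1, seq_len - window + 1):
        # Slide by one: add the position entering the window, drop the one leaving it.
        running += per_position[i + window - 1] - per_position[i - 1]
        density.append(running)
    return density

def hotspots(density, threshold):
    """Inclusive (first, last) window starts of each run with density >= threshold."""
    intervals, start = [], None
    for i, d in enumerate(density):
        if d >= threshold and start is None:
            start = i
        elif d < threshold and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(density) - 1))
    return intervals
```

With motif starts `[2, 3, 5, 40, 41, 42, 43]` in a 50-nt sequence and a 5-nt window, the densest windows start at offsets 39 and 40 (four motifs each), so `hotspots(density, 4)` returns `[(39, 40)]`.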
Scoring Function
Let $M = [m_1, m_2, m_3, \dots, m_n]$ be the set of all motifs in the database. Let $S_1 = [s_{11}, s_{12}, s_{13}, \dots, s_{1n}]$, where $s_{11}, \dots, s_{1n}$ are the sequences in the considered genome that represent an exon plus 250 nucleotides upstream and downstream.
Let $S_2 = [s_{21}, s_{22}, s_{23}, \dots, s_{2n}]$, where $s_{21}, \dots, s_{2n}$ are the sequences in the considered genome that represent a coding sequence.
Let $S_3 = [s_{31}, s_{32}, s_{33}, \dots, s_{3n}]$, where $s_{31}, \dots, s_{3n}$ are the sequences in the considered genome that represent an intron.
Let $CS_1 = [c_{m_1}, c_{m_2}, c_{m_3}, \dots, c_{m_n}]$, where $c_{m_1}, \dots, c_{m_n}$ are the numbers of occurrences of motifs $m_1, \dots, m_n$ in $S_1$.
Let $CS_2 = [c_{m_1}, c_{m_2}, c_{m_3}, \dots, c_{m_n}]$, where $c_{m_1}, \dots, c_{m_n}$ are the numbers of occurrences of motifs $m_1, \dots, m_n$ in $S_2$.
Let $CS_3 = [c_{m_1}, c_{m_2}, c_{m_3}, \dots, c_{m_n}]$, where $c_{m_1}, \dots, c_{m_n}$ are the numbers of occurrences of motifs $m_1, \dots, m_n$ in $S_3$.
Let $s$ be an input sequence of length $l_s$, and let $m_x \in M$ be a motif of length $l_m$ with multiplicity $t$ (i.e. found $t$ times) in the input sequence. The log-odds ratio is computed as:
for computing the score for sequences belonging to set $S_1$
for computing the score for sequences belonging to set $S_2$
for computing the score for sequences belonging to set $S_3$

and Exp is defined as:

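The log-odds formulas themselves are not reproduced in this section, so the sketch below only illustrates the ingredients defined above: the background counts $CS_j$ (occurrences of each motif across a sequence set $S_j$) and the multiplicity $t$ of a motif in an input sequence. The `illustrative_log_odds` function is an ASSUMPTION. It is a generic log2 ratio of the per-position rate of $m_x$ in $s$ to its per-position rate in $S_j$, with a pseudocount, and it is not ATtRACT's scoring function. All names are hypothetical.

```python
import math

def count_occurrences(motif, sequence):
    """Number of (possibly overlapping) occurrences of motif in sequence."""
    n, start = 0, sequence.find(motif)
    while start != -1:
        n += 1
        start = sequence.find(motif, start + 1)
    return n

def background_counts(motifs, sequence_set):
    """CS_j: occurrences of each motif m_i across all sequences of a set S_j."""
    return {m: sum(count_occurrences(m, s) for s in sequence_set) for m in motifs}

def candidate_positions(l_m, sequence_set):
    """Number of start positions available to a motif of length l_m in S_j."""
    return sum(max(len(s) - l_m + 1, 0) for s in sequence_set)

def illustrative_log_odds(t, l_s, l_m, c, positions, pseudo=1.0):
    # ASSUMPTION: generic log2 ratio of per-position rates, not ATtRACT's formula.
    observed = (t + pseudo) / (l_s - l_m + 1 + pseudo)  # rate of m_x in the input s
    expected = (c + pseudo) / (positions + pseudo)      # rate of m_x in the set S_j
    return math.log2(observed / expected)
```

For example, on the toy set `["GACGAC", "AAA"]`, `background_counts(["GAC", "AA"], ...)` returns `{"GAC": 2, "AA": 2}`.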
To check whether newly discovered motifs resemble experimentally validated ones, they are compared with the ATtRACT database or a subset of it. A new pipeline was developed for this task, shown in the figure:

Two tools are integrated with ATtRACT:
Tools | Description |
MEME | MEME searches the input sequences for shared patterns and reports as many motifs as requested. It uses an extension of the expectation-maximization (EM) algorithm to fit a statistical model that identifies motifs shared by possibly related, unaligned sequences. |
Tomtom | Tomtom compares the motifs reported by MEME with the ATtRACT database to assess whether a newly discovered motif resembles a known one. |
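To make the EM idea concrete, here is a heavily simplified sketch of motif discovery by expectation maximization. It assumes exactly one motif occurrence per sequence, a uniform background and the RNA alphabet ACGU. It is not MEME: MEME adds further site-distribution models, non-uniform backgrounds, a more careful search for starting points and E-value-based model selection. All function names are hypothetical, and every sequence must be at least `width` nucleotides long.

```python
import math

BASES = "ACGU"
BACKGROUND = 0.25  # ASSUMPTION: uniform background model

def site_log_ratios(seq, pwm):
    """Log likelihood ratio (motif vs background) for every possible start position."""
    w = len(pwm)
    return [sum(math.log(pwm[k][seq[j + k]] / BACKGROUND) for k in range(w))
            for j in range(len(seq) - w + 1)]

def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def e_step(seqs, pwm):
    """Posterior probability of each start position, one site per sequence."""
    posteriors = []
    for seq in seqs:
        scores = site_log_ratios(seq, pwm)
        norm = logsumexp(scores)
        posteriors.append([math.exp(s - norm) for s in scores])
    return posteriors

def m_step(seqs, posteriors, width, pseudo):
    """Re-estimate the position weight matrix from posterior-weighted base counts."""
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    for seq, z in zip(seqs, posteriors):
        for j, weight in enumerate(z):
            for k in range(width):
                counts[k][seq[j + k]] += weight
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]

def log_likelihood(seqs, pwm):
    """Data log likelihood up to a constant shared by all models of this width."""
    return sum(logsumexp(site_log_ratios(seq, pwm)) for seq in seqs)

def oops_em(seqs, width, iterations=50, pseudo=0.1):
    best_ll, best_pwm = -math.inf, None
    first = seqs[0]
    for j in range(len(first) - width + 1):  # each width-mer of the first sequence is a seed
        pwm = [{b: 0.7 if b == s else 0.1 for b in BASES} for s in first[j:j + width]]
        for _ in range(iterations):
            pwm = m_step(seqs, e_step(seqs, pwm), width, pseudo)
        ll = log_likelihood(seqs, pwm)
        if ll > best_ll:
            best_ll, best_pwm = ll, pwm
    return best_pwm

def consensus(pwm):
    return "".join(max(col, key=col.get) for col in pwm)
```

Running several seeds and keeping the highest-likelihood model is a crude guard against the local optima that EM is prone to.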
Description of the input fields
The input fields for the de novo motif analysis are described below:
Input Field | Description | Mandatory |
Upload a multi-FASTA file | A multi-FASTA file of putatively related sequences | Yes, if the Upload MEME txt Output field is empty |
Upload MEME txt Output | The text output of your own MEME analysis, to be compared with ATtRACT through Tomtom | Yes, if the Upload a multi-FASTA file field is empty |
Model | Three different types of models are available: OOPS (one occurrence per sequence), ZOOPS (zero or one occurrence per sequence) and ANR (any number of repetitions) | Yes (ignored if a MEME txt output is uploaded) |
Maximum number of motifs | MEME stops when the selected number of distinct motifs has been found, or when no further motif with E-value < 10 (default) can be found | No (ignored if a MEME txt output is uploaded) |
MINimum length of motif | Lower bound for the motif length [default = 4] | No (ignored if a MEME txt output is uploaded) |
MAXimum length of motif | Upper bound for the motif length [default = 14] | No (ignored if a MEME txt output is uploaded) |
Evalue | E-value threshold: MEME and Tomtom stop when the motif E-value exceeds this value [default = 10] | No |
Upload your own MEME db | Allows uploading a subset of motifs extracted from the ATtRACT database (see Generation of database of known motifs) | No |
Generation of database of known motifs
The Expect value (E) describes the number of hits one can "expect" to see by chance when searching a database of a given size, so it depends strongly on the size of the database. It is therefore better to compare de novo motifs with a database containing only motifs from the same or related species. For this reason, ATtRACT lets users build their own database of known motifs:
Input Field | Description |
MINimum length of motif | Lower bound for the motif length |
MAXimum length of motif | Upper bound for the motif length |
Experiment | Select experiment types |
Organism | Select organisms |
Domain | Select specific domains |
If the resulting database is too restrictive, the user can relax the selection, e.g. by selecting additional species or by widening the motif length range (lowering the MINimum and/or raising the MAXimum length of the motif).
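The dependence on database size can be made concrete. Tomtom, for instance, estimates a match's E-value by multiplying its p-value by the number of target motifs searched, so the same match can pass or fail the threshold depending only on how many motifs the database holds. The sketch below assumes that definition. The function names are hypothetical, and the threshold default of 10 comes from the input table above.

```python
def e_value(p_value, n_target_motifs):
    """Expected number of chance matches: per-comparison p-value times database size."""
    return p_value * n_target_motifs

def passes(p_value, n_target_motifs, threshold=10.0):
    """True if the match is kept under the E-value threshold (default 10, as above)."""
    return e_value(p_value, n_target_motifs) <= threshold
```

With a p-value of 0.01, a hypothetical database of 2,000 motifs gives E = 20 (rejected at the default threshold), whereas a hypothetical 200-motif species-specific subset gives E = 2 (kept).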
Two different types of output are displayed. At the top of the page are the de novo motifs discovered by MEME, ordered by E-value. MEME uses an objective function to select the "best" motif; this function is based on the statistical significance of the log-likelihood ratio (LLR) of the motif occurrences. The E-value assigned by MEME estimates the number of motifs (with the same width and number of occurrences) that would have an equal or higher log-likelihood ratio if the training set sequences had been generated randomly. The results can be downloaded from the drop-down menu button. Three file formats are available:
- TSV format
- CSV format
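For a set of aligned sites, the LLR underlying MEME's objective can be computed from column counts. With maximum-likelihood frequencies $f_{k,b} = n_{k,b}/N$ for base $b$ in column $k$ of $N$ sites, and background frequencies $q_b$, $\mathrm{LLR} = \sum_k \sum_b n_{k,b} \log(f_{k,b}/q_b)$. The sketch below evaluates this sum in bits with a uniform RNA background (an assumption). It does not compute MEME's p-value or E-value, and `motif_llr` is a hypothetical name.

```python
import math

def motif_llr(sites, background=None):
    """LLR (bits) of equal-length aligned sites under their ML motif vs. a background."""
    background = background or {b: 0.25 for b in "ACGU"}  # ASSUMPTION: uniform RNA background
    n = len(sites)
    llr = 0.0
    for k in range(len(sites[0])):
        column = [s[k] for s in sites]
        for b in set(column):
            n_kb = column.count(b)
            llr += n_kb * math.log2((n_kb / n) / background[b])
    return llr
```

Two identical 4-nt sites give 16 bits (2 bits per site per column), while the sites "GA" and "GC" give 6 bits: 4 from the conserved first column and 2 from the split second column.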
Output Field | Description |
Motif [num] | Motif identified by MEME |
Summary | Information about the best alignment between the de novo motif and one of the entries of the ATtRACT database |

The summary fields are described below:
Alignment | Alignment figure between the motif present in the ATtRACT database (top) and the de novo motif (bottom) |
