RADS
RADS (Rapid Alignment of Domain Strings) searches a database for domain arrangements that are similar to a query. It has the following features:
- search for domains and domain arrangements
- we provide precomputed databases
- you can make your own database
This website covers the basics of RADS. For more detailed information please check the manual.
Download & Setting up
Requirements
Although we try to keep the dependencies to a minimum, some existing libraries are needed.
- cmake
- a compiler supporting C++11, e.g. g++ 4.8 or higher
- boost modules: system, filesystem, program_options and iostreams (http://www.boost.org/)
- BioSeqDataLib (see download instructions)
Optional:
- git - Recommended: It simplifies future update processes
In most Linux distributions (e.g. Ubuntu, Arch Linux) it should be possible to install most of these dependencies using the package manager.
Download
There are two ways to download RADS. Either you can download it using git or manually from the website. Both ways are described below.
Tip: If you use git, you can easily update to a newer version when one becomes available.
Download using git
Use git to clone the repository and download BioSeqDataLib as a submodule:
git clone https://zivgitlab.uni-muenster.de/domain-world/RADS.git
cd RADS
git submodule init
git submodule update
Compilation & Installation
Inside the source folder, create a build directory in which the code will be compiled. CMake is used to locate all the required dependencies.
mkdir build
cd build
cmake ..
make
Updating
From time to time you will need to update RADS, either to get new features or to pick up bug fixes. If you used git for the original download, change into the RADS directory and type:
git pull
git submodule foreach git pull origin master
Then repeat the steps in the Compilation & Installation section. If you downloaded the code without git, you will have to download the latest version and replace the old one with it. Do not forget to update the BioSeqDataLib folder as well.
Setting up your system
You will need a Domain Similarity Matrix (DSM) installed on your system.
Getting a database for RADS
There are two ways to get a RADS database. The easiest is to use one of our precomputed databases. If they do not contain the sequences you need, you can easily create your own database.
We provide some precomputed databases:
The precomputed databases are currently based on the InterPro domain annotations. The database format was updated with RADS version 2.3, so older databases are no longer compatible with the new version of RADS.
| database | download size (unzipped size) | matrix | comment |
|---|---|---|---|
| interPro81-pfam.tar.bz2 | 1.4 GB (5.3 GB) | pfam-33.1.dsm | Contains all Pfam matches of the InterPro annotation (version 81). |
| interPro69-pfam.tar.bz2 | 864 MB (3.1 GB) | pfam-31.dsm | Contains all Pfam matches of the InterPro annotation (version 69). |
| interPro69-ssf.tar.bz2 | 777 MB (2.8 GB) | ssf-1_75.dsm | Contains all SuperFamily matches of the InterPro annotation (version 69). |
Creating your own database
Creating your own database is straightforward. You need domain files in a supported format (e.g. the output of pfam_scan.pl). If you want sequence lengths to appear in the RADS output, you also need to provide the sequences in FASTA format. If you do not provide sequences, the length will be set to 0.
makeRadsDB -i domainFile1.pfam domainFile2.pfam -s seqFile1.fa seqFile2.fa -o myDB
The command above will create two files, myDB.db and myDB.da. Both are needed by RADS.
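If you build databases from a script, the call above can be assembled programmatically. The following is a minimal sketch, not part of RADS: the helper names `make_rads_db_command` and `missing_database_files` are hypothetical, and it only uses the `-i`, `-s` and `-o` options shown in the example above.

```python
import shlex
from pathlib import Path


def make_rads_db_command(domain_files, output_prefix, sequence_files=None):
    """Build the argument list for makeRadsDB (hypothetical helper).

    Only the options from the example above are used:
    -i domain files, -s sequence files (optional), -o output prefix.
    """
    if not domain_files:
        raise ValueError("at least one domain annotation file is required")
    cmd = ["makeRadsDB", "-i", *domain_files]
    if sequence_files:
        cmd += ["-s", *sequence_files]
    cmd += ["-o", output_prefix]
    return cmd


def missing_database_files(output_prefix):
    """Return those of <prefix>.db and <prefix>.da that do not exist yet."""
    expected = [f"{output_prefix}.db", f"{output_prefix}.da"]
    return [name for name in expected if not Path(name).is_file()]


cmd = make_rads_db_command(
    ["domainFile1.pfam", "domainFile2.pfam"],
    "myDB",
    sequence_files=["seqFile1.fa", "seqFile2.fa"],
)
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` and then check that `missing_database_files("myDB")` returns an empty list.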
Running RADS
Once your system is set up as described above, the examples below give a short overview of how to use RADS to find similar domain arrangements. A more detailed description can be found in the manual.
You can provide the query in different formats:
- provide the domains manually:
rads --db interPro64-pfam -M pfam-31.dsm -D PF02758 PF05729
- provide a fasta sequence (will be annotated using pfam_scan.pl):
rads --db interPro64-pfam -M pfam-31.dsm -Q seq.fasta
- provide a domain annotation:
rads --db interPro64-pfam -M pfam-31.dsm -q seq.dom
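The three query modes above differ only in the last option, so a wrapper script can pick the right one from its inputs. Below is a minimal sketch; the function name `rads_command` is hypothetical, and it assumes that exactly one query type is given per call, which the examples suggest but do not state explicitly.

```python
import shlex


def rads_command(db, matrix, domains=None, fasta=None, annotation=None):
    """Build a rads argument list for one of the three query modes.

    Uses only options from the examples above: --db, -M, and one of
    -D (domain accessions), -Q (FASTA file) or -q (domain annotation file).
    Assumption: exactly one query type is supplied per call.
    """
    given = [query for query in (domains, fasta, annotation) if query]
    if len(given) != 1:
        raise ValueError("give exactly one of domains, fasta or annotation")
    cmd = ["rads", "--db", db, "-M", matrix]
    if domains:
        cmd += ["-D", *domains]
    elif fasta:
        cmd += ["-Q", fasta]
    else:
        cmd += ["-q", annotation]
    return cmd


manual_query = rads_command(
    "interPro64-pfam", "pfam-31.dsm", domains=["PF02758", "PF05729"]
)
print(shlex.join(manual_query))
```

The returned list can be passed to `subprocess.run` as is; building a list rather than a string avoids quoting problems with unusual file names.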
The output
The output of RADS consists of a single file containing the targets found by RADS (see the example below).
# RADS version 2.1.1
# RADS Output v1
# run at Thu Aug 3 15:59:19 2017
#
# query file: -
# database: interPro64-pfam
# matrix: pfam-31.dsm
# ******************************************************************
Results for: manual entered query
Domain arrangement: PF00001
# score | normalized | SeqID | sequence length | domain arrangement
# -------------------------------------------------------------------
100 1.00 10020:000030 611 PF00001 44 293
100 1.00 10020:000054 276 PF00001 2 215
100 1.00 10020:0001c3 337 PF00001 42 293
100 1.00 10020:000327 402 PF00001 75 353
100 1.00 10020:000359 410 PF00001 52 305
100 1.00 10020:000393 372 PF00001 67 321
The targets are listed in a table consisting of five columns. The columns are separated by tabs. The columns are:
- score: The score of the alignment of the query to the target arrangement. The table is sorted by this value.
- normalized: The normalized version of the score, with a value between 0 and 1.
- SeqID: The sequence ID of the target sequence.
- sequence length: The length of the target sequence. If the sequence length is not included in the database, this value will be 0.
- domain arrangement: The list of domains in the target arrangement. Each element consists of three space-separated values: the domain accession number, followed by the start and end position of the domain in the target sequence.
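For downstream processing, the table described above can be read with a few lines of Python. This is a minimal sketch rather than an official parser: the names `Domain`, `Hit`, `parse_hit` and `parse_rads_output` are hypothetical. It splits on any whitespace (so it handles tabs as well as a copy with spaces), and it assumes that sequence IDs and accessions contain no whitespace and that the file holds the results of a single query.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    accession: str
    start: int
    end: int


class Hit(NamedTuple):
    score: float
    normalized: float
    seq_id: str
    seq_length: int
    domains: list


def parse_hit(line):
    """Parse one result row: four fixed columns, then domain triples."""
    fields = line.split()
    if len(fields) < 7 or (len(fields) - 4) % 3 != 0:
        raise ValueError(f"unexpected number of fields: {line!r}")
    score, normalized, seq_id, length = fields[:4]
    rest = fields[4:]
    domains = [
        Domain(rest[i], int(rest[i + 1]), int(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]
    return Hit(float(score), float(normalized), seq_id, int(length), domains)


def parse_rads_output(text):
    """Return all hits, skipping comment lines and the query header lines."""
    skipped_prefixes = ("#", "Results for:", "Domain arrangement:")
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(skipped_prefixes):
            continue
        hits.append(parse_hit(line))
    return hits


example = """# RADS version 2.1.1
Results for: manual entered query
Domain arrangement: PF00001
# score | normalized | SeqID | sequence length | domain arrangement
100\t1.00\t10020:000030\t611\tPF00001 44 293
100\t1.00\t10020:000054\t276\tPF00001 2 215
"""
hits = parse_rads_output(example)
best = [hit.seq_id for hit in hits if hit.normalized >= 0.9]
print(best)
```

Filtering on the normalized score, as in the last lines, is a convenient way to compare hits across queries of different lengths, since the raw score depends on the arrangement being aligned.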
Contact the developer
If you find a problem or have questions or comments of any kind, please contact us (domainworld[@]uni-muenster.de).
Citation
If you use RADS in your project please cite our publication:
Terrapon, Nicolas; Weiner, January; Grath, Sonja; Moore, Andrew D; Bornberg-Bauer, Erich: Rapid similarity search of proteins using alignments of domain arrangements. Bioinformatics (2014) 30 (2): 274-281. doi: 10.1093/bioinformatics/btt379
http://bioinformatics.oxfordjournals.org/content/30/2/274.long