RADS

RADS (Rapid Alignment of Domain Strings) searches a database for domain arrangements similar to a query. It has the following features:

search for domains and domain arrangements
we provide precomputed databases
you can make your own database

This website covers the basics of RADS. For more detailed information please check the manual.

Download & Setting up

Requirements

Although we try to keep the dependencies to a minimum, some existing tools and libraries are needed.

cmake
compiler supporting C++11, e.g. g++ 4.8 or higher
boost modules: system, filesystem, program_options and iostreams (http://www.boost.org/)
BioSeqDataLib (see download instructions)

Optional:

git (recommended): simplifies future updates

In most Linux distributions (e.g. Ubuntu, Arch Linux) it should be possible to install most of these dependencies using the package manager.

Download

There are two ways to download RADS: with git, or manually from the website. Both ways are described below.

Tip

If you use git, you can easily update to a newer version when one becomes available.

Download using git

Use git to clone the repository and download BioSeqDataLib as a submodule:

git clone https://zivgitlab.uni-muenster.de/domain-world/RADS.git
cd RADS
git submodule init
git submodule update

Compilation & Installation

Create a build directory inside the source folder; the code is compiled there. CMake is used to locate all the required dependencies.

mkdir build
cd build
cmake ..
make

Updating

From time to time you will need to update RADS, either to get new features or to pick up a bug fix. If you used git for the original download, change into the RADS directory and type:

git pull
git submodule foreach git pull origin master

Then repeat the steps in the Compilation & Installation section. If you downloaded the code without git, you will have to download the latest version and replace the old one with it. Do not forget to update the BioSeqDataLib folder as well.

Setting up your system

You will need a Domain Similarity Matrix (DSM) installed on your system.

Getting a database for RADS

There are two ways to get a RADS database. The easiest is to use one of our precomputed databases. If they do not contain the sequences you need, you can create your own database.

We provide some precomputed databases:

Currently we provide precomputed databases based on the InterPro domain annotations. With RADS version 2.3 we updated the database format; older databases are no longer compatible with the new version of RADS.

database	size (unzipped)	matrix	comment
interPro81-pfam.tar.bz2	1.4 GB (5.3 GB)	pfam-33.1.dsm	Contains all Pfam matches of the InterPro annotation (version 81).
interPro69-pfam.tar.bz2	864 MB (3.1 GB)	pfam-31.dsm	Contains all Pfam matches of the InterPro annotation (version 69).
interPro69-ssf.tar.bz2	777 MB (2.8 GB)	ssf-1_75.dsm	Contains all SuperFamily matches of the InterPro annotation (version 69).

Creating your own database

Creating your own database is straightforward. You need domain files in a supported format (e.g. the output of pfam_scan.pl). If you want sequence lengths reported in the RADS output, you also need to provide the sequences in FASTA format. If you do not provide sequences, the length is set to 0.

makeRadsDB -i domainFile1.pfam domainFile2.pfam -s seqFile1.fa seqFile2.fa -o myDB

The command above will create two files, myDB.db and myDB.da. Both are needed by RADS.
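As a quick sanity check before running a search, the following Python sketch tests whether both files are present for a given database prefix. The helper names `rads_db_files` and `missing_db_files` are hypothetical and not part of RADS; the only thing taken from the text above is the `.db` / `.da` suffix pair.

```python
from pathlib import Path


def rads_db_files(prefix):
    """Return the two paths makeRadsDB is described as writing for `prefix`."""
    # Append the suffixes by hand instead of using Path.with_suffix(), because
    # a prefix such as "my.DB" already contains a dot that must be kept.
    return [Path(f"{prefix}.db"), Path(f"{prefix}.da")]


def missing_db_files(prefix):
    """Return the expected database files that do not exist on disk."""
    return [path for path in rads_db_files(prefix) if not path.is_file()]


if __name__ == "__main__":
    missing = missing_db_files("myDB")
    if missing:
        print("missing:", ", ".join(str(path) for path in missing))
    else:
        print("myDB.db and myDB.da are both present")
```

The check only looks at file existence; it does not validate the database contents or format version.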

Running RADS

Once your system is set up as described above, the examples below give a short overview of how to use RADS to find similar domain arrangements. A more detailed description can be found in the manual.

You can provide the query in different formats:

provide the domains manually:

rads --db interPro64-pfam -M pfam-31.dsm -D PF02758 PF05729

provide a FASTA sequence (will be annotated using pfam_scan.pl):

rads --db interPro64-pfam -M pfam-31.dsm -Q seq.fasta

provide a domain annotation:

rads --db interPro64-pfam -M pfam-31.dsm -q seq.dom
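If you call RADS from a script, the three query modes above can be assembled programmatically. The sketch below is a hypothetical helper (`rads_command` is not part of RADS) that uses only the options shown in the examples: `--db`, `-M`, `-D`, `-Q` and `-q`. It assumes that exactly one query mode is used per call, which the examples suggest but this page does not state explicitly.

```python
import shlex


def rads_command(db, matrix, domains=None, fasta=None, annotation=None):
    """Build the argument list for one of the three query modes shown above."""
    # Assumption: the query modes are mutually exclusive.
    given = [value for value in (domains, fasta, annotation) if value]
    if len(given) != 1:
        raise ValueError("provide exactly one of: domains, fasta, annotation")

    cmd = ["rads", "--db", db, "-M", matrix]
    if domains:
        cmd += ["-D", *domains]      # domain accessions given manually
    elif fasta:
        cmd += ["-Q", fasta]         # FASTA sequence, annotated with pfam_scan.pl
    else:
        cmd += ["-q", annotation]    # existing domain annotation file
    return cmd


if __name__ == "__main__":
    cmd = rads_command("interPro64-pfam", "pfam-31.dsm",
                       domains=["PF02758", "PF05729"])
    # Print the command instead of running it, so the sketch works
    # even where RADS is not installed.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once RADS is on your `PATH`.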

The output

The output of RADS consists of a single file containing the targets found by RADS (see the example below).

# RADS version 2.1.1
# RADS Output v1
# run at Thu Aug  3 15:59:19 2017
#
# query file: -
# database: interPro64-pfam
# matrix: pfam-31.dsm
# ******************************************************************

Results for: manual entered query
Domain arrangement: PF00001

# score | normalized | SeqID | sequence length | domain arrangement  
# -------------------------------------------------------------------
100	1.00	10020:000030	611	 PF00001 44 293
100	1.00	10020:000054	276	 PF00001 2 215
100	1.00	10020:0001c3	337	 PF00001 42 293
100	1.00	10020:000327	402	 PF00001 75 353
100	1.00	10020:000359	410	 PF00001 52 305
100	1.00	10020:000393	372	 PF00001 67 321

The targets are listed in a table of five tab-separated columns. The columns are:

score: The score of the alignment of the query to the target arrangement. The table is sorted by this value.
normalized: The normalized version of the score, with a value between 0 and 1.
SeqID: The sequence ID of the target sequence.
sequence length: The length of the target sequence. If sequence length is not included in the database, this value will be 0.
domain arrangement: The list of domains in the target arrangement. Each element consists of three values: the domain accession number, followed by the start and end position of the domain in the target sequence. Values are space separated.
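For downstream processing, the column layout described above can be read with a few lines of Python. This is a minimal sketch, not an official parser: the names `Domain`, `Hit` and `parse_rads_hits` are hypothetical, and it assumes that every result row has exactly five tab-separated fields while all other lines (comment lines starting with `#`, the "Results for:" and "Domain arrangement:" lines) contain no tabs, as in the example output.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    accession: str
    start: int
    end: int


class Hit(NamedTuple):
    score: float
    normalized: float
    seq_id: str
    seq_length: int
    domains: List[Domain]


def parse_rads_hits(lines):
    """Parse result rows from RADS output lines into a list of Hit records."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # e.g. "Results for:" lines, which are not table rows
        score, normalized, seq_id, length, arrangement = fields
        # The last column holds space-separated triples: accession start end.
        tokens = arrangement.split()
        if len(tokens) % 3 != 0:
            raise ValueError(f"malformed domain arrangement: {arrangement!r}")
        domains = [
            Domain(tokens[i], int(tokens[i + 1]), int(tokens[i + 2]))
            for i in range(0, len(tokens), 3)
        ]
        hits.append(Hit(float(score), float(normalized), seq_id,
                        int(length), domains))
    return hits


if __name__ == "__main__":
    example = [
        "# score | normalized | SeqID | sequence length | domain arrangement",
        "100\t1.00\t10020:000030\t611\t PF00001 44 293",
    ]
    for hit in parse_rads_hits(example):
        print(hit.seq_id, hit.seq_length, [d.accession for d in hit.domains])
```

Scores are read as floats so that the sketch also accepts non-integer values, should they occur; the example output only shows integer scores.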

Contact the developer

If you find a problem or have questions or comments of any kind, please contact us (domainworld[@]uni-muenster.de).

Citation

If you use RADS in your project please cite our publication:

Nicolas Terrapon, January Weiner, Sonja Grath, Andrew D Moore, Erich Bornberg-Bauer: Rapid similarity search of proteins using alignments of domain arrangements. Bioinformatics (2014) 30 (2): 274-281. doi: 10.1093/bioinformatics/btt379

http://bioinformatics.oxfordjournals.org/content/30/2/274.long