MDA
MDAT::ProteinSequenceSet< MemoryType > Class Template Reference

A SequenceSet object for sequences of type ProteinSequence.

#include <ProteinSequenceSet.hpp>

Inheritance diagram for MDAT::ProteinSequenceSet< MemoryType >:
MDAT::SequenceSetBase< ProteinSequence, MemoryType >

Public Types

typedef ProteinSequence value_type
 The sequence type used in the object.
 

Public Member Functions

 ProteinSequenceSet (size_t id_val)
 
 ProteinSequenceSet (const ProteinSequenceSet &)=delete
 
 ProteinSequenceSet (ProteinSequenceSet &&)=default
 
Domain functions
void add_pfam_domains (const std::string &domain_f)
 Adds domain information to the sequence set. The input file can be either HMMER domain tabular output or pfamScan.pl output.
 
void clean_up_domains (unsigned char options)
 Resolves overlapping and nested domains.
 
void refine_boundaries ()
 Refines the domain boundaries by using the envelope and HMM information.
 
void extract_architectures ()
 Calculates the architectures from the domains in the set.
 
size_t n_architectures () const
 Returns the number of architectures in the set.
 
size_t n_domains () const
 Returns the number of different domains.
 
const std::string & domain_name (size_t i) const
 Returns the domain accession name.
 
DomainArchitectureSet & dom_archis ()
 
const DomainArchitectureSet & dom_archis () const
 
void dom_archis (DomainArchitectureSet dom_arch)
 
void write_domArchitecture (const std::string &out_f) const
 
Operators
ProteinSequence & operator[] (unsigned int index)
 Accesses a sequence by index.
 
const ProteinSequence & operator[] (unsigned int index) const
 
ProteinSequence & operator[] (const std::string &seq_name)
 Accesses a sequence by name.
 
const ProteinSequence & operator[] (const std::string &seq_name) const
 
Basic methods
const ProteinSequence * seq (unsigned int index) const
 Returns a sequence.
 
size_t n_seqs () const
 Returns the number of sequences.
 
size_t size () const
 Returns the number of sequences.
 
size_t length () const
 Returns the length of the first sequence in the set.
 
double avg_size () const
 Returns the average sequence size of the set.
 
bool empty () const
 Returns true if no sequences are contained in this object.
 
std::string file () const
 Returns the file the sequences were read from.
 
char seq_type () const throw ()
 Returns the sequence type.
 
void seq_type (char seq_type_) throw ()
 Sets the sequence type.
 
int id () const throw ()
 Returns the id of the set.
 
void id (int val)
 Sets the id of the sequence set.
 
void clear ()
 Sets everything to 0.
 
Input & Output
virtual void read (const std::string &seq_f, const std::vector< std::string > &seq_names, bool check=false, short format=-1)
 Reads only the named sequences from a file, extracting a subalignment.
 
virtual void read (const std::string &seq_f, bool check=false, short format=-1)
 Reads a set of sequences.
 
virtual void write (const std::string &seq_f, const std::string format) const
 Writes the sequences into a file.
 
void add_seq (ProteinSequence *seq)
 Appends a sequence to the set.
 
Manipulation methods
void to_upper ()
 Turns all characters to uppercase.
 
void to_lower ()
 Turns all characters to lowercase.
 
virtual void delete_seqs (const std::map< std::string, bool > &names)
 Deletes the named sequences from the alignment.
 
virtual void delete_seqs (std::vector< size_t > &indices)
 Deletes the sequences at the given indices from the alignment.
 
virtual void keep_seqs (std::vector< size_t > &indices)
 Deletes all sequences whose indices are not in the given list.
 
void share (const SequenceSetBase< ProteinSequence, MemoryType > &set, size_t id)
 Shares a sequence between two sets.
 
void transfer (SequenceSetBase< ProteinSequence, MemoryType > &set, size_t id)
 Transfers a sequence from one set to another.
 
void transfer (SequenceSetBase< ProteinSequence, MemoryType > &set)
 Transfers all sequences from one set to another.
 
void sort (std::string type)
 Sorts the sequences.
 
void insert_gaps (const std::string &edit_string)
 Inserts gaps into each sequence.
 

Related Functions

(Note that these are not member functions.)

template<typename MemoryType >
void domain_column_split (const ProteinSequenceSet< MemoryType > &set, SplitSet< ProteinSequenceSet< Default > > &splitSet)
 Splits a ProteinSequenceSet into columns according to its domains.
 
template<typename MemoryType >
void splitByArchitecture (const ProteinSequenceSet< MemoryType > &set, std::vector< ProteinSequenceSet< MemoryType > > &architectureSplits)
 Splits a set according to the domain architecture of the sequences.
 

Detailed Description

template<typename MemoryType>
class MDAT::ProteinSequenceSet< MemoryType >

A SequenceSet object for sequences of type ProteinSequence.

It provides additional functions to read domain files of various formats and to connect the domains to the sequences.

Member Function Documentation

template<typename MemoryType >
void MDAT::ProteinSequenceSet< MemoryType >::add_pfam_domains ( const std::string &  domain_f)

Adds domain information to the sequence set. The input file can be either HMMER domain tabular output or pfamScan.pl output.

Parameters
domain_f  A file containing the domains.
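
The HMMER variant of the input can be illustrated with a small standalone reader. This is a sketch, not MDAT's parser: it assumes the file was produced by `hmmscan --domtblout` (the Pfam model is the target, the protein is the query), it does not handle pfamScan.pl output, and the `DomainHit` struct and `parse_domtblout` function are hypothetical. Column positions follow the HMMER3 domain table layout.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; MDAT's own domain class is not shown on this page.
struct DomainHit {
    std::string seq_name;   // query name (column 4 of hmmscan --domtblout)
    std::string accession;  // Pfam accession of the model (column 2)
    double i_evalue;        // independent E-value (column 13)
    std::size_t env_from;   // envelope start (column 20), 1-based
    std::size_t env_to;     // envelope end (column 21), 1-based
};

// Reads a HMMER3 domain table: whitespace-separated fields, '#' comment lines.
// Only the first 22 fields are fixed; the trailing description may contain spaces.
std::vector<DomainHit> parse_domtblout(std::istream& in) {
    std::vector<DomainHit> hits;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip header and comments
        std::istringstream fields(line);
        std::vector<std::string> col;
        std::string tok;
        while (col.size() < 22 && fields >> tok) col.push_back(tok);
        if (col.size() < 22) continue;  // ignore truncated lines
        DomainHit h{col[3], col[1], std::stod(col[12]),
                    static_cast<std::size_t>(std::stoul(col[19])),
                    static_cast<std::size_t>(std::stoul(col[20]))};
        assert(h.env_from <= h.env_to);  // envelopes are reported as start <= end
        hits.push_back(h);
    }
    return hits;
}
```

With `hmmsearch --domtblout` the target and query columns swap roles (the protein becomes the target), so the column indices would have to change accordingly.
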
void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::add_seq ( ProteinSequence * seq)
inline, inherited

Appends a sequence to the set.

Parameters
seq  A pointer to the new sequence.

double MDAT::SequenceSetBase< ProteinSequence , MemoryType >::avg_size ( ) const
inherited

Returns the average sequence size of the set.

Returns
The average size.
template<typename MemoryType >
void MDAT::ProteinSequenceSet< MemoryType >::clean_up_domains ( unsigned char  options)

Resolves overlapping and nested domains.

Resolves several kinds of conflicts between the domains annotated in the set; options selects which of them are handled.

Parameters
options  The cleaning operations to perform.
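
This page does not say which strategy clean_up_domains applies or what the bits of options mean. As an illustration of what resolving overlapping and nested domains can involve, here is one common greedy strategy: keep the most significant hit and drop anything that overlaps it. The `Domain` struct and `resolve_overlaps` are hypothetical, and this is not MDAT's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical domain record with 1-based, inclusive coordinates.
struct Domain {
    std::string accession;
    std::size_t from, to;
    double evalue;
};

// Two inclusive intervals overlap unless one ends before the other starts.
bool overlaps(const Domain& a, const Domain& b) {
    return !(a.to < b.from || b.to < a.from);
}

// Greedy resolution: accept domains from most to least significant (lowest
// E-value first) and drop every domain that overlaps, or is nested in, one
// that was already accepted.
std::vector<Domain> resolve_overlaps(std::vector<Domain> doms) {
    std::stable_sort(doms.begin(), doms.end(),
                     [](const Domain& a, const Domain& b) { return a.evalue < b.evalue; });
    std::vector<Domain> kept;
    for (const Domain& d : doms) {
        assert(d.from <= d.to);
        const bool clash = std::any_of(kept.begin(), kept.end(),
                                       [&](const Domain& k) { return overlaps(d, k); });
        if (!clash) kept.push_back(d);
    }
    // Restore N- to C-terminal order so the result reads as an architecture.
    std::sort(kept.begin(), kept.end(),
              [](const Domain& a, const Domain& b) { return a.from < b.from; });
    return kept;
}
```

Nested domains are a special case of overlap, so the same check removes them; an implementation driven by options might instead trim or merge hits.
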
virtual void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::delete_seqs ( const std::map< std::string, bool > &  names)
virtual, inherited

Deletes sequences from the alignment.

Parameters
names  The names of the sequences to delete.

virtual void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::delete_seqs ( std::vector< size_t > &  indices)
virtual, inherited

Deletes sequences from the alignment.

Parameters
indices  The indices of the sequences to delete.

template<typename MemoryType>
DomainArchitectureSet& MDAT::ProteinSequenceSet< MemoryType >::dom_archis ( )
inline

Returns the DomainArchitectureSet belonging to the sequences.

Returns
A reference to the DomainArchitectureSet.

template<typename MemoryType>
const DomainArchitectureSet& MDAT::ProteinSequenceSet< MemoryType >::dom_archis ( ) const
inline

Const overload of the function above; it returns a read-only reference to the same DomainArchitectureSet.

template<typename MemoryType>
const std::string& MDAT::ProteinSequenceSet< MemoryType >::domain_name ( size_t  i) const
inline

Returns the domain accession name.

Parameters
i  The id of the domain.

int MDAT::SequenceSetBase< ProteinSequence , MemoryType >::id ( ) const throw ()
inline, inherited

Returns the id of the set.

Returns
The id.

void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::id ( int  val)
inline, inherited

Sets the id of the sequence set.

Parameters
val  The id.

void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::insert_gaps ( const std::string &  edit_string)
inherited

Inserts gaps into each sequence.

Parameters
edit_string  The pattern of matches and gaps, in reverse order.

virtual void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::keep_seqs ( std::vector< size_t > &  indices)
virtual, inherited

Deletes all sequences whose indices are not in the given list.

Parameters
indices  The indices of the sequences to keep.
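
The keep_seqs semantics (everything not listed is deleted) can be sketched on a plain std::vector. `keep_by_index` is a hypothetical standalone helper, not MDAT code; it assumes out-of-range indices are a programming error and that the surviving sequences keep their original relative order, which the documentation does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps only the elements whose positions appear in `indices` and drops the rest,
// preserving the original relative order. `indices` is sorted and de-duplicated in
// place (a hypothetical use of the non-const parameter that keep_seqs also takes).
template <typename T>
void keep_by_index(std::vector<T>& items, std::vector<std::size_t>& indices) {
    std::sort(indices.begin(), indices.end());
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
    std::vector<T> kept;
    kept.reserve(indices.size());
    for (std::size_t i : indices) {
        assert(i < items.size() && "index out of range");
        kept.push_back(std::move(items[i]));
    }
    items.swap(kept);
}
```

Building a new vector and swapping it in avoids the repeated element shifting that erasing unwanted entries one by one would cause.
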
size_t MDAT::SequenceSetBase< ProteinSequence , MemoryType >::length ( ) const
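inline, inherited

Returns the length of the first sequence in the set.

Returns the length of the first sequence, or 0 if the set is empty. For an alignment, this is the alignment length, since all aligned sequences have the same length.

Returns
The length.
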
template<typename MemoryType>
size_t MDAT::ProteinSequenceSet< MemoryType >::n_architectures ( ) const
inline

Returns the number of architectures of the set.

Returns
Number of architectures.
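
To make the number of architectures concrete: if an architecture is taken to be the ordered list of domain accessions on one sequence (an assumption; DomainArchitectureSet is not documented on this page), the count is the number of distinct such lists. `Architecture` and `count_architectures` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical: an architecture is the ordered list of domain accessions of one sequence.
using Architecture = std::vector<std::string>;

// Counts distinct architectures. std::vector compares element by element, so the
// same domains in a different order form a different architecture.
std::size_t count_architectures(const std::vector<Architecture>& per_sequence) {
    const std::set<Architecture> distinct(per_sequence.begin(), per_sequence.end());
    return distinct.size();
}
```
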
template<typename MemoryType>
size_t MDAT::ProteinSequenceSet< MemoryType >::n_domains ( ) const
inline

Returns the number of different domains.

Returns
Number of different domains.
size_t MDAT::SequenceSetBase< ProteinSequence , MemoryType >::n_seqs ( ) const
inline, inherited

Returns the number of sequences.

Returns
The number of sequences.

ProteinSequence & MDAT::SequenceSetBase< ProteinSequence , MemoryType >::operator[] ( unsigned int  index)
inline, inherited

Accesses a sequence by index.

Parameters
index  The position of the sequence to return.
Returns
A reference to the sequence.

const ProteinSequence & MDAT::SequenceSetBase< ProteinSequence , MemoryType >::operator[] ( unsigned int  index) const
inline, inherited

Const overload of the operator above; it returns a read-only reference.

ProteinSequence & MDAT::SequenceSetBase< ProteinSequence , MemoryType >::operator[] ( const std::string &  seq_name)
inline, inherited

Accesses a sequence by name.

Parameters
seq_name  The name of the sequence.
Returns
A reference to the sequence.

const ProteinSequence & MDAT::SequenceSetBase< ProteinSequence , MemoryType >::operator[] ( const std::string &  seq_name) const
inline, inherited

Const overload of the operator above; it returns a read-only reference.

virtual void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::read ( const std::string &  seq_f,
const std::vector< std::string > &  seq_names,
bool  check = false,
short  format = -1 
)
virtual, inherited

Reads a subalignment from a sequence file.

Only the sequences named in seq_names are extracted. Names in seq_names that do not occur in the alignment are ignored. Columns consisting only of gaps are removed.

Parameters
seq_f  The file of the sequences to read.
seq_names  The names of the sequences to read.
check  Whether to check that each sequence is a proper biological sequence.
format  The format of the alignment (-1 enables automatic format detection).
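
The extraction rules above (keep only the named rows, ignore unknown names, drop gap-only columns) can be sketched on an in-memory alignment. The `Row` type and `extract_subalignment` are hypothetical; the sketch assumes '-' is the only gap character and keeps rows in alignment order rather than in the order of seq_names, neither of which this page confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<std::string, std::string>;  // (name, aligned sequence)

// Keeps the rows whose names are listed (unknown names simply never match),
// then removes every column that contains only gaps in the remaining rows.
std::vector<Row> extract_subalignment(const std::vector<Row>& aln,
                                      const std::vector<std::string>& names,
                                      char gap = '-') {
    const std::set<std::string> wanted(names.begin(), names.end());
    std::vector<Row> sub;
    for (const Row& r : aln)
        if (wanted.count(r.first)) sub.push_back(r);
    if (sub.empty()) return sub;

    const std::size_t len = sub.front().second.size();
    std::vector<Row> out;
    for (const Row& r : sub) {
        assert(r.second.size() == len && "aligned rows must have equal length");
        out.emplace_back(r.first, std::string());
    }
    for (std::size_t c = 0; c < len; ++c) {
        bool all_gaps = true;
        for (const Row& r : sub) all_gaps = all_gaps && r.second[c] == gap;
        if (all_gaps) continue;  // drop the gap-only column
        for (std::size_t i = 0; i < sub.size(); ++i) out[i].second += sub[i].second[c];
    }
    return out;
}
```

Gap-only columns can only be identified after the row selection, which is why the filtering happens in two passes.
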
virtual void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::read ( const std::string &  seq_f,
bool  check = false,
short  format = -1 
)
inline, virtual, inherited

Reads a set of sequences.

This function can read unaligned sequences in FASTA format as well as aligned sequences in several formats.

Parameters
seq_f  The file with the sequences to read.
check  Whether to check that each sequence is a proper biological sequence.
format  The format of the alignment (-1 enables automatic format detection).
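
For the unaligned FASTA case, a minimal reader can look like the sketch below. It is a generic FASTA parser, not MDAT's reader: it performs no format detection and no check validation, and it assumes the sequence name is the header text up to the first whitespace. `Record` and `read_fasta` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Record = std::pair<std::string, std::string>;  // (name, sequence)

// Minimal FASTA reader: a '>' line starts a record, its name is the header text up
// to the first whitespace, and the following lines are concatenated.
std::vector<Record> read_fasta(std::istream& in) {
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            std::istringstream header(line.substr(1));
            std::string name;
            header >> name;
            records.emplace_back(name, std::string());
        } else if (!records.empty()) {  // ignore data before the first header
            records.back().second += line;
        }
    }
    return records;
}
```
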
const ProteinSequence * MDAT::SequenceSetBase< ProteinSequence , MemoryType >::seq ( unsigned int  index) const
inline, inherited

Returns a sequence.

Parameters
index  Index of the sequence.
Returns
A const pointer to the sequence.

char MDAT::SequenceSetBase< ProteinSequence , MemoryType >::seq_type ( ) const throw ()
inline, inherited

Returns the sequence type.

Returns
The sequence type.

void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::seq_type ( char  seq_type_) throw ()
inline, inherited

Sets the sequence type.

Parameters
seq_type_  The sequence type.

void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::share ( const SequenceSetBase< ProteinSequence , MemoryType > &  set,
size_t  id 
)
inline, inherited

Shares a sequence between two sets.

Parameters
set  The set to take the sequence from.
id  The index of the sequence.

size_t MDAT::SequenceSetBase< ProteinSequence , MemoryType >::size ( ) const
inline, inherited

Returns the number of sequences.

Returns
The number of sequences.

void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::sort ( std::string  type)
inherited

Sorts the sequences.

Parameters
type  "input" sorts the sequences by their order in the input, "name" sorts them by sequence name, and "seq" sorts them alphabetically by sequence.
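
The three documented sort keys map naturally onto comparator-based sorting. The `Rec` struct, `sort_records`, and the exception thrown for an unknown key are assumptions made for this sketch; MDAT's behaviour for other values of type is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record; a real ProteinSequence carries more than this.
struct Rec {
    std::size_t input_pos;  // position of the sequence in the input file
    std::string name;
    std::string seq;
};

// Sorts by one of the documented keys: "input", "name" or "seq".
void sort_records(std::vector<Rec>& v, const std::string& type) {
    if (type == "input") {
        std::stable_sort(v.begin(), v.end(),
                         [](const Rec& a, const Rec& b) { return a.input_pos < b.input_pos; });
    } else if (type == "name") {
        std::stable_sort(v.begin(), v.end(),
                         [](const Rec& a, const Rec& b) { return a.name < b.name; });
    } else if (type == "seq") {
        std::stable_sort(v.begin(), v.end(),
                         [](const Rec& a, const Rec& b) { return a.seq < b.seq; });
    } else {
        throw std::invalid_argument("unknown sort type: " + type);
    }
}
```

Remembering the input position is what makes "input" order recoverable after a sort by another key; std::stable_sort keeps ties in their previous order.
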
void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::transfer ( SequenceSetBase< ProteinSequence , MemoryType > &  set,
size_t  id 
)
inline, inherited

Transfers a sequence from one set to another.

Parameters
set  The set to take the sequence from.
id  The index of the sequence.

void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::transfer ( SequenceSetBase< ProteinSequence , MemoryType > &  set)
inline, inherited

Transfers all sequences from one set to another.

Parameters
set  The set to take the sequences from.
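
The difference between share and transfer can be illustrated by modelling a set as a vector of std::shared_ptr. This model is an assumption for illustration only: MDAT's storage is governed by the MemoryType parameter, which this page does not describe, and the sketch assumes that a transferred sequence is removed from its source set. `Set`, `share`, `transfer` and `transfer_all` here are hypothetical free functions, not the member functions documented above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a set: shared pointers to sequence strings.
using SeqPtr = std::shared_ptr<std::string>;
using Set = std::vector<SeqPtr>;

// share: `dst` gains a second handle to the same sequence object; `src` is unchanged.
void share(Set& dst, const Set& src, std::size_t id) {
    dst.push_back(src.at(id));
}

// transfer (one): the handle is moved from `src` to the end of `dst`.
void transfer(Set& dst, Set& src, std::size_t id) {
    dst.push_back(std::move(src.at(id)));
    src.erase(src.begin() + static_cast<std::ptrdiff_t>(id));
}

// transfer (all): every handle is moved to `dst`, leaving `src` empty.
void transfer_all(Set& dst, Set& src) {
    for (SeqPtr& p : src) dst.push_back(std::move(p));
    src.clear();
}
```

After share, modifying the sequence through either set is visible in both, because both hold the same object; after transfer, only the destination holds it.
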
virtual void MDAT::SequenceSetBase< ProteinSequence , MemoryType >::write ( const std::string &  seq_f,
const std::string  format 
) const
virtual, inherited

Writes the sequences into a file.

This function supports the following formats: FASTA, MSF.

Parameters
seq_f  The file to write the alignment to.
format  The format to use (fasta, clustalw, msf, phylip_i, phylip_s).

template<typename MemoryType >
void MDAT::ProteinSequenceSet< MemoryType >::write_domArchitecture ( const std::string &  out_f) const

Writes the domain architectures to a file.

Parameters
out_f  The file to write the architectures to.

Friends And Related Function Documentation

template<typename MemoryType >
void domain_column_split ( const ProteinSequenceSet< MemoryType > &  set,
SplitSet< ProteinSequenceSet< Default > > &  splitSet 
)
related

Splits a ProteinSequenceSet into columns according to its domains.

The sequences of the set are split into domain and non-domain columns according to the domain architecture set. Where the domain architecture contains a gap, the domain column and the following non-domain column contain an empty string.

Note
All occurring architectures must have the same size.
Template Parameters
MemoryType  The memory type of the SequenceSet.
Parameters
set  [in] The ProteinSequenceSet to split.
splitSet  [out] The set to which the individual columns are added.
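
A sketch of the column layout for a single sequence, under assumptions this page does not spell out: each sequence is described by a fixed number of architecture slots (matching the note that all architectures must have the same size), a missing slot stands for a gap in the architecture, and the output starts with the non-domain part before the first domain. `Span` and `split_columns` are hypothetical names; the code requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical domain position: 0-based, inclusive.
struct Span {
    std::size_t from, to;
};

// One slot per architecture position; std::nullopt marks a gap in the architecture.
// Output: the leading non-domain part, then a (domain, following non-domain) pair
// per slot, i.e. 1 + 2 * slots.size() columns. A gap slot yields two empty strings.
std::vector<std::string> split_columns(const std::string& seq,
                                       const std::vector<std::optional<Span>>& slots) {
    // Start of the first present domain at or after slot i, else the sequence end.
    auto next_start = [&](std::size_t i) -> std::size_t {
        for (std::size_t j = i; j < slots.size(); ++j)
            if (slots[j]) return slots[j]->from;
        return seq.size();
    };
    std::vector<std::string> cols;
    cols.push_back(seq.substr(0, next_start(0)));
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (!slots[i]) {  // gap: empty domain and empty following non-domain column
            cols.emplace_back();
            cols.emplace_back();
            continue;
        }
        const Span s = *slots[i];
        const std::size_t end = next_start(i + 1);
        assert(s.from <= s.to && s.to < end);  // ordered, non-overlapping domains
        cols.push_back(seq.substr(s.from, s.to - s.from + 1));
        cols.push_back(seq.substr(s.to + 1, end - s.to - 1));
    }
    return cols;
}
```

Because every sequence yields the same number of columns, column k of all sequences can be collected into one set, which is what makes the equal-size requirement necessary.
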
template<typename MemoryType >
void splitByArchitecture ( const ProteinSequenceSet< MemoryType > &  set,
std::vector< ProteinSequenceSet< MemoryType > > &  architectureSplits 
)
related

Splits a set according to the domain architecture of the sequences.

Parameters
set  The sequence set.
architectureSplits  The resulting split.
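
The grouping step behind splitByArchitecture can be sketched by keying sequence indices on their architecture. `group_by_architecture` is a hypothetical helper that returns indices instead of building ProteinSequenceSet objects, and it assumes an architecture is the ordered list of domain accessions of a sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical: domain accessions of one sequence, N- to C-terminal.
using Architecture = std::vector<std::string>;

// Groups sequence indices by architecture; each map entry would correspond to one
// of the output sets in architectureSplits.
std::map<Architecture, std::vector<std::size_t>>
group_by_architecture(const std::vector<Architecture>& per_sequence) {
    std::map<Architecture, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < per_sequence.size(); ++i)
        groups[per_sequence[i]].push_back(i);
    return groups;
}
```

Using std::map gives a deterministic order of the groups; an unordered container would need a custom hash for std::vector<std::string>.
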