MDA
|
A SequenceSet object for sequences of type ProteinSequence.
#include <ProteinSequenceSet.hpp>
Public Types | |
typedef ProteinSequence | value_type |
The sequence type used in the object. | |
Public Member Functions | |
ProteinSequenceSet (size_t id_val) | |
ProteinSequenceSet (const ProteinSequenceSet &)=delete | |
ProteinSequenceSet (ProteinSequenceSet &&)=default | |
Domain functions | |
void | add_pfam_domains (const std::string &domain_f) |
Adds domain information to the sequence set. The input file can be either the HMMER domain tabular output or pfamScan.pl output. | |
void | clean_up_domains (unsigned char options) |
Solves overlaps and nested domains. | |
void | refine_boundaries () |
Refines the domain boundaries by using the envelope and hmm information. | |
void | extract_architectures () |
Calculates the architectures from the domains in the set. | |
size_t | n_architectures () const |
Returns the number of architectures of the set. | |
size_t | n_domains () const |
Returns the number of different domains. | |
const std::string & | domain_name (size_t i) const |
Returns the domain accession name. | |
DomainArchitectureSet & | dom_archis () |
const DomainArchitectureSet & | dom_archis () const |
void | dom_archis (DomainArchitectureSet dom_arch) |
void | write_domArchitecture (const std::string &out_f) const |
Operators | |
ProteinSequence & | operator[] (unsigned int index) |
Operator to access the sequence. | |
const ProteinSequence & | operator[] (unsigned int index) const |
ProteinSequence & | operator[] (const std::string &seq_name) |
Accesses a sequence by name. | |
const ProteinSequence & | operator[] (const std::string &seq_name) const |
Basic methods | |
const ProteinSequence * | seq (unsigned int index) const |
Returns a sequence. | |
size_t | n_seqs () const |
Returns the number of sequences. | |
size_t | size () const |
Returns the number of sequences. | |
size_t | length () const |
Returns the length of the sequence inside. | |
double | avg_size () const |
The average size of the sequence set. | |
bool | empty () const |
Returns true if no sequences are contained in this object. | |
std::string | file () const |
Returns the file the sequences were read from. | |
char | seq_type () const throw () |
Returns type. | |
void | seq_type (char seq_type_) throw () |
Sets the sequence type. | |
int | id () const throw () |
Returns the id of the set. | |
void | id (int val) |
Sets the id of the sequence set. | |
void | clear () |
Sets everything to 0. | |
Input & Output | |
virtual void | read (const std::string &seq_f, const std::vector< std::string > &seq_names, bool check=false, short format=-1) |
Extracts a subalignment from the sequence set. | |
virtual void | read (const std::string &seq_f, bool check=false, short format=-1) |
Reads a set of sequences. | |
virtual void | write (const std::string &seq_f, const std::string format) const |
Writes the sequences into a file. | |
void | add_seq (ProteinSequence *seq) |
Append a sequence to a set. | |
Manipulation methods | |
void | to_upper () |
Turns all characters to uppercase. | |
void | to_lower () |
Turns all characters to lowercase. | |
virtual void | delete_seqs (const std::map< std::string, bool > &names) |
Deletes sequences from the alignment. | |
virtual void | delete_seqs (std::vector< size_t > &indices) |
Deletes sequences from the alignment. | |
virtual void | keep_seqs (std::vector< size_t > &indices) |
Deletes sequences if they are not in the given list. | |
void | share (const SequenceSetBase< ProteinSequence, MemoryType > &set, size_t id) |
Shares a sequence between two sets. | |
void | transfer (SequenceSetBase< ProteinSequence, MemoryType > &set, size_t id) |
Transfer a sequence from one set to another. | |
void | transfer (SequenceSetBase< ProteinSequence, MemoryType > &set) |
Transfers all sequences from one set to another. | |
void | sort (std::string type) |
Sorts the sequences. | |
void | insert_gaps (const std::string &edit_string) |
Inserts gaps into each sequence. | |
Related Functions | |
(Note that these are not member functions.) | |
template<typename MemoryType > | |
void | domain_column_split (const ProteinSequenceSet< MemoryType > &set, SplitSet< ProteinSequenceSet< Default > > &splitSet) |
Splits a ProteinSequenceSet into columns according to its domains. | |
template<typename MemoryType > | |
void | splitByArchitecture (const ProteinSequenceSet< MemoryType > &set, std::vector< ProteinSequenceSet< MemoryType > > &architectureSplits) |
Splits a set according to the domain architecture of the sequences. | |
A SequenceSet object for sequences of type ProteinSequence.
It provides additional functions to read domain files of various formats and connect them to the sequences.
void MDAT::ProteinSequenceSet< MemoryType >::add_pfam_domains | ( | const std::string & | domain_f | ) |
Adds domain information to the sequence set. The input file can be either the HMMER domain tabular output or pfamScan.pl output.
domain_f | A file containing the domains. |
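HMMER's domain tabular output (the --domtblout option) is whitespace-delimited, with 22 fixed columns followed by a free-text description. The sketch below shows how one such line could be parsed. DomainHit and parse_domtbl_line are hypothetical names and are not part of MDAT; the column mapping assumes hmmscan output, where the target is the domain model and the query is the sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for illustration; not the library's own domain type.
struct DomainHit {
    std::string domain;    // target name (column 1)
    std::string sequence;  // query name (column 4)
    double i_evalue;       // independent E-value (column 13)
    std::size_t env_from;  // envelope start (column 20)
    std::size_t env_to;    // envelope end (column 21)
};

// Parses one --domtblout line. Returns false for comment lines
// (starting with '#') and for lines with fewer than 22 fields.
bool parse_domtbl_line(const std::string &line, DomainHit &hit)
{
    if (line.empty() || line[0] == '#')
        return false;

    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field)
        fields.push_back(field);
    if (fields.size() < 22)
        return false;

    hit.domain = fields[0];
    hit.sequence = fields[3];
    hit.i_evalue = std::stod(fields[12]);
    hit.env_from = std::stoul(fields[19]);
    hit.env_to = std::stoul(fields[20]);
    return true;
}
```

If the file was produced by hmmsearch instead, target and query swap roles, so the two name fields would need to be exchanged.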
|
inline, inherited |
Append a sequence to a set.
seq | A pointer to the new sequence. |
|
inherited |
The average size of the sequence set.
void MDAT::ProteinSequenceSet< MemoryType >::clean_up_domains | ( | unsigned char | options | ) |
Solves overlaps and nested domains.
Solves several problems given the used domains.
options | The cleaning options to be performed |
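The page does not say which strategy clean_up_domains uses. As an illustration only, the sketch below resolves overlapping and nested domains greedily: the domain with the lowest E-value wins, and any domain that overlaps an already accepted one is dropped. The Domain struct and resolve_overlaps function are hypothetical and not part of MDAT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical domain record; positions are inclusive.
struct Domain {
    std::string name;
    std::size_t start;
    std::size_t end;
    double evalue;
};

// True if the two ranges share at least one position.
// A nested domain is a special case of an overlap.
bool overlaps(const Domain &a, const Domain &b)
{
    return a.start <= b.end && b.start <= a.end;
}

// Keeps the best-scoring domain of every overlapping group and
// returns the survivors ordered by start position.
std::vector<Domain> resolve_overlaps(std::vector<Domain> domains)
{
    // Best (lowest) E-value first.
    std::sort(domains.begin(), domains.end(),
              [](const Domain &a, const Domain &b) { return a.evalue < b.evalue; });

    std::vector<Domain> kept;
    for (const Domain &candidate : domains) {
        bool clash = std::any_of(kept.begin(), kept.end(),
                                 [&](const Domain &k) { return overlaps(k, candidate); });
        if (!clash)
            kept.push_back(candidate);
    }

    std::sort(kept.begin(), kept.end(),
              [](const Domain &a, const Domain &b) { return a.start < b.start; });
    return kept;
}
```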
|
virtual, inherited |
Deletes sequences from the alignment.
names | The names of the sequences to delete |
|
virtual, inherited |
Deletes sequences from the alignment.
indices | The indices of the sequences to delete. |
|
inline |
Returns the DomainArchitectureSet belonging to the sequences.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
|
inline |
Returns the domain accession name.
i | The id of the domain. |
|
inline, inherited |
Returns the id of the set.
|
inline, inherited |
Sets the id of the sequence set.
val | The id |
|
inherited |
Inserts gaps into each sequence.
edit_string | The pattern of matches and gaps in reverse order |
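An edit string in reverse order is what a dynamic-programming traceback naturally produces, since it walks from the end of the alignment back to the start. The sketch below applies such a string to a single sequence. The encoding is an assumption made for this example ('m' copies the next residue, 'g' emits a gap); the page does not document the characters insert_gaps expects, and apply_edit_string is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Applies an edit string that is stored back to front.
// Assumed encoding (not taken from the library): 'g' = gap, anything else = match.
std::string apply_edit_string(const std::string &seq, const std::string &edit_string)
{
    std::string out;
    out.reserve(edit_string.size());
    std::size_t pos = 0;  // next residue of seq to copy

    // Walk the edit string from its last character to its first.
    for (auto it = edit_string.rbegin(); it != edit_string.rend(); ++it) {
        if (*it == 'g')
            out.push_back('-');
        else if (pos < seq.size())
            out.push_back(seq[pos++]);
    }
    return out;
}
```

For example, under this encoding the sequence "ACD" with the edit string "mgmm" becomes "AC-D".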
|
virtual, inherited |
Deletes sequences if they are not in the given list.
indices | The indices of the sequences to keep. |
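Keeping sequences by index can be sketched as follows. This is a stand-alone illustration on a plain vector of strings, not the library's implementation; keep_by_index is a hypothetical name, and dropping duplicate or out-of-range indices is a choice made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns only the entries whose index is listed in `indices`,
// in their original order. Duplicates and out-of-range indices are skipped.
std::vector<std::string> keep_by_index(const std::vector<std::string> &seqs,
                                       std::vector<std::size_t> indices)
{
    std::sort(indices.begin(), indices.end());
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());

    std::vector<std::string> kept;
    for (std::size_t idx : indices)
        if (idx < seqs.size())
            kept.push_back(seqs[idx]);
    return kept;
}
```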
|
inline, inherited |
Returns the length of the sequence inside.
Returns the length of the first sequence, or 0 if the set is empty.
|
inline |
Returns the number of architectures of the set.
|
inline |
Returns the number of different domains.
|
inline, inherited |
Returns the number of sequences.
|
inline, inherited |
Operator to access the sequence.
index | The sequence position to return. |
|
inline, inherited |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
|
inline, inherited |
Accesses a sequence by name.
seq_name | The name of the sequence |
|
inline, inherited |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
|
virtual, inherited |
Extracts a subalignment from the sequence set.
Only sequences named in seq_names are extracted. Names in seq_names that do not occur in the alignment are ignored. Columns consisting only of gaps are removed.
seq_f | The file of the sequences to read. |
seq_names | The names to read. |
format | The format of the alignment. (-1 enables automatic format detection) |
check | Checks if the sequence is a proper biological sequence |
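The extraction rules described above (select by name, ignore unknown names, drop gap-only columns) can be sketched on in-memory data as follows. Row and extract_subalignment are hypothetical names used only for this example, '-' is assumed to be the gap character, and all rows are assumed to have the same length; the actual function reads from a file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical aligned row; not the library's sequence type.
struct Row {
    std::string name;
    std::string residues;
};

// Picks the rows listed in seq_names (unknown names are ignored) and then
// removes every column that contains only gaps in the picked rows.
// Rows are returned in alignment order, not in seq_names order.
std::vector<Row> extract_subalignment(const std::vector<Row> &alignment,
                                      const std::vector<std::string> &seq_names)
{
    std::vector<Row> picked;
    for (const Row &row : alignment)
        if (std::find(seq_names.begin(), seq_names.end(), row.name) != seq_names.end())
            picked.push_back(row);
    if (picked.empty())
        return picked;

    // Mark columns that hold at least one non-gap character.
    const std::size_t n_cols = picked.front().residues.size();
    std::vector<bool> keep(n_cols, false);
    for (const Row &row : picked)
        for (std::size_t c = 0; c < n_cols && c < row.residues.size(); ++c)
            if (row.residues[c] != '-')
                keep[c] = true;

    // Rebuild each row from the marked columns only.
    for (Row &row : picked) {
        std::string trimmed;
        for (std::size_t c = 0; c < n_cols && c < row.residues.size(); ++c)
            if (keep[c])
                trimmed.push_back(row.residues[c]);
        row.residues = trimmed;
    }
    return picked;
}
```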
|
inline, virtual, inherited |
Reads a set of sequences.
This function can read unaligned sequences in FASTA format as well as aligned sequences in several formats.
seq_f | The file with the sequences to read. |
format | The format of the alignment. (-1 enables automatic format detection) |
check | Checks if the sequence is a proper biological sequence |
|
inline, inherited |
Returns a sequence.
index | Index of the sequence. |
|
inline, inherited |
Returns type.
|
inline, inherited |
Sets the sequence type.
seq_type_ | The sequence type. |
|
inline, inherited |
Shares a sequence between two sets.
set | The set to take the sequence from. |
id | The index of the sequence. |
|
inline, inherited |
Returns the number of sequences.
|
inherited |
Sorts the sequences.
type | "input" sorts the sequences by order of the input. "name" sorts by sequence name. "seq" sorts the sequences by alphabetical order. |
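The three sort keys can be illustrated with a small stand-alone sketch. Entry and sort_entries are hypothetical names, the input order is modelled by an explicit input_id field, and throwing on an unknown key is a choice made for this example; the page does not say how the library handles other values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical entry; input_id records the position in the input file.
struct Entry {
    std::string name;
    std::string residues;
    std::size_t input_id;
};

// Sorts by "input" (original order), "name" (sequence name)
// or "seq" (alphabetical order of the residues).
void sort_entries(std::vector<Entry> &entries, const std::string &type)
{
    if (type == "input") {
        std::stable_sort(entries.begin(), entries.end(),
                         [](const Entry &a, const Entry &b) { return a.input_id < b.input_id; });
    } else if (type == "name") {
        std::stable_sort(entries.begin(), entries.end(),
                         [](const Entry &a, const Entry &b) { return a.name < b.name; });
    } else if (type == "seq") {
        std::stable_sort(entries.begin(), entries.end(),
                         [](const Entry &a, const Entry &b) { return a.residues < b.residues; });
    } else {
        throw std::invalid_argument("unknown sort type: " + type);
    }
}
```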
|
inline, inherited |
Transfer a sequence from one set to another.
set | The set to take the sequence from. |
id | The index of the sequence. |
|
inline, inherited |
Transfers all sequences from one set to another.
set | The set to take the sequence from. |
|
virtual, inherited |
Writes the sequences into a file.
This function supports the following formats: FASTA, MSF.
seq_f | The file to write the alignment to |
format | The format to use (fasta, clustalw, msf, phylip_i, phylip_s) |
void MDAT::ProteinSequenceSet< MemoryType >::write_domArchitecture | ( | const std::string & | out_f | ) | const |
Writes the domain architectures to a file.
out_f | The file to write the architectures to. |
|
related |
Splits a ProteinSequenceSet into columns according to its domains.
The sequences of the set are split into domain and non-domain columns according to the domain architecture set. Where a sequence has a gap in the domain architecture, the domain column and the following non-domain column will contain an empty string.
MemoryType | The memory type of the SequenceSet. |
set [in] | The ProteinSequenceSet to split. |
splitSet [out] | The set to which the single columns will be added. |
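One reading of the column layout described above, shown for a single sequence: a leading non-domain column, then for every slot of the architecture a domain column followed by a non-domain column, with both left empty when the sequence lacks that domain. DomainSlot and split_columns are hypothetical names; ranges are assumed to be half-open, sorted and non-overlapping. This is a sketch of the idea, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical slot of an aligned architecture. `present` is false where
// this sequence has a gap in the architecture. Range is [start, end).
struct DomainSlot {
    bool present;
    std::size_t start;
    std::size_t end;
};

// Returns 2 * slots.size() + 1 columns:
// non-domain, domain, non-domain, domain, non-domain, ...
std::vector<std::string> split_columns(const std::string &seq,
                                       const std::vector<DomainSlot> &slots)
{
    // Start of the first present slot at index >= i, or the sequence end.
    auto next_start = [&](std::size_t i) -> std::size_t {
        for (; i < slots.size(); ++i)
            if (slots[i].present)
                return slots[i].start;
        return seq.size();
    };

    std::vector<std::string> columns;
    columns.push_back(seq.substr(0, next_start(0)));
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (!slots[i].present) {
            columns.push_back("");  // missing domain
            columns.push_back("");  // following non-domain column
            continue;
        }
        const DomainSlot &d = slots[i];
        assert(d.start <= d.end && d.end <= seq.size());
        columns.push_back(seq.substr(d.start, d.end - d.start));
        columns.push_back(seq.substr(d.end, next_start(i + 1) - d.end));
    }
    return columns;
}
```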
|
related |
Splits a set according to the domain architecture of the sequences.
set | The sequence set. |
architectureSplits | The resulting split |
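Splitting by architecture amounts to grouping sequences whose ordered list of domains is identical. The sketch below shows that grouping on plain data; Architecture, AnnotatedSeq and group_by_architecture are hypothetical names, and the accession strings in the usage are placeholders. The real function fills a vector of ProteinSequenceSet objects instead of a map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// An architecture is modelled as the ordered list of domain accessions.
using Architecture = std::vector<std::string>;

// Hypothetical annotated sequence; not the library's ProteinSequence.
struct AnnotatedSeq {
    std::string name;
    Architecture architecture;
};

// Groups sequence names by identical architecture.
std::map<Architecture, std::vector<std::string>>
group_by_architecture(const std::vector<AnnotatedSeq> &seqs)
{
    std::map<Architecture, std::vector<std::string>> groups;
    for (const AnnotatedSeq &s : seqs)
        groups[s.architecture].push_back(s.name);
    return groups;
}
```

Because the key is the ordered list, two sequences with the same domains in a different order land in different groups.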