How to perform basic Multiple Sequence Alignments in R?
(I've tried asking this on BioStar, but on the slight chance that someone from text mining thinks there is a better solution, I am reposting it here.)
The task I'm trying to achieve is to align several sequences.
I don't have a basic pattern to match to. All I know is that the "true" pattern should be of length 30, and that the sequences I have had missing values introduced into them at random points.
Here is an example of such sequences: on the left we see the real location of the missing values, and on the right we see the sequence we are able to observe.
My goal is to reconstruct the left column using only the sequences I've got in the right column (based on the fact that many of the letters in each position are the same).
  real_sequence                  the_sequence_we_see
1 cgcaatactaac-agctgacttacgcaccg cgcaatactaacagctgacttacgcaccg
2 cgcaatactagc-aggtgacttcc-ct-cg cgcaatactagcaggtgacttccctcg
3 cgcaatgatcac--ggtggctcccggtgcg cgcaatgatcacggtggctcccggtgcg
4 cgcaatactaacca-ctaact--cgctgcg cgcaatactaaccactaactcgctgcg
5 cgcacgggtaagaacgtga-ttacgctcag cgcacgggtaagaacgtgattacgctcag
6 cgctatactaacaa-gtg-cttaggc-ctg cgctatactaacaagtgcttaggcctg
7 ccca-c-ctaa-acggtgacttacgctccg cccacctaaacggtgacttacgctccg
Here is example code to reproduce the above example:
    atcg <- c("a", "t", "c", "g")
    set.seed(40)
    original.seq <- sample(atcg, 30, replace = TRUE)
    # 200 copies of the true sequence, one per row
    seqs <- matrix(original.seq, 200, 30, byrow = TRUE)

    # Overwrite between 1 and number.of.changes random positions of x
    change.letters <- function(x, number.of.changes = 15, letters.to.change.with = atcg) {
      number.of.changes <- sample(seq_len(number.of.changes), 1)
      new.letters <- sample(letters.to.change.with, number.of.changes, replace = TRUE)
      where.to.change.the.letters <- sample(seq_along(x), number.of.changes, replace = FALSE)
      x[where.to.change.the.letters] <- new.letters
      return(x)
    }
    change.letters(original.seq)

    # Replace up to 3 random positions with the missing-value marker "-"
    insert.missing.values <- function(x) change.letters(x, 3, "-")
    insert.missing.values(original.seq)

    seqs2 <- t(apply(seqs, 1, change.letters))
    seqs3 <- t(apply(seqs2, 1, insert.missing.values))
    seqs4 <- apply(seqs3, 1, function(x) {paste(x, collapse = "")})

    library(stringr) # library(help = stringr)
    # str_replace_all() removes every "-"; str_replace() would drop only the first one
    all.seqs <- str_replace_all(seqs4, "-", "")

    # how to align this?
    data.frame(real_sequence = seqs4, the_sequence_we_see = all.seqs)
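To make the "same letter in each position" idea concrete, here is a minimal base-R sketch. It does not solve the alignment problem: it starts from the seven real_sequence strings in the table above, which are already aligned, and takes a majority vote per column while ignoring "-". The names real_sequence_example, majority_letter and consensus are made up for illustration.

```r
# The seven already-aligned strings from the real_sequence column above
real_sequence_example <- c(
  "cgcaatactaac-agctgacttacgcaccg",
  "cgcaatactagc-aggtgacttcc-ct-cg",
  "cgcaatgatcac--ggtggctcccggtgcg",
  "cgcaatactaacca-ctaact--cgctgcg",
  "cgcacgggtaagaacgtga-ttacgctcag",
  "cgctatactaacaa-gtg-cttaggc-ctg",
  "ccca-c-ctaa-acggtgacttacgctccg"
)

# Character matrix: one row per sequence, one column per position
aligned <- do.call(rbind, strsplit(real_sequence_example, ""))

# Most frequent non-gap letter in a column (ties go to the alphabetically first)
majority_letter <- function(column) {
  letters_only <- column[column != "-"]
  if (length(letters_only) == 0) return("-")
  counts <- table(letters_only)
  names(counts)[which.max(counts)]
}

consensus <- paste(apply(aligned, 2, majority_letter), collapse = "")
print(consensus)
```

Once the gaps are in the right places, the per-position vote is the easy part; the hard part, which a multiple sequence alignment method has to solve, is placing the gaps in the right-hand column.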
I understand that if all I had was a string and a pattern, I would be able to use

    library(Biostrings)
    pairwiseAlignment(...)

But in the case presented here we are dealing with many sequences to align to one another (instead of aligning them to one pattern).
Is there a known method for doing this in R?
Thanks,
Tal
Though this is quite an old thread, I do not want to miss the opportunity to mention that, since Bioconductor 3.1, there is a package 'msa' that implements interfaces to three different multiple sequence alignment algorithms: ClustalW, ClustalOmega, and MUSCLE. The package runs on all major platforms (Linux/Unix, Mac OS, and Windows) and is self-contained in the sense that you need not install any external software. More information can be found at http://www.bioinf.jku.at/software/msa/ and http://www.bioconductor.org/packages/release/bioc/html/msa.html.
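As a rough sketch of what this could look like for the question's data, the snippet below runs the seven observed sequences from the table through msa() with ClustalW. It assumes the msa and Biostrings packages are installed from Bioconductor; the variable names (observed, alignment, aligned) are only illustrative. The aligner knows nothing about the "true length is 30" constraint, so the gaps it places need not match the real_sequence column exactly.

```r
library(Biostrings)  # DNAStringSet(), unmasked()
library(msa)         # msa(), assumed installed from Bioconductor

# The seven observed (gap-free) sequences from the question
observed <- c(
  s1 = "cgcaatactaacagctgacttacgcaccg",
  s2 = "cgcaatactagcaggtgacttccctcg",
  s3 = "cgcaatgatcacggtggctcccggtgcg",
  s4 = "cgcaatactaaccactaactcgctgcg",
  s5 = "cgcacgggtaagaacgtgattacgctcag",
  s6 = "cgctatactaacaagtgcttaggcctg",
  s7 = "cccacctaaacggtgacttacgctccg"
)

# Upper-case the letters before building the DNA sequence set
seqs <- DNAStringSet(toupper(observed))

# method can also be "ClustalOmega" or "Muscle"
alignment <- msa(seqs, method = "ClustalW")
print(alignment, show = "complete")

# Plain character vector of the aligned rows, gaps shown as "-"
aligned <- as.character(unmasked(alignment))
aligned
```

All rows of aligned have the same width, so they can be split into a character matrix and compared column by column.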