Hello!
I'm carrying out a bioinformatics project as part of the computational section of my master's degree project and need some help finding the right tools for the job.
So far I've been carrying out a conservation analysis on my protein of interest using the ConSurf platform. The protein is conserved across eukaryotes and has two paralogues, 1 and 2, supposedly in "higher eukaryotes". I'm trying to find out when exactly this divergence occurred by sifting through the WASABI phylogenetic trees I've generated in ConSurf, which is... fun. I think I'm OK with this part; it's just tedious and I don't really know how to use WASABI very well.
My supervisor has advised me to map the conservation pattern of each paralogue between organisms: for example, take all the mammalian paralogue 1s, which share about 80-100% sequence identity, or all the chordate ones, and map their conservation scores onto a structure in PyMOL. What's interesting, however, is that paralogues 1 and 2 from the same organism seem to share only about 50% identity, but I've only tested this for a very small number of organisms. I'd like to see whether this pattern holds across organisms.
What I'm looking for is a sort of high-throughput pairwise alignment tool where I can align the two paralogues within each organism and then compare the shared sequence identity across a range of organisms. I'm not sure I've explained that very clearly, but the output should look something like my diagram below (which uses made-up numbers). Of course I can just do a bunch of pairwise alignments manually, but I'm wondering whether there's a tool where I can upload a bunch of sequences and pairwise align them all in one go, without needing 500 tabs of EMBOSS Needle open. Maybe I'd be able to spot some patterns where in, idk, bony fish for example, the paralogues are more closely related. I don't know, but I think it would be cool.
| Organism | %ID | Alignment |
|---|---|---|
| Orangutan 1/2 | 51 | GCNAL...... |
| Dolphin 1/2 | 48 | etc |
| Chimpanzee 1/2 | 44 | etc |
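For anyone wanting to try this without a web tool, here is a rough sketch of how a table like that could be built in plain Python. It is only an illustration under some assumptions: it uses a basic Needleman-Wunsch global alignment with simple match/mismatch scores and a linear gap penalty, not the BLOSUM62 matrix and affine gap penalties that EMBOSS Needle uses by default, so the %ID values will not match Needle exactly. The function names (`needleman_wunsch`, `percent_identity`, `compare_paralogues`) and the toy sequences are made up for the example.

```python
from typing import Dict, List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Globally align two sequences; return the two gapped strings.

    Assumption: simple match/mismatch scoring with a linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns divided by alignment length, as a percentage."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def compare_paralogues(
    pairs: Dict[str, Tuple[str, str]]
) -> List[Tuple[str, float, str]]:
    """For each organism, align paralogue 1 against paralogue 2."""
    rows = []
    for organism, (seq1, seq2) in pairs.items():
        aln1, aln2 = needleman_wunsch(seq1, seq2)
        rows.append((organism, percent_identity(aln1, aln2), aln1 + " / " + aln2))
    return rows


if __name__ == "__main__":
    # Toy placeholder sequences, not real proteins.
    toy = {
        "Organism X 1/2": ("GCNALKTW", "GCNVLKW"),
        "Organism Y 1/2": ("MKTAYIAK", "MKTAYIAK"),
    }
    for organism, pid, alignment in compare_paralogues(toy):
        print(f"{organism} | {pid:.0f} | {alignment}")
```

The input here is just a dictionary of organism name to a (paralogue 1, paralogue 2) pair, so the sequences would still need to be read in from FASTA files first. For real data it would probably be worth swapping the scoring for a proper substitution matrix, since identity from a naive alignment can be off for divergent pairs.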
The main focus of my project is not evolution; it's supposed to be more like structural biology, but because of the 'rona we've had to make do with computational work, often in areas we're less familiar with, so any help is welcome. If anyone has any advice on conservation analysis with protein paralogues or on identifying gene duplication events, or even some cool literature to read, that would be great!
Thanks everyone!
Posted 4 years ago