I have been having a lot of trouble over the last few days trying to do this, and I am super frustrated. I want to take the Majority.Protein.IDs (or even the Protein.IDs) column and match the IDs, but out of a sample of 10k rows/IDs I only get 15 matches, which just can't be right.
I want to do some analysis of which protein groups might be present in the sample I am working with, plus some pathway/clustering analysis.
I am using R to do this.
Here are some examples of the IDs from the MaxQuant output file proteinGroups.txt:
CON__ENSEMBL:ENSBTAP00000001528
CON__ENSEMBL:ENSBTAP00000024146.6-2
CON__ENSEMBL:ENSBTAP00000025008
CON__ENSEMBL:ENSBTAP00000031900
CON__PO274-1
etc.
Any and all guidance would be greatly appreciated.
So I actually removed the CON contaminants and put everything into a flattened array, so the cleaned IDs look like this:
[1] "ENST00000269576.5_1;P13645"
[1] "ENST00000252242.6_1;P13647;ENST00000252245.5_1;95678"
[1] "ENST00000246662.6_1;P35527;ENST00000588431.1_1"
I can post my code as well so you can see where I am going wrong if that helps.
EDIT:
Solved thanks to guidance from u/No-Significance8288
- The problem was that the protein IDs needed cleaning: removing the contaminants and prefixes, as well as empty and NA values.
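For anyone landing here later, this is a rough sketch of that kind of cleaning in base R, not the exact code that was used. It assumes the IDs within a protein group are separated by ";", that contaminant and reverse-decoy entries carry the "CON__" and "REV__" prefixes, and that whole entries with those prefixes should be dropped. The function name `clean_protein_ids` is made up for illustration, and the UniProt pattern is a simplified one that only covers 6-character accessions.

```r
# Hypothetical helper (not from the original post): flatten and clean a
# column of MaxQuant protein-group IDs such as Majority.Protein.IDs.
clean_protein_ids <- function(ids) {
  ids <- as.character(ids)
  # Drop missing and empty rows before splitting
  ids <- ids[!is.na(ids) & nzchar(trimws(ids))]
  # Split each ";"-separated group into individual IDs
  parts <- trimws(unlist(strsplit(ids, ";", fixed = TRUE)))
  # Assumption: entries prefixed with CON__ or REV__ are discarded entirely
  parts <- parts[!grepl("^(CON|REV)__", parts)]
  # Remove empty strings and literal "NA" left over after splitting
  parts <- parts[nzchar(parts) & parts != "NA"]
  unique(parts)
}

raw <- c(
  "ENST00000269576.5_1;P13645",
  "CON__ENSEMBL:ENSBTAP00000001528",
  NA,
  "",
  "ENST00000246662.6_1;P35527;ENST00000588431.1_1"
)

cleaned <- clean_protein_ids(raw)

# Simplified pattern for 6-character UniProt accessions (an assumption;
# it does not cover 10-character accessions or isoform suffixes)
uniprot <- grep("^[A-Z][0-9][A-Z0-9]{3}[0-9]$", cleaned, value = TRUE)
print(uniprot)
```

After cleaning, it probably helps to match one ID type at a time (for example only the UniProt accessions), since the cleaned strings mix Ensembl transcript IDs and UniProt accessions.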