I have studied small-molecule datasets extensively, so I understand how to perform exploratory data analysis (EDA) on a new molecular dataset. I'll try to understand clusters and relative distances between data points, and I'll try to visualize the data points in 2D via some featurization or dimensionality reduction technique (fingerprints, VAEs, UMAP). I'll also use ClassyFire to get some semantic structural labels.
- What do people do for protein datasets? (computationally in Python)
- How do you calculate the distance between two protein sequences?
- How do you identify clusters?
- What are basic ways to embed or get vectorized representations of proteins?
I understand each of these questions has many answers; I'm looking for something "standard", with all its flaws. Colabs, Python snippets, and examples are greatly appreciated.
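To make the three questions concrete, here is a minimal sketch of one alignment-free baseline, not a recommended standard. It assumes sequences are plain one-letter amino-acid strings, uses overlapping k-mer counts as the vector representation, cosine distance as the distance, and single-linkage grouping under a threshold as the clustering. The function names, the `k=2` and `threshold=0.5` defaults, and the toy sequences are all hypothetical choices for illustration; alignment-based identity or learned embeddings would slot into the same three roles.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_vector(seq, k=2):
    """Embed a sequence as counts of its overlapping k-mers (sparse vector)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # sequence shorter than k: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster(seqs, threshold=0.5, k=2):
    """Single-linkage clustering: join any pair closer than `threshold`."""
    vecs = [kmer_vector(s, k) for s in seqs]
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if cosine_distance(vecs[i], vecs[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy sequences, made up for illustration only.
toy = ["MKTAYIAK", "MKTAYIAR", "GGGGGGGG", "GGGGGGGA"]
clusters = cluster(toy)
```

The all-pairs loop is quadratic in the number of sequences, so this only suits small datasets, and k-mer composition ignores residue order beyond length k.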