Two years ago, I started working on a project that uses both RNAseq and ATACseq. It's supposed to be a simple Healthy Control (HC) vs disease study. The sample collection was done in 2 phases, and it was clear that there was a batch effect that we could adjust for.
However, we recently received additional metadata. A PCA plot showed that there was another, larger batch effect that we hadn't accounted for: the location of sample donation. There are 4 different locations. The disease samples come from any of the 4, but ALL HC came from only one of the locations.
I re-ran the count data through DESeq2 with this design formula: ~ phase + location + disease. It didn't fuss about collinearity like it often likes to do and then pooped out a big list of DEGs in the RNAseq.
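For anyone wondering why there was no collinearity complaint, here is a minimal sketch in Python/NumPy (not DESeq2 itself) of the rank check on an additive design like this one. The sample sheet, the location labels A-D, and the sample counts are all made up for illustration; the only thing taken from my data is the layout, with HC in a single location and disease in all four.

```python
import numpy as np

# Hypothetical sample sheet: (phase, location, status). Counts are invented;
# only the layout (HC in location A only, disease in A-D) mirrors the study.
samples = [
    ("p1", "A", "HC"), ("p2", "A", "HC"), ("p1", "A", "HC"),
    ("p1", "A", "disease"), ("p2", "A", "disease"),
    ("p1", "B", "disease"), ("p2", "B", "disease"),
    ("p1", "C", "disease"), ("p2", "C", "disease"),
    ("p1", "D", "disease"), ("p2", "D", "disease"),
]

def design_matrix(rows):
    """Treatment-coded matrix for an additive phase + location + disease model.

    Columns: intercept, phase p2, location B, location C, location D, disease.
    """
    return np.array([
        [
            1.0,
            float(phase == "p2"),
            float(loc == "B"), float(loc == "C"), float(loc == "D"),
            float(status == "disease"),
        ]
        for phase, loc, status in rows
    ])

# Design as collected: disease samples exist in the HC location too.
X = design_matrix(samples)
print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))

# Counterfactual: remove the disease samples from location A. Now the disease
# column equals locB + locC + locD, so the matrix loses a rank.
no_overlap = [s for s in samples if not (s[1] == "A" and s[2] == "disease")]
X_no_overlap = design_matrix(no_overlap)
print("columns:", X_no_overlap.shape[1],
      "rank:", np.linalg.matrix_rank(X_no_overlap))
```

The matrix is full rank only because some disease samples share a location with the HC samples, which is presumably why nothing errored out. In this additive model the disease coefficient is identified by the HC vs disease contrast inside that one shared location; the samples from the other three locations help estimate the phase effect and the dispersions, but each of those locations has its own coefficient soaking up its mean.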
I could probably run the DEG through GSEA to see if the results match the disease's previously known signatures, but what statistical worries should I have about this design matrix? What justification would I need for this statistical asymmetry? Thanks.
Posted 3 years ago.