
Working with batch effects

Two years ago, I started working on a project that uses both RNAseq and ATACseq. It's supposed to be a simple Healthy Control (HC) vs disease study. Sample collection was done in 2 phases, and it was clear that there was a batch effect that we could adjust for.

However, we recently received additional metadata. A PCA plot showed another, larger batch effect that we hadn't accounted for: the location of sample donation. There are 4 different locations. The disease samples come from all 4, but ALL HC came from only one of the locations.

I re-ran the count data through DESeq2 with this design formula: ~ phase + location + disease. It didn't fuss about collinearity like it often likes to do and then pooped out a big list of DEGs in the RNAseq.
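
To make the collinearity point concrete, here is a minimal sketch in Python/NumPy (not DESeq2 itself) of why an additive design with phase, location and disease terms stays full rank when HC come from a single location. Everything in it is an assumption for illustration: the sample sheet is made up, the labels `p1`/`p2`, `L1`..`L4`, `HC`/`DIS` are hypothetical, the formula is taken to be `~ phase + location + disease`, and ordinary least squares stands in for DESeq2's negative binomial GLM.

```python
import numpy as np

# Hypothetical, balanced sample sheet (NOT the real data):
# HC exist only at location L1; disease (DIS) samples exist at L1..L4.
samples = [
    ("p1", "L1", "HC"),  ("p2", "L1", "HC"),
    ("p1", "L1", "DIS"), ("p2", "L1", "DIS"),
    ("p1", "L2", "DIS"), ("p2", "L2", "DIS"),
    ("p1", "L3", "DIS"), ("p2", "L3", "DIS"),
    ("p1", "L4", "DIS"), ("p2", "L4", "DIS"),
]

def design_matrix(rows):
    """Treatment-coded matrix for an additive phase + location + disease model."""
    return np.array([
        [
            1.0,                     # intercept: reference levels p1, L1, HC
            float(phase == "p2"),    # phase effect
            float(loc == "L2"),      # location effects relative to L1
            float(loc == "L3"),
            float(loc == "L4"),
            float(status == "DIS"),  # disease effect (coefficient of interest)
        ]
        for phase, loc, status in rows
    ])

# Case 1: disease samples also exist at the HC location -> estimable.
X = design_matrix(samples)
rank_partial = np.linalg.matrix_rank(X)

# Case 2: drop the disease samples at L1, so disease is fully confounded
# with location (DIS column == L2 + L3 + L4) -> rank deficient.
confounded = [s for s in samples if not (s[1] == "L1" and s[2] == "DIS")]
rank_full_confound = np.linalg.matrix_rank(design_matrix(confounded))

# Least-squares weights each sample gets in the disease coefficient:
# the last row of the pseudo-inverse of X.
disease_weights = np.linalg.pinv(X)[5]

print("columns:", X.shape[1])
print("rank, partial confounding:", rank_partial)
print("rank, full confounding:", rank_full_confound)
print("disease weights per sample:", np.round(disease_weights, 3))
```

In this balanced toy layout, the weights on the samples from the three disease-only locations come out as zero, so the HC vs disease contrast rests entirely on the one location that has both groups. With an unbalanced layout the other locations can contribute indirectly through the phase estimate, and in DESeq2 they also feed into dispersion estimation, so treat this as a linear-model analogy rather than a description of what DESeq2 computes.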

I could probably run the DEG through GSEA to see if the results match the disease's previously known signatures, but what statistical worries should I have about this design matrix? What justification would I need for this statistical asymmetry? Thanks.

Msc | Academia

Posted
3 years ago