So, just to give a sense of my background: I work in theory and method development in statistical genetics, but I don't have a ton of formal training in math or statistics, apart from a few classes in linear algebra, diff-eq, probability theory, stochastic processes, etc., plus about 2 years on the job at this point. So I'm somewhat conversant in a lot of basic statistical and mathematical areas, but there are likely a lot of standard tricks that I'm not familiar with.
Briefly, my situation is this: I have data in the form of a single vector "Z", which, under the null hypothesis, is expected to be a draw from a multivariate normal distribution with a mean of zero and a variance-covariance matrix which I'll call F. The alternative hypothesis that I'm interested in is that there is more variance among the elements of Z than we would expect if it were truly a draw from N(0, F). Essentially, I know that F underlies the distribution of Z, but there may be another process operating, and if so, I need to know what effect that process has had on Z.
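One way to write the two hypotheses down, assuming the extra process adds an independent, zero-mean normal component to Z with some unknown covariance G (G is a hypothetical symbol, not part of the setup above), and with k the length of Z:

```latex
H_0:\; Z \sim \mathcal{N}_k(\mathbf{0},\, F)
\qquad\text{vs.}\qquad
H_1:\; Z \sim \mathcal{N}_k(\mathbf{0},\, F + G),\quad G \succeq 0,\; G \neq 0
```

Under this framing, "where the signal is coming from" becomes a question about the structure of G, since the sum of independent draws from N(0, F) and N(0, G) is a draw from N(0, F + G).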
I can test the null hypothesis simply by recognizing that $Z^T F^{-1} Z$ is distributed as a chi-squared random variable under the null (with k degrees of freedom, where k is the length of Z, assuming F is full rank), so if the value of this test statistic is way out in the right tail, then I've got something that looks more like my alternative hypothesis than the null, which is great. What I'd really like to do on top of that, however, is to essentially ask the data where this signal is coming from. For example, some of the signal could plausibly come from two elements of Z that are expected to be very tightly correlated under the null but are actually not very close. Alternatively, one could imagine two clusters of elements where the difference between the two clusters is much larger than expected given the covariance matrix.
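A minimal sketch of the chi-squared test described above, in Python with NumPy/SciPy (the tooling and the function names `quadratic_form_test` and `eigen_contributions` are my own hypothetical choices), assuming F is symmetric positive definite. It also splits the statistic along the eigenvectors of F: writing F = U diag(λ) U^T, the statistic equals the sum over i of (u_i^T Z)^2 / λ_i, and under the null each term is an independent chi-squared(1) variable. This is one standard decomposition, not necessarily the best way to localize the signal, but an unusually large term points to a direction in which Z varies more than F predicts.

```python
import numpy as np
from scipy import linalg, stats


def quadratic_form_test(z, F):
    """Test H0: z ~ N(0, F) via z^T F^{-1} z ~ chi2(k); F assumed positive definite."""
    z = np.asarray(z, dtype=float)
    # Cholesky factor F = L L^T; the whitened vector w = L^{-1} z is N(0, I) under H0
    L = linalg.cholesky(F, lower=True)
    w = linalg.solve_triangular(L, z, lower=True)
    stat = float(w @ w)  # equals z^T F^{-1} z without forming F^{-1}
    pval = float(stats.chi2.sf(stat, df=z.size))
    return stat, pval


def eigen_contributions(z, F):
    """Split z^T F^{-1} z into one term per eigenvector of F.

    With F = U diag(lam) U^T, the projections u_i^T z / sqrt(lam_i) are
    independent N(0, 1) under H0, so each returned term is chi2(1).
    """
    lam, U = np.linalg.eigh(F)  # eigenvalues ascending, eigenvectors in columns
    proj = U.T @ np.asarray(z, dtype=float)
    return lam, U, proj**2 / lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 5
    A = rng.standard_normal((k, k))
    F = A @ A.T + k * np.eye(k)  # made-up positive-definite covariance for illustration
    z = rng.multivariate_normal(np.zeros(k), F)
    stat, p = quadratic_form_test(z, F)
    lam, U, contrib = eigen_contributions(z, F)
    print(f"stat={stat:.3f}, p={p:.3g}")
    # The per-axis terms sum back to the full statistic
    print("sum of contributions:", round(float(contrib.sum()), 3))
    print("largest contribution along eigenvector", int(np.argmax(contrib)))
```

Note that picking out the largest of k chi-squared(1) terms is itself a multiple comparison; a Bonferroni-style threshold (testing each term at level alpha/k) is one simple way to account for that.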
One trick I know I can play is to use conditional multivariate normal distributions to ask whether certain elements or groups of elements look unusual, given the values observed for another set of elements. This is nifty and quite handy, and I can definitely learn something about the data from it, but it seems like there should be some simple linear algebra trick that just tells me which axes in the data are most interesting, without my having to specify and test a bunch of different hypotheses with the conditional MVN approach.
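For reference, the conditional MVN check can be sketched as follows (Python with NumPy/SciPy; the function name and index arguments are hypothetical). It uses the standard result that if Z ~ N(0, F) is partitioned into blocks A and B, then Z_A given Z_B is normal with mean F_AB F_BB^{-1} Z_B and covariance F_AA - F_AB F_BB^{-1} F_BA. The standardized residual of Z_A then gives a chi-squared test with |A| degrees of freedom, assuming F is positive definite.

```python
import numpy as np
from scipy import linalg, stats


def conditional_mvn_test(z, F, a_idx, b_idx):
    """Test whether z[a_idx] looks unusual given z[b_idx], under z ~ N(0, F)."""
    z = np.asarray(z, dtype=float)
    a_idx = np.asarray(a_idx)
    b_idx = np.asarray(b_idx)
    F_AA = F[np.ix_(a_idx, a_idx)]
    F_AB = F[np.ix_(a_idx, b_idx)]
    F_BB = F[np.ix_(b_idx, b_idx)]
    # K = F_AB F_BB^{-1}, computed by solving against F_BB instead of inverting it
    K = linalg.solve(F_BB, F_AB.T, assume_a="pos").T
    mu = K @ z[b_idx]          # conditional mean of z_A
    S = F_AA - K @ F_AB.T      # conditional covariance of z_A (F_AB.T == F_BA)
    r = z[a_idx] - mu          # residual of z_A around its conditional mean
    stat = float(r @ linalg.solve(S, r, assume_a="pos"))
    pval = float(stats.chi2.sf(stat, df=a_idx.size))
    return mu, S, stat, pval


if __name__ == "__main__":
    # Two elements expected to be tightly correlated but observed far apart
    F = np.array([[1.0, 0.9], [0.9, 1.0]])
    z = np.array([2.0, -2.0])
    mu, S, stat, p = conditional_mvn_test(z, F, [0], [1])
    print(mu, S, stat, p)
```

In this toy case the conditional mean of the first element is 0.9 × (-2) = -1.8 with conditional variance 1 - 0.81 = 0.19, so the observed value 2.0 sits about 8.7 conditional standard deviations away, even though neither element is extreme on its own.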
Any suggestions?
Posted 11 years ago on reddit.com/r/AskStatisti...