I have a dependent variable which can be nicely modeled with a linear expression on 8 variables. (n = 578, r = .977, p values of the variables nearly all in the 10^-5 range or better.)
The catch is this: both logic and an error analysis show that the linear weights are not fixed, but are in fact themselves dependent on a master controlling variable. That master variable has a relatively narrow range _in this data set_, which is why we can get such a nice correlation, but we would like to be able to use this model in instances where the master variable is much lower or higher and hence where significant errors may enter.
Currently I have nice _average values_ for the linear weights. What I'd like to do is express each one as a different linear function of the master variable (they may not be linear functions but it will get us a lot closer to the truth!).
Is there any algorithm / software tool that can crack this?
I'm hopeful that there are indeed ways to deal with this because I can think of real-world situations that seem like they might behave like this. For instance, you might have an IQ predictor built as a regression on 4 variables, but the weights of the variables might depend on socioeconomic status. It would seem to be a logical next step in constructing models of reality.
(For those who are curious, what I'm working on is run scoring in baseball! The relative importance of a 2B or HR versus, say, an SB is very much a function of the overall OBP -- as OBP goes up, the differences between various offensive events become smaller; as OBP goes down, events such as the HR become much more important than an SB. You therefore can't use a model with fixed weights to accurately estimate run scoring in a single game.)
I might be missing something here, but why not make it a quadratic polynomial? Regress the response variable on the original predictors, the master variable, and the product of the master variable with each of the original predictors. The model is still linear in its coefficients, so you can still use OLS regression.
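For anyone who wants to try this, here is a minimal sketch of that regression in Python with NumPy, assuming a single master variable. The function names (fit_moderated_ols, weights_at) and the synthetic data are my own illustration, not anything from the original data set; the simulated master variable is only loosely OBP-like.

```python
import numpy as np

def fit_moderated_ols(X, z, y):
    """Fit y ~ 1 + X + z + X*z by ordinary least squares.

    Returns (intercept, base, z_coef, slopes). The effective weight of
    predictor i at master-variable value z0 is base[i] + slopes[i] * z0.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    n, k = X.shape
    # Columns: constant, the k predictors, the master variable,
    # then the k products of each predictor with the master variable.
    design = np.column_stack([np.ones(n), X, z, X * z[:, None]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:k + 1], coef[k + 1], coef[k + 2:]

def weights_at(base, slopes, z0):
    """Linear weights implied by the fit at a given master-variable value."""
    return base + slopes * z0

# Synthetic check (made-up numbers): 8 predictors whose true weights
# vary linearly with the master variable z.
rng = np.random.default_rng(0)
n, k = 578, 8
X = rng.normal(size=(n, k))
z = rng.uniform(0.2, 0.5, size=n)
true_base = rng.normal(size=k)
true_slopes = rng.normal(size=k)
noise = rng.normal(scale=0.01, size=n)
y = 0.5 + X @ true_base + 2.0 * z + (X * z[:, None]) @ true_slopes + noise

intercept, base, z_coef, slopes = fit_moderated_ols(X, z, y)
print(weights_at(base, slopes, 0.25))  # fitted weights at a low z
print(weights_at(base, slopes, 0.45))  # fitted weights at a high z
```

One caution: when the master variable covers only a narrow range, each product column is nearly proportional to its predictor, so the slope estimates can be noisy and extrapolating far outside the observed range is risky. Centering the master variable before forming the products usually helps the numerics.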
In some circles your 'master' variable is called a 'moderator' variable. To model its linear action, add it and its products with the predictors to the model. For instance, if you have 4 predictors (x1,x2,x3,x4) and 2 moderators (z1,z2), your model would be
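Written out under that description, a model with 4 predictors and 2 moderators would presumably take a form like the one below. This is a sketch only: the coefficient names b, c, d and the error term e are assumed notation, not taken from the post.

```latex
y = b_0 + \sum_{i=1}^{4} b_i x_i + \sum_{j=1}^{2} c_j z_j
      + \sum_{i=1}^{4} \sum_{j=1}^{2} d_{ij}\, x_i z_j + e
% Regrouping by predictor shows each weight as a linear function of the moderators:
y = \Bigl( b_0 + \sum_{j=1}^{2} c_j z_j \Bigr)
      + \sum_{i=1}^{4} \Bigl( b_i + \sum_{j=1}^{2} d_{ij} z_j \Bigr) x_i + e
```

In the regrouped form, the weight on x_i is b_i + d_i1 z1 + d_i2 z2, which is the "weights as linear functions of the master variable" structure asked for. That makes 1 + 4 + 2 + 8 = 15 coefficients to estimate, all by ordinary least squares.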
> On Nov 6, 7:42 am, "Eric M. Van" <em...@post.harvard.edu> wrote:
>> I have a dependent variable which can be nicely modeled with a linear expression on 8 variables. (n = 578, r = .977, p values of the variables nearly all in the 10^-5 range or better.)
> In some circles your 'master' variable is called a 'moderator' variable. To model its linear action, add it and its products with the predictors to the model. For instance, if you have 4 predictors (x1,x2,x3,x4) and 2 moderators (z1,z2), your model would be
Ray, this is PRECISELY what I need, and you explained it at exactly the level I understand. Thankfully I just have one moderator but I now know how to deal with multiple ones if I run into that. Thanks muchly.
"Paul" also provided, I think, the same answer (thanks, Paul!), but (as someone whose last formal stats class was in prep school in 1972) it went over my head!
Ray Koopman <koop...@sfu.ca> wrote in news:e9213085-aa6d-47cb-8e64-287097a4c...@m33g2000pri.googlegroups.com: