how to speed up multiple regressions?
Hi,
I have some code to construct a composite of a
meteorological phenomenon in three dimensions (x, y, lag).
The compositing index is a time series (ts) of a certain
variable, and the data being composited (x, y, time) is
regressed onto this compositing index. Because of the
length of the time series and the size of the data array,
and the fact that I do this compositing for multiple fields,
I'm looking for ways to speed up the process, which is
currently quite time consuming. The greatest amount of time
seems to be spent in computing the significance of the
correlation, rather than in computing the regressions. The
regression is only done for periods where the signal in the
"ts" time series is "big" (i.e., big = WHERE(ts GE
threshold)).
Here are the main chunks of code used:
1) to do the regressions:
ts_ac = A_CORRELATE(ts,lags) ; auto-corr of index time series
dataf = fltarr(dim(1),dim(2),2*lagdays+1,2) ; regression a,b coefficients
datar = fltarr(dim(1),dim(2),2*lagdays+1) ; corr. during big var periods
data_ac = fltarr(dim(1),dim(2),2*lagdays+1) ; data auto_correlation
data_tau = fltarr(dim(1),dim(2),2*lagdays+1) ; decorrelation time scale
                                             ; Livezey & Chen, MWR '83
print, 'computing regression coefficients...'
FOR j = 0,dim(2)-1 DO BEGIN
  FOR i = 0,dim(1)-1 DO BEGIN
    temp = A_CORRELATE(data(i,j,*),lags)
    data_ac(i,j,*) = temp
    FOR lag = 0,2*lagdays DO BEGIN ; first = -29, last = 29
      dataf(i,j,lag,*) = $
        LINFIT(ts(big),data(i,j,big+lag-lagdays))
      datar(i,j,lag) = $
        CORRELATE(ts(big),data(i,j,big+lag-lagdays))
      data_tau(i,j,lag) = $
        (1.+2.*TOTAL(ts_ac(0:lag)*data_ac(i,j,0:lag))) > 1.
    ENDFOR
  ENDFOR
ENDFOR
2) to compute the significance of the correlation:
; compute the number of degrees of freedom
datadof = $
  (fltarr(dim(1),dim(2),2*lagdays+1)+big_count)/data_tau
; find where correlation is significant at 95% level (Student's t)
data_sig = intarr(dim(1),dim(2),2*lagdays+1)
data_t = fltarr(dim(1),dim(2),2*lagdays+1)
tsval = 2.*SQRT(mean_var)
datacomp = fltarr(dim(1),dim(2),2*lagdays+1)
FOR lag = 0,2*lagdays DO BEGIN
  FOR j = 0,dim(2)-1 DO BEGIN
    FOR i = 0,dim(1)-1 DO BEGIN
      ; t statistic: r*SQRT(dof)/SQRT(1 - r^2)
      data_t(i,j,lag) = $
        ((ABS(datar(i,j,lag))*SQRT(datadof(i,j,lag)))/$
        SQRT(1.-datar(i,j,lag)^2))
      data_sig(i,j,lag) = $
        ((datar(i,j,lag)*SQRT(datadof(i,j,lag)))/$
        SQRT(1.-datar(i,j,lag)^2)) GT $
        T_CVF(.1,datadof(i,j,lag))
      datacomp(i,j,lag) = dataf(i,j,lag,0) + $
        dataf(i,j,lag,1)*tsval
    ENDFOR
  ENDFOR
ENDFOR
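One possible starting point, offered as an untested sketch rather
than a known fix: the t statistic and the composite involve only
element-wise arithmetic, so they could be written as whole-array
expressions, leaving a single flat loop for the T_CVF call. The
sketch assumes datar, datadof, dataf and tsval are defined as above,
that the intended statistic is r*SQRT(dof)/SQRT(1 - r^2), and that
T_CVF has to be called once per element. The name tstat is
introduced here only for illustration.

```idl
; Whole-array form of the significance step (untested sketch).
; Assumes datar, datadof, dataf, tsval exist as defined above.
; Assumption: statistic is r*SQRT(dof)/SQRT(1 - r^2).
tstat    = datar*SQRT(datadof)/SQRT(1. - datar^2) ; signed statistic
data_t   = ABS(tstat)
datacomp = dataf(*,*,*,0) + dataf(*,*,*,1)*tsval

; T_CVF is assumed here to need a scalar per call, so one flat
; loop over all elements remains. As in the triple loop above,
; the signed statistic is compared against the cutoff.
data_sig = intarr(dim(1),dim(2),2*lagdays+1)
FOR k = 0L, N_ELEMENTS(tstat)-1 DO $
  data_sig(k) = tstat(k) GT T_CVF(.1, datadof(k))
```

If T_CVF turns out to dominate the run time, the remaining loop is
the part to look at; the array expressions only remove the
per-element arithmetic and indexing overhead.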
Any suggestions will be greatly appreciated. This code was
written nearly 2 years ago, so perhaps more recent versions
of IDL handle this better?
Many thanks,
Charlotte