chiacchi
USA
Posts
Posted - 05/13/2007 : 9:36:50 PM
I have been trying to use LOOP loops instead of FOR loops in my LabTalk scripts, but execution still takes quite a while to complete. The script I am using averages a column of values (262,800 rows in total) in one dataset, which takes about 8 minutes. Is this typical for Origin using LabTalk, or could the reason be that I have only 10 GB left on my hard drive? Any suggestions?
Mike Buess
USA
3037 Posts
Posted - 05/13/2007 : 11:21:31 PM
Disk space should have nothing to do with it unless your physical memory is low and you rely on the page file. Eight minutes seems unreasonably long to average a column of 262,800 rows. On my PC (1.3 GHz / 1.2 GB) the LOOP loop below takes approximately 16 seconds and the SUM command does the same thing in 0.5 seconds (O50 and O75). Note that using a FOR loop instead of the LOOP loop only increases execution time to 22 seconds, so the advantage of LOOP over FOR is not dramatic. The SUM command does much better than either type of loop.
nn=262800; mean=0; loop(ii,1,nn) {mean += col(A)[ii]}; mean /= nn; mean=; // prints MEAN=<average value>
sum(col(A)); sum.mean=; // prints SUM.MEAN=<average value>
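The FOR version would look roughly like this (a sketch of the same calculation, not the exact script I timed):
nn=262800; mean=0; for(ii=1; ii<=nn; ii++) {mean += col(A)[ii]}; mean /= nn; mean=; // prints MEAN=<average value>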
Mike Buess Origin WebRing Member
Edited by - Mike Buess on 05/13/2007 11:46:50 PM
chiacchi
USA
Posts
Posted - 05/14/2007 : 9:18:45 PM
Thanks Mike! My PC is also 1.3 GHz with 1.2 GB, so I'm not sure why the script takes so long to run. At times there is a warning that says something like "low virtual memory." Not sure what's going on. Any clues?
larry_lan
China
Posts
Posted - 05/14/2007 : 10:06:51 PM
What's your script to average a column of data?
Larry OriginLab Technical Services
Mike Buess
USA
3037 Posts
Posted - 05/14/2007 : 11:39:58 PM
Of course we really do need to see your script (I'm surprised you haven't shown it already), but it's possible that your project has collected a lot of "junk" (temporary datasets, windows, etc.). The easiest way to test that is to try your script in a new project. If you need to run in an existing project, search the index of the programming guide for the list command, which tells you how many datasets and variables exist, and the delete command, which removes them.
http://www.originlab.com/forum/topic.asp?TOPIC_ID=4579
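A minimal cleanup check would look something like this (a sketch only; the dataset name is hypothetical and the exact switches vary a little between Origin versions):
list s;        // list all datasets in the project
list v;        // list all numeric variables
del Temp_B;    // delete a leftover dataset by name (hypothetical name)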
Mike Buess Origin WebRing Member
chiacchi
USA
Posts
Posted - 05/15/2007 : 6:33:24 PM
Hi Mike (or Larry),
I used the 'list -s' command and received 256 files. Let me know how I can prevent this in the script itself, if possible, because when I go to 'save' the project it also takes a few minutes! (In the project there should be 17 data files or data sheets that have over 262,000 rows.)
I used the same script in a new project with only one dataset and it took about 2 minutes. I am wondering whether that is more typical or whether it can be made faster. Here is the script I am using to compute the monthly averages for one of the columns in my data sheet. It was originally written by someone else and I modified it slightly:
jj = 2;               // index of the Month column
get %(%H,jj) -e end;  // get number of rows of data in the month column
ncols = wks.ncols;    // number of columns in the worksheet
ii = 8;               // index of the data column
// rename the column (column ii becomes Global)
if (ii==8) { work -n $(ii) Global; }
%L = wks.col$(ii).name$;
month1 = wcol(jj)[1]; // month value of the first row
count = 0; group = 1;
wks.addcol(Month %L); grpcol = wks.ncols; wks.col$(grpcol).width=14;
wks.addcol(Average %L); avecol = wks.ncols; wks.col$(avecol).width=14;
// set bad data as missing data
wcol(ii) = wcol(ii) > 2000 ? 0/0 : wcol(ii);
// Define what gets done at the end of each group
def NewGroup {
   set wcol(ii) -bs $(nn - count);  // first row of the current group
   set wcol(ii) -es $(nn - 1);      // last row of the current group
   sum(wcol(ii));
   cell(group,grpcol) = month1;
   cell(group,avecol) = sum.mean;
   group++;
}
// Just keep a count of members in the group
loop(nn,1,end) {
   month2 = wcol(jj)[nn];   // get next month value
   if (month2 != month1) { NewGroup; month1 = month2; count = 1; }
   else { count += 1; }
}
NewGroup;   // close out the last group
Mike Buess
USA
3037 Posts
Posted - 05/16/2007 : 1:18:31 PM
quote: I used the 'list -s' command and received 256 files. Let me know how I can prevent this in the script itself, if possible, because when I go to 'save' the project it also takes a few minutes! (In the project there should be 17 data files or data sheets that have over 262,000 rows.)
list s returns a list of datasets (not files), and each dataset is probably a column in a worksheet (e.g., Data1_B). If your project contains 256 datasets and 17 worksheets, that would mean each worksheet has approximately 15 columns. If that's correct then you probably don't have extra datasets. I created a worksheet with 256 columns and 262,800 rows; it took 60 seconds to save it as a project, so "a few minutes" might not be much of an exaggeration for your project. (My project file was 640 MB.)
On the other hand, it looks like each of your worksheets starts out with 7 or 8 columns and your script adds two more, so your project should have 9 or 10 columns per worksheet instead of 15. You add the columns with wks.addCol(), so each time you run the script on the same worksheet you get two more columns. This has at least two consequences...
>> Even if the extra columns are empty they contribute to file size. I created a worksheet with 10 columns filled with 262,800 row numbers and saved it as a project of 25 MB. When I added 10 empty columns the file size jumped to 50 MB.
>> You name the new columns illegally. Column names must be a single word, but your script uses commands like wks.addCol(Month Global) to create columns with two-word names. (The fact that you can do that is a bug.) Many LabTalk commands do not work on such columns; see the example below.
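For example (the one-word name here is just an illustration, not taken from your script):
wks.addcol(Month Global);   // creates a column literally named "Month Global" - an illegal two-word name
wks.addcol(MonthGlobal);    // a legal one-word name that other LabTalk commands can work with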
Here are a few suggestions for speeding up your analysis.
1. If each of your worksheets represents a different imported data file, my first suggestion is to process your data one file at a time and save the results in separate projects or as separate worksheet files (OGW).
2. The command work -v colName creates a column named colName only if the column doesn't already exist. Instead of wks.addCol(...) use work -v Month and work -v Average. This will limit the number of empty columns that take up memory (see the sketch after this list).
3. Depending on the structure of your Month column (col 2) you might be able to establish your group indices much more efficiently than with your LOOP loop.
4. If you have Origin 7.0 or 7.5 you'll benefit greatly by rewriting your script in Origin C, whose loops can be 20x faster than LabTalk loops.
5. If Origin has been running continuously for a long time you should restart Origin and possibly reboot.
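A rough sketch of suggestion 2 (the column-index lines assume the two result columns sit at the end of the worksheet; adjust if your sheet is laid out differently):
work -v Month;            // created only if a column named Month doesn't exist yet
work -v Average;
grpcol = wks.ncols - 1;   // assumes the two result columns are the last two in the worksheet
avecol = wks.ncols;
wks.col$(grpcol).width = 14; wks.col$(avecol).width = 14;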
Mike Buess Origin WebRing Member
Edited by - Mike Buess on 05/16/2007 1:58:51 PM