chiacchi
USA
Posts
Posted - 05/13/2007 : 9:36:50 PM
I have been trying to use LOOP loops instead of FOR loops in my LabTalk scripts, but execution still takes quite a while to complete. The script I am using averages a column of values (262,800 rows in total) in one dataset, which takes about 8 minutes. Is this typical for Origin using LabTalk, or could the reason be that I have only 10 GB left on my hard drive? Any suggestions?
Mike Buess
USA
3037 Posts
Posted - 05/13/2007 : 11:21:31 PM
Disk space should have nothing to do with it unless your physical memory is low and you rely on the page file. Eight minutes seems unreasonably long to average a column of 262,800 rows. On my PC (1.3 GHz / 1.2 GB) the LOOP loop below takes approximately 16 seconds and the SUM command does the same thing in 0.5 seconds (O50 and O75). Note that using a FOR loop instead of the LOOP loop only increases execution time to 22 seconds, so the advantage of LOOP over FOR is not dramatic. The SUM command does much better than either type of loop.
nn=262800; mean=0; loop(ii,1,nn) {mean += col(A)[ii]}; mean /= nn; mean=; // prints MEAN=<average value>
sum(col(A)); sum.mean=; // prints SUM.MEAN=<average value>
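The FOR version would look roughly like this (a sketch of the same calculation, not the exact script I timed):
nn=262800; mean=0; for(ii=1; ii<=nn; ii++) {mean += col(A)[ii]}; mean /= nn; mean=; // prints MEAN=<average value>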
Mike Buess Origin WebRing Member
Edited by - Mike Buess on 05/13/2007 11:46:50 PM
chiacchi
USA
Posts
Posted - 05/14/2007 : 9:18:45 PM
Thanks Mike! My PC is also 1.3 GHz with 1.2 GB, so I'm not sure why the script takes so long to run. At times there is a warning that says something like "low virtual memory." Not sure what's going on. Any clues?
larry_lan
China
Posts
Posted - 05/14/2007 : 10:06:51 PM
What's your script to average a column of data?
Larry OriginLab Technical Services
Mike Buess
USA
3037 Posts
Posted - 05/14/2007 : 11:39:58 PM
Of course we really do need to see your script (I'm surprised you haven't shown it already), but it's possible that your project has collected a lot of "junk" (temporary datasets, windows, etc.). The easiest way to test that is to try your script in a new project. If you need to run in an existing project, search the index of the programming guide for the list command, which tells you how many datasets and variables exist, and the delete command, which removes them.
http://www.originlab.com/forum/topic.asp?TOPIC_ID=4579
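A minimal cleanup check would look something like this (a sketch only; the dataset name is hypothetical and the exact switches vary a little between Origin versions):
list s;        // list all datasets in the project
list v;        // list all numeric variables
del Temp_B;    // delete a leftover dataset by name (hypothetical name)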
Mike Buess Origin WebRing Member
chiacchi
USA
Posts
Posted - 05/15/2007 : 6:33:24 PM
Hi Mike (or Larry),
I used the 'list -s' command and received 256 files. Let me know how I can prevent this in the script itself, if possible, because when I go to 'save' the project it also takes a few minutes! (In the project there should be 17 data files or data sheets that have over 262,000 rows.)
I used the same script in a new project with only one dataset and it took about 2 minutes. I am wondering whether that is more typical or whether it can be made faster. Here is the script I am using to compute the monthly averages for one of the columns in my data sheet. It was originally written by someone else and I modified it slightly:
jj = 2;               // index of the Month column
get %(%H,jj) -e end;  // get number of rows of data in the month column
ncols = wks.ncols;    // number of columns in the worksheet
ii = 8;               // index of the data column
// rename the column (column ii becomes Global)
if (ii==8) { work -n $(ii) Global; }
%L = wks.col$(ii).name$;
month1 = wcol(jj)[1]; // month value of the first row
count = 0; group = 1;
wks.addcol(Month %L); grpcol = wks.ncols; wks.col$(grpcol).width=14;
wks.addcol(Average %L); avecol = wks.ncols; wks.col$(avecol).width=14;
// set bad data as missing data
wcol(ii) = wcol(ii) > 2000 ? 0/0 : wcol(ii);
// Define what gets done at the end of each group
def NewGroup {
   set wcol(ii) -bs $(nn - count);  // first row of the current group
   set wcol(ii) -es $(nn - 1);      // last row of the current group
   sum(wcol(ii));
   cell(group,grpcol) = month1;
   cell(group,avecol) = sum.mean;
   group++;
}
// Just keep a count of members in the group
loop(nn,1,end) {
   month2 = wcol(jj)[nn];   // get next month value
   if (month2 != month1) { NewGroup; month1 = month2; count = 1; }
   else { count += 1; }
}
NewGroup;   // close out the last group
Mike Buess
USA
3037 Posts
Posted - 05/16/2007 : 1:18:31 PM
quote: I used the 'list -s' command and received 256 files. Let me know how I can prevent this in the script itself, if possible, because when I go to 'save' the project it also takes a few minutes! (In the project there should be 17 data files or data sheets that have over 262,000 rows.)
list s returns a list of datasets (not files), and each dataset is probably a column in a worksheet (e.g., Data1_B). If your project contains 256 datasets and 17 worksheets, that would mean each worksheet has approximately 15 columns. If that's correct then you probably don't have extra datasets. I created a worksheet with 256 columns and 262,800 rows; it took 60 seconds to save it as a project, so "a few minutes" might not be much of an exaggeration for your project. (My project file was 640 MB.)
On the other hand, it looks like each of your worksheets starts out with 7 or 8 columns and your script adds two more, so your project should have 9 or 10 columns per worksheet instead of 15. You add the columns with wks.addCol(), so each time you run the script on the same worksheet you get two more columns. This has at least two consequences...
>> Even if the extra columns are empty they contribute to file size. I created a worksheet with 10 columns filled with 262,800 row numbers and saved it as a project of 25 MB. When I added 10 empty columns the file size jumped to 50 MB.
>> You name the new columns illegally. Column names must be a single word, but your script uses commands like wks.addCol(Month Global) to create columns with two-word names. (The fact that you can do that is a bug.) Many LabTalk commands do not work on such columns; see the example below.
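For example (the one-word name here is just an illustration, not taken from your script):
wks.addcol(Month Global);   // creates a column literally named "Month Global" - an illegal two-word name
wks.addcol(MonthGlobal);    // a legal one-word name that other LabTalk commands can work with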
Here are a few suggestions for speeding up your analysis.
1. If each of your worksheets represents a different imported data file, my first suggestion is to process your data one file at a time and save the results in separate projects or as separate worksheet files (OGW).
2. The command work -v colName creates a column named colName only if the column doesn't already exist. Instead of wks.addCol(...) use work -v Month and work -v Average. This will limit the number of empty columns that take up memory (see the sketch after this list).
3. Depending on the structure of your Month column (col 2) you might be able to establish your group indices much more efficiently than with your LOOP loop.
4. If you have Origin 7.0 or 7.5 you'll benefit greatly by rewriting your script in Origin C, whose loops can be 20x faster than LabTalk loops.
5. If Origin has been running continuously for a long time you should restart Origin and possibly reboot.
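A rough sketch of suggestion 2 (the column-index lines assume the two result columns sit at the end of the worksheet; adjust if your sheet is laid out differently):
work -v Month;            // created only if a column named Month doesn't exist yet
work -v Average;
grpcol = wks.ncols - 1;   // assumes the two result columns are the last two in the worksheet
avecol = wks.ncols;
wks.col$(grpcol).width = 14; wks.col$(avecol).width = 14;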
Mike Buess Origin WebRing Member
Edited by - Mike Buess on 05/16/2007 1:58:51 PM