ten0k
France
8 Posts
Posted - 08/25/2010 : 05:10:50 AM
Origin Ver. and Service Release (Select Help-->About Origin): OriginPro 8.0.63.988 SR6; Operating System: Windows XP
Hi,
I am not sure whether this issue has already been reported, since the forum "search" function does not seem to work ...
I developed a small piece of Origin C code dedicated to importing a large amount of data from files in my own format. The data consists of columns of integers, doubles, or strings.
I tried to load a dataset of about 200 MB (as stored in my own format) and came across an error reported by Origin (via a popup): "Can not allocate any more memory for Book2_J".
It is obviously memory related. What is the memory allocation limit? How can Origin C code detect this kind of error?
Thanks
TreeNode
64 Posts
Posted - 08/25/2010 : 09:39:21 AM
Hi ten0k,
Please provide more details about your issue, ideally your code, because I can't reproduce the problem.
I tried importing a big file (size: 250 MB) called BIGdata.dat with the import function void Datasheet_ImportASCII_Ex2().
You can find it here: http://ocwiki.originlab.com/index.php?title=OriginC:Import_ASCII My file BIGdata.dat holds 7,810,735 rows with 8 columns of three-digit integers. Importing this data takes time, but works fine using the function above.
TreeNode
ten0k
France
8 Posts
Posted - 08/26/2010 : 11:38:16 AM
Hi TreeNode,
Thanks for your answer. I did not specifically run your test because I did not have access to a big enough text file. But I ran some basic tests like:
void test()
{
Worksheet ws;
ws.Create();
int i = 0;
while (1)
{
ws.AddCol();
Dataset ds(ws, i);
printf("%d\n", i);
ds.SetSize(1000000);
i++;
}
}
... and it gives me an error after about 160 iterations. That sounds logical to me, because that much data is pretty huge for a 32-bit process.
Anyway, I can't find out why my import code produces the very same error even when using a small subset (about 40 MB) of the entire dataset. Here is the code I use:
void import(string arcName, vector<int>& vars)
{
if (!VSArchive_open(arcName.GetBuffer(arcName.GetLength())))
return;
int nbVar = vars.GetSize();
out_int("nbVar = ", nbVar);
progressBox progress("VSArchive import", PBOX_TOPMOST);
time_t beginTime, endTime;
time(&beginTime);
// Array<Worksheet&> wsList;
// wsList.SetAsOwner(true);
Worksheet ws;// = new Worksheet();
ws.Create();
double globalT0 = -1;
int i;
for (int id = 0; id < nbVar; id++)
{
i = vars[id];
//ws.DeleteCol(1);
ws.Columns(COLUMN_T0).SetExtendedLabel("T0", RCLT_LONG_NAME);
ws.Columns(COLUMN_T0).SetExtendedLabel("Seconds", RCLT_UNIT);
ws.Columns(COLUMN_T0).SetType(OKDATAOBJ_DESIGNATION_L);
ws.Columns(COLUMN_T0).SetWidth(15);
ws.Columns(COLUMN_T0).SetDigitMode(DIGITS_SIGNIFICANT);
ws.Columns(COLUMN_T0).SetDigits(10);
//wsList.Add(ws);
string name = varDefs[i].name;
ws.SetName(name);
ws.AddCol();
ws.AddCol();
ws.Columns(COLUMN_TIMESTAMP + 2 * id).SetExtendedLabel("Timestamp from T0", RCLT_LONG_NAME);
ws.Columns(COLUMN_TIMESTAMP + 2 * id).SetExtendedLabel("Seconds", RCLT_UNIT);
ws.Columns(COLUMN_TIMESTAMP + 2 * id).SetType(OKDATAOBJ_DESIGNATION_X);
ws.Columns(COLUMN_TIMESTAMP + 2 * id).SetDigitMode(DIGITS_SIGNIFICANT);
ws.Columns(COLUMN_TIMESTAMP + 2 * id).SetDigits(10);
ws.Columns(COLUMN_DATA + 2 * id).SetExtendedLabel(name, RCLT_LONG_NAME);
ws.Columns(COLUMN_DATA + 2 * id).SetDigitMode(DIGITS_SIGNIFICANT);
ws.Columns(COLUMN_DATA + 2 * id).SetDigits(10);
string msg;
msg.Format("Loading variable %s (%d/%d) ... ", name, id+1, nbVar);
progress.SetText(msg + " [reading from disk]", PBOXT_MIDCENTER);
progress.Set(0);
if (progress.IsAbort())
break;
long idx1, idxN;
double ts1, tsN;
VSArchive_getSamplesInfo(i, &idx1, &idxN, &ts1, &tsN);
long nbS = (idxN - idx1) + 1;
// progressive request sample, in order to update the progress bar
// when it is loading
int ns = 1000;
int s = 0;
for (s = 0; s < nbS; s += ns)
{
VSArchive_requestSamples(i, idx1 + s, idx1 + s + ns);
int p = (int)((s + .0) / nbS * 100.0);
progress.Set(p);
if (progress.IsAbort())
break;
}
short * sizes = (short *)malloc(sizeof(short) * nbS);
short maxSize;
long totalSize;
VSArchive_getRequestedSampleSizes(sizes, &maxSize, &totalSize);
printf("totalSize = %ld, maxSize = %d, nbS = %ld\n", totalSize, maxSize, nbS);
printf("frequency : %f\n", varDefs[i].frequency);
char *samples = (char*)malloc(totalSize);
double *ts = (double*)malloc(nbS * sizeof(double));
VSArchive_getRequestedSamples(samples, ts);
progress.SetText(msg + " [copying to worksheet]", PBOXT_MIDCENTER);
progress.Set(0);
int nbCol;
//------------------ data --------------------
switch (varDefs[i].type)
{
case VSARCHIVE_TYPE_DOUBLE:
case VSARCHIVE_TYPE_DOUBLE_VECTOR:
{
if (varDefs[i].frequency > 0)
{
Dataset ds;
ds.Attach(ws, COLUMN_DATA + 2 * id);
/* double vectors */
nbCol = maxSize / sizeof(double); // compute nbCol before using it in SetSize
ds.SetSize(nbCol * nbS);
vector<double> vec(nbCol * nbS);
size_t len = nbCol * nbS * sizeof(double);
//memcpy(vec, samples, len);
//ds = vec;
ds.SetSubVector(vec, 0);
}
else
{
MatrixPage mp;
mp.Create("M"+name);
MatrixLayer ml = mp.Layers(0);
Matrix m;
m.Attach(ml);
// Perf issue ?
nbCol = maxSize / sizeof(double);
m.SetSize(nbS, nbCol);
size_t rowLen = nbS * nbCol * sizeof(double);
memcpy(&m, samples, rowLen);
ws.EmbedMatrix(0, COLUMN_DATA + 2 * id, mp);
}
break;
}
case VSARCHIVE_TYPE_INTEGER:
case VSARCHIVE_TYPE_INTEGER_VECTOR:
{
Dataset ds;
ds.Attach(ws, COLUMN_DATA + 2 * id);
/* integer vectors */
nbCol = maxSize / sizeof(long); // compute nbCol before using it in SetSize
ds.SetSize(nbCol * nbS);
vector<long> vec(nbCol * nbS);
printf("nbCol = %d, nbS = %d\n", nbCol, nbS);
size_t len = nbCol * nbS * sizeof(long);
memcpy(vec, samples, len);
ds.SetSubVector(vec, 0);
break;
}
case VSARCHIVE_TYPE_OPAQUE:
{
Dataset ds;
ds.Attach(ws,COLUMN_DATA + 2 * id);
// FIXME
StringArray arr;
size_t k = 0;
for (int i = 0; i < nbS; i++)
{
short s = sizes[i];
string str(samples + k, s);
k += s;
arr.Add(str);
}
ds.PutStringArray(arr);
}
}
progress.SetText(msg + " [adding timestamp]", PBOXT_MIDCENTER);
progress.Set(0);
if (progress.IsAbort())
break;
// ---------------- timestamps ------------------
Dataset tsds;
tsds.Attach(ws, COLUMN_TIMESTAMP + 2 * id);
vector<double> tsV(nbS);
memcpy(tsV, ts, nbS * sizeof(double));
// substract t0 from all timestamps
double t0 = tsV[0];
if (globalT0 == -1)
globalT0 = t0;
if (t0 < globalT0)
globalT0 = t0;
if ((nbCol == 1) || (varDefs[i].frequency == 0))
{
tsds.SetSubVector(tsV, 0);
}
else
{
// We have to build a nbS * nbCol vector of timestamps
// like :
// t0
// t0 + 1 / f
// t0 + 2 / f
// ...
// t1
// t1 + 1 / f
// ...
// in order to do that, and because Origin does not like loops,
// we use matrix operations instead
//vector<double> delta(nbCol * nbS);
printf("Allocate %d elements\n", nbCol * nbS);
tsds.SetSize(nbCol * nbS);
vector<double> delta1(nbCol);
int j;
for (j = 0; j < nbCol; j++)
{
delta1[j] = (j + .0) / varDefs[i].frequency;
}
for (j = 0; j < nbS; j++)
{
double ts = tsV[j];
tsds.SetSubVector(ts + delta1, j * nbCol);
if ((j % ns) == 0)
{
int p = (int)(j / (nbS + .0) * 100.0);
progress.Set(p);
}
}
//tsds = delta;
}
progress.Set(100);
if (progress.IsAbort())
break;
ws.Columns(2 * id + COLUMN_TIMESTAMP).SetWidth(15);
ws.Columns(2 * id + COLUMN_DATA).SetWidth(15);
// ----------------------------------------------
free(samples);
free(ts);
free(sizes);
}
progress.SetText("Final timestamp computation ...", PBOXT_MIDCENTER);
//for (int wsi = 0; wsi < wsList.GetSize(); wsi++)
{
//Worksheet& ws = wsList[wsi];
// timestamp normalisation
ws.SetCell(0, COLUMN_T0, globalT0);
for (i = 0; i < nbVar; i++)
{
Dataset ds;
ds.Attach(ws, COLUMN_TIMESTAMP + 2 * i);
ds = ds - globalT0;
}
}
out_str("Done");
time(&endTime);
printf("Import of variables : %d s.\n", endTime - beginTime);
VSArchive_close();
}
It relies on external functions in a DLL (VSArchive_*). The error frequently happens on vector.SetSize(), on "Dataset = vector", or on Dataset.SetSubVector(). Apparently something is leaking memory in my code. Is there a way to know how much memory has been allocated so far?
Penn
China
644 Posts
Posted - 08/26/2010 : 10:36:22 PM
Hi,
I can reproduce the problem using the simple test code you provided. It happens because all available memory has been allocated, at which point an error message box pops up to tell you that no more memory can be allocated.
Currently, there is no function to query how much memory has been allocated; you can see that information in the Windows Task Manager instead.
In the coming version (Origin 8.5), some system variables are added for getting process information, such as memory usage, virtual memory usage, etc.
Penn
ten0k
France
8 Posts
Posted - 08/31/2010 : 09:38:30 AM
You're right.
But I can't understand this behaviour. I am more used to Linux systems, where asking to allocate some memory always works unless all the *virtual* memory has been allocated (4 GB per process on a 32-bit system). If you ask for more than the physical memory, the system swaps.
It seems like Origin (or maybe Windows) tries to allocate only from the available *physical* memory, i.e. something shared among all the processes. And when no memory is left, the system does not swap!
This means that even if I manage to estimate the amount of memory my import will need and ensure that the currently available memory is enough, my user must NOT run anything else on the system during the import, or she will see an awful popup saying "Cannot allocate any more memory". When the import runs for minutes, this is just annoying.
Just for my understanding: do you know whether this is related to the way Windows manages memory or to the way Origin deals with it? Thanks.
Penn
China
644 Posts
Posted - 08/31/2010 : 11:07:36 PM
Hi,
The key problem is the function malloc that you used in your code to allocate memory. This function belongs to the operating system, so it allocates memory the Windows way.
You can try changing the way you allocate memory in your code by using the vector class, which allocates memory Origin's way. For example, replace:
char *samples = (char*)malloc(totalSize);
with a "vector<char>":
vector<char> samples; samples.SetSize(totalSize);
You can refer to this page about the vector class.
Hope it can help.
Penn
ten0k
France
8 Posts
Posted - 09/01/2010 : 09:26:01 AM
Hi,
Well ... I replaced all the occurrences of "malloc(sizeof(X) * N)" by "vector<X> t(N)" ... and it does not change anything about the way memory is allocated. I still have to predict how much main memory my import program will take ...
ten0k
France
8 Posts
Posted - 09/01/2010 : 11:08:30 AM
Can you tell me how memory allocation is done in Origin C? When the "Cannot allocate any memory" message pops up, what kind of memory is it related to? I've added some tests to my import function, based on the available physical memory, thanks to an external function in a DLL that calls GlobalMemoryStatusEx of the Windows API and returns the ullAvailPhys member.
But after some experiments, I am not sure this is the right number to represent the amount of available memory that Origin C checks.
ten0k
France
8 Posts
Posted - 09/03/2010 : 12:04:23 PM
For people who may have a similar problem: I've found something interesting.
I think my problem is related to the way large amounts of data are allocated: if you ask for N bytes, the Windows memory manager has to find a contiguous region of N bytes in the process's virtual address space.
I've used this piece of code to determine the largest block of memory that can currently be allocated:
unsigned long getLargestMemoryAvailable()
{
unsigned long minSize = 0;
unsigned long maxSize = 0x80000000;
unsigned long size = maxSize;
// find the largest block of memory currently available
// by bisection, in 16 steps (precision about 32 kB)
for (int i = 0; i < 16; i++)
{
char * p = (char*)malloc(size);
if (p == 0)
{
maxSize = size;
}
else
{
free(p);
minSize = size;
}
size = (minSize + maxSize) / 2;
}
printf("size = %lu kB\n", minSize / 1024);
return minSize / 1024;
}
First, when I launch OriginPro with an empty project and only Code Builder open, after compiling my code, the function above returns about 600 MB. That is pretty low compared to the 2 GB that can theoretically be available (well, 2 GB minus the data already allocated by the Origin process); it means the memory is a bit fragmented.
I've added a call to this function each time I have to allocate new memory (via Dataset.SetSize or similar), and the infamous "cannot allocate any more memory" popup occurs whenever the largest available block is smaller than the requested allocation size.
Solution: test the availability of memory before each call that may allocate something (vector declaration, call to SetSize, = operator, etc.) with a trial malloc of the estimated size.
To prevent this from happening at all, it would be useful to reduce the amount of memory requested in a single call. But that requires loops, which Origin C processes poorly (you lose in speed what you gain in memory), and I'm not sure I can access part of a Worksheet column (except via the inefficient Worksheet.SetCell). Maybe someone has other ideas?
ML
USA
63 Posts
Posted - 09/03/2010 : 12:09:35 PM
Hi ten0k,
In Origin, in general, memory is allocated in different ways depending on the purpose. What typically takes up most memory, though, is the memory needed to store the values held in columns (datasets). In Origin 8.0, the API pair GlobalAlloc()/GlobalFree() is used for this purpose.
It is difficult to tell from your code sample how much memory is allocated inside datasets, and how much outside (such as in your custom DLL, plus all the malloc()/calloc() calls).
Origin 8.0 does contain the following script command, which might be relevant here:
type -dv;
which will dump some memory information of the Origin process in which you execute it. For example:
Total physical memory = 0x7fdfe000 2145378304
Avail. physical memory = 0x49b40000 1236533248
Total page file (?) = 0xd5902000 3582992384
Avail. page file (?) = 0xac035000 2885898240
Total virtual space = 0x7ffe0000 2147352576
Avail. virtual space = 0x6c96c000 1821818880
Used virtual space = 0x13674000 325533696
Largest free region size = 0x29ad0000 699203584 at address 0x14500000
For allocating large chunks of memory, the last number ("Largest free region size =...") should be the most relevant, as it provides the size of the currently largest free chunk.
You can try to insert this line of code into your OC code at opportune times (such as once every few iterations):
LT_execute("type -dv;");
which will simply execute the script command; watch the info being dumped and see how the numbers change as your code executes.
Of course, if memory is really tight due to too much allocating, you ought to try, if possible, to free up the columns/allocated memory which you no longer need along the way.
ML
ML
USA
63 Posts
Posted - 09/03/2010 : 12:27:36 PM
"Apparently something is leaking memory in my code. Is there a way to know how much memory has been allocated so far?"
Hi ten0K,
There is a brute-force way which sometimes points at the source of the leak (if there is one): gradually comment out pieces of code in your loops and see whether the supposed leak goes away (just make sure you restart Origin for every new test).
ML
ten0k
France
8 Posts
Posted - 09/06/2010 : 03:43:49 AM
Hi,
Thanks ML for your answers. I did not know this "type -dv" command; it could be useful.
However, I get exactly the same result with my little piece of code that approximates the largest free chunk by means of malloc, so now I have a choice between the two approaches.
For now, to prevent my code from failing with an uncatchable error, I have to insert a test on available memory before each instruction that might involve memory allocation (vector creation, SetSize, etc.) and estimate how much memory it will need.
ten0k
France
8 Posts
Posted - 09/06/2010 : 03:48:12 AM
Oh ... and by the way, I think my "memory leak" problem did not really exist. The shrinking free memory was due to memory fragmentation.
additive
Germany
109 Posts