The Origin Forum
File Exchange
Try Origin for Free
The Origin Forum
Home | Profile | Register | Active Topics | Members | Search | FAQ | Send File to Tech support
Username:
Password:
Save Password
Forgot your Password? | Admin Options

 All Forums
 Origin Forum
 Origin Forum
 is there a function to do this?
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic Lock Topic Edit Topic Delete Topic New Topic Reply to Topic

wallacej

UK
Posts

Posted - 02/03/2005 :  06:39:45 AM  Show Profile  Edit Topic  Reply with Quote  View user's IP address  Delete Topic
Origin Version (7.5)
Operating System: Win2k

My knowledge on statistics and data manipulation is not very good and there probably is a general function available to solve my problem.

If I have a 2 column table of data as follows:
X Y
1 2
3 3
1 2
3 2
6 1
3 2
3 2
3 2
1 2

How do I calculate the the number of occurences of each number combination, generating the results as below:

Combination = f
1 2 = 3
3 3 = 1
3 2 = 4
6 1 = 1

The way that the results are presented is not important, the above is for illustration only.

Thank You

cpyang

USA
1406 Posts

Posted - 02/03/2005 :  07:53:59 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
You can make a histogram plot, here is how

1. add the following Origin C function that will generate a new column containing string values of X-Y

 
vector<string> combine(vector& x, vector& y)
{
string str;
vector<string> vs;

for(int ii = 0; ii < x.GetSize(); ii++)
{
str.Format("%d-%d",(int)x[ii],(int)y[ii]);
vs.Add(str);
}
return vs;
}



2. Use Set Column Value to make the combined column
Assuming that you have X and Y in Col(A) and Col(B), then add Col(C) and in Set Column Value, type

combine(col(a),col(b))

This will generate the X-Y column

3. Plot this new Col(C) as a histogram, Origin will ask and confirm that you want to convert the Col(C) into categorical, and after that, you should get the histogram.


CP



Go to Top of Page

wallacej

UK
Posts

Posted - 02/03/2005 :  09:18:52 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Thank you that worked very well, a simple-genius solution.

I would like the combination to combine the two numbers as floats to preserve accuracy. I changed the following line:

 str.Format("%f-%f",(float)x[ii],(float)y[ii]);


And this worked perfectly, the strings were combined and inserted into the worksheet e.g.:
92.256547-0.568545

When I plotted the histogram no data appeared.
I checked the bin and only 25 values were reported each with frequencies of 1.0.

Do you have any ideas as to why this would happen?

Many thanks
Go to Top of Page

cpyang

USA
1406 Posts

Posted - 02/03/2005 :  09:53:48 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
if frequencies all 1 that means they are all unique. When you use float then that will all be different, you need to decide on a resolution, so maybe change into

str.Format("%5.2f-%5.2f",(float)x[ii],(float)y[ii]);

with 2 decimal places?

CP



Go to Top of Page

wallacej

UK
Posts

Posted - 02/03/2005 :  10:12:18 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
That still doesn't work. I still only get 25 values. If I use float %5.0f it will work, so float is not the problem.

When I increase one of the values to %5.1f I get this problem. Do you think it is possible that I am overloading some resource?

My dataset has on average 30,000 rows where the combined values are generated from and it is not rare to get a high frequency of repeat combined values.
Go to Top of Page

easwar

USA
1965 Posts

Posted - 02/03/2005 :  10:20:48 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Hi,

Would your data always be only two columns and you want to find the count for each combination of values in the two columns? If yes, then what would work for you is 2D binning which would first create a matrix with the counts in two dimensions, which can then be converted to a wks to show how many counts there are for each combination of x,y.

If this would work, we can post code to do this.

Easwar
OriginLab

Go to Top of Page

wallacej

UK
Posts

Posted - 02/03/2005 :  10:33:40 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Yes, the two columns will always be the same and will always be the two for x/y combination count.

Can you provide some code/information on 2D binning please?

Thank You

ps
Another thing I would like to do following success at this operation is to put the combination count values in a column on the worksheet. The count value should be placed on every row that made up the count. I'm not sure if this will affect what you are going to send me, or if its at all possible

Thanks again
Go to Top of Page

cpyang

USA
1406 Posts

Posted - 02/03/2005 :  10:35:32 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Yes, looks like categorical column making histogram has internal size limitation. I tried and 1000 points work but 5000 did not. So looks like pure Origin C solution with 2D binning is the way to go.

CP


Go to Top of Page

easwar

USA
1965 Posts

Posted - 02/03/2005 :  11:15:56 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Hi,

Try the OC function pasted here.
For the specific dataset you pasted in this thread at the beginning, you can then call this function from script window with the command:
bin_2d 1 6 1 1 3 1
which essentially says, bins in x should be centered at 1, 2, 3, 4, 5, 6 and bins in y should be centered at 1, 2, 3
These values can be non-integers so you can decide what precision you want etc.

Now, the function assumes a wks is active and it takes the first two cols of the wks to be containing the raw data.

I did not test this function extensively. So please test and let me know if you find bugs and I will fix.

The final result is a new wks that shows the counts in the way you specified in your original post. There is also an intermediate matrix that holds the counts in 2D, and I have left the matrix as is. You can modify the code to delete the matrix at the end if you do not want the matrix. The matrix can be used to say plot 3D bar plot which would then be the "histogram" in 2D, or it can be viewed as image to "see" the count distribution etc.

Easwar
OriginLab



////////////////////////////////////////////////////////////////////////////////////
// This function takes the data in the first two columns of the active worksheet
// and performs 2D binning of the data.
// Parameters:
// dXStart: bin center for 1st X bin
// dXEnd: bin center for last X bin
// dXStep: width of each bin
// dYStart: bin center for 1st Y bin
// dYEnd: bin center for last Y bin
// dYStep: width of each bin
//
void bin_2d(double dXStart, double dXEnd, double dXStep, double dYStart, double dYEnd, double dYStep)
{
// Declare worksheet with active layer
Worksheet wksData = Project.ActiveLayer();
if( !wksData )
{
out_str("Active layer is not a worksheet!");
return;
}

// First copy data from 1st and 2nd cols of wks to a matrix
matrix matData;
matData.CopyFrom(wksData, 0, 0, -1, 1);

// Compute number of bins needed in x and y directions
int nXBins = 1 + (dXEnd - dXStart) / dXStep;
int nYBins = 1 + (dYEnd - dYStart) / dYStep;

// Create a matrix object to hold 2D bin counts
MatrixPage pgMat;
pgMat.Create("Origin");
MatrixLayer lyMat = pgMat.Layers(0);
Matrix MatBins(lyMat);
// Set the size and X, Y coordinates of the matrix
MatBins.SetSize(nYBins, nXBins);
MatBins.SetXMin(dXStart);
MatBins.SetXMax(dXEnd);
MatBins.SetYMin(dYStart);
MatBins.SetYMax(dYEnd);
MatBins = 0;

// Loop over all rows of the data matrix - this corresponds to rows in wks
int nRows = matData.GetNumRows();
for(int iRow = 0; iRow < nRows; iRow++)
{
// Get X, Y values to be binned
double dX = matData[iRow][0];
double dY = matData[iRow][1];
// Update matrix count at the appropriate cell
MatBins.SetCellValue(dX, dY, MatBins.GetCellValue(dX, dY) + 1);
}

// Create new wks to hold counts
WorksheetPage wpg;
wpg.Create("Origin");
Worksheet wksCounts = wpg.Layers(0);
wpg.Label = "2D Bin counts for " + wksData.GetPage().GetName();
wpg.TitleShow = WIN_TITLE_SHOW_BOTH;
// Delete all columns and add three new cols
while(wksCounts.DeleteCol(0));
wksCounts.AddCol("XValue");
wksCounts.AddCol("YValue");
wksCounts.AddCol("Counts");

// Loop over all cells of count matrix and enter non-zero cell values into counts wks
int iNumCountRows = 0;
for(int ir = 0; ir < MatBins.GetNumRows(); ir++)
{
for(int ic = 0; ic < MatBins.GetNumCols(); ic++)
{
int iCount = MatBins[ir][ic];
// If count not zero...
if( 0 != iCount )
{
// Set 1st col of count wks to corresponding X value
wksCounts.SetCell(iNumCountRows, 0 , MatBins.GetXValue(ic));
// Set 2nd col of count wks to corresponding Y value
wksCounts.SetCell(iNumCountRows, 1 , MatBins.GetYValue(ir));
// Set 3rd col of count wks to number of counts
wksCounts.SetCell(iNumCountRows, 2, iCount);
iNumCountRows++;
}
}
}
}
////////////////////////////////////////////////////////////////////////////////////


Go to Top of Page

wallacej

UK
Posts

Posted - 02/03/2005 :  12:09:03 PM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
thank you for the help. The code ran without any bugs. It takes a very long time when the step accuracies go beyond 0.1 but I suppose there is nothing that can be done with such a large dataset.

Thanks
Go to Top of Page

easwar

USA
1965 Posts

Posted - 02/03/2005 :  12:11:21 PM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Hi,

I will check with others to see if I am doing any loops that can be replaced with something more efficient.

Also, once you compile and link, close Code Builder. That will make the function execute faster.

Easwar
OriginLab

Go to Top of Page

wallacej

UK
Posts

Posted - 02/03/2005 :  12:29:08 PM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
tried that, and it runs much faster, if you could find some quicker loops it would be much appreaciated.

Thank You, today you've been most helpfull
Go to Top of Page

easwar

USA
1965 Posts

Posted - 02/03/2005 :  1:00:46 PM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
Hi,

This modified code should run faster - the nested loop at the end has been replaced with a direct way to convert matrix to worksheet XYZ and then remove zero-valued rows from wks using a single loop. I will look again to see if it can be made more efficient.

Easwar
OriginLab


////////////////////////////////////////////////////////////////////////////////////
// This function takes the data in the first two columns of the active worksheet
// and performs 2D binning of the data.
// Parameters:
// dXStart: bin center for 1st X bin
// dXEnd: bin center for last X bin
// dXStep: width of each bin
// dYStart: bin center for 1st Y bin
// dYEnd: bin center for last Y bin
// dYStep: width of each bin
//
void bin_2d(double dXStart, double dXEnd, double dXStep, double dYStart, double dYEnd, double dYStep)
{
// Declare worksheet with active layer
Worksheet wksData = Project.ActiveLayer();
if( !wksData )
{
out_str("Active layer is not a worksheet!");
return;
}

// First copy data from 1st and 2nd cols of wks to a matrix
matrix matData;
matData.CopyFrom(wksData, 0, 0, -1, 1);

// Compute number of bins needed in x and y directions
int nXBins = 1 + (dXEnd - dXStart) / dXStep;
int nYBins = 1 + (dYEnd - dYStart) / dYStep;

// Create a matrix object to hold 2D bin counts
MatrixPage pgMat;
pgMat.Create("Origin");
MatrixLayer lyMat = pgMat.Layers(0);
Matrix MatBins(lyMat);
// Set the size and X, Y coordinates of the matrix
MatBins.SetSize(nYBins, nXBins);
MatBins.SetXMin(dXStart);
MatBins.SetXMax(dXEnd);
MatBins.SetYMin(dYStart);
MatBins.SetYMax(dYEnd);
MatBins = 0;

// Loop over all rows of the data matrix - this corresponds to rows in wks
int nRows = matData.GetNumRows();
for(int iRow = 0; iRow < nRows; iRow++)
{
// Get X, Y values to be binned
double dX = matData[iRow][0];
double dY = matData[iRow][1];
// Update matrix count at the appropriate cell
MatBins.SetCellValue(dX, dY, MatBins.GetCellValue(dX, dY) + 1);
}

// Create new wks to hold counts
WorksheetPage wpg;
wpg.Create("Origin");
Worksheet wksCounts = wpg.Layers(0);
wpg.Label = "2D Bin counts for " + wksData.GetPage().GetName();
wpg.TitleShow = WIN_TITLE_SHOW_BOTH;
// Delete all columns and add three new cols
while(wksCounts.DeleteCol(0));
wksCounts.AddCol("XValue");
wksCounts.AddCol("YValue");
wksCounts.AddCol("Counts");

// Convert the matrix to XYZ columns of the counts wks
LabTalk.mat.matname$ = pgMat.GetName();
LabTalk.mat.wksname$ = wpg.GetName();
LabTalk.mat.m2xyz();

// Loop over all rows of wks backwards and delete rows that have 0 in 3rd col
for(int ir = wksCounts.GetNumRows() - 1; ir >= 0; ir--)
{
if( 0 == wksCounts.Cell(ir, 2) ) wksCounts.DeleteRow(ir);
}
}
////////////////////////////////////////////////////////////////////////////////////


Go to Top of Page

wallacej

UK
Posts

Posted - 02/04/2005 :  08:37:01 AM  Show Profile  Edit Reply  Reply with Quote  View user's IP address  Delete Reply
That helps a lot, it runs faster when called from LabTalk and the modified function is faster still. I think my machine is running out of memory when i get to steps of 0.001 or end x values towards 1000.


Cheers for your help on this
Go to Top of Page
  Previous Topic Topic Next Topic Lock Topic Edit Topic Delete Topic New Topic Reply to Topic
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
The Origin Forum © 2020 Originlab Corporation Go To Top Of Page
Snitz Forums 2000