T O P I C R E V I E W |
wallacej |
Posted - 02/03/2005 : 06:39:45 AM Origin Version (7.5) Operating System: Win2k
My knowledge on statistics and data manipulation is not very good and there probably is a general function available to solve my problem.
If I have a 2 column table of data as follows: X Y 1 2 3 3 1 2 3 2 6 1 3 2 3 2 3 2 1 2
How do I calculate the the number of occurences of each number combination, generating the results as below:
Combination = f 1 2 = 3 3 3 = 1 3 2 = 4 6 1 = 1
The way that the results are presented is not important, the above is for illustration only.
Thank You
|
13 L A T E S T R E P L I E S (Newest First) |
wallacej |
Posted - 02/04/2005 : 08:37:01 AM That helps a lot, it runs faster when called from LabTalk and the modified function is faster still. I think my machine is running out of memory when i get to steps of 0.001 or end x values towards 1000.
Cheers for your help on this |
easwar |
Posted - 02/03/2005 : 1:00:46 PM Hi,
This modified code should run faster - the nested loop at the end has been replaced with a direct way to convert matrix to worksheet XYZ and then remove zero-valued rows from wks using a single loop. I will look again to see if it can be made more efficient.
Easwar OriginLab
//////////////////////////////////////////////////////////////////////////////////// // This function takes the data in the first two columns of the active worksheet // and performs 2D binning of the data. // Parameters: // dXStart: bin center for 1st X bin // dXEnd: bin center for last X bin // dXStep: width of each bin // dYStart: bin center for 1st Y bin // dYEnd: bin center for last Y bin // dYStep: width of each bin // void bin_2d(double dXStart, double dXEnd, double dXStep, double dYStart, double dYEnd, double dYStep) { // Declare worksheet with active layer Worksheet wksData = Project.ActiveLayer(); if( !wksData ) { out_str("Active layer is not a worksheet!"); return; } // First copy data from 1st and 2nd cols of wks to a matrix matrix matData; matData.CopyFrom(wksData, 0, 0, -1, 1); // Compute number of bins needed in x and y directions int nXBins = 1 + (dXEnd - dXStart) / dXStep; int nYBins = 1 + (dYEnd - dYStart) / dYStep; // Create a matrix object to hold 2D bin counts MatrixPage pgMat; pgMat.Create("Origin"); MatrixLayer lyMat = pgMat.Layers(0); Matrix MatBins(lyMat); // Set the size and X, Y coordinates of the matrix MatBins.SetSize(nYBins, nXBins); MatBins.SetXMin(dXStart); MatBins.SetXMax(dXEnd); MatBins.SetYMin(dYStart); MatBins.SetYMax(dYEnd); MatBins = 0; // Loop over all rows of the data matrix - this corresponds to rows in wks int nRows = matData.GetNumRows(); for(int iRow = 0; iRow < nRows; iRow++) { // Get X, Y values to be binned double dX = matData[iRow][0]; double dY = matData[iRow][1]; // Update matrix count at the appropriate cell MatBins.SetCellValue(dX, dY, MatBins.GetCellValue(dX, dY) + 1); }
// Create new wks to hold counts WorksheetPage wpg; wpg.Create("Origin"); Worksheet wksCounts = wpg.Layers(0); wpg.Label = "2D Bin counts for " + wksData.GetPage().GetName(); wpg.TitleShow = WIN_TITLE_SHOW_BOTH; // Delete all columns and add three new cols while(wksCounts.DeleteCol(0)); wksCounts.AddCol("XValue"); wksCounts.AddCol("YValue"); wksCounts.AddCol("Counts"); // Convert the matrix to XYZ columns of the counts wks LabTalk.mat.matname$ = pgMat.GetName(); LabTalk.mat.wksname$ = wpg.GetName(); LabTalk.mat.m2xyz(); // Loop over all rows of wks backwards and delete rows that have 0 in 3rd col for(int ir = wksCounts.GetNumRows() - 1; ir >= 0; ir--) { if( 0 == wksCounts.Cell(ir, 2) ) wksCounts.DeleteRow(ir); } } ////////////////////////////////////////////////////////////////////////////////////
|
wallacej |
Posted - 02/03/2005 : 12:29:08 PM tried that, and it runs much faster, if you could find some quicker loops it would be much appreaciated.
Thank You, today you've been most helpfull |
easwar |
Posted - 02/03/2005 : 12:11:21 PM Hi,
I will check with others to see if I am doing any loops that can be replaced with something more efficient.
Also, once you compile and link, close Code Builder. That will make the function execute faster.
Easwar OriginLab
|
wallacej |
Posted - 02/03/2005 : 12:09:03 PM thank you for the help. The code ran without any bugs. It takes a very long time when the step accuracies go beyond 0.1 but I suppose there is nothing that can be done with such a large dataset.
Thanks |
easwar |
Posted - 02/03/2005 : 11:15:56 AM Hi,
Try the OC function pasted here. For the specific dataset you pasted in this thread at the beginning, you can then call this function from script window with the command: bin_2d 1 6 1 1 3 1 which essentially says, bins in x should be centered at 1, 2, 3, 4, 5, 6 and bins in y should be centered at 1, 2, 3 These values can be non-integers so you can decide what precision you want etc.
Now, the function assumes a wks is active and it takes the first two cols of the wks to be containing the raw data.
I did not test this function extensively. So please test and let me know if you find bugs and I will fix.
The final result is a new wks that shows the counts in the way you specified in your original post. There is also an intermediate matrix that holds the counts in 2D, and I have left the matrix as is. You can modify the code to delete the matrix at the end if you do not want the matrix. The matrix can be used to say plot 3D bar plot which would then be the "histogram" in 2D, or it can be viewed as image to "see" the count distribution etc.
Easwar OriginLab
//////////////////////////////////////////////////////////////////////////////////// // This function takes the data in the first two columns of the active worksheet // and performs 2D binning of the data. // Parameters: // dXStart: bin center for 1st X bin // dXEnd: bin center for last X bin // dXStep: width of each bin // dYStart: bin center for 1st Y bin // dYEnd: bin center for last Y bin // dYStep: width of each bin // void bin_2d(double dXStart, double dXEnd, double dXStep, double dYStart, double dYEnd, double dYStep) { // Declare worksheet with active layer Worksheet wksData = Project.ActiveLayer(); if( !wksData ) { out_str("Active layer is not a worksheet!"); return; } // First copy data from 1st and 2nd cols of wks to a matrix matrix matData; matData.CopyFrom(wksData, 0, 0, -1, 1); // Compute number of bins needed in x and y directions int nXBins = 1 + (dXEnd - dXStart) / dXStep; int nYBins = 1 + (dYEnd - dYStart) / dYStep; // Create a matrix object to hold 2D bin counts MatrixPage pgMat; pgMat.Create("Origin"); MatrixLayer lyMat = pgMat.Layers(0); Matrix MatBins(lyMat); // Set the size and X, Y coordinates of the matrix MatBins.SetSize(nYBins, nXBins); MatBins.SetXMin(dXStart); MatBins.SetXMax(dXEnd); MatBins.SetYMin(dYStart); MatBins.SetYMax(dYEnd); MatBins = 0; // Loop over all rows of the data matrix - this corresponds to rows in wks int nRows = matData.GetNumRows(); for(int iRow = 0; iRow < nRows; iRow++) { // Get X, Y values to be binned double dX = matData[iRow][0]; double dY = matData[iRow][1]; // Update matrix count at the appropriate cell MatBins.SetCellValue(dX, dY, MatBins.GetCellValue(dX, dY) + 1); }
// Create new wks to hold counts WorksheetPage wpg; wpg.Create("Origin"); Worksheet wksCounts = wpg.Layers(0); wpg.Label = "2D Bin counts for " + wksData.GetPage().GetName(); wpg.TitleShow = WIN_TITLE_SHOW_BOTH; // Delete all columns and add three new cols while(wksCounts.DeleteCol(0)); wksCounts.AddCol("XValue"); wksCounts.AddCol("YValue"); wksCounts.AddCol("Counts"); // Loop over all cells of count matrix and enter non-zero cell values into counts wks int iNumCountRows = 0; for(int ir = 0; ir < MatBins.GetNumRows(); ir++) { for(int ic = 0; ic < MatBins.GetNumCols(); ic++) { int iCount = MatBins[ir][ic]; // If count not zero... if( 0 != iCount ) { // Set 1st col of count wks to corresponding X value wksCounts.SetCell(iNumCountRows, 0 , MatBins.GetXValue(ic)); // Set 2nd col of count wks to corresponding Y value wksCounts.SetCell(iNumCountRows, 1 , MatBins.GetYValue(ir)); // Set 3rd col of count wks to number of counts wksCounts.SetCell(iNumCountRows, 2, iCount); iNumCountRows++; } } } } ////////////////////////////////////////////////////////////////////////////////////
|
cpyang |
Posted - 02/03/2005 : 10:35:32 AM Yes, looks like categorical column making histogram has internal size limitation. I tried and 1000 points work but 5000 did not. So looks like pure Origin C solution with 2D binning is the way to go.
CP
|
wallacej |
Posted - 02/03/2005 : 10:33:40 AM Yes, the two columns will always be the same and will always be the two for x/y combination count.
Can you provide some code/information on 2D binning please?
Thank You
ps Another thing I would like to do following success at this operation is to put the combination count values in a column on the worksheet. The count value should be placed on every row that made up the count. I'm not sure if this will affect what you are going to send me, or if its at all possible
Thanks again |
easwar |
Posted - 02/03/2005 : 10:20:48 AM Hi,
Would your data always be only two columns and you want to find the count for each combination of values in the two columns? If yes, then what would work for you is 2D binning which would first create a matrix with the counts in two dimensions, which can then be converted to a wks to show how many counts there are for each combination of x,y.
If this would work, we can post code to do this.
Easwar OriginLab
|
wallacej |
Posted - 02/03/2005 : 10:12:18 AM That still doesn't work. I still only get 25 values. If I use float %5.0f it will work, so float is not the problem.
When I increase one of the values to %5.1f I get this problem. Do you think it is possible that I am overloading some resource?
My dataset has on average 30,000 rows where the combined values are generated from and it is not rare to get a high frequency of repeat combined values. |
cpyang |
Posted - 02/03/2005 : 09:53:48 AM if frequencies all 1 that means they are all unique. When you use float then that will all be different, you need to decide on a resolution, so maybe change into
str.Format("%5.2f-%5.2f",(float)x[ii],(float)y[ii]);
with 2 decimal places?
CP
|
wallacej |
Posted - 02/03/2005 : 09:18:52 AM Thank you that worked very well, a simple-genius solution.
I would like the combination to combine the two numbers as floats to preserve accuracy. I changed the following line:
str.Format("%f-%f",(float)x[ii],(float)y[ii]);
And this worked perfectly, the strings were combined and inserted into the worksheet e.g.: 92.256547-0.568545
When I plotted the histogram no data appeared. I checked the bin and only 25 values were reported each with frequencies of 1.0.
Do you have any ideas as to why this would happen?
Many thanks |
cpyang |
Posted - 02/03/2005 : 07:53:59 AM You can make a histogram plot, here is how
1. add the following Origin C function that will generate a new column containing string values of X-Y
vector<string> combine(vector& x, vector& y) { string str; vector<string> vs; for(int ii = 0; ii < x.GetSize(); ii++) { str.Format("%d-%d",(int)x[ii],(int)y[ii]); vs.Add(str); } return vs; }
2. Use Set Column Value to make the combined column Assuming that you have X and Y in Col(A) and Col(B), then add Col(C) and in Set Column Value, type
combine(col(a),col(b))
This will generate the X-Y column
3. Plot this new Col(C) as a histogram, Origin will ask and confirm that you want to convert the Col(C) into categorical, and after that, you should get the histogram.
CP
|
|
|