Author |
Topic |
|
dpgg94
Singapore
5 Posts |
Posted - 06/14/2018 : 12:39:55 PM
|
Hello all, I would like to find out if there's any way to remove duplicated values from particular columns in a sheet. For example, i have 10 columns of data, but some columns may have repeated values due to multiple scans by an instrument. I would like to remove these data because they would affect the average and standard deviation if I were to include them while carrying out statistics.
Thanks for any help! Have tried googling for this but did not manage to find any info. |
|
YimingChen
1609 Posts |
Posted - 06/14/2018 : 3:11:46 PM
|
Hi,
Can you try Worksheet:Remove/Combine Duplicated Rows. This tool only scan one single column though. You may consider stacking your 10 columns before running the this tool.
James |
|
|
Hideo Fujii
USA
1582 Posts |
Posted - 06/14/2018 : 3:37:42 PM
|
Hi dpgg94,
Origin provides three methods to remove the duplicated data, which are: 1) "Worksheet> Remove/Combine Duplicated Rows" menu (wdeldup), and 2) "Analysis> Data Manipulation> Reduce Duplicated X Data" menu (reducedup) 3) "Unique(<column>)" command
The differences between first two are mainly as follows:
- Wdeldup works for a whole worksheet, but not for a plot. Reducedup works on two datasets of XY. It works for two columns of XY in a worksheet as well as for a plot in a graph.
- Wdeldup reduces the rows in the rest of columns on a reference column by taking options: Keep 1st/Last, Average, Min/Max, etc. Reducedup does similar, but no Keep 1st/Last option.
- Wdeldup works for duplication by exact numbers, or by the text strings. Reducedup works on numbers possibly with some tolerance.
The Unique command can be applied to only one dataset, i.e., a column in a worksheet. There is Keep 1st/Last option, but no Tolerance.
You can choose either method of above which is suitable for your case. If more than one columns in your data may possibly contain duplication, you may have to apply above method multiple times.
References: [1] Blog: http://blog.originlab.com/data-handling/delete-every-nth-row-or-column-from-worksheet-2 [2] Origin Help: https://www.originlab.com/doc/Origin-Help/Wks-DataReduction https://www.originlab.com/doc/Origin-Help/reduce-duplicate-rows https://www.originlab.com/doc/LabTalk/ref/Unique-func [3] Video Tutorial: https://www.originlab.com/index.aspx?go=Products/Origin/DataAnalysis&ss=chm&pid=1575
Hope this helps.
--Hideo Fujii OriginLab |
Edited by - Hideo Fujii on 06/14/2018 3:41:29 PM |
|
|
dpgg94
Singapore
5 Posts |
Posted - 06/14/2018 : 10:44:11 PM
|
Here's the kind of data I have, with the original form of values and manually edited to remove duplicated values within each column. https://imgur.com/a/rCZlUu1
The other methods suggested seem to remove all the same rows for the remaining columns even if i select a specific column to work on. As for my data, there would be random numbers of duplicated values within each row, and at varying positions.
I have no clue about scripting and writing the command line, but will be trying to learn how to use it now. If possible, could anyone help me to write a script for this purpose?
Thanks a bunch! |
|
|
Hideo Fujii
USA
1582 Posts |
Posted - 06/15/2018 : 5:09:58 PM
|
Hi dpgg94,
Okay, so you want to remove redundant rows at each column in the worksheet, right? Then you can use the 3rd option, "Unique(<column>)" command. Just applying this command every column in your worksheet doesn't need to make a script; rather you can do the following:
1) Enter unique(this) in the F(X)= header row in the 1st column, followed by pressing ENTER key. ==> You get the 1st column non-redundant. 2) Select the F(X)= header row in the 1st column, and drag the black dot at the bottom corner of the cell (cursor shape changes to the "+" sign) rightward until the last column. ==> All columns get the same operation.
Hope this helps.
--Hideo Fujii OriginLab |
Edited by - Hideo Fujii on 06/15/2018 5:11:32 PM |
|
|
dpgg94
Singapore
5 Posts |
Posted - 06/15/2018 : 9:00:09 PM
|
Hello, thank you for the reply.
What does "this" refer to when you mention unique(this)? Should i be putting the column label in it?
I've tried it with column label and it doesn't seem to do anything after pressing enter.
https://imgur.com/a/6tpfbza
Apologies if I'm kind of slow on this, had started using Origin few weeks back and am trying to learn the tricks in it. |
Edited by - dpgg94 on 06/15/2018 10:22:33 PM |
|
|
Hideo Fujii
USA
1582 Posts |
|
dpgg94
Singapore
5 Posts |
Posted - 06/19/2018 : 08:16:50 AM
|
I'm on OriginPro2017 version |
|
|
Hideo Fujii
USA
1582 Posts |
Posted - 06/19/2018 : 09:38:31 AM
|
Hi dpgg94,
> I'm on OriginPro2017 version
I have just confirmed that Unique() function became available in Origin 2018b unfortunately. Please consider either: - download and try 2018b Evaluation version, or - upgrade to 2018b (if you have maintenance, it is free), or - make a script to substitute.
Regards,
--Hideo Fujii OriginLab |
|
|
dpgg94
Singapore
5 Posts |
Posted - 06/22/2018 : 11:54:43 PM
|
Would it be possible if you could assist me with creating a script for that function? I'm clueless about how to make a script even after going through the online tutorials. |
|
|
Hideo Fujii
USA
1582 Posts |
Posted - 06/25/2018 : 12:15:38 PM
|
Hi dpgg94,
Learning any programming language including LabTalk, is an incremental process. So, just keep learning patiently from the basics.
First, try the following "Hello World" example: https://www.originlab.com/doc/LabTalk/guide/Getting-Started-with-LT
Try the following Getting Started tutorial for how to run: https://www.originlab.com/doc/LabTalk/Tutorials/Tutorial-Get-Started
Try the following tutorial for the basic elements in LabTalk: https://www.originlab.com/doc/LabTalk/Tutorials/Tutorial-Language-Fundamental
To handle the data in a worksheet, Range notation is essential. Try the following tutorial about the range: https://www.originlab.com/doc/LabTalk/Tutorials/Tutorial-Range-Notation
For the rest, explore Origin documents about LabTalk when you need something to do, or to understand what the given sample is doing: https://www.originlab.com/doc/LabTalk and about X-functions: https://www.originlab.com/doc/X-Function/ref
Finally, you can try to copy the following sample script, paste into the Script window, highlight all pasted lines, and press ENTER key when Sheet1 in Book1 is active to see what happened. Read the comments (after //) to grasp what the script roughly does, but don't be worried about exact meaning. Don't be deterred by the details. Learning by examples by trying by yourself is a way to go, I can say.
//////////////////////////////////////////////
double NA=0/0; //Missing value constant
range rA=[Book1]Sheet1!col(A); //Data range for col(A) in Sheet1 in Book1
rA=sort(rA); //Sort the range of column A
range rT=[??]!_tmpdata; //Range for temporary dataset
int nirows=rA.getSize(); //Number of input rows
int norows=0; //Number of output rows
for(ii=1; ii<=nirows; ii=ii+1) { //Loop over with ii
if(ii==1 || rA[ii]!=rA[ii-1]) {
rT[ii]=rA[ii]; //If 1st row, or unequal to previous row, set the same
norows=norows+1; //Increment number of output rows
}
else rT[ii]=NA; //Otherwise set a missing value
}
rA=sort(rT); //Put the sorted temp dataset back
rA.setSize(norows); //Reset the size of column A
///////////////////////////////////////////// Good luck, and enjoy!
--Hideo Fujii OriginLab |
Edited by - Hideo Fujii on 06/25/2018 1:46:20 PM |
|
|
|
Topic |
|
|
|