T O P I C R E V I E W |
coll@inia. |
Posted - 04/29/2017 : 08:21:48 AM Origin Ver. Origin pro 2017 windows 8
I operate very often with means and standard deviations of 6 to 10 columns of data. I manually eliminate the outliers by trial and error to minimize the standard deviations. Then I choose the corresponding mean for graphic representation. I do that for each one of the rows of a worksheet with < 10 rows.
I wonder whether there is an script which could calculate all possibilities of columns combinations to calculate the corresponding mean and standard deviation for each row and select the combination with > 2 columns which results in the lowest standard deviation.
Could anyone help?
Thank you for your attention, julio |
14 L A T E S T R E P L I E S (Newest First) |
Hideo Fujii |
Posted - 05/24/2017 : 2:54:04 PM Hi Julio,
If you organize your data column-wise, you can mask the outlier point using colmask x-function (http://www.originlab.com/doc/X-Function/ref/colmask), or "Column: Mask Cells by Condition" menu from user interface. (Thank you, Yuki for this good suggestion.) Then, make the Box chart in Bar-chart style.
--Hideo Fujii OriginLab |
coll@inia. |
Posted - 05/11/2017 : 12:29:08 PM Yes, good point!
I will consider that possibility for future experimental designs....
THANKS AGAIN,
sincerly, Julio |
Hideo Fujii |
Posted - 05/09/2017 : 5:23:59 PM Thank you, Julio.
I have looked at your sample graph, and I have realized that the graph type itself is equivalent to the Bar Chart style "Box chart". ("Plot: Box: Bar Chart" menu) So, if you prepare the data in column-wise rather than row-wise, you can first apply the above remoutliers function to each data column, then simply plot this type of box chart. The advantages of this approach are - no need of transposing:
1) You can select various error bar types like SD, SE, percentile range,confidence interval, etc. 2) You can overlap data points over the bars of the bar chart.
You can consider this approach next time.
--Hideo Fujii OriginLab |
coll@inia. |
Posted - 05/09/2017 : 1:44:33 PM dear Hideo, the type of representation depends on the data . In this case we present the data as bar graphs but it can be anything else....
Just the representation will be like the figure uploaded (I hope).
In the figure legend we state that the means and sd were calculated from by eliminating outliers from n=5, briefly indicating how we did removed the outliers. Up to now we were using as outliers the values > mean+sd. The mean was then recalculated.
In the present case, you develop, we will probably have to state that the outliers were eliminated by minimizing sd using n=5 values. Will be that all right? We may have to mention the threshold and the formula used for a more accurate description in methodology though. In addition, we will experiment more with the formula and the effects of the threshold value.
Hope you can access the data, final script and figure with which I experiment your scripts that I uploaded at:
http://www.originlab.com/ftp/forum_and_kbase/Images/Minimizing.sd.example.opj
Thank you again for your valuable help!!
sincerely julio |
Hideo Fujii |
Posted - 05/09/2017 : 09:34:26 AM Hi Julio,
mucho gusto.
By the way, I am wondering just curiously, how you present the data to be filtered out the outliers in each row. Like a scatter plot? Or something else? If you don't mind, could you please share a sample chart image here?
--Hideo Fujii OriginLab |
coll@inia. |
Posted - 05/08/2017 : 5:54:04 PM Yes it does! The funny thing is that despite the misspelling still worked out!
THANK YOU SO MUCH !
Hasta la vista Hideo, julio |
Hideo Fujii |
Posted - 05/08/2017 : 5:45:56 PM Hi Julio,
1) Sorry for my misspelling of the parameter. I have fixed my post above.
2) As an expression with a "Conditional Operator (? :), the line:abs(ds1[ii]-summean)/sumsd<=zthresh?ds1[ii]:0/0 can be read as a pseudo code:if(((abs(ds1[ii]-summean)/sumsd)<=zthresh)
then return ds1[ii];
else return MISSING; Hope this makes sense.
--Hideo Fujii, OriginLab |
coll@inia. |
Posted - 05/08/2017 : 4:25:20 PM YES!! BEAUTIFUL!!! THANK YOU VERY MUCH!!
Just two more questions to fully understand your script.
first) I do not understand why the remoutliers function 2nd parameter is called zthreash and not threshold? is it a simplication? if it is, why not thresh? what the z is for?
2nd) related to the first What zthreash?ds[ii]:0/0 means? I understand the ds[ii]:0/0 part but the use of zthreast and the ? are meaningless to me!!
Thank you Hideo |
Hideo Fujii |
Posted - 05/08/2017 : 11:10:13 AM Hi Julio,
<Condition>?<Return-value-on-success>:<Return-value-on-failure> is the notation of conditional operator: http://www.originlab.com/doc/LabTalk/guide/Operators#Conditional_Operator_.28.3F:.29
As a missing value(=0/0) is a value, but mask is a state of a cell, you cannot simply replace 0/0 by some value to mask. Instead, you can add a code after setting missings to mask a missing cell's state to masked (as by range<index>=1; ). Here is a modified sample:
////////////////////////////////////////////////////
//Function to Remove Outliers
//////////////////////////////
function dataset remoutliers(dataset ds1, double zthresh) {
dataset ds2;
double ii;
sum(ds1); //collect stats of ds1
double summean=sum.mean; //store mean
double sumsd=sum.SD; //store SD
double nr=ds1.getSize(); //number of rows
for(ii=1; ii<=nr; ii++) {
ds2[ii]=abs(ds1[ii]-summean)/sumsd<=zthresh?ds1[ii]:0/0; //set missing if outlier
}
return ds2;
}
/////////////////// Main //////////////////////////////;
zthreshold=1.0; //Set a proper Z threshold to determine outliers
pa=page.active; //Active sheet#
wtranspose ow:=<new>; //transpose
nc=wks.ncols; //number of columns
for(jj=1; jj<=nc; jj++)
wcol(jj)=remoutliers(wcol(jj),zthreshold); //Replace outliers by NIL
wtranspose ow:=<new>; //retranspose
//////////////////// Mask Missings ///////////////////////
nr=wks.maxrows; //number of rows
nc=wks.ncols; //number of columns
for(ii=1; ii<=nc; ii++) {
range rs=$(pa)!wcol(ii); //column in original sheet
range rc=wcol(ii);
for(jj=1; jj<=nr; jj++) {
if(rc[jj]==0/0) {
rc[jj]=rs[jj]; //recover the data from original
rc<jj>=1; //if cell is missing, mask it
}
}
}
//////////////////////////////////////////////////////////// Could you please to try?
--Hideo Fujii OriginLab |
coll@inia. |
Posted - 05/07/2017 : 12:43:43 PM dear Hideo,
Is it possible to mask the value to be discarted instead of making it NIL 0/0? it would help for a final inspection of which values were removed. perhaps locating the final ds3 on the initial ds1?
I try wcellmask and mark functions but none of them works inside the "<=zthreash?ds1[ii]:0/0;}" final part of the function.
I cannot interpret correctly this final part
Otherwise, the function you kindly sent works perfectly well.
Thanks julio |
coll@inia. |
Posted - 05/07/2017 : 03:22:41 AM dear Hideo, THANK YOU very much for the code!!.
I applied to some data and it is working or at least it eliminated the outliers very well.
However, I cannot follow your code, specially the removeoutliers function. My labtalk is very limited. Could you explain it a little bit adding comentaries? I would like, with my limited knowledge on labtalk, to understand it..............
Thank you again, julio |
Hideo Fujii |
Posted - 05/02/2017 : 1:52:33 PM Hi Julio,
As far as I understood from what you wrote, your procedure doesn't make sense for me. More the number of samples (in a row), larger their standard deviation. So, the procedure would tend to select just two neighbor values which are closest in the row. If the standard error, instead of SD is used, that might be. Or, I may do the following procedure:
1) At each cell in each row, calculate the absolute Z value (=abs((x-mean)/SD)) in the row 2) Mask the value (or set to a missing) when the Z value is larger than a certain preset threshold.
The following code is a sample of the implementation: ////////////////////////////////////////////////////
//Function to Remove Outliers
//////////////////////////////
function dataset remoutliers(dataset ds1, double zthreash) {
dataset ds2;
double ii;
sum(ds1);
double summean=sum.mean;
double sumsd=sum.SD;
double nr=ds1.getSize();
for(ii=1; ii<=nr; ii++) {
ds2[ii]=abs(ds1[ii]-summean)/sumsd<=zthreash?ds1[ii]:0/0; //reset if it's an outlier
}
return ds2;
}
/////////////////// Main //////////////////////////////
zthreshold=1.0; //Set the threshold Z
wtranspose ow:=<new>; //transpose
nc=wks.ncols; //number of columns
for(jj=1; jj<=nc; jj++)
wcol(jj)=remoutliers(wcol(jj),zthreshold); //Replace outliers by NIL
wtranspose ow:=<new>; //retranspose
///////////////////////////////////////////////////////// I hope I didn't misunderstood your idea, and this gives some hint.
--Hideo Fujii OriginLab |
coll@inia. |
Posted - 05/02/2017 : 08:41:21 AM dear Yuki, Thank you for your efforts!
However I do not see how do you chose the mean with the corresponding minimal sd value for each row!
I think I did not explained well enough what I am trying to do, or I do not understand what you sent me!.
I try to explain my objetive better: lets take one single row of 5 columns. If I calculate the mean1+2+3+4+5 & sd1+2+3+4+5 (n=5), I would take account of all the column values. That will probably result in the maximal sd value in most rows.
I will calculate then the mean again but by masking now the value in the first colum to obtain the mean2+3+4+5 & sd2+3+4+5 (n=4).
next I will mask the column value 2 and calculate the mean1+3+4+5 & sd1+3+4+5 (n=4).
and mask all the column combinations masking one at the time for every column (n=4).
Then I will mask 2 columns: for instance mean3+4+5 & sd3+4+5 (n=3), mean 1+4+5 & sd 1+4+5 (n=3), etc......... again all the combinations of columns masking 2 every time and therfore n=3.
Then I will mask 3 columns (.... leaving at least 2 columns to make a mean & sd (n=2)
After having calculated all the mena+sd of all possible combinations of column values except at least two, I will chose the combination with the lower sd and write their mean+sd (n=x) in an empty column.
I need to make those calculations again and again for every single row to make the corresponding graphic with the X axis of rows and the Y axes with the corresponding mean + minimal sd (n=x) values.
In this example, it will be a way to minimize the sd of 5 values for each row masking 1, 2 or 3 values.
of course it gets more complicated when having 10-12 columns!
I am not sure if I explained better but I think I mtself got clearer what I want to do thanks to you !
Please let me know if that is what you did and if so please explain it a little bit more
thanks again, julio |
yuki_wu |
Posted - 05/02/2017 : 02:45:03 AM Hi Julio,
Did you want to calculate the Mean or SD for every row? I wrote a simple example that may help:
//Transpose worksheet,
//and then output to a new worksheet
//so that you can calculate the Mean for every column instead of every row
wtranspose ow:=<new>;
//Get the number of column
//which is the same with the row number of original data
int ncols = wks.ncols;
dataset ds;
loop(ii, 1, ncols)
{
ds[ii] = mean(col($(ii)));
}
wks.addcol();
col($(ncols+1))[L]$ = Mean;
//Output the result
col($(ncols+1))=ds;
Yuki OriginLab
|
|
|