vasiukov
Italy
3 Posts |
Posted - 12/02/2022 : 06:56:51 AM
|
Origin Ver. and Service Release (Select Help-->About Origin): OriginPro 2018 (64-bit) SR1
Operating System: Windows 10 (64-bit)
Dear Colleagues,
I need to understand how the program determines the maximum and minimum value of the x-axis and the bin size when drawing a histogram in automatic binning mode. Who knows the algorithm?
Why does automatic binning give a more extensive range than the max-min values in the dataset? 1st example: there is a dataset [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] (all points are unique). The minimum value is -1, the maximum value is 10. However, automatic binning builds a histogram from -3 to 12 with a bin size of 3.
2nd example: there is a dataset [-10, -5, 0, 0, 11, 5, 0, 2, 1, -1, 3, -2, 0, 10]. The minimum value is -10, the maximum value is 11. However, automatic binning builds a histogram from -15 to 15 with a bin size of 5.
Why? What determines the range and bin size?
I really appreciate any help you can provide.
|
snowli
USA
1426 Posts |
Posted - 12/09/2022 : 10:44:12 AM
|
Hello, see this page for how the auto bin size is decided:
https://www.originlab.com/doc/origin-help/create-histogram
Programming Notes: Bin size and number are controllable via these system variables:
To set bin size, @HBS = value; value = -1 if not set.
To set bin number, @HBN = value; rounding is used and value = 0 if not set.
To force bin number, @HBM = value; value may be a non-integer (rounding is not used).
Priority sequence: @HBS > @HBN > @HBM
If neither @HBS nor @HBN is specified, @HBF = value will determine the bin number automatically as per the following expression:
number of bins = 1 + nint(value * log10(npts))
I suppose you didn't set any system variable, so we can run @hbf= in the Script window to find that the value is 4. For your dataset, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, the number of points is 11. Run the following in the Script window (log() is base 10 in LabTalk):
1+nint(4*log(11))= //press Enter
It gives a bin number of 5.
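For readers without Origin at hand, the expression above can be checked with a minimal Python sketch. The helper names `nint` and `auto_bin_count` are made up for this illustration, and rounding halves upward is an assumption about how LabTalk's nint() behaves; it does not matter for this dataset.

```python
import math


def nint(x):
    # Stand-in for LabTalk's nint(): round to the nearest integer.
    # Assumption: halves round upward (not confirmed in this thread).
    return math.floor(x + 0.5)


def auto_bin_count(npts, hbf=4):
    # number of bins = 1 + nint(@HBF * log10(npts)), with @HBF = 4 by default
    return 1 + nint(hbf * math.log10(npts))


print(auto_bin_count(11))  # 11 points, as in the first example dataset
```

With 11 points, 4 * log10(11) is about 4.17, which rounds to 4, so the result is 5.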
And usually when plotting graphs in Origin, we leave some margin %8 so data will not show at the axis frame, and we try to find a good bin start and end that fit all these requirements, so you get -3 and 12.
You can uncheck the auto bin option to customize your bins.
Or, if you have uneven bins, etc., you can use Statistics: Descriptive Statistics: Frequency Count and then plot a column/bar plot from the result.
Thanks, Snow
|
 |
|
vasiukov
Italy
3 Posts |
Posted - 12/16/2022 : 04:48:32 AM
|
Dear Snow,
Thanks for your reply. Indeed, the formula for calculating the bin size clears things up a bit. However, I still do not understand how to determine the minimum and maximum values of the scale. Please clarify this part of your comment: "... we leave some margin %8 so data will not show...". What exactly does "%8" mean? Is it 8 percent of max-min?
Sergii
|
 |
|
snowli
USA
1426 Posts |
Posted - 12/16/2022 : 09:09:26 AM
|
Hi Sergii,
Here is some doc I found: https://www.originlab.com/doc/Origin-Help/AxesRef-Scale?f=dl#Rescale_Margin.28.25.29
It's hard to explain how we calculate it, since we also need to consider the tick increment, etc., to get good values to show for ticks.
It looks like, for line/scatter plots, the default is @RRT=4 while our Rescale Margin default is 8%, so it auto-calculates the axis begin and end. If you set the rescale margin to e.g. 3% (smaller than @RRT), it will add 3% * (max-min) as padding. If you set the margin to 0%, the axis should run from the min to the max of the data.
For histograms, column/bar plots, and box charts, we need to add some padding anyway so the full bar or box can show.
E.g. for your case, do you want bars centered at -1, 0, 1, ... or do you want bars between -1, 0, 1, ...? If you want bars centered at -1, the axis will need to start at maybe -2 or -1.5.
I will check with the developer how we decide bin begin, end, etc., since for your case I am not sure why it picks -3 as the start instead of -2.
I may be wrong about histograms, since when I play with example 1, the axis begins and ends at the auto-calculated bin begin and end instead of adding a margin.
But if I have a lot of data from 0 to 100 and I set bin begin and end to 0 and 110, then when I rescale the axis it still uses 0 to 120, so again it was doing some auto margin.
Thanks, Snow
 |
|
Echo_Chu
China
Posts |
Posted - 12/19/2022 : 06:09:20 AM
|
Hi, Sergii
We first find the min (minimum value), max (maximum value) and count (data size) from the dataset, then calculate the bins, size, min and max again from the equations below:
bins = int(4*log(count))+1;
size = int((max - min)/bins)+1;
min1 = size * int(min/size) - size;
max1 = size * int(max/size) + size;
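The equations above can be sketched in Python to check them against the two datasets from the original question. This is only an illustration, not OriginLab's implementation: the function name `auto_bin_range` is made up here, log() is taken to be base 10 as in LabTalk, and int() is assumed to truncate toward zero (Python's int() does the same).

```python
import math


def auto_bin_range(data, factor=4):
    """Return (bins, size, begin, end) following the equations in this post."""
    count = len(data)
    lo, hi = min(data), max(data)
    # bins = int(4*log(count)) + 1, with log taken as base 10
    bins = int(factor * math.log10(count)) + 1
    # size = int((max - min)/bins) + 1
    size = int((hi - lo) / bins) + 1
    # min1 and max1: truncate to a multiple of size, then pad by one bin
    begin = size * int(lo / size) - size
    end = size * int(hi / size) + size
    return bins, size, begin, end


example1 = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
example2 = [-10, -5, 0, 0, 11, 5, 0, 2, 1, -1, 3, -2, 0, 10]

print(auto_bin_range(example1))  # (5, 3, -3, 12)
print(auto_bin_range(example2))  # (5, 5, -15, 15)
```

Both results agree with what was reported at the top of the thread: -3 to 12 with a bin size of 3, and -15 to 15 with a bin size of 5.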
Echo
OriginLab Technical Service
 |
|
vasiukov
Italy
3 Posts |
Posted - 12/19/2022 : 7:11:51 PM
|
Dear Snow and Echo,
Thanks a lot! Looks like you gave me a complete explanation. So, I can continue my work now!
Merry Christmas and Happy New Year!
Sergii
 |
|
snowli
USA
1426 Posts |
Posted - 12/22/2022 : 10:12:04 AM
|
You are very welcome, Sergii.
I also suggested that our developers improve this, since I can see the confusion.
I guess for users who have a lot of data, it doesn't matter that much where the axis begin, end, and bin centers are. But for users with less data, it is important.
In Origin 2023, when the user clicks on a histogram, there is a mini toolbar with buttons to increase/decrease the number of bins. I am suggesting that they add a button to open a dialog to set bin begin, end, increment, etc. (ORG-26197).
Hope it will be implemented in a coming version.
Thanks, Snow
 |
|
snowli
USA
1426 Posts |
Posted - 05/09/2023 : 10:08:02 AM
|
Hi Sergii, just FYI: in Origin 2023b, which we just released, clicking on a histogram shows a new button on the mini toolbar for Bin Settings, so the user doesn't need to go to Plot Details, find the tab, and edit there.
Thanks, Snow
 |
|