The Origin Forum
 ks2density on large data vectors

mikkomaek

22 Posts

Posted - 09/19/2017 :  05:36:38 AM
Origin Ver. and Service Release (Select Help-->About Origin): 2017 SR2
Operating System: Windows 7 on Parallels 12

Hi,

I'm trying to calculate the kernel densities of a dataset that consists of two data vectors (~1,700,000 rows). The calculation has been slow with smaller datasets, but with this one it won't go through at all. I've assigned half of my Mac's 16 GB of RAM to the virtual machine. Any ideas how to make this work?

The formula:
ks2density(Col(1), Col(2), Col(1), Col(2), wx, wy)

Before script:
double wx, wy;
kernel2width(Col(1), Col(2), wx, wy);
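
For scale, here is a rough count of the work this formula implies. It is only a sketch: it assumes a direct sum in which every output point adds up one kernel term per data row, which the thread does not confirm for Origin's implementation, and the 25 x 25 grid is a hypothetical alternative for comparison.

```python
# Back-of-the-envelope count of kernel terms, ASSUMING a direct sum
# (one kernel term per data row for every output point).
n_rows = 1_700_000  # rows in Col(1)/Col(2), as stated in the post

# Formula as posted: the output points are the data points themselves.
evals_all_points = n_rows * n_rows

# Hypothetical alternative: evaluate on a 25 x 25 grid of output points.
grid_side = 25
evals_grid = grid_side * grid_side * n_rows

print(f"every row as output : {evals_all_points:.3e} kernel terms")
print(f"25 x 25 grid output : {evals_grid:.3e} kernel terms")
print(f"ratio               : {evals_all_points / evals_grid:.0f}x")
```

Under that assumption the work scales with (output points) x (data rows), so the number of output points is what matters most.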

Thanks!
Mikko

arstern

USA
237 Posts

Posted - 09/19/2017 :  09:26:39 AM
Hi,

Could you please e-mail your opj file to Tech Support at tech@originlab.com?

Thanks,
Aviel
OriginLab

Hideo Fujii

USA
1582 Posts

Posted - 09/19/2017 :  09:43:52 AM
Hi Mikko,

> my 16 GB Mac RAM to the virtual machine

Besides the intrinsic issue of your problem, if you are using Parallels, VMware, etc., you can try Apple's
"Boot Camp" dual-boot system, if that is an option in your situation, because Origin should run much faster
on "native" Windows without the virtualization performance penalty (though I suspect that this alone will
not help enough with your 1,700,000-row data).

http://www.originlab.com/index.aspx?go=Support/DocumentationAndHelpCenter/Installation/RunOriginonaMac

--Hideo Fujii
OriginLab

arstern

USA
237 Posts

Posted - 09/20/2017 :  09:39:33 AM
Hi Mikko,

We took a look at your project file and it was very slow to use. Unfortunately, it seems that the slow calculation is to be expected. Improvements have already been made to the performance of Kernel Density plotting, so it seems that with the combination of Parallels and such a large dataset you have reached the limit for plotting a kernel density plot.

Aviel
OriginLab

mikkomaek

22 Posts

Posted - 09/21/2017 :  03:47:45 AM
Hi Aviel,

Thanks for your reply. You wrote the calculation was slow, but did it actually go through? If it did, was it on VM with OriginPro or a Windows computer?

I am ready to wait if I am able to get the plots in the end.

Thanks!
Mikko

mikkomaek

22 Posts

Posted - 09/28/2017 :  02:21:24 AM
Hi Origin Support,

Any reply to this question?

BR
Mikko

quote:
Originally posted by mikkomaek

Hi Aviel,

Thanks for your reply. You wrote the calculation was slow, but did it actually go through? If it did, was it on VM with OriginPro or a Windows computer?

I am ready to wait if I am able to get the plots in the end.

Thanks!
Mikko


Hideo Fujii

USA
1582 Posts

Posted - 09/28/2017 :  5:14:33 PM
Hi Mikko,

I have tried 1,700,000 data points of normal random numbers with various numbers of grids in each direction on my
machine (Intel Core Duo 3.16GHz, 4GB memory).
#Grids    Elapsed Time (min)
   25  =>   0.6
   32  =>   1.1
   50  =>   2.6
   75  =>   6.5
  100  =>  30.2
Based on this, the time cost seems to grow as roughly the 5th power of the number of grids, and it rapidly becomes
too slow with more than 50x50 grids. (By that estimate, #Grids=200 may take ~1000 min.)
So, at least for now, I suggest that you set wx and wy so that the number of grids is 50 or so (and also that you use
a native Windows machine or Boot Camp on the Mac).
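
The scaling estimate can be checked with a few lines of arithmetic. This is only a sketch on the five timings in the table above; the helper names `local_exponent` and `extrapolate` are made up for illustration, and the exponent of 5 used for the extrapolation is the rough estimate from this post, not a measured constant.

```python
import math

# Timings copied from the table above: (grids per direction, minutes).
timings = [(25, 0.6), (32, 1.1), (50, 2.6), (75, 6.5), (100, 30.2)]

def local_exponent(p, q):
    """Exponent k such that time ~ grids**k between two measurements."""
    (n1, t1), (n2, t2) = p, q
    return math.log(t2 / t1) / math.log(n2 / n1)

pairs = list(zip(timings, timings[1:]))
exponents = [local_exponent(a, b) for a, b in pairs]
for (a, b), k in zip(pairs, exponents):
    print(f"{a[0]:>3} -> {b[0]:>3} grids: exponent ~ {k:.1f}")

def extrapolate(n, k=5.0, ref=(100, 30.2)):
    """Scale the reference timing to n grids with an ASSUMED exponent k."""
    n_ref, t_ref = ref
    return t_ref * (n / n_ref) ** k

print(f"200 grids: ~{extrapolate(200):.0f} min")
```

On these measurements only the last interval (75 to 100 grids) gives an exponent near 5; the earlier intervals come out closer to 2, so the 200-grid figure should be read as a rough upper-range guess.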

--Hideo Fujii
OriginLab

mikkomaek

22 Posts

Posted - 10/17/2017 :  02:14:09 AM
Hi Hideo,

Thanks for your reply! Could you write an example of how to adjust the grid size?

BR
Mikko

Hideo Fujii

USA
1582 Posts

Posted - 10/18/2017 :  1:38:53 PM
Hi Mikko,

As described in our documentation for ks2density (http://www.originlab.com/doc/LabTalk/ref/ks2density-func),
your formula, ks2density(Col(1), Col(2), Col(1), Col(2), wx, wy);, calculates the density value at all 1,700,000
points. Unless you need the density value at every point, that sounds like overkill. Instead, you can create
grid XY points in col(3) and col(4) (for example, make a matrix of, say, 25x25 with proper XY ranges, whose
values do not matter, convert it to an XYZ worksheet, then paste the XY columns into col(3) and col(4)). In col(5),
you can then run: ks2density(col(3), col(4), Col(1), Col(2), wx, wy); With this method, it took around 12 minutes.
(In my previous test, I measured the time of 2D kernel density plotting; this way was much faster
at producing the output matrix - very efficient!)
If you do need the density values at the input data points, you can give up on getting all of them and sample
the data to reduce the number of output points.
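
The two options above (a regular grid of output points, or a sample of the input points) can be sketched outside Origin as follows. This is an illustration only, not Origin's implementation: it assumes a bivariate Gaussian product kernel with bandwidths wx and wy evaluated by direct summation, uses small synthetic data in place of Col(1) and Col(2), and the helper names `make_grid` and `gauss_kde2d` and the bandwidth value are hypothetical.

```python
import math
import random

def make_grid(xs, ys, nx=25, ny=25):
    """Return nx*ny (x, y) points spanning the XY range of the data."""
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    gx = [x0 + (x1 - x0) * i / (nx - 1) for i in range(nx)]
    gy = [y0 + (y1 - y0) * j / (ny - 1) for j in range(ny)]
    return [(x, y) for y in gy for x in gx]

def gauss_kde2d(points, xs, ys, wx, wy):
    """Direct-sum Gaussian product-kernel density at each (x, y) in points."""
    norm = 1.0 / (2.0 * math.pi * wx * wy * len(xs))
    out = []
    for px, py in points:
        s = 0.0
        for x, y in zip(xs, ys):
            s += math.exp(-0.5 * (((px - x) / wx) ** 2 + ((py - y) / wy) ** 2))
        out.append(norm * s)
    return out

# Small synthetic stand-in for Col(1) and Col(2).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [rng.gauss(0, 1) for _ in range(2000)]
wx = wy = 0.3  # hypothetical bandwidths, not the output of kernel2width

# Option 1: 25 x 25 grid of output points (the role of col(3), col(4)),
# with the densities playing the role of col(5).
grid = make_grid(xs, ys)
dens = gauss_kde2d(grid, xs, ys, wx, wy)

# Option 2: densities at a random sample of the input points only.
sample_idx = rng.sample(range(len(xs)), 200)
sample_pts = [(xs[i], ys[i]) for i in sample_idx]
sample_dens = gauss_kde2d(sample_pts, xs, ys, wx, wy)

print(len(grid), "grid densities,", len(sample_dens), "sampled densities")
```

In both options all data rows still contribute to every density value; only the number of output points is reduced.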

Hope this suggestion helps.

--Hideo Fujii
OriginLab

Edited by - Hideo Fujii on 10/18/2017 1:44:30 PM

mikkomaek

22 Posts

Posted - 11/06/2017 :  02:25:49 AM
Hi Hideo,

Sorry but I have to bother you once more on the gridding.

I tried it but didn't really succeed in calculating the densities for a specified grid. If I calculate symmetric grid intervals based on the xy-scale and XYZ grid them, I'll only have a matrix with the Z values on numbered rows and columns. I also tried to calculate the densities based on my symmetric grid, but the plotting doesn't really turn out right.

You're right, calculating the densities at 1,700,000 locations is overkill. I can plot the densities based on fewer locations as long as the plot still includes all the points.
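
One generic way to keep every point in the plot while computing densities only on a coarse grid is to interpolate the gridded densities back onto the points. The thread does not say whether Origin offers this, so the following is just a sketch of plain bilinear interpolation; the function name `bilinear` and the toy 3 x 3 grid are made up for illustration.

```python
import bisect

def bilinear(gx, gy, z, x, y):
    """Interpolate z[j][i], defined at (gx[i], gy[j]), at the point (x, y)."""
    # Locate the grid cell; clamp so points on the upper edge use the last cell.
    i = min(max(bisect.bisect_right(gx, x) - 1, 0), len(gx) - 2)
    j = min(max(bisect.bisect_right(gy, y) - 1, 0), len(gy) - 2)
    tx = (x - gx[i]) / (gx[i + 1] - gx[i])
    ty = (y - gy[j]) / (gy[j + 1] - gy[j])
    low = z[j][i] * (1 - tx) + z[j][i + 1] * tx
    high = z[j + 1][i] * (1 - tx) + z[j + 1][i + 1] * tx
    return low * (1 - ty) + high * ty

# Toy grid with z = x + 2*y, which bilinear interpolation reproduces exactly.
gx = [0.0, 1.0, 2.0]
gy = [0.0, 1.0, 2.0]
z = [[x + 2 * y for x in gx] for y in gy]

points = [(0.5, 0.5), (1.25, 0.75), (2.0, 2.0)]
values = [bilinear(gx, gy, z, x, y) for x, y in points]
print(values)
```

With real data, z would hold the densities computed at the grid points and the interpolated values could be used to color all of the original points.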

Thanks!
Mikko
The Origin Forum © 2020 Originlab Corporation