Sunday, April 22, 2012

Jacket Install and Experiments

What is Jacket?
http://www.accelereyes.com/jacket_tour?idx=0

A problem with the install:

I installed 64-bit Jacket on a 64-bit OS together with the 32-bit MATLAB student version (the student version is not offered as a 64-bit build for Windows). In this setup Jacket does not run correctly. MATLAB calls Jacket's mexw32 files, which are actually built for the NVIDIA drivers of a 32-bit OS. An error occurs and the program fails.
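
To see which architecture a particular DLL or mexw32 file was built for, you can read the machine field in its PE header. The following is only a rough sketch in Python: the helper names are my own, it only recognizes the two common machine codes, and it assumes the PE header sits within the first 4096 bytes of the file.

```python
import struct

# Machine codes from the PE/COFF file header (only the two common ones).
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # Offset 0x3C of the DOS header stores the position of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 16-bit machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))


def pe_machine_of_file(path) -> str:
    """Hypothetical convenience wrapper: inspect a file on disk."""
    with open(path, "rb") as f:
        return pe_machine(f.read(4096))
```

Calling pe_machine_of_file on a file under the Jacket "engine" folder would then report whether that binary is a 32-bit or a 64-bit build.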

How to solve that?

According to Jacket's wiki, the files in "<Jacket_Root>/engine/bin" can be used to overwrite the default dll/mexw32 files so that they work correctly with a 32-bit MATLAB on a 64-bit OS.
However, in the current version of Jacket, the "bin" folder is missing after the install.

Where is it?

I traced Jacket's install wizard step by step and found something interesting.
During the install, the wizard actually puts two folders called "bin" and "bin64" under the "engine" folder.
But they are deleted at the end of the install, for reasons unknown.
So I stopped the install wizard partway and rescued the "bin" folder from its doom. Then I used everything in it to overwrite the matching files under the "engine" folder.
In the end, it seems to work.
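
The overwrite step can also be scripted, which makes it easy to keep a backup of the files being replaced. This is just a sketch in Python, not part of Jacket: the function name and all three folder arguments are placeholders of mine, and it only copies the files directly inside the rescued folder.

```python
import shutil
from pathlib import Path


def overlay_bin_onto_engine(rescued_bin, engine_dir, backup_dir):
    """Copy every file from the rescued "bin" folder over the "engine" folder.

    Any file that would be overwritten is first saved into backup_dir.
    Returns the names of the files that were copied.
    """
    rescued_bin = Path(rescued_bin)
    engine_dir = Path(engine_dir)
    backup_dir = Path(backup_dir)
    copied = []
    for src in sorted(rescued_bin.iterdir()):
        if not src.is_file():
            continue  # skip subfolders; only top-level files are handled
        dst = engine_dir / src.name
        if dst.exists():
            # Keep the original so the change can be undone later.
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(dst, backup_dir / src.name)
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied
```

For example, overlay_bin_onto_engine(saved_bin, engine, backup) with saved_bin pointing at the rescued copy of "bin" and engine pointing at "<Jacket_Root>/engine".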

Experiments

Jacket provides a powerful "gfor" loop, which can be used much like MATLAB's "parfor".
However, not all code is happy inside gfor.
We also still need to copy memory between the GPU and the CPU.

There are 2 ways to handle those memory copies:
1. Initialize the data directly on the GPU by using "gzeros", "gdouble", "gint32", and so on, instead of MATLAB's default data types.
2. Use "LOCAL" in the "gfor" to give each kernel a copy of some data from the CPU. However, performance seems to be terrible if we send a large array to each kernel as "local" data.

Some other problems:
Logical indexing cannot be used inside gfor, so I rewrote that code to use masks instead.
Subscripting into CPU variables makes the program fail, because there is no direct conversion between gfor's GPU data and CPU data. I am using "LOCAL" for this for now.
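
The mask rewrite mentioned above follows a general pattern: instead of selecting elements with a logical index, multiply by a 0/1 mask and blend in the new value. Here is a small sketch of the idea in plain Python rather than Jacket code. The function names are made up for illustration, and it assumes all values are finite (the blend breaks down for inf or NaN entries).

```python
def assign_with_logical_index(values, threshold, new_value):
    """Reference version: like A(A > threshold) = new_value in MATLAB."""
    return [new_value if v > threshold else v for v in values]


def assign_with_mask(values, threshold, new_value):
    """Same result without selecting elements: blend through a 0/1 mask."""
    # mask is 1.0 where the condition holds and 0.0 everywhere else
    mask = [float(v > threshold) for v in values]
    # keep the old value where mask == 0, take new_value where mask == 1
    return [v * (1.0 - m) + new_value * m for v, m in zip(values, mask)]
```

Every element goes through the same arithmetic regardless of the condition, which is why this form fits a loop where logical indexing is not available.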

After handling all of these, we successfully converted the first loop into a "gfor" loop.

Result
However, the "gfor" version runs slower than the "parfor" version.
My guess is that the "LOCAL" data sent to each kernel is too large.
