GPU-Accelerated Beat Detection for Dancing Monkeys - UPenn CIS565 Spring 2012 Final Project
Tuesday, April 24, 2012
Here are the final results. They weren't what we expected, but we achieved a 10x speedup through non-GPU means anyway.
Paper: https://github.com/Keripo/DancingMonkeysAccelerated/raw/master/final/paper.docx
Slides: https://github.com/Keripo/DancingMonkeysAccelerated/raw/master/final/presentation.pptx
Video: http://beatsportable.com/static/misc/cis565/DancingMonkeysAccelerated/final/presentation_demo-ver.wmv
Source: https://github.com/Keripo/DancingMonkeysAccelerated
Sunday, April 22, 2012
Jacket Install and Experiments
What is Jacket?
http://www.accelereyes.com/jacket_tour?idx=0
A problem with the install:
When installing the 64-bit Jacket on a 64-bit OS with the 32-bit MATLAB Student Version (the Student Version does not offer a 64-bit build for Windows), Jacket will not run correctly. MATLAB calls Jacket's mexw32 files, which are built against a 32-bit OS's NVIDIA drivers, so an error occurs and the program fails.
How do we solve this?
According to Jacket's wiki, the files in "<Jacket_Root>/engine/bin" can be used to overwrite the default dll/mexw32 files so that they work correctly with 32-bit MATLAB on a 64-bit OS.
However, in the current version of Jacket, the "bin" folder is missing after installation.
Where is it?
I traced Jacket's install wizard step by step and found something interesting.
During installation, the wizard actually places folders called "bin" and "bin64" under the "engine" folder.
But they are deleted at the end of the install for unknown reasons.
So, I stopped the install wizard and rescued the "bin" folder from its doom. Then I used everything in it to overwrite everything under the "engine" folder.
Finally, it seems to work.
Experiments
Jacket provides a powerful "gfor" construct. It can be used much like MATLAB's "parfor".
However, not all code works well with gfor.
We still need to copy memory between the GPU and the CPU.
There are two ways to handle those memory copies (see the sketch after this list):
1. Initialize data directly on the GPU by using "gzeros", "gdouble", "gint32", etc. instead of MATLAB's default data types.
2. Use "LOCAL" in the "gfor" to give each kernel its own copy of some CPU data. However, performance seems to be terrible if we send a large array to each kernel as "local" data.
Some other problems:
Logical indexing cannot be used; we work around it by rewriting the code with masks (see the sketch below).
Subscripting into CPU variables fails the program because there is no direct conversion from gfor's GPU data to CPU data; we use "LOCAL" for now.
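As an example of the mask workaround mentioned above (again an illustrative sketch with made-up variables, not the project's actual code), a logical-indexing assignment can be rewritten as element-wise arithmetic with a 0/1 mask:

% Logical indexing (not usable inside gfor):
%   x(x < threshold) = 0;
% Equivalent mask-based form:
threshold = 0.5;
x = rand(1, 100);          % stand-in data
mask = (x >= threshold);   % 1 where the value is kept, 0 where it is zeroed
x = x .* mask;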
After handling all of these, we successfully converted the first loop into a "gfor" loop.
Result
However, the timing is slower than with "parfor".
I suspect this is because the "LOCAL" data sent to each kernel is too large.
Sunday, April 1, 2012
CPU Parallel Computing via parfor
To have a second baseline to compare to, we modified the code to execute the BPM tests and fit checks in parallel via MATLAB's parfor. This will effectively divide the loop among a pool of workers defined by matlabpool (8 for me since my Dell XPS 17 has 8 hyperthreaded CPU cores). The result was roughly a 2.4x speedup (77s vs 186s). Comparison charts and output below. Next up, experiments with GPU acceleration!
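For reference, here is a minimal sketch of the kind of parfor conversion involved. The names testBpmFit and smoothedWaveform are hypothetical placeholders (with dummy stand-in definitions so the sketch runs), not the actual names used in DancingMonkeys_parfor.m.

% Minimal parfor sketch with placeholder names - not the repo's actual code.
matlabpool('open', 8);                          % one worker per logical core

smoothedWaveform = rand(1, 44100 * 30);         % stand-in "smoothed waveform"
testBpmFit = @(w, bpm) abs(bpm - 137) + rand(); % dummy fit-error function

bpmCandidates = 90:0.5:200;                     % assumed brute-force range/step
fitError = zeros(size(bpmCandidates));

parfor i = 1:length(bpmCandidates)
    % each worker scores one candidate BPM independently
    fitError(i) = testBpmFit(smoothedWaveform, bpmCandidates(i));
end

[~, best] = min(fitError);
bestBpm = bpmCandidates(best);
matlabpool('close');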
Update: Re-ran the tests with different matlabpool Worker counts (graph below). It turns out that (for my computer at least) a Worker count of 6 yielded the shortest total time (1 Worker = 1 CPU core = 1 dedicated MATLAB thread). For timeTest, 4 threads gave the best result. This is probably due to the multi-threading overhead catching up after 4 threads; dividing the task from 1 to 2 threads and from 2 to 4 threads halved the run time, but after 4, the times increased linearly. For timeFit, however, 8 threads gave the best result. The timeFit pattern isn't always a perfect fit, but it does decrease roughly linearly. The deviating behaviour can be explained by the fact that not every iteration in the fit-checking loop has the same runtime - some iterations end prematurely, while those that don't undergo various size-dependent operations, including sorting. Without more than 8 cores, however, it is hard to tell the exact timeFit trend and at what point the returns are balanced by the threading overhead.
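A sketch of how such a worker-count sweep might be scripted (the call to DancingMonkeys_parfor uses placeholder arguments; the actual entry point and test harness in the repo may differ):

% Hypothetical worker-count sweep - illustrative only.
workerCounts = [1 2 4 6 8];
totalTimes = zeros(size(workerCounts));
for w = 1:length(workerCounts)
    matlabpool('open', workerCounts(w));
    tic;
    DancingMonkeys_parfor('smooooch.mp3');      % placeholder call signature
    totalTimes(w) = toc;
    matlabpool('close');
    fprintf('%d workers: %.1f s\n', workerCounts(w), totalTimes(w));
end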
Complete output of running DancingMonkeys_parfor.m with "smooooch・∀・"
Timing comparisons between the original base program and the modified program using parfor
Timing comparisons with different CPU core counts
Monday, March 19, 2012
It works!
Current working setup now on git; code analysis time!
https://github.com/Keripo/DancingMonkeysAccelerated
Wednesday, March 14, 2012
Git for Windows with TortoiseGit and GitHub
For this project, we are required to use Git for version control and host our code on GitHub. As someone who has always preferred SVN (for its linear/incremental nature) and has only had experience with Google Code and SourceForge, I found setting up a new GitHub repo to be a new experience. And so, for future reference, I document the process here. In this tutorial, we install msysgit and TortoiseGit and host code on GitHub.
Installing TortoiseGit:
http://code.google.com/p/tortoisegit/ (yes, it's hosted on Google Code)
TortoiseGit is a shell extension for Windows Explorer that allows for easy management of Git projects. It's a port of the popular TortoiseSVN project, but for Git. The first step is installing Git for Windows (msysgit), which will also give you a nice, simplistic bash shell that you can work with (and execute git commands just like on Linux, if you'd prefer that over TortoiseGit). I personally just downloaded and installed Git 1.7.9 with all default settings. Next was installing TortoiseGit itself, which should autodetect the installed Git setup (though you can configure it later if needed) and add itself to the Windows Explorer right-click context menu. If it all worked nicely, you should see something like this:
Windows Explorer context menu with TortoiseGit (and TortoiseSVN) installed
Generating your public key:
To associate your computer with GitHub, you will need a "public key/private key" pair. You could do this entire process manually with the bash shell, but TortoiseGit gives you a handy GUI tool for it: Puttygen (PuTTY Key Generator). Run Puttygen via your Start menu: "Start -> Programs -> TortoiseGit -> Puttygen". Click "Generate", move your mouse around the blank box for a bit, and wait for the box to fill with your newly generated key. Copy the entire text in the box (right-click -> Select All, right-click -> Copy). This is your public key. Then click "Save private key" and choose a safe place to store the .ppk file (e.g. "TortoiseGit.ppk"), passphrase optional. This file stores your private key.
Run PuTTY Key Generator, copy the public key to your clipboard, and save your private key
Adding your private key to TortoiseGit:
Next you want to make your private key accessible to TortoiseGit for authenticating with GitHub. For this, we'll manage our key list with Pageant: "Start -> TortoiseGit -> Pageant". This will add an icon to your taskbar (see picture below). Right-click the icon and select "View Keys". Click "Add Keys" and navigate to the previously saved PuTTY file that stores your private key (e.g. "D:\Development\Sandbox\TortoiseGit.ppk"). If added properly, you should see the new key added to the list.
Run Pageant and add your private key file.
Registering for GitHub:
https://github.com/signup/free
GitHub is a project hosting website that focuses mainly on Git. At the moment, they offer free public repositories and unlimited public collaborators for open source projects. Registration is pretty straightforward - just sign up for a free personal account.
Make sure to choose "Create a free account"
Adding your public key to GitHub:
https://github.com/settings/ssh
Go to your "Account Settings" on GitHub and go to the "SSH Keys" section. Click "Add SSH key" and paste your public key into the box. Give the key a descriptive title (it doesn't really matter) and then click "Add key". After confirming your GitHub password, you should see the new key added to your "SSH Keys" list.
Add your generated public key to GitHub
Creating a GitHub repository:
https://github.com/repositories/new
Go to your GitHub profile page (click your username on the top right) and click "New repository". Proceed to fill out the project information. Note that the "Project Name" becomes the repo's name, so you may want to use a clean, space-less name (e.g. "DancingMonkeysAccelerated" for this project). After clicking "Create repository", you'll be presented with a list of commands to execute to create your first commit. Since we'll be using TortoiseGit instead of a command line to manage files, you can ignore these commands.
Go to your profile page and click "New repository"
Copy the URL for your Git repository (you can ignore the rest of the page)
Creating your local repo:
Now that the project has been created on GitHub, it's time to create it on your computer! The GitHub page will give you bash commands for creating the new repo, but we'll use TortoiseGit instead. Navigate to the folder/drive where you want your code to be stored (e.g. "D:\Development\Sandbox"). In that folder, right-click to open up the Windows Explorer context menu and select "Git clone...". In the popup box, copy-paste the Git repo URL from GitHub (see above picture) and click "OK". Since you've already set up the authentication earlier and it's an empty repository, it should be quick (the log will warn you about cloning an empty repo though). If a new folder has been created with the name of your project (e.g. "D:\Development\Sandbox\DancingMonkeysAccelerated"), the clone was successful!
Clone your Git project to your local computer
What you have now is a clone of a branch of your repository. Your current branch is probably the master branch, assuming you just created a clean project. Go ahead and add your files and code to the folder. Once you're happy with your code/changes, it's time to commit them. You can do so by right-clicking to open up the context menu in the clone's folder and selecting "Git Commit -> "master"...". This will open up a window that lets you add a change message and select which changes you want to commit. If it's your first commit, you will want to check the "Select/Deselect All" checkbox to add all your new files. Click "OK" when you're happy to commit the changes to your branch. Note that unlike SVN, this commit is LOCAL ONLY!
Committing your changes to your local branch
Synchronizing changes with GitHub:
After a few commits, you might want to push your changes to GitHub. Open up the context menu in the clone's folder and select "TortoiseGit -> Push". In the new window, if you also added tags, don't forget to check the "Include Tags" checkbox before clicking "OK". If other project collaborators have pushed out changes, you can pull their changes from GitHub with "TortoiseGit -> Pull".
Pushing your local commits to GitHub
Associating your private key with the Git repo:
To avoid having to manually start up the Pageant daemon every time you want to sync your Git repo with GitHub, you can add your PuTTY key to TortoiseGit's Remote settings. Open up the context menu in the clone's folder and select "TortoiseGit -> Settings". In the new window, select "Git -> Remote" and select your Remote item (in my case, "origin"). Locate and select the .ppk file from earlier for the "Putty Key" field and click "OK". Now you don't have to manually start up Pageant every time you reboot your computer!
Associating your PuTTY key with your Git repo
You're done!
And that's it! If you check your GitHub project page, you should see your latest commits and be able to navigate your code's different branches and tags. For this project ("Dancing Monkeys: Accelerated"), the project page can be found at https://github.com/Keripo/DancingMonkeysAccelerated
It works!
Monday, March 12, 2012
Project Plan (WIP)
Stage 1: Comprehension (1-2 weeks?)
- Read through Dancing Monkeys project report, particularly Chapter 6
- Read through MATLAB code and understand what code corresponds to the different parts of the project
- Locate the specific section(s) of code related to beat detection
- Compile and test MATLAB code on SIGLAB computers with time testing
- Set up GitHub repository and experiment with code synchronization/branching
- Brainstorm GPU acceleration techniques
- Learn MATLAB!
Stage 2: Modification (until all rough implementations complete)
- Implement the various brainstormed GPU techniques
- Document all approaches taken and all benchmark results
- Compare results, select most efficient to work off of
Stage 3: Optimization (until project end date)
- Analyze technique in more detail (why it works, why better than others, etc.)
- Optimize, optimize, optimize!
- Target minimum 5x speedup, ideally 10x speedup
- Clean up as a patch to the original Dancing Monkeys project source code
- Write report with results
- Prepare demo ; )
Sunday, March 11, 2012
Project Proposal
PDF version: http://beatsportable.com/static/monkeys/2012-03-12%20CIS%20565%20Project%20Proposal.pdf
Copy-pasta text:
GPU-Accelerated Beat Detection
for Dancing Monkeys
Philip Peng - CIS 565 Project Proposal
In music-based rhythm games such as Dance Dance Revolution (DDR), a player must accurately perform actions based on visual patterns that match the beat of the background song. These visual patterns are often created manually by the game developers. This manual process can be eliminated through the use of software that uses beat detection algorithms to aid in the automated generation of such visual patterns. The open source program Dancing Monkeys, written by Karl O’Keefe of Imperial College London, employs beat detection to generate DDR-style stepfiles for arbitrary songs [1]. In this project, I propose the modification of the beat detection algorithm in Dancing Monkeys to take advantage of the parallel processing capabilities of the GPU.
Figure 1. Dancing Monkeys system architecture
The details of O’Keefe’s Dancing Monkeys program are outlined in his project report [2]. For the beat detection portion of his project, O’Keefe implements a brute-force algorithm similar to the one described in Will Archer Arentz’s paper, “Beat Extraction from Digital Music” [3]. First, a waveform is generated consisting of the highest peaks from the song’s original waveform. This waveform is then smoothed via a low-pass filter for easier analysis. The BPM (beats per minute) of the waveform is then determined through brute-force comparison tests across the range of 90-200 BPM, checking for the best fit. These comparison and fit tests are then repeated with narrower ranges. The closest fit is determined to be the BPM if it falls within the acceptable error margin.
Figure 2. Arentz’s BPM determination algorithm; Dancing Monkeys uses something similar
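To make the brute-force idea concrete, the following is a rough, simplified sketch of such a search. It is not O’Keefe’s actual implementation: the envelope data and the beatFitError scoring function are placeholders, and the ranges, step sizes, and number of refinement passes are assumptions for illustration.

% Simplified brute-force BPM search - illustrative only.
fs = 44100;
envelope = rand(1, fs * 30);                    % stand-in smoothed peak envelope
beatFitError = @(env, bpm) rand();              % placeholder fit-error function

range = [90 200];                               % coarse BPM search range
step  = 1;
for pass = 1:3                                  % repeat with narrower ranges
    bpms   = range(1):step:range(2);
    errors = arrayfun(@(b) beatFitError(envelope, b), bpms);
    [bestErr, idx] = min(errors);
    bestBpm = bpms(idx);
    range = [bestBpm - step, bestBpm + step];   % narrow around the best fit
    step  = step / 10;                          % finer resolution next pass
end
fprintf('Estimated BPM: %.2f (fit error %.3f)\n', bestBpm, bestErr);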
The beat detection part of Dancing Monkeys is written in MATLAB, compiled to run on the CPU, and linear in execution. Due to the brute-force nature of the algorithm and its multiple passes, the process can be very lengthy, often taking twice the duration of the song itself or more (depending on your CPU). Because the comparison and fit tests across the BPM range being checked are independent events, however, this process can be parallelized. In addition, the test process itself consists of pure arithmetic calculations, which GPUs are well suited for. MATLAB also supports some CUDA kernel integration [4], and previous work has shown over 10x speedups in GPU-accelerated music beat analysis [5]. By modifying the beat detection algorithm to run tests in parallel and on the GPU, it is possible to significantly reduce the time needed to accurately detect the BPM of the loaded song files. This would allow for significantly faster pattern generation times (compounded if the user wants to batch-analyze an entire song library).