Lately I have been experimenting with Microsoft’s Kinect and accessing it from a PC. The Kinect sensor provides a 640×480 color image and a 320×240 depth image, each at 30 frames per second (*). With all of that data, and with all of the potential applications (even above and beyond the obvious gaming on the Xbox 360), the prospect of working with the Kinect certainly generates a large “wow factor” for programming students.
However, it is not exactly easy to get started. One of the most challenging (and most interesting) problems is “skeleton tracking”: given a human body in the field of view of the sensor, how do we turn all of that depth and color information into a representation of the body?
Fortunately, that hard problem is already solved for us onboard the sensor. In addition to the color and depth images, the Kinect API also provides tracking information for “skeletons” in the field of view, consisting of 3D positions of 20 “joints” throughout the body, as shown in the following figure:

The 20 joints that make up a Kinect skeleton.
The next problem is how to take this skeleton tracking information and turn it into a useful visualization. This is where Visual Python (or VPython) comes in. As I have commented before, I think one of the greatest strengths of VPython is the ease with which students can create reasonably polished 3D scenes, without the complexity of window event loops or the need to separate rendering from business logic. There is only business logic; rendering is done under the hood. You simply create/manipulate the objects in the scene, and they appear/move automatically.
My goal was to take a Python wrapper for the Kinect API, and Visual Python, and combine them into a simple Python module that allows students to (1) quickly get up and running with a visualization of skeleton tracking, and (2) make it easier to write their own code to experiment with gesture recognition, interact with other 3D objects in a “virtual world,” etc.
The result is a module called vpykinect; following is example code and some screenshots of using vpykinect to display a tracked skeleton as it stands (or moves) in front of the Kinect sensor:
from visual import *
import vpykinect

# Draw the sensor at the origin, and create an (initially hidden) skeleton.
vpykinect.draw_sensor(frame())
skeleton = vpykinect.Skeleton(frame(visible=False))
while True:
    rate(30)
    # update() returns True while a skeleton is being tracked,
    # so the skeleton is shown only when tracking succeeds.
    skeleton.frame.visible = skeleton.update()

Screenshots of a Kinect skeleton (that’s me!) facing the sensor. All images are rotated and/or zoomed views of the same still scene.
Finally, following are instructions for installing all of the necessary software:
- Install the Kinect SDK 1.0 here
- Install Visual Python here: http://www.vpython.org/
- Install PyKinect 1.0 here: http://pypi.python.org/pypi/pykinect/1.0 (I chose PyKinect simply because it required zero setup. Either libfreenect or OpenNI would work equally well, at least for the Xbox version of the Kinect that they support.)
- Run or import my vpykinect module here: https://sites.google.com/site/erfarmer/downloads
The vpykinect module is currently of the “dumbest thing that could possibly work” variety. For me, it was an experiment in just how little code I had to write to get something useful working. Because of this, it has at least a couple of obvious limitations:
- It does not directly include the constants for indexing into the list of individual joints. These are available in the PyKinect module as JointId (e.g., the position of the right hand is skeleton.joints[vpykinect.JointId.HandRight.value].pos, which is more verbose than it needs to be)… but this could also be an interesting cooperative exercise: for example, students could systematically change the color of individual joints (i.e., spheres) to work out the mapping themselves.
- It does not support tracking multiple skeletons simultaneously. Again, the PyKinect module supports this, but I did not include it here for simplicity.
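For readers who would rather look the mapping up than discover it experimentally, here is a sketch of one such name-to-index table. The ordering below assumes the joint ordering of the Kinect SDK 1.0 JointId enumeration (which is what PyKinect exposes); JOINT_INDEX is a hypothetical helper name, not part of vpykinect or PyKinect:

```python
# Hypothetical name-to-index map, assuming the joint ordering of the
# Kinect SDK 1.0 JointId enumeration (20 joints, indices 0-19).
JOINT_INDEX = {
    'HipCenter': 0,     'Spine': 1,        'ShoulderCenter': 2, 'Head': 3,
    'ShoulderLeft': 4,  'ElbowLeft': 5,    'WristLeft': 6,      'HandLeft': 7,
    'ShoulderRight': 8, 'ElbowRight': 9,   'WristRight': 10,    'HandRight': 11,
    'HipLeft': 12,      'KneeLeft': 13,    'AnkleLeft': 14,     'FootLeft': 15,
    'HipRight': 16,     'KneeRight': 17,   'AnkleRight': 18,    'FootRight': 19,
}

# With such a map, the right-hand position could be written as:
#   skeleton.joints[JOINT_INDEX['HandRight']].pos
```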
(*) Higher resolutions and more capabilities are available, especially with the newer Kinect for Windows sensor, which differs in some respects from the original Kinect for Xbox 360.
Edit: In response to questions from Kristians in the comments, following is another example program demonstrating simple gesture recognition. This program recognizes “waves” of your right hand, where a wave consists of raising your right hand above your shoulder.
from visual import *
import vpykinect

skeleton = vpykinect.Skeleton(frame(visible=False))
raised = False
while True:
    rate(30)
    skeleton.frame.visible = skeleton.update()
    if skeleton.frame.visible:
        # Hard-coded joint indices (see the limitations discussed above).
        right_hand = skeleton.joints[11]
        right_shoulder = skeleton.joints[8]
        spine = skeleton.joints[1]
        if right_hand.y > right_shoulder.y and not raised:
            # Hand just rose above the shoulder: recognize one wave.
            raised = True
            print('Recognized right hand wave.')
        elif right_hand.y < spine.y and raised:
            # Hand dropped below the spine: re-arm for the next wave.
            raised = False
Note the hard-coded indexing into the list of joints, as mentioned in the limitations above. Also, we must “debounce” the recognition of each wave by requiring the hand to be lowered farther than the threshold for raising it; I chose the “spine” joint as a convenient lower threshold.
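This debouncing logic can also be factored out of the rendering loop into a small pure function, which makes the hysteresis easier to test in isolation. The following sketch (update_wave is a hypothetical helper, not part of vpykinect) implements the same raise-above-shoulder, lower-below-spine cycle:

```python
def update_wave(raised, hand_y, shoulder_y, spine_y):
    """Hysteresis for wave recognition.

    Returns (new_raised, recognized): recognized is True only on the
    step where the hand first rises above the shoulder; the hand must
    then drop below the spine before another wave can be recognized.
    """
    if not raised and hand_y > shoulder_y:
        return True, True      # hand just crossed the upper threshold
    if raised and hand_y < spine_y:
        return False, False    # hand dropped below the lower threshold
    return raised, False       # no state change

# A simulated sequence of hand heights (shoulder at 1.0, spine at 0.5);
# each raise-then-lower cycle should count exactly once.
raised, waves = False, 0
for hand_y in [0.4, 0.8, 1.1, 1.2, 0.9, 0.4, 1.1, 0.3]:
    raised, recognized = update_wave(raised, hand_y, 1.0, 0.5)
    waves += recognized
# waves == 2
```

Inside the loop of the gesture example above, the if/elif pair would then collapse to a single call to update_wave with the y-coordinates of the three joints.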