Figure 2.2: Paidia (free play) and ludus (game) as two extremes in the spectrum of play.

Figure 2.3: Fun, pleasure and fun as overlapping components of enjoyment.

In her 2007 article, Costello describes the development of the framework, consisting of thirteen pleasure categories of play, based on the work of six theorists [26]. The categories of the pleasure framework and the concepts they build upon can be seen in Figure 2.4, and a thorough examination of the work of the theorists and its relation to the framework is the subject of Costello and Edmonds' article from 2009 [24]. The theorists in question were the philosophers Karl Groos and Roger Caillois, the psychologists Mihaly Csikszentmihalyi and Michael Apter, and the game designers Pierre Garneau and Marc LeBlanc.

Before describing our design process, it is necessary to clarify the intentions of our design. These intentions were developed over time through a string of discussions and brainstorming sessions ((1) and (2) in Figure 4.1), resulting in the goals stated in the introduction (Section 1.1.1).

Figure 4.2: Output from the two visual sensors on the Kinect. (1) Depth camera. (2) RGB camera.

Figure 4.3: A top-down view of the approximate area a Kinect sensor can cover along the X- and Z-axes.

Figure 4.4: A simple Processing program that draws a red circle. (1) The source code in the Processing development environment. (2) The application window of the running program.

"Processing is a programming language, development environment, and online community that since 2001 has promoted software literacy within the visual arts. Initially created to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing quickly developed into a tool for creating finished professional work as well." (http://processing.org/about/)
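The source code in Figure 4.4 is not reproduced in the text, but a minimal Processing sketch of the kind the figure depicts could look roughly like this (the window size and colour values are illustrative and may differ from those in the figure):

    // Minimal Processing sketch that draws a red circle,
    // along the lines of the program depicted in Figure 4.4.
    void setup() {
      size(400, 400);                            // open a 400 x 400 pixel window
    }

    void draw() {
      background(255);                           // white background
      noStroke();
      fill(255, 0, 0);                           // red fill colour
      ellipse(width / 2, height / 2, 200, 200);  // circle centred in the window
    }

That a complete, runnable graphical program fits in a handful of lines is also a good illustration of why Processing lends itself to rapid visual sketching.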
Although we were able to communicate between the two programs, it quickly dawned on us that this approach would severely complicate the design process and introduce a range of potential communication errors between the systems. It would also be much harder to divide the workload between us, and we would have to share much of the same codebase. This would demand a much tighter cooperation on a micro level, and we would end up spending a lot of time on integrating the audio and visual parts of the code instead of focusing on concept work and exciting functionality. Furthermore, the tools and programming languages best suited for audio differ fundamentally from the ones suited for complex graphics. Last, but not least, running both audio and visuals on the same computer would demand considerably more powerful hardware, with plenty of processing power.

Two computers, two Kinects

On the other hand, by running audio and visuals from separate computers, we would in effect be building two separate installations. Each installation would use a dedicated computer connected to a Kinect and would output its feedback (audio or visuals) independently of the other. This option left us both free to explore more openly and to build our separate code according to our specific needs. The challenge for us would be to ensure that the two installations would play 'in harmony' and be perceived by the user as one single system.

Interference

But a more serious issue was the trouble of interference. Using two Kinects in the same room introduces interference between the IR signals of the two devices. This degrades both signals, making the tracking data less reliable and causing the tracked joint positions to 'jump' at irregular intervals. As this was still quite early in the exploration process, it was difficult for us to determine how serious the problem would become, but after doing some research we found a couple of articles [45, 56] describing a low-tech remedy: by introducing a small amount of movement, or vibration, to one of the Kinects, the interference can be significantly reduced. This vibration can be achieved by mounting a small electrical motor with variable speed on one of the Kinects. With a small weight mounted slightly off-centre on the propeller shaft of the motor, engaging the motor introduces small vibrations to the Kinect. The speed of the motor is used to tune the vibrations (approximately) to the framerate of the Kinect, ensuring maximum effect. We used a processor fan from a PC as the electrical motor, with a piece of tape as an offset weight on one of the rotor blades, and had some help assembling an ad-hoc voltage regulator acting as a speed control (Figure 4.7). By strapping the fan to one Kinect and then running both Kinects at the same time, we could switch the motor on and off and watch the degradation come and go in the depth images of both Kinects (Figure 4.6).

RGB and depth image finger drawing

This prototype displays an image from each of the Kinect's two sensors (Figure 4.9): the depth sensor image (1) and the RGB camera image (2). It enables a user to draw a line on top of the depth camera image with the point closest to the sensor in the returned depth data. The line drawn varies in colour according to the distance from the sensor to that closest point.

Point cloud and button in space

The aim of this prototype was to try combining data from the depth sensor and the RGB camera to create a visualisation of what the Kinect actually sees in 3D space, while also testing out very simplistic interaction methods.

Stickman skeleton

A potential problem we experienced with virtual buttons as three-dimensional interface elements is that they can be quite hard to hit. They require you to look at the screen instead of the world around you, even though the button's position is actually in real-world space. The experience can be compared to picking up an object aided only by its reflection in a mirror, where you have no eye-hand coordination to help you. A partial solution could be to aid users with an on-screen representation of themselves, or at least of their hands, in the way a mouse pointer does. This would solve the problem for 2D interfaces on the X and Y axes, but the issue of depth would still be present for 3D interfaces. This is definitely a challenge when mirroring or mimicking the real, three-dimensional world on a two-dimensional screen.

The skeleton prototype was a quick implementation of an example in Borenstein's book [48]. In all its simplicity, it enables full skeleton tracking of a human being. It finds all the joints trackable by the OpenNI driver and draws the limbs joining them on top of a depth image. When the driver is set to use skeleton tracking, it returns a list of users with points corresponding to the users' joint positions for every update cycle.
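The skeleton prototype itself is not reproduced in the thesis text, but a stripped-down sketch in the same spirit could look like the following. It assumes the SimpleOpenNI wrapper of the era of Borenstein's book; method names, constants and the calibration callbacks differ between library versions, so this is an illustration of the approach rather than the prototype's actual code.

    import SimpleOpenNI.*;

    SimpleOpenNI kinect;

    void setup() {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();                              // depth image used as the backdrop
      kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);  // ask the driver for full skeleton data
    }

    void draw() {
      kinect.update();
      image(kinect.depthImage(), 0, 0);                  // draw the depth image first

      IntVector userList = new IntVector();              // IDs of all users the driver currently sees
      kinect.getUsers(userList);
      for (int i = 0; i < userList.size(); i++) {
        int userId = (int) userList.get(i);
        if (kinect.isTrackingSkeleton(userId)) {
          stroke(255, 0, 0);
          // Two limbs shown here; the prototype joins every joint pair OpenNI can track.
          drawLimbBetween(userId, SimpleOpenNI.SKEL_LEFT_SHOULDER, SimpleOpenNI.SKEL_LEFT_ELBOW);
          drawLimbBetween(userId, SimpleOpenNI.SKEL_LEFT_ELBOW, SimpleOpenNI.SKEL_LEFT_HAND);
        }
      }
    }

    // Look up two joints in real-world coordinates, project them onto the
    // screen and draw the limb connecting them on top of the depth image.
    void drawLimbBetween(int userId, int jointA, int jointB) {
      PVector a = new PVector(), b = new PVector();
      kinect.getJointPositionSkeleton(userId, jointA, a);
      kinect.getJointPositionSkeleton(userId, jointB, b);
      PVector pa = new PVector(), pb = new PVector();
      kinect.convertRealWorldToProjective(a, pa);
      kinect.convertRealWorldToProjective(b, pb);
      line(pa.x, pa.y, pb.x, pb.y);
    }

    // Calibration callbacks for older SimpleOpenNI versions, where the user
    // must strike the 'Psi' pose before skeleton tracking starts. Newer
    // versions calibrate automatically and use different callback signatures.
    void onNewUser(int userId) {
      kinect.startPoseDetection("Psi", userId);
    }

    void onStartPose(String pose, int userId) {
      kinect.stopPoseDetection(userId);
      kinect.requestCalibrationSkeleton(userId, true);
    }

    void onEndCalibration(int userId, boolean successful) {
      if (successful) {
        kinect.startTrackingSkeleton(userId);
      } else {
        kinect.startPoseDetection("Psi", userId);
      }
    }

Note that the joint positions arrive in real-world (millimetre) coordinates and only become screen positions after the projective conversion, which is the flattening of depth discussed above in connection with virtual buttons.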
These positions are a lot more convenient to work with than the raw depth data mentioned earlier, as it is no longer necessary to parse and interpret the complete data set; OpenNI does this job for you. This is the main way of tracking users in the final prototype.

Figure 4.12: Four screenshots of the red line drawing prototype with an on-screen user representation.

Figure 4.13: Four screenshots of the red line drawing prototype with an on-screen user representation.

Figure 4.15: The screen of the installation when in idle mode.

Figure 4.17: Installation in active mode with one user (A), both hands active (B).

When there were no users currently interacting with the installation and a user walked into the installation's field of view, music faded in, the eye stared straight ahead, and it and the triangle turned bright red while greeting the user with a text at the top of the screen saying "HELLO HUMAN, MOVE ABOUT" (Figure 4.16). This meant a user had been calibrated and the installation went into active mode.

Figure 4.18: A top-down view of the mode sections in the Kinect's field of view.

Figure 4.21: A user interacts with the 'Trilling spiderweb' mode.

'Twittering bubbles' mode

When active, the middle mode consisted of an abstract sustained twittering sound accompanied by small particles, or bubbles, of variable size and colour, which were generated as a user's hands crossed the previously mentioned threshold. The sounds lasted for 8 seconds each and were layered on top of each other as the user retriggered new notes by moving their hands. The on-screen visuals were part of a particle system controlled by a physics engine, and the position of the user's hand functioned both as an emitter and as a point of attraction for all particles currently visible on screen. This was the case regardless of which user had created them, so all active users could interact with and affect all visible particles on screen.

'Trilling spiderweb' mode

The mode furthest away from the sensors functioned much in the same way as the middle mode, in that the visual system was based on a particle system affected by a physics engine and the users' hand positions. Visually, the particles were small and of equal size and were connected by lines, forming a sort of organically moving spiderweb. The sounds accompanying the visuals can be thought of as a trilling sound ascending in pitch, almost like dragging a mallet from left to right across the keys of a xylophone. Each triggered sound lasted for 2 seconds.

Figure 4.23: Examples of background colour generated by user position.

Figure 4.26: Process sketches of possible grid solutions, both visual and rule-based ideas.

Hand triggers, threshold

Figure 4.30: Visual ideas. (1) Early interface thumbnail sketches. (2) Digital interface mock-up from thumbnails. (3) Refined interface mock-up.

Establishing a consistent visual expression

The visual expression we went for was to a great extent dictated by the limitations of the performance and tools of Processing, as mentioned in section 4.2.4. With these limitations in mind, we decided to go for a very clean and simplistic expression, focusing on heavy use of colour and movement instead of complex graphics. We were heavily inspired by computer games of the 1980s, where the graphic expression was also limited by the comparatively weak hardware and software of the time.

Screen as a canvas

The rest of the screen real estate was intended to act as a blank canvas for the visual behaviours triggered by the participants. To give the canvas a feeling of depth and space, we created a background texture consisting of white triangular polygons with different transparency values overlaying a flat background colour ((3) Figure 4.30). In this way we could change the background colour easily and keep the illusion of space created by the texture, without changing the texture itself. The background colour itself was generated based on a participant's torso position in the grid, or the combination of all positions if several participants were present in the installation.
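The text does not specify the exact mapping from grid position to colour, so the following sketch only illustrates the general idea. The grid size, the HSB mapping and the simple averaging of several participants are assumptions made for the example, and the mouse stands in for tracked torso positions.

    int gridCols = 3;   // illustrative grid size, not the installation's actual grid
    int gridRows = 3;

    void setup() {
      size(640, 480);
      colorMode(HSB, 360, 100, 100);
    }

    void draw() {
      // Stand-in for tracked torso positions: the mouse position mapped into
      // grid cells. In the installation these would come from the Kinect
      // skeleton data of every calibrated participant.
      ArrayList<PVector> torsoCells = new ArrayList<PVector>();
      torsoCells.add(new PVector(map(mouseX, 0, width, 0, gridCols - 1),
                                 map(mouseY, 0, height, 0, gridRows - 1)));
      background(backgroundFromTorsos(torsoCells));
    }

    // Combine the grid positions of all participants into one colour:
    // the column picks the hue, the row picks the brightness, and
    // several participants are simply averaged.
    color backgroundFromTorsos(ArrayList<PVector> cells) {
      float hueSum = 0;
      float briSum = 0;
      for (PVector cell : cells) {
        hueSum += map(cell.x, 0, gridCols - 1, 0, 360);
        briSum += map(cell.y, 0, gridRows - 1, 40, 100);
      }
      int n = max(1, cells.size());
      return color(hueSum / n, 80, briSum / n);
    }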
Defining behaviour of the modes

To identify suitable behaviours for the modes of the installation, we conducted another brainstorming session where we mapped out different types of algorithms using a mind map (Figure 4.31). A significant part of the subsequent process consisted of quickly creating modes with the algorithms most likely to be appropriate and then testing them with the audio system, as described in section 4.6.2. This also meant consulting a myriad of online tutorials, books and Processing libraries on generative algorithms. The most prominent of these are discussed in section 4.2.4. Given the amount of application code and third-party library code involved, this will be a very superficial and quite simplified presentation of the inner workings of the visual system application. Figure 4.32 shows an overview of the Java classes handling the main functionality of the application. Not shown in this diagram is the range of utility classes, value object classes and the third-party libraries. The libraries used were the following:

Figure 4.34: PLEX score and importance rating of categories, sorted descending from left to right based on importance. Max. achievable combined score is 54 (36 for PLEX, 18 for Importance), and min. score is zero. Scores are based on the responses of nine participants.

Figure 4.35: The continuous process of iteratively improving the installation by testing it and evaluating needs.

Modifications were made to both the visual and the audio system in iterations between sessions to continually improve the experience as much as possible. The time available was scarce, so the improvements had to be strictly prioritised. Even though we were made aware of several important shortcomings, we could only address the ones that fit into our schedule.

Figure 5.2: Minutes spent interacting with the installation by active participants (34).

Time spent

Figure 5.6: Time spent interacting with the installation by active participants (17).

Figure 5.8: Distribution of facial expressions of all observations (22), according to the coding scheme.

Facial expressions

Figure 6.2: Facial expressions of participants (1) in the Science Library and (2) at Oslo Mini Maker Faire.

In terms of time spent by participants interacting with the installation at the two locations, the time spent at the Oslo Mini Maker Faire was significantly higher. At the Science Library, no one spent more than three minutes with the installation, 41% spent less than one minute, and 72% of those observed spent two minutes or less. At the Maker Faire, the time spent interacting was spread much more evenly across the intervals noted: 59% spent two minutes or more interacting, and some people seen outside the time frame of the observations exceeded the noted intervals significantly.
Looking at the distribution of facial expressions observed in the two different contexts, expressions of a positive nature are the predominant ones in both settings, but at the Oslo Mini Maker Faire as many as 86% were smiling, and even though 5% were noted as indifferent, 95% of those observed were deemed positive. Comparing the observations of body language between the contexts, a high degree of curiosity was observed in both settings, with 47% recorded as displaying body language suggesting curiosity in the library setting and 37% in the museum setting. The most striking difference between the library and the museum contexts was the high percentage of joyfulness (27%) and the low percentage of shyness (4%) in the museum setting, in contrast to the low degree of joyfulness (3%) and the high combined degree of self-consciousness (15%) and shyness (5%), totalling 20%, in the library setting. The reason for combining self-consciousness and shyness is that they are very similar traits, and, as mentioned in section 5.2.2, self-consciousness was recorded as being completely in control of your own body in the Oslo Mini Maker Faire observations. In retrospect, separating these terms into two coding categories might have been unnecessary, considering their similarities and the fallibility of observation.