Bill Buxton on making user interfaces “natural”

Bill Buxton always leaves an impression when you see him talk. Recently at the closing keynote of Microsoft’s MIX10 conference, Bill explored the concept of what it means to be “natural” in the context of natural user interfaces (NUI).

Whilst it was an interesting presentation (Bill’s larger-than-life character is hard to miss), I don’t think some of the points came across very clearly, which is why this new 8-minute video by Microsoft Research is a must-watch for all user experience enthusiasts.

If you’re looking for a quick bite, skip ahead to 2:08 in the video where there’s a very interesting concept and prototype demonstration on how we can make panning and zooming virtual documents more “natural”. Without spoiling how, it really makes the multitouch equivalent seem like yesterday’s technology.

31 insightful thoughts

  1. His pen glows!

    Anyway, re: panning by moving the device seems kinda too much IMHO. Imagine you showing images to your friends and you keep moving the phone and yourself from left to right; I fear a friend of mine will probably smack me and tell me to stay put. 😐

  2. manan, I am positive there are simple measures that prevent this from happening. e.g. a certain tilt of the device will prevent panning. or a simple touch with one finger on the device. I think this is cool. I can also see this being transferred to the desktop PC and monitor (I’d be shocked if this is not patented yet). a camera follows the motion of your eyes and pans the display accordingly. you have a virtual display that is many times the size of your actual monitor’s display.

  3. Thanks for the posting and the comments. Actually, I had originally planned to use some of this material in my MIX10 keynote. But when I knew that this video was going to be released, I wanted to avoid the redundancy. I figured that one would complement the other and help fill in the gaps.

    As for Maman’s comments – for sure – you are right that there are times when this is completely the wrong thing to do. But this fits into my belief that everything is best for something and worst for something else. We don’t have to see interaction techniques in competition; rather, the finger-stroke panning and pinch zooming can peacefully coexist with the camera viewfinder metaphor that I demonstrate here. The benefits of the technique – when properly implemented – accrue when you want to navigate in a single integrated gesture, do it with one hand, and be able to use motor-memory to get back to a previous location. None of these work with the finger-swipe pan and pinch gesture. On the other hand – as you point out – the price paid is moving the hand. How far and how much is a function of the Control:Display gain, and making sure that you have a “clutch” so that you can lock in on a location.

    The one thing that I regret not making clear in the video is that this “PDA-as-Camera Viewfinder” metaphor is something that originated with one of my one-time graduate students, current friend, and past colleague: George Fitzmaurice – one of the most creative people that I have worked with. He pioneered the technique in 1992. It is really worth looking at his 1993 article for a much deeper treatment than I provided in this video:

    George W. Fitzmaurice (1993). Situated information spaces and spatially aware palmtop computers. Communications of the ACM, July 1993, 36(7), 38-49.
    http://www.dgp.toronto.edu/~gf/papers/Chameleon%20-%20Situated%20Info%20Spaces.pdf

    One of the key things that I tried to demonstrate with this video is how in using the Chameleon technique (as George named it) in addition to mashing up both pan and zoom with one hand, one can simultaneously use the other hand to interact with that which you are displaying thereby. The use of the touch or stylus activated click-through tools that I demonstrate is just one way of doing so.

    Thanks for the comments, and I hope that the above – especially the pointer to George’s work – helps shed additional perspective on the video.

    Happy to engage in any follow-up questions, comments, etc.

    Bill

    1. “One of the key things that I tried to demonstrate with this video is how in using the Chameleon technique (as George named it) in addition to mashing up both pan and zoom with one hand, one can simultaneously use the other hand to interact with that which you are displaying thereby. The use of the touch or stylus activated click-through tools that I demonstrate is just one way of doing so.”

      I would love to see you or anyone else demonstrating interacting with the content with finger swipes or stylus inputs while you SIMULTANEOUSLY wave it around in front of your face and moving it in and out and the content is actually changing the entire time because “your example” actually shows you STOPPING navigation every time you actually interact with the content. Oh wait, not only is that not truly very feasible or NATURAL at all, there is not even a problem there. If you actually need to navigate while interacting with content that extends beyond the screen, the content should simply scroll when your input reaches the edge of the screen, no need for additional gestures — a solution arrived at at the very beginning of GUIs.

  4. This is silly. I think this is an obvious thing that everyone with access to an accelerometer and with an interest in showing a document larger than the display has tried, but realized right away that it was completely unusable. I know I did. And there are “sample projects” out there for every phone with an accelerometer that provide exactly that (lookup iphone’s code sample called “tea pot”). It’s weird to see it pitched by “MS research”… This is completely unremarkable for anyone with a minimum of experience with accelerometers.

  5. The document reading by moving the device was the coolest part. It’s as if the motion-sensing device understands physics.

  6. John, thanks for your comments, but I think that it is possible that you may have missed the point.

    As I hoped was clear, my objective was to use simple examples to explain something about the subtle nature of human skill, and the need to understand and respect it in design. I’m not sure how many more times I could have said, “It’s not about the technology.” As my comments above make abundantly clear, no claim whatsoever is made that the “PDA-as-Camera Viewfinder” is novel or new research – by Microsoft or anyone else. (And, in keeping with that, the Guiard study that I cite as the model for the carbon-paper part of the video was done in 1987. Also not new.)

    As an aside, I would suggest that the digital camera users of the world may slightly question your characterizing the use of this kind of technique as “unusable.” I would also suggest that God is in the detail, and that the quality of experience and the degree of usability of such an approach differs greatly depending on implementation – among other things, according to whether one has a linear position-sensitive C:D ratio (as we do and George did in 1993) or one with a motion-sensitive relative gain (as is typical with an accelerometer).

    You and I may have different views on the nature of research. For me, at least, it is at least as much about viewing the familiar in the world with new eyes and perspective as it is about demonstrating some technology that nobody has seen before.

    Virtually every one of us has the innate skills to simultaneously zoom and pan a view of a document or space, while simultaneously using one hand to interact with objects in that space. Assuming that supporting such a skill has value for some people in some tasks using mobile technology, then my hope would be that amongst the “’sample projects’ out there for every phone” that you mention, a better way has already been demonstrated. If so, I would welcome a pointer to it. I have not been able to find one – but that may be my problem.

    Ultimately, if the only benefit of this video is that it raises the issue, and helps find or discover a means to support this well, then none of our time has been wasted.

    Thanks for taking the time to comment.

    Bill

    1. “Assuming that supporting such a skill has value for some people in some tasks using mobile technology, then my hope would be that amongst the “’sample projects’ out there for every phone” that you mention, a better way has already been demonstrated.”

      Well, in fact many do already exist and are in use today. The simple fact is: Microsoft trots you out to talk about 30 year old research in a vague, general, non-real world way that doesn’t come close to addressing real products to cover up the fact they are 5 years behind. Many apps on the iPhone do support accelerometer based panning/scrolling so rather than flailing around in a semi-circle in front of you, the same navigation you are demonstrating is produced in small, subtle tilts of the device. Done. Last year. (Of course, this doesn’t give you a sense of your position in a document, but you kind of neglect the fact that this method doesn’t work on anything but images of a known size (or an electronic document “captured” as if it were a physical “image” — how do I know a desired position in an ebook version of a 500 page novel, for example?))

      Equating this navigation technique with digital camera usage is nonsensical — camera usage is actually about using the viewscreen to view the world around you, not about navigating electronic files in a virtual manner. (Moving a device further away from me to zoom in may work conceptually, but then again, if I am myopic, the document may now appear bigger but now blurry because I’m holding a device at arms length away from me for example.) Acknowledging that there are as many problems with this implementation as benefits or that this research has been done by others decades ago with very few real, successful, useful implementations betrays how little substance MSR can provide the UI community.

      Oh, and the dumbasses who decided every screen on WP7S (and that name!) should be shifted a quarter of an inch because you think your users are too dumb to realize they can swipe a touchscreen should be shot.

      Time to get something real out of the labs, Microsoft, that doesn’t suck or completely mimic others’ successes.

  7. I really thought the human interaction ideas were great. The focus on humans rather than technology, makes the technology even better. It is a multi-layered approach and that is very interesting. I just wish you could put this all together in the new Windows Phone 7 Series so we could use all of the layers in a transparent manner.

  8. I misunderstood indeed. For some reason, I was viewing this as an answer rather than a question. I see the point now. This is asking exactly the right questions and provides good “food for thought”, but unfortunately provides no answers.

    Thank you for taking the time to respond and for being so nice. Sorry I was a bit harsh, I did indeed completely misunderstand the point of this video.

  9. Hey John, you weren’t harsh, you were just being what a lot of people would call a jerk. It seems like a common trait among “techies”–to crap all over someone making a point because you think they’re trying to be smarter than you. A suggestion: therapy.

    1. Okay, you have been presumptuous and rather rude and are failing to recognise it. Logically speaking, you are in fact the one who should be seeking assistance and this is the last place you should be looking for it. So please, be silent on this matter forever more.

  10. Very interesting video! Thanks Bill – I watched your keynote in person at Mix09. I am very glad you are with the company!

    I started out as a designer that went to programming, apparently a rare path, but I really do appreciate these takes on interface problems. Are you working with the Windows Mobile 7 team?

    1. Building off what Ryan said, I can very easily imagine this method of panning being used in conjunction with WP7’s “Metro” interface, where the UI is wider than the display.

      Amazing stuff in any case! As a Geomatics student, I was particularly excited by the mapping application.

  11. Excuse me if I’m wrong but I think this video makes a link between EASY UI and Natural UI.
    I believe there is a difference :)

    Danny.

  12. Tim F.
    You pose some legitimate questions in your two postings. I will attempt to address them as best I can in the space available.

    To begin, let me reiterate that my objective in this demo was to illustrate human capacity, and argue that the better designers understand the specific motor-sensory, cognitive, social and emotional skills of their users/customers, the better they can design systems that leverage those skills.
    There are a few things that emerge from this:

    • The notion that a reasonable way to think about what makes an interface “natural” might be the extent to which the system enables the user to use their particular existing skills in an appropriate way.
    • Since skills are expensive to acquire (due to the power law of practice), considering the use of existing skills, rather than forcing the acquisition of new ones, should be an important aspect of design.
    • It is obvious, however, that there are times when the acquisition of new skills, or the further development of existing ones, is the right choice.
    • Different people have different skills, hence one-size-fits-all design solutions do not, and cannot, prove optimal for everyone or anyone.
    • None of these are new notions, nor do they originate with me. They are, however, notions that are too often forgotten by some, and which if paid more heed by others, could be used to improve even the best designs of today. Hence, I judge them worth reiterating.

    Now to the specific points that you raised in your two notes:
    (1) The feasibility and desirability to interact with data at the same time as one is changing view, or navigating through it.
    You state that you “…would love to see you [viz. me] or anyone else demonstrating interacting with the content with finger swipes or stylus inputs while you SIMULTANEOUSLY wave it around in front of your face and moving it in and out and the content is actually changing the entire time.” That one is easy. That is exactly how the camera in my mobile phone functions, and it works well. The touch screen is the viewfinder, and the camera controls, including the shutter, are all superimposed on the screen and activated by the finger. Shooting while panning – to keep a moving object of interest in frame – uses exactly the skill that you ask to see demonstrated.
    But there is something more here. There appears to be a misunderstanding implied by how you use the word “simultaneously” in your request. I take some responsibility for that fact, due to my not making the following point – due to there being a limited number of things that one can get across in a short video. In how you phrase your request, you appear to miss the fact that a significant number of benefits accrue even if the action task (finger swipe/push, etc.) is done sequentially relative to, rather than in parallel with, the navigation task (pan/zoom). Simply stated, these accrue due to the allocation of the navigation tasks to one hand, and the command to the other. Hence, one is in “home position”, so to speak, all the time (viz., simultaneously), even if the resulting actions are sequential. This has been shown to reduce both the time to perform the compound task – due to time-motion efficiencies – as well as cognitive load, due to a reduction in cost of mode switching and maintenance due to the distribution of the two classes of task over the two limbs. I would be happy to point you to the experimental literature around this topic. Sorry for not making this clear in the video, but it was an 8-minute video, not a text book.
    Finally, you take me to task for stopping the movement while interacting with the screen in the video. Good observation. However, concluding that the reason is that I can’t do it is mistaken. Let me give two quick points of explanation. First, it is hard enough for the videographer to capture action on a small screen, plus context, when the screen is static, much less when it is moving. It is equally hard, or harder, for the viewer to see the result, even if the videographer tries. Second, this is not a problem for the person actually using the device, since the hand motion and head/eye motion can be easily coordinated. Thus, their spatial relationship does not change relative to one-another. It is the background that is a blur, not the interaction on the small screen.
    Could we have rehearsed the moves over and over so that the camera tracked my hand-motion of the mobile in the same way as my head/eyes? Yes, of course. Would it have been worth the extra time and effort to do so? One can debate that issue. Given the time that we had available, and the incremental value that would be added, the choice was made not to. My decision was based on the assumption that people would believe what I said and would understand some of the logistics of shooting video. From your comment, it appears that that assumption may not have been correct, and hence my decision may have been wrong.

    (2) Auto-Scroll-at-Boundaries
    You assert “…that there is not even a problem there. If you actually need to navigate while interacting with content that extends beyond the screen, the content should simply scroll when your input reaches the edge of the screen, no need for additional gestures — a solution arrived at the very beginning of GUIs.”
    Yes, the auto-scroll feature at screen boundaries is an important technique used in GUIs – one that provides real value, and one that could likely be used more to benefit mobile devices. But if that technique was optimal for all cases where one was dragging an object to a location that was currently outside of the screen boundaries, then why did the GUI that you revere so much also support other techniques for doing it as well? I would argue that the reason is that each technique has its own strengths and weaknesses, and that overall, everything, and every technique, is best for something and worst for something else.
    A new technique rarely replaces an old one. If a technique survives, it is generally because it provides a strong solution to a relevant class of situations for which the previous techniques perform less well. Let me give you an example that I believe demonstrates a case where the auto-scroll-at-boundaries technique works less well than the one that I demonstrate: trace along a curvy road on a map, constantly in the direction that is off-screen. By the way, to be balanced, I would normally give a counter example – one that demonstrates a class of task where the auto-scroll-on-boundary technique performs better. But there is no need: we both know that there are many. But in no way do these examples prove that either is best all the time.
    If you concede that, then I would suggest that logic insists that you also concede that – contrary to your assertion – there is a problem, i.e., potential to improve upon current practice.

    (3) Position-Motion-Tilt Sensing
    You correctly state, “Many apps on the iPhone do support accelerometer based panning/scrolling.” I fully agree. I also know that the use of tilt and/or accelerometers to control navigation pre-dates even the iPhone (which takes nothing away from the validity of its use in the iPhone and other mobile devices).
    I would, however, question your next assertion, that “… the same navigation you are demonstrating is produced in small, subtle tilts of the device. Done. Last year.” Yes, as is obvious, one can navigate to areas off screen by tilt or by lateral motion in such applications by virtue of accelerometers. But, no, it is not the same thing. To my reading, your assertion is comparable to saying that controlling a video game with a joystick is the same, regardless of whether the joystick is a position-sensitive isotonic joystick, a spring-loaded self-returning joystick, or a static force-sensitive isometric joystick. Yes, there is a possibility that one could control the game with any or all of the three. But your score will likely vary widely depending on which one you use, as will likely the quality of the resulting gaming experience.
    Tilt, by the way, works very well with the technique that I demonstrated and therefore is subsumed by the technique that I demonstrated (again, due to limits of time, I did not demonstrate that feature – but as I said – I had about 8 minutes, and the video was not about the technology, per se, so no real loss).
    But as interesting and useful as it is, the use of tilt, or non-linear mapping of motion, does not let you take advantage of the potential to exploit motor-memory to navigate rapidly to certain locations, largely eyes-free, using ballistic rather than closed-loop motor action. As you clearly acknowledge, tilt “… doesn’t give you a sense of your position in a document.” This is a big deal in many very-real situations!
    But then you go on to say something that suggests that you very much mis-understood what I was talking about, namely, “… you kind of neglect the fact that this method doesn’t work on anything but images of a known size (or an electronic document “captured” as if it were a physical “image” — how do I know a desired position in an ebook version of a 500 page novel, for example?)”
    First, I don’t recall saying anything even vaguely like, “this technique only works on images of a known size.” In fact, I didn’t say that it only worked on images, since it works on a range of document and data types. Perhaps you are reading too much into my words – things that I never said nor implied. It is not about navigating to specific pages in a 500 page novel. Rather, it is about moving about on any single page, or imagining an open book, two facing pages. It is about moving about, with agility, around a large spreadsheet, map, technical drawing, photograph, or web-page (as opposed to web-site). I assumed that this was clear from my words. I apologize if I was not clear, and if I assumed too much. I hope that this helps clear up this particular misunderstanding of my intent.
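
The distinction drawn in (3) between position-sensitive control and tilt-style control can be illustrated with a small sketch. This is a simplified model under stated assumptions, not code from the prototype in the video or from any phone SDK: the function names, the `gain` value and the fixed time step `dt` are all hypothetical. Position control maps a hand position directly to a view position, so the same hand position always yields the same view. Rate control treats tilt as a velocity, so where you end up depends on how long you hold the tilt.

```python
def position_control(hand_offset, gain=2.0):
    """Position-sensitive mapping: view position is a linear function of
    hand position (gain is a hypothetical control:display gain)."""
    return gain * hand_offset


def rate_control(tilt_samples, gain=2.0, dt=0.1):
    """Rate (tilt-style) mapping: each tilt sample is treated as a velocity
    and integrated over a hypothetical time step dt."""
    view = 0.0
    for tilt in tilt_samples:
        view += gain * tilt * dt
    return view


# Same hand position, same view, every time: this is what lets motor
# memory carry you back to a location.
same_place_twice = (position_control(5.0), position_control(5.0))

# Same tilt angle held for different durations lands in different places.
short_hold = rate_control([1.0] * 5)
long_hold = rate_control([1.0] * 10)
```

In this model only the position mapping gives a repeatable relationship between where the hand is and what is on screen, which is the property the motor-memory argument relies on.
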
    (4) On Viewing the Physical World vs Documents
    You state, “camera usage is actually about using the viewscreen to view the world around you, not about navigating electronic files in a virtual manner”.
    Actually, I would say this quite differently:
    Camera usage is about taking pictures of the world around you, and in order to do so, you need to use the viewfinder to focus on the subject in that world that you want to photograph, at the scale and the framing that you want to photograph it at.
    Stated this way, there is a parallel statement that could be made:
    Navigating around a single page document that is larger than the screen boundaries is about using the controls, coupled with the display, to focus on the area of interest in the document, at the scale and the framing that you want to meet your interest.
    In computer graphics terminology, both involve setting the viewport over the area of interest for the performance of the task.
    I believe that my characterization is both legitimate and relevant. But even if that is so, it does not necessarily follow that the same mechanism should, therefore, be used for both, and even if so, when, for whom, etc. That is why the industry builds prototypes, and does research prior to commercializing ideas –even old ones.
    Finally, you state, “Moving a device further away from me to zoom in may work conceptually, but then again, if I am myopic, the document may now appear bigger but now blurry because I’m holding a device at arms length away from me ….”. This comment makes clear that you have not looked at the previously cited 1993 Chameleon work of George Fitzmaurice. If you look at the video of this work (search: “The Chameleon: Spatially aware palmtop computers”), you will see that the system – like the one on which I was annotating in the video – had a “clutch”. That is, it only tracks when the clutch is engaged. Then, much like a mouse, one can reposition the device/your arm to a position of comfort, without changing the view. And, if the control:display ratio of the device is linear, unlike a mouse, while the origin of the space is thus moved, a movement of the same distance in the same direction will take you to the same view as it would have had you not repositioned the origin by clutching. In case that mouthful was not clear, the succinct answer is that the situation that you describe is and was easily dealt with.
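
A minimal sketch of the clutch behaviour described above, assuming a linear control:display mapping in two dimensions. The class name `ClutchedViewport`, the `gain` value and the units are hypothetical and chosen only for illustration; this is not code from the Chameleon system or from the prototype in the video.

```python
class ClutchedViewport:
    """Viewport panned by device position, with a clutch (illustrative only)."""

    def __init__(self, gain=2.0):
        self.gain = gain            # hypothetical linear C:D gain
        self.view = (0.0, 0.0)      # current view origin in document units
        self.engaged = False        # clutch state
        self._last_hand = None      # last sensed device position

    def set_clutch(self, engaged):
        self.engaged = engaged

    def move_hand(self, x, y):
        """Report a new device position; pan only while the clutch is engaged."""
        if self.engaged and self._last_hand is not None:
            dx = x - self._last_hand[0]
            dy = y - self._last_hand[1]
            self.view = (self.view[0] + self.gain * dx,
                         self.view[1] + self.gain * dy)
        # Always remember the position, so re-engaging does not cause a jump.
        self._last_hand = (x, y)
        return self.view


def demo():
    vp = ClutchedViewport(gain=2.0)
    vp.move_hand(0.0, 0.0)
    vp.set_clutch(True)
    vp.move_hand(10.0, 0.0)         # engaged: the view pans
    vp.set_clutch(False)
    vp.move_hand(0.0, 0.0)          # released: arm returns to comfort, view holds
    vp.set_clutch(True)
    return vp.move_hand(10.0, 0.0)  # same movement pans by the same amount again
```

Because the mapping is linear, the second 10-unit movement pans the view by exactly as much as the first, so in this sketch `demo()` ends at the view that 20 units of uninterrupted travel would have reached.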

    (5) On Acknowledging Things
    There must be some misunderstanding that needs to be cleared up. You state that I acknowledge, “… that there are as many problems with this implementation as benefits.” If I said that, or gave that impression, then I am sorry. That was certainly not my intent, and that is certainly not my belief.
    Finally, you appear to discount what I state because I acknowledged that the work and ideas that I presented were partially based on, “…research [that] has been done by others decades ago.” I hope that I misinterpreted your remark, since virtually all research, and all new products, are likewise based in part on work that typically dates back 10-20 years. (I am happy to provide detailed sources that back this statement up). Acknowledging one’s debt to those whose work one has drawn on is simply good scholarship, good manners, and being honest.
    To conclude, thanks for your note and bringing up these issues, since, if you brought them up, there is a good chance that others may have similar questions. Hopefully this helps address them somewhat.

    Sincerely
    Bill Buxton

  13. Bill Buxton, a question for you: it is not the issue here, but could you say whether the Courier is a real or false idea and, if real, when we will have something we can enjoy?
