Prototyping Interactions With a Woz Skill

Andrew Rapo
February 15, 2017


See: Managing Code with Custom Flow Activities

See: Flows- A Dungeon Example

See: MIMs

Enabling Rapid Iteration

The best way to converge on a successful interaction strategy is to iterate rapidly, testing how users respond to Jibo’s voice, body animations, screen animations, speech recognition, touch interface, etc. One way to shorten the iteration cycle is to set up a remote control mechanism that lets designers and programmers play the role of the ‘man behind the curtain’, as in the Wizard of Oz (Woz). This post describes a woz skill that implements a mechanism for controlling Jibo via a Web interface. The design makes it easy to guide or override Jibo’s autonomous behavior.



The woz skill is most valuable when it is running on an actual robot, but it can be fully developed and tested in the Jibo SDK simulator. The screenshot below shows what the woz control interface looks like when running in the simulator. Note: The address of the woz page in this example is localhost:9494. When running on a robot the woz page address is [robot-id.local]:9494.

Woz_image.png

The woz page in this example provides an open input field that is used to send text phrases to Jibo’s TTS system. It also provides a way to define buttons that can send command messages to the woz skill to trigger actions or guide an interaction.



The Main Flow

The main control logic for the woz skill is defined in main.flow. There are two main control loops: one for commands that are generated by clicking buttons on the woz page, and one for messages that are exchanged with the page via socket.io (more on that below).

Woz_main_flow.png

main.flow

The onCommand and onMessage Loops

The onCommand loop handles button clicks from the woz page by routing them to one of several command handlers. In this example there are handlers for command tokens including:

  • tts
  • animation
  • video
  • image
  • lookAt
  • audio

In this example, the onMessage loop has just one important job: sending the woz page a description of the buttons that should be displayed. The woz page, which is described in detail below, is a static HTML page that uses React to render its content. Sending it button information via socket.io lets the skill control part of that page dynamically.
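The post does not spell out the exact shape of the command and message objects. As a rough sketch, assuming a command carries a name plus a data payload (data.text and data.angle are mentioned later in this post) and a message carries a type plus data, they could be typed as follows. The interface names and the isKnownCommand helper are hypothetical and exist only for illustration.

```typescript
// Hypothetical shapes for the objects exchanged with the woz page.
// Only `name`, `type`, `data.text` and `data.angle` come from the post;
// everything else is an assumption made for illustration.
interface WozCommand {
    name: string;                 // e.g. 'tts' or 'lookAt'
    data?: { text?: string; angle?: number; [key: string]: unknown };
}

interface WozMessage {
    type: string;                 // e.g. 'getCommands' or 'setCommands'
    data?: unknown;
}

// The command tokens listed above.
const COMMAND_NAMES: string[] = ['tts', 'animation', 'video', 'image', 'lookAt', 'audio'];

// Returns true when a command can be routed to one of the handlers.
function isKnownCommand(command: WozCommand): boolean {
    return COMMAND_NAMES.indexOf(command.name) !== -1;
}

// Example values: a TTS command and the page's request for button definitions.
const sayHello: WozCommand = { name: 'tts', data: { text: 'Hello, world' } };
const askForButtons: WozMessage = { type: 'getCommands' };
```

Typing the objects this way is optional, but it makes the routing in the main flow easier to follow: the command name selects the handler and the data payload carries its arguments.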



The Project Files

The woz-skill has a typical skill project structure. Media assets are in the animations, audio, images, and videos folders. MIM metadata is in the mims folder. The JavaScript/TypeScript source code for the skill is in the src folder. Unlike most skills, the woz-skill also has a static folder, which contains the HTML and JavaScript files that define the woz page. The JavaScript used by the static woz page is written with ES6 and React (JSX) syntax, so it has to be compiled into ES5 JavaScript using gulp. The .gulpfile contains the gulp instructions that accomplish this.

Woz_project_tree.png



The src Folder

The project’s src folder looks like that of most skills. It has a folder for flows (.flow files) and a folder for rules (NLU .rule files). The http-service folder contains classes used by the woz page server. The individual files include:

  • index.ts - The entry point for the skill
  • WozSkill.ts - The main skill class
  • WozServer.ts - A class that extends HTTPSocketService.ts
  • DisplayImageJS.ts - A custom flow activity that displays an image on Jibo’s screen
  • LookAtAnimator.ts - A class that animates Jibo’s body to orient toward a point in space
  • PlayAnimationJS.ts - A class that animates Jibo using a pre-made animation
  • PlayAudioJS.ts - A class that plays an audio file through Jibo’s speakers
  • PlayVideoJS.ts - A class that plays a video file on Jibo’s screen

The Play[Something]JS classes make it easy to play a media file using a filename that is specified at runtime. The implementation details are beyond the scope of this post. (See the post: Managing Code with Custom Flow Activities.) The important point is that the onCommand loop can be used to trigger any action including custom flow activities, standard flow activities, sub-flows, etc.

Woz_src_tree.png



HTTPSocketService

The source code for the http-service classes is provided at the end of this post. It will look familiar to Web developers who have used node’s standard http capabilities along with npm modules like socket.io, connect, router, etc.

The WozServer extends the HTTPSocketService, which in turn extends the HTTP-only HTTPService with additional methods that allow it to also manage socket connections. The socket connections give the woz skill a mechanism for two-way communication with the static woz page (HTML) that is served by the WozServer.

The HTTPSocketService constructor instantiates a new Web server using the port number provided in the options object and passes along the path to the static HTML page that will be served, staticDir. It also sets up handlers for incoming socket connections and socket messages. The init() method instantiates the socket.io listener and creates an event handler for socket connection events.



Woz_http_socket_service.png

HTTPSocketService.ts

The onConnection method sets up handlers for other socket events, including disconnect and message. Message events are handled by the onMessage method. Notice that the onMessage method is abstract, meaning that it is not implemented here. Instead, it needs to be implemented in subclasses of HTTPSocketService, like the WozServer class. When a disconnect event is received over the socket, the client that disconnected is removed from the list of active connections and the onClose method is called.

The broadcastMessage method is used to send a message to all connected clients.

HTTPSocketService.ts



The WozServer Class

By extending the HTTPSocketService class, the WozServer class inherits all of its socket and HTTP functionality. The WozServer constructor takes a reference to the WozSkill which it uses to access the WozSkill’s methods. In this example, WozServer overrides the abstract onMessage method of HTTPSocketService and relays messages to the WozSkill via the WozSkill’s own onMessage method.



Woz_server_constructor.png

WozServer.ts
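As a text companion to the screenshot, here is a minimal sketch of the relay pattern described above. It is not the actual WozServer source: SocketServiceStandIn replaces HTTPSocketService so the example runs on its own, and the MessageReceiver interface and the receive() helper are hypothetical.

```typescript
// Stand-in for HTTPSocketService so the sketch runs on its own. The real
// class (listed under Source Code) also starts the HTTP and socket servers.
abstract class SocketServiceStandIn {
    protected abstract onMessage(message: any, client: any): void;

    // Simulates a socket message arriving from the woz page.
    receive(message: any, client: any = {}): void {
        this.onMessage(message, client);
    }
}

// Hypothetical minimal view of the skill: only the method the server calls.
interface MessageReceiver {
    onMessage(message: any): void;
}

class WozServerSketch extends SocketServiceStandIn {
    constructor(private skill: MessageReceiver) {
        super();
    }

    // Implements the abstract method by relaying each message to the skill.
    protected onMessage(message: any, _client: any): void {
        this.skill.onMessage(message);
    }
}

// Usage: a fake skill that records whatever the server relays to it.
const relayedMessages: any[] = [];
const sketchServer = new WozServerSketch({ onMessage: (m) => { relayedMessages.push(m); } });
sketchServer.receive({ type: 'getCommands' });
```

Keeping the server this thin means the skill, not the server, decides what each message means.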

In addition to managing socket messages, the WozServer class manages HTTP POST requests, which are generated by clicking buttons on the woz page. These POST requests (requests to the /command URL) are relayed to the WozSkill via the WozSkill’s own onCommand method.

Woz_server_routes_command.png

WozServer.ts
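The route handling can be sketched along the following lines. This is an illustration under assumptions, not the code in the screenshot: the request and response interfaces are pared-down stand-ins for the connect/router types, makeCommandHandler is a hypothetical helper, and the 400 response for a malformed body is an added safeguard that the post does not mention. It relies on the request body having already been parsed as JSON, which HTTPService does with bodyParser.json().

```typescript
// Pared-down request/response shapes, standing in for the connect/router types.
interface CommandRequest { body: any; }
interface CommandResponse { statusCode: number; end(data?: string): void; }
interface CommandReceiver { onCommand(command: any): void; }

// Builds a handler for POST /command that relays the JSON body to the skill.
function makeCommandHandler(skill: CommandReceiver) {
    return (req: CommandRequest, res: CommandResponse): void => {
        const command = req.body;
        if (!command || typeof command.name !== 'string') {
            // Assumed safeguard: reject bodies that do not name a command.
            res.statusCode = 400;
            res.end('Missing command name');
            return;
        }
        skill.onCommand(command);
        res.statusCode = 200;
        res.end('OK');
    };
}
```

In a subclass, a handler like this would presumably be registered inside the routes() method, for example with something like router.post('/command', handler).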

The WozSkill Class

Like all skills, the WozSkill class has a postInit() method that is called early in its lifecycle - before the skill is opened. This is a good place for setup code and this is where the WozServer is instantiated and initialized. Once its init() method is called, the server will be running and actively serving the static woz page.



Woz_skill_instantiate_server.png

WozSkill.ts

As seen above, the WozServer calls the WozSkill’s onCommand() and onMessage() methods whenever it receives HTTP requests (button clicks) and socket messages, respectively. These commands and messages are relayed to the skill’s main flow (main.flow) via the flow’s emitter and handled by the flow logic.



Woz_skill_messages_to_flow.png

WozSkill.ts
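A minimal sketch of this relay step might look like the following. The event names 'command' and 'message', the FlowEmitter interface, and the WozSkillRelay class are assumptions made for illustration; the real WozSkill also creates the WozServer and plugs into the skill lifecycle.

```typescript
// Only the part of the flow's emitter that this sketch needs.
interface FlowEmitter {
    emit(event: string, payload: any): void;
}

// Hypothetical skeleton of the two relay methods.
class WozSkillRelay {
    constructor(private flowEmitter: FlowEmitter) {}

    // Called by the server for HTTP POSTs to /command (button clicks, TTS text).
    onCommand(command: any): void {
        this.flowEmitter.emit('command', command);
    }

    // Called by the server for socket messages from the woz page.
    onMessage(message: any): void {
        this.flowEmitter.emit('message', message);
    }
}

// Usage with a recording emitter in place of the flow's emitter.
const emittedEvents: Array<[string, any]> = [];
const skillRelay = new WozSkillRelay({
    emit: (event, payload) => { emittedEvents.push([event, payload]); },
});
skillRelay.onCommand({ name: 'tts', data: { text: 'Hello, world' } });
skillRelay.onMessage({ type: 'getCommands' });
```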

Handling Commands in the Main Flow

The OnCommand (Flow.EvalAsync) block of the main flow sets up a listener that waits for one command event (emitter.once()). When a command event is received, the associated command object is parsed to determine the name of the command, and that name becomes the output transition for the OnCommand block. So a command named tts is routed to the woz-tts announcement MIM block, which reads the text in the command’s data.text property. This causes Jibo to speak whenever a text phrase is submitted in the woz page’s TTS input field.



Woz_main_flow_on_commands.png

main.flow
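The logic inside the OnCommand block can be approximated outside of a flow as shown below. This is a sketch under assumptions: OneShotEmitter is a tiny stand-in for the flow’s emitter, and the waitForCommand function, its done callback, and the context object are hypothetical, since the actual Flow.EvalAsync signature is not shown in this post.

```typescript
// Tiny stand-in for an event emitter with one-shot listeners.
class OneShotEmitter {
    private listeners: { [event: string]: Array<(payload: any) => void> } = {};

    once(event: string, listener: (payload: any) => void): void {
        (this.listeners[event] = this.listeners[event] || []).push(listener);
    }

    emit(event: string, payload: any): void {
        const pending = this.listeners[event] || [];
        this.listeners[event] = [];           // one-shot: clear before calling
        pending.forEach(listener => listener(payload));
    }
}

// Waits for a single command, stores it for the next block, and reports
// the command's name as the transition to follow.
function waitForCommand(
    emitter: OneShotEmitter,
    context: { command?: any },
    done: (transition: string) => void
): void {
    emitter.once('command', command => {
        context.command = command;            // e.g. data.text for the TTS block
        done(command.name);                   // e.g. 'tts' or 'lookAt'
    });
}
```

Because the listener is registered with once(), the block handles exactly one command per pass, which is why the flow loops back to OnCommand after each handler finishes.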



Similarly, commands named lookAt are handled by the flow’s LookAt block, which invokes the skill’s LookAtAnimator instance with the angle specified in the command’s data.angle property.

Woz_main_flow_look_at.png

main.flow

Handling Socket Messages in the Main Flow

The OnMessage (Flow.EvalAsync) block of the main flow sets up a listener that waits for one message event (emitter.once()). When a message event is received, the associated message object is parsed to determine the type of the message, and that type becomes the output transition for the OnMessage block. So a message of type getCommands is routed to the SendCommands block.



Woz_main_flow_on_messages.png

main.flow

The SendCommands block will send a reply message to the static page using the WozServer’s broadcastMessage() method. This reply message contains an object which describes all of the commands that should be displayed as buttons on the static page.



Woz_main_flow_send_commands.png

main.flow
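A sketch of what the SendCommands step might do is shown below. The setCommands message type and the broadcastMessage() method come from this post; the ButtonCommand fields, the sendCommands helper, and the angle value are assumptions chosen for illustration.

```typescript
// Hypothetical description of one button on the woz page.
interface ButtonCommand {
    label: string;                // text shown on the button
    name: string;                 // command token, e.g. 'tts'
    data?: any;                   // payload passed along with the command
}

// Only the part of the WozServer that this sketch needs.
interface Broadcaster {
    broadcastMessage(message: any): void;
}

// Two buttons that appear on the woz page in this post. The field names
// and the angle value are assumptions, not taken from the skill's source.
const buttonCommands: ButtonCommand[] = [
    { label: 'TTS: Hello', name: 'tts', data: { text: 'Hello, world' } },
    { label: 'LookAt: Behind', name: 'lookAt', data: { angle: 180 } },
];

// Replies to a getCommands message by broadcasting the button definitions.
function sendCommands(server: Broadcaster, commands: ButtonCommand[]): void {
    server.broadcastMessage({ type: 'setCommands', data: commands });
}
```

Defining the buttons in the skill rather than in the page means new commands can be added without rebuilding the static page.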

The Static Woz Page

The woz page served by the WozServer class is a typical HTML page that lives (by convention) in the project’s static folder.



Woz_static_src_tree.png

In index.html, a DIV is created with the id ‘grid’. This DIV is used as the container for all of the page’s UI elements. The script tags reference a number of JavaScript libraries that the page requires, including the React libraries, the socket.io library, and the woz page’s index.js, which controls the page’s rendering, interactivity and socket communication.



Woz_static_index.png

index.html

Gulp and Transpilation for ES6 and React Elements

The reference to index.js is ‘lib/index.js’ rather than ‘src/index.js’ because the src/index.js file is written in ES6, which is not fully supported by browsers, and it contains React tags, which are also not part of standard JavaScript. For the page to work correctly, src/index.js has to be transpiled into ES5 and output as lib/index.js. The npm module gulp is used to automate the transpilation process, and gulp gets its instructions from the file named .gulpfile.

Woz_static_gulp.png

.gulpfile

The .gulpfile defines a gulp task called woz-page which takes all of the JavaScript files in the src tree and converts the React elements to ES5 [.pipe(react())], then converts all the ES6 elements to ES5 [.pipe(babel())], and then writes the converted files to the ‘static/woz-page/lib’ folder [.pipe(gulp.dest(dest))]. In this project there is only one JavaScript file, src/index.js.

The Woz Page’s Socket IO Methods

The index.js file contains a number of sections, one of which sets up the socket interface that is used for two-way communication with the skill. The call to io.connect() opens the socket connection. The socket.on(‘connect’, …) handler sends a getCommands message to the woz skill when the connection is first established. The socket.on(‘message’, …) handler processes messages that are received from the skill; if the message type is setCommands, the data of the message is used to define the page’s UI buttons.

Woz_static_page_socket.png

Part of the index.js file
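The page-side wiring can be sketched as follows. The actual page is written in ES6; TypeScript is used here only for the type annotations. PageSocket is a pared-down stand-in for the socket.io client object, and wireSocket and its setCommands callback are hypothetical names. On the real page the socket would come from io.connect().

```typescript
// Pared-down view of a socket.io client socket.
interface PageSocket {
    on(event: string, handler: (payload?: any) => void): void;
    emit(event: string, payload: any): void;
}

// Asks the skill for button definitions on connect and applies them
// whenever a setCommands message arrives.
function wireSocket(socket: PageSocket, setCommands: (commands: any[]) => void): void {
    socket.on('connect', () => {
        socket.emit('message', { type: 'getCommands' });
    });
    socket.on('message', (message: any) => {
        if (message && message.type === 'setCommands') {
            setCommands(message.data);
        }
    });
}
```

In the React version of the page, the setCommands callback would typically update component state so that the buttons re-render.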

The Woz Page Commands React Class

The index.js file defines a React class called Commands, which is used to render the UI buttons and the text field for TTS commands. The handleChange() and handleSubmit() methods turn anything typed into the TTS text field into a tts command. The onCommand() method communicates with the WozServer by making HTTP POST requests and passing the TTS and button command data in the body of the request.

Woz_static_page_commands.png

index.js
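Setting React aside, the data flow from the TTS field to the server can be sketched with two small functions. Both function names are hypothetical, and the request description is an assumption based on the /command URL named earlier and on the JSON body parser used by HTTPService.

```typescript
// Turns the text typed into the TTS field into a tts command.
function ttsCommand(text: string): { name: string; data: { text: string } } {
    return { name: 'tts', data: { text } };
}

// Describes the HTTP POST that carries a command to the WozServer.
function commandRequest(command: { name: string; data?: any }) {
    return {
        url: '/command',
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(command),
    };
}

// Example: the request produced by submitting "Hello, world" in the TTS field.
const helloRequest = commandRequest(ttsCommand('Hello, world'));
```

A request description like this can be handed to whichever HTTP client the page uses, such as XMLHttpRequest or fetch.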

The Woz Page Commands React Class - Render Method

The render() method of the Commands class uses the data in the commands object to generate all of the UI buttons. Each button is implemented as a ReactBootstrap.Button instance and each has the onCommand() method as its onClick handler.

Woz_static_page_render.png

index.js

Testing in the Simulator

The result of all of this web server setup is a web page that can be accessed using the URL: http://localhost:9494.



Woz_woz_page.png



The LookAt: Behind button will make Jibo look backwards...



Woz_look_behind.png

The TTS: Hello button will make Jibo say, “Hello, world”...



Woz_tts_hello_world.png

Typing “For example, <anim name='Emoji_Pizza' nonBlocking='true'/>I can help you order pizza.” into the TTS field will cause Jibo to say, “For example, I can help you order pizza” while displaying a pizza emoji. This is a good way to test ESML (Embodied Speech Markup Language) tags.



Woz_tts_pizza.png

Building the Skill

The woz skill requires two build steps:

  1. The usual: ‘npm run build’
  2. And an extra step to build the static page: ‘gulp woz-page’

Conclusion

The woz skill described in this post provides a convenient way to test new skill concepts using a helpful remote control mechanism - along with Jibo’s core VUI and GUI capabilities. This post just scratches the surface of what can be done with this kind of tool.

A Note to Users of Jibo’s Publicly-released SDK

Many features described in this post, including MIMs and Flows, are not available in the current, publicly released SDK. However, the woz concept can be easily adapted. Because they are not Jibo-specific, the web server, socket and React code examples will work with any version of the SDK.





 

Source Code

HTTPService.ts

import * as fs from 'fs';
import * as path from 'path';
import * as http from 'http';
import * as mime from 'mime';

import connect = require('connect');
import * as bodyParser from 'body-parser';
import serveStatic = require('serve-static');
import {EventEmitter} from 'events';
import Router = require('router');

import {ServiceOptions, HandlerFunction, ServerResponse} from './index';

class HTTPService extends EventEmitter {
    public app:connect.Server;
    public server:http.Server;
    public staticDir:string;
    public router:Router|HandlerFunction;
    public name: string;
    public options: ServiceOptions;
    public rootDir: string;

    constructor(name:string, options:ServiceOptions, rootDir:string) {
        super();
        this.name = name;
        this.options = options;
        this.rootDir = rootDir;
    }


    init(callback: (err?:Error)=>void) {
        this.app = connect();

        this.app.use(<any>bodyParser.json({ limit: '10mb' }));

        this.router = new Router();
        this.routes(<Router>this.router);
        this.app.use(<any>this.router); // <HandlerFunction>this.router

        this.staticDir = path.join(this.rootDir, 'static', this.name);
        this.app.use( serveStatic(this.staticDir) );  // html and friends

        this.server = http.createServer(this.app);
        this.server.listen(this.options.port, () => {
            this.port = this.server.address().port;
            console.log(`${this.name} service listening on port ${this.port}`);
            callback();
        });
    }

    get port():number {
        return this.options.port ? this.options.port : 0;
    }

    set port(value:number) {
        this.options.port = value;
    }

    enableDebug() {
        this.app.use( (req, res, next) => {
            console.log(this.name, req.method, req.url);
            next();
        });
    }


    routes(url:Router):void {
        return;
    }


    finish(res:ServerResponse, err?:Error, data?:any, contentType?:string, statusCode?:number):void {
        if (err) {
            res.statusCode = 500;
            res.end('Error: ' + err);
            console.error(err);
        } else {
            statusCode = statusCode || 200;
            if (contentType) {
                res.writeHead(statusCode, {'Content-Type': contentType});
            } else {
                res.writeHead(statusCode);
            }
            if (data) {
                res.end(data);
            } else {
                res.end();
            }
        }
    }


    finishNoContent(res:ServerResponse, status?:number, err?:Error) {
        this.finish(res, err, null, null, status);
    }


    sendFile(res:ServerResponse, filename:string, contentType?:string):void {
        console.log(this.name, 'sending ' + filename);
        contentType = contentType || mime.lookup(filename);
        let file = fs.createReadStream(filename);
        file.on('open', () => {
            res.writeHead(200, {'Content-Type': contentType});
            file.pipe(res);
        });
        file.on('error', (err) => {
            this.finish(res, err);
        });
    }


    sendJson(res:ServerResponse, json:Object|string, statusCode?:number):void {
        let err;
        if (typeof json !== 'string') {
            try {
                json = JSON.stringify(json);
            }
            catch (e) {
                console.error('JSON.stringify: ', e);
                json = null;
                err = e;
            }
        }
        this.finish(res, err, json, 'application/json', statusCode);
    }
}


export default HTTPService;



HTTPSocketService.ts

import HTTPService from './HTTPService';
import http = require('http');

import {ServiceOptions} from './index';

const socketio = require('socket.io')();

abstract class HTTPSocketService extends HTTPService {

    public io: any;
    public connections: any[];

    constructor(name:string, options:ServiceOptions, rootDir:string) {
        super(name, options, rootDir);
        this.onConnection = this.onConnection.bind(this);
        this.onMessage = this.onMessage.bind(this);
        this.connections = [];
    }


    init(callback:(err?) => void) {
        super.init((err?) => {
            this.io = socketio.listen(this.server, {});
            this.io.on('connection', (socket) => {
                this.onConnection(socket);
            });
            callback(err);
        });
    }

    verifyClient(info: {origin:string; secure:boolean; req:http.ServerRequest}, callback:(res: boolean) => void):void {
        //accept all connections for now
        callback(true);
    }

    onConnection(client:any):void {

        client.on('disconnect', () => {
            console.log('on disconnect');
            // Drop the client from the active list, if it is still there.
            const i:number = this.connections.indexOf(client);
            if (i !== -1) {
                this.connections.splice(i, 1);
            }
            client.removeAllListeners();
            this.onClose(client);
        });
        this.connections.push(client);
        client.on('message', (message:any) => {
            console.log('on message: ', message);
            this.onMessage(message, client);
        });
    }

    broadcastMessage(message:any): void {
        this.connections.forEach(client => {
            client.emit('message', message);
        });
    }

    protected abstract onMessage(message:any, client:any):void;

    protected onClose(client:any):void {
        return;
    }

    sendWsJson(client:any, json:Object|string):void {
        if (typeof json !== 'string') {
            try {
                json = JSON.stringify(json);
            }
            catch (e) {
                // Nothing sensible to send if the payload cannot be serialized.
                console.error('JSON.stringify: ', e);
                return;
            }
        }
        client.send(json);
    }
}


export default HTTPSocketService;

Andrew Rapo
Executive Producer, Business Development & Marketing