Deep Learning has introduced methods capable of extracting latent variables from a remarkable variety of inputs. Raw facial image data can be quickly converted into features such as emotional expression, face orientation, or even suspect identity. Deep Learning has proven similarly capable at motor control and near-term action planning. With these technologies and abundant training, we are discernibly closer to sci-fi-level artificial intelligence. However, a large gap remains between the input-side applications (recognition) and the output-side applications (control and planning). Here we propose a natural language model capable of bridging the two in the most general sense. In outline, natural language becomes the output of recognizers and the input of planners. The model must also decide which of the available models is appropriate to use in a given situation.
To start, let's imagine an app that maps the emotion on a user's face, as captured by a webcam, onto a generic face. This app could be written as below, eliding the models for face recognition and generation: