Learning by Observation and Instruction

One of the primary factors limiting the capabilities of advanced computer generated forces is the time and effort required to extract, encode, debug, maintain, and extend the knowledge that drives their behavior. The goal of this research is to explore, develop, and evaluate automated techniques for extending and correcting the knowledge of advanced synthetic forces. These techniques will not only improve the development of synthetic forces, but will also make it possible to quickly correct and customize tactics through both instruction and learning by observation. Such techniques can lead to more capable and more realistic synthetic forces that in turn support more accurate and more realistic simulation environments for training, mission rehearsal, and analysis.

The current approach to building synthetic forces relies on iterating through multiple stages. The knowledge engineer starts by consulting existing manuals for all “formal” information available on the desired behavior of the synthetic forces (SFs). Formal documents, such as field and training manuals, provide only a bare-bones specification of behavior; they rightly assume that other forms of instruction (classroom lessons and briefings), as well as field training and experience, will follow. Thus, the knowledge engineer must rely on a subject matter expert (SME) to “fill in the details.” This involves extensive interviews followed by development of the SF. Once the SF is built, however, additional rounds of interviews are needed as the knowledge engineer discovers areas in which the knowledge is incomplete or incorrect. The expert must also view the SF’s behavior to verify its correctness. This is critical because it is extremely difficult for the knowledge engineer and the SME to specify all aspects of behavior in advance, especially when different goals and objectives in the SF interact. The complete process is very time consuming and has the additional flaw that at some point it simply stops, providing no means for the continual improvement that is crucial given the dynamic nature of both available weapons systems and doctrine.

We are pursuing two basic approaches to constructing synthetic forces automatically and semi-automatically: learning by observation and learning by instruction.

Learning by Observation

We are developing technology that involves detailed monitoring of a human performing the desired tasks. Machine learning techniques are then used to induce the knowledge required to generate the same behavior in an SF. This technique, called “learning by observation” or “behavioral cloning,” was demonstrated by Claude Sammut and his colleagues for learning to perform simple maneuvers in a simulated plane. However, their technique required manual segmentation of the human’s behavior. We have recently demonstrated an extension to this technique on a small piece of tactical behavior in which real-time annotations are provided, making manual segmentation unnecessary. We see this as a very rich area of research on extracting knowledge and performance data directly from SMEs. It will allow us not only to build synthetic forces quickly, but also to build synthetic forces customized to specific individuals or groups of individuals. Thus, it may make it possible to quickly construct synthetic forces that use a specific country’s tactics, or to create a group of synthetic forces that models the diversity and variety of behavior of an actual fighting group.
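To make the idea concrete, the sketch below shows one way such an annotated behavioral-cloning learner could work. It is a minimal sketch under strong simplifying assumptions: a propositional state representation and an off-the-shelf decision-tree inducer stand in for the richer simulator state and knowledge representations used in the actual work, and all feature, goal, and action names are hypothetical. The SME’s real-time annotations name the active goal on each frame, so the trace segments itself.

    # Minimal behavioral-cloning sketch (hypothetical feature/goal/action
    # names; illustrative only, not the system described in the text).
    from sklearn.tree import DecisionTreeClassifier

    # Each observed frame pairs sensor-derived state features with the
    # SME's real-time annotation naming the active tactical goal.
    trace = [
        # (altitude, speed, bearing_to_target, annotation, action)
        (1500.0, 420.0, -12.0, "intercept", "turn_left"),
        (1480.0, 430.0,  -4.0, "intercept", "hold_course"),
        (1470.0, 435.0,   1.0, "fire",      "launch_missile"),
        # ... many more frames from the demonstration ...
    ]

    # The annotations segment the trace automatically: frames are grouped
    # by the goal that was active, so no manual segmentation pass is needed.
    segments = {}
    for alt, spd, brg, goal, action in trace:
        features, actions = segments.setdefault(goal, ([], []))
        features.append([alt, spd, brg])
        actions.append(action)

    # Induce one situation-to-action policy per annotated goal.
    policies = {
        goal: DecisionTreeClassifier().fit(X, y)
        for goal, (X, y) in segments.items()
    }

    # At run time, the cloned SF selects an action from the policy for
    # its currently active goal.
    print(policies["intercept"].predict([[1490.0, 425.0, -8.0]]))

The point of the sketch is the role of the annotations: they supply the segmentation that previously had to be performed by hand, after which standard induction yields a separate situation-to-action policy for each segment of behavior.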

Learning by Instruction

We are also developing technology to create SFs that can receive instruction directly from an SME to correct their behavior in the current situation. In this approach, either the SF recognizes that it needs help and requests assistance from an SME, or the SME notices an error in the SF’s behavior. In either case, the SF momentarily pauses the simulation and requests instruction. The SF then follows the SME’s instructions as it performs its task, and learns from them, avoiding the need for the same instruction in the future. This focuses the knowledge acquisition process on exactly those places that need correction. Learning from instruction is an extremely complex process to automate; however, restricting instruction to moments when the learner is performing a task greatly simplifies the problem. This approach makes the instructor’s job easier because there is no need to pre-organize the material into a coherent presentation or to anticipate all possible cases. Instead, the instructor simply views the learner’s behavior and has only to determine what he or she would do in the same situation, providing assistance when requested (because the learner lacks knowledge) or correcting the learner when mistakes are made. The learner’s job is also much easier because there is no need to interpret general instructions and determine how they apply to arbitrary future situations. Instead, the instruction can almost always be applied directly to the current situation (or to a closely related hypothetical situation). From a specific example, the learner can use its existing knowledge of the domain to explain to itself why the instruction was appropriate, and then generalize the experience so it can be used in the future (a form of learning called explanation-based learning). Instructo-Soar, a system developed at the University of Michigan under the supervision of the PI, learns from instruction in exactly this way.
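The sketch below illustrates this instruct-then-generalize loop under strong simplifying assumptions: state is a set of feature-value pairs, the “domain theory” is a table naming which features justify each action, and the instructor is a stub. All names are hypothetical; Instructo-Soar itself operates within the Soar architecture rather than as freestanding Python.

    # Toy sketch of situated instruction with explanation-based
    # generalization (hypothetical names; not the Instructo-Soar code).

    # Stand-in domain theory: which state features each action's
    # preconditions actually depend on.
    RELEVANT_FEATURES = {
        "pop_up_attack": {"target_visible", "in_weapons_range"},
    }

    learned_rules = []  # (frozenset of relevant feature-value pairs, action)

    def ask_instructor(state):
        # Stand-in for pausing the simulation and querying the SME.
        return "pop_up_attack"

    def choose_action(state):
        # First try rules learned from earlier instruction.
        for conditions, action in learned_rules:
            if conditions <= set(state.items()):
                return action
        # Impasse: request instruction for this specific situation.
        action = ask_instructor(state)
        # Explanation step: keep only the features the domain theory says
        # justified the instruction, generalizing away the rest.
        relevant = RELEVANT_FEATURES[action]
        conditions = frozenset((k, v) for k, v in state.items()
                               if k in relevant)
        learned_rules.append((conditions, action))
        return action

    state = {"target_visible": True, "in_weapons_range": True,
             "fuel_low": False}
    print(choose_action(state))   # impasse: asks instructor, learns a rule
    print(choose_action({"target_visible": True, "in_weapons_range": True,
                         "fuel_low": True}))  # rule transfers; no instruction

The explanation step is what turns a single situated instruction into a reusable rule: by retaining only the features its domain knowledge says mattered, the learner generalizes away incidental details (here, fuel_low) without over-generalizing.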

Instructo-Soar uses a specialization of the technique described above to extend its knowledge of how to solve problems. It learns new procedures, extensions of existing procedures to novel situations, and other domain knowledge such as control knowledge, operator effects, and state inferences. In this project, we will take the basic structure of Instructo-Soar and apply it to a much more dynamic domain that requires significantly more domain knowledge.