Interactive visual event-sequence mining
Project leader: Katerina Vrotsou
Grant support from: CENIIT
Background and industrial relevance
A vast number of systems produce high-dimensional event-sequence data, and many of them are significant to both society and industry. Examples can be found in the financial and banking sectors as sequences of transactions; in web logs as sequences of web page views; in business and administration, where processes comprise sequences of discrete events; in industrial processes, where monitoring and early warning systems produce sequences of alarm events; in the medical sector, through the use of medical records for diagnostics and treatment planning; in the gaming industry, where a player’s actions are logged as sequences of events; and in the social and transportation sciences through, for example, activity and travel diaries.
The analysis of event-sequence data is most often concerned with identifying and exploring relationships between data items, expressed as patterns. For example, questions such as which combination of events has led to the current situation, or what the possible outcomes of a given sequence of events are, can lead to the inference of rules of the form: if event a is followed by event b, then event c will occur next with 85% confidence. Sequential pattern mining can thus be defined as the problem of identifying collections of events that occur close to each other in a certain order; in this context, a sequential pattern is a subsequence of events.
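The notions of support and rule strength used above can be made concrete with a small example. The sketch below is a minimal illustration, assuming each sequence is a plain list of event labels and that a pattern is contained in a sequence when its events appear in the same order, with other events allowed in between (a common convention; it does not enforce that c follows b immediately). The helper names `contains`, `support` and `rule_confidence` are hypothetical. Support is taken as the fraction of sequences containing a pattern, and the strength of a rule a, b → c is measured as support(a, b, c) / support(a, b), usually called confidence.

```python
from typing import Sequence


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the events of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    # `event in it` consumes the iterator up to and including the first match,
    # so each pattern event must be found after the previous one.
    return all(event in it for event in pattern)


def support(db: Sequence[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of the sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


def rule_confidence(db: Sequence[Sequence[str]], antecedent: Sequence[str],
                    consequent: Sequence[str]) -> float:
    """Confidence of antecedent -> consequent: support(antecedent + consequent) / support(antecedent)."""
    base = support(db, antecedent)
    return support(db, list(antecedent) + list(consequent)) / base if base else 0.0


db = [
    ["a", "b", "c"],
    ["a", "x", "b", "c"],
    ["a", "b", "d"],
    ["b", "a", "c"],
]
print(support(db, ["a", "b"]))                           # 0.75
print(round(rule_confidence(db, ["a", "b"], ["c"]), 3))  # 0.667
```

Three of the four sequences contain a followed by b, and two of those three continue with c, which gives a rule confidence of 2/3.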
The large number of systems that produce inherently sequential data has made it apparent that sequential pattern mining is an important and challenging data mining problem, with applications in many disciplines. As a result, data mining researchers continuously develop and improve algorithms for extracting sequential patterns from large, complex datasets. However, the end user who inspects and benefits from the results of such algorithms is usually not a data mining expert. For this user the algorithms operate as a “black box”, and the user’s involvement is limited, at most, to adjusting some initial parameters that act as constraints on the algorithms. In general, most existing algorithms offer limited interactivity and tend to output long lists of patterns that often lack focus. It is therefore not uncommon for the user to end up overwhelmed by a large number of irrelevant patterns.
This is one of the core issues we want to address within this project: increasing the interestingness of sequence mining results by interactively adding domain knowledge to the process. Our goal is to add interactivity to sequential pattern mining by allowing the user to guide the execution of the algorithms at suitable points. To this end, we plan to investigate the use of modern visualization techniques to create a “transparent box” execution model for the algorithms, in contrast to the existing “black box” execution model. The results of this research will contribute to the creation of interactive visual sequence mining systems that can more easily incorporate expert knowledge and thus produce a set of patterns that are relevant to the user.
We intend to pursue a visual analytics approach that integrates the processing power of computers with the interpretation skills and domain knowledge of the human user. To achieve this, we will combine interactive visualization with algorithmic data mining techniques. Incorporating interactive visualization in the data mining process is not limited to visually representing the identified results, even though doing so may help a user gain insight into them. We intend to go beyond this and also incorporate visualization in the computation phase of the algorithms.
Interactivity to increase interestingness. One of the main issues with many sequential pattern mining algorithms is that they extract too many patterns, which a user then has to go through and make sense of. Many of these patterns are, in fact, irrelevant or uninteresting. We want to increase the interestingness and significance of the extracted patterns by taking the domain knowledge of the user into consideration. Before investigating the creation of new algorithms, we plan to first break existing algorithms down into their components by stopping the mining process, displaying its current status and allowing the user to intervene. This will be beneficial in several ways. First, it reveals the mining process step by step, giving a better understanding of the effect that user constraints have at each level of the process. Second, it provides insight into the direction the search is taking and allows the user to adjust or redirect it by tuning the parameters, thus producing more targeted and interesting results. Adding interactivity to the process in this manner will create a “transparent box” execution model in place of the current “black box”.
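One way to picture the “transparent box” model is a pattern-growth miner, in the spirit of PrefixSpan, that pauses after every extension level and hands its current candidates to a callback standing in for the user. The sketch below is a minimal illustration under stated assumptions, not the project's design: sequences are lists of single events, support is counted as the number of sequences containing a pattern, and the `inspect` callback (a hypothetical interface, like all names here) may discard patterns and change the minimum support before the search continues.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
# A projected database: (sequence id, index where the remaining suffix starts).
Projection = List[Tuple[int, int]]
# Hypothetical user hook: (level, {pattern: support}, min_count) -> (kept patterns, new min_count).
Inspect = Callable[[int, Dict[Pattern, int], int], Tuple[List[Pattern], int]]


def grow(db: Sequence[Sequence[str]], prefix: Pattern, projection: Projection,
         min_count: int) -> Dict[Pattern, Tuple[int, Projection]]:
    """Extend `prefix` by one event; keep extensions found in >= min_count sequences."""
    first_pos: Dict[str, Dict[int, int]] = defaultdict(dict)  # event -> {seq id: first index}
    for sid, start in projection:
        for pos in range(start, len(db[sid])):
            first_pos[db[sid][pos]].setdefault(sid, pos)
    extensions = {}
    for event, hits in first_pos.items():
        if len(hits) >= min_count:
            # The new suffix starts right after the first occurrence of `event`.
            proj = [(sid, pos + 1) for sid, pos in hits.items()]
            extensions[prefix + (event,)] = (len(hits), proj)
    return extensions


def interactive_mine(db: Sequence[Sequence[str]], min_count: int,
                     inspect: Inspect, max_len: int = 5) -> Dict[Pattern, int]:
    """Level-wise pattern growth that pauses after every level for user input."""
    results: Dict[Pattern, int] = {}
    frontier: Dict[Pattern, Projection] = {(): [(sid, 0) for sid in range(len(db))]}
    for level in range(1, max_len + 1):
        candidates: Dict[Pattern, Tuple[int, Projection]] = {}
        for prefix, proj in frontier.items():
            candidates.update(grow(db, prefix, proj, min_count))
        if not candidates:
            break
        # Pause: expose the current state and let the user prune or re-tune.
        keep, min_count = inspect(level, {p: c for p, (c, _) in candidates.items()}, min_count)
        frontier = {}
        for p in keep:
            count, proj = candidates.get(p, (0, []))
            if count >= min_count:
                results[p] = count
                frontier[p] = proj  # only kept patterns are grown further
    return results


def simulated_user(level: int, patterns: Dict[Pattern, int], min_count: int):
    """Stand-in for interactive input: show the level, drop 'noise', tighten after level 2."""
    print(f"level {level}:", sorted(patterns.items()))
    keep = [p for p in patterns if "noise" not in p]
    return keep, (3 if level == 2 else min_count)


db = [
    ["login", "search", "buy"],
    ["login", "noise", "search", "buy"],
    ["login", "search", "logout"],
    ["noise", "login", "buy"],
]
print(interactive_mine(db, min_count=2, inspect=simulated_user))
```

In this toy run the simulated user removes every pattern containing `noise` and raises the threshold after level 2, so the branches that were cut are never grown. Dropping a prefix also drops all of its extensions, and since an extension can never have higher support than its prefix, raising the threshold is consistent with what has already been pruned. A lowered threshold, by contrast, only affects the levels that follow, because the current candidates were already filtered.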
Including context in the mining process. Even though patterns are usually mined based on their event types and order, the space in which they appear is multidimensional. Sequences and events are often associated with surrounding information that sets a semantic context, such as the age or sex of a patient, the location where events occurred, or the day of the week when they took place. Mining multidimensional sequence data can benefit substantially from the “transparent box” execution model to be investigated in this project: adding interactivity to the mining process will enable domain knowledge to be incorporated into the search and thus define a context for it.
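As a concrete illustration of mining with context, the sketch below attaches sequence-level attributes (such as a patient's age) and event-level attributes (such as the day of the week) to the data. A pattern step can require specific event context, and a filter selects the sequences in which support is counted. The record layout, the names (`Event`, `EventSequence`, `contextual_support`) and the toy hospital data are all hypothetical; in an interactive setting, the filter and the context constraints are what a user would set or change while the search is paused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Event:
    kind: str
    context: Dict[str, str] = field(default_factory=dict)  # event-level, e.g. {"day": "Sat"}


@dataclass
class EventSequence:
    events: List[Event]
    attributes: Dict[str, object] = field(default_factory=dict)  # sequence-level, e.g. {"age": 72}


# A pattern step: an event type plus the event-level context values it requires.
Step = Tuple[str, Dict[str, str]]


def step_matches(event: Event, step: Step) -> bool:
    kind, required = step
    return event.kind == kind and all(event.context.get(k) == v for k, v in required.items())


def contains(seq: EventSequence, pattern: Sequence[Step]) -> bool:
    """In-order matching; taking the earliest match for each step is sufficient here."""
    i = 0
    for event in seq.events:
        if i < len(pattern) and step_matches(event, pattern[i]):
            i += 1
    return i == len(pattern)


def contextual_support(db: Sequence[EventSequence], pattern: Sequence[Step],
                       seq_filter: Callable[[EventSequence], bool] = lambda s: True) -> float:
    """Share of the sequences selected by `seq_filter` that contain `pattern`."""
    selected = [s for s in db if seq_filter(s)]
    if not selected:
        return 0.0
    return sum(contains(s, pattern) for s in selected) / len(selected)


db = [
    EventSequence([Event("admit", {"day": "Sat"}), Event("scan"), Event("surgery")], {"age": 72}),
    EventSequence([Event("admit", {"day": "Mon"}), Event("surgery")], {"age": 45}),
    EventSequence([Event("admit", {"day": "Sat"}), Event("discharge")], {"age": 80}),
    EventSequence([Event("admit", {"day": "Sat"}), Event("scan"), Event("surgery")], {"age": 67}),
]
weekend_surgery = [("admit", {"day": "Sat"}), ("surgery", {})]
print(contextual_support(db, weekend_surgery))  # 0.5
print(round(contextual_support(db, weekend_surgery, lambda s: s.attributes["age"] >= 65), 3))  # 0.667
```

Comparing the two values shows how the same pattern can be more prominent within a user-chosen context than across the whole dataset.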
Context in the exploration of results. Apart from considering context in the actual mining process, it is also interesting to incorporate semantic context into the inspection of the resulting patterns. We will investigate visual representations that give an overview of how the identified sequential patterns are distributed across different variables of the dataset. This will enable more flexible exploration and comparison of sequential patterns and potentially a better understanding of their distribution and the reasons behind it.
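The overview described above can start from a simple table: for every mined pattern, the share of sequences containing it for each value of a chosen variable. This is the kind of data a heat map or grouped bar chart would display. A minimal sketch, assuming each sequence carries a dictionary of attributes; the record layout, the function names and the diary-style data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (sequence-level attributes, ordered event labels).
Record = Tuple[Dict[str, str], List[str]]


def contains(events: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the events of `pattern` occur in `events` in order (gaps allowed)."""
    it = iter(events)
    return all(e in it for e in pattern)


def pattern_distribution(db: Sequence[Record], patterns: Sequence[Tuple[str, ...]],
                         variable: str) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """For each pattern, the share of sequences containing it per value of `variable`."""
    totals = Counter(attrs[variable] for attrs, _ in db)
    table = {}
    for p in patterns:
        hits = Counter(attrs[variable] for attrs, events in db if contains(events, p))
        # Counter returns 0 for values in which the pattern never occurs.
        table[p] = {value: hits[value] / n for value, n in sorted(totals.items())}
    return table


db = [
    ({"day": "weekday"}, ["work", "shop", "home"]),
    ({"day": "weekday"}, ["work", "home"]),
    ({"day": "weekend"}, ["shop", "home"]),
    ({"day": "weekend"}, ["sport", "shop", "home"]),
]
for pattern, row in pattern_distribution(db, [("work", "home"), ("shop", "home")], "day").items():
    print(" -> ".join(pattern), row)
# work -> home {'weekday': 1.0, 'weekend': 0.0}
# shop -> home {'weekday': 0.5, 'weekend': 1.0}
```

Normalizing by the number of sequences per value, rather than using raw counts, keeps groups of different sizes comparable.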
Ontologies to set semantics. Apart from allowing the expert user to directly control the mining process in order to apply semantic context manually, as previously suggested, we also intend to explore the use of ontologies to include semantics in the identification of sequential patterns. Ontologies can be used to express important background and expert knowledge and can consequently decrease the pattern search space by ignoring patterns that are uninteresting in the context of the search.
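A simple way to see how an ontology can shrink the search space is a concept hierarchy (a taxonomy, the simplest kind of ontology) in which each event type points to a broader concept. The sketch below uses such a hierarchy in two ways: to generalize events to a level of abstraction chosen by the user, and to reject candidate patterns that involve concepts marked as uninteresting. The toy hierarchy, the concept names and the functions are hypothetical; real ontologies (for example, those expressed in OWL) carry far richer relations than a single parent link.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy hierarchy: each concept maps to its broader (parent) concept.
# Assumes the hierarchy has no cycles.
PARENT: Dict[str, str] = {
    "pump_pressure_low": "pressure_alarm",
    "pump_pressure_high": "pressure_alarm",
    "pressure_alarm": "process_alarm",
    "temp_high": "temperature_alarm",
    "temperature_alarm": "process_alarm",
    "heartbeat": "diagnostic_message",
    "log_rotated": "diagnostic_message",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by all of its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(events: Sequence[str], level_concepts: Set[str]) -> List[str]:
    """Replace each event by its nearest ancestor in `level_concepts`, or keep it unchanged."""
    return [next((c for c in ancestors(e) if c in level_concepts), e) for e in events]


def is_relevant(pattern: Sequence[str], ignored: Set[str]) -> bool:
    """False if any event in the pattern falls under a concept the user marked as uninteresting."""
    return not any(set(ancestors(e)) & ignored for e in pattern)


seq = ["heartbeat", "pump_pressure_low", "log_rotated", "temp_high"]
print(generalize(seq, {"pressure_alarm", "temperature_alarm"}))
# ['heartbeat', 'pressure_alarm', 'log_rotated', 'temperature_alarm']

candidates = [("heartbeat", "temp_high"), ("pump_pressure_low", "temp_high")]
print([p for p in candidates if is_relevant(p, {"diagnostic_message"})])
# [('pump_pressure_low', 'temp_high')]
```

Because every extension of a rejected pattern still contains the offending event, a pattern-growth miner could apply a check like `is_relevant` to each new prefix and skip the whole branch, which is where the reduction of the search space would come from.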
Expressive pattern language. Sequential pattern mining algorithms are applied to identify patterns in a dataset and infer knowledge from them. Most often, such knowledge takes the form of simple rules such as ‘if event a is followed by event b, then event c will most likely follow’. Different languages have been proposed for specifying sequential patterns of interest, some of which use regular expressions, disjunction, or negation. These pattern specifications can, however, be enriched further. We plan to investigate more expressive pattern languages that incorporate context and allow a richer specification of the sequential patterns of interest, in order to extract more sophisticated, context-aware results.
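To make disjunction and negation concrete, the sketch below defines a tiny pattern notation in which each step lists the events it accepts (disjunction) and, optionally, events that must not occur between the previous step and this one (negation). It is a hypothetical illustration, neither one of the published pattern languages mentioned above nor the language the project will develop; a context-aware language could, for instance, extend `Step` with constraints on event attributes. Matching backtracks, because taking the earliest occurrence of a step can violate a negation that a later occurrence would satisfy.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Step:
    any_of: FrozenSet[str]                  # disjunction: any one of these events
    none_of: FrozenSet[str] = frozenset()   # negation: none of these since the previous step


def matches(seq: Sequence[str], steps: Sequence[Step], start: int = 0) -> bool:
    """True if `steps` can be matched in order in seq[start:], backtracking when needed."""
    if not steps:
        return True
    first, rest = steps[0], steps[1:]
    for pos in range(start, len(seq)):
        if seq[pos] in first.any_of and matches(seq, rest, pos + 1):
            return True
        if seq[pos] in first.none_of:
            return False  # a forbidden event occurs before this step could be matched
    return False


# "A, then B or C with no reset in between, then D".
rule = [
    Step(frozenset({"A"})),
    Step(frozenset({"B", "C"}), none_of=frozenset({"reset"})),
    Step(frozenset({"D"})),
]
print(matches(["A", "C", "D"], rule))                     # True
print(matches(["A", "reset", "B", "D"], rule))            # False
print(matches(["A", "reset", "A", "B", "x", "D"], rule))  # True (the second A is used)
```

The third sequence shows why backtracking is needed: the first A is followed by a reset, so only the second A yields a valid match.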
The analysis of event-sequence data is a relatively new field in information visualization, and there is currently an opportunity to create a group focused on this valuable area of research. In the longer term, we therefore envision the formation of a research group within the Visualization Centre at Campus Norrköping, working on the interactive visual analysis of different types of event-based data. The group will have a technical core, but its methodological approaches to analysis and its application areas will be interdisciplinary. We will work with interactive visualization and algorithmic data mining, address real-world problems and research novel approaches to representing and reasoning with data, always in collaboration with the user and always focusing on both the task and the data. Within this framework, we aim to build a solid research base for addressing the analysis of event-based data. In time, this base will result in a continuously growing library of analysis tools that can be configured for various fields. On top of this, we see strands of application-oriented research opportunities: applying our knowledge to different fields, uncovering new questions and answering them with new tools that support industry, business, medicine, infrastructure and society.
Research environment and industrial cooperation
The main applicant, Katerina Vrotsou, and the co-applicant, Aida Nordman, are research associates in the division for Media and Information Technology at the Department for Science and Technology at Linköping University. As a research group, they have extensive expertise in data visualization and data mining, two areas that are complementary within the scope of this project. Vrotsou’s research has focused on interactive visualization and visual mining methods for data analysis. Nordman’s research has been concerned with knowledge representation using logic-based languages for reasoning, with concrete applications in data mining. The group will also have access to the facilities, programmes and expertise of the Norrköping Visualization Centre, the Norrköping National Meeting Place for Visualization (C-site), and Norrköping Science Park (NOSP). These centres actively and successfully promote visualization through activities that support collaboration between academia, industry and society.
The project aims to produce results that are applicable to a wide range of areas. To ensure this, we will conduct our research using event-sequence datasets from diverse sources in different domains, in close contact with experts from these domains. These domains include: (a) sequences of game player activity, in collaboration with EA Digital Illusions CE (DICE); (b) alarm-event sequences produced by monitoring and early warning systems, in collaboration with Siemens Industrial Turbomachinery AB; and (c) sequences of events that occur when an emergency or crisis situation arises, in collaboration with the Swedish Meteorological and Hydrological Institute (SMHI).
- Katerina Vrotsou and Aida Nordman. “Interactive visual sequence mining based on pattern-growth.” IEEE Conference on Visual Analytics Science and Technology (poster), Paris, France, 2014.
Previous related publications
- Katerina Vrotsou, Anders Ynnerman and Matthew Cooper. “Are we what we do? Exploring group behaviour through user-defined event-sequence similarity.” Information Visualization, vol. 13, no. 3, pp. 232-247, 2013.
- Katerina Vrotsou. “Everyday mining: Exploring sequences in event-based data.” Doctoral thesis, Linköping University, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-58311
- Katerina Vrotsou, Jimmy Johansson and Matthew Cooper. “ActiviTree: Interactive Visual Exploration of Event-Based Data Using Graph Similarity.” IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 6, pp. 945-952, November/December 2009.
- Katerina Vrotsou, Kajsa Ellegård and Matthew Cooper. “Exploring time diaries using semi-automated activity pattern extraction.” electronic International Journal of Time Use Research, vol. 6, no. 1, pp. 1-25, 2009.
Last updated: Tue Aug 18 11:59:22 CEST 2015