Research program – Orpailleur

Knowledge discovery in databases (KDD) aims at discovering patterns in databases. From an operational point of view, the KDD process is based on three main operations: (i) data preparation, (ii) data mining and (iii) interpretation of the discovered patterns. KDD relies on data mining methods that are either symbolic or numerical. In the Orpailleur Team, symbolic methods are based on pattern mining, Formal Concept Analysis (FCA) and its extensions (Pattern Structures and Relational Concept Analysis). Patterns can take different forms, such as itemsets, sequences, trees and graphs. The numerical methods of interest for the team are based on statistics and probabilities, and include, among others, Hidden Markov Models, Support Vector Machines (SVM), Random Forests (RF) and deep learning.
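As a rough illustration of symbolic pattern mining on the simplest kind of pattern, itemsets, the sketch below enumerates frequent itemsets with a naive level-wise (Apriori-style) search. It is a minimal sketch, not the team's method: the function name `frequent_itemsets`, the toy transactions and the `min_support` threshold are all hypothetical, chosen only to show how candidate patterns are generated, pruned and counted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset contained in at least
    min_support transactions (naive level-wise, Apriori-style search).
    Hypothetical helper for illustration only."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    result = {}
    # Level 1: every single item is a candidate pattern.
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        frequent = []
        for c in candidates:
            # Support = number of transactions that contain the candidate.
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                result[c] = support
                frequent.append(c)
        # Build (k+1)-item candidates by joining pairs of frequent k-itemsets.
        k += 1
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        frequent_set = set(frequent)
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent_set for s in combinations(c, k - 1))
        ]
    return result


if __name__ == "__main__":
    # Hypothetical toy transactions.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

On this toy data every single item and every pair is frequent at threshold 3, while `{a, b, c}` occurs in only two transactions and is discarded. Real pattern miners use far more efficient strategies; the level-wise search is shown only because it makes the pruning principle easy to follow.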

Domain knowledge, when available, can improve and guide the KDD process; this is the principle of Knowledge Discovery guided by Domain Knowledge (KDDK). Knowledge discovery is a core task in knowledge engineering, with an impact on various semantic activities, e.g. information retrieval, recommendation and ontology engineering. Moreover, KDD can also be regarded as an exploratory process, supported by adapted graphical interfaces that help analysts make decisions. KDD and its variants are used in application domains such as agronomy, astronomy, biology, chemistry, cooking and medicine.

As a huge data repository, the web of data is a good platform for experimenting with ideas in knowledge engineering and knowledge discovery. In particular, text mining can be used both for mining the web of data and for ontology engineering. Text mining should take into account the characteristics of texts, which are complex objects written in natural language. Moreover, ontologies show similarities in form and content with the concept lattices constructed by FCA. In a concept lattice, each concept has an extent, i.e. the set of instances related to the concept, and an intent, i.e. a description given by a set of attributes. In addition, concepts are partially ordered by a subsumption relation, and this organization can be used for navigation and search.
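To make the notions of extent, intent and subsumption concrete, the following minimal sketch computes all formal concepts of a tiny formal context using the two FCA derivation operators, and orders concepts by inclusion of their extents. The context (animals and their attributes) and all function names are hypothetical, and the enumeration closes every subset of objects, which is exponential and only suitable for toy examples; dedicated FCA algorithms and tools are used in practice.

```python
from itertools import combinations

# Hypothetical toy formal context: object -> set of attributes it has.
CONTEXT = {
    "duck": {"flies", "swims", "has_feathers"},
    "eagle": {"flies", "has_feathers"},
    "fish": {"swims"},
}


def intent(objects, context):
    """Attributes shared by all given objects (derivation operator on objects).
    For an empty set of objects this returns all attributes."""
    all_attrs = set().union(*context.values())
    return frozenset(all_attrs.intersection(*(context[o] for o in objects)))


def extent(attributes, context):
    """Objects that have all given attributes (derivation operator on attributes)."""
    return frozenset(o for o, attrs in context.items() if set(attributes) <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.
    Exponential in the number of objects: fine only for tiny contexts."""
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset, context)
            a = extent(b, context)
            concepts.add((a, b))
    return concepts


def is_subconcept(c1, c2):
    """Subsumption: (A1, B1) <= (A2, B2) iff A1 is a subset of A2
    (equivalently, B2 is a subset of B1)."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for a, b in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(a), sorted(b))
```

Here the concept with extent `{duck}` is subsumed by the concept with extent `{duck, eagle}` and intent `{flies, has_feathers}`, which mirrors how a more specific class sits below a more general one in an ontology.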

This raises research questions of high interest, such as how the web of data, as a set of potential knowledge sources (e.g. DBpedia, Wikipedia, Yago, Freebase), can be mined to guide the search for ontological concept definitions and the design of knowledge bases, and how knowledge discovery techniques can be applied to improve the usability of the web of data (e.g. Linked Data classification).