Data Mining and classification
The Classification and Data Mining (CDM) theme, led by Pierre Gançarski, focuses on machine learning and knowledge extraction from complex data (e.g. images and databases). Our research has two aims: on the one hand, to design and implement knowledge extraction methods and, on the other hand, to apply those methods to the analysis of databases and digital images. Our approaches belong to the machine learning area, in particular to clustering and relational data mining. Our main application domains are remote sensing and medical images, biochemical data, and customer relationship management.
Our theme is organised around four axes:
- FODOMUST: Multistrategy data mining
- FODOREL: Relational data mining
- FODOST: Structural data mining
- FODOGECO: Data mining and knowledge management
Main collaborators
- Pierre Gançarski, Full Professor
- Aline Deruyver, Senior Lecturer
- Agnès Braud, Lecturer
- Gabriel Frey, Lecturer
- Nicolas Lachiche, Senior Lecturer
- Cedric Wemmert, Lecturer
- Cecilia Zanni-Merk, Lecturer
- François de Bertrand de Beuvron, Lecturer
- Camille Kurtz, PhD student
- Francois Petitjean, PhD student
- Bruno Belarte, PhD student
Former PhD students
- Jonathan Weber, ATER, Université de Nancy
Operations
FODOMUST: Multistrategy data mining
Our work on multistrategy data mining follows three axes:
- collaborative clustering methods: the idea is to improve the result of one clustering of some data by using an ensemble of clustering methods and making them collaborate. Keywords: data mining, collaborative clustering, unsupervised classification, knowledge extraction, complex data
- evolutionary approaches for feature weighting in K-means-based algorithms: in the Maclaw method (Modular Approach for Clustering with Local Attribute Weighting), each population only looks for the weights and the center of one cluster. In our methods, all the populations work in a cooperative way: together they search for the partition, built from the local solutions proposed by individuals, that minimizes the LKM cost function. Each individual from a population is evaluated with respect to individuals from the other populations: the better the best partition that can be built using its solution, the better the individual's evaluation.
- integration of background knowledge in clustering: either directly in Samarah, or in classical techniques such as K-means (Germain Forestier's PhD thesis)
All of these aspects are implemented in a common platform, Samarah (Multi-agent learning system for the automatic refinement of hierarchies).
Much of this work has been carried out in collaboration with the Laboratoire Image et Ville (UMR CNRS/UDS 7011) and has been validated in the remote sensing domain. The main application of our methods is therefore the automatic classification of remote sensing images and, more generally, the clustering of images.
In parallel, we work on the automatic structuring of sets of images and video sequences.
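As an illustration of the local attribute weighting mentioned in the second axis above, the following minimal sketch evaluates a partition in which each cluster has its own attribute weights. It is not the Maclaw implementation: the LKM cost function is not defined on this page, so the sketch assumes a weighted squared Euclidean distance with per-cluster weights that sum to 1, and all function names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def weighted_sq_dist(x: Point, center: Point, weights: Point) -> float:
    """Squared Euclidean distance with one weight per attribute (assumed form)."""
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights))


def assign(points: List[Point], centers: List[Point],
           weights: List[Point]) -> List[int]:
    """Assign each point to the cluster with the smallest locally weighted distance."""
    return [
        min(range(len(centers)),
            key=lambda k: weighted_sq_dist(x, centers[k], weights[k]))
        for x in points
    ]


def partition_cost(points: List[Point], centers: List[Point],
                   weights: List[Point]) -> float:
    """Sum of locally weighted distances of each point to its assigned center."""
    labels = assign(points, centers, weights)
    return sum(weighted_sq_dist(x, centers[k], weights[k])
               for x, k in zip(points, labels))


if __name__ == "__main__":
    # Toy data: the second attribute is pure spread, the first separates clusters.
    points = [(0.0, 0.0), (0.0, 10.0), (5.0, 0.0), (5.0, 10.0)]
    centers = [(0.0, 5.0), (5.0, 5.0)]
    local = [(1.0, 0.0), (1.0, 0.0)]      # each cluster ignores attribute 2
    uniform = [(0.5, 0.5), (0.5, 0.5)]    # both attributes weighted equally
    print(partition_cost(points, centers, local))
    print(partition_cost(points, centers, uniform))
```

In an evolutionary setting such as the one described above, a function like `partition_cost` could serve as the fitness used to compare candidate (center, weights) pairs; how the populations are actually evolved is outside the scope of this sketch.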
FODOREL: Relational data mining
Relational data mining deals with knowledge extraction from relational databases and, more generally, with inductive learning from data that cannot naturally be represented as a single attribute-value table, e.g. chemical reactions.
Our application areas include:
- chemistry
- water quality
- customer relationship management (CRM)
- geography
Our research topics are:
- rule discovery
- naive Bayesian classifiers
- ROC-based optimisation
- propositionalisation and, more generally, problem representation and data preparation
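To make the last topic concrete, here is a minimal sketch of one common form of propositionalisation: a one-to-many relation is flattened into a single attribute-value table by aggregating the related records of each main object. The customer and purchase data, the choice of aggregates, and the function name are illustrative assumptions and do not describe the team's own methods.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def propositionalise(
    parents: List[str],
    children: List[Tuple[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Flatten a one-to-many relation into one row of aggregates per parent.

    `children` holds (parent_id, value) pairs; parents without any child
    receive a count of 0 and 0.0 for the other aggregates (an assumption).
    """
    grouped = defaultdict(list)
    for parent_id, value in children:
        grouped[parent_id].append(value)

    table = {}
    for parent_id in parents:
        values = grouped.get(parent_id, [])
        table[parent_id] = {
            "count": len(values),
            "min": min(values) if values else 0.0,
            "max": max(values) if values else 0.0,
            "mean": mean(values) if values else 0.0,
        }
    return table


if __name__ == "__main__":
    # Hypothetical CRM example: customers and the amounts of their purchases.
    customers = ["c1", "c2", "c3"]
    purchases = [("c1", 10.0), ("c1", 30.0), ("c2", 5.0)]
    for customer, row in propositionalise(customers, purchases).items():
        print(customer, row)
```

The resulting table has exactly one row per customer, so it can be passed to any attribute-value learner, at the price of losing the detail that the aggregates summarise.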
FODOST: Structural data mining
Structural data mining concerns knowledge extraction from complex data structured along spatial, semantic, or temporal dimensions. Our aim is to exploit the structure between objects when clustering them.
We work with multisource, multiview, multiresolution and multitemporal data, mainly in the remote sensing domain.
FODOGECO: Data mining and knowledge management
We are concerned here with the semantic interpretation of high-resolution satellite images. To enable the identification of high-level structured objects (house, street…), it is necessary to merge the classifications of the regions obtained from the analysis of the images with inferences based on a geographical ontology. This ontology needs to describe not only the urban objects, but also their qualitative and quantitative spatial relationships.
Main projects and collaborations
- Ongoing projects
- FOSTER Spatio-temporal data mining - application to the understanding and monitoring of erosion (January 2011 - March 2013)
- Objective: The FOSTER project aims at building, from the available data, dynamic models to support the monitoring and study of the evolution of the environment, in particular erosion. The data consist of satellite images, (symbolic and numerical) spatio-temporal data, and background knowledge. They are heterogeneous, multi-scale and noisy, and have missing values. They are also large: for instance, a satellite image and the associated Digital Elevation Model (DEM) require 26 Gb.
The environmental processes are complex and we would like to learn several models, e.g. for detecting sharp changes as well as for monitoring slowly evolving phenomena. Studying dynamic systems evolving in space and time raises questions about which spatial relations and temporal evolutions to consider. The areas studied in this project are located in New Caledonia (travel there is possible) and the French Alps.
- Web site: http://foster.univ-nc.nc
- Postdoc position. The post-doctoral research fellow will contribute to the following tasks:
1) complement the Samarah collaborative approach by integrating mechanisms to deal with spatio-temporal data and constraints/background knowledge. On the one hand, semi-supervised clustering techniques will be integrated. On the other hand, supervised/active learning techniques will be considered, in a complementary/collaborative approach.
2) conceive and evaluate new multi-step collaborative methods for spatio-temporal data mining. One important research issue concerns the back-propagation of information through learning processes.
We are looking for a candidate holding a PhD, with experience in data mining and, if possible, in remote sensing. The candidate should be able to discuss with geologists in order to validate methods and results.
Duration: 1 year renewable once
Salary: around 2000 € net
Location: LSIIT - Université de Strasbourg, France
Contact person: Pierre Gançarski (gancarski@unistra.fr)
- CNES (National center for space studies)
- ORFEO GT3 study (2010-2011 and 2011-2012): modelling objects of interest in remote sensing images and their spatial relationships, for knowledge extraction guided by this information
- PhD grant with Thalès (2009-2012): clustering of temporal sequences of heterogeneous satellite images
- Roche pharmaceutical company (2010-2012): analysis of images to extract knowledge about the efficiency of drugs
- DAHLIA (2010-2013): PhD grant with Christophe Collet (MIV team) on monitoring daily activities of elderly people at home
- Past projects
- GeOpenSim (2007-2011): learning the classes and the evolution rules of urban areas.
- RBS (2007-2010): automatic structuring of sets of images and video sequences
- ECOSGIL (2005-2008): extraction of spatial knowledge for an integrated management of the littoral
- FoDoMuSt (2004-2008): multistrategy data mining on remote sensing images
- CNES (2007-2008): interactive interpretation of remote sensing images
Former trainees
- Mickaël Fromeyer (MSc): collaboration of clustering methods
- Benoît Zugmeyer (MSc): Rebuz Project
Publications in international journals



