Semantics for Big Data Fusion and Analysis: Improving energy efficiency in smart grids

Motivation

The increasing computational capabilities for information acquisition and storage have led to a massive increase in available data in different application domains, such as e-commerce, healthcare, energy services, and public administration. Interestingly, a significant amount of these data are offered as open data and can be freely accessed on the web. Big Data technologies provide a framework for processing large volumes of data and extracting meaningful knowledge from them.

Undoubtedly, Big Data analysis offers a great opportunity to gain insights into many aspects of society and of organizational performance. However, current approaches mostly focus on problems that involve homogeneous data with a simple structure. They do not combine different information sources, even though doing so can be essential to discover relevant hidden knowledge. This is the case of data analysis in smart grids. In order to answer questions like ‘what is the expected trend of energy demand for a district?’, we need to consider past energy consumption records, but also geographical, temporal, socio-economic, and weather information, to name some of the most relevant contextual influences. For example, a wealthy district of an inland city close to the coast may have unexpectedly low energy consumption during a hot summer because many of its residents leave on holiday.


Objectives

The BIGFUSE project studies new Information Fusion techniques to address the challenge of extracting knowledge from multi-source Big Data. Combining data raises two problems: incompatible formats, which must be translated into a common language, and, more importantly, semantic heterogeneity, which requires aligning diverse conceptual schemas based on the meaning of the information rather than on its format. Once the data have been translated into a uniform representation with agreed semantics, they are ready to be processed by a knowledge extraction procedure, which will construct an overall picture describing or explaining the raw facts. These problems remain unsolved in the Information Fusion research area and have not been studied in depth in Big Data settings.
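To make the translation step concrete, here is a minimal sketch that maps records from two sources into one uniform representation. Everything in it is hypothetical: the source names `utility_a` and `utility_b`, their field names and units, and the `Consumption` schema are invented for illustration. The field correspondence is hard-coded, whereas in a real fusion pipeline it would come from aligning the sources' conceptual schemas (for example, ontology matching), which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumption:
    """Hypothetical uniform representation agreed by all sources."""
    district: str
    month: str         # ISO year-month, e.g. "2016-07"
    energy_kwh: float  # energy always expressed in kWh


# Hypothetical alignment: source field -> (uniform field, conversion function).
# Hard-coded here; a real system would derive it from schema alignment.
ALIGNMENTS = {
    "utility_a": {
        "district": ("district", str),
        "period": ("month", str),
        "kwh": ("energy_kwh", float),
    },
    "utility_b": {
        "area": ("district", str),
        "month": ("month", str),
        # utility_b reports MWh, so convert to kWh
        "mwh": ("energy_kwh", lambda v: float(v) * 1000.0),
    },
}


def to_uniform(source: str, record: dict) -> Consumption:
    """Rename and convert the fields of one raw record into the uniform schema."""
    mapping = ALIGNMENTS[source]
    values = {}
    for field, raw in record.items():
        if field in mapping:  # fields with no counterpart are dropped
            target, convert = mapping[field]
            values[target] = convert(raw)
    return Consumption(**values)


a = to_uniform("utility_a", {"district": "Centro", "period": "2016-07", "kwh": "1520.5"})
b = to_uniform("utility_b", {"area": "Centro", "month": "2016-07", "mwh": 1.5})
print(a.energy_kwh + b.energy_kwh)  # 3020.5
```

Only after both records share field names and units can they be safely aggregated, which is the precondition the paragraph above places on knowledge extraction.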

The theoretical results of the project are being applied to improving energy efficiency in smart grids. Smart grids are next-generation energy distribution networks in which the behavior of every connected component and agent is detected, recorded, and analyzed to guarantee sustainable and efficient energy management. Consequently, smart grids produce large amounts of data that cannot be processed by conventional means. The project provides solutions to interpret these data, which requires incorporating additional information that is not directly collected by the network and is possibly generated by different organizations.

The collection and storage of massive data entail several threats to privacy rights. Specifically, domestic electricity consumption data can include personal data and can also be used to identify private activities. In such a scenario, it is necessary to characterize and protect sensitive data in accordance with legal restrictions. A more serious threat appears in Big Data Fusion: each information source in isolation may not violate privacy, but personal data may emerge once the sources are combined and processed. For instance, we may know from one source the behavior of a type of household, e.g., a large dwelling with very few inhabitants. If these values are particularly unusual in the district, it can be relatively easy to identify to whom the energy consumption pattern belongs, resulting in a data privacy breach. Consequently, the project also studies the impact of Big Data technologies on privacy rights.
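The re-identification risk described above can be illustrated with a k-anonymity style check: group fused records by their quasi-identifiers (attributes that are harmless alone but identifying in combination) and flag any combination shared by fewer than k households. This is a well-known privacy criterion swapped in for illustration, not necessarily the technique the project adopts; the records, attribute names, and the choice k = 2 are hypothetical.

```python
from collections import Counter

# Hypothetical fused records: each household is described only by
# quasi-identifiers that look harmless in isolation.
households = [
    {"district": "Centro", "inhabitants": 3, "area": "80-120 m2"},
    {"district": "Centro", "inhabitants": 3, "area": "80-120 m2"},
    {"district": "Centro", "inhabitants": 4, "area": "80-120 m2"},
    {"district": "Centro", "inhabitants": 4, "area": "80-120 m2"},
    {"district": "Centro", "inhabitants": 1, "area": ">250 m2"},  # large home, one inhabitant
]

QUASI_IDENTIFIERS = ("district", "inhabitants", "area")


def risky_groups(records, keys, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


print(risky_groups(households, QUASI_IDENTIFIERS, k=2))
# {('Centro', 1, '>250 m2'): 1}
```

The single flagged group is exactly the unusual household from the example: publishing its consumption pattern alongside these attributes would single it out, so such groups would need to be generalized or suppressed before release.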


Methodology

The methodological approach of the BIGFUSE project relies on two cornerstones: Semantic Web technologies and Knowledge Extraction algorithms. The project studies the many opportunities and challenges that arise from exploiting these tools in Big Data settings. Semantic Web technologies, based on ontological and graph-based knowledge representations, provide an infrastructure for publishing, storing, retrieving, reusing, integrating, and analyzing data that is founded on open and well-established standards. They allow building a web of formally described, linked, and accessible data without requiring a global agreement on conceptual models. Data mining algorithms, in turn, can automatically discover underlying non-trivial knowledge in datasets by applying intelligent analysis techniques. The project focuses on association, anomaly, and exception detection techniques, which have proved their potential on smaller datasets. The research process is inspired by the privacy-by-design principles for proactively engineering privacy and ethics into the system architecture, design, and use. The study of the legal and ethical framework for respecting privacy will be based on an analysis of the relevant national and EU legislation, e.g., Royal Decree 1720/2007 and the 2016 reform of the European data protection laws.
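As a minimal sketch of what anomaly detection means on consumption data, the function below flags daily readings whose z-score (distance from the mean in population standard deviations) exceeds a threshold. This simple statistical test is a stand-in chosen for illustration, not one of the project's own algorithms; the readings and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, pstdev


def zscore_anomalies(readings, threshold=2.0):
    """Return indices of readings more than `threshold` standard deviations from the mean."""
    mu = mean(readings)
    sigma = pstdev(readings)  # population standard deviation
    if sigma == 0:            # constant series: nothing stands out
        return []
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical daily consumption of one household, in kWh.
daily_kwh = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 3.1, 10.3, 9.7, 10.1]
print(zscore_anomalies(daily_kwh))  # [6]
```

The flagged day is only a candidate anomaly. Interpreting it (a holiday absence, a meter fault, a tariff change) is where fused contextual sources such as calendars or weather data come in, which is the motivation for combining sources in the first place.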


Impact

The contributions of the BIGFUSE project will have an impact on the implementation of new policies, tariffs, educational campaigns, etc. aimed at increasing energy savings and reducing the environmental impact of energy production. These are two major goals of Directive 2012/27/EU of the European Parliament on energy efficiency, as well as major challenges of the Spanish R+D+i Programme 2013-2015 (c. Secure, Sustainable and Clean Energy) and the Horizon 2020 programme (3. Secure, Clean and Efficient Energy). To expand this research beyond the project's initial scope, the project carries out several activities to establish a network of excellent collaborators and to increase the international impact of the research carried out at the University of Granada.


Team