INNOVATIONS IN DATABASES
A technical and juridical perspective
Prof. Dr. Guy De Tré
Ghent University
Database, Document and Content Management

OUTLINE
• Big data management challenges
• NoSQL database solutions
• Juridical challenges

BIG DATA MANAGEMENT CHALLENGES

BIG DATA: A DEFINITION
Data with characteristics such that conventional information systems cannot handle them efficiently.

BIG DATA: FOUR MAIN CHARACTERISTICS
• Volume: big data
• Variety: heterogeneous data
• Velocity: fast data
• Veracity: bad data

NEW CHALLENGES FOR DATA MANAGEMENT SYSTEMS
• Big data: scaling up to distributed data storage (availability vs. consistency)
• Heterogeneous data: avoid data transformations
• Fast data: avoid data processing overhead
• Bad data: data quality assessment and handling

NOSQL DATABASE SOLUTIONS
• Key-value stores
• Document stores
• Column stores

KEY-VALUE STORES
Data schema (no database schema!)
[Figure: SQL + NoSQL]

Visitor opinions (Bezoekersopinies)
Key (BID, timestamp)    Value
B1, 15/1 14:00          'room 1'
B1, 15/1 14:01          'room 1, not nice, too crowded'
B1, 15/1 14:02          'room 1, 2/10'
B2, 15/1 14:02          'room 1, Rembrandt is sublime'
B1, 15/1 14:03          'room 1, '
B2, 15/1 14:03          'room 1, 9/10'
B1, 15/1 14:04          'room 2, more my thing, 7/10'
B2, 15/1 14:04          'room 1, more of this pls'
B3, 15/1 14:04          'room 1, such a crowd'

KEY-VALUE STORES
Limited interaction via API
Operations on the visitor opinions table above:
    Get(B1, 15/1 14:02)    Result: 'room 1, 2/10'
    Put(B3, 15/1 14:05, 'room 1, Wow!')
    Delete(B1, 15/1 14:02)

KEY-VALUE STORES
Data distribution (horizontal scaling: consistent hashing)

DOCUMENT STORES
Data schema (no database schema!)
[Figure: SQL + NoSQL]

DOCUMENT STORES
More advanced interaction via API
(identifiers: opinies = opinions, plaats = place, "zaal 1" = "room 1")
    db.opinies.find()
    db.opinies.find({plaats: "zaal 1"})
    db.opinies.find().sort({score: 1})
    db.opinies.find({score: {$gt: 8}})
    db.opinies.find({score: {$gt: 8}, plaats: "zaal 1"})
    db.opinies.find({$or: [{score: {$gt: 8}}, {plaats: "zaal 1"}]})

DOCUMENT STORES
Data distribution (horizontal scaling: sharding)
Replica sets with Master/Slave replication

COLUMN STORES
Data schema (no database schema!)
[Figure: SQL + NoSQL]

COLUMN STORES
SQL-like interaction via API
(identifiers: taal = language, Bezoeker = visitor, naam = name, commentaar = comment, Opinie = opinion, dag = day)
    SELECT taal FROM Bezoeker WHERE naam = 'Yana'
    SELECT commentaar FROM Opinie WHERE score < 5
    SELECT COUNT(*) FROM Opinie WHERE dag = '15/1/2016'
Relational-style joins are not supported! The application has to perform them itself.

COLUMN STORES
Data distribution (horizontal scaling: partitioning and replication)
[Figure: Horizontal partitioning and replication]

NOSQL DATABASES

JURIDICAL CHALLENGES

Phases: Sourcing, Analysing, Using

Personal data protection
• Privacy
• Privacy compliance as competitive advantage
• Purpose limitation
• Profiling
• Right to correction and removal

Antidiscrimination
• Ethical issues
• Restrictions to automated decision making
• Gender Act
• Racism Act
• Anti-discrimination Act

Phases: Sourcing, Analysing, Using

Cloud
• Sharing personal data with a third party or within the group
• Storing personal data on a centralized system
• Prohibition in principle, with exceptions

Competition
• Pricing based on the behaviour of the consumer
• Charging customers a different price for the same product

Data ownership
• IP protection for database owners
• Contractual protection
• Confidentiality

• New rights for individuals
• New obligations for companies
• Stronger enforcement against infringements

THANK YOU
For your attention
Guy De Tré
Database, Document and Content Management

UGAIN, UGent Academie voor Ingenieurs (UGent Academy for Engineers)
Big data course (Opleiding Big data)
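Illustration for the KEY-VALUE STORES slides: the deck names a Get/Put/Delete API and data distribution by consistent hashing, but shows no mechanism. The following is a minimal Python sketch of how the two could fit together, not the implementation of any particular product. The class name ConsistentHashStore, the use of MD5 to place items on the ring, the number of virtual nodes, and the node names are all assumptions made for this illustration. The sample keys and values follow the visitor-opinion example, in English.

```python
import bisect
import hashlib


def ring_position(text):
    """Map a string to a point on the hash ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class ConsistentHashStore:
    """Toy in-memory key-value store spread over nodes by consistent hashing."""

    def __init__(self, nodes, vnodes=50):
        self._ring = []        # sorted list of (position, node name)
        self._data = {}        # node name -> that node's local key/value dict
        self._vnodes = vnodes  # virtual nodes per physical node (assumed value)
        for node in nodes:
            self.add_node(node)

    def node_for(self, key):
        """Return the node owning key: first ring entry at or after its hash."""
        pos = ring_position(repr(key))
        idx = bisect.bisect_left(self._ring, (pos,))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring

    def add_node(self, node):
        """Add a node; only keys that now hash to it are moved."""
        self._data[node] = {}
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))
        for other, local in self._data.items():
            if other == node:
                continue
            for key in [k for k in local if self.node_for(k) == node]:
                self._data[node][key] = local.pop(key)

    def put(self, key, value):
        self._data[self.node_for(key)][key] = value

    def get(self, key):
        return self._data[self.node_for(key)].get(key)

    def delete(self, key):
        self._data[self.node_for(key)].pop(key, None)


# Hypothetical three-node cluster holding visitor opinions.
store = ConsistentHashStore(["node-a", "node-b", "node-c"])
store.put(("B1", "15/1 14:02"), "room 1, 2/10")
store.put(("B3", "15/1 14:05"), "room 1, Wow!")
print(store.get(("B1", "15/1 14:02")))   # room 1, 2/10
store.delete(("B1", "15/1 14:02"))
print(store.get(("B1", "15/1 14:02")))   # None
```

The point of the ring is visible in add_node: when a node joins, a key either stays where it was or moves to the new node, so horizontal scaling does not reshuffle the whole data set.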