Making Databases Smarter with Optimization
PLATO Lecture Series, Spring 2014 – Greeners on the Cutting Edge
Monday April 21, 1:30-3 pm, LH1
Emir Pasalic, LogicBlox, Atlanta GA.
The path from a large collection of passive data to actionable intelligence in the enterprise software environment is fraught with an astonishing degree of unnecessary complexity. At LogicBlox, we have been addressing this problem by designing a single software platform that combines traditional database capabilities (transactions, parallelization, distribution, durability) with analytics (optimization, machine learning) unified by a single declarative language (based on Datalog).
In this talk, we show how a whole class of mathematical optimization techniques (mixed integer programming) can be tightly integrated with a database. This abstracts away the complexity of data management, transformation and inter-operation with complex software artifacts (mathematical programming solvers), while allowing the programmer to specify large-scale optimization models at the domain-appropriate level of abstraction. The result is a “smart database” whose “tables” can store not just values, but numerical values that are optimal with respect to some objective function and obey a set of numerical constraints.
We present an example based on a real-world application: optimizing the supply chain for a large retailer.
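To make the idea of a table that holds optimal values concrete, here is a minimal Python sketch of a toy supply-chain problem. It is not LogiQL and not the LogicBlox API; every name and number in it (the `supply`, `demand`, and `cost` tables and the `solve` function) is invented for illustration. The input tables are plain dictionaries, and the output `shipment` table is filled with integer quantities that minimize total cost subject to supply and demand constraints. For simplicity it finds the optimum by exhaustive enumeration, which only works at toy scale; a real mixed integer programming solver would use techniques such as branch and bound.

```python
from itertools import product

# Hypothetical input "tables"; all names and numbers are invented for illustration.
supply = {"W1": 4, "W2": 3}                      # units available per warehouse
demand = {"S1": 3, "S2": 2}                      # units required per store
cost = {("W1", "S1"): 2, ("W1", "S2"): 5,
        ("W2", "S1"): 4, ("W2", "S2"): 1}        # cost per unit shipped on each route


def solve(supply, demand, cost):
    """Return (total_cost, shipment_table) with minimal cost, or None if infeasible.

    Exhaustive search over integer shipment quantities; toy-scale only.
    """
    routes = sorted(cost)
    # No route ever needs to carry more than the largest store demand.
    max_qty = max(demand.values())
    best = None
    for qtys in product(range(max_qty + 1), repeat=len(routes)):
        ship = dict(zip(routes, qtys))
        # Constraint: each store receives exactly its demand.
        if any(sum(q for (w, s), q in ship.items() if s == store) != need
               for store, need in demand.items()):
            continue
        # Constraint: no warehouse ships more than its supply.
        if any(sum(q for (w, s), q in ship.items() if w == wh) > cap
               for wh, cap in supply.items()):
            continue
        # Objective: total shipping cost.
        total = sum(cost[r] * q for r, q in ship.items())
        if best is None or total < best[0]:
            best = (total, ship)
    return best


total_cost, shipment = solve(supply, demand, cost)
for route, qty in sorted(shipment.items()):
    print(route, qty)
print("total cost:", total_cost)
```

The point of the sketch is the shape of the problem rather than the search method: the programmer states the tables, the constraints, and the objective, and the system is responsible for populating `shipment` with values that satisfy them.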
Emir Pasalic is a computer scientist at LogicBlox, Atlanta GA. Emir’s interests include multi-stage programming, type theory and domain-specific language design; currently, he implements database platform support for mathematical programming, optimization and in-database machine learning. As a postdoc at Rice, he worked with the Rice PLT group on adding dependent types to OCaml by plugging Coq into its type checker and on program generation (e.g., staging dynamic programming algorithms).
Companion Reading. The Companion Reading for Student Originated Software (below) might be daunting for students with (as yet!) little preparation in computing. See the Alternative Readings (below) about Big Data, the current popularity of which is one rationale for the enhanced database languages (and programming) that LogicBlox offers:
- First, read a little about LogicBlox Technology, then look at (2), and finally try (3).
- Then, read: Huang, Shan Shan, Green, Todd Jeffrey, and Loo, Boon Thau. Datalog and Emerging Applications: An Interactive Tutorial. SIGMOD’11, June 12–16, 2011, Athens, Greece. ACM 978-1-4503-0661-4/11/06.
- Finally, work through the Interactive Tutorial: LogiQL in 30 Minutes
- and think about how this differs from traditional database (SQL or Hadoop) development.
For further work in this area, see
- Green, T. J., Huang, S. S., and Loo, B. T. Datalog and recursive query processing. Foundations and Trends in Databases, Vol. 5, No. 2 (2012) 105–195. © 2013 T. J. Green, S. Huang, B. T. Loo, W. Zhou. DOI: 10.1561/1900000017. In particular, see Section 6: the authors conclude their exposition of Datalog with some example applications: program analysis, declarative networking, data integration and exchange, and enterprise software systems. For each domain, they highlight language extensions, runtime considerations, and use cases. They also briefly survey other applications.
- Mathematical Programming for understanding Datalog and LogicBlox. An accessible introduction to Linear Programming (in some sense the simplest of mathematical programming techniques) is by the late Saul Gass (An Illustrated Guide to Linear Programming). Also, see any number of freely available online courses and/or books.
Alternative Readings (for non-SOS students with little preparation in computing):
- David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani, The Parable of Google Flu: Traps in Big Data Analysis, Science, Vol. 343, 3/14/2014, pp. 1203-5.
- Gary Marcus and Ernest Davis, Eight (No, Nine!) Problems With Big Data, New York Times, April 6, 2014, p. A23.
- Daniel Halperin, Konstantin Weitz, Bill Howe, Francois Ribalet, E. Virginia Armbrust. Real-Time Collaborative Analysis with (Almost) Pure SQL: A Case Study in Biogeochemical Oceanography. Scientific and Statistical Database Conference, Baltimore, MD. July 2013.