Ad-Hoc OLAP databases with Yertl and HANA
In my previous posts I showed how to set up an ODBC connection to cloud-hosted HANA (HCP) and then, in a follow-up post, I drawned on moving data over such channel on the fly using an ETL fremework called Yertl. In less then two weeks I will be giving a short talk at YAPC EU 2015 related to these topics. Specifically, I would like to share some ideas about building ad-hoc OLAP databases using the two tools (Yerlt and HANA).
Most of the applications we're building are about on-line transaction processing, when single rows are inserted, retrieved, updated and deleted. That fact implicates some DB design decisions that typically boils down to having some level of normalization. As the system grows, we often realize that archiving unnecessary data is a pain-staking exercise, when done leaves the tables fragmented and the exceptions from normalization (such as bitmasks) create issues when building some reporting queries non so typical for an OLTP database. What we typically do, if there's a need for analytical processing (OLAP), in such cases is that we create a second, de-normalized schema (even in the same DBMS instance) and take an effort to keep the data in syc. Then there's all sort sort of gymnastics you'll do to have these additional persisted objects in the database to make frequent queries run faster...
An alternative that I've been testing involves setting up an ODBC/TLS channel between the source, OLTP system to an in-memory data base (HANA) and using Yertl to upload the data, doing some data transformation and filtering on-the-fly. Once the data has been transferred, we can create non-persistent reporting views that will be the base of ODBC/MDX analytical reports. Because of in-memory nature of the database the schema layout is pretty simple. Additionally, the MDX interface makes the database accessible to a number of MDX-speaking reporting tools including MS Excel. Unfortunately, Perl cannot be coupled with HANA this way but this isn't a real limitation in the scenarios that I was considering.
Slides from the talk are available on SlideShare