Being in the know, and responding to real events in real time, is a natural inclination. Most people watch their sports live, and most people watch their news live too. And just as it’s more exciting to stream a live event than to wait to watch it on demand, many feel that the best data-driven insights are the ones acted upon as soon as the data, especially data from the Internet of Things (IoT), is generated.
The great data-at-rest/data-in-motion divide
But most query mechanisms are based on the paradigm of retrieving a subset of data that has already been inserted into a database — akin to waiting for the evening news to finish airing, then waiting for it to download, and then viewing a particular segment or story from one’s local hard drive.
And most analytics tools, visual or otherwise, are built on that rubric too: users drag and drop things into a visualization, with those actions generating a SQL or MDX query that returns some subset of the data already stored in a conventional database.
The reading of streaming data, meanwhile, has largely been based on some engine triggering some piece of code at the moment some piece of data arrives. So streaming data processing has been based on imperative code handling one row of data at a time, while conventional querying, and most BI tools, have been based on declarative code processing an entire set of data.
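To make the contrast concrete, here is a minimal Python sketch under stated assumptions. The sensor readings, the field names and the 95-degree threshold are all hypothetical, and the loop that feeds `handle_event` is only a stand-in for a real streaming engine. The first function is imperative and sees one row at a time; the second hands a single declarative SQL statement to SQLite, which runs over the whole stored set.

```python
import sqlite3

# Hypothetical sensor readings; device names and values are illustrative only.
READINGS = [
    ("sensor-a", 71.5),
    ("sensor-b", 98.2),
    ("sensor-a", 101.3),
    ("sensor-c", 64.0),
]


def handle_event(event, alerts, threshold=95.0):
    """Imperative style: invoked once for each arriving row."""
    device, temperature = event
    if temperature > threshold:
        alerts.append(device)


def query_at_rest(rows, threshold=95.0):
    """Declarative style: one SQL statement over the entire stored set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, temperature REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT device FROM readings WHERE temperature > ? ORDER BY rowid",
        (threshold,),
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result


# Stand-in for a streaming engine firing a callback as each row arrives.
streamed_alerts = []
for event in READINGS:
    handle_event(event, streamed_alerts)

# The same question asked of data that has already been stored.
stored_alerts = query_at_rest(READINGS)
```

Both approaches flag the same devices, but only the second expresses the question as a query that a BI tool could generate.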
Bridging the gap
Such differences in tech are often referred to as impedance mismatches, a name that is quite apropos here, given how this mismatch has indeed impeded data-driven insight by business users, analysts and even enterprise developers. It has also prevented them from realizing most of the benefit of IoT technologies.
Recently, though, various streaming data platforms have implemented their own dialects of SQL, adapted to work over data that hasn’t yet arrived. These dialects model the data stream as if it were a special table in a database. Essentially, the query becomes a filtered view of data as it arrives.
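The idea of a query as a filtered view over arriving data can be sketched in plain Python. This is an illustration of the concept only, not any platform's SQL dialect or API: `continuous_select`, `sensor_stream` and the sample rows are hypothetical names invented for this example. A generator plays the role of the never-ending "table," and the query yields matching rows as they show up instead of returning a finished result set.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def continuous_select(
    stream: Iterable[Row],
    columns: List[str],
    where: Callable[[Row], bool],
) -> Iterator[Row]:
    """Yield the projected columns of each matching row as soon as it arrives."""
    for row in stream:
        if where(row):
            yield {name: row[name] for name in columns}


def sensor_stream() -> Iterator[Row]:
    """Hypothetical source; a real stream would keep producing rows indefinitely."""
    yield {"device": "sensor-a", "temperature": 71.5}
    yield {"device": "sensor-b", "temperature": 98.2}
    yield {"device": "sensor-a", "temperature": 101.3}


# Roughly: SELECT device FROM sensor_stream WHERE temperature > 95
hot = continuous_select(
    sensor_stream(),
    columns=["device"],
    where=lambda row: row["temperature"] > 95.0,
)

# Results are pulled one at a time; there is never a "complete" result set.
first = next(hot)
```

A consumer such as a dashboard would keep iterating over `hot` and redraw on every new row, rather than rendering once and stopping.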
Support developers; tools come along for the ride
Merging the query and stream paradigms lets developers familiar with SQL start to work more ably with streaming data. But perhaps the more valuable part is that downstream data technologies, like drivers/connectors and BI tools themselves, can also work more ably with streaming data. In other words, by conforming streaming data processing to conventional query mechanics and syntax, the associated ecosystem of data querying tools and technologies can, with some engineering work, become streaming data tools themselves.
Apache Kafka, arguably the most popular open source streaming data platform, has added its own SQL dialect and interface, called KSQL. KSQL was first announced by Confluent — the major commercial entity supporting Kafka — in a blog post last August, and its general availability was announced just last month. As with other such dialects, it asks developers to adapt to the idea that the query’s “result set” will be constantly changing. And if a visual analytics tool can adapt likewise, by continually updating visualizations returned by a query — instead of rendering them statically — it can be the downstream beneficiary of KSQL’s power.
Coming to a desktop near you
With the release of Arcadia Instant for KSQL, one analytics vendor has attempted just such a transformation of its product, and has made it available for free. While Arcadia Data announced KSQL integration last month, it was limited to its Arcadia Enterprise product. Today, Arcadia is announcing similar integration for Arcadia Instant, the product’s free version.
And while that lowers a big barrier to entry, Arcadia Instant for KSQL goes even further — it gets users past the very real difficulty of setting up Kafka and KSQL in the first place. Rather than limiting the KSQL capabilities to users who can connect Arcadia Instant up to running Kafka and KSQL clusters, Arcadia has created a Docker container image that includes both, along with a streaming data source.
Once that container image — also a free download — is in place, users can just point Arcadia Instant at it, and test out the functionality, all on a desktop computer. Arcadia has also created a Getting Started guide, to help users get everything running. After all, most business users aren’t Docker jockeys. All three components (the updated version of Arcadia Instant, the back-end Docker image and the Getting Started guide) should be available from a single Web page by the time you read this post.
In addition to the accommodations for business users in today’s announcements, both they and the hardcore techies already using Arcadia Enterprise get some goodies too: Arcadia’s KSQL integration has now added support for complex data types, including MAP and ARRAY data. It also gets a brand new feature called Time Warp that allows users to specify different time windows of data in the stream. This, in turn, allows viewing data in the recent past and permits users to “pause” and “replay” data from specific points in time.
I can’t (yet) personally vouch for this solution, as Arcadia Data briefed me on it, and allowed me to write about it, in advance of its release. I’m eager to download and install it, though. I’ve been saying for years that streaming data processing won’t become mainstream until developers and users can treat it as a special case of conventional query and analytics. If Arcadia has succeeded here, you can bet other vendors will follow suit. And then maybe digital transformation and data-driven decision making can proceed as more than aspirational concepts.