The diagram above illustrates a simpler alternative: a single real-time data flow from source to dashboard. The critical component that makes this possible is NewSQL database technology (e.g., VoltDB, NuoDB or MemSQL), which provides full ACID compliance while processing millions of transactions per second.
The components of this solution are:
- Apache Flume: An optional component for high-throughput capture of web logs for clickstream analysis.
- Apache Kafka: A fault-tolerant message queuing and broadcast system.
- Apache Spark Streaming: For near real-time, in-memory data processing and transformation. Apache Storm and Apache Flink are alternatives worth considering.
- Hadoop/HDFS: An optional component providing inexpensive long-term storage and a data lake.
- VoltDB: For real-time data ingestion and storage at millisecond latency, as well as real-time analytics. Alternatives include MemSQL, NuoDB and CockroachDB.
- Tableau: For analytic presentation and dashboards.
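To make the single data flow concrete, here is a minimal, self-contained sketch under loudly stated assumptions: a `queue.Queue` stands in for a Kafka topic, a plain Python function stands in for the Spark Streaming transformation, and an in-memory SQLite database stands in for VoltDB. The names `publish`, `transform`, `consume_all` and the `page_views` table are hypothetical and for illustration only; this is not the API of any of those products.

```python
import json
import queue
import sqlite3

# Hypothetical stand-ins: queue.Queue plays the role of a Kafka topic,
# and in-memory SQLite plays the role of the NewSQL store (e.g. VoltDB).
topic = queue.Queue()


def publish(raw_event: dict) -> None:
    """Producer side: serialise an event and append it to the 'topic'."""
    topic.put(json.dumps(raw_event))


def transform(raw: dict) -> tuple:
    """The single place where transformation logic lives (hypothetical rules)."""
    return (raw["user"].lower(), raw["page"], int(raw["ms"]))


def consume_all(db: sqlite3.Connection) -> int:
    """Consumer side: drain the topic, transform each event and store it."""
    count = 0
    while True:
        try:
            message = topic.get_nowait()
        except queue.Empty:
            break
        row = transform(json.loads(message))
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO page_views (user_id, page, load_ms) VALUES (?, ?, ?)",
                row,
            )
        count += 1
    return count


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_views (user_id TEXT, page TEXT, load_ms INTEGER)")

publish({"user": "Alice", "page": "/home", "ms": "120"})
publish({"user": "BOB", "page": "/cart", "ms": "340"})
stored = consume_all(db)

# The dashboard queries the same table directly; no second data source is needed.
print(db.execute(
    "SELECT page, AVG(load_ms) FROM page_views GROUP BY page ORDER BY page"
).fetchall())  # [('/cart', 340.0), ('/home', 120.0)]
```

The point of the sketch is structural: raw events enter once, are transformed in exactly one place, and land in a store that the dashboard can query without merging batch and speed-layer results.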
The advantages of this architecture are:
- Transformation Simplicity: Because all data transformation logic lives in the Spark Streaming component (using industry-standard SQL), there is no duplicated code and no mix of technologies to cause maintenance issues.
- Real-Time Accuracy: Because the database provides full relational support and ACID compliance at millions of transactions per second, results are not subject to the eventual consistency found in many NoSQL solutions.
- Analytic Simplicity: Like many NewSQL databases, VoltDB supports real-time analytics using industry-standard SQL, which many NoSQL solutions do not support. In addition, dashboard tools such as Tableau can connect directly to the database and query results seamlessly, without needing to combine data from multiple sources.
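The accuracy and analytic advantages both rest on transactions: related writes either all become visible or none do, so a dashboard query never sees a half-applied update. The sketch below illustrates that idea with Python's built-in `sqlite3` module as a stand-in for a NewSQL database; the `orders` and `product_totals` tables and the `record_order` helper are hypothetical. The `ON CONFLICT ... DO UPDATE` upsert syntax assumes SQLite 3.24 or later, which recent Python builds bundle.

```python
import sqlite3

# In-memory SQLite stands in for a NewSQL database such as VoltDB.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT NOT NULL, qty INTEGER NOT NULL);
    CREATE TABLE product_totals (product TEXT PRIMARY KEY, total_qty INTEGER NOT NULL);
""")


def record_order(order_id: int, product: str, qty: int) -> bool:
    """Update the running total and insert the order in one transaction."""
    try:
        with db:  # both statements commit together, or both are rolled back
            db.execute(
                "INSERT INTO product_totals VALUES (?, ?) "
                "ON CONFLICT(product) DO UPDATE SET total_qty = total_qty + excluded.total_qty",
                (product, qty),
            )
            # A duplicate order_id fails here, undoing the total update above.
            db.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, product, qty))
        return True
    except sqlite3.IntegrityError:
        return False


record_order(1, "widget", 3)
record_order(2, "gadget", 5)
record_order(3, "widget", 2)
ok = record_order(3, "widget", 100)  # duplicate order_id: whole transaction rolled back

# A dashboard query in plain SQL against consistent, up-to-date tables.
rows = db.execute("SELECT product, total_qty FROM product_totals ORDER BY product").fetchall()
print(rows)  # [('gadget', 5), ('widget', 5)]
```

Because the rejected order never partially applied, the running totals still agree exactly with `SUM(qty)` over `orders`, which is the property an eventually consistent store cannot guarantee at read time.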
Of course, any real-time solution must also fit into the existing batch-oriented architecture, including integration with a data lake. This solution therefore includes an additional feed of data into Hadoop HDFS for long-term storage and subsequent batch processing.
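The additional feed can be pictured as a fan-out: each transformed record goes to the real-time store and is also appended to the data lake. The sketch below is a minimal illustration under assumptions: SQLite stands in for the NewSQL store, a local temporary directory stands in for HDFS, and the `day=YYYY-MM-DD` partition layout, the `events` table and the `sink` helper are hypothetical choices, not part of the original design.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical stand-ins: SQLite for the real-time NewSQL store, and a local
# directory for the HDFS landing zone of the data lake.
lake_root = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, day TEXT, amount REAL)")


def sink(record: dict) -> None:
    """Write one transformed record to both the real-time store and the lake."""
    with db:  # transactional write for real-time queries
        db.execute("INSERT INTO events VALUES (:event_id, :day, :amount)", record)
    # Append-only, date-partitioned files suit later batch processing.
    partition = lake_root / f"day={record['day']}"
    partition.mkdir(parents=True, exist_ok=True)
    with open(partition / "events.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


for rec in [
    {"event_id": 1, "day": "2024-01-01", "amount": 9.5},
    {"event_id": 2, "day": "2024-01-01", "amount": 4.0},
    {"event_id": 3, "day": "2024-01-02", "amount": 1.25},
]:
    sink(rec)

print(sorted(p.name for p in lake_root.iterdir()))  # ['day=2024-01-01', 'day=2024-01-02']
```

Writing both sinks from one consumer keeps the sketch short. In a Kafka-based deployment, a common alternative is a separate consumer group reading the same topic for the HDFS feed, since each consumer group receives every message; this decouples the batch feed from the real-time path.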