Relational databases have existed for almost 50 years, since IBM’s System R was built as a research project in San Jose. The Postgres research at UC Berkeley then added object concepts to the relational model. PostgreSQL, the still-popular open-source product that grew out of that research, shows how powerful and relevant object-relational database management systems (ORDBMS) remain. Other examples are Oracle Database, about to celebrate its 45th anniversary, along with Informix, IBM DB2, and Microsoft SQL Server.
But relational databases are not suited to handle all business data needs. Rather than reviewing historical information, some organizations want immediate information from current data that loses its value in a short time frame. Real-time analytics is used to gain insights into data as soon as it’s created to make immediate or rapid decisions.
What are some areas where this makes sense?
- Fraud detection
- Consumer credit scoring
- Retailers who want to target individual customers for cross-selling
- Customer relationship management (CRM)
- Supply chain and logistics
- IT systems to detect failures, bottlenecks, and security risks
- Monitoring objects, including wearable devices
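To make the idea of acting on data as it arrives more concrete, here is a minimal Python sketch of a sliding-window "velocity check" of the kind a fraud detection system might run. The class name `VelocityCheck`, the card identifiers, and the thresholds are hypothetical and chosen only for illustration; a production system would run this logic inside a streaming platform rather than a single process.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Flag a card that makes more than `max_events` transactions
    within any `window_seconds` span (hypothetical rule for illustration)."""

    def __init__(self, max_events=3, window_seconds=60.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        # card_id -> timestamps that are still inside the window
        self._recent = defaultdict(deque)

    def observe(self, card_id, timestamp):
        """Record one transaction; return True if the card now looks suspicious."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Evict timestamps that have aged out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


check = VelocityCheck(max_events=3, window_seconds=60.0)
events = [
    ("card-A", 0.0),
    ("card-A", 10.0),
    ("card-B", 12.0),
    ("card-A", 20.0),
    ("card-A", 25.0),   # fourth card-A transaction within 60 seconds
    ("card-A", 200.0),  # the earlier transactions have aged out by now
]
flags = [check.observe(card, ts) for card, ts in events]
```

Each event is scored the moment it is observed, which is the property that separates this style of processing from a periodic batch report over historical data.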
One company using real-time analytics is Netflix, with over 150 million customers globally. Netflix remains competitive, holding over half the market share of the American streaming industry, by concentrating on user experience and content. Netflix uses its data to give each user an individualized experience and recommendations based on their personal activity and interaction with the Netflix platform. The bottom line is that Netflix wants users to keep watching rather than grow bored (Markman, 2019; Xu, 2018).
As noted above, an ORDBMS is poorly suited to real-time analytics because of the large volume of high-velocity inbound data, especially when mobile data is evaluated. What kinds of technologies does real-time analytics use?
- In-database analytics eliminates the need to transform data and move it between the database and the analytics application. Examples are IBM Netezza, PureData Systems, and DB2 Warehouse; Teradata Vantage; and SAP (Sybase) IQ.
- Processing in memory (PIM), or an in-memory database (IMDB), queries data while it is in random-access memory (RAM) rather than on physical disks. RAM is volatile, so data could be lost in an outage, but non-volatile random-access memory technology has been introduced to prevent data loss. Examples are SAP HANA and Oracle Exalytics; SAS also offers a suite containing multiple solutions.
- Massively parallel processing (MPP) is a computing architecture that coordinates multiple processors working independently and simultaneously while communicating through a messaging interface. Both grid computing and computer clustering are possible architectures. Examples are IBM Power Parallel Systems, Intel Scalable Systems, and Unisys Open Parallel Unisys Server (OPUS).
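The MPP idea of independent workers whose results are combined by a coordinator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads from the standard library stand in for separate processors or nodes, and the function names (`partition`, `partial_aggregate`, `parallel_mean`) and sample values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows into n_workers disjoint slices (round-robin), one per worker."""
    return [rows[i::n_workers] for i in range(n_workers)]


def partial_aggregate(rows):
    """Work done independently on one partition: return (sum, count)."""
    return (sum(rows), len(rows))


def parallel_mean(rows, n_workers=4):
    """Scatter partitions to workers, then gather and merge the partial results."""
    parts = partition(rows, n_workers)
    # Threads stand in for the independent processors of a real MPP system.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_aggregate, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


latencies_ms = [12, 15, 11, 240, 13, 14, 16, 12]
mean_latency = parallel_mean(latencies_ms, n_workers=4)
```

The key design point is that each worker sees only its own partition and returns a small partial result, so the only communication is the final merge step.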
If your business wants real-time data analytics to remain competitive but would rather not manage the underlying IT architecture, why not choose a cloud-based data streaming tool? These are some of the more popular tools and services:
- Amazon Kinesis offers basic reporting and insights; adding machine learning algorithms enables more in-depth analysis.
- Apache Storm is an open-source tool for real-time data processing, originally developed at BackType and open-sourced by Twitter after it acquired the company. It can also be integrated with Hadoop.
- Azure Stream Analytics also offers machine learning to detect outliers and help users interpret output visualizations.
- Google Cloud Dataflow
- IBM Streaming Analytics
- Oracle Stream Analytics (OSA)
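Several of these services advertise outlier detection on streaming data. The following Python sketch shows one simple way such a check can work: it keeps a running mean and variance (Welford's online algorithm) and flags values that sit far from the mean. It is not the method used by any of the products listed; the class name `RunningZScore`, the threshold, the warm-up length, and the sample readings are assumptions made for illustration.

```python
import math


class RunningZScore:
    """Flag values more than `threshold` standard deviations from the
    running mean, using constant memory (hypothetical example)."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup  # number of values to see before flagging anything
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def is_outlier(self, x):
        """Score x against the values seen so far, then fold it into the statistics."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self._m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                flagged = True
        # Welford's update of the running mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return flagged


detector = RunningZScore(threshold=3.0, warmup=5)
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]
flags = [detector.is_outlier(r) for r in readings]
```

Note that this sketch folds flagged values into the running statistics, which widens the baseline after a spike; a real deployment might exclude them or use a decaying window instead.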
Let’s take a closer look at Oracle Stream Analytics, which is available as a managed service in Oracle Cloud or as an on-premises deployment.
- OSA incorporates best-of-breed technologies built on the latest Apache Spark Streaming technology and Apache Kafka infrastructure. OSA natively supports Oracle GoldenGate change data capture and uses automated checkpointing to ensure zero data loss and exactly-once processing. Distributed caching is provided through the Oracle Coherence in-memory cache, seamlessly embedded as part of the Spark deployment. Machine learning models developed in Jupyter Notebook (Python or Spark), RStudio, or MATLAB can be imported to create scoring stages, allowing OSA to serve as the machine learning runtime 24×7 in production.
Oracle Stream Analytics Integration Architecture Diagram
- OSA includes the ability to interactively design streaming pipelines, without hand coding, to address critical real-time use cases.
Designing Streaming Pipelines with Oracle Stream Analytics
- OSA includes extensive pattern libraries applicable to a wide array of use cases such as Dynamic Pricing, Smart Inventory, Digital Marketing, Fraud Detection, Predictive Maintenance, and Real-Time Log Analytics.
Oracle Stream Analytics Pattern Library
- OSA includes a library of over 30 visualization charts, based on Apache Superset. This gives users multiple ways to visualize and explore data for critical decision-making.
Druid Superset Cube Visualizations
- OSA includes platform targets for a wide range of industries and functional areas.
Oracle Stream Analytics with Google Maps Geofence tile layer
Markman, J. (2019). Netflix harnesses big data to profit from your tastes. Forbes. https://www.forbes.com/sites/jonmarkman/2019/02/25/netflix-harnesses-big-data-to-profit-from-your-tastes
Oracle. (n.d.). Understanding Stream Analytics. https://docs.oracle.com/en/middleware/fusion-middleware/osa/18.1/understanding-stream-analytics/overview-oracle-streaming-analytics.html
Oracle. (2018). Oracle data sheet: Oracle Stream Analytics. https://www.oracle.com/us/products/middleware/data-integration/oracle-stream-analytics-ds-4478221.pdf
Xu, Z. (2018). Keystone real-time stream processing platform. Netflix Technology Blog. https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a