
Edge Analytics Architecture

System73's ecosystem of products generates metrics that are ingested and stored so that they can be visualized on System73 Platform dashboards or queried programmatically via the API.

Figure: Edge Analytics architecture

Data architecture

Edge Analytics is designed to provide content delivery metrics through both realtime and historical OLAP queries. Each kind of query targets a specific data layer, so there are two main layers: Realtime and Historical.

Like many other databases designed for big data applications, the Edge Analytics database stores denormalized data in tables with a large number of columns. These tables are called datasources, and they are central to the implementation of the data layers because each data layer maps to a specific datasource.

Each layer has its own capabilities and restrictions to guarantee the same baseline query performance margins across layers.

Figure: Main data layers

In practice, these two layers overlap in time to cover as much data as possible.

Figure: Data layers overlapping in time

Realtime layer

Query granularity | Data retention | Most recent data
Second            | Last 30 days   | Within the last 5 seconds

The realtime layer serves the latest live metrics at a resolution (granularity) as fine as one second, and it holds the data of the last 30 days. Data is indexed in realtime with a delay of up to 5 seconds.

Historical layer

Historical sub-layer | Query granularity | Data retention | Most recent data
By minute            | Minute            | Last 13 months | Yesterday at 23:59:59
By hour              | Hour              | Last 13 months | Yesterday at 23:59:59

The historical layer allows for queries spanning much longer time ranges (up to 13 months), so it is subdivided into two sub-layers. Both sub-layers hold the same data, but they differ in query granularity.

While a historical query that spans 2 months at minute granularity might be fast, the same query over a larger time range (e.g., 6 months or more) is going to take longer to complete. To compensate, you can coarsen the granularity to an hour by querying the historical by-hour sub-layer, which shortens execution time. It's a trade-off between data resolution and query time range.
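To make the trade-off concrete, the sketch below counts how many time buckets a single series yields over a given range at minute and at hour granularity. It is only a rough illustration: the function name and the 183-day approximation of six months are assumptions, and actual query time depends on much more than the bucket count.

```python
from datetime import timedelta

# Bucket widths of the two historical sub-layers described above.
BUCKET_WIDTH = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
}

def bucket_count(time_range: timedelta, granularity: str) -> int:
    """Number of time buckets one series yields over time_range."""
    return time_range // BUCKET_WIDTH[granularity]

six_months = timedelta(days=183)  # assumption: roughly six months
print(bucket_count(six_months, "minute"))  # 263520
print(bucket_count(six_months, "hour"))    # 4392
```

Under these assumptions, the by-hour sub-layer returns 60 times fewer buckets for the same time range.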

The historical layer does not fully overlap with the realtime layer. Its most recent data is the last data point from the previous day. In other words, it holds data from the last 13 months up to yesterday at 23:59:59.
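The boundaries above suggest a simple rule for deciding which layer can serve a query. The following is a minimal sketch of that rule, not part of any System73 API: the layer names and the `pick_layer` function are hypothetical, timestamps are assumed to be naive datetimes in whatever time zone the platform uses for "yesterday", and the 13-month lower bound is not checked.

```python
from datetime import datetime, timedelta

REALTIME_RETENTION = timedelta(days=30)  # "last 30 days" from the realtime layer

def historical_end(now: datetime) -> datetime:
    """Most recent instant held by the historical layer: yesterday at 23:59:59."""
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today - timedelta(seconds=1)

def pick_layer(start: datetime, end: datetime, granularity: str, now: datetime) -> str:
    """Return a (hypothetical) layer name able to serve the range [start, end]."""
    needs_today = end > historical_end(now)
    if needs_today or granularity == "second":
        # Only the realtime layer has today's data and second granularity.
        if start < now - REALTIME_RETENTION:
            raise ValueError("range reaches beyond the realtime layer's 30-day retention")
        return "realtime"
    return "historical_by_minute" if granularity == "minute" else "historical_by_hour"

now = datetime(2024, 10, 7, 12, 0, 0)
print(pick_layer(datetime(2024, 10, 7, 11, 0), now, "minute", now))                # realtime
print(pick_layer(datetime(2024, 8, 1), datetime(2024, 10, 1), "minute", now))      # historical_by_minute
print(pick_layer(datetime(2024, 4, 1), datetime(2024, 10, 1), "hour", now))        # historical_by_hour
```

A query that needs today's data and also starts more than 30 days ago raises an error in this sketch, because no single layer covers that whole range.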

The following figure shows the time boundaries (data retention) of the main data layers:

Figure: Data layers time boundaries

Important

Granularities are predefined (e.g., second, minute, hour, day, week, month) and you can change them at query time, but be aware that a query cannot return results at a granularity finer than the query granularity defined for the data layer it targets.
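The restriction in this note can be expressed as a small check. This is a sketch under stated assumptions: the layer names and the `granularity_is_available` helper are hypothetical, and the list covers only the granularities named above.

```python
# Granularities ordered from finest to coarsest (names taken from the note above).
GRANULARITY_ORDER = ["second", "minute", "hour", "day", "week", "month"]

# Query granularity defined for each layer; layer names are hypothetical.
LAYER_GRANULARITY = {
    "realtime": "second",
    "historical_by_minute": "minute",
    "historical_by_hour": "hour",
}

def granularity_is_available(layer: str, requested: str) -> bool:
    """True if `requested` is not finer than the layer's query granularity."""
    floor = LAYER_GRANULARITY[layer]
    return GRANULARITY_ORDER.index(requested) >= GRANULARITY_ORDER.index(floor)

print(granularity_is_available("historical_by_minute", "day"))   # True
print(granularity_is_available("historical_by_hour", "minute"))  # False
```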

Data model

As mentioned before, datasources are big tables containing denormalized data. Datasources are essentially composed of three kinds of columns:

  1. Timestamp (primary timestamp)

    Datasources must always include a primary timestamp because it is used for partitioning and sorting the data. The column __time always contains the data timestamp.

  2. Dimensions

    Dimensions are columns that are stored as-is and can be used for any purpose. You can group, filter, or apply aggregators to dimensions at query time in an ad-hoc manner.

  3. Metrics

    Metrics are columns that are stored in an aggregated form. Each metric column has an aggregation function defined for it, which is applied when data is ingested, based on the datasource's query granularity. You can also apply aggregation functions to metric columns at query time.
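The three kinds of columns can be illustrated with a small sketch of ingestion-time aggregation at minute granularity. It is only an illustration of the idea: the `country` dimension, the `bytes_sent` metric, and the choice of sum as its aggregation function are all assumptions, not the actual Edge Analytics schema.

```python
from collections import defaultdict
from datetime import datetime

def rollup_by_minute(events):
    """Sum the metric per (minute bucket, dimension value), as done at ingestion."""
    totals = defaultdict(int)
    for ts, country, bytes_sent in events:
        bucket = ts.replace(second=0, microsecond=0)  # truncate __time to the minute
        totals[(bucket, country)] += bytes_sent       # aggregation function: sum
    return dict(totals)

# Raw events: (timestamp, country dimension, bytes_sent metric); hypothetical data.
events = [
    (datetime(2024, 10, 7, 12, 0, 5), "ES", 100),
    (datetime(2024, 10, 7, 12, 0, 40), "ES", 250),
    (datetime(2024, 10, 7, 12, 0, 59), "US", 70),
    (datetime(2024, 10, 7, 12, 1, 2), "ES", 30),
]

rows = rollup_by_minute(events)
for (bucket, country), total in sorted(rows.items()):
    print(bucket.isoformat(), country, total)
```

The four raw events collapse into three stored rows, because the two "ES" events within 12:00 share the same minute bucket and dimension value.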

Continue to the Data Model section for a full reference of the datasource schemas for Edge Intelligence.


This section was last updated 2024-10-07