In complex distributed architectures, which may span multiple clusters or edge environments, the ability to gain deep insight into the performance, health, and interactions of clusters, nodes, and their constituent workloads becomes paramount. Observability is an approach to analyzing and optimizing systems that provides a real-time view of all operational data related to applications and infrastructure. It lays the groundwork for the ACES platform to proactively identify and address issues, ensuring seamless operation and optimal resource utilization. However, the complexity of multi-cluster and edge environments, such as those tackled in ACES, changes how system behavior, dependencies, and potential bottlenecks can be viewed comprehensively, and consequently how critical errors are detected, diagnosed, and resolved.
To this end, ACES is defining and implementing a set of open, portable, and expressive data acquisition and knowledge representation models, as well as software that covers the needs of Cognitive Edge-Cloud services and infrastructure at all levels. Specifically, ACES is developing components that acquire data from the available infrastructure, resources, devices, services, users, and applications and transform it into knowledge, thereby achieving a high level of contextual awareness. We touch on topics such as distributed data collection, reconfiguration of data collection, data aggregation, minimization of data exchange, peer-to-peer data exchange, and data replication. In the context of autopoiesis, data acquisition is critical, as autopoietic systems must sense and deeply understand their environment in order to operate and act on it.
In ACES, the University of Ljubljana is designing and developing the Monitoring & Observability framework, a vertical ACES component that spans the overall ACES architecture and its constituent components. The component provides monitoring and observability across the layers of the software stack (i.e., the edge, application, network, and cloud layers). It encompasses monitoring, logging, tracing, metrics collection, alerting, anomaly detection and analysis, visualization, and performance analysis. Because of its inherently distributed nature, the framework supports hierarchical and distributed monitoring and storage, including across multiple clusters. To date, we have defined the component architecture, which consists of four main subcomponents: Monitoring & Observability Core, Push Gateway, Alert Manager, and Data Forwarder. Together they provide the core functionality of monitoring and telemetry data collection, storage, forwarding, and querying, as well as alerting to ACES workloads and other ACES components. Auxiliary subcomponents additionally provide service discovery, data analysis, data export, and visualization. Following the initial implementation and first software release, the component has already been successfully deployed in the test and integration environment. Next steps are the final software release of the component, further use case implementation, and other related piloting activities.
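The post does not describe the interfaces of these subcomponents. Purely as an illustration, the following minimal Python sketch (standard library only) shows one way a push gateway, a data forwarder, a central monitoring core, and a threshold alert rule could cooperate in hierarchical multi-cluster monitoring. All class and function names (`PushGateway`, `DataForwarder`, `MonitoringCore`, `evaluate_threshold`) are hypothetical stand-ins, not the ACES implementation. We also assume that the forwarder minimizes data exchange by summing per-node series into per-cluster totals, which only makes sense for additive metrics such as counters.

```python
# Hypothetical sketch of hierarchical multi-cluster monitoring; not the ACES code.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    metric: str
    labels: frozenset  # frozenset of (key, value) pairs so label sets are hashable
    value: float


class PushGateway:
    """Hypothetical cluster-local gateway: workloads push their latest values here."""

    def __init__(self):
        self._latest = {}  # (metric, labels) -> value; a repeated push overwrites

    def push(self, metric, value, **labels):
        self._latest[(metric, frozenset(labels.items()))] = float(value)

    def collect(self):
        return [Sample(m, lbl, v) for (m, lbl), v in self._latest.items()]


class DataForwarder:
    """Hypothetical forwarder: sums series over dropped labels and tags them with
    the cluster name, so fewer series leave the cluster."""

    def __init__(self, cluster, drop_labels=("node",)):
        self.cluster = cluster
        self.drop = set(drop_labels)

    def aggregate(self, samples):
        sums = defaultdict(float)
        for s in samples:
            kept = frozenset((k, v) for k, v in s.labels if k not in self.drop)
            sums[(s.metric, kept | {("cluster", self.cluster)})] += s.value
        return [Sample(m, lbl, v) for (m, lbl), v in sums.items()]


class MonitoringCore:
    """Hypothetical central store receiving aggregated series from many clusters."""

    def __init__(self):
        self.series = {}  # (metric, labels) -> latest value

    def ingest(self, samples):
        for s in samples:
            self.series[(s.metric, s.labels)] = s.value

    def query(self, metric, **match):
        """Sum all series of `metric` whose labels include every key=value in `match`."""
        want = set(match.items())
        return sum(v for (m, lbl), v in self.series.items() if m == metric and want <= lbl)


def evaluate_threshold(core, metric, threshold):
    """Hypothetical alert rule: name every cluster whose series exceeds `threshold`."""
    return sorted({dict(lbl)["cluster"] for (m, lbl), v in core.series.items()
                   if m == metric and v > threshold})


# Usage: two edge clusters report per-node CPU time; only per-cluster totals are forwarded.
edge_a, edge_b = PushGateway(), PushGateway()
edge_a.push("cpu_seconds", 12.5, node="n1", app="detector")
edge_a.push("cpu_seconds", 7.5, node="n2", app="detector")
edge_b.push("cpu_seconds", 3.0, node="n9", app="detector")

core = MonitoringCore()
for name, gateway in (("edge-a", edge_a), ("edge-b", edge_b)):
    core.ingest(DataForwarder(name).aggregate(gateway.collect()))

print(core.query("cpu_seconds", cluster="edge-a"))    # 20.0
print(core.query("cpu_seconds"))                      # 23.0
print(evaluate_threshold(core, "cpu_seconds", 10.0))  # ['edge-a']
```

Dropping the `node` label before forwarding is what reduces upstream traffic in this sketch: each cluster sends one series per metric and remaining label set rather than one per node, at the cost of losing per-node detail at the core.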
The University of Ljubljana is additionally investigating and developing various multi-layer system characterization and aggregation approaches that should provide unprecedented monitoring and observability context for edge systems. More on that in upcoming news releases. Stay tuned!