..
#
# Critical Application Monitoring (CAM)
#
# SPDX-FileCopyrightText: Copyright 2023-2024 Arm Limited
# and/or its affiliates
#
# SPDX-License-Identifier: BSD-3-Clause
#
.. _overview:
########
Overview
########
************
Introduction
************
Automotive compute platforms that host advanced features such as Advanced Driver
Assistance Systems (ADAS) and Autonomous Drive (AD) stacks require increasingly
complex, higher-performance CPUs to meet their demanding workloads.
In these environments, the detection of application runtime faults is one
strategy used to achieve the required system reliability goals.
To help reach these goals, automotive systems benefit from the addition of a
Safety Island: a separate compute subsystem that provides a higher-safety-level
compute area for system and application monitoring services.
The Critical Application Monitoring (CAM) project demonstrates an application
observation mechanism hosted on a Safety Island which can improve the overall
system fault coverage.
**********************
Principle of Operation
**********************
Critical applications often follow a pattern where the workloads are split into
multiple periodic tasks chained together to produce a feature pipeline. CAM's
principle of operation revolves around this pattern where such tasks generate
periodic events which are then monitored by the CAM monitoring service. The two
main classes of issues that can be detected are:
* Temporal issues: Events arriving outside the expected period
* Logical issues: Events arriving out of order
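
The two classes of issues listed above can be illustrated with a minimal Python
sketch. It is not CAM's implementation: the ``Event`` record, the
``check_stream`` function and the ``period_ms`` and ``tolerance_ms`` parameters
are assumptions made purely for illustration.

.. code-block:: python

    from typing import List, NamedTuple


    class Event(NamedTuple):
        """Hypothetical event record: a sequence id and an arrival time in ms."""
        event_id: int
        arrival_ms: int


    def check_stream(events: List[Event], period_ms: int,
                     tolerance_ms: int) -> List[str]:
        """Return the issues found in a stream of periodic events.

        Events are expected to carry ids 0, 1, 2, ... and the n-th event is
        expected at n * period_ms, give or take tolerance_ms.
        """
        issues = []
        for expected_id, event in enumerate(events):
            # Logical check: the event must be the next one in the sequence.
            if event.event_id != expected_id:
                issues.append(
                    f"logical: expected id {expected_id}, got {event.event_id}")
            # Temporal check: the event must arrive close to its expected time.
            expected_ms = expected_id * period_ms
            if abs(event.arrival_ms - expected_ms) > tolerance_ms:
                issues.append(
                    f"temporal: event {event.event_id} "
                    f"arrived at {event.arrival_ms} ms")
        return issues

A stream whose events arrive in order and on time yields an empty list, while a
swapped or late event produces one entry per violation.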
.. image:: images/cam_overview.svg
:align: center
The diagram above describes the main steps and components involved. Later
sections of this document describe the individual components in more detail.
**cam-service** is the monitoring agent that executes from the higher safety
cores in the Safety Island. **cam-service** exposes a socket-based communication
channel.
Critical applications use this communication channel to stream their periodic
events (heartbeat). The **libcam** library provides a high-level API that
implements the message protocol used to communicate with **cam-service**, as
well as other features.
The **stream configuration file** defines the number of events and their timing
characteristics according to the requirements of the critical application. With
the help of **cam-tool**, this file is converted into a binary format
(**stream deployment data**) which is then deployed in the Safety Island to be
consumed by **cam-service**.
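
The idea of converting a human-readable stream description into a compact
binary form can be sketched as follows. The dictionary keys and the binary
layout below are invented for this example and do not reflect the actual file
formats, which are defined in their own specifications.

.. code-block:: python

    import struct
    import uuid

    # Hypothetical stream description, standing in for a parsed
    # configuration file.
    config = {
        "uuid": "12345678-1234-5678-1234-567812345678",
        "events": [
            {"period_us": 100000, "timeout_us": 120000},
            {"period_us": 50000, "timeout_us": 60000},
        ],
    }


    def to_deployment_data(cfg: dict) -> bytes:
        """Pack the description into an invented little-endian layout:
        16-byte UUID, 32-bit event count, then two 32-bit fields per event."""
        blob = uuid.UUID(cfg["uuid"]).bytes
        blob += struct.pack("<I", len(cfg["events"]))
        for event in cfg["events"]:
            blob += struct.pack("<II", event["period_us"], event["timeout_us"])
        return blob


    data = to_deployment_data(config)

A fixed binary layout of this kind is cheap to parse on a constrained target,
which is one plausible reason for separating the authoring format from the
deployed one.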
**cam-service** implements a driver interface to communicate with a **fault
manager**, a system-specific component (software or hardware) that is
responsible for acting on the reported faults.
Many aspects of the CAM implementation revolve around time. The main goal for
CAM is to ensure that a certain piece of code in a critical application executes
periodically at a specific frequency. When this timing is violated, the
critical application is deemed to be malfunctioning.
.. image:: images/cam_timings.svg
:align: center
Given the diagram above, the following sequence is described:
* Within the monitored piece of code, an event is created at some point in time
(T\ :sub:`0`)
* The **stream deployment data** provides **cam-service** with the event period.
Together with a start sequence, a timer is set up to trigger at T\ :sub:`n`
* The event arrival time in **cam-service** (T\ :sub:`a`) is expected to be
` section for the API
documentation.
===========
cam-service
===========
**cam-service** is an application that monitors all the event streams
sent by critical applications. The main goal is for **cam-service** to run
on a higher-safety-level subsystem.
**cam-service** can be built for both Linux and Zephyr RTOS. The Linux port
is primarily intended to provide an environment that makes development and
validation easier. The Zephyr RTOS port is closer to a real production
environment, where a real-time operating system is more suitable.
**Interface**
**cam-service** exposes a socket interface which implements the
:ref:`Stream Message Protocol `. This is the
communication channel available for critical applications to send events. Each
connection from an application spawns a new thread. One or more streams of
events are initialized on a per-connection basis.
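
The thread-per-connection model described above can be sketched with Python's
standard ``socketserver`` module. This is only an illustration of the pattern:
the ``StreamHandler`` and ``ThreadedServer`` classes, the ``send_event`` helper
and the ``ACK:`` reply are hypothetical and are not part of the stream message
protocol.

.. code-block:: python

    import socket
    import socketserver
    import threading


    class StreamHandler(socketserver.BaseRequestHandler):
        """Runs in its own thread for each client connection."""

        def handle(self):
            # Acknowledge every chunk received until the client disconnects.
            while True:
                data = self.request.recv(1024)
                if not data:
                    break
                self.request.sendall(b"ACK:" + data)


    class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        daemon_threads = True
        allow_reuse_address = True


    def send_event(address, payload: bytes) -> bytes:
        """Connect as an application would, send one message, return the reply."""
        with socket.create_connection(address) as conn:
            conn.sendall(payload)
            return conn.recv(1024)


    # Port 0 lets the operating system pick a free port for the demonstration.
    server = ThreadedServer(("127.0.0.1", 0), StreamHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    reply = send_event(server.server_address, b"event-0")
    server.shutdown()
    server.server_close()

Handling each connection in a separate thread keeps a slow or stalled
application from delaying the processing of events from other applications.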
**Event Stream**
An event stream defines a set of periodic events to be monitored by
**cam-service**. As part of the system deployment, **cam-service** must have
access to the stream deployment data files of each critical application. Note
that each critical application can have one or more event streams. During
initialization, critical applications 'create' an event stream on the
connection using specific commands in the message protocol. Each stream is
uniquely identified by a UUID. **cam-service** uses the UUID to find the
matching stream deployment data file among those available to it. The alarm and
timings found in the corresponding file are then used for the monitoring.
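
The UUID lookup described above amounts to a keyed search over the deployed
data. A minimal sketch, assuming the deployment data has already been loaded
into memory, is shown below; the ``DEPLOYED_STREAMS`` table, its field names
and the ``find_stream`` function are hypothetical.

.. code-block:: python

    import uuid
    from typing import Dict, Optional

    # Hypothetical in-memory view of the deployment data available to the
    # service, keyed by stream UUID. Field names are invented for this sketch.
    DEPLOYED_STREAMS: Dict[uuid.UUID, dict] = {
        uuid.UUID("12345678-1234-5678-1234-567812345678"): {
            "period_us": 100000,
            "timeout_us": 120000,
        },
    }


    def find_stream(stream_uuid: str) -> Optional[dict]:
        """Return the deployment data matching the UUID sent by an application,
        or None when no matching data has been deployed."""
        return DEPLOYED_STREAMS.get(uuid.UUID(stream_uuid))

A stream whose UUID has no matching entry cannot be monitored, so a ``None``
result would have to be treated as an error by the caller.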
Refer to :ref:`CSD_File` section for more information on event streams.
**Fault Handling**
**cam-service** is able to raise various faults when it observes an exception
in the event streams. These include:
* Stream state fault: Stream message which does not match expected state.
* Stream event logic fault: Stream event out of order.
* Stream event temporal fault: Stream event timeout.
In addition, when an unrecoverable error occurs in **cam-service** itself,
it also reports the fault.
The fault module in **cam-service** is divided into a front-end and a back-end.
The back-end implements a driver interface allowing platform specific drivers to
receive faults from **cam-service**. This allows custom modules (both
software and hardware) to better accommodate the safety workflow required in a
given system.
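
The front-end and back-end split can be pictured as a small driver interface.
The sketch below uses Python for brevity and does not mirror the actual
**cam-service** code; ``FaultDriver``, ``LogDriver`` and ``FaultFrontEnd`` are
hypothetical names.

.. code-block:: python

    from abc import ABC, abstractmethod
    from typing import List


    class FaultDriver(ABC):
        """Back-end interface: one implementation per platform fault manager."""

        @abstractmethod
        def report(self, kind: str, detail: str) -> None:
            ...


    class LogDriver(FaultDriver):
        """Example software back-end that records faults in a list."""

        def __init__(self) -> None:
            self.records: List[str] = []

        def report(self, kind: str, detail: str) -> None:
            self.records.append(f"{kind}: {detail}")


    class FaultFrontEnd:
        """Front-end used by the monitoring code; forwards to the driver."""

        def __init__(self, driver: FaultDriver) -> None:
            self._driver = driver

        def raise_fault(self, kind: str, detail: str) -> None:
            self._driver.report(kind, detail)


    driver = LogDriver()
    faults = FaultFrontEnd(driver)
    faults.raise_fault("temporal", "stream event timeout")

Because the monitoring code only talks to the front-end, a platform can swap
in a different driver without touching the fault detection logic.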
========
cam-tool
========
**cam-tool** is CAM's multi-purpose command-line utility. Its main features are
described below.
**File conversion**
CAM Stream Configuration (CSC) files are written in YAML format. **cam-tool**
can convert these into CAM Stream Deployment (CSD) files ready for deployment.
Refer to :ref:`CSC_File` and :ref:`CSD_File` for more information on CSC
and CSD file specifications.
**Event Log Analysis**
**libcam** supports a log mode where the stream of events can be saved into
a CAM Stream Event Log (CSEL) file. **cam-tool** has a simple analysis mode
capable of reading these files to provide an initial CSC file with pre-set data
extracted from the logs.
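
As an illustration of the kind of data that can be extracted from a log, the
sketch below estimates a period for each event from its recorded timestamps.
The log representation and the ``suggest_periods`` function are assumptions
made for this example; they do not describe the CSEL format or the analysis
that **cam-tool** actually performs.

.. code-block:: python

    from statistics import mean
    from typing import Dict, List, Tuple

    # Hypothetical log entries: (event id, timestamp in microseconds).
    log: List[Tuple[int, int]] = [
        (0, 0), (1, 40000),
        (0, 100000), (1, 141000),
        (0, 200000), (1, 239000),
    ]


    def suggest_periods(entries: List[Tuple[int, int]]) -> Dict[int, int]:
        """Estimate a period per event id as the mean gap between occurrences."""
        times: Dict[int, List[int]] = {}
        for event_id, timestamp in entries:
            times.setdefault(event_id, []).append(timestamp)
        periods = {}
        for event_id, stamps in times.items():
            gaps = [later - earlier
                    for earlier, later in zip(stamps, stamps[1:])]
            if gaps:  # at least two occurrences are needed for an estimate
                periods[event_id] = round(mean(gaps))
        return periods


    suggested = suggest_periods(log)

Values derived this way are only a starting point and would still need to be
reviewed against the timing requirements of the application.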
Refer to :ref:`CSEL_File` and :ref:`CSC_File` for more information on
CSEL and CSC file specifications.
**Deployment**
Stream Deployment files are meant to be deployed into the system following all
security- and safety-relevant processes adopted by the target platform.
To simplify the development lifecycle, however, **cam-service** has the
option to allow deployments over the network using the same communication
channel used by the event streams. **cam-tool** supports sending deployment
files directly to **cam-service** using this feature.
Execute **cam-tool** with ``--help`` for details on all possible parameters and
features.
===========
Test Suites
===========
**libcam**, **cam-uuid** and **cam-service** have their own CUnit-based unit
tests. These are built and run using CMake's CTest support.
**cam-tool** has Pytest-based tests.
**cam-app-example** has a set of Python scripts used as integration tests.
These tests are capable of launching both **cam-app-example** and
**cam-service** on Linux.
For more details on how to build and run the tests, refer to
:ref:`development_validation`.
*********************************
Contributions and Issue Reporting
*********************************
This project does not currently have a process in place for contributions.
To report issues with the repository such as potential bugs, security concerns,
or feature requests, submit an Issue via `GitLab Issues`_, following the
project's template.
********************
Feedback and Support
********************
To request support, contact Arm at support@arm.com. Arm licensees may also
contact Arm via their partner managers.