
Kepler Operator Requirement

Huamin Chen edited this page Sep 6, 2022 · 8 revisions

Requirement

The operator should be able to probe the cluster and nodes to ensure Kepler is running on a supported environment and starts up with the right configuration.

After Kepler is up, the operator should integrate with Prometheus and Grafana to create a ServiceMonitor and Grafana dashboard, in accordance with the CRD spec.
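As a rough illustration of the Prometheus side of that integration, the sketch below builds a ServiceMonitor manifest as a plain dictionary. The helper name `build_service_monitor`, the `kepler` name and namespace, the label, the port name, and the scrape interval are all assumptions made for the example; only the `monitoring.coreos.com/v1` ServiceMonitor layout comes from the Prometheus Operator.

```python
def build_service_monitor(name, namespace, selector_labels, port="http", interval="30s"):
    """Return a ServiceMonitor manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the metrics Service by label; the label set is an assumption.
            "selector": {"matchLabels": dict(selector_labels)},
            # One scrape endpoint, referenced by the Service's port name.
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


# Example values only; a real operator would take these from the CRD spec.
monitor = build_service_monitor(
    "kepler", "kepler", {"app.kubernetes.io/name": "kepler"}
)
```

An operator would submit a manifest like this through the Kubernetes API once the Kepler metrics Service exists; the Grafana dashboard could be handled in a similar way.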

Cluster Probe

The operator will probe the nodes and resolve dependencies, installing the following packages if they are missing (if installation is not possible, the operator avoids using those nodes):

  • Kernel-devel
  • Cgroup
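The probe decision described above can be sketched as a small pure function. This is a minimal illustration, not the operator's implementation: the `NodeFacts` and `ProbeResult` types, the package identifiers, and the idea that the probe already knows which packages are installable on a node are all assumptions.

```python
from dataclasses import dataclass, field

# Dependencies named in this page; the exact identifiers are assumptions.
REQUIRED_PACKAGES = ("kernel-devel", "cgroup")


@dataclass
class NodeFacts:
    """What a (hypothetical) probe learned about one node."""
    name: str
    installed: set = field(default_factory=set)
    installable: set = field(default_factory=set)


@dataclass
class ProbeResult:
    node: str
    to_install: list
    eligible: bool


def probe_node(facts):
    """Decide what to install on a node, or mark the node as unusable."""
    missing = [p for p in REQUIRED_PACKAGES if p not in facts.installed]
    blocked = [p for p in missing if p not in facts.installable]
    if blocked:
        # A dependency cannot be resolved, so avoid this node.
        return ProbeResult(facts.name, [], False)
    return ProbeResult(facts.name, missing, True)


def eligible_nodes(all_facts):
    """Names of the nodes that can run Kepler after dependency resolution."""
    return [r.node for r in map(probe_node, all_facts) if r.eligible]


nodes = [
    NodeFacts("node-a", installed={"kernel-devel", "cgroup"}),
    NodeFacts("node-b", installed={"cgroup"}, installable={"kernel-devel"}),
    NodeFacts("node-c", installed={"cgroup"}),
]
usable = eligible_nodes(nodes)
```

In this example `node-a` needs nothing, `node-b` needs `kernel-devel` installed, and `node-c` is skipped because its missing package cannot be installed.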

CRD Spec

The CRD specifies the following:

  • Kepler deployment: RBAC, deployment configuration (including whether to use /proc (for cgroup v1), the model server endpoint, and whether to use the estimator), and the metrics Service
  • Kepler integration: ServiceMonitor, Grafana instance, datasource, and dashboard
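One possible shape for such a spec is sketched below as Python dataclasses. Every field name and default here is hypothetical, chosen only to mirror the items listed above; the actual CRD schema may be organized differently.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DeploymentSpec:
    """Kepler deployment options (field names are illustrative)."""
    create_rbac: bool = True
    use_proc_for_cgroup_v1: bool = False
    model_server_endpoint: str = ""
    use_estimator: bool = False
    create_metrics_service: bool = True


@dataclass
class IntegrationSpec:
    """Monitoring integration options (field names are illustrative)."""
    service_monitor: bool = True
    grafana_instance: bool = False
    grafana_datasource: bool = False
    grafana_dashboard: bool = False


@dataclass
class KeplerSpec:
    deployment: DeploymentSpec = field(default_factory=DeploymentSpec)
    integration: IntegrationSpec = field(default_factory=IntegrationSpec)


# Example: enable the estimator and ask for a Grafana dashboard.
spec = KeplerSpec(
    deployment=DeploymentSpec(use_estimator=True),
    integration=IntegrationSpec(grafana_dashboard=True),
)
spec_dict = asdict(spec)  # nested dict, convenient for serializing to YAML or JSON
```

Grouping the fields into a deployment part and an integration part follows the two groups in the list, so that monitoring can be toggled without touching how Kepler itself is deployed.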