Contribute to power profiling and model training

Requirements

git > 2.22
kubectl
yq, jq
power meter is available
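
The tool requirements above can be checked before starting. The following is a minimal sketch, not part of script.sh: the helper name `version_gt` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD/macOS sort).

```shell
#!/usr/bin/env bash
# version_gt A B: succeeds when version A is strictly greater than version B.
# Hypothetical helper; relies on `sort -V` for version ordering.
version_gt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Report any required command-line tool that is not on PATH.
for tool in git kubectl yq jq; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done

# Compare the installed git version against the 2.22 requirement.
if command -v git >/dev/null 2>&1; then
  git_version="$(git --version | awk '{print $3}')"
  if version_gt "$git_version" "2.22"; then
    echo "git $git_version: ok"
  else
    echo "git $git_version: a version newer than 2.22 is required"
  fi
fi
```

The power meter requirement cannot be verified this way and still needs a manual check.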

Pre-step

Fork and clone this repository, then move to the model_training folder
```
git clone
cd model_training
chmod +x script.sh
```

1. Prepare cluster

From scratch (no target kubernetes cluster)

Ports 9090 and 5101 must not be in use (they will be used for the Prometheus port-forward and the kind registry, respectively).
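
One way to check that both ports are free is to try connecting to them locally. This is a sketch under stated assumptions: the function name `port_in_use` is hypothetical, it requires bash (it uses bash's `/dev/tcp` redirection), and it only detects listeners reachable on 127.0.0.1.

```shell
#!/usr/bin/env bash
# port_in_use PORT: succeeds when something accepts TCP connections
# on 127.0.0.1:PORT. Hypothetical helper; bash-only (/dev/tcp).
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# 9090 is needed for the Prometheus port-forward, 5101 for the kind registry.
for port in 9090 5101; do
  if port_in_use "$port"; then
    echo "port $port is already in use"
  else
    echo "port $port looks free"
  fi
done
```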

Run

./script.sh prepare_cluster

The script will

create a kind cluster named kind-for-training with a registry at port 5101.
deploy Prometheus.
deploy Prometheus RBAC and a node port at 30090 on the kind node, which is forwarded to port 9090 on the host.
deploy a service monitor for Kepler and reload the Prometheus server.

For managed cluster

Please confirm the following requirements:

Kepler installation
Prometheus installation
Kepler metrics are exported to the Prometheus server
Prometheus server is available at http://localhost:9090. Otherwise, set the PROM_SERVER environment variable.
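
The last requirement can be checked with a quick query against the Prometheus HTTP API. This is a minimal sketch: the helper name `prom_api_url` is hypothetical, the fallback mirrors the default address stated above, and the generic `up` query only confirms that the server answers. To confirm that Kepler metrics are present, substitute a Kepler metric name.

```shell
#!/usr/bin/env bash
# Fall back to the default address when PROM_SERVER is unset or empty.
PROM_SERVER="${PROM_SERVER:-http://localhost:9090}"

# prom_api_url: print the instant-query endpoint for the configured server.
# Hypothetical helper; strips one trailing slash from PROM_SERVER.
prom_api_url() {
  printf '%s/api/v1/query\n' "${PROM_SERVER%/}"
}

# Send a trivial PromQL query if curl is installed; report failures on stderr
# instead of aborting.
if command -v curl >/dev/null 2>&1; then
  curl -fsS -G --max-time 5 --data-urlencode 'query=up' "$(prom_api_url)" \
    || echo "Prometheus not reachable at $PROM_SERVER" >&2
fi
```

For a server at another address, export the variable first, for example `export PROM_SERVER=http://<host>:<port>`.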

2. Run benchmark and collect metrics

With benchmark automation and pipeline

There are two options for running the benchmark and collecting the metrics: the CPE operator with a manual script, and the Tekton pipeline.

The CPE operator is slated for deprecation. We are transitioning to automating the collection and training processes through the Tekton pipeline. Nevertheless, the CPE operator may still be useful for customized benchmarks that require performance values per sub-workload within the benchmark suite.

Tekton Pipeline Instruction

CPE Operator Instruction

With manual execution

In addition to the two automation approaches above, you can manually run your own benchmarks, then collect, train, and export the models using the entrypoint cmd/main.py.

Manual Metric Collection and Training with Entrypoint

Clean up

For kind-for-training cluster

Run

./script.sh cleanup
