Plugin for OPL cluster_read.py to count file lines matching to various regexps #124

riya-17 · 2024-05-02T20:33:15Z

Add a plugin that counts the total number of lines and also the number of lines which match a particular regex.

jhutar · 2024-05-03T09:25:08Z

tests/test_cluster_read.py

+ def test_count(self):
+ string = """
+ - name: measurements.logs.openshift-pipelines.pipelines-as-code-controller
+ log_source_command: oc -n openshift-pipelines logs --since=10h --all-containers --selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=default


You can hack this and use something like echo -e "log_line_1\nlog_line_2\n..." here to avoid the need to call oc; oc was meant just as an example of how to get a log.
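A minimal sketch of that idea, assuming a POSIX shell is available and that the plugin runs log_source_command through a shell. The entry name and log lines are made up. printf is used here instead of echo -e because on systems where /bin/sh is not bash, echo may print "-e" literally.

```python
import subprocess

# Hypothetical test entry: printf stands in for "oc ... logs", so no cluster is needed.
entry = {
    "name": "measurements.logs.test",
    "log_source_command": "printf 'log_line_1\\nlog_line_2\\nlog_line_3\\n'",
}

# Run it through a shell (assumption: the plugin runs log_source_command the same way).
result = subprocess.run(
    entry["log_source_command"],
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)
lines = result.stdout.splitlines()
print(len(lines))  # prints 3
```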

jhutar · 2024-05-03T09:25:54Z

opl/cluster_read_config.yaml

@@ -0,0 +1,4 @@
+- name: measurements.logs.openshift-pipelines.pipelines-as-code-controller


Please do not create a new file for this; add it to opl/cluster_read_example.yaml.

jhutar · 2024-05-03T09:28:53Z

opl/cluster_read.py

@@ -398,6 +434,7 @@ def measure(self, ri, name, **kwargs):
 "env_variable": EnvironmentPlugin,
 "command": CommandPlugin,
 "copy_from": CopyFromPlugin,
+ "count_line": CountLinePlugin,


To register the plugin in a way that it will be used for the correct config file entries, you need to use "log_source_command": CountLinePlugin, IIRC (not sure here, please try). See opl/opl/cluster_read.py, line 495 in df1b355:

```python
if key in self.measurement_plugins:
```
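A simplified model of why the registry key matters, based only on the if key in self.measurement_plugins: check referenced above. The classes and measure_entry() are hypothetical stand-ins, not OPL's actual code. In this model an entry is dispatched by the config option it contains, so a "count_line" key would never match an entry that only has log_source_command.

```python
# Hypothetical, simplified stand-ins; OPL's real plugin classes take more context.
class CommandPlugin:
    def measure(self, ri, name, command, output="text"):
        return name, f"would run: {command}"


class CountLinePlugin:
    def measure(self, ri, name, log_source_command, **kwargs):
        return name, f"would count lines of: {log_source_command}"


# Registry keyed by the config option that identifies which plugin handles an entry.
measurement_plugins = {
    "command": CommandPlugin,
    "log_source_command": CountLinePlugin,
}


def measure_entry(ri, entry):
    """Dispatch a config entry to the first plugin whose key appears in it."""
    for key in entry:
        if key in measurement_plugins:
            plugin = measurement_plugins[key]()
            return plugin.measure(ri, **entry)
    raise KeyError(f"no plugin registered for entry {entry.get('name')!r}")


print(measure_entry(None, {"name": "logs.pac", "log_source_command": "cat app.log"}))
```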

jhutar · 2024-05-03T09:31:24Z

opl/cluster_read.py

+ }
+ }
+ }
+ }


The plugin should return just a (name, value) tuple, which would be something like name, {"all": line_count, ...} AFAICT.
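A minimal sketch of that return shape. The helper name measure() and the way a caller stores the result are assumptions for illustration only.

```python
def measure(name, log_output):
    """Hypothetical helper: return a (name, value) tuple, value being a dict of counts."""
    line_count = len(log_output.splitlines())
    return name, {"all": line_count}


# Assumed caller: store the value under the measurement name.
results = {}
key, value = measure("measurements.logs.example", "line 1\nline 2\nline 3\n")
results[key] = value
print(results)  # {'measurements.logs.example': {'all': 3}}
```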

jhutar · 2024-05-03T09:33:18Z

opl/cluster_read.py

+
+ for line in result_lines:
+ line_count+=1
+ if line["level"] == "error":


Do not assume the log line is JSON; you need to use the re module to match it against the provided regexp.
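A small sketch of the difference, using made-up log lines: a regexp search works on JSON and plain-text lines alike, while parsing every line as JSON breaks on the first non-JSON line. count_matching() is a hypothetical helper name.

```python
import json
import re

# Made-up log lines: one JSON, one plain text, one JSON.
lines = [
    '{"level":"error","logger":"ABC","msg":"Oh no"}',
    "ERROR: something broke",
    '{"level":"info","logger":"XYZ","msg":"Hello!"}',
]


def count_matching(lines, regexp):
    """Count lines in which the regexp matches anywhere (re.search, not re.match)."""
    pattern = re.compile(regexp)
    return sum(1 for line in lines if pattern.search(line))


print(count_matching(lines, r'"level":"error"'))  # 1
print(count_matching(lines, r"(?i)error"))  # 2

# Parsing every line as JSON would fail on the plain-text line.
try:
    [json.loads(line) for line in lines]
except json.JSONDecodeError as exc:
    print(f"not JSON: {exc.msg}")
```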

jhutar · 2024-05-10T05:31:54Z

opl/cluster_read.py

+ """
+ result = execute(log_source_command)
+
+ regex_lst = ["log_regexp_error"[len("log_regexp_") :]]


Note that there can be any number of regexps. Imagine you want to process a config with all of these:

```yaml
log_regexp_error: ERROR
log_regexp_warning: WARNING
log_regexp_info: INFO
log_regexp_debug: DEBUG
```

So I think you need a dict (the key would be the regexp group name and the value would be the regexp), like this:

```python
{
    "error": "ERROR",
    "warning": "WARNING",
    "info": "INFO",
    "debug": "DEBUG",
}
```
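A minimal sketch of building that dict, assuming every config option reaches the plugin as a keyword argument. collect_regexps() and PREFIX are hypothetical names, and the log_source_command value is a placeholder.

```python
PREFIX = "log_regexp_"


def collect_regexps(**options):
    """Build {group_name: regexp} from every log_regexp_<group_name> option."""
    return {
        key[len(PREFIX):]: value
        for key, value in options.items()
        if key.startswith(PREFIX)
    }


regexps = collect_regexps(
    name="aaa/bbb",
    log_source_command="cat app.log",  # hypothetical command
    log_regexp_error="ERROR",
    log_regexp_warning="WARNING",
    log_regexp_info="INFO",
    log_regexp_debug="DEBUG",
)
print(regexps)
# {'error': 'ERROR', 'warning': 'WARNING', 'info': 'INFO', 'debug': 'DEBUG'}
```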

@jhutar The issue is not the number of queries or the dict produced in the end; it is the way the function is called from line 555. It is called with exactly the regexp attributes present in the config, under the same names.

For example, if the following is the query:

```yaml
log_regexp_error: '"level":"error".*"logger":"ABC"'
log_regexp_warning: '"level":"error".*"logger":"XYZ"'
```

the above query will produce a function definition with the following attributes:

```python
def measure(
    self,
    ri,
    name,
    log_source_command,
    log_regexp_error,
    log_regexp_warning,
    output="text",
):
```

It can neither increase nor decrease the number of variables, nor change their names.

To use this, we would have to pass a fixed number of regexp queries with fixed names on every call; we cannot change them per query.

What I am thinking is that we could create a single query variable such as log_regexp and give it a list of values to match, like:
log_regexp: ["error", "warning", "abc_error", "all_error"]
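For reference, a fixed signature is not required: Python's **kwargs collects any keyword arguments not named in the signature, and the hunk header earlier in this thread already shows def measure(self, ri, name, **kwargs) for another plugin. A sketch with a hypothetical stand-in class, showing one method accepting two or four differently named regexp options:

```python
class CountLinePlugin:
    """Hypothetical stand-in; only the signature matters here."""

    def measure(self, ri, name, log_source_command, output="text", **kwargs):
        # kwargs receives every option not named above, whatever its name.
        regexp_options = [k for k in kwargs if k.startswith("log_regexp_")]
        return name, regexp_options


plugin = CountLinePlugin()
two = {
    "name": "a",
    "log_source_command": "cat a.log",
    "log_regexp_error": "ERROR",
    "log_regexp_warning": "WARNING",
}
four = {
    "name": "b",
    "log_source_command": "cat b.log",
    "log_regexp_error_abc": '"level":"error".*"logger":"ABC"',
    "log_regexp_error_xyz": '"level":"error".*"logger":"XYZ"',
    "log_regexp_info": "INFO",
    "log_regexp_debug": "DEBUG",
}
print(plugin.measure(None, **two))  # ('a', ['log_regexp_error', 'log_regexp_warning'])
print(plugin.measure(None, **four))
```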

I was thinking about something like this code:

```python
import subprocess
import re

config = {
    "name": "aaa/bbb",
    "log_source_command": 'echo -e \'{"level":"error","logger":"ABC","msg":"Oh no"}\n{"level":"info","logger":"XYZ","msg":"Hello!"}\n{"level":"error","logger":"XYZ","msg":"Oh no"}\n{"level":"warning","logger":"XYZ","msg":"Beware"}\'',
    "log_regexp_error_abc": '"level":"error".*"logger":"ABC"',
    "log_regexp_error_xyz": '"level":"error".*"logger":"XYZ"',
    "log_regexp_warning": '"level":"warning"',
}

# Run the log source command and capture its output
session = subprocess.Popen(
    config["log_source_command"],
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
stdout, stderr = session.communicate()
# print(f"STDOUT: {stdout.decode()}")
# print(f"STDERR: {stderr.decode()}")

lines = stdout.decode().splitlines()
output = {}
output["all"] = len(lines)

# Every "log_regexp_<key>" option holds a regexp; count the lines it matches
for pattern_name, pattern_value in config.items():
    if not pattern_name.startswith("log_regexp_"):
        continue
    pattern_key = pattern_name[len("log_regexp_"):]
    pattern_regexp = re.compile(pattern_value)
    counter = 0
    for line in lines:
        if pattern_regexp.search(line):
            counter += 1
    output[pattern_key] = counter

print(f"OUTPUT: {output}")
```

Output here is:

OUTPUT: {'all': 4, 'error_abc': 1, 'error_xyz': 1, 'warning': 1}

jhutar · 2024-05-10T05:40:12Z

opl/cluster_read.py

@@ -373,6 +374,50 @@ def measure(self, ri, name, command, output="text"):
 return name, result


+class CountLinePlugin(BasePlugin):
+ def check_word_presence(self, word, line):
+ pattern = r"\b{}\b".format(re.escape(word))


I do not think you need to escape here. The configuration holds the actual regular expression.

Imagine a log line like this:

error: Something broke ERROR Another thing broke

and a config with:

log_regexp_error: '(error|ERROR)'

(not sure if I need backslashes there or not).

The value of that config option is used directly as a regular expression.
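A short sketch using the example line and config value above. As a regexp, (error|ERROR) needs no backslashes, since the parentheses and | are meant as regexp syntax here. Escaping it the way check_word_presence does turns it into a literal string that never matches. It also shows that a line with two matches still counts as one matching line.

```python
import re

line = "error: Something broke ERROR Another thing broke"
config_value = "(error|ERROR)"  # log_regexp_error from the example config

# Used directly, the value is a regexp: an alternation of two words.
print(bool(re.search(config_value, line)))  # True

# Escaped (as in check_word_presence), it only matches the literal text "(error|ERROR)".
escaped = r"\b{}\b".format(re.escape(config_value))
print(bool(re.search(escaped, line)))  # False

# The line contains two matches, but as a line it would be counted once.
print(len(re.findall(config_value, line)))  # 2
```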

tests/test_cluster_read.py

Signed-off-by: Riya <[email protected]>

jhutar

Thank you!

jhutar · 2024-06-28T10:44:04Z

Wait, the test is failing!

jhutar · 2024-06-28T10:46:45Z

OK, that was simple: 985e6c8

jhutar reviewed May 3, 2024


riya-17 requested a review from jhutar May 9, 2024 07:18

jhutar reviewed May 10, 2024

tests/test_cluster_read.py (outdated)

jhutar reviewed May 10, 2024

tests/test_cluster_read.py (outdated)

jhutar reviewed May 10, 2024

tests/test_cluster_read.py (outdated)

riya-17 force-pushed the count-lines-for-regex branch from ee6c479 to 4523648 June 27, 2024 09:54

riya-17 added 4 commits June 27, 2024 15:25

Add Count Plugin for Counting the Number of Lines and matched Regex

a2b1d2d

Black & Flake8 issues

6ad3a17

Change the approach for counting lines as regex

b689213

Test Case Update

c60059c

riya-17 force-pushed the count-lines-for-regex branch from 4523648 to c60059c June 27, 2024 10:02

riya-17 added 5 commits June 28, 2024 15:51

Add Count lines

d60c146

Signed-off-by: Riya <[email protected]>

format with balck

dc802a8

Add empty line

cb305af

Add empty line

30feff9

remove counter import

49f7684

jhutar approved these changes Jun 28, 2024


jhutar merged commit 33cf752 into redhat-performance:main Jun 28, 2024
1 check failed


riya-17 commented May 2, 2024

jhutar May 3, 2024

jhutar May 3, 2024

jhutar May 3, 2024

jhutar May 3, 2024

jhutar May 3, 2024

jhutar May 10, 2024

jhutar May 10, 2024

riya-17 May 10, 2024 (edited)

riya-17 May 10, 2024

jhutar May 11, 2024

jhutar May 10, 2024

jhutar left a comment

jhutar commented Jun 28, 2024

jhutar commented Jun 28, 2024
