Plugin for OPL cluster_read.py to count file lines matching to various regexps #124

Merged
jhutar merged 9 commits into redhat-performance:main on Jun 28, 2024

Conversation


@riya-17 riya-17 commented May 2, 2024

Add a plugin that counts the total number of lines and the number of lines matching a given regex.

def test_count(self):
    string = """
    - name: measurements.logs.openshift-pipelines.pipelines-as-code-controller
      log_source_command: oc -n openshift-pipelines logs --since=10h --all-containers --selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=default

You can hack this by using something like echo -e "log_line_1\nlog_line_2\n..." here to avoid the need to call oc; oc was meant only as an example of how to get a log.

@@ -0,0 +1,4 @@
- name: measurements.logs.openshift-pipelines.pipelines-as-code-controller

Please do not create a new file for this; add it to opl/cluster_read_example.yaml.

@@ -398,6 +434,7 @@ def measure(self, ri, name, **kwargs):
"env_variable": EnvironmentPlugin,
"command": CommandPlugin,
"copy_from": CopyFromPlugin,
"count_line": CountLinePlugin,

To register the plugin so that it is used for the correct config file entries (see the check "if key in self.measurement_plugins:"), you need to use "log_source_command": CountLinePlugin, IIRC (not sure here, please try).
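The key-based lookup mentioned in this comment can be sketched roughly as follows. This is only an illustration and not the actual opl/cluster_read.py code: the plugin bodies, the PLUGINS dict and the dispatch helper are hypothetical names. The only idea taken from the comment is that a config entry is routed to a plugin when one of its keys is found in the registry, so the registry key has to be a key that really appears in the entry.

```python
# Hypothetical sketch of key-based plugin dispatch; all names are illustrative.

class CommandPlugin:
    def measure(self, name, **kwargs):
        return name, f"would run: {kwargs['command']}"


class CountLinePlugin:
    def measure(self, name, **kwargs):
        return name, f"would count lines of: {kwargs['log_source_command']}"


# The registry key must be a key that actually appears in the config entry.
PLUGINS = {
    "command": CommandPlugin,
    "log_source_command": CountLinePlugin,
}


def dispatch(entry):
    """Pick the first plugin whose registry key is present in the config entry."""
    for key in entry:
        if key in PLUGINS:
            params = {k: v for k, v in entry.items() if k != "name"}
            return PLUGINS[key]().measure(entry["name"], **params)
    raise KeyError(f"No plugin registered for entry {entry['name']!r}")


entry = {"name": "measurements.logs.example", "log_source_command": "echo hello"}
result = dispatch(entry)
```

With a registry key of "count_line" instead, the entry above would not match anything, because no config key has that name.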

}
}
}
}

The plugin should return just a (name, value) tuple, which would be something like name, {"all": line_count, ...}, AFAICT.


for line in result_lines:
    line_count += 1
    if line["level"] == "error":

Do not assume the log line is JSON; you need to use the re module to match it against the provided regexp.
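As a small illustration of this point, the sketch below counts matching lines in a log that mixes JSON and plain-text lines. The sample lines and the pattern are made up for the example; the only technique shown is re.search applied to each raw line, with no JSON parsing.

```python
import re

# Illustrative log lines: one JSON-formatted, two plain text.
lines = [
    '{"level":"error","msg":"Oh no"}',
    "ERROR Another thing broke",
    "INFO all good",
]

# Hypothetical pattern covering both formats.
pattern = re.compile(r'"level":"error"|ERROR')

# re.search scans the whole line, so the line format does not matter.
error_count = sum(1 for line in lines if pattern.search(line))
```

Indexing a line with line["level"] would only work after parsing it as JSON, and that parsing would fail on the two plain-text lines.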

@riya-17 riya-17 requested a review from jhutar May 9, 2024 07:18
"""
result = execute(log_source_command)

regex_lst = ["log_regexp_error"[len("log_regexp_") :]]

Note there can be any number of regexps. Imagine you want to process a config with all of these:

log_regexp_error: ERROR
log_regexp_warning: WARNING
log_regexp_info: INFO
log_regexp_debug: DEBUG

So I think you need a dict (the key would be the regexp group name and the value would be the regexp) like this:

{
  "error": "ERROR",
  "warning": "WARNING",
  "info": "INFO",
  "debug": "DEBUG",
}

@riya-17 riya-17 May 10, 2024

@jhutar The issue is not the number of queries or the dict produced in the end; it is the way the function is called from line no. 555. It is called with the exact number of regexp attributes and with the same names.

If the following example is the query:

log_regexp_error: '"level":"error".*"logger":"ABC"'
log_regexp_warning: '"level":"error".*"logger":"XYZ"'

then the above query requires a function definition with the following parameters:

def measure(
    self,
    ri,
    name,
    log_source_command,
    log_regexp_error,
    log_regexp_warning,
    output="text",
):
    ...

It can neither increase nor decrease the number of parameters, nor change their names.

To use this, we would have to call it with a fixed number of regexp queries, always with the same names; we cannot change them for each query.
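One possible way around the fixed-signature concern, assuming the caller passes the config keys as keyword arguments, is to let the method accept **kwargs and pick out the keys that start with log_regexp_. The sketch below is not the code that was merged: it is a standalone function rather than a plugin method, and log_source_output is a hypothetical parameter that takes the command output as a string so the example stays self-contained.

```python
import re

PREFIX = "log_regexp_"


def measure(name, log_source_output, **kwargs):
    """Count all lines plus the lines matching each log_regexp_* keyword argument."""
    lines = log_source_output.splitlines()
    counts = {"all": len(lines)}
    for key, value in kwargs.items():
        if not key.startswith(PREFIX):
            continue  # ignore unrelated options
        regexp = re.compile(value)
        counts[key[len(PREFIX):]] = sum(1 for line in lines if regexp.search(line))
    return name, counts


log = "ERROR a\nWARNING b\nINFO c\nERROR d"
name, counts = measure(
    "example",
    log,
    log_regexp_error="ERROR",
    log_regexp_warning="WARNING",
)
```

Here counts ends up as {"all": 4, "error": 2, "warning": 1}, and adding or renaming a log_regexp_* option in the config needs no change to the signature.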

Contributor Author

What I am thinking is that we could create a single query variable like log_regexp and give it a list of values to match, like:
log_regexp: ["error", "warning", "abc_error", "all_error"]

I was thinking about something like this code:

import subprocess
import re

config = {
    "name": "aaa/bbb",
    "log_source_command": 'echo -e \'{"level":"error","logger":"ABC","msg":"Oh no"}\n{"level":"info","logger":"XYZ","msg":"Hello!"}\n{"level":"error","logger":"XYZ","msg":"Oh no"}\n{"level":"warning","logger":"XYZ","msg":"Beware"}\'',
    "log_regexp_error_abc": '"level":"error".*"logger":"ABC"',
    "log_regexp_error_xyz": '"level":"error".*"logger":"XYZ"',
    "log_regexp_warning": '"level":"warning"',
}

session = subprocess.Popen(config["log_source_command"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = session.communicate()

#print(f"STDOUT: {stdout.decode()}")
#print(f"STDERR: {stderr.decode()}")

output = {}
output["all"] = len(stdout.decode().splitlines())

for pattern_name, pattern_value in config.items():
    if not pattern_name.startswith("log_regexp_"):
        continue
    pattern_key = pattern_name[len("log_regexp_"):]
    pattern_regexp = re.compile(pattern_value)
    counter = 0
    for line in stdout.decode().splitlines():
        if pattern_regexp.search(line):
            counter += 1
    output[pattern_key] = counter

print(f"OUTPUT: {output}")

Output here is:

OUTPUT: {'all': 4, 'error_abc': 1, 'error_xyz': 1, 'warning': 1}

@@ -373,6 +374,50 @@ def measure(self, ri, name, command, output="text"):
return name, result


class CountLinePlugin(BasePlugin):
    def check_word_presence(self, word, line):
        pattern = r"\b{}\b".format(re.escape(word))

I do not think you need to escape here. Configuration holds the actual regular expression.

Imagine log lines like these:

error: Something broke
ERROR Another thing broke

and config with:

log_regexp_error: '(error|ERROR)'

(not sure if I need backslashes there or not).

The value of that config option is used directly as a regular expression.
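The effect of escaping can be checked with a short sketch that uses the two log lines and the config value from this comment (the third line and the variable names are added for illustration). In Python's re syntax, parentheses and | are grouping and alternation operators when written without backslashes, and re.escape turns them into literal characters.

```python
import re

lines = ["error: Something broke", "ERROR Another thing broke", "all fine"]
config_value = "(error|ERROR)"

# Used directly, the config value is a regular expression with alternation.
raw = re.compile(config_value)
raw_count = sum(1 for line in lines if raw.search(line))

# Escaped, the parentheses and pipe become literal characters to match.
escaped = re.compile(re.escape(config_value))
escaped_count = sum(1 for line in lines if escaped.search(line))
```

The unescaped pattern matches the first two lines, while the escaped one matches none of them, since no line contains the literal text (error|ERROR).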

@jhutar jhutar left a comment

Thank you!

@jhutar jhutar merged commit 33cf752 into redhat-performance:main Jun 28, 2024
1 check failed

jhutar commented Jun 28, 2024

Wait, the test is failing!


jhutar commented Jun 28, 2024

OK, that was simple: 985e6c8
