Plugin for OPL cluster_read.py to count file lines matching to various regexps #124
tests/test_cluster_read.py
Outdated
def test_count(self):
    string = """
    - name: measurements.logs.openshift-pipelines.pipelines-as-code-controller
      log_source_command: oc -n openshift-pipelines logs --since=10h --all-containers --selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=default
You can hack this and use something like echo -e "log_line_1\nlog_line_2\n..." here, so the test does not need to call oc. The oc command was meant just as an example of how to get a log.
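A minimal sketch of that idea, assuming the test only needs deterministic lines on stdout. It swaps in printf for echo -e, because not every /bin/sh echo understands -e; fake_log_command and run_log_command are hypothetical names, not part of OPL.

```python
import subprocess

# Hypothetical stand-in for the "oc ... logs" command: prints three fixed lines.
# printf is used instead of "echo -e" because some shells' echo does not support -e.
fake_log_command = "printf 'log_line_1\\nlog_line_2\\nlog_line_3\\n'"


def run_log_command(command):
    """Run a shell command and return its stdout split into lines."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


lines = run_log_command(fake_log_command)
print(len(lines))
```

With a command like this the test has no dependency on a cluster or on the oc binary.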
opl/cluster_read_config.yaml
Outdated
@@ -0,0 +1,4 @@
- name: measurements.logs.openshift-pipelines.pipelines-as-code-controller |
Please do not create a new file for this; add it to opl/cluster_read_example.yaml.
opl/cluster_read.py
Outdated
@@ -398,6 +434,7 @@ def measure(self, ri, name, **kwargs):
    "env_variable": EnvironmentPlugin,
    "command": CommandPlugin,
    "copy_from": CopyFromPlugin,
    "count_line": CountLinePlugin,
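To register the plugin so that it is used for the correct config file entries, the dictionary key has to be an attribute that actually appears in those entries (see line 495 in df1b355):
if key in self.measurement_plugins:
So the entry should be something like:
"log_source_command": CountLinePlugin,
IIRC (not sure here, please try).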
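A rough sketch of why the key matters, assuming the dispatcher looks up each attribute of a config entry in the plugin dictionary. Only the "if key in self.measurement_plugins" check comes from the thread; pick_plugin and the empty plugin classes are hypothetical stand-ins.

```python
class CommandPlugin:
    """Hypothetical stand-in for the plugin handling "command" entries."""


class CountLinePlugin:
    """Hypothetical stand-in for the new line-counting plugin."""


# Keys are config entry attributes; values are the plugin classes that handle them.
measurement_plugins = {
    "command": CommandPlugin,
    "log_source_command": CountLinePlugin,
}


def pick_plugin(config_entry):
    """Return the first plugin class whose key appears in the config entry."""
    for key in config_entry:
        if key in measurement_plugins:
            return measurement_plugins[key]
    return None


entry = {
    "name": "measurements.logs.example",
    "log_source_command": "echo hello",
    "log_regexp_error": "ERROR",
}
print(pick_plugin(entry).__name__)
```

Registered under "count_line", the plugin would never be selected for this entry, because no attribute of the entry has that name.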
opl/cluster_read.py
Outdated
}
}
}
}
The plugin should return just a (name, value) tuple, which would be something like name, {"all": line_count, ...} AFAICT.
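As an illustration of that return shape, here is a small sketch. The function count_lines and its arguments are hypothetical and not the plugin's actual interface.

```python
import re


def count_lines(name, lines, patterns):
    """Return (name, counts); counts maps "all" and each group name to a line count."""
    counts = {"all": len(lines)}
    for group, regexp in patterns.items():
        compiled = re.compile(regexp)
        counts[group] = sum(1 for line in lines if compiled.search(line))
    return name, counts


name, value = count_lines(
    "measurements.logs.example",
    ["ERROR disk full", "INFO started", "ERROR timeout"],
    {"error": "ERROR", "info": "INFO"},
)
print(name, value)
```

The value is a flat dictionary rather than the deeply nested structure in the diff above.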
opl/cluster_read.py
Outdated
for line in result_lines:
    line_count+=1
    if line["level"] == "error":
Do not assume the log line is JSON. You need to use the re module to match it against the provided regexp.
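A short sketch of the difference, using made-up log lines: parsing every line as JSON breaks on plain-text output, while a regular expression works on both. The helper is_json and the sample pattern are illustrative only.

```python
import json
import re

log_lines = [
    '{"level":"error","msg":"Oh no"}',  # JSON-formatted line
    "ERROR Another thing broke",        # plain-text line
]


def is_json(line):
    """Report whether a line can be parsed as JSON at all."""
    try:
        json.loads(line)
    except json.JSONDecodeError:
        return False
    return True


# One regexp can cover both formats, since it only looks at the raw text.
error_regexp = re.compile(r'"level":"error"|^ERROR')

json_flags = [is_json(line) for line in log_lines]
matches = [bool(error_regexp.search(line)) for line in log_lines]
print(json_flags, matches)
```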
opl/cluster_read.py
Outdated
"""
result = execute(log_source_command)

regex_lst = ["log_regexp_error"[len("log_regexp_") :]]
Note there can be any number of regexps. Imagine you want to process a config with all of these:
log_regexp_error: ERROR
log_regexp_warning: WARNING
log_regexp_info: INFO
log_regexp_debug: DEBUG
So I think you need a dict (the key would be the regexp group name and the value would be the regexp) like this:
{
"error": "ERROR",
"warning": "WARNING",
"info": "INFO",
"debug": "DEBUG",
}
@jhutar The issue is not the number of queries or the dict produced in the end; it is the way the function is called from line no. 555. It is called with an exact number of regexp attributes, each with a fixed name.
Take the following query as an example:
log_regexp_error: '"level":"error".*"logger":"ABC"'
log_regexp_warning: '"level":"error".*"logger":"XYZ"'
This query requires a function definition with the following parameters:
def measure(self,
    ri,
    name,
    log_source_command,
    log_regexp_error,
    log_regexp_warning,
    output="text",
)
The definition can neither increase nor decrease the number of parameters, nor change their names.
To use this, every query would have to pass the same number of regexp attributes with the same names; we cannot change them for each query.
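One possible way around the fixed signature, sketched under the assumption that the caller passes config entry attributes as keyword arguments (which the signature above suggests but this thread does not confirm): accept the regexp attributes through **kwargs. CountLinePluginSketch is a hypothetical class that shows only the signature handling.

```python
PREFIX = "log_regexp_"


class CountLinePluginSketch:
    """Hypothetical plugin; only the argument handling is shown."""

    def measure(self, ri, name, log_source_command, output="text", **kwargs):
        # Any extra keyword starting with the prefix is treated as a regexp group,
        # so the number and names of the regexps can differ for each config entry.
        patterns = {
            key[len(PREFIX):]: value
            for key, value in kwargs.items()
            if key.startswith(PREFIX)
        }
        return name, patterns


entry = {
    "name": "measurements.logs.example",
    "log_source_command": "echo hello",
    "log_regexp_error": '"level":"error".*"logger":"ABC"',
    "log_regexp_warning": '"level":"error".*"logger":"XYZ"',
}
name, patterns = CountLinePluginSketch().measure(None, **entry)
print(patterns)
```

The same method then accepts an entry with one regexp, four, or none, without any change to its definition.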
What I think we can do is create a single query variable like log_regexp and give it a list of values to match, like:
log_regexp: ["error", "warning", "abc_error", "all_error"]
I was thinking about something like this code:
import subprocess
import re

config = {
    "name": "aaa/bbb",
    "log_source_command": 'echo -e \'{"level":"error","logger":"ABC","msg":"Oh no"}\n{"level":"info","logger":"XYZ","msg":"Hello!"}\n{"level":"error","logger":"XYZ","msg":"Oh no"}\n{"level":"warning","logger":"XYZ","msg":"Beware"}\'',
    "log_regexp_error_abc": '"level":"error".*"logger":"ABC"',
    "log_regexp_error_xyz": '"level":"error".*"logger":"XYZ"',
    "log_regexp_warning": '"level":"warning"',
}

session = subprocess.Popen(config["log_source_command"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = session.communicate()
#print(f"STDOUT: {stdout.decode()}")
#print(f"STDERR: {stderr.decode()}")

output = {}
output["all"] = len(stdout.decode().splitlines())
for pattern_name, pattern_value in config.items():
    if not pattern_name.startswith("log_regexp_"):
        continue
    pattern_key = pattern_name[len("log_regexp_"):]
    pattern_regexp = re.compile(pattern_value)
    counter = 0
    for line in stdout.decode().splitlines():
        if pattern_regexp.search(line):
            counter += 1
    output[pattern_key] = counter
print(f"OUTPUT: {output}")
Output here is:
OUTPUT: {'all': 4, 'error_abc': 1, 'error_xyz': 1, 'warning': 1}
opl/cluster_read.py
Outdated
@@ -373,6 +374,50 @@ def measure(self, ri, name, command, output="text"):
        return name, result


class CountLinePlugin(BasePlugin):
    def check_word_presence(self, word, line):
        pattern = r"\b{}\b".format(re.escape(word))
I do not think you need to escape here. The configuration holds the actual regular expression.
Imagine log lines like these:
error: Something broke
ERROR Another thing broke
and config with:
log_regexp_error: '(error|ERROR)'
(not sure if I need backslashes there or not).
Value of that config option is directly a regular expression.
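A quick check of this point with the two example lines above plus one made-up non-matching line. In Python's re module, parentheses and | act as grouping and alternation when left unescaped, so the config value needs no backslashes; passing it through re.escape turns it into a literal string that no longer matches.

```python
import re

log_lines = [
    "error: Something broke",
    "ERROR Another thing broke",
    "info: all good",  # made-up line that should not match
]
config_value = "(error|ERROR)"  # value as it would appear in the config

# Used directly, the value is a regular expression with an alternation.
raw = re.compile(config_value)
# Escaped as in the diff, it only matches the literal text "(error|ERROR)".
escaped = re.compile(r"\b{}\b".format(re.escape(config_value)))

raw_hits = sum(1 for line in log_lines if raw.search(line))
escaped_hits = sum(1 for line in log_lines if escaped.search(line))
print(raw_hits, escaped_hits)
```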
ee6c479 to 4523648
4523648 to c60059c
Thank you!
Wait, the test is failing!
OK, that was simple: 985e6c8
Add a plugin that counts the total number of lines and the number of lines that match a particular regex.