
Add cluster performance test #2130

Merged: 11 commits, merged on Nov 30, 2021

@Selutario (Contributor) commented on Oct 26, 2021

Related issue
Closes #1939

THIS PR SHOULD BE MERGED TO MASTER AFTER #2032 IS MERGED

Description

This PR adds a new test that checks whether the cluster behaves correctly. It computes statistics from CSV files containing information about a cluster environment and compares each one against a threshold defined in advance (inside the data folder).

The thresholds are provisional and will likely be adjusted slightly in the future.
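The test's docstring (visible in the log below) names the statistics it derives: mean, max and a regression coefficient. A minimal sketch of computing them for one numeric CSV column, assuming the regression coefficient is the least-squares slope of the values against their row index. The helper names and the `reg_coef` key are hypothetical and are not taken from the PR's `ClusterCSVTasksParser` or `ClusterCSVResourcesParser`:

```python
import csv
import io
from statistics import mean


def regression_slope(values):
    """Least-squares slope of `values` against their row index (0, 1, 2, ...).

    Assumption: the real test may regress against timestamps instead of row index.
    """
    n = len(values)
    if n < 2:
        return 0.0  # A single sample has no trend.
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def column_stats(csv_file, column):
    """Return mean, max and regression slope of one numeric column of a CSV file object."""
    values = [float(row[column]) for row in csv.DictReader(csv_file)]
    if not values:
        raise ValueError(f"Column {column!r} has no data")
    # 'reg_coef' is a hypothetical key name used only in this sketch.
    return {'mean': mean(values), 'max': max(values), 'reg_coef': regression_slope(values)}


if __name__ == '__main__':
    sample = io.StringIO("timestamp,time_spent(s)\n1,1.0\n2,2.0\n3,3.0\n")
    print(column_stats(sample, 'time_spent(s)'))  # {'mean': 2.0, 'max': 3.0, 'reg_coef': 1.0}
```

A positive slope on a column such as `time_spent(s)` would indicate that a task keeps getting slower over the run, which a mean or max alone would not reveal.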

Configuration options

The test currently supports two configurations:

  • 10 workers and 50000 agents.
  • 25 workers and 50000 agents.
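Each configuration is looked up by a key built from the two parameters, as in the `f"{n_workers}w_{n_agents}a"` expression of the test shown in the log below. A minimal sketch of that lookup, assuming the thresholds are kept in a dict keyed that way. `CONFIGURATIONS` and `select_configuration` are hypothetical names, and raising `ValueError` stands in for the test's `pytest.fail`:

```python
# Placeholder table: only the key format "<workers>w_<agents>a" comes from the test;
# the threshold trees are left empty because the real values live in the data folder.
CONFIGURATIONS = {
    '10w_50000a': {},
    '25w_50000a': {},
}


def select_configuration(n_workers, n_agents, configurations=CONFIGURATIONS):
    """Return the configuration key and its thresholds, or raise if unsupported."""
    key = f"{n_workers}w_{n_agents}a"
    if key not in configurations:
        raise ValueError(f"This is not a supported configuration: {key}. "
                         f"Supported configurations are: {', '.join(configurations)}.")
    return key, configurations[key]
```

Because the key is built with an f-string, `select_configuration('10', '50000')` (strings, as command-line options arrive) and `select_configuration(10, 50000)` resolve to the same entry, so adding a configuration only needs a new key plus its threshold tree.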

More configurations may be added in the future. The test requires three parameters to work correctly, plus optional ones to generate HTML reports:

  • --n_workers: Number of worker nodes in the cluster environment.
  • --n_agents: Number of agents in the cluster environment.
  • --artifacts_path: Path to the CSVs with cluster information. It must follow the structure below:
    ├── master
    │   ├── data
    │   │   ├── *
    │   │   │   ├── wazuh-clusterd.csv
    │   │   ├── *
    │   │   │   ├── agent-info_sync.csv
    │   │   │   ├── integrity_check.csv
    │   │   │   └── integrity_sync.csv
    ├── worker_x
    │   ├── data
    │   │   ├── *
    │   │   │   ├── wazuh-clusterd.csv
    │   │   ├── *
    │   │   │   ├── agent-info_sync.csv
    │   │   │   ├── integrity_check.csv
    │   │   │   └── integrity_sync.csv
    └── ...
    
  • --html=report.html: Create an HTML report with the test results (pytest-html option).
  • --self-contained-html: Embed all the data the report needs inside the HTML file.
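The artifacts structure above can be checked before any statistic is computed. The test delegates this to `ClusterEnvInfo` (visible in the log below), whose internals are not part of this description; the following is a minimal standalone sketch, assuming node folders are named `master` and `worker_<n>` and that every node should provide the four CSVs shown in the tree. `node_dirs`, `count_worker_nodes` and `missing_csvs` are hypothetical helpers:

```python
from pathlib import Path

# CSV names taken from the expected artifacts tree; node folders are "master" and "worker_<n>".
EXPECTED_CSVS = ('wazuh-clusterd.csv', 'agent-info_sync.csv',
                 'integrity_check.csv', 'integrity_sync.csv')


def node_dirs(artifacts_path):
    """Return the master and worker_* folders found under artifacts_path."""
    root = Path(artifacts_path)
    if not (root / 'master' / 'data').is_dir():
        raise FileNotFoundError(f'"{artifacts_path}" does not contain master/data')
    return [d for d in sorted(root.iterdir())
            if d.is_dir() and (d.name == 'master' or d.name.startswith('worker_'))]


def count_worker_nodes(artifacts_path):
    """Count worker_* folders, to be compared with --n_workers."""
    return sum(1 for d in node_dirs(artifacts_path) if d.name.startswith('worker_'))


def missing_csvs(artifacts_path):
    """Map each node folder to the expected CSV names not found under data/*/."""
    missing = {}
    for node in node_dirs(artifacts_path):
        data_dir = node / 'data'
        found = {p.name for p in data_dir.glob('*/*.csv')} if data_dir.is_dir() else set()
        absent = [name for name in EXPECTED_CSVS if name not in found]
        if absent:
            missing[node.name] = absent
    return missing
```

Reporting the missing files per node gives a more specific failure than a generic "path may not follow the proper structure" message.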

Logs example

The test generates a report that lists every exceeded threshold in a table (red arrow) and shows the duration of each phase (green arrow):
[Image: screenshot of the generated HTML report]

python3 -m pytest test_cluster_performance.py --artifacts_path='/home/selu/Descargas/cluster_performance/74' --n_workers=10 --n_agents=50000 --html=report.html --self-contained-html
============================================================================================ test session starts ============================================================================================
platform linux -- Python 3.8.10, pytest-5.0.0, py-1.8.2, pluggy-0.13.1
rootdir: /home/selu/Git/wazuh-qa/tests/performance/test_cluster
plugins: metadata-1.10.0, html-3.1.1, testinfra-5.0.0, tavern-1.2.2, pep8-1.0.6, cov-2.10.0, asyncio-0.14.0
collected 1 item                                                                                                                                                                                            
test_cluster_performance.py F                                                                                                                                                                         [100%]
================================================================================================= FAILURES ==================================================================================================
_________________________________________________________________________________________ test_cluster_performance __________________________________________________________________________________________
artifacts_path = '/home/selu/Descargas/cluster_performance/74', n_workers = '10', n_agents = '50000'
    def test_cluster_performance(artifacts_path, n_workers, n_agents):
        """Check that a cluster environment did not exceed certain thresholds.
    
        This test obtains various statistics (mean, max, regression coefficient) from CSVs with
        information of a cluster environment (resources used and duration of tasks). These
        statistics are compared with thresholds established in the data folder.
    
        Args:
            artifacts_path (str): Path where CSVs with cluster information can be found.
            n_workers (int): Number of workers folders that are expected inside the artifacts path.
            n_agents (int): Number of agents in the cluster environment.
        """
        if None in (artifacts_path, n_workers, n_agents):
            pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")
    
        # Check if there are threshold data for the specified number of workers and agents.
        if (selected_conf := f"{n_workers}w_{n_agents}a") not in configurations:
            pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                        f"Supported configurations are: {', '.join(configurations.keys())}.")
    
        # Check if path exists and if expected number of workers matches what is found inside artifacts.
        try:
            cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
        except FileNotFoundError:
            pytest.fail(f'Path "{artifacts_path}" could not be found or it may not follow the proper structure.')
    
        if cluster_info.get('worker_nodes', 0) != int(n_workers):
            pytest.fail(f'Information of {n_workers} workers was expected, but {cluster_info.get("worker_nodes", 0)} '
                        f'were found.')
    
        # Calculate stats from data inside artifacts path.
        data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
                'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    
        if not data['tasks'] or not data['resources']:
            pytest.fail(f'Stats could not be retrieved, "{artifacts_path}" path may not exist, it is empty or it may not'
                        f' follow the proper structure.')
    
        # Compare each stat with its threshold.
        for data_name, data_stats in data.items():
            for phase, files in data_stats.items():
                for file, columns in files.items():
                    for column, nodes in columns.items():
                        for node_type, stats in nodes.items():
                            for stat, value in stats.items():
                                th_value = configurations[selected_conf][data_name][phase][file][column][node_type][stat]
                                if value[1] >= th_value:
                                    exceeded_thresholds.append({'value': value[1], 'threshold': th_value, 'stat': stat,
                                                                'column': column, 'node': value[0], 'file': file,
                                                                'phase': phase})
    
        try:
            output = '\n - '.join('{stat} {column} {value} >= {threshold} ({node}, {file}, '
                                  '{phase})'.format(**item) for item in exceeded_thresholds)
>           assert not exceeded_thresholds, f"Some thresholds were exceeded:\n - {output}"
E           AssertionError: Some thresholds were exceeded:
E              - max time_spent(s) 9.43 >= 9 (worker_10, integrity_check, stable_phase)
E              - max time_spent(s) 10.233 >= 10 (worker_10, agent-info_sync, stable_phase)
E           assert not [{'column': 'time_spent(s)', 'file': 'integrity_check', 'node': 'worker_10', 'phase': 'stable_phase', ...}, {'column': 'time_spent(s)', 'file': 'agent-info_sync', 'node': 'worker_10', 'phase': 'stable_phase', ...}]
test_cluster_performance.py:101: AssertionError
------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
Setup phase took 0:10:11s (2021/10/15 15:39:41 - 2021/10/15 15:49:52).
Stable phase took 0:13:47s (2021/10/15 15:49:52 - 2021/10/15 16:03:39).
------------------------------------------------------ generated html file: file:///home/selu/Git/wazuh-qa/tests/performance/test_cluster/report.html -------------------------------------------------------
========================================================================================= 1 failed in 0.45 seconds ==========================================================================================
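The six nested loops in the test shown above walk two dictionaries with the same shape (`data` and `configurations[selected_conf]`), down to a `(node, value)` pair on one side and a numeric threshold on the other. The same comparison can be sketched as a recursive walk. This is an illustrative alternative, not the code merged in this PR, and `find_exceeded` and `format_exceeded` are hypothetical names:

```python
def find_exceeded(stats, thresholds, path=()):
    """Yield (path, node, value, threshold) for every stat that reaches its threshold.

    `stats` and `thresholds` are nested dicts with identical keys. Each leaf of
    `stats` is a (node_name, value) pair; each leaf of `thresholds` is a number.
    A missing threshold raises KeyError, as the nested-loop version does.
    """
    for key, sub in stats.items():
        if isinstance(sub, dict):
            yield from find_exceeded(sub, thresholds[key], path + (key,))
        else:
            node, value = sub
            if value >= thresholds[key]:  # same >= comparison as the test
                yield path + (key,), node, value, thresholds[key]


def format_exceeded(items):
    """Render items in the same layout as the test's assertion message."""
    lines = []
    # path = (data_name, phase, file, column, node_type, stat)
    for (_, phase, file, column, _, stat), node, value, threshold in items:
        lines.append(f"{stat} {column} {value} >= {threshold} ({node}, {file}, {phase})")
    return lines
```

Fed with data shaped like the failing run above (for example a `max` of `('worker_10', 9.43)` against a threshold of `9`), `format_exceeded` produces `max time_spent(s) 9.43 >= 9 (worker_10, integrity_check, stable_phase)`, so the assertion message can stay as it is.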

@Selutario self-assigned this on Oct 26, 2021.
@Selutario linked an issue on Oct 26, 2021 that may be closed by this pull request.
@Selutario force-pushed the 1939-cluster-performance-test branch from 54ee28b to 71cc270 on October 27, 2021 08:10.
@AdriiiPRodri previously approved these changes on Nov 2, 2021.
@Selutario changed the base branch from feature/1938-cluster-stas-script to master on November 2, 2021 11:27.
@Selutario dismissed AdriiiPRodri’s stale review on November 2, 2021 11:27 (the base branch was changed).
@Selutario mentioned this pull request on Nov 3, 2021.
@snaow (Contributor) left a review comment:

get_datetime_diff should be part of the common framework functions rather than a local test function.
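The reviewer's point is that computing the time between two timestamps is generic enough to live in the shared framework. As an illustration only, such a helper needs nothing beyond the standard library. The names `datetime_diff` and `describe_phase` are hypothetical (the actual `get_datetime_diff` signature is not shown here), and the timestamp layout is assumed from the captured stdout above:

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps printed in the captured stdout, e.g. "2021/10/15 15:39:41".
TIMESTAMP_FORMAT = '%Y/%m/%d %H:%M:%S'


def datetime_diff(start, end, fmt=TIMESTAMP_FORMAT):
    """Return end - start as a timedelta, parsing both strings with fmt."""
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def describe_phase(name, start, end):
    """Build a line such as 'Setup phase took 0:10:11s (start - end).'"""
    elapsed: timedelta = datetime_diff(start, end)
    return f"{name} phase took {elapsed}s ({start} - {end})."
```

With the setup-phase timestamps from the run above, `describe_phase('Setup', '2021/10/15 15:39:41', '2021/10/15 15:49:52')` returns `Setup phase took 0:10:11s (2021/10/15 15:39:41 - 2021/10/15 15:49:52).`, matching the printed line.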

@snaow merged commit 480efbe into master on Nov 30, 2021.
@snaow deleted the 1939-cluster-performance-test branch on November 30, 2021 22:50.
@snaow mentioned this pull request on Jan 27, 2022.
Development

Merging this pull request closes the issue: Develop a test to check that the cluster performance is correct

4 participants