Add cluster performance test #2130

Selutario · 2021-10-26T11:19:31Z

Related issue
Closes #1939

THIS PR SHOULD BE MERGED TO MASTER AFTER #2032 IS MERGED

Description

This PR adds a new test that checks whether cluster behavior is correct. It generates statistics from the CSVs that describe a cluster environment, then compares each statistic against a threshold defined in advance (inside the data folder).

The thresholds are provisional and small changes are likely to be made in the future.
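As a rough illustration of the comparison step, the sketch below walks a nested dict of stats and collects every value that reaches or exceeds its threshold. It is a simplified assumption, not the test's code: the real structure has more levels (data type, phase, file), and the function name `find_exceeded`, the `workers` key, and the sample numbers are hypothetical.

```python
def find_exceeded(stats, thresholds):
    """Return one entry per stat whose value reaches or exceeds its threshold.

    Both arguments are nested dicts shaped as column -> node_type -> stat.
    In `stats`, each leaf is a (node_name, value) tuple; in `thresholds`,
    each leaf is the limit itself.
    """
    exceeded = []
    for column, nodes in stats.items():
        for node_type, node_stats in nodes.items():
            for stat, (node, value) in node_stats.items():
                limit = thresholds[column][node_type][stat]
                if value >= limit:
                    exceeded.append({"stat": stat, "column": column, "node": node,
                                     "value": value, "threshold": limit})
    return exceeded


# Illustrative values only; they are not the thresholds shipped with the test.
stats = {"time_spent(s)": {"workers": {"max": ("worker_10", 9.43),
                                       "mean": ("worker_3", 4.2)}}}
thresholds = {"time_spent(s)": {"workers": {"max": 9, "mean": 6}}}

for item in find_exceeded(stats, thresholds):
    print("{stat} {column} {value} >= {threshold} ({node})".format(**item))
```

Keeping the node name next to each value makes it possible to report which node produced the offending stat.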

Configuration options

The test currently contains two configurations:

10 workers and 50000 agents.
25 workers and 50000 agents.
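A configuration is selected by a key built from the two numbers, in the form `<n_workers>w_<n_agents>a`. The sketch below shows that lookup in isolation; the empty `configurations` table and the helper name `select_configuration` are assumptions for illustration, since the real thresholds are loaded from the data folder.

```python
# Hypothetical threshold table; the real one is loaded from the data folder.
configurations = {"10w_50000a": {}, "25w_50000a": {}}


def select_configuration(n_workers, n_agents):
    """Build the configuration key and fail early if it is unsupported."""
    key = f"{n_workers}w_{n_agents}a"
    if key not in configurations:
        supported = ", ".join(configurations)
        raise ValueError(f"This is not a supported configuration: {key}. "
                         f"Supported configurations are: {supported}.")
    return key


print(select_configuration(10, 50000))  # prints 10w_50000a
```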

New configurations can be added in the future. The test requires three parameters to work correctly, plus optional ones to generate HTML reports:

--n_workers: Number of worker nodes in the cluster environment.
--n_agents: Number of agents in the cluster environment.

--artifacts_path: Path where CSVs with cluster information can be found. It should follow the structure below:

├── master
│   ├── data
│   │   ├── *
│   │   │   ├── wazuh-clusterd.csv
│   │   ├── *
│   │   │   ├── agent-info_sync.csv
│   │   │   ├── integrity_check.csv
│   │   │   └── integrity_sync.csv
├── worker_x
│   ├── data
│   │   ├── *
│   │   │   ├── wazuh-clusterd.csv
│   │   ├── *
│   │   │   ├── agent-info_sync.csv
│   │   │   ├── integrity_check.csv
│   │   │   └── integrity_sync.csv
└── ...
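Since the test compares `--n_workers` with the number of workers found in the artifacts, a layout check along these lines may help before running it. This is only a sketch: `count_worker_folders` is a hypothetical helper, and it assumes worker folders are named `worker_*` and sit next to `master`, as in the tree above.

```python
from pathlib import Path


def count_worker_folders(artifacts_path):
    """Count the worker_* directories directly under the artifacts path."""
    root = Path(artifacts_path)
    if not (root / "master").is_dir():
        raise FileNotFoundError(f'"{artifacts_path}" has no master folder.')
    return sum(1 for entry in root.glob("worker_*") if entry.is_dir())
```

For a valid 10-worker artifacts folder, `count_worker_folders(path)` would be expected to return 10.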

--html=report.html: Create an HTML report with the test results.
--self-contained-html: Store all the data needed for the report inside the HTML file.
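The three required options are custom, so they have to be registered with pytest, typically in a `conftest.py`. The sketch below shows one way this could look; it is an assumption about the wiring, not the PR's actual conftest, and `read_cluster_options` is a hypothetical helper. The `--html` and `--self-contained-html` options come from the pytest-html plugin and need no registration.

```python
def pytest_addoption(parser):
    """Register the command-line options the test reads (conftest.py hook)."""
    parser.addoption("--artifacts_path", action="store", default=None,
                     help="Path where CSVs with cluster information can be found.")
    parser.addoption("--n_workers", action="store", default=None,
                     help="Number of worker nodes in the cluster environment.")
    parser.addoption("--n_agents", action="store", default=None,
                     help="Number of agents in the cluster environment.")


def read_cluster_options(config):
    """Return the three required values from a pytest config object (hypothetical helper)."""
    return {name: config.getoption(name)
            for name in ("artifacts_path", "n_workers", "n_agents")}
```

With a default of `None`, a missing option can be detected inside the test and reported with a clear failure message. Values arrive as strings, so numeric comparisons need an explicit `int(...)`.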

Logs example

The test creates a report that lists each exceeded threshold in a table and shows the duration of each phase:

python3 -m pytest test_cluster_performance.py --artifacts_path='/home/selu/Descargas/cluster_performance/74' --n_workers=10 --n_agents=50000 --html=report.html --self-contained-html
============================================================================================ test session starts ============================================================================================
platform linux -- Python 3.8.10, pytest-5.0.0, py-1.8.2, pluggy-0.13.1
rootdir: /home/selu/Git/wazuh-qa/tests/performance/test_cluster
plugins: metadata-1.10.0, html-3.1.1, testinfra-5.0.0, tavern-1.2.2, pep8-1.0.6, cov-2.10.0, asyncio-0.14.0
collected 1 item                                                                                                                                                                                            
test_cluster_performance.py F                                                                                                                                                                         [100%]
================================================================================================= FAILURES ==================================================================================================
_________________________________________________________________________________________ test_cluster_performance __________________________________________________________________________________________
artifacts_path = '/home/selu/Descargas/cluster_performance/74', n_workers = '10', n_agents = '50000'
    def test_cluster_performance(artifacts_path, n_workers, n_agents):
        """Check that a cluster environment did not exceed certain thresholds.
    
        This test obtains various statistics (mean, max, regression coefficient) from CSVs with
        information of a cluster environment (resources used and duration of tasks). These
        statistics are compared with thresholds established in the data folder.
    
        Args:
            artifacts_path (str): Path where CSVs with cluster information can be found.
            n_workers (int): Number of workers folders that are expected inside the artifacts path.
            n_agents (int): Number of agents in the cluster environment.
        """
        if None in (artifacts_path, n_workers, n_agents):
            pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")
    
        # Check if there are threshold data for the specified number of workers and agents.
        if (selected_conf := f"{n_workers}w_{n_agents}a") not in configurations:
            pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                        f"Supported configurations are: {', '.join(configurations.keys())}.")
    
        # Check if path exists and if expected number of workers matches what is found inside artifacts.
        try:
            cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
        except FileNotFoundError:
            pytest.fail(f'Path "{artifacts_path}" could not be found or it may not follow the proper structure.')
    
        if cluster_info.get('worker_nodes', 0) != int(n_workers):
            pytest.fail(f'Information of {n_workers} workers was expected, but {cluster_info.get("worker_nodes", 0)} '
                        f'were found.')
    
        # Calculate stats from data inside artifacts path.
        data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
                'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    
        if not data['tasks'] or not data['resources']:
            pytest.fail(f'Stats could not be retrieved, "{artifacts_path}" path may not exist, it is empty or it may not'
                        f' follow the proper structure.')
    
        # Compare each stat with its threshold.
        for data_name, data_stats in data.items():
            for phase, files in data_stats.items():
                for file, columns in files.items():
                    for column, nodes in columns.items():
                        for node_type, stats in nodes.items():
                            for stat, value in stats.items():
                                th_value = configurations[selected_conf][data_name][phase][file][column][node_type][stat]
                                if value[1] >= th_value:
                                    exceeded_thresholds.append({'value': value[1], 'threshold': th_value, 'stat': stat,
                                                                'column': column, 'node': value[0], 'file': file,
                                                                'phase': phase})
    
        try:
            output = '\n - '.join('{stat} {column} {value} >= {threshold} ({node}, {file}, '
                                  '{phase})'.format(**item) for item in exceeded_thresholds)
>           assert not exceeded_thresholds, f"Some thresholds were exceeded:\n - {output}"
E           AssertionError: Some thresholds were exceeded:
E              - max time_spent(s) 9.43 >= 9 (worker_10, integrity_check, stable_phase)
E              - max time_spent(s) 10.233 >= 10 (worker_10, agent-info_sync, stable_phase)
E           assert not [{'column': 'time_spent(s)', 'file': 'integrity_check', 'node': 'worker_10', 'phase': 'stable_phase', ...}, {'column': 'time_spent(s)', 'file': 'agent-info_sync', 'node': 'worker_10', 'phase': 'stable_phase', ...}]
test_cluster_performance.py:101: AssertionError
------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
Setup phase took 0:10:11s (2021/10/15 15:39:41 - 2021/10/15 15:49:52).
Stable phase took 0:13:47s (2021/10/15 15:49:52 - 2021/10/15 16:03:39).
------------------------------------------------------ generated html file: file:///home/selu/Git/wazuh-qa/tests/performance/test_cluster/report.html -------------------------------------------------------
========================================================================================= 1 failed in 0.45 seconds ==========================================================================================
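The phase durations printed in the captured stdout can be reproduced from the two timestamps on each line. The sketch below assumes the `YYYY/MM/DD HH:MM:SS` format seen in that output; `phase_duration` is a hypothetical name and not the `get_datetime_diff` function discussed in the review.

```python
from datetime import datetime

LOG_FORMAT = "%Y/%m/%d %H:%M:%S"  # Format seen in the captured stdout above.


def phase_duration(start, end):
    """Return the elapsed time between two timestamps as a timedelta."""
    return datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)


setup = phase_duration("2021/10/15 15:39:41", "2021/10/15 15:49:52")
print(f"Setup phase took {setup}s")  # prints: Setup phase took 0:10:11s
```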


tests/performance/test_cluster/test_cluster_performance/test_cluster_performance.py

snaow

get_datetime_diff should be part of the common framework functions, not a local test function.

Selutario added 6 commits October 26, 2021 12:57

Add script to parse and obtain stats from cluster CSVs

bfa4462

Convert defaultdicts to dicts. Minor changes.

4e97633

Update docstring

3623008

Add max field to stats calculation

06b65e9

Add new ClusterEnvInfo class

2de0be6

Use master logs to define when setup phase starts for every node

7275730

Selutario self-assigned this Oct 26, 2021

Selutario linked an issue Oct 26, 2021 that may be closed by this pull request

Develop a test to check that the cluster performance is correct #1939

Closed

Selutario added 4 commits October 27, 2021 10:09

Use ternary operators

c7ba26a

Add cluster performance tests

4942868

Add README for the test

379339f

Move test inside test_cluster_performance folder

71cc270

Selutario force-pushed the 1939-cluster-performance-test branch from 54ee28b to 71cc270 October 27, 2021 08:10

Improve assert's messages

29857c5

AdriiiPRodri requested review from AdriiiPRodri and davidjiglesias November 2, 2021 09:28

AdriiiPRodri previously approved these changes Nov 2, 2021


Selutario changed the base branch from feature/1938-cluster-stas-script to master November 2, 2021 11:27

Selutario mentioned this pull request Nov 3, 2021

Add cluster logs tests #2166

Merged

davidjiglesias approved these changes Nov 4, 2021


snaow reviewed Nov 30, 2021


tests/performance/test_cluster/test_cluster_performance/test_cluster_performance.py

snaow approved these changes Nov 30, 2021


snaow merged commit 480efbe into master Nov 30, 2021

snaow deleted the 1939-cluster-performance-test branch November 30, 2021 22:50

Selutario mentioned this pull request Dec 1, 2021

Move get_datetime_diff to wazuh_testing #2287

Closed

snaow mentioned this pull request Jan 27, 2022

QA Release - Rev 430031 #2500

Closed
