Migrate agent-group files to wazuh-db - QA plan and execution #11240

Closed
pereyra-m opened this issue Dec 7, 2021 · 12 comments · Fixed by #11753 or #12095

@pereyra-m
Member

pereyra-m commented Dec 7, 2021

Description

This is a tracking issue that gathers all the QA work done for the epic #10771. It presents the overall QA plan as well as the results of all the testing performed.

The next sections detail and explain the criteria of each QA instance.

Code reviews

For each PR created as part of this epic, the reviewer should take into account the following items.

  • Functional revision: Make sure that the changes satisfy all the requirements of the issue.
  • Performance: If the implementation performs a task with non-trivial complexity, evaluate the performance of the implemented algorithm.
  • Memory: Evaluate the correct usage of memory, on both the heap and the stack.
  • Unit testability: Evaluate if the implementation will allow the development of unit test cases in an easy and effective way.
  • Maintainability: Evaluate whether the code will be easy to modify or extend in the future.
  • Spelling and language: Evaluate the spelling and grammar in the comments.
  • Security: Evaluate whether the code is secure.
  • Interoperability: Evaluate whether the change should be coordinated and communicated to other teams.

Each reviewer will leave a comment in the review enumerating the criteria taken into account.

Static analysis of code

  • Each pull request should include the results of the scan-build execution.
  • Once the epic is implemented, a Coverity analysis will be executed.

For details, check the issue #11755.

Unit testing

As part of this epic, the implementation should have 100% coverage for all the new functions included in the code, as well as for the modified functions that already had 100% coverage.

For details, check the issue #11754.

Exploratory testing

Once the epic is implemented, the whole team will perform exploratory testing on it.
Here is a list of relevant tests collected during development:

global.db upgrade

| Test ID | Title | Validations | Reference link |
|---------|-------|-------------|----------------|
| gdbup1 | Upgrade from a legacy global.db (lower than 3.10) | A pre_upgrade backup should be generated. If the backup is created successfully, a new global.db file should be generated. If the backup creation fails, an error should be logged and global.db should be disabled. | #11240 (comment) |
| gdbup2 | Upgrade from a global.db of version 3.10 or greater | A pre_upgrade backup should be generated. If the backup is created successfully, all the schema upgrades should be applied up to version 4. If the backup creation fails, an error should be logged and global.db should be disabled. On the other hand, if any of the upgrades up to version 4 fails, an error should be logged and the database should be restored to its original state. If the restoration doesn't work, the database should be disabled. | #11240 (comment) |
| gdbup3 | Error during the metadata table verification | During the upgrade, we handle the case in which SQLite fails while checking whether the metadata table exists in the database. This is essentially a SQLite error, so we should evaluate whether it is possible to force this scenario. If it is, an error should be logged and global.db should be disabled. | #11240 (comment) |
| gdbup4 | Error reading the database version | We handle the case in which reading the database version fails on a non-legacy database, i.e. one with a metadata table and the version written in that table. We should evaluate whether it is possible to force this scenario. If it is, a warning should be logged and global.db should be disabled. | #11240 (comment) |
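
The backup-then-upgrade flow that these four cases exercise can be summarized in a short sketch. This is illustrative only: the function and helper names are hypothetical and the real wazuh-db logic is implemented in C, so treat it as a reading aid rather than the implementation.

```python
import shutil
import sqlite3

def upgrade_global_db(db_path: str, backup_path: str, target_version: int = 4) -> bool:
    """Sketch of the flow validated by gdbup1..gdbup4 (hypothetical names)."""
    try:
        shutil.copy2(db_path, backup_path)            # pre_upgrade backup
    except OSError:
        print("ERROR: backup creation failed, disabling global.db")
        return False                                  # DB must be disabled

    con = sqlite3.connect(db_path)
    try:
        for version in range(1, target_version + 1):
            # Stand-in for the per-version schema upgrade script.
            con.executescript(f"PRAGMA user_version = {version};")
        return True
    except sqlite3.Error:
        print("ERROR: upgrade failed, restoring the original database")
        try:
            shutil.copy2(backup_path, db_path)        # restore original state
        except OSError:
            print("ERROR: restoration failed, disabling global.db")
        return False
    finally:
        con.close()
```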

global.db backups

| Test ID | Title | Validations | Reference link |
|---------|-------|-------------|----------------|
| gdbbak1 | Database backup settings | We should verify that the database backup settings take the correct values, whether they are written in the configuration file or not. The default values are specified in the implementation issue. This should be validated using the interface implemented as part of issue #11732, in order to also validate that this interface works as expected. | #11240 (comment) |
| gdbbak2 | Periodic database backup creation | We should validate that the global.db backups are created with the period specified in the settings. In addition, we should verify that the backup creation is reported in the log files. | #11240 (comment) |
| gdbbak3 | Database backups max files | We should verify that when the global.db backups are enabled, we only keep the number of backups specified in the `<max_files>` setting. In addition, we should verify that the oldest backup is removed when backup max_files+1 is created. | #11240 (comment) |
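
The `<max_files>` rotation that gdbbak3 validates amounts to keeping the N newest snapshots. A minimal sketch, assuming the backup/db layout and the file naming seen later in this issue (this is not the wazuh-db code):

```python
import glob
import os

def rotate_backups(backup_dir: str, max_files: int) -> None:
    """Delete the oldest global.db backups, keeping only `max_files` of them."""
    backups = sorted(
        glob.glob(os.path.join(backup_dir, "global.db-backup-*.gz")),
        key=os.path.getmtime,                  # oldest first
    )
    excess = backups[:-max_files] if max_files > 0 else backups
    for old in excess:
        os.remove(old)
        print(f'Deleted Global database backup: "{old}"')

rotate_backups("/var/ossec/backup/db", max_files=3)
```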

Single manager agent groups management

| Test ID | Title | Validations | Reference link |
|---------|-------|-------------|----------------|
| smagm1 | Agent group assignment and unassignment | We should verify that, having a group created in the manager, we can assign and unassign an agent to/from this group during the agent registration, using the agent_groups tool, and using the Wazuh API. We should also verify that we can't assign an agent to a group that doesn't exist, whether we specify a single group or a list of groups where one of them doesn't exist. Finally, when we unassign agents from groups, we should verify that the agents always belong to a group. For all the cases we should verify the groups_hash column. | wazuh/wazuh-qa#2504 and #11781 |
| smagm2 | Agent group assignment limit and name limit | We should verify that during the agent registration, using the agent_groups tool, and using the Wazuh API, we can't assign an agent to more than 128 groups. In addition, we should verify that we can't create groups whose names are invalid because of their size. | wazuh/wazuh-qa#2504 and #11781 |
| smagm3 | Per-agent groups query | We should verify that, using the agent_groups tool as well as the Wazuh API, we can query the groups to which an agent is assigned. | wazuh/wazuh-qa#2504 and #11781 |
| smagm4 | Groups priorities | We should verify that when an agent is added to more than one group, the last assignment is the one with the highest priority. In addition, we should verify that when an agent is assigned to multiple groups, if we unassign the agent from a group with middle-level priority, the remaining groups keep the same priority order. | wazuh/wazuh-qa#2504 and #11781 |
| smagm5 | Groups guessing | We should verify that when an agent registered in the manager and assigned to a group is removed and then negotiates a new ID and key, the manager performs a guessing operation on the first communication and determines the groups to which the agent was assigned. With guess_agent_group disabled (default), the agent is set to default when it connects; with guess_agent_group enabled, the last assigned group must be restored. | wazuh/wazuh-qa#2504 and #11781 |
| smagm6 | Repeated group names | We should verify that we can't create groups with the same name. The group table should not allow repeated rows. | wazuh/wazuh-qa#2504 and #11781 |
| smagm7 | Multiple agent group creation and assignment with a script calling agent_groups | We should verify that performing this task won't corrupt the belongs table. | wazuh/wazuh-qa#2504 and #11781 |
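
Many of these checks boil down to inspecting global.db directly. A hedged sketch of such a query, assuming the table and column names used in the descriptions above (belongs, group, groups_hash); verify them against the actual schema before relying on this:

```python
import sqlite3

con = sqlite3.connect('/var/ossec/queue/db/global.db')

# Groups assigned to agent 001 in priority order (smagm4); the "priority"
# column in belongs is assumed from the test descriptions above.
for name, priority in con.execute(
        'SELECT g.name, b.priority FROM belongs b '
        'JOIN "group" g ON g.id = b.id_group '
        'WHERE b.id_agent = ? ORDER BY b.priority', (1,)):
    print(name, priority)

# The groups_hash column mentioned in smagm1 (name taken from the text above).
print(con.execute('SELECT groups_hash FROM agent WHERE id = 1').fetchone())
con.close()
```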

Cluster agent groups management

| Test ID | Title | Validations | Reference link |
|---------|-------|-------------|----------------|
| cagm1 | Agent group assignment and unassignment | We should verify that, having a group created in the master, we can assign and unassign an agent to/from this group during the agent registration, using the agent_groups tool, and using the Wazuh API. This should work both for agents in communication with the master and for agents in communication with the workers. | wazuh/wazuh-qa#2504 and #11781 |
| cagm2 | Groups priorities for an agent communicating with a worker | We should verify that when an agent in communication with a worker node is added to more than one group, the last assignment is the one with the highest priority. In addition, we should verify that when this agent is assigned to multiple groups, if we unassign it from a group with middle-level priority, the remaining groups keep the same priority order. | wazuh/wazuh-qa#2504 and #11781 |
| cagm3 | Groups guessing | We should verify that when an agent registered in the master and assigned to a group is removed and then negotiates a new ID and key, the worker performs a guessing operation on the first communication and determines the groups to which the agent was assigned. With guess_agent_group disabled (default), the agent is set to default when it connects; with guess_agent_group enabled, the last assigned group must be restored. This information should be propagated to the master node. | wazuh/wazuh-qa#2504 and #11781 |
| cagm4 | Groups integrity | We should verify that, when groups are created in a cluster and agents assigned to different groups are in communication with the master and with the workers, manually modifying the agent groups information in the workers causes an integrity check failure and triggers an agent groups data synchronization. | wazuh/wazuh-qa#2504 and #11781 |
| cagm5 | Agents recently registered with groups | We should verify that when registering a new agent with groups, the agent information is propagated to the workers before the groups information, allowing the worker to insert the group information successfully so that it is available when the agent connects to the worker manager. | wazuh/wazuh-qa#2504 and #11781 |

Installation and upgrade

| Test ID | Title | Validations | Reference link |
|---------|-------|-------------|----------------|
| iu1 | Creation of backup/db folder and removal of backup/groups folder | During the installation, a new backup/db folder should be created, to be used by the periodic global.db backup creation. We should verify that during clean installations the backup/groups folder is no longer created. In addition, during an upgrade, this folder should be removed. | #11240 (comment) |
| iu2 | Removal of files in the agent-groups folder | We should verify that during an upgrade, when the manager starts, it properly adds the information from the agent-group files to global.db. After this, the agent-groups folder and its content shouldn't exist. | #11240 (comment) |
@MiguelazoDS
Member

MiguelazoDS commented Dec 13, 2021

Issues found

2021/12/13 14:58:07 wazuh-remoted[84758] url.c:225 at wurl_request(): WARNING: Couldn't connect to download module socket 'queue/sockets/download'

UPDATED 02/03/22: shared download deprecated for group assignment #11737

  • agent_groups as well as agent-auth allow adding one more group than the specified maximum.
    UPDATED 02/03/22: shared download deprecated for group assignment (Deprecate the groups assignment from the shared_download module in the Wazuh managers #11737)
  • Editing the file with nonexistent groups keeps only the last one in the file and deletes the rest. Undefined behavior.
    Adding ZZZ,ZZ,Z as groups for agent 019 ended with the first two deleted from the file, but the groups were successfully created. This behavior seems to occur when there's no default group in the file.
    UPDATED 02/03/22: The use of a groups file per agent was deprecated as part of this development.
  • It is not possible to delete groups created after editing the agent file.
    This occurred because there weren't any group folders in the /var/ossec/etc/shared path.
    UPDATED 02/03/22: The use of a groups file per agent was deprecated as part of this development.
  • Running a script that inserts groups with 255-character-long names breaks the belongs table.
2021/12/15 13:14:22 wazuh-modulesd:database: ERROR: Couldn't sync agent '002' group.
  • An agent registered using agent-auth won't connect. If enrollment is enabled, it will request a new key, losing the group information. Agent: Wazuh 4.1.5.
2022/01/03 16:53:59 agent-auth: INFO: Started (pid: 2460).
2022/01/03 16:53:59 agent-auth: INFO: Requesting a key from server: 192.168.33.20
2022/01/03 16:53:59 agent-auth: INFO: No authentication password provided
2022/01/03 16:53:59 agent-auth: INFO: Using agent name as: archlinux
2022/01/03 16:53:59 agent-auth: INFO: Waiting for server reply
2022/01/03 16:53:59 agent-auth: INFO: Valid key received
2022/01/03 16:54:06 ossec-agentd: ERROR: Connection socket: Connection reset by peer (104)
2022/01/03 16:54:06 ossec-agentd: ERROR: (1137): Lost connection with manager. Setting lock.
2022/01/03 16:54:06 ossec-agentd: INFO: Closing connection to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:06 ossec-agentd: INFO: Trying to connect to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:06 ossec-agentd: WARNING: Process locked due to agent is offline. Waiting for connection...
2022/01/03 16:54:07 ossec-syscheckd: WARNING: Process locked due to agent is offline. Waiting for connection...
2022/01/03 16:54:16 ossec-agentd: INFO: Closing connection to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:16 ossec-agentd: INFO: Trying to connect to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:26 ossec-agentd: INFO: Closing connection to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:26 ossec-agentd: INFO: Trying to connect to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:36 ossec-agentd: INFO: Closing connection to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:36 ossec-agentd: INFO: Trying to connect to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:46 ossec-agentd: INFO: Closing connection to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:46 ossec-agentd: INFO: Trying to connect to server (192.168.33.20:1514/tcp).
2022/01/03 16:54:46 ossec-agentd: INFO: Requesting a key from server: 192.168.33.20
2022/01/03 16:54:46 ossec-agentd: INFO: No authentication password provided
2022/01/03 16:54:46 ossec-agentd: INFO: Using agent name as: archlinux
2022/01/03 16:54:46 ossec-agentd: INFO: Waiting for server reply
2022/01/03 16:54:46 ossec-agentd: INFO: Valid key received
2022/01/03 16:54:46 ossec-agentd: INFO: Waiting 20 seconds before server connection
2022/01/03 16:55:06 ossec-agentd: INFO: (1410): Reading authentication keys file.
2022/01/03 16:55:06 ossec-agentd: INFO: Closing connection to server (192.168.33.20:1514/tcp).
2022/01/03 16:55:06 ossec-agentd: INFO: Trying to connect to server (192.168.33.20:1514/tcp).
2022/01/03 16:55:06 ossec-agentd: INFO: (4102): Connected to the server (192.168.33.20:1514/tcp).
2022/01/03 16:55:06 ossec-agentd: INFO: Server responded. Releasing lock.
2022/01/03 16:55:06 ossec-agentd: INFO: Agent is now online. Process unlocked, continuing...

UPDATED 02/03/22: Expected behavior; the agent needs a restart to dump the information into client.keys.

@palaciosjeremias
Contributor

palaciosjeremias commented Dec 13, 2021

WDB testing script.
This is a first iteration of the script that will be used for general command validation, with an example testing script for the sync_agent_groups_get command.
WDB_tester.zip
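
For reference, the core of such a tester can be quite small. The sketch below assumes the standard Wazuh socket framing (a 4-byte little-endian length header followed by the payload) and the queue/db/wdb socket path; the query string is only an example and may need adjusting to the final command syntax:

```python
import socket
import struct

WDB_SOCKET = '/var/ossec/queue/db/wdb'

def wdb_query(query: str) -> str:
    """Send one command to wazuh-db and return its response (sketch)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(WDB_SOCKET)
    payload = query.encode()
    sock.sendall(struct.pack('<I', len(payload)) + payload)  # length-prefixed
    size = struct.unpack('<I', sock.recv(4))[0]              # response header
    response = b''
    while len(response) < size:
        response += sock.recv(size - len(response))
    sock.close()
    return response.decode()

# Example command under test; the exact arguments may differ.
print(wdb_query('global sync-agent-groups-get {"condition":"sync_status"}'))
```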

@MiguelazoDS
Member

MiguelazoDS commented Feb 3, 2022

Test ID: gdbup2

Environment

Wazuh Manager:

  • Wazuh 3.13.3 to 4.4.0 (dev-10771)
  • Centos 8
  • Wazuh_DB debug level: 2

Test cases:

pre_upgrade backup generated. Upgrade schemas applied. 🟢

  • Logs.
  • Pre-upgrade backup.

pre_upgrade backup failed. 🟢

  • Logs.
  • Global database disabled.

Upgrade schema failed. Restoration success. 🟢

Upgrade schema failed. Restoration failed. 🟢

  • Logs.
  • Restoration failed. Global database is disabled.

@DProvinciani
Contributor

DProvinciani commented Feb 3, 2022

Test gdbbak1

First I did a clean installation of the Wazuh manager using the branch dev-10771-agent-groups-files-to-wazuh-db.

Then I added the following configuration block to the ossec.conf file.

Finally, I restarted the Wazuh manager and verified the settings by using the new interface implemented as part of #11732.

Then I removed the configuration block and restarted the Wazuh manager in order to verify that the settings took a default value.

Update

The strings used in the JSON output were modified and validated as part of PR #12095.

@palaciosjeremias
Contributor

palaciosjeremias commented Feb 4, 2022

Test gdbup1

SUCCESS

  • First Wazuh v3.9 was installed.

  • After that, an upgrade from packages to Wazuh v4.4 (branch dev-10771-agent-groups-files-to-wazuh-db) was executed.

  • The expected global.db-backup-timestamp-pre_upgrade.gz file was created.

And a new and functional global.db was created.

ERROR

  • To test a backup error, the vacuum query was forced to fail with a debugger.

  • After the failure, an ERROR log is registered.

  • And any further command to the global DB is rejected.

@LucioDonda
Member

LucioDonda commented Feb 7, 2022

Test gdbup3

  • Wazuh 4.4.0 installed from .deb and started.

  • Checked global.db presence and metadata table correctness.

  • Stopped the Wazuh service and modified the metadata table in order to force an error (see the sketch at the end of this comment).

  • Started the manager and observed the error.

  • wazuh-db is not running; the last errors are:

2022/02/07 14:40:45 wazuh-db[7088] wdb_global.c:1746 at wdb_global_create_backup(): INFO: Created Global database backup "backup/db/global.db-backup-2022-02-07-14:40:45-pre_upgrade.gz"
2022/02/07 14:40:45 wazuh-db[7088] wdb_global.c:1791 at wdb_global_remove_old_backups(): INFO: Deleted Global database backup: "backup/db/global.db-backup-2022-02-07-11:33:40-pre_upgrade.gz"
2022/02/07 14:40:45 wazuh-db[7088] wdb_upgrade.c:141 at wdb_upgrade_global(): DEBUG: Updating database 'global' to version 1
2022/02/07 14:40:45 wazuh-db[7088] wdb.c:1331 at wdb_sql_exec(): WARNING: DB(global) wdb_sql_exec returned error: 'duplicate column name: sync_status'
2022/02/07 14:40:45 wazuh-db[7088] wdb.c:1254 at wdb_close(): DEBUG: Couldn't close database for agent global: refcount = 4294967295
2022/02/07 14:40:45 wazuh-db[7088] wdb_upgrade.c:146 at wdb_upgrade_global(): ERROR: Failed to update global.db to version 1. The global.db was restored to the original state.
2022/02/07 14:40:45 wazuh-remoted: INFO: Cannot connect to 'queue/db/wdb': Connection refused (111). Waiting 1 seconds to reconnect.

❌ This issue is being fixed in this ticket and should be retested.
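
For reproducibility, the tampering step could look like the sketch below (run with the manager stopped; the exact modification used in the screenshots may differ, and the metadata column names are assumptions):

```python
import sqlite3

# Corrupt the stored schema version so that wazuh-db retries the upgrade on
# the next start and hits the "duplicate column" error shown above.
con = sqlite3.connect('/var/ossec/queue/db/global.db')
con.execute("UPDATE metadata SET value = '0' WHERE key = 'db_version'")
con.commit()
con.close()
```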

@MiguelazoDS
Member

MiguelazoDS commented Feb 7, 2022

Test ID: gdbbak2

Environment

Wazuh Manager:

  • Ubuntu Focal
  • Wazuh_DB debug level: 2

Test:

  • Change the date to the next day.
  • Global DB backup created.
  • Repeating the process twice.
  • Log.

@pereyra-m
Member Author

Test ID: gdbbak3

Environment

Wazuh Manager:

  • Ubuntu 20.04. Branch dev-10771

Issue found

The global.db backup is not created at startup; this was fixed in #12128.

Test: manual backup creation 🟢

The following settings were used:

    <wdb>
        <backup database='global'>
            <enabled>yes</enabled>
            <interval>1d</interval>
            <max_files>3</max_files>
        </backup>
    </wdb>

Two manual backups were added.

Now that the maximum is reached, a new backup is requested and the oldest one is removed.

The same happens when the default <max_files> is increased and some backups have a tag:

    <wdb>
        <backup database='global'>
            <enabled>yes</enabled>
            <interval>1d</interval>
            <max_files>5</max_files>
        </backup>
    </wdb>

We have the maximum allowed backups.

And the oldest one is removed when a new one is requested.

Setting a low limit again removes the excess snapshots when a new one is requested:

    <wdb>
        <backup database='global'>
            <enabled>yes</enabled>
            <interval>1d</interval>
            <max_files>1</max_files>
        </backup>
    </wdb>

Test: automatic backup creation 🟢

When the automatic backup mechanism reaches the <max_files> limit, the oldest backups are also deleted:

    <wdb>
        <backup database='global'>
            <enabled>yes</enabled>
            <interval>2s</interval>
            <max_files>10</max_files>
        </backup>
    </wdb>

@MiguelazoDS
Member

MiguelazoDS commented Feb 8, 2022

Test ID: iu1

Environment

Wazuh Manager:

  • Wazuh 3.13.3 to 4.4.0 (dev-10771)
  • Centos 8

Test case:

  • /var/ossec/backup directory in Wazuh version 3.13.3.
  • groups folder deleted and new db folder created on upgrade.
  • Pre-upgrade and first global.db backup created and stored inside the db folder.
  • groups folder no longer created and db folder created on fresh install.

@MiguelazoDS
Member

MiguelazoDS commented Feb 8, 2022

Test ID: iu2

Environment

Wazuh Manager:

  • Wazuh 4.0.4 to 4.4.0 (dev-10771)
  • Centos 8

Wazuh Agents:

  • Wazuh 4.0.4
  • ArchLinux
  • Ubuntu Bionic

Test cases:

Prior to upgrade:

  • Manager Wazuh version.
  • Registered agents.
  • Agent group information.
  • Agent group files content.

Post upgrade:

  • Manager Wazuh version and registered agents.
  • agent-groups directory no longer exists.
  • Query of the group and belongs tables for agent 1.
  • Query of the group and belongs tables for agent 2.
  • Query of the group column from the agent table (a consolidated query sketch follows this list).

@MiguelazoDS
Member

Test ID: gdbup4

Environment

Wazuh Manager:

  • Centos 8
  • Wazuh_DB debug level: 2

Test case:

  • Upgrading from a version that has a global DB with a metadata table (count = 1).
  • Force a read error by deleting the "metadata" table (see the sketch after this list).
  • Error log.
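
The fault injection from the second bullet can be reproduced with a short sketch (run with the manager stopped; the database path is the one used elsewhere in this issue):

```python
import sqlite3

# Drop the metadata table so that reading the database version fails on the
# next start, forcing the gdbup4 scenario.
con = sqlite3.connect('/var/ossec/queue/db/global.db')
con.execute('DROP TABLE metadata')
con.commit()
con.close()
```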

@DProvinciani
Contributor

Having completed all the QA instances, we close this issue.
