UnitTestBot overall architecture

Get a bird's-eye view of the overall UnitTestBot architecture in the chart below, then check the purpose of each component in the descriptions that follow.

flowchart TB
    subgraph Clients
        direction LR
        IntellijPlugin        
        MavenPlugin["Maven/Gradle plugins"]
        GithubAction-->MavenPlugin
        ExternalJavaClient[\External Java Client\]
        CLI
        ContestEstimator    
                     
    end    

    subgraph Facades
        direction LR
        EngineMain[[EngineMain]]
        UtBotJavaApi[[UtBotJavaApi]]
        GenerateTestsAndSarifReport[[GenerateTestsAndSarifReport]]       
    end
    IntellijPlugin--Rd-->EngineMain
    MavenPlugin-->GenerateTestsAndSarifReport
    ExternalJavaClient-->UtBotJavaApi

    subgraph API["Generation API"]
        direction LR
        TestCaseGenerator[[TestCaseGenerator]]
        CodeGenerator[[CodeGenerator]]
    end
    Facades-->API
    CLI-->API
    ContestEstimator-->API
    
        

    subgraph Components
        direction LR
        SymbolicEngine-->jlearch
        SymbolicEngine-->ConcreteExecutor
        Fuzzer-->ConcreteExecutor
        SarifReport
        Minimizer
        CodeRenderer
        Summaries
        jlearch
    end    
    CodeGenerator-->CodeRenderer
    TestCaseGenerator-->SymbolicEngine
    TestCaseGenerator-->Fuzzer
    TestCaseGenerator-->Minimizer
    TestCaseGenerator-->Summaries
    GenerateTestsAndSarifReport-->SarifReport

    UTSettings((UTSettings))
    UTSettings<--Rd/local-->Components    
    UTSettings<---->Clients
    TestCaseGenerator--warmup-->ConcreteExecutor
    ConcreteExecutor--Rd-->InstrumentedProcess


Typical interaction between components

An interaction diagram for UnitTestBot components is presented below. See how the flow starts from the IntelliJ IDEA plugin UI and follow it from there.

sequenceDiagram
    autonumber
    actor user as User
    participant ij as IDE process
    participant engine as Engine process
    participant concrete as Instrumented process
    
    user ->> ij: Invoke "Generate Tests with UnitTestBot"
    ij ->> ij: Calculate methods, framework to show
    ij ->> user: Show UI

    break User clicked "Cancel"
        user -->> user: Exit
    end
    user ->> ij: Click "Generate Tests"
    ij ->> ij: Calculate which JARs need to be installed
        
    opt Some JARs need to be installed  
            ij ->> ij: Install JARs
    end

    ij ->> engine: Start process
    activate engine
    ij ->> engine: Set up context
    
    loop For all files
        ij ->> engine: Generate UtExecution models
        loop For all UtExecution models of the method found by the engine
            engine ->> concrete: Run concretely            
        end
        engine --> engine: Minimize the number of tests for the method
        engine --> engine: Generate summaries for the method
    end
    ij ->> engine: Render code
    engine ->> ij: File with tests
    deactivate engine


Clients

IntelliJ IDEA plugin

The plugin provides

  • a UI for the IntelliJ-based IDEs to use UnitTestBot directly from source code,
  • the linkage between IntelliJ Platform API and UnitTestBot API,
  • support for the most popular programming languages and frameworks for end users (the plugin and its optional dependencies are described in plugin.xml and nearby, in the META-INF folder).

The main plugin module is utbot-intellij, providing support for Java and Kotlin.
Also, there is an auxiliary utbot-ui-commons module to support providers for other languages.

As for the UI, there are two entry points.

The main plugin-specific features are:

  • A common action for generating tests right from the editor or a project tree — with a generation scope from a single method up to the whole source root. See GenerateTestAction — the same for all supported languages.
  • Auto-installation of the user-chosen testing framework as a project library dependency (JUnit 4, JUnit 5, and TestNG are supported). See UtIdeaProjectModelModifier and the Maven-specific version: UtMavenProjectModelModifier.
  • Suggesting the location for a test source root and auto-generating the utbot_tests folder there, providing users with a sandbox in their code space.
  • Optimizing generated code with IDE-provided intentions (experimental). See IntentionHelper for details.
  • An option for distributing generation time between symbolic execution and fuzzing explicitly.
  • Running generated tests while showing coverage with the IDE-provided measurement tools. See RunConfigurationHelper for implementation.
  • Displaying the UnitTestBot-found code defects as IntelliJ-specific inspections and quickfixes in the "Problems" tool window. See the inspection package.
  • Two kinds of Javadoc comments in the generated code: rendered as plain text and structured via custom tags. See the javadoc package.
  • A self-documenting settings.properties file with the settings for low-level UnitTestBot tuning.

Gradle/Maven plugins

Plugins for Gradle/Maven build systems provide the UnitTestBot GenerateTestsAndSarifReportFacade component with the user-chosen settings (test generation timeout, testing framework, etc.). This component runs test generation and creates SARIF reports.

For more information on the plugins, please refer to the detailed design documents; the corresponding modules can be found in the UnitTestBot repository.

GitHub Action

UnitTestBot GitHub Action displays the detected code defects in the GitHub "Security Code Scanning Alerts" section.

You can find UnitTestBot GitHub Action in a separate repository.

UnitTestBot GitHub Action uses the UnitTestBot Gradle plugin to run UnitTestBot on the user's repository and imports the SARIF output into the "Security Code Scanning Alerts" section. Please note that at the moment this action cannot work with Maven projects because the UnitTestBot Maven plugin has not been published yet.

For more information on UnitTestBot GitHub Action, please refer to the related docs. You can also find a detailed usage example.

Command-line interface (CLI)

With CLI, one can run UnitTestBot from the command line.

CLI implementation is placed in the utbot-cli module. UnitTestBot CLI has two main commands: generate and run — use --help to find more info on their options.

An executable CLI is distributed as a JAR file.

We provide Linux Docker images containing the CLI; they are stored on GitHub Packages.

Contest estimator

Contest estimator runs UnitTestBot on the provided projects and returns generation statistics, such as instruction coverage.

Contest estimator is placed in the utbot-junit-contest module and has two entry points:

  • ContestEstimator.kt is the main entry point. It runs UnitTestBot on the specified projects, calculates statistics for the target classes and projects, and outputs them to the console.
  • StatisticsMonitoring.kt is an additional entry point, which does the same as the previous one but can be configured from a file and dumps the resulting statistics to a file. It is used to monitor and chart statistics nightly.

Components

Symbolic engine

Symbolic engine is the component that maintains the whole analysis process: from the moment it receives information about a method under test (MUT) to the moment it returns a set of execution results, along with the information required to reproduce the MUT's execution paths. The engine uses symbolic execution to explore the paths in the MUT's control flow graph (CFG).

The technique is rather complex, so the engine consists of several subcomponents, each responsible for a certain part of the analysis:

  • UtBotSymbolicEngine.kt contains the UtBotSymbolicEngine class, which manages interaction between different parts of the system and controls the analysis flow. This class is the entry point for symbolic execution in UnitTestBot: via its API, users or other UnitTestBot components can start, suspend, or stop the analysis. UtBotSymbolicEngine connects the components by taking execution states from PathSelector and pushing them either into Traverser or into ConcreteExecutor, depending on their status and the settings. In short, the pipeline looks like this: the engine takes a state from PathSelector, pushes it into Traverser, and gets an updated state back. If this state is not terminal, UtBotSymbolicEngine pushes it to the path selector's queue; if it is terminal, the engine either calls ConcreteExecutor to get a concrete result state or yields a symbolic result into the resulting flow. A minimal sketch of this loop follows the list.
  • PathSelector is a class making a decision on which instruction of the program should be processed next. It is located in PathSelector.kt, but the whole selectors package is related to it. PathSelector has a pretty simple interface: it can put a state in the queue or return a state from that queue.
  • Traverser processes a given state. It contains information about CFG, a hierarchy of classes in the program, a symbolic type system, and mocking information. Traverser is the most important class in the symbolic engine module. It knows how to process instructions in CFG, how to update the dependent symbolic memory, and which constraints should be added to go through a certain path. Having processed an instruction from the given state, Traverser creates a new one, with updated memory and path constraints.

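Here is that loop condensed into a few lines of self-contained Kotlin. All types are simplified stand-ins for the real engine classes, whose interfaces are much richer:

data class ExecutionState(val isTerminal: Boolean)

class PathSelectorSketch {
    private val queue = ArrayDeque<ExecutionState>()
    fun offer(state: ExecutionState) = queue.addLast(state)
    fun poll(): ExecutionState? = queue.removeFirstOrNull()
}

fun runEngineLoop(
    selector: PathSelectorSketch,
    traverse: (ExecutionState) -> List<ExecutionState>,  // stands in for Traverser
    onTerminal: (ExecutionState) -> Unit                 // concrete run or yielded symbolic result
) {
    while (true) {
        val state = selector.poll() ?: break             // queue drained: analysis is done
        for (next in traverse(state)) {
            if (next.isTerminal) onTerminal(next)        // ConcreteExecutor / resulting flow
            else selector.offer(next)                    // back to the path selector's queue
        }
    }
}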
There are other important classes in the symbolic engine subsystem. Here are some of them:

  • Memory is responsible for the symbolic memory representation of a state in the program.
  • TypeRegistry contains information about a type system.
  • Mocker decides whether we should mock an object or not.

You can find all the engine-related classes in the engine module.

Concrete executor

ConcreteExecutor is the entry point to the instrumented process, used by the UnitTestBot symbolic engine and fuzzer. This class provides smooth and concise interaction between the user and the instrumented process, whereas the instrumented process itself executes a given function with the supplied arguments.

ConcreteExecutor is parameterized with Instrumentation and its return type via the generic arguments. Instrumentation is an interface, so inheritors have to implement the logic of a specific method invocation in an isolated environment as well as the transform function used for instrumenting classes. For our purposes, we use UtExecutionInstrumentation.

The main ConcreteExecutor function is

suspend fun executeAsync(
    kCallable: KCallable<*>,
    arguments: Array<Any?>,
    parameters: Any?
): TResult

It serializes the arguments and some parameters (e.g., static fields), sends them to the instrumented process, and retrieves the result.

Internally, ConcreteExecutor uses Rd for interprocess communication and Kryo for object serialization. You don't need to provide a marshaller: Kryo serializes the objects for you (though it sometimes fails).
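
To illustrate the contract, here is a hedged Kotlin sketch. ExecutorSketch and the divide function are invented stand-ins; only the executeAsync signature comes from the documentation above:

import kotlin.reflect.KCallable
import kotlinx.coroutines.runBlocking

// Simplified stand-in for the ConcreteExecutor contract described above.
interface ExecutorSketch<TResult> {
    suspend fun executeAsync(
        kCallable: KCallable<*>,
        arguments: Array<Any?>,
        parameters: Any?
    ): TResult
}

fun divide(a: Int, b: Int): Int = a / b  // hypothetical method under test

fun demo(executor: ExecutorSketch<String>) = runBlocking {
    // The arguments are serialized with Kryo, shipped over Rd to the
    // instrumented process, executed there, and the result is sent back.
    val result = executor.executeAsync(::divide, arrayOf<Any?>(10, 2), parameters = null)
    println(result)
}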

ConcreteExecutor is placed in the utbot-instrumentation module. You can find the corresponding tests in the utbot-instrumentation-tests module.

Instrumented process

The instrumented process concretely runs user functions with the specified arguments and returns the result of execution.

Additionally, the instrumented process evaluates instruction coverage and mocks function invocations and instance creation by instrumenting the user's Java bytecode.

The main concept here is instrumentation: a class that implements the corresponding interface, transforms the user code, and provides the means to invoke user functions.

The instrumented process supports the following commands, described in InstrumentedProcessModel.kt:

  • AddPaths tells where the Instrumented process should search for the user classes.
  • Warmup forces loading and instrumenting user classes.
  • SetInstrumentation tells which instrumentation the Instrumented process should use.
  • InvokeMethodCommand requests the Instrumented process to invoke a given user function and awaits the results.
  • StopProcess just stops the Instrumented process.
  • CollectCoverage requests collecting code coverage for the specified class.
  • ComputeStaticField requests the specified static field value.

These commands are delivered from the other processes by Rd.
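
For orientation, the command set could be modeled as the following Kotlin hierarchy. This is a hypothetical condensation: the real definitions in InstrumentedProcessModel.kt are Rd model declarations, and all the field names below are invented:

// Invented, simplified model of the commands listed above.
sealed interface InstrumentedProcessCommandSketch
data class AddPaths(val classpath: List<String>) : InstrumentedProcessCommandSketch
object Warmup : InstrumentedProcessCommandSketch
data class SetInstrumentation(val instrumentationId: String) : InstrumentedProcessCommandSketch
data class InvokeMethodCommand(
    val className: String,
    val methodSignature: String,
    val arguments: List<Any?>
) : InstrumentedProcessCommandSketch
object StopProcess : InstrumentedProcessCommandSketch
data class CollectCoverage(val className: String) : InstrumentedProcessCommandSketch
data class ComputeStaticField(val fieldId: String) : InstrumentedProcessCommandSketch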

The main instrumentation of UnitTestBot is UtExecutionInstrumentation.

Code generator

Code generation and rendering are part of the test generation process in UnitTestBot. UnitTestBot gets the synthetic representation of generated test cases from the fuzzer or the symbolic engine. This representation (or model) is implemented in the UtExecution class. The codegen module generates the real test code based on this UtExecution model and renders it in a human-readable form.

The codegen module

  • converts UtExecution test information into an Abstract Syntax Tree (AST) representation using CodeGenerator,
  • renders this AST according to the requested configuration (considering programming language, testing framework, mocking and parameterization options) using the renderer.

The codegen entry points are:

  • CodeGenerator.generateAsString()
  • CodeGenerator.generateAsStringWithTestReport()
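
To make the model-to-AST-to-text pipeline concrete, here is a toy Kotlin sketch. Every type in it is an illustrative stand-in; the real CodeGenerator and renderer are far richer and language-aware:

// Toy pipeline: an execution model becomes a tiny AST, which is rendered as test source.
data class ExecutionModel(val methodName: String, val arguments: List<Any?>, val expected: Any?)
data class TestMethodAst(val name: String, val statements: List<String>)

fun toAst(model: ExecutionModel) = TestMethodAst(
    name = "test" + model.methodName.replaceFirstChar { it.uppercaseChar() },
    statements = listOf(
        "val actual = ${model.methodName}(${model.arguments.joinToString()})",
        "assertEquals(${model.expected}, actual)"
    )
)

fun render(ast: TestMethodAst): String = buildString {
    appendLine("@Test")
    appendLine("fun ${ast.name}() {")
    ast.statements.forEach { appendLine("    $it") }
    appendLine("}")
}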

The detailed implementation is described in the Code generation and rendering design doc.

Fuzzer

Fuzzing is a versatile technique for "guessing" values to be used as method arguments. To generate these kinds of values, the fuzzer uses generators, mutators, and predefined values.
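
The following toy Kotlin snippet illustrates these three value sources for Int arguments; it is not the real utbot-fuzzing API:

import kotlin.random.Random

// Predefined corner-case values, random generators, and mutators for Int.
val predefined = listOf(0, 1, -1, Int.MAX_VALUE, Int.MIN_VALUE)
val generators = listOf<() -> Int>({ Random.nextInt() })
val mutators = listOf<(Int) -> Int>({ it + 1 }, { it - 1 }, { -it }, { it shr 1 })

fun nextCandidate(): Int = when (Random.nextInt(3)) {
    0 -> predefined.random()                               // reuse a predefined value
    1 -> generators.random().invoke()                      // generate a fresh value
    else -> mutators.random().invoke(predefined.random())  // mutate an existing value
}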

Fuzzing was previously implemented in UnitTestBot as a Java-only solution. Since then, we have developed a generic platform that supports generating fuzzing tests for different languages. The Java fuzzing solution in UnitTestBot is marked as deprecated — it will soon be replaced with the fuzzing platform.

You can find the relevant code here:

  • utbot-fuzzing is the general fuzzing platform module. The related API is located in org/utbot/fuzzing/Api.kt.
  • utbot-fuzzer is the module with the fuzzing solution for Java. Find the corresponding API in org/utbot/fuzzing/JavaLanguage.kt.

Entry point for generating values (Java): org/utbot/fuzzing/JavaLanguage.kt#runJavaFuzzing(...)

You can find the detailed description in the Fuzzing Platform design doc.

Minimizer

Minimization is used to reduce the number of UtExecution instances without decreasing coverage.

The entry point is the minimizeTestCase function. It receives a set of UtExecution instances and a grouping function (grouping by UtExecution::utExecutionResult). Then the minimization procedure divides the set of UtExecution instances into several groups. Each group is minimized independently.

We have different groups — here are some of them:

  • A successful regression suite that consists of UtSuccess and UtExplicitlyThrownException executions.
  • An error suite consisting of UtImplicitlyThrownException executions.
  • A timeout suite that consists of UtTimeoutException executions.
  • A crash suite consisting of executions where some parts of the engine failed.

Each UtExecution instance provided should have coverage information; otherwise, we add the execution to the test suite right away. Coverage data are usually obtained from the instrumented process and consist of covered lines.

To minimize the number of executions in a group, we use a simple greedy algorithm (sketched in code after the steps):

  1. Pick an execution that covers the maximum number of the previously uncovered lines.
  2. Add this execution to the final suite and mark new lines as covered.
  3. Repeat from the first step while there remain executions covering new lines.
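
A self-contained Kotlin sketch of this greedy algorithm over line coverage, with Execution as a simplified stand-in for UtExecution:

// Keep picking the execution that adds the most new lines until nothing new is added.
data class Execution(val name: String, val coveredLines: Set<Int>)

fun minimize(executions: List<Execution>): List<Execution> {
    val covered = mutableSetOf<Int>()
    val suite = mutableListOf<Execution>()
    val remaining = executions.toMutableList()
    while (remaining.isNotEmpty()) {
        val best = remaining.maxByOrNull { (it.coveredLines - covered).size } ?: break
        if ((best.coveredLines - covered).isEmpty()) break  // no execution adds new lines
        suite += best
        covered += best.coveredLines
        remaining -= best
    }
    return suite
}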

The whole minimization procedure is located in the org.utbot.framework.minimization package inside the utbot-framework module.

Summarization module

Summarization is the process of generating detailed test descriptions consisting of

  • test method names
  • testing framework annotations (including @DisplayName)
  • Javadoc comments for tests
  • cluster comments for groups of tests (Regions)

Each of these description elements can be turned off via {userHome}/.utbot/settings.properties (the corresponding defaults live in UtSettings.kt). If the summarization process fails due to an error or insufficient information, the test method receives a unique name and no additional meta-information.

This meta-information is generated for each type of UtExecution model and thus may vary significantly. Also, Javadoc comments can be rendered in two styles: as plain text or in a special format enriched with the custom Javadoc tags.
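
Purely for illustration, a summarized test might look roughly like the Kotlin snippet below. The names, annotation text, and comments are invented, not actual UnitTestBot output:

import org.junit.jupiter.api.DisplayName
import org.junit.jupiter.api.Test

// region SUCCESSFUL EXECUTIONS for method example(int)
/**
 * Test invokes Main.example(int) with x = 1 and returns 1.
 */
@Test
@DisplayName("example: x = 1 -> return 1")
fun testExampleReturnsOne() {
    // arrange, act, assert ...
}
// endregion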

The whole summarization subsystem is located in the utbot-summary module.

For detailed information, please refer to the Summarization architecture design document.

SARIF report generator

SARIF (Static Analysis Results Interchange Format) is a JSON-based format for displaying static analysis results.

All the necessary information about the format and its usage can be found in the official documentation and in the related GitHub Docs.

In UnitTestBot, the SarifReport class is responsible for generating SARIF reports. We use them to display UnitTestBot-detected errors such as unchecked exceptions, overflows, assertion errors, etc.

For example, for the class below

public class Main {
    int example(int x) {
        return 1 / x;
    }
}

UnitTestBot creates a report containing the following information (a sketch of the corresponding SARIF structure follows the list):

  • java.lang.ArithmeticException: / by zero may occur in line 3
  • The exception occurs if x == 0
  • To reproduce this error, the user can run the generated MainTest.testExampleThrowsAEWithCornerCase test
  • The exception stack trace:
    • Main.example(Main.java:3)
    • MainTest.testExampleThrowsAEWithCornerCase(MainTest.java:39)
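
Here is a hedged sketch of how such a result could be represented, written as a plain Kotlin map. The field names follow the public SARIF 2.1.0 schema, but the exact report produced by SarifReport may differ:

// Minimal SARIF-shaped structure for the ArithmeticException example above.
val sarifSketch = mapOf(
    "version" to "2.1.0",
    "runs" to listOf(
        mapOf(
            "tool" to mapOf("driver" to mapOf("name" to "UnitTestBot")),
            "results" to listOf(
                mapOf(
                    "ruleId" to "java.lang.ArithmeticException",
                    "message" to mapOf("text" to "/ by zero may occur when x == 0"),
                    "locations" to listOf(
                        mapOf(
                            "physicalLocation" to mapOf(
                                "artifactLocation" to mapOf("uri" to "Main.java"),
                                "region" to mapOf("startLine" to 3)
                            )
                        )
                    )
                )
            )
        )
    )
)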

Cross-cutting subsystems

Rd

UnitTestBot consists of three processes (listed in execution order):

  • IDE process — the process where the plugin part executes.
  • Engine process — the process where the test generation engine executes.
  • Instrumented process — the process where concrete execution takes place.

These processes are built on top of the Reactive distributed communication framework (Rd) developed by JetBrains.

One of the main Rd concepts is a Lifetime — it helps to release shared resources upon the object's termination. You can find the Rd basic ideas and UnitTestBot implementation details in the Multiprocess architecture design doc.
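
A toy Kotlin illustration of the Lifetime idea (not JetBrains Rd's actual API): resources register termination callbacks, and terminating the lifetime releases them in reverse registration order:

// Simplified stand-in for Rd's Lifetime concept.
class LifetimeSketch {
    private val callbacks = ArrayDeque<() -> Unit>()
    fun onTermination(action: () -> Unit) { callbacks.addFirst(action) }
    fun terminate() {
        callbacks.forEach { it() }  // newest registrations run first
        callbacks.clear()
    }
}

fun main() {
    val lifetime = LifetimeSketch()
    val socket = java.net.ServerSocket(0)
    lifetime.onTermination { socket.close() }  // release the shared resource on termination
    // ... use the socket while the lifetime is alive ...
    lifetime.terminate()
}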

Settings

In UnitTestBot, settings are a set of properties. Each property is a key=value pair that affects some important aspect of UnitTestBot behavior. Whether it runs as an IntelliJ IDEA plugin, a CI tool, or a CLI tool, UnitTestBot has low-level core settings. The UnitTestBot plugin also has per-project plugin-specific settings.

Core settings are persisted in the settings file: {userHome}/.utbot/settings.properties. UnitTestBot reads this file but never modifies it. The defaults for the core settings are provided in source code (UtSettings.kt).
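
Reading this file from Kotlin could look like the sketch below. The property key in the trailing comment is hypothetical; the real keys and defaults are defined in UtSettings.kt:

import java.io.File
import java.util.Properties

// Load {userHome}/.utbot/settings.properties if present; otherwise the
// defaults compiled into UtSettings.kt apply.
fun loadCoreSettings(): Properties {
    val props = Properties()
    val file = File(System.getProperty("user.home"), ".utbot/settings.properties")
    if (file.isFile) file.inputStream().use { props.load(it) }
    return props
}
// e.g., loadCoreSettings().getProperty("someTimeoutMillis")  // hypothetical key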

The plugin-specific settings are stored per project in the plugin configuration file: {projectDir}/.idea/utbot-settings.xml. Nobody is expected to edit this file manually.

The end user has three places to change UnitTestBot behavior:

  1. A {userHome}/.utbot/settings.properties file — for global settings.
  2. Plugin settings UI (File > Settings > Tools > UnitTestBot) — for per-project settings.
  3. Controls in the Generate Tests with UnitTestBot dialog — for per-generation settings.

Logging

The UnitTestBot Java logging system is implemented across the IDE process, the Engine process, and the Instrumented process.

UnitTestBot Java logging relies on the log4j2 library. The custom Rd logging system is recommended as the default one for the Instrumented process.

In the Logging document, you can find out how to configure the logging system when UnitTestBot Java is used

  • as an IntelliJ IDEA plugin,
  • as Contest estimator or the Gradle/Maven plugins, via CLI or during the CI test runs.

Implementation details, log levels, and performance considerations are also addressed there.