Add new minimal documentation #8
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne committed Sep 18, 2020
1 parent fc215d3 commit b67b96f
Showing 7 changed files with 401 additions and 55 deletions.
8 changes: 8 additions & 0 deletions docs/installation.rst
@@ -13,6 +13,7 @@ On the offline install server:
1. extract the ScanCode.io code
2. install dependencies
3. prepare the database

::

tar -xf scancodeio-1.0.1.tar.gz && cd scancode.io
@@ -29,6 +30,7 @@ Use as a development environment with::

SCANCODEIO_WORKSPACE_LOCATION=/path/to/scancodeio/workspace/ make run


Offline upgrade
---------------

@@ -49,9 +51,15 @@ On the offline install server:
2. extract the new ScanCode.io code
3. install dependencies
4. migrate the database

::

mv scancode.io scancode.io-$(date +"%Y-%m-%d_%H%M")
tar -xf scancodeio-1.0.1.tar.gz && cd scancode.io
make install
make migrate

Next step
---------

- Getting started with Docker image analysis from the command line: `scanpipe-tutorial-1.rst`.
77 changes: 77 additions & 0 deletions docs/introduction.rst
@@ -0,0 +1,77 @@
Why ScanCode.io
===============

Modern software is built from many open source packages assembled with new code.
Knowing which free and open source software (FOSS) packages are in use matters because:

- knowing the license of third-party code is required before using it, and
- you want to avoid using buggy, outdated or vulnerable components.

Because it is so easy to include and reuse new code downloaded from the internet,
it is often surprisingly hard to get a proper inventory of all the third-party
code origins and licenses used in a software project.
There are some great tools available to scan your code and help uncover these origins and licenses.

And when you reuse only a few FOSS components in a single project, running one
of these tools (such as the scancode-toolkit) by hand together
with a spreadsheet may be enough to manage your software composition analysis.

But when you scale up, running automated and reproducible analysis pipelines
that are adapted to a software project's unique context and technology platform
is difficult. This requires deploying and running multiple specialized tools
and merging their results with a consistent workflow.

And now that reusing thousands of open source packages is commonplace, code
scan pipelines need to be scripted as code and run on servers backed by a
database, not on a laptop.

For instance, when you analyze Docker container images, there could be hundreds
to thousands of system packages (such as Debian, RPM, Alpine) and application
packages (such as npm, PyPI, Rubygems, Maven) installed in an image side-by-side
with your own code.

Taking care of all these can be hard. ScanCode.io can help organize these
complex code analyses as scripted pipelines and store their results in a
uniform database for automated code analysis.


What is ScanPipe
----------------

ScanPipe is a developer-friendly framework and application that helps software
analysts and engineers build and manage real-life software composition analysis
projects as scripted pipelines.

ScanPipe was originally developed to help boost productivity of code analysts
who work on a wide variety of software composition analysis projects.

ScanPipe provides a unified framework for the infrastructure that is
required to execute and organize these software composition analysis projects.


Should I Use ScanPipe?
----------------------

If you are working on a software composition analysis project, or you
are planning to start a new one, consider the following questions:

1. **Automation**: Is this project part of a larger compliance program and process (as opposed to a one-off) and do you need automation?
2. **Complexity**: Does the project use many third-party components or technologies?
3. **Reproducibility**: Is it important that results are reproducible, traceable and auditable?

If you answered "yes" to any of the above, keep reading: ScanPipe can help you.
If the answer is "no" to all of the above, which is a valid scenario, e.g. when
you are doing a small-scale analysis, ScanPipe may provide only limited benefits for you.

The first set of available pipelines helps automate the analysis of Docker
"container" images and virtual machine (VM) disk images that often harbor
comprehensive software stacks from an operating system with its kernel through
system and application packages to original and custom applications.


Next step
---------

- Install ScanCode.io: `installation.rst`.

.. Some of this documentation is borrowed from the metaflow documentation and is also under Apache-2.0
.. Copyright (c) Netflix
19 changes: 19 additions & 0 deletions docs/scanpipe-api.rst
@@ -0,0 +1,19 @@
ScanPipe JSON REST API
======================


To get started locally with the API:

1. run the server with::

make run

2. open your web browser at http://127.0.0.1:8001/

3. visit the projects API endpoint at http://127.0.0.1:8001/api/projects/

From the bottom of this page, you can create a new project, upload an input
file, and add a pipeline to this project all at once.

If you add a pipeline, the pipeline starts immediately on project creation.
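
Creating a project can also be scripted against this endpoint. Below is a
minimal sketch using the Python `requests` library; the `upload_file` and
`pipeline` field names are assumptions based on the form described above, not
a confirmed API contract::

    import requests

    # Create a project named "foo", upload an input file, and add a pipeline
    # in a single request (field names are assumptions).
    with open("alpine-base.tar", "rb") as upload:
        response = requests.post(
            "http://127.0.0.1:8001/api/projects/",
            data={"name": "foo", "pipeline": "scanpipe/pipelines/docker.py"},
            files={"upload_file": upload},
        )
    print(response.status_code, response.json())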

126 changes: 126 additions & 0 deletions docs/scanpipe-command-line.rst
@@ -0,0 +1,126 @@
ScanPipe Commands Help
======================

The main entry point is the `scanpipe` command, which is available directly
when the virtualenv is activated, or at this path: `<scancode.io root dir>/bin/scanpipe`.


`$ scanpipe --help`
-------------------

List all the sub-commands available (including Django built-in commands).
ScanPipe's own commands are listed under the `[scanpipe]` section.

For example::

$ scanpipe --help
...
[scanpipe]
add-input
add-pipeline
create-project
graph
output
run
...


`$ scanpipe <subcommand> --help`
--------------------------------

Display help for the provided subcommand.

For example::

$ scanpipe create-project --help
usage: scanpipe create-project [-h] [--pipeline PIPELINES] [--input INPUTS]
                               [--version] [-v {0,1,2,3}]
                               [--settings SETTINGS] [--pythonpath PYTHONPATH]
                               [--traceback] [--no-color] [--force-color]
                               [--skip-checks]
                               name

Create a ScanPipe project.

positional arguments:
  name                  Project name.


`$ scanpipe create-project <name>`
----------------------------------

Create a ScanPipe project using <name> as the project name. The name must
be unique.

optional arguments:

- `--pipeline PIPELINES` Pipeline locations to add to the project. The
  pipelines are added and will run in the order of the provided options.

- `--input INPUTS` Input file locations to copy into the `input/` workspace directory.


`$ scanpipe add-input --project PROJECT <input ...>`
----------------------------------------------------

Copy the file found at the <input> path to the `input/` workspace directory of
the project named <PROJECT>. You can use more than one <input> to copy multiple files at once.

For example, assuming you have created a project named foo beforehand, this will
copy `~/docker/alpine-base.tar` to the foo project input directory::

$ scanpipe add-input --project foo ~/docker/alpine-base.tar


`$ scanpipe add-pipeline --project PROJECT <pipeline ...>`
----------------------------------------------------------

Add the <pipeline> found at this location to the project named <PROJECT>.
You can use more than one <pipeline> to add multiple pipelines at once.
The pipelines are added and will run in the order of the provided options.

For example, assuming you have created a project named foo beforehand, this will
add the docker pipeline to your project::

$ scanpipe add-pipeline --project foo scanpipe/pipelines/docker.py


`$ scanpipe run --project PROJECT`
----------------------------------

Run all the pipelines of the project named <PROJECT>.


`$ scanpipe run --project PROJECT --show`
-----------------------------------------

List all the pipelines added to the project named <PROJECT>.



`$ scanpipe output --project PROJECT <output_file>`
---------------------------------------------------

Output the results of the project named <PROJECT> to the <output_file> as JSON.
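
The resulting JSON file can then be post-processed with any scripting language.
A hedged Python sketch; the top-level `packages` key and the per-package
`purl` and `license_expression` keys are assumptions, not a documented schema::

    import json

    # Load the JSON results written by the output command.
    with open("results.json") as results_file:
        results = json.load(results_file)

    # Print a short inventory of the discovered packages.
    for package in results.get("packages", []):
        print(package.get("purl"), package.get("license_expression"))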



`$ scanpipe graph <pipeline ...>`
---------------------------------

Generate a pipeline graph image as PNG (using Graphviz). The image file is
named after the pipeline, with a .png extension.

optional arguments:

- `--output OUTPUT` Alternative output directory location to use. The
default is to create the image in the scancode.io root directory.


Next step
---------

- Explore ScanPipe Concepts: `scanpipe-concepts.rst`.



88 changes: 88 additions & 0 deletions docs/scanpipe-concepts.rst
@@ -0,0 +1,88 @@
ScanPipe Concepts
=================

Project
-------

A project encapsulates the analysis of software code:

- it has a workspace, which is a directory that contains the software code files under analysis
- it is related to one or more code analysis pipeline scripts that automate its analysis
- it tracks the project's Codebase Resources, e.g. its code files and directories
- it tracks the project's Discovered Packages, e.g. the origin and license of the system and application packages discovered in the codebase

Multiple analysis pipelines can be run on a single project.

In the database, a project is identified by its unique name.


Project workspace
-----------------

A project workspace is the root directory where all the project files are stored.

The following directories exist under this directory:

- `input/` contains all the original uploaded and input files used for the project. For instance, it could be a codebase archive.
- `codebase/` contains the files and directories (aka resources) tracked as CodebaseResource records in the database.
- `output/` contains all output files created by the pipelines: reports, scan results, etc.
- `tmp/` is a scratch pad for temporary files generated during the pipelines runs.


Pipelines
---------

A pipeline is a Python script that contains a series of steps, run in order
from start to end, to perform a code analysis.

It usually starts from the uploaded input files, and may extract these and
then generate CodebaseResource records in the database accordingly.

Those resources can then be analyzed, scanned, and matched as needed.
Analysis results and reports are eventually posted at the end of the pipeline run.

For now, all pipelines are located in the `scanpipe.pipelines` module.
Each pipeline consists of a Python script including one subclass of the `Pipeline` class.
Each step is a method of the `Pipeline` class decorated with the `@step` decorator.
At its end, a step states which step to execute next, as sketched below.

One or more pipelines can be assigned to a project as a sequence.
When one pipeline of a sequence completes successfully, the next pipeline in
the queue for this project is run automatically, until all pipelines are executed.
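
Here is a minimal, hypothetical pipeline sketch. The import path, the
`Pipeline` base class, the `@step` decorator and the `self.next()` call follow
the description above but are illustrative, not the exact API::

    # Illustrative only: names follow the description above, not a confirmed API.
    from scanpipe.pipelines import Pipeline, step


    class ExamplePipeline(Pipeline):
        """Extract the inputs, then scan the resulting codebase."""

        @step
        def start(self):
            # Extract the uploaded input files and create the corresponding
            # CodebaseResource records in the database.
            self.next(self.scan_codebase)  # state which step to execute next

        @step
        def scan_codebase(self):
            # Analyze, scan and match the resources as needed.
            self.next(self.end)

        @step
        def end(self):
            # Analysis results and reports are posted at the end of the run.
            pass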


Codebase Resources
------------------

A project's Codebase Resources are records of its code files and directories.
CodebaseResource is a database model and each record is identified by its path
under the project workspace.

Some of the interesting CodebaseResource attributes are:

- a status, used to track the analysis status for this resource
- a type (such as file, directory or symlink)
- various attributes to track detected copyrights, license expressions, copyright holders and related packages

In general, the attributes and their names are the same as those used in ScanCode-Toolkit for files.
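
Since CodebaseResource is a Django model, records can be queried with the
standard ORM. A hedged sketch, assuming the model is importable from a
`scanpipe.models` module (an assumption) and using the attributes listed above::

    from scanpipe.models import CodebaseResource  # assumed module path

    # All file resources of the "foo" project without an analysis status yet.
    resources = CodebaseResource.objects.filter(
        project__name="foo",
        type="file",
        status="",
    )
    for resource in resources:
        print(resource.path)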


Discovered Packages
-------------------

A project's Discovered Packages are records of the system and application packages
discovered in its code.
DiscoveredPackage is a database model and each record is identified by its Package URL.
Package URL is a grassroots effort to create informative identifiers for software
packages such as Debian, RPM, npm, Maven and PyPI packages, for example
`pkg:npm/[email protected]`. See https://github.com/package-url for details.


Some of the interesting DiscoveredPackage attributes are:

- type, name, version (all Package URL attributes)
- homepage_url, download_url and other URLs
- checksums (such as SHA1, MD5)
- copyright, license_expression, declared_license


In general, the attributes and their names are the same as those used in ScanCode-Toolkit for packages.
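
For instance, a Package URL can be built and parsed with the separate
`packageurl-python` library; the example value below is illustrative::

    from packageurl import PackageURL

    # Parse a Package URL string into its components.
    purl = PackageURL.from_string("pkg:deb/debian/[email protected]")
    print(purl.type, purl.namespace, purl.name, purl.version)
    # deb debian curl 7.50.3-1

    # Serialize back to the canonical string form.
    print(purl.to_string())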
