diff --git a/CHANGELOG.md b/CHANGELOG.md
index df374286f2b..869464ac31c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,10 @@
 ## dbt next (release TBD)
+
+### Breaking changes
+- Added a new dbt_project.yml version format. The existing version currently emits a deprecation warning, and support for it will be removed in a future dbt version ([#2300](https://github.com/fishtown-analytics/dbt/issues/2300), [#2312](https://github.com/fishtown-analytics/dbt/pull/2312))
+- The `graph` object available in some dbt contexts now has an additional member `sources` (alongside the existing `nodes`). Sources have been removed from `nodes` and added to `sources` instead ([#2312](https://github.com/fishtown-analytics/dbt/pull/2312))
+- The 'location' field has been removed from bigquery catalogs ([#2382](https://github.com/fishtown-analytics/dbt/pull/2382))
+
 ### Features
 - Added --fail-fast argument for dbt run and dbt test to fail on first test failure or runtime error. ([#1649](https://github.com/fishtown-analytics/dbt/issues/1649), [#2224](https://github.com/fishtown-analytics/dbt/pull/2224))
 - Support for appending query comments to SQL queries. ([#2138](https://github.com/fishtown-analytics/dbt/issues/2138), [#2199](https://github.com/fishtown-analytics/dbt/pull/2199))
@@ -9,7 +15,12 @@
 - Users can supply paths as arguments to `--models` and `--select`, either explicitly by prefixing with `path:` or implicitly with no prefix. ([#454](https://github.com/fishtown-analytics/dbt/issues/454), [#2258](https://github.com/fishtown-analytics/dbt/pull/2258))
 - dbt now builds the relation cache for "dbt compile" and "dbt ls" as well as "dbt run" ([#1705](https://github.com/fishtown-analytics/dbt/issues/1705), [#2319](https://github.com/fishtown-analytics/dbt/pull/2319))
 - Snowflake now uses "show terse objects" to build the relations cache instead of selecting from the information schema ([#2174](https://github.com/fishtown-analytics/dbt/issues/2174), [#2322](https://github.com/fishtown-analytics/dbt/pull/2322))
+- Add a 'depends_on' attribute to the log record extra field ([#2316](https://github.com/fishtown-analytics/dbt/issues/2316), [#2341](https://github.com/fishtown-analytics/dbt/pull/2341))
+- Added a '--no-browser' argument to "dbt docs serve" so you can serve docs in an environment that only has a CLI browser, which would otherwise deadlock dbt ([#2004](https://github.com/fishtown-analytics/dbt/issues/2004), [#2364](https://github.com/fishtown-analytics/dbt/pull/2364))
 - Snowflake now uses "describe table" to get the columns in a relation ([#2260](https://github.com/fishtown-analytics/dbt/issues/2260), [#2324](https://github.com/fishtown-analytics/dbt/pull/2324))
+- Sources (and therefore freshness tests) can be enabled and disabled via dbt_project.yml ([#2283](https://github.com/fishtown-analytics/dbt/issues/2283), [#2312](https://github.com/fishtown-analytics/dbt/pull/2312), [#2357](https://github.com/fishtown-analytics/dbt/pull/2357))
+- schema.yml files are now fully rendered in a context that is aware of vars declared in dbt_project.yml files ([#2269](https://github.com/fishtown-analytics/dbt/issues/2269), [#2357](https://github.com/fishtown-analytics/dbt/pull/2357))
+- Sources from dependencies can be overridden in schema.yml files
([#2287](https://github.com/fishtown-analytics/dbt/issues/2287), [#2357](https://github.com/fishtown-analytics/dbt/pull/2357)) ### Fixes - When a jinja value is undefined, give a helpful error instead of failing with cryptic "cannot pickle ParserMacroCapture" errors ([#2110](https://github.com/fishtown-analytics/dbt/issues/2110), [#2184](https://github.com/fishtown-analytics/dbt/pull/2184)) @@ -22,6 +34,11 @@ - Return error message when profile is empty in profiles.yml. ([#2292](https://github.com/fishtown-analytics/dbt/issues/2292), [#2297](https://github.com/fishtown-analytics/dbt/pull/2297)) - Fix skipped node count in stdout at the end of a run ([#2095](https://github.com/fishtown-analytics/dbt/issues/2095), [#2310](https://github.com/fishtown-analytics/dbt/pull/2310)) - Fix an issue where BigQuery incorrectly used a relation's quote policy as the basis for the information schema's include policy, instead of the relation's include policy. ([#2188](https://github.com/fishtown-analytics/dbt/issues/2188), [#2325](https://github.com/fishtown-analytics/dbt/pull/2325)) +- Fix "dbt deps" command so it respects the "--project-dir" arg if specified. ([#2338](https://github.com/fishtown-analytics/dbt/issues/2338), [#2339](https://github.com/fishtown-analytics/dbt/issues/2339)) +- On `run_cli` API calls that are passed `--vars` differing from the server's `--vars`, the RPC server rebuilds the manifest for that call. ([#2265](https://github.com/fishtown-analytics/dbt/issues/2265), [#2363](https://github.com/fishtown-analytics/dbt/pull/2363)) +- Fix "Object of type Decimal is not JSON serializable" error when BigQuery queries returned numeric types in nested data structures ([#2336](https://github.com/fishtown-analytics/dbt/issues/2336), [#2348](https://github.com/fishtown-analytics/dbt/pull/2348)) +- No longer query the information_schema.schemata view on bigquery ([#2320](https://github.com/fishtown-analytics/dbt/issues/2320), [#2382](https://github.com/fishtown-analytics/dbt/pull/2382)) +- Add support for `sql_header` config in incremental models ([#2136](https://github.com/fishtown-analytics/dbt/issues/2136), [#2200](https://github.com/fishtown-analytics/dbt/pull/2200)) ### Under the hood - Added more tests for source inheritance ([#2264](https://github.com/fishtown-analytics/dbt/issues/2264), [#2291](https://github.com/fishtown-analytics/dbt/pull/2291)) @@ -33,6 +50,8 @@ Contributors: - [@jeremyyeo](https://github.com/jeremyyeo) [#2259](https://github.com/fishtown-analytics/dbt/pull/2259) - [@rodrigodelmonte](https://github.com/rodrigodelmonte) [#2298](https://github.com/fishtown-analytics/dbt/pull/2298) - [@sumanau7](https://github.com/sumanau7) ([#2279](https://github.com/fishtown-analytics/dbt/pull/2279), [#2263](https://github.com/fishtown-analytics/dbt/pull/2263), [#2297](https://github.com/fishtown-analytics/dbt/pull/2297)) + - [@nickwu241](https://github.com/nickwu241) [#2339](https://github.com/fishtown-analytics/dbt/issues/2339) + - [@Fokko](https://github.com/Fokko) [#2361](https://github.com/fishtown-analytics/dbt/pull/2361) ## dbt 0.16.1 (April 14, 2020) @@ -572,6 +591,7 @@ Over a dozen contributors wrote code for this release of dbt! 
Thanks for taking - [@josegalarza](https://github.com/josegalarza) ([#1571](https://github.com/fishtown-analytics/dbt/pull/1571)) - [@rmgpinto](https://github.com/rmgpinto) ([docs#31](https://github.com/fishtown-analytics/dbt-docs/pull/31), [docs#32](https://github.com/fishtown-analytics/dbt-docs/pull/32)) - [@groodt](https://github.com/groodt) ([docs#34](https://github.com/fishtown-analytics/dbt-docs/pull/34)) +- [@dcereijodo](https://github.com/dcereijodo) ([#2341](https://github.com/fishtown-analytics/dbt/pull/2341)) ## dbt 0.13.1 (May 13, 2019) diff --git a/core/dbt/adapters/base/__init__.py b/core/dbt/adapters/base/__init__.py index b4ddf791159..c0a8f01ebd4 100644 --- a/core/dbt/adapters/base/__init__.py +++ b/core/dbt/adapters/base/__init__.py @@ -10,5 +10,5 @@ SchemaSearchMap, ) from dbt.adapters.base.column import Column # noqa -from dbt.adapters.base.impl import BaseAdapter # noqa +from dbt.adapters.base.impl import AdapterConfig, BaseAdapter # noqa from dbt.adapters.base.plugin import AdapterPlugin # noqa diff --git a/core/dbt/adapters/base/impl.py b/core/dbt/adapters/base/impl.py index ca420f459d6..b7ea8212e87 100644 --- a/core/dbt/adapters/base/impl.py +++ b/core/dbt/adapters/base/impl.py @@ -1,10 +1,12 @@ import abc from concurrent.futures import as_completed, Future from contextlib import contextmanager +from dataclasses import dataclass from datetime import datetime +from itertools import chain from typing import ( - Optional, Tuple, Callable, Iterable, FrozenSet, Type, Dict, Any, List, - Mapping, Iterator, Union, Set + Optional, Tuple, Callable, Iterable, Type, Dict, Any, List, Mapping, + Iterator, Union, Set ) import agate @@ -23,6 +25,7 @@ from dbt.contracts.graph.compiled import CompileResultNode, CompiledSeedNode from dbt.contracts.graph.manifest import Manifest from dbt.contracts.graph.parsed import ParsedSeedNode +from dbt.contracts.graph.model_config import BaseConfig from dbt.exceptions import warn_or_error from dbt.node_types import NodeType from dbt.logger import GLOBAL_LOGGER as logger @@ -105,6 +108,11 @@ def _relation_name(rel: Optional[BaseRelation]) -> str: return str(rel) +@dataclass +class AdapterConfig(BaseConfig): + pass + + class BaseAdapter(metaclass=AdapterMeta): """The BaseAdapter provides an abstract base class for adapters. @@ -147,7 +155,7 @@ class BaseAdapter(metaclass=AdapterMeta): # A set of clobber config fields accepted by this adapter # for use in materializations - AdapterSpecificConfigs: FrozenSet[str] = frozenset() + AdapterSpecificConfigs: Type[AdapterConfig] = AdapterConfig def __init__(self, config): self.config = config @@ -282,7 +290,11 @@ def _get_cache_schemas( lowercase strings. 
""" info_schema_name_map = SchemaSearchMap() - for node in manifest.nodes.values(): + nodes: Iterator[CompileResultNode] = chain( + manifest.nodes.values(), + manifest.sources.values(), + ) + for node in nodes: if exec_only and node.resource_type not in NodeType.executable(): continue relation = self.Relation.create_from(self.config, node) diff --git a/core/dbt/adapters/factory.py b/core/dbt/adapters/factory.py index 439931e49e3..8cc8d20ae39 100644 --- a/core/dbt/adapters/factory.py +++ b/core/dbt/adapters/factory.py @@ -7,7 +7,7 @@ from dbt.logger import GLOBAL_LOGGER as logger from dbt.contracts.connection import Credentials, AdapterRequiredConfig -from dbt.adapters.base.impl import BaseAdapter +from dbt.adapters.base.impl import BaseAdapter, AdapterConfig from dbt.adapters.base.plugin import AdapterPlugin @@ -40,6 +40,12 @@ def get_relation_class_by_name(self, name: str) -> Type[BaseRelation]: adapter = self.get_adapter_class_by_name(name) return adapter.Relation + def get_config_class_by_name( + self, name: str + ) -> Type[AdapterConfig]: + adapter = self.get_adapter_class_by_name(name) + return adapter.AdapterSpecificConfigs + def load_plugin(self, name: str) -> Type[Credentials]: # this doesn't need a lock: in the worst case we'll overwrite PACKAGES # and adapter_type entries with the same value, as they're all @@ -137,6 +143,10 @@ def get_adapter_class_by_name(name: str) -> Type[BaseAdapter]: return FACTORY.get_adapter_class_by_name(name) +def get_config_class_by_name(name: str) -> Type[AdapterConfig]: + return FACTORY.get_config_class_by_name(name) + + def get_relation_class_by_name(name: str) -> Type[BaseRelation]: return FACTORY.get_relation_class_by_name(name) diff --git a/core/dbt/clients/agate_helper.py b/core/dbt/clients/agate_helper.py index 9e26dc88a71..29f285edac3 100644 --- a/core/dbt/clients/agate_helper.py +++ b/core/dbt/clients/agate_helper.py @@ -4,6 +4,7 @@ import datetime import isodate import json +import dbt.utils from typing import Iterable, List, Dict, Union, Optional, Any from dbt.exceptions import RuntimeException @@ -92,7 +93,7 @@ def table_from_data_flat(data, column_names: Iterable[str]) -> agate.Table: row = [] for value in list(_row.values()): if isinstance(value, (dict, list, tuple)): - row.append(json.dumps(value)) + row.append(json.dumps(value, cls=dbt.utils.JSONEncoder)) else: row.append(value) rows.append(row) diff --git a/core/dbt/clients/jinja.py b/core/dbt/clients/jinja.py index 8ce285c605e..3ea08256434 100644 --- a/core/dbt/clients/jinja.py +++ b/core/dbt/clients/jinja.py @@ -4,7 +4,9 @@ import re import tempfile import threading +from ast import literal_eval from contextlib import contextmanager +from itertools import chain, islice from typing import ( List, Union, Set, Optional, Dict, Any, Iterator, Type, NoReturn ) @@ -102,9 +104,51 @@ class NativeSandboxEnvironment(MacroFuzzEnvironment): code_generator_class = jinja2.nativetypes.NativeCodeGenerator +def quoted_native_concat(nodes): + """This is almost native_concat from the NativeTemplate, except in the + special case of a single argument that is a quoted string and returns a + string, the quotes are re-inserted. 
+ """ + head = list(islice(nodes, 2)) + + if not head: + return None + + if len(head) == 1: + raw = head[0] + else: + raw = "".join([str(v) for v in chain(head, nodes)]) + + try: + result = literal_eval(raw) + except (ValueError, SyntaxError, MemoryError): + return raw + + if len(head) == 1 and len(raw) > 2 and isinstance(result, str): + return _requote_result(raw, result) + else: + return result + + class NativeSandboxTemplate(jinja2.nativetypes.NativeTemplate): # mypy: ignore environment_class = NativeSandboxEnvironment + def render(self, *args, **kwargs): + """Render the template to produce a native Python type. If the + result is a single node, its value is returned. Otherwise, the + nodes are concatenated as strings. If the result can be parsed + with :func:`ast.literal_eval`, the parsed value is returned. + Otherwise, the string is returned. + """ + vars = dict(*args, **kwargs) + + try: + return quoted_native_concat( + self.root_render_func(self.new_context(vars)) + ) + except Exception: + return self.environment.handle_exception() + NativeSandboxEnvironment.template_class = NativeSandboxTemplate # type: ignore @@ -425,7 +469,7 @@ def render_template(template, ctx: Dict[str, Any], node=None) -> str: return template.render(ctx) -def _requote_result(raw_value, rendered): +def _requote_result(raw_value: str, rendered: str) -> str: double_quoted = raw_value.startswith('"') and raw_value.endswith('"') single_quoted = raw_value.startswith("'") and raw_value.endswith("'") if double_quoted: @@ -451,12 +495,7 @@ def get_rendered( capture_macros=capture_macros, native=native, ) - - result = render_template(template, ctx, node) - - if native and isinstance(result, str): - result = _requote_result(string, result) - return result + return render_template(template, ctx, node) def undefined_error(msg) -> NoReturn: diff --git a/core/dbt/compilation.py b/core/dbt/compilation.py index c35ce5d5f49..e02a21f2b18 100644 --- a/core/dbt/compilation.py +++ b/core/dbt/compilation.py @@ -1,4 +1,3 @@ -import itertools import os from collections import defaultdict from typing import List, Dict, Any @@ -11,8 +10,8 @@ from dbt.linker import Linker from dbt.context.providers import generate_runtime_model +from dbt.contracts.graph.compiled import NonSourceNode from dbt.contracts.graph.manifest import Manifest -import dbt.contracts.project import dbt.exceptions import dbt.flags import dbt.config @@ -61,7 +60,7 @@ def print_compile_stats(stats): logger.info("Found {}".format(stat_line)) -def _node_enabled(node): +def _node_enabled(node: NonSourceNode): # Disabled models are already excluded from the manifest if node.resource_type == NodeType.Test and not node.config.enabled: return False @@ -69,14 +68,16 @@ def _node_enabled(node): return True -def _generate_stats(manifest): +def _generate_stats(manifest: Manifest): stats: Dict[NodeType, int] = defaultdict(int) - for node_name, node in itertools.chain( - manifest.nodes.items(), - manifest.macros.items()): + for node in manifest.nodes.values(): if _node_enabled(node): stats[node.resource_type] += 1 + for source in manifest.sources.values(): + stats[source.resource_type] += 1 + for macro in manifest.macros.values(): + stats[macro.resource_type] += 1 return stats @@ -183,24 +184,34 @@ def compile_node(self, node, manifest, extra_context=None): return injected_node - def write_graph_file(self, linker, manifest): + def write_graph_file(self, linker: Linker, manifest: Manifest): filename = graph_file_name graph_path = os.path.join(self.config.target_path, filename) if 
dbt.flags.WRITE_JSON: linker.write_graph(graph_path, manifest) - def link_node(self, linker, node, manifest): + def link_node( + self, linker: Linker, node: NonSourceNode, manifest: Manifest + ): linker.add_node(node.unique_id) for dependency in node.depends_on_nodes: - if manifest.nodes.get(dependency): + if dependency in manifest.nodes: linker.dependency( node.unique_id, - (manifest.nodes.get(dependency).unique_id)) + (manifest.nodes[dependency].unique_id) + ) + elif dependency in manifest.sources: + linker.dependency( + node.unique_id, + (manifest.sources[dependency].unique_id) + ) else: dbt.exceptions.dependency_not_found(node, dependency) - def link_graph(self, linker, manifest): + def link_graph(self, linker: Linker, manifest: Manifest): + for source in manifest.sources.values(): + linker.add_node(source.unique_id) for node in manifest.nodes.values(): self.link_node(linker, node, manifest) @@ -209,7 +220,7 @@ def link_graph(self, linker, manifest): if cycle: raise RuntimeError("Found a cycle: {}".format(cycle)) - def compile(self, manifest, write=True): + def compile(self, manifest: Manifest, write=True): linker = Linker() self.link_graph(linker, manifest) diff --git a/core/dbt/config/__init__.py b/core/dbt/config/__init__.py index 35e61970352..9681c880f6e 100644 --- a/core/dbt/config/__init__.py +++ b/core/dbt/config/__init__.py @@ -2,4 +2,3 @@ from .profile import Profile, PROFILES_DIR, read_user_config # noqa from .project import Project # noqa from .runtime import RuntimeConfig, UnsetProfileConfig # noqa -from .renderer import ConfigRenderer # noqa diff --git a/core/dbt/config/profile.py b/core/dbt/config/profile.py index 500257286af..b3c145348ca 100644 --- a/core/dbt/config/profile.py +++ b/core/dbt/config/profile.py @@ -16,7 +16,7 @@ from dbt.logger import GLOBAL_LOGGER as logger from dbt.utils import coerce_dict_str -from .renderer import ConfigRenderer +from .renderer import ProfileRenderer DEFAULT_THREADS = 1 DEFAULT_PROFILES_DIR = os.path.join(os.path.expanduser('~'), '.dbt') @@ -240,7 +240,7 @@ def render_profile( raw_profile: Dict[str, Any], profile_name: str, target_override: Optional[str], - renderer: ConfigRenderer, + renderer: ProfileRenderer, ) -> Tuple[str, Dict[str, Any]]: """This is a containment zone for the hateful way we're rendering profiles. 
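The compilation.py changes above make sources first-class members of the graph: `link_node` now resolves a dependency from `manifest.nodes` or, failing that, from `manifest.sources` before raising a dependency error, and `link_graph` registers every source as a graph node. A minimal standalone sketch of that lookup order, using plain dicts and hypothetical unique IDs in place of the real `Manifest` and `Linker`:

```python
from typing import Dict, List, Tuple


def link_node(
    unique_id: str,
    depends_on: List[str],
    nodes: Dict[str, object],
    sources: Dict[str, object],
    edges: List[Tuple[str, str]],
) -> None:
    """Mirror the lookup order in Compiler.link_node: try nodes first,
    then sources, otherwise the dependency is missing."""
    for dependency in depends_on:
        if dependency in nodes:
            edges.append((unique_id, dependency))
        elif dependency in sources:
            edges.append((unique_id, dependency))
        else:
            raise KeyError(
                f'dependency {dependency} not found for {unique_id}'
            )


# Hypothetical manifest contents: one model and one source.
nodes = {'model.jaffle_shop.stg_orders': object()}
sources = {'source.jaffle_shop.raw.orders': object()}
edges: List[Tuple[str, str]] = []

link_node(
    'model.jaffle_shop.orders',
    ['model.jaffle_shop.stg_orders', 'source.jaffle_shop.raw.orders'],
    nodes,
    sources,
    edges,
)
print(edges)
# [('model.jaffle_shop.orders', 'model.jaffle_shop.stg_orders'),
#  ('model.jaffle_shop.orders', 'source.jaffle_shop.raw.orders')]
```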
@@ -268,7 +268,7 @@ def render_profile( raw_profile, profile_name, target_name ) - profile_data = renderer.render_profile_data(raw_profile_data) + profile_data = renderer.render_data(raw_profile_data) return target_name, profile_data @classmethod @@ -276,7 +276,7 @@ def from_raw_profile_info( cls, raw_profile: Dict[str, Any], profile_name: str, - renderer: ConfigRenderer, + renderer: ProfileRenderer, user_cfg: Optional[Dict[str, Any]] = None, target_override: Optional[str] = None, threads_override: Optional[int] = None, @@ -330,7 +330,7 @@ def from_raw_profiles( cls, raw_profiles: Dict[str, Any], profile_name: str, - renderer: ConfigRenderer, + renderer: ProfileRenderer, target_override: Optional[str] = None, threads_override: Optional[int] = None, ) -> 'Profile': @@ -380,7 +380,7 @@ def from_raw_profiles( def render_from_args( cls, args: Any, - renderer: ConfigRenderer, + renderer: ProfileRenderer, project_profile_name: Optional[str], ) -> 'Profile': """Given the raw profiles as read from disk and the name of the desired diff --git a/core/dbt/config/project.py b/core/dbt/config/project.py index 017f80ff5ba..157546093db 100644 --- a/core/dbt/config/project.py +++ b/core/dbt/config/project.py @@ -1,7 +1,11 @@ from copy import deepcopy from dataclasses import dataclass, field from itertools import chain -from typing import List, Dict, Any, Optional, TypeVar, Union, Tuple, Callable +from typing import ( + List, Dict, Any, Optional, TypeVar, Union, Tuple, Callable, Mapping +) +from typing_extensions import Protocol + import hashlib import os @@ -14,31 +18,24 @@ from dbt.exceptions import RecursionException from dbt.exceptions import SemverException from dbt.exceptions import validator_error_message -from dbt.exceptions import warn_or_error from dbt.helper_types import NoValue from dbt.semver import VersionSpecifier from dbt.semver import versions_compatible from dbt.version import get_installed_version -from dbt.ui import printer -from dbt.utils import deep_map -from dbt.source_config import SourceConfig +from dbt.utils import deep_map, MultiDict +from dbt.legacy_config_updater import ConfigUpdater, IsFQNResource from dbt.contracts.project import ( - Project as ProjectContract, + ProjectV1 as ProjectV1Contract, + ProjectV2 as ProjectV2Contract, + parse_project_config, SemverString, ) from dbt.contracts.project import PackageConfig from hologram import ValidationError -from .renderer import ConfigRenderer - - -UNUSED_RESOURCE_CONFIGURATION_PATH_MESSAGE = """\ -WARNING: Configuration paths exist in your dbt_project.yml file which do not \ -apply to any resources. 
-There are {} unused configuration paths:\n{} -""" +from .renderer import DbtProjectYamlRenderer INVALID_VERSION_ERROR = """\ @@ -94,32 +91,6 @@ def _load_yaml(path): return load_yaml_text(contents) -def _get_config_paths(config, path=(), paths=None): - if paths is None: - paths = set() - - for key, value in config.items(): - if isinstance(value, dict): - if key in SourceConfig.ConfigKeys: - if path not in paths: - paths.add(path) - else: - _get_config_paths(value, path + (key,), paths) - else: - if path not in paths: - paths.add(path) - - return frozenset(paths) - - -def _is_config_used(path, fqns): - if fqns: - for fqn in fqns: - if len(path) <= len(fqn) and fqn[:len(path)] == path: - return True - return False - - def package_data_from_root(project_root): package_filepath = resolve_path_from_base( 'packages.yml', project_root @@ -204,7 +175,7 @@ def _raw_project_from(project_root: str) -> Dict[str, Any]: def _query_comment_from_cfg( - cfg_query_comment: Union[QueryComment, NoValue, str] + cfg_query_comment: Union[QueryComment, NoValue, str, None] ) -> QueryComment: if not cfg_query_comment: return QueryComment(comment='') @@ -220,6 +191,9 @@ def _query_comment_from_cfg( @dataclass class PartialProject: + config_version: int = field(metadata=dict( + description='The version of the configuration file format' + )) profile_name: Optional[str] = field(metadata=dict( description='The unrendered profile name in the project, if set' )) @@ -249,12 +223,73 @@ def render_profile_name(self, renderer) -> Optional[str]: return renderer.render_value(self.profile_name) +class VarProvider(Protocol): + """Var providers are tied to a particular Project.""" + def vars_for( + self, node: IsFQNResource, adapter_type: str + ) -> Mapping[str, Any]: + raise NotImplementedError( + f'vars_for not implemented for {type(self)}!' + ) + + def to_dict(self): + raise NotImplementedError( + f'to_dict not implemented for {type(self)}!' 
+ ) + + +class V1VarProvider(VarProvider): + def __init__( + self, + models: Dict[str, Any], + seeds: Dict[str, Any], + snapshots: Dict[str, Any], + ) -> None: + self.models = models + self.seeds = seeds + self.snapshots = snapshots + self.sources: Dict[str, Any] = {} + + def vars_for( + self, node: IsFQNResource, adapter_type: str + ) -> Mapping[str, Any]: + updater = ConfigUpdater(adapter_type) + return updater.get_project_config(node, self).get('vars', {}) + + def to_dict(self): + raise ValidationError( + 'to_dict was called on a v1 vars, but it should only be called ' + 'on v2 vars' + ) + + +class V2VarProvider(VarProvider): + def __init__( + self, + vars: Dict[str, Dict[str, Any]] + ) -> None: + self.vars = vars + + def vars_for( + self, node: IsFQNResource, adapter_type: str + ) -> Mapping[str, Any]: + # in v2, vars are only either project or globally scoped + + merged = MultiDict([self.vars]) + if node.package_name in self.vars: + merged.add(self.vars.get(node.package_name, {})) + return merged + + def to_dict(self): + return self.vars + + @dataclass class Project: project_name: str version: Union[SemverString, float] project_root: str - profile_name: str + profile_name: Optional[str] source_paths: List[str] macro_paths: List[str] data_paths: List[str] @@ -272,9 +307,12 @@ class Project: on_run_end: List[str] seeds: Dict[str, Any] snapshots: Dict[str, Any] + sources: Dict[str, Any] + vars: VarProvider dbt_version: List[VersionSpecifier] packages: Dict[str, Any] query_comment: QueryComment + config_version: int @property def all_source_paths(self) -> List[str]: @@ -333,7 +371,7 @@ def from_project_config( project=project_dict ) try: - cfg = ProjectContract.from_dict(project_dict) + cfg = parse_project_config(project_dict) except ValidationError as e: raise DbtProjectError(validator_error_message(e)) from e @@ -375,9 +413,37 @@ def from_project_config( if cfg.quoting is not None: quoting = cfg.quoting.to_dict() - models: Dict[str, Any] = cfg.models - seeds: Dict[str, Any] = cfg.seeds - snapshots: Dict[str, Any] = cfg.snapshots + models: Dict[str, Any] + seeds: Dict[str, Any] + snapshots: Dict[str, Any] + sources: Dict[str, Any] + vars_value: VarProvider + + if cfg.config_version == 1: + assert isinstance(cfg, ProjectV1Contract) + # extract everything named 'vars' + models = cfg.models + seeds = cfg.seeds + snapshots = cfg.snapshots + sources = {} + vars_value = V1VarProvider( + models=models, seeds=seeds, snapshots=snapshots + ) + elif cfg.config_version == 2: + assert isinstance(cfg, ProjectV2Contract) + models = cfg.models + seeds = cfg.seeds + snapshots = cfg.snapshots + sources = cfg.sources + if cfg.vars is None: + vars_dict: Dict[str, Any] = {} + else: + vars_dict = cfg.vars + vars_value = V2VarProvider(vars_dict) + else: + raise ValidationError( + f'Got unsupported config_version={cfg.config_version}' + ) on_run_start: List[str] = value_or(cfg.on_run_start, []) on_run_end: List[str] = value_or(cfg.on_run_end, []) @@ -424,6 +490,9 @@ def from_project_config( dbt_version=dbt_version, packages=packages, query_comment=query_comment, + sources=sources, + vars=vars_value, + config_version=cfg.config_version, ) # sanity check - this means an internal issue project.validate() @@ -472,6 +541,7 @@ def to_project_config(self, with_packages=False): 'require-dbt-version': [ v.to_version_string() for v in self.dbt_version ], + 'config-version': self.config_version, }) if self.query_comment: result['query-comment'] = self.query_comment.to_dict() @@ -479,11 +549,17 @@ def 
to_project_config(self, with_packages=False): if with_packages: result.update(self.packages.to_dict()) + if self.config_version == 2: + result.update({ + 'sources': self.sources, + 'vars': self.vars.to_dict() + }) + return result def validate(self): try: - ProjectContract.from_dict(self.to_project_config()) + ProjectV2Contract.from_dict(self.to_project_config()) except ValidationError as e: raise DbtProjectError(validator_error_message(e)) from e @@ -493,12 +569,18 @@ def render_from_dict( project_root: str, project_dict: Dict[str, Any], packages_dict: Dict[str, Any], - renderer: ConfigRenderer, + renderer: DbtProjectYamlRenderer, ) -> 'Project': - rendered_project = renderer.render_project(project_dict) + rendered_project = renderer.render_data(project_dict) rendered_project['project-root'] = project_root - rendered_packages = renderer.render_packages_data(packages_dict) - return cls.from_project_config(rendered_project, rendered_packages) + package_renderer = renderer.get_package_renderer() + rendered_packages = package_renderer.render_data(packages_dict) + try: + return cls.from_project_config(rendered_project, rendered_packages) + except DbtProjectError as exc: + if exc.path is None: + exc.path = os.path.join(project_root, 'dbt_project.yml') + raise @classmethod def partial_load( @@ -509,8 +591,10 @@ def partial_load( project_name = project_dict.get('name') profile_name = project_dict.get('profile') + config_version = project_dict.get('config-version', 1) return PartialProject( + config_version=config_version, profile_name=profile_name, project_name=project_name, project_root=project_root, @@ -519,55 +603,15 @@ def partial_load( @classmethod def from_project_root( - cls, project_root: str, renderer: ConfigRenderer + cls, project_root: str, renderer: DbtProjectYamlRenderer ) -> 'Project': partial = cls.partial_load(project_root) + renderer.version = partial.config_version return partial.render(renderer) def hashed_name(self): return hashlib.md5(self.project_name.encode('utf-8')).hexdigest() - def get_resource_config_paths(self): - """Return a dictionary with 'seeds' and 'models' keys whose values are - lists of lists of strings, where each inner list of strings represents - a configured path in the resource. - """ - return { - 'models': _get_config_paths(self.models), - 'seeds': _get_config_paths(self.seeds), - 'snapshots': _get_config_paths(self.snapshots), - } - - def get_unused_resource_config_paths(self, resource_fqns, disabled): - """Return a list of lists of strings, where each inner list of strings - represents a type + FQN path of a resource configuration that is not - used. 
- """ - disabled_fqns = frozenset(tuple(fqn) for fqn in disabled) - resource_config_paths = self.get_resource_config_paths() - unused_resource_config_paths = [] - for resource_type, config_paths in resource_config_paths.items(): - used_fqns = resource_fqns.get(resource_type, frozenset()) - fqns = used_fqns | disabled_fqns - - for config_path in config_paths: - if not _is_config_used(config_path, fqns): - unused_resource_config_paths.append( - (resource_type,) + config_path - ) - return unused_resource_config_paths - - def warn_for_unused_resource_config_paths(self, resource_fqns, disabled): - unused = self.get_unused_resource_config_paths(resource_fqns, disabled) - if len(unused) == 0: - return - - msg = UNUSED_RESOURCE_CONFIGURATION_PATH_MESSAGE.format( - len(unused), - '\n'.join('- {}'.format('.'.join(u)) for u in unused) - ) - warn_or_error(msg, log_fmt=printer.yellow('{}')) - def validate_version(self): """Ensure this package works with the installed version of dbt.""" installed = get_installed_version() @@ -589,3 +633,55 @@ def validate_version(self): ] ) raise DbtProjectError(msg) + + def as_v1(self): + if self.config_version == 1: + return self + + dct = self.to_project_config() + + mutated = deepcopy(dct) + # remove sources, it doesn't exist + mutated.pop('sources', None) + + common_config_keys = ['models', 'seeds', 'snapshots'] + + if 'vars' in dct and isinstance(dct['vars'], dict): + # stuff any 'vars' entries into the old-style + # models/seeds/snapshots dicts + for project_name, items in dct['vars'].items(): + if not isinstance(items, dict): + # can't translate top-level vars + continue + for cfgkey in ['models', 'seeds', 'snapshots']: + if project_name not in mutated[cfgkey]: + mutated[cfgkey][project_name] = {} + project_type_cfg = mutated[cfgkey][project_name] + if 'vars' not in project_type_cfg: + project_type_cfg['vars'] = {} + mutated[cfgkey][project_name]['vars'].update(items) + # remove this from the v1 form + mutated.pop('vars') + # ok, now we want to look through all the existing cfgkeys and mirror + # it, except expand the '+' prefix. + for cfgkey in common_config_keys: + if cfgkey not in dct: + continue + + mutated[cfgkey] = _flatten_config(dct[cfgkey]) + mutated['config-version'] = 1 + project = Project.from_project_config(mutated) + project.packages = self.packages + return project + + +def _flatten_config(dct: Dict[str, Any]): + result = {} + for key, value in dct.items(): + if isinstance(value, dict) and not key.startswith('+'): + result[key] = _flatten_config(value) + else: + if key.startswith('+'): + key = key[1:] + result[key] = value + return result diff --git a/core/dbt/config/renderer.py b/core/dbt/config/renderer.py index 1b269176be6..794e9b01871 100644 --- a/core/dbt/config/renderer.py +++ b/core/dbt/config/renderer.py @@ -1,137 +1,201 @@ -from typing import Dict, Any +from typing import Dict, Any, Tuple, Optional, Union from dbt.clients.jinja import get_rendered -from dbt.exceptions import DbtProfileError from dbt.exceptions import DbtProjectError from dbt.exceptions import RecursionException +from dbt.node_types import NodeType from dbt.utils import deep_map -class ConfigRenderer: - """A renderer provides configuration rendering for a given set of cli - variables and a render type. - """ - def __init__(self, context: Dict[str, Any]): +Keypath = Tuple[Union[str, int], ...] 
+ + +class BaseRenderer: + def __init__(self, context: Dict[str, Any]) -> None: self.context = context - @staticmethod - def _is_deferred_render(keypath): + @property + def name(self): + return 'Rendering' + + def should_render_keypath(self, keypath: Keypath) -> bool: + return True + + def render_entry(self, value: Any, keypath: Keypath) -> Any: + if not self.should_render_keypath(keypath): + return value + + return self.render_value(value, keypath) + + def render_value( + self, value: Any, keypath: Optional[Keypath] = None + ) -> Any: + # keypath is ignored. + # if it wasn't read as a string, ignore it + if not isinstance(value, str): + return value + return get_rendered(value, self.context, native=True) + + def render_data( + self, data: Dict[str, Any] + ) -> Dict[str, Any]: + try: + return deep_map(self.render_entry, data) + except RecursionException: + raise DbtProjectError( + f'Cycle detected: {self.name} input has a reference to itself', + project=data + ) + + +class DbtProjectYamlRenderer(BaseRenderer): + def __init__( + self, context: Dict[str, Any], version: Optional[int] = None + ) -> None: + super().__init__(context) + self.version: Optional[int] = version + + @property + def name(self): + 'Project config' + + def get_package_renderer(self) -> BaseRenderer: + return PackageRenderer(self.context) + + def should_render_keypath_v1(self, keypath: Keypath) -> bool: if not keypath: - return False + return True first = keypath[0] # run hooks if first in {'on-run-start', 'on-run-end', 'query-comment'}: - return True + return False # models have two things to avoid - if first in {'seeds', 'models', 'snapshots'}: + if first in {'seeds', 'models', 'snapshots', 'seeds'}: # model-level hooks if 'pre-hook' in keypath or 'post-hook' in keypath: - return True + return False # model-level 'vars' declarations if 'vars' in keypath: - return True + return False - return False + return True - def _render_project_entry(self, value, keypath): - """Render an entry, in case it's jinja. This is meant to be passed to - deep_map. + def should_render_keypath_v2(self, keypath: Keypath) -> bool: + if not keypath: + return True - If the parsed entry is a string and has the name 'port', this will - attempt to cast it to an int, and on failure will return the parsed - string. + first = keypath[0] + # run hooks are not rendered + if first in {'on-run-start', 'on-run-end', 'query-comment'}: + return False - :param value Any: The value to potentially render - :param key str: The key to convert on. - :return Any: The rendered entry. - """ - # the project name is never rendered - if keypath == ('name',): - return value - # query comments and hooks should be treated as raw sql, they'll get - # rendered later. - # Same goes for 'vars' declarations inside 'models'/'seeds' - if self._is_deferred_render(keypath): - return value + # don't render vars blocks until runtime + if first == 'vars': + return False - return self.render_value(value) + if first in {'seeds', 'models', 'snapshots', 'seeds'}: + # model-level hooks + if 'pre-hook' in keypath or 'post-hook' in keypath: + return False + # model-level 'vars' declarations + if 'vars' in keypath: + return False - def render_value(self, value, keypath=None): - # keypath is ignored. 
- # if it wasn't read as a string, ignore it - if not isinstance(value, str): - return value - return str(get_rendered(value, self.context)) - - def _render_profile_data(self, value, keypath): - result = self.render_value(value) - if len(keypath) == 1 and keypath[-1] == 'port': - try: - result = int(result) - except ValueError: - # let the validator or connection handle this - pass - return result - - @staticmethod - def _is_schema_test(keypath) -> bool: - # we got passed an UnparsedSourceDefinition - if len(keypath) > 2 and keypath[0] == 'tables': - if keypath[2] == 'tests': - return True - elif keypath[2] == 'columns': - if len(keypath) > 4 and keypath[4] == 'tests': - return True - return False + return True - def _render_schema_source_data(self, value, keypath): - # things to not render: - # - descriptions - # - test arguments - if len(keypath) > 0 and keypath[-1] == 'description': - return value - elif self._is_schema_test(keypath): - return value + def should_render_keypath(self, keypath: Keypath) -> bool: + if self.version == 2: + return self.should_render_keypath_v2(keypath) + else: # could be None + return self.should_render_keypath_v1(keypath) - return self.render_value(value) + def render_data( + self, data: Dict[str, Any] + ) -> Dict[str, Any]: + if self.version is None: + self.version = data.get('current-version') - def render_project(self, as_parsed): - """Render the parsed data, returning a new dict (or whatever was read). - """ try: - return deep_map(self._render_project_entry, as_parsed) + return deep_map(self.render_entry, data) except RecursionException: raise DbtProjectError( - 'Cycle detected: Project input has a reference to itself', - project=as_parsed + f'Cycle detected: {self.name} input has a reference to itself', + project=data ) - def render_profile_data(self, as_parsed): - """Render the chosen profile entry, as it was parsed.""" - try: - return deep_map(self._render_profile_data, as_parsed) - except RecursionException: - raise DbtProfileError( - 'Cycle detected: Profile input has a reference to itself', - project=as_parsed - ) - def render_schema_source(self, as_parsed): - try: - return deep_map(self._render_schema_source_data, as_parsed) - except RecursionException: - raise DbtProfileError( - 'Cycle detected: schema.yml input has a reference to itself', - project=as_parsed - ) +class ProfileRenderer(BaseRenderer): + @property + def name(self): + 'Profile' - def render_packages_data(self, as_parsed): - try: - return deep_map(self.render_value, as_parsed) - except RecursionException: - raise DbtProfileError( - 'Cycle detected: schema.yml input has a reference to itself', - project=as_parsed - ) + +class SchemaYamlRenderer(BaseRenderer): + DOCUMENTABLE_NODES = frozenset( + n.pluralize() for n in NodeType.documentable() + ) + + @property + def name(self): + return 'Rendering yaml' + + def _is_norender_key(self, keypath: Keypath) -> bool: + """ + models: + - name: blah + - description: blah + tests: ... + - columns: + - name: + - description: blah + tests: ... 
+ + Return True if it's tests or description - those aren't rendered + """ + if len(keypath) >= 2 and keypath[1] in ('tests', 'description'): + return True + + if ( + len(keypath) >= 4 and + keypath[1] == 'columns' and + keypath[3] in ('tests', 'description') + ): + return True + + return False + + # don't render descriptions or test keyword arguments + def should_render_keypath(self, keypath: Keypath) -> bool: + if len(keypath) < 2: + return True + + if keypath[0] not in self.DOCUMENTABLE_NODES: + return True + + if len(keypath) < 3: + return True + + if keypath[0] == NodeType.Source.pluralize(): + if keypath[2] == 'description': + return False + if keypath[2] == 'tables': + if self._is_norender_key(keypath[3:]): + return False + elif keypath[0] == NodeType.Macro.pluralize(): + if keypath[2] == 'arguments': + if self._is_norender_key(keypath[3:]): + return False + elif self._is_norender_key(keypath[1:]): + return False + else: # keypath[0] in self.DOCUMENTABLE_NODES: + if self._is_norender_key(keypath[1:]): + return False + return True + + +class PackageRenderer(BaseRenderer): + @property + def name(self): + return 'Packages config' diff --git a/core/dbt/config/runtime.py b/core/dbt/config/runtime.py index 28c656783a2..c15c1cbfe40 100644 --- a/core/dbt/config/runtime.py +++ b/core/dbt/config/runtime.py @@ -1,37 +1,59 @@ +import itertools +import os from copy import deepcopy from dataclasses import dataclass, fields -import os -from typing import Dict, Any, Type +from pathlib import Path +from typing import ( + Dict, Any, Optional, Mapping, Iterator, Iterable, Tuple, List, MutableSet, + Type +) from .profile import Profile from .project import Project -from .renderer import ConfigRenderer +from .renderer import DbtProjectYamlRenderer, ProfileRenderer from dbt import tracking from dbt.adapters.factory import get_relation_class_by_name +from dbt.helper_types import FQNPath, PathSet from dbt.context.base import generate_base_context from dbt.context.target import generate_target_context from dbt.contracts.connection import AdapterRequiredConfig, Credentials from dbt.contracts.graph.manifest import ManifestMetadata -from dbt.contracts.project import Configuration, UserConfig from dbt.logger import GLOBAL_LOGGER as logger -from dbt.exceptions import DbtProjectError, RuntimeException, DbtProfileError -from dbt.exceptions import validator_error_message +from dbt.ui import printer from dbt.utils import parse_cli_vars +from dbt.contracts.project import Configuration, UserConfig +from dbt.exceptions import ( + RuntimeException, + DbtProfileError, + DbtProjectError, + validator_error_message, + warn_or_error, + raise_compiler_error +) +from dbt.include.global_project import PACKAGES +from dbt.legacy_config_updater import ConfigUpdater + from hologram import ValidationError @dataclass class RuntimeConfig(Project, Profile, AdapterRequiredConfig): args: Any + profile_name: str cli_vars: Dict[str, Any] + dependencies: Optional[Mapping[str, 'RuntimeConfig']] = None def __post_init__(self): self.validate() @classmethod def from_parts( - cls, project: Project, profile: Profile, args: Any, + cls, + project: Project, + profile: Profile, + args: Any, + dependencies: Optional[Mapping[str, 'RuntimeConfig']] = None, ) -> 'RuntimeConfig': """Instantiate a RuntimeConfig from its components. 
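For reference while reading the `RuntimeConfig` changes below (which add `sources`, `vars`, and `config_version` to the config), this is roughly what a `config-version: 2` project file looks like with the new top-level `vars` and `sources` sections and `+`-prefixed configs. The project, source, and variable names are invented, and the snippet only parses the YAML to show its shape (it assumes PyYAML is available):

```python
import yaml  # assumes PyYAML is installed

# A hypothetical config-version 2 dbt_project.yml, exercising the new
# top-level 'vars' and 'sources' sections and '+'-prefixed configs.
DBT_PROJECT_YML = """
name: my_project
version: '1.0'
config-version: 2

vars:
  event_type: activation        # globally scoped var
  my_project:
    start_date: '2020-01-01'    # project-scoped var

sources:
  my_project:
    raw_events:
      +enabled: true

models:
  my_project:
    staging:
      +materialized: view
"""

project = yaml.safe_load(DBT_PROJECT_YML)
print(project['config-version'])         # 2
print(project['vars']['my_project'])     # project-scoped vars
print(project['sources']['my_project'])  # source configs, incl. enabled
```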
@@ -72,6 +94,9 @@ def from_parts( dbt_version=project.dbt_version, packages=project.packages, query_comment=project.query_comment, + sources=project.sources, + vars=project.vars, + config_version=project.config_version, profile_name=profile.profile_name, target_name=profile.target_name, config=profile.config, @@ -79,6 +104,7 @@ def from_parts( credentials=profile.credentials, args=args, cli_vars=cli_vars, + dependencies=dependencies, ) def new_project(self, project_root: str) -> 'RuntimeConfig': @@ -95,7 +121,7 @@ def new_project(self, project_root: str) -> 'RuntimeConfig': profile.validate() # load the new project and its packages. Don't pass cli variables. - renderer = ConfigRenderer(generate_target_context(profile, {})) + renderer = DbtProjectYamlRenderer(generate_target_context(profile, {})) project = Project.from_project_root(project_root, renderer) @@ -135,6 +161,41 @@ def validate(self): if getattr(self.args, 'version_check', False): self.validate_version() + @classmethod + def _get_rendered_profile( + cls, + args: Any, + profile_renderer: ProfileRenderer, + profile_name: Optional[str], + ) -> Profile: + return Profile.render_from_args( + args, profile_renderer, profile_name + ) + + @classmethod + def collect_parts( + cls: Type['RuntimeConfig'], args: Any + ) -> Tuple[Project, Profile]: + # profile_name from the project + project_root = args.project_dir if args.project_dir else os.getcwd() + partial = Project.partial_load(project_root) + + # build the profile using the base renderer and the one fact we know + cli_vars: Dict[str, Any] = parse_cli_vars(getattr(args, 'vars', '{}')) + profile_renderer = ProfileRenderer(generate_base_context(cli_vars)) + profile_name = partial.render_profile_name(profile_renderer) + + profile = cls._get_rendered_profile( + args, profile_renderer, profile_name + ) + + # get a new renderer using our target information and render the + # project + ctx = generate_target_context(profile, cli_vars) + project_renderer = DbtProjectYamlRenderer(ctx, partial.config_version) + project = partial.render(project_renderer) + return (project, profile) + @classmethod def from_args(cls, args: Any) -> 'RuntimeConfig': """Given arguments, read in dbt_project.yml from the current directory, @@ -146,21 +207,7 @@ def from_args(cls, args: Any) -> 'RuntimeConfig': :raises DbtProfileError: If the profile is invalid or missing. :raises ValidationException: If the cli variables are invalid. 
""" - # profile_name from the project - partial = Project.partial_load(os.getcwd()) - - # build the profile using the base renderer and the one fact we know - cli_vars: Dict[str, Any] = parse_cli_vars(getattr(args, 'vars', '{}')) - renderer = ConfigRenderer(generate_base_context(cli_vars=cli_vars)) - profile_name = partial.render_profile_name(renderer) - profile = Profile.render_from_args( - args, renderer, profile_name - ) - - # get a new renderer using our target information and render the - # project - renderer = ConfigRenderer(generate_target_context(profile, cli_vars)) - project = partial.render(renderer) + project, profile = cls.collect_parts(args) return cls.from_parts( project=project, @@ -174,6 +221,156 @@ def get_metadata(self) -> ManifestMetadata: adapter_type=self.credentials.type ) + def _get_v2_config_paths( + self, + config, + path: FQNPath, + paths: MutableSet[FQNPath], + ) -> PathSet: + for key, value in config.items(): + if isinstance(value, dict) and not key.startswith('+'): + self._get_v2_config_paths(value, path + (key,), paths) + else: + paths.add(path) + return frozenset(paths) + + def _get_v1_config_paths( + self, + config: Dict[str, Any], + path: FQNPath, + paths: MutableSet[FQNPath], + ) -> PathSet: + keys = ConfigUpdater(self.credentials.type).ConfigKeys + + for key, value in config.items(): + if isinstance(value, dict): + if key in keys: + if path not in paths: + paths.add(path) + else: + self._get_v1_config_paths(value, path + (key,), paths) + else: + if path not in paths: + paths.add(path) + + return frozenset(paths) + + def _get_config_paths( + self, + config: Dict[str, Any], + path: FQNPath = (), + paths: Optional[MutableSet[FQNPath]] = None, + ) -> PathSet: + if paths is None: + paths = set() + + if self.config_version == 2: + return self._get_v2_config_paths(config, path, paths) + else: + return self._get_v1_config_paths(config, path, paths) + + def get_resource_config_paths(self) -> Dict[str, PathSet]: + """Return a dictionary with 'seeds' and 'models' keys whose values are + lists of lists of strings, where each inner list of strings represents + a configured path in the resource. + """ + return { + 'models': self._get_config_paths(self.models), + 'seeds': self._get_config_paths(self.seeds), + 'snapshots': self._get_config_paths(self.snapshots), + 'sources': self._get_config_paths(self.sources), + } + + def get_unused_resource_config_paths( + self, + resource_fqns: Mapping[str, PathSet], + disabled: PathSet, + ) -> List[FQNPath]: + """Return a list of lists of strings, where each inner list of strings + represents a type + FQN path of a resource configuration that is not + used. 
+ """ + disabled_fqns = frozenset(tuple(fqn) for fqn in disabled) + resource_config_paths = self.get_resource_config_paths() + unused_resource_config_paths = [] + for resource_type, config_paths in resource_config_paths.items(): + used_fqns = resource_fqns.get(resource_type, frozenset()) + fqns = used_fqns | disabled_fqns + + for config_path in config_paths: + if not _is_config_used(config_path, fqns): + unused_resource_config_paths.append( + (resource_type,) + config_path + ) + return unused_resource_config_paths + + def warn_for_unused_resource_config_paths( + self, + resource_fqns: Mapping[str, PathSet], + disabled: PathSet, + ) -> None: + unused = self.get_unused_resource_config_paths(resource_fqns, disabled) + if len(unused) == 0: + return + + msg = UNUSED_RESOURCE_CONFIGURATION_PATH_MESSAGE.format( + len(unused), + '\n'.join('- {}'.format('.'.join(u)) for u in unused) + ) + warn_or_error(msg, log_fmt=printer.yellow('{}')) + + def load_dependencies(self) -> Mapping[str, 'RuntimeConfig']: + if self.dependencies is None: + all_projects = {self.project_name: self} + project_paths = itertools.chain( + map(Path, PACKAGES.values()), + self._get_project_directories() + ) + for project_name, project in self.load_projects(project_paths): + if project_name in all_projects: + raise_compiler_error( + f'dbt found more than one package with the name ' + f'"{project_name}" included in this project. Package ' + f'names must be unique in a project. Please rename ' + f'one of these packages.' + ) + all_projects[project_name] = project + self.dependencies = all_projects + return self.dependencies + + def load_projects( + self, paths: Iterable[Path] + ) -> Iterator[Tuple[str, 'RuntimeConfig']]: + for path in paths: + try: + project = self.new_project(str(path)) + except DbtProjectError as e: + raise DbtProjectError( + 'Failed to read package at {}: {}' + .format(path, e) + ) from e + else: + yield project.project_name, project + + def _get_project_directories(self) -> Iterator[Path]: + root = Path(self.project_root) / self.modules_path + + if root.exists(): + for path in root.iterdir(): + if path.is_dir() and not path.name.startswith('__'): + yield path + + def as_v1(self): + if self.config_version == 1: + return self + + return self.from_parts( + project=Project.as_v1(self), + profile=self, + args=self.args, + dependencies=self.dependencies, + ) + class UnsetCredentials(Credentials): def __init__(self): @@ -251,7 +448,11 @@ def to_target_dict(self): @classmethod def from_parts( - cls, project: Project, profile: Any, args: Any, + cls, + project: Project, + profile: Profile, + args: Any, + dependencies: Optional[Mapping[str, 'RuntimeConfig']] = None, ) -> 'RuntimeConfig': """Instantiate a RuntimeConfig from its components. @@ -286,6 +487,9 @@ def from_parts( dbt_version=project.dbt_version, packages=project.packages, query_comment=project.query_comment, + sources=project.sources, + vars=project.vars, + config_version=project.config_version, profile_name='', target_name='', config=UnsetConfig(), @@ -293,32 +497,20 @@ def from_parts( credentials=UnsetCredentials(), args=args, cli_vars=cli_vars, + dependencies=dependencies, ) @classmethod - def from_args(cls: Type[RuntimeConfig], args: Any) -> 'RuntimeConfig': - """Given arguments, read in dbt_project.yml from the current directory, - read in packages.yml if it exists, and use them to find the profile to - load. - - :param args: The arguments as parsed from the cli. - :raises DbtProjectError: If the project is invalid or missing. 
- :raises DbtProfileError: If the profile is invalid or missing. - :raises ValidationException: If the cli variables are invalid. - """ - # profile_name from the project - partial = Project.partial_load(os.getcwd()) - - # build the profile using the base renderer and the one fact we know - cli_vars: Dict[str, Any] = parse_cli_vars(getattr(args, 'vars', '{}')) - renderer = ConfigRenderer(generate_base_context(cli_vars=cli_vars)) - profile_name = partial.render_profile_name(renderer) - + def _get_rendered_profile( + cls, + args: Any, + profile_renderer: ProfileRenderer, + profile_name: Optional[str], + ) -> Profile: try: profile = Profile.render_from_args( - args, renderer, profile_name + args, profile_renderer, profile_name ) - cls = RuntimeConfig # we can return a real runtime config, do that except (DbtProjectError, DbtProfileError) as exc: logger.debug( 'Profile not loaded due to error: {}', exc, exc_info=True @@ -331,14 +523,41 @@ def from_args(cls: Type[RuntimeConfig], args: Any) -> 'RuntimeConfig': profile = UnsetProfile() # disable anonymous usage statistics tracking.do_not_track() + return profile - # get a new renderer using our target information and render the - # project - renderer = ConfigRenderer(generate_target_context(profile, cli_vars)) - project = partial.render(renderer) + @classmethod + def from_args(cls: Type[RuntimeConfig], args: Any) -> 'RuntimeConfig': + """Given arguments, read in dbt_project.yml from the current directory, + read in packages.yml if it exists, and use them to find the profile to + load. + + :param args: The arguments as parsed from the cli. + :raises DbtProjectError: If the project is invalid or missing. + :raises DbtProfileError: If the profile is invalid or missing. + :raises ValidationException: If the cli variables are invalid. + """ + project, profile = cls.collect_parts(args) + if not isinstance(profile, UnsetProfile): + # if it's a real profile, return a real config + cls = RuntimeConfig return cls.from_parts( project=project, profile=profile, - args=args, + args=args ) + + +UNUSED_RESOURCE_CONFIGURATION_PATH_MESSAGE = """\ +WARNING: Configuration paths exist in your dbt_project.yml file which do not \ +apply to any resources. +There are {} unused configuration paths:\n{} +""" + + +def _is_config_used(path, fqns): + if fqns: + for fqn in fqns: + if len(path) <= len(fqn) and fqn[:len(path)] == path: + return True + return False diff --git a/core/dbt/context/base.py b/core/dbt/context/base.py index ecacf966896..ed8fa800bae 100644 --- a/core/dbt/context/base.py +++ b/core/dbt/context/base.py @@ -1,15 +1,15 @@ import json import os from typing import ( - Any, Dict, NoReturn, Optional + Any, Dict, NoReturn, Optional, Mapping ) from dbt import flags from dbt import tracking from dbt.clients.jinja import undefined_error, get_rendered +from dbt.contracts.graph.compiled import CompiledResource from dbt.exceptions import raise_compiler_error, MacroReturn from dbt.logger import GLOBAL_LOGGER as logger -from dbt.utils import merge from dbt.version import __version__ as dbt_version import yaml @@ -98,40 +98,40 @@ class Var: "supplied to {} = {}" _VAR_NOTSET = object() - def __init__(self, model, context, overrides): - self.model = model - self.context = context - - # These are hard-overrides (eg. 
CLI vars) that should take - # precedence over context-based var definitions - self.overrides = overrides - - if model is None: - # during config parsing we have no model and no local vars - self.model_name = '' - local_vars = {} + def __init__( + self, + context: Mapping[str, Any], + cli_vars: Mapping[str, Any], + node: Optional[CompiledResource] = None + ) -> None: + self.context: Mapping[str, Any] = context + self.cli_vars: Mapping[str, Any] = cli_vars + self.node: Optional[CompiledResource] = node + self.merged: Mapping[str, Any] = self._generate_merged() + + def _generate_merged(self) -> Mapping[str, Any]: + return self.cli_vars + + @property + def node_name(self): + if self.node is not None: + return self.node.name else: - self.model_name = model.name - local_vars = model.local_vars() - - self.local_vars = merge(local_vars, overrides) - - def pretty_dict(self, data): - return json.dumps(data, sort_keys=True, indent=4) + return '' def get_missing_var(self, var_name): - pretty_vars = self.pretty_dict(self.local_vars) + dct = {k: self.merged[k] for k in self.merged} + pretty_vars = json.dumps(dct, sort_keys=True, indent=4) msg = self.UndefinedVarError.format( - var_name, self.model_name, pretty_vars + var_name, self.node_name, pretty_vars ) - raise_compiler_error(msg, self.model) + raise_compiler_error(msg, self.node) - def assert_var_defined(self, var_name, default): - if var_name not in self.local_vars and default is self._VAR_NOTSET: - return self.get_missing_var(var_name) + def has_var(self, var_name: str): + return var_name in self.merged def get_rendered_var(self, var_name): - raw = self.local_vars[var_name] + raw = self.merged[var_name] # if bool/int/float/etc are passed in, don't compile anything if not isinstance(raw, str): return raw @@ -139,7 +139,7 @@ def get_rendered_var(self, var_name): return get_rendered(raw, self.context) def __call__(self, var_name, default=_VAR_NOTSET): - if var_name in self.local_vars: + if self.has_var(var_name): return self.get_rendered_var(var_name) elif default is not self._VAR_NOTSET: return default @@ -255,7 +255,7 @@ def var(self) -> Var: from events where event_type = '{{ var("event_type", "activation") }}' """ - return Var(None, self._ctx, self.cli_vars) + return Var(self._ctx, self.cli_vars) @contextmember @staticmethod diff --git a/core/dbt/context/configured.py b/core/dbt/context/configured.py index b979ae849e8..7862fbdb655 100644 --- a/core/dbt/context/configured.py +++ b/core/dbt/context/configured.py @@ -7,7 +7,7 @@ from dbt.include.global_project import PACKAGES from dbt.include.global_project import PROJECT_NAME as GLOBAL_PROJECT_NAME -from dbt.context.base import contextproperty +from dbt.context.base import contextproperty, Var from dbt.context.target import TargetContext from dbt.exceptions import raise_duplicate_macro_name @@ -25,6 +25,55 @@ def project_name(self) -> str: return self.config.project_name +class ConfiguredVar(Var): + def __init__( + self, + context: Dict[str, Any], + config: AdapterRequiredConfig, + project_name: str, + ): + super().__init__(context, config.cli_vars) + self.config = config + self.project_name = project_name + + def __call__(self, var_name, default=Var._VAR_NOTSET): + my_config = self.config.load_dependencies()[self.project_name] + + # cli vars > active project > local project + if var_name in self.config.cli_vars: + return self.config.cli_vars[var_name] + + if self.config.config_version == 2 and my_config.config_version == 2: + + active_vars = self.config.vars.to_dict() + active_vars = 
active_vars.get(self.project_name, {}) + if var_name in active_vars: + return active_vars[var_name] + + if self.config.project_name != my_config.project_name: + config_vars = my_config.vars.to_dict() + config_vars = config_vars.get(self.project_name, {}) + if var_name in config_vars: + return config_vars[var_name] + + if default is not Var._VAR_NOTSET: + return default + + return self.get_missing_var(var_name) + + +class SchemaYamlContext(ConfiguredContext): + def __init__(self, config, project_name: str): + super().__init__(config) + self._project_name = project_name + + @contextproperty + def var(self) -> ConfiguredVar: + return ConfiguredVar( + self._ctx, self.config, self._project_name + ) + + FlatNamespace = Dict[str, MacroGenerator] NamespaceMember = Union[FlatNamespace, MacroGenerator] FullNamespace = Dict[str, NamespaceMember] @@ -134,3 +183,10 @@ def generate_query_header_context( ): ctx = QueryHeaderContext(config, manifest) return ctx.to_dict() + + +def generate_schema_yml( + config: AdapterRequiredConfig, project_name: str +) -> Dict[str, Any]: + ctx = SchemaYamlContext(config, project_name) + return ctx.to_dict() diff --git a/core/dbt/context/context_config.py b/core/dbt/context/context_config.py new file mode 100644 index 00000000000..a33be3357e4 --- /dev/null +++ b/core/dbt/context/context_config.py @@ -0,0 +1,195 @@ +from copy import deepcopy +from dataclasses import dataclass +from typing import List, Iterator, Dict, Any, TypeVar, Union + +from dbt.config import RuntimeConfig, Project +from dbt.contracts.graph.model_config import BaseConfig, get_config_for +from dbt.exceptions import InternalException +from dbt.legacy_config_updater import ConfigUpdater, IsFQNResource +from dbt.node_types import NodeType +from dbt.utils import fqn_search + + +@dataclass +class ModelParts(IsFQNResource): + fqn: List[str] + resource_type: NodeType + package_name: str + + +class LegacyContextConfig: + def __init__( + self, + active_project: RuntimeConfig, + own_project: Project, + fqn: List[str], + node_type: NodeType, + ): + self._config = None + self.active_project: RuntimeConfig = active_project + self.own_project: Project = own_project + + self.model = ModelParts( + fqn=fqn, + resource_type=node_type, + package_name=self.own_project.project_name, + ) + + self.updater = ConfigUpdater(active_project.credentials.type) + + # the config options defined within the model + self.in_model_config: Dict[str, Any] = {} + + def get_default(self) -> Dict[str, Any]: + defaults = {"enabled": True, "materialized": "view"} + + if self.model.resource_type == NodeType.Seed: + defaults['materialized'] = 'seed' + elif self.model.resource_type == NodeType.Snapshot: + defaults['materialized'] = 'snapshot' + + if self.model.resource_type == NodeType.Test: + defaults['severity'] = 'ERROR' + + return defaults + + def build_config_dict(self, base: bool = False) -> Dict[str, Any]: + defaults = self.get_default() + active_config = self.load_config_from_active_project() + + if self.active_project.project_name == self.own_project.project_name: + cfg = self.updater.merge( + defaults, active_config, self.in_model_config + ) + else: + own_config = self.load_config_from_own_project() + + cfg = self.updater.merge( + defaults, own_config, self.in_model_config, active_config + ) + + return cfg + + def _translate_adapter_aliases(self, config: Dict[str, Any]): + return self.active_project.credentials.translate_aliases(config) + + def update_in_model_config(self, config: Dict[str, Any]) -> None: + config = 
self._translate_adapter_aliases(config) + self.updater.update_into(self.in_model_config, config) + + def load_config_from_own_project(self) -> Dict[str, Any]: + return self.updater.get_project_config(self.model, self.own_project) + + def load_config_from_active_project(self) -> Dict[str, Any]: + return self.updater.get_project_config(self.model, self.active_project) + + +T = TypeVar('T', bound=BaseConfig) + + +class ContextConfigGenerator: + def __init__(self, active_project: RuntimeConfig): + self.active_project = active_project + + def get_node_project(self, project_name: str): + if project_name == self.active_project.project_name: + return self.active_project + dependencies = self.active_project.load_dependencies() + if project_name not in dependencies: + raise InternalException( + f'Project name {project_name} not found in dependencies ' + f'(found {list(dependencies)})' + ) + return dependencies[project_name] + + def project_configs( + self, project: Project, fqn: List[str], resource_type: NodeType + ) -> Iterator[Dict[str, Any]]: + if resource_type == NodeType.Seed: + model_configs = project.seeds + elif resource_type == NodeType.Snapshot: + model_configs = project.snapshots + elif resource_type == NodeType.Source: + model_configs = project.sources + else: + model_configs = project.models + for level_config in fqn_search(model_configs, fqn): + result = {} + for key, value in level_config.items(): + if key.startswith('+'): + result[key[1:]] = deepcopy(value) + elif not isinstance(value, dict): + result[key] = deepcopy(value) + + yield result + + def active_project_configs( + self, fqn: List[str], resource_type: NodeType + ) -> Iterator[Dict[str, Any]]: + return self.project_configs(self.active_project, fqn, resource_type) + + def _update_from_config( + self, result: T, partial: Dict[str, Any], validate: bool = False + ) -> T: + return result.update_from( + partial, + self.active_project.credentials.type, + validate=validate + ) + + def calculate_node_config( + self, + config_calls: List[Dict[str, Any]], + fqn: List[str], + resource_type: NodeType, + project_name: str, + base: bool, + ) -> BaseConfig: + own_config = self.get_node_project(project_name) + # defaults, own_config, config calls, active_config (if != own_config) + config_cls = get_config_for(resource_type, base=base) + # Calculate the defaults. We don't want to validate the defaults, + # because it might be invalid in the case of required config members + # (such as on snapshots!) 
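+        # Each update_from() call below layers another dict onto the
+        # accumulated config; later layers take precedence (or merge, per
+        # each field's MergeBehavior): the node's own project configs first,
+        # then in-model config() calls, then the active project's configs
+        # when the node comes from a package.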
+ result = config_cls.from_dict({}, validate=False) + for fqn_config in self.project_configs(own_config, fqn, resource_type): + result = self._update_from_config(result, fqn_config) + for config_call in config_calls: + result = self._update_from_config(result, config_call) + + if own_config.project_name != self.active_project.project_name: + for fqn_config in self.active_project_configs(fqn, resource_type): + result = self._update_from_config(result, fqn_config) + + # this is mostly impactful in the snapshot config case + return result.finalize_and_validate() + + +class ContextConfig: + def __init__( + self, + active_project: RuntimeConfig, + fqn: List[str], + resource_type: NodeType, + project_name: str, + ) -> None: + self.config_calls: List[Dict[str, Any]] = [] + self.cfg_source = ContextConfigGenerator(active_project) + self.fqn = fqn + self.resource_type = resource_type + self.project_name = project_name + + def update_in_model_config(self, opts: Dict[str, Any]) -> None: + self.config_calls.append(opts) + + def build_config_dict(self, base: bool = False) -> Dict[str, Any]: + return self.cfg_source.calculate_node_config( + config_calls=self.config_calls, + fqn=self.fqn, + resource_type=self.resource_type, + project_name=self.project_name, + base=base, + ).to_dict() + + +ContextConfigType = Union[LegacyContextConfig, ContextConfig] diff --git a/core/dbt/context/docs.py b/core/dbt/context/docs.py index 676fbe71bd3..2ea70135688 100644 --- a/core/dbt/context/docs.py +++ b/core/dbt/context/docs.py @@ -12,28 +12,10 @@ from dbt.contracts.graph.parsed import ParsedMacro from dbt.context.base import contextmember -from dbt.context.configured import ConfiguredContext +from dbt.context.configured import SchemaYamlContext -class DocsParseContext(ConfiguredContext): - def __init__( - self, - config: RuntimeConfig, - node: Any, - ) -> None: - super().__init__(config) - self.node = node - - @contextmember - def doc(self, *args: str) -> str: - # when you call doc(), this is what happens at parse time - if len(args) != 1 and len(args) != 2: - doc_invalid_args(self.node, args) - # At parse time, nothing should care about what doc() returns - return '' - - -class DocsRuntimeContext(ConfiguredContext): +class DocsRuntimeContext(SchemaYamlContext): def __init__( self, config: RuntimeConfig, @@ -41,10 +23,9 @@ def __init__( manifest: Manifest, current_project: str, ) -> None: - super().__init__(config) + super().__init__(config, current_project) self.node = node self.manifest = manifest - self.current_project = current_project @contextmember def doc(self, *args: str) -> str: @@ -79,7 +60,7 @@ def doc(self, *args: str) -> str: target_doc = self.manifest.resolve_doc( doc_name, doc_package_name, - self.current_project, + self._project_name, self.node.package_name, ) @@ -89,14 +70,6 @@ def doc(self, *args: str) -> str: return target_doc.block_contents -def generate_parser_docs( - config: RuntimeConfig, - unparsed: Any, -) -> Dict[str, Any]: - ctx = DocsParseContext(config, unparsed) - return ctx.to_dict() - - def generate_runtime_docs( config: RuntimeConfig, target: Any, diff --git a/core/dbt/context/providers.py b/core/dbt/context/providers.py index 1d29e35103f..58ce2e0b306 100644 --- a/core/dbt/context/providers.py +++ b/core/dbt/context/providers.py @@ -1,7 +1,8 @@ import abc import os from typing import ( - Callable, Any, Dict, Optional, Union, List, TypeVar, Type + Callable, Any, Dict, Optional, Union, List, TypeVar, Type, Iterable, + Mapping, ) from typing_extensions import Protocol @@ -10,21 +11,23 @@ 
from dbt.adapters.factory import get_adapter from dbt.clients import agate_helper from dbt.clients.jinja import get_rendered -from dbt.config import RuntimeConfig +from dbt.config import RuntimeConfig, Project from dbt.context.base import ( contextmember, contextproperty, Var ) from dbt.context.configured import ManifestContext, MacroNamespace +from dbt.context.context_config import ContextConfigType from dbt.contracts.graph.manifest import Manifest, Disabled from dbt.contracts.graph.compiled import ( - NonSourceNode, CompiledSeedNode + NonSourceNode, CompiledSeedNode, CompiledResource, CompiledNode ) from dbt.contracts.graph.parsed import ( - ParsedMacro, ParsedSourceDefinition, ParsedSeedNode + ParsedMacro, ParsedSourceDefinition, ParsedSeedNode, ParsedNode ) from dbt.exceptions import ( InternalException, ValidationException, + RuntimeException, missing_config, raise_compiler_error, ref_invalid_args, @@ -35,10 +38,9 @@ ) from dbt.logger import GLOBAL_LOGGER as logger # noqa from dbt.node_types import NodeType -from dbt.source_config import SourceConfig from dbt.utils import ( - add_ephemeral_model_prefix, merge, AttrDict + add_ephemeral_model_prefix, merge, AttrDict, MultiDict ) import agate @@ -155,24 +157,15 @@ def __call__(self, *args: str) -> RelationProxy: class Config(Protocol): - def __init__(self, model, source_config): + def __init__(self, model, context_config: Optional[ContextConfigType]): ... -class Provider(Protocol): - execute: bool - Config: Type[Config] - DatabaseWrapper: Type[BaseDatabaseWrapper] - Var: Type[Var] - ref: Type[BaseRefResolver] - source: Type[BaseSourceResolver] - - # `config` implementations class ParseConfigObject(Config): - def __init__(self, model, source_config): + def __init__(self, model, context_config: Optional[ContextConfigType]): self.model = model - self.source_config = source_config + self.context_config = context_config def _transform_config(self, config): for oldkey in ('pre_hook', 'post_hook'): @@ -199,7 +192,13 @@ def __call__(self, *args, **kwargs): opts = self._transform_config(opts) - self.source_config.update_in_model_config(opts) + # it's ok to have a parse context with no context config, but you must + # not call it! 
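+        # "Calling it" happens when a model file invokes {{ config(...) }};
+        # e.g. {{ config(materialized='table') }} reaches this point at
+        # parse time with opts={'materialized': 'table'}.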
+ if self.context_config is None: + raise RuntimeException( + 'At parse time, did not receive a context config' + ) + self.context_config.update_in_model_config(opts) return '' def set(self, name, value): @@ -213,9 +212,11 @@ def get(self, name, validator=None, default=None): class RuntimeConfigObject(Config): - def __init__(self, model, source_config=None): + def __init__( + self, model, context_config: Optional[ContextConfigType] = None + ): self.model = model - # we never use or get a source config, only the parser cares + # we never use or get a config, only the parser cares def __call__(self, *args, **kwargs): return '' @@ -227,16 +228,10 @@ def _validate(self, validator, value): validator(value) def _lookup(self, name, default=_MISSING): - config = self.model.config - - if hasattr(config, name): - return getattr(config, name) - elif name in config.extra: - return config.extra[name] - elif default is not _MISSING: - return default - else: + result = self.model.config.get(name, default) + if result is _MISSING: missing_config(self.model, name) + return result def require(self, name, validator=None): to_return = self._lookup(name) @@ -320,6 +315,7 @@ def resolve( self.model, target_name, target_package, + disabled=isinstance(target_model, Disabled), ) self.validate(target_model, target_name, target_package) return self.create_relation(target_model, target_name) @@ -390,7 +386,7 @@ def resolve(self, source_name: str, table_name: str): self.model.package_name, ) - if target_source is None: + if target_source is None or isinstance(target_source, Disabled): source_target_not_found( self.model, source_name, @@ -400,17 +396,67 @@ def resolve(self, source_name: str, table_name: str): # `var` implementations. -class ParseVar(Var): +class ModelConfiguredVar(Var): + def __init__( + self, + context: Dict[str, Any], + config: RuntimeConfig, + node: CompiledResource, + ) -> None: + self.node: CompiledResource + self.config: RuntimeConfig = config + super().__init__(context, config.cli_vars, node=node) + + def packages_for_node(self) -> Iterable[Project]: + dependencies = self.config.load_dependencies() + package_name = self.node.package_name + + if package_name != self.config.project_name: + if package_name not in dependencies: + # I don't think this is actually reachable + raise_compiler_error( + f'Node package named {package_name} not found!', + self.node + ) + yield dependencies[package_name] + yield self.config + + def _generate_merged(self) -> Mapping[str, Any]: + cli_vars = self.config.cli_vars + + # once sources have FQNs, add ParsedSourceDefinition + if not isinstance(self.node, (CompiledNode, ParsedNode)): + return cli_vars + + adapter_type = self.config.credentials.type + + merged = MultiDict() + for project in self.packages_for_node(): + merged.add(project.vars.vars_for(self.node, adapter_type)) + merged.add(self.cli_vars) + return merged + + +class ParseVar(ModelConfiguredVar): def get_missing_var(self, var_name): # in the parser, just always return None. return None -class RuntimeVar(Var): +class RuntimeVar(ModelConfiguredVar): pass # Providers +class Provider(Protocol): + execute: bool + Config: Type[Config] + DatabaseWrapper: Type[BaseDatabaseWrapper] + Var: Type[ModelConfiguredVar] + ref: Type[BaseRefResolver] + source: Type[BaseSourceResolver] + + class ParseProvider(Provider): execute = False Config = ParseConfigObject @@ -438,15 +484,24 @@ class OperationProvider(RuntimeProvider): # Base context collection, used for parsing configs. 
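# The provider passed to this context decides which implementations of
# config/ref/source/var a node sees: ParseProvider supplies the parse-time
# objects (which record config() calls and never execute SQL), while
# RuntimeProvider supplies the execution-time ones.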
class ProviderContext(ManifestContext): - def __init__(self, model, config, manifest, provider, source_config): + def __init__( + self, + model, + config: RuntimeConfig, + manifest: Manifest, + provider: Provider, + context_config: Optional[ContextConfigType], + ) -> None: if provider is None: raise InternalException( f"Invalid provider given to context: {provider}" ) + # mypy appeasement - we know it'll be a RuntimeConfig + self.config: RuntimeConfig super().__init__(config, manifest, model.package_name) self.sql_results: Dict[str, AttrDict] = {} self.model: Union[ParsedMacro, NonSourceNode] = model - self.source_config = source_config + self.context_config: Optional[ContextConfigType] = context_config self.provider: Provider = provider self.adapter = get_adapter(self.config) self.db_wrapper = self.provider.DatabaseWrapper(self.adapter) @@ -648,7 +703,7 @@ def ctx_config(self) -> Config: {%- set unique_key = config.require('unique_key') -%} ... """ # noqa - return self.provider.Config(self.model, self.source_config) + return self.provider.Config(self.model, self.context_config) @contextproperty def execute(self) -> bool: @@ -758,9 +813,11 @@ def schema(self) -> str: return self.config.credentials.schema @contextproperty - def var(self) -> Var: + def var(self) -> ModelConfiguredVar: return self.provider.Var( - self.model, context=self._ctx, overrides=self.config.cli_vars + context=self._ctx, + config=self.config, + node=self.model, ) @contextproperty('adapter') @@ -860,8 +917,7 @@ def graph(self) -> Dict[str, Any]: ## Accessing sources - To access the sources in your dbt project programatically, filter for - nodes where the `resource_type == 'source'`. + To access the sources in your dbt project programatically, use the "sources" attribute. Example usage: @@ -872,7 +928,7 @@ def graph(self) -> Dict[str, Any]: which begin with the string "event_" */ {% set sources = [] -%} - {% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "source") -%} + {% for node in graph.sources.values() -%} {%- if node.name.startswith('event_') and node.source_name == 'snowplow' -%} {%- do sources.append(source(node.source_name, node.name)) -%} {%- endif -%} @@ -1016,10 +1072,10 @@ def generate_parser_model( model: NonSourceNode, config: RuntimeConfig, manifest: Manifest, - source_config: SourceConfig, + context_config: ContextConfigType, ) -> Dict[str, Any]: ctx = ModelContext( - model, config, manifest, ParseProvider(), source_config + model, config, manifest, ParseProvider(), context_config ) return ctx.to_dict() diff --git a/core/dbt/contracts/connection.py b/core/dbt/contracts/connection.py index e786cf5f748..0868d0791a4 100644 --- a/core/dbt/contracts/connection.py +++ b/core/dbt/contracts/connection.py @@ -2,7 +2,8 @@ import itertools from dataclasses import dataclass, field from typing import ( - Any, ClassVar, Dict, Tuple, Iterable, Optional, NewType, List, Callable) + Any, ClassVar, Dict, Tuple, Iterable, Optional, NewType, List, Callable, +) from typing_extensions import Protocol from hologram import JsonSchemaMixin @@ -135,8 +136,10 @@ def from_dict(cls, data): return super().from_dict(data) @classmethod - def translate_aliases(cls, kwargs: Dict[str, Any]) -> Dict[str, Any]: - return translate_aliases(kwargs, cls._ALIASES) + def translate_aliases( + cls, kwargs: Dict[str, Any], recurse: bool = False + ) -> Dict[str, Any]: + return translate_aliases(kwargs, cls._ALIASES, recurse) def to_dict(self, omit_none=True, validate=False, *, with_aliases=False): serialized = 
super().to_dict(omit_none=omit_none, validate=validate) diff --git a/core/dbt/contracts/graph/compiled.py b/core/dbt/contracts/graph/compiled.py index 78adc9c4448..e68c72e2bb4 100644 --- a/core/dbt/contracts/graph/compiled.py +++ b/core/dbt/contracts/graph/compiled.py @@ -209,6 +209,7 @@ def _inject_ctes_into_sql(sql: str, ctes: List[InjectedCTE]) -> str: # for some types, the compiled type is the parsed type, so make this easy CompiledType = Union[Type[CompiledNode], Type[ParsedResource]] +CompiledResource = Union[ParsedResource, CompiledNode] def compiled_type_for(parsed: ParsedNode) -> CompiledType: @@ -243,8 +244,8 @@ def parsed_instance_for(compiled: CompiledNode) -> ParsedResource: NonSourceParsedNode = Union[ ParsedAnalysisNode, ParsedDataTestNode, - ParsedModelNode, ParsedHookNode, + ParsedModelNode, ParsedRPCNode, ParsedSchemaTestNode, ParsedSeedNode, @@ -252,7 +253,7 @@ def parsed_instance_for(compiled: CompiledNode) -> ParsedResource: ] -# This is anything that can be in manifest.nodes and isn't a Source. +# This is anything that can be in manifest.nodes. NonSourceNode = Union[ NonSourceCompiledNode, NonSourceParsedNode, diff --git a/core/dbt/contracts/graph/manifest.py b/core/dbt/contracts/graph/manifest.py index 8cb3c947802..e33257aaa2b 100644 --- a/core/dbt/contracts/graph/manifest.py +++ b/core/dbt/contracts/graph/manifest.py @@ -13,26 +13,27 @@ from hologram import JsonSchemaMixin +from dbt.contracts.graph.compiled import CompileResultNode, NonSourceNode from dbt.contracts.graph.parsed import ( - ParsedNode, ParsedMacro, ParsedDocumentation, ParsedNodePatch, - ParsedMacroPatch, ParsedSourceDefinition + ParsedMacro, ParsedDocumentation, ParsedNodePatch, ParsedMacroPatch, + ParsedSourceDefinition ) -from dbt.contracts.graph.compiled import CompileResultNode, NonSourceNode from dbt.contracts.util import Writable, Replaceable from dbt.exceptions import ( raise_duplicate_resource_name, InternalException, raise_compiler_error, - warn_or_error + warn_or_error, raise_invalid_patch ) +from dbt.helper_types import PathSet from dbt.include.global_project import PACKAGES from dbt.logger import GLOBAL_LOGGER as logger from dbt.node_types import NodeType -from dbt.ui import printer from dbt import deprecations from dbt import tracking import dbt.utils NodeEdgeMap = Dict[str, List[str]] MacroKey = Tuple[str, str] +SourceKey = Tuple[str, str] @dataclass @@ -139,8 +140,10 @@ class SourceFile(JsonSchemaMixin): sources: List[str] = field(default_factory=list) # any node patches in this file. The entries are names, not unique ids! patches: List[str] = field(default_factory=list) - # any macro patches in this file. The entries are pacakge, name pairs. + # any macro patches in this file. The entries are package, name pairs. macro_patches: List[MacroKey] = field(default_factory=list) + # any source patches in this file. The entries are package, name pairs + source_patches: List[SourceKey] = field(default_factory=list) @property def search_key(self) -> Optional[str]: @@ -226,7 +229,7 @@ def _sort_values(dct): return {k: sorted(v) for k, v in dct.items()} -def build_edges(nodes): +def build_edges(nodes: List[NonSourceNode]): """Build the forward and backward edges on the given list of ParsedNodes and return them as two separate dictionaries, each mapping unique IDs to lists of edges. 
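    Roughly: if model B refs model A, forward_edges (the child_map) maps A's
    unique_id to a list containing B's unique_id, while backward_edges (the
    parent_map) maps B's unique_id to a list containing A's unique_id.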
@@ -382,20 +385,55 @@ def search(self, haystack: Iterable[N]) -> Optional[N]: return None +D = TypeVar('D') + + @dataclass -class Disabled: - target: ParsedNode +class Disabled(Generic[D]): + target: D + + +MaybeParsedSource = Optional[Union[ + ParsedSourceDefinition, + Disabled[ParsedSourceDefinition], +]] + + +MaybeNonSource = Optional[Union[ + NonSourceNode, + Disabled[NonSourceNode] +]] + + +T = TypeVar('T', bound=CompileResultNode) + + +def _update_into(dest: MutableMapping[str, T], new_item: T): + unique_id = new_item.unique_id + if unique_id not in dest: + raise dbt.exceptions.RuntimeException( + f'got an update_{new_item.resource_type} call with an ' + f'unrecognized {new_item.resource_type}: {new_item.unique_id}' + ) + existing = dest[unique_id] + if new_item.original_file_path != existing.original_file_path: + raise dbt.exceptions.RuntimeException( + f'cannot update a {new_item.resource_type} to have a new file ' + f'path!' + ) + dest[unique_id] = new_item @dataclass class Manifest: """The manifest for the full graph, after parsing and during compilation. """ - nodes: MutableMapping[str, CompileResultNode] + nodes: MutableMapping[str, NonSourceNode] + sources: MutableMapping[str, ParsedSourceDefinition] macros: MutableMapping[str, ParsedMacro] docs: MutableMapping[str, ParsedDocumentation] generated_at: datetime - disabled: List[ParsedNode] + disabled: List[CompileResultNode] files: MutableMapping[str, SourceFile] metadata: ManifestMetadata = field(default_factory=ManifestMetadata) flat_graph: Dict[str, Any] = field(default_factory=dict) @@ -412,6 +450,7 @@ def from_macros( files = {} return cls( nodes={}, + sources={}, macros=macros, docs={}, generated_at=datetime.utcnow(), @@ -419,19 +458,11 @@ def from_macros( files=files, ) - def update_node(self, new_node): - unique_id = new_node.unique_id - if unique_id not in self.nodes: - raise dbt.exceptions.RuntimeException( - 'got an update_node call with an unrecognized node: {}' - .format(unique_id) - ) - existing = self.nodes[unique_id] - if new_node.original_file_path != existing.original_file_path: - raise dbt.exceptions.RuntimeException( - 'cannot update a node to have a new file path!' 
- ) - self.nodes[unique_id] = new_node + def update_node(self, new_node: NonSourceNode): + _update_into(self.nodes, new_node) + + def update_source(self, new_source: ParsedSourceDefinition): + _update_into(self.sources, new_source) def build_flat_graph(self): """This attribute is used in context.common by each node, so we want to @@ -443,17 +474,30 @@ def build_flat_graph(self): 'nodes': { k: v.to_dict(omit_none=False) for k, v in self.nodes.items() }, + 'sources': { + k: v.to_dict(omit_none=False) for k, v in self.sources.items() + } } def find_disabled_by_name( self, name: str, package: Optional[str] = None - ) -> Optional[ParsedNode]: + ) -> Optional[NonSourceNode]: searcher: NameSearcher = NameSearcher( name, package, NodeType.refable() ) result = searcher.search(self.disabled) + return result + + def find_disabled_source_by_name( + self, source_name: str, table_name: str, package: Optional[str] = None + ) -> Optional[ParsedSourceDefinition]: + search_name = f'{source_name}.{table_name}' + searcher: NameSearcher = NameSearcher( + search_name, package, [NodeType.Source] + ) + result = searcher.search(self.disabled) if result is not None: - assert isinstance(result, ParsedNode) + assert isinstance(result, ParsedSourceDefinition) return result def find_docs_by_name( @@ -463,8 +507,6 @@ def find_docs_by_name( name, package, [NodeType.Documentation] ) result = searcher.search(self.docs.values()) - if result is not None: - assert isinstance(result, ParsedDocumentation) return result def find_refable_by_name( @@ -477,8 +519,6 @@ def find_refable_by_name( name, package, NodeType.refable() ) result = searcher.search(self.nodes.values()) - if result is not None: - assert not isinstance(result, ParsedSourceDefinition) return result def find_source_by_name( @@ -490,9 +530,7 @@ def find_source_by_name( name = f'{source_name}.{table_name}' searcher: NameSearcher = NameSearcher(name, package, [NodeType.Source]) - result = searcher.search(self.nodes.values()) - if result is not None: - assert isinstance(result, ParsedSourceDefinition) + result = searcher.search(self.sources.values()) return result def _find_macros_by_name( @@ -593,19 +631,17 @@ def find_materialization_macro_by_name( )) return candidates.last() - def get_resource_fqns(self) -> Dict[str, Set[Tuple[str, ...]]]: + def get_resource_fqns(self) -> Mapping[str, PathSet]: resource_fqns: Dict[str, Set[Tuple[str, ...]]] = {} - for unique_id, node in self.nodes.items(): - if node.resource_type == NodeType.Source: - continue # sources have no FQNs and can't be configured - resource_type_plural = node.resource_type + 's' + all_resources = chain(self.nodes.values(), self.sources.values()) + for resource in all_resources: + resource_type_plural = resource.resource_type.pluralize() if resource_type_plural not in resource_fqns: resource_fqns[resource_type_plural] = set() - resource_fqns[resource_type_plural].add(tuple(node.fqn)) - + resource_fqns[resource_type_plural].add(tuple(resource.fqn)) return resource_fqns - def add_nodes(self, new_nodes): + def add_nodes(self, new_nodes: Mapping[str, NonSourceNode]): """Add the given dict of new nodes to the manifest.""" for unique_id, node in new_nodes.items(): if unique_id in self.nodes: @@ -642,10 +678,6 @@ def patch_nodes( # nodes looking for matching names. 
We could use a NameSearcher if we # were ok with doing an O(n*m) search (one nodes scan per patch) for node in self.nodes.values(): - if node.resource_type == NodeType.Source: - continue - # appease mypy - we know this because of the check above - assert not isinstance(node, ParsedSourceDefinition) patch = patches.pop(node.name, None) if not patch: continue @@ -658,19 +690,9 @@ def patch_nodes( patch=patch, node=node, expected_key=expected_key ) else: - msg = printer.line_wrap_message( - f'''\ - '{node.name}' is a {node.resource_type} node, but it is - specified in the {patch.yaml_key} section of - {patch.original_file_path}. - - - - To fix this error, place the `{node.name}` - specification under the {expected_key} key instead. - ''' + raise_invalid_patch( + node, patch.yaml_key, patch.original_file_path ) - raise_compiler_error(msg) node.patch(patch) @@ -686,17 +708,21 @@ def patch_nodes( def get_used_schemas(self, resource_types=None): return frozenset({ - (node.database, node.schema) - for node in self.nodes.values() + (node.database, node.schema) for node in + chain(self.nodes.values(), self.sources.values()) if not resource_types or node.resource_type in resource_types }) def get_used_databases(self): - return frozenset(node.database for node in self.nodes.values()) + return frozenset( + x.database for x in + chain(self.nodes.values(), self.sources.values()) + ) def deepcopy(self): return Manifest( nodes={k: _deepcopy(v) for k, v in self.nodes.items()}, + sources={k: _deepcopy(v) for k, v in self.sources.items()}, macros={k: _deepcopy(v) for k, v in self.macros.items()}, docs={k: _deepcopy(v) for k, v in self.docs.items()}, generated_at=self.generated_at, @@ -706,10 +732,12 @@ def deepcopy(self): ) def writable_manifest(self): - forward_edges, backward_edges = build_edges(self.nodes.values()) + edge_members = list(chain(self.nodes.values(), self.sources.values())) + forward_edges, backward_edges = build_edges(edge_members) return WritableManifest( nodes=self.nodes, + sources=self.sources, macros=self.macros, docs=self.docs, generated_at=self.generated_at, @@ -719,25 +747,6 @@ def writable_manifest(self): parent_map=backward_edges, ) - @classmethod - def from_writable_manifest(cls, writable): - self = cls( - nodes=writable.nodes, - macros=writable.macros, - docs=writable.docs, - generated_at=writable.generated_at, - metadata=writable.metadata, - disabled=writable.disabled, - files=writable.files, - ) - self.metadata = writable.metadata - return self - - @classmethod - def from_dict(cls, data, validate=True): - writable = WritableManifest.from_dict(data=data, validate=validate) - return cls.from_writable_manifest(writable) - def to_dict(self, omit_none=True, validate=False): return self.writable_manifest().to_dict( omit_none=omit_none, validate=validate @@ -747,12 +756,15 @@ def write(self, path): self.writable_manifest().write(path) def expect(self, unique_id: str) -> CompileResultNode: - if unique_id not in self.nodes: + if unique_id in self.nodes: + return self.nodes[unique_id] + elif unique_id in self.sources: + return self.sources[unique_id] + else: # something terrible has happened raise dbt.exceptions.InternalException( 'Expected node {} not found in manifest'.format(unique_id) ) - return self.nodes[unique_id] def resolve_ref( self, @@ -760,14 +772,14 @@ def resolve_ref( target_model_package: Optional[str], current_project: str, node_package: str, - ) -> Optional[Union[NonSourceNode, Disabled]]: + ) -> MaybeNonSource: if target_model_package is not None: return 
self.find_refable_by_name( target_model_name, target_model_package) - target_model = None - disabled_target = None + target_model: Optional[NonSourceNode] = None + disabled_target: Optional[NonSourceNode] = None # first pass: look for models in the current_project # second pass: look for models in the node's package @@ -800,18 +812,28 @@ def resolve_source( target_table_name: str, current_project: str, node_package: str - ) -> Optional[ParsedSourceDefinition]: + ) -> MaybeParsedSource: candidate_targets = [current_project, node_package, None] - target_source = None + + target_source: Optional[ParsedSourceDefinition] = None + disabled_target: Optional[ParsedSourceDefinition] = None + for candidate in candidate_targets: target_source = self.find_source_by_name( target_source_name, target_table_name, candidate ) - if target_source is not None: + if target_source is not None and target_source.config.enabled: return target_source + if disabled_target is None: + disabled_target = self.find_disabled_source_by_name( + target_source_name, target_table_name, candidate + ) + + if disabled_target is not None: + return Disabled(disabled_target) return None def resolve_doc( @@ -845,11 +867,16 @@ def resolve_doc( @dataclass class WritableManifest(JsonSchemaMixin, Writable): - nodes: Mapping[str, CompileResultNode] = field( + nodes: Mapping[str, NonSourceNode] = field( metadata=dict(description=( 'The nodes defined in the dbt project and its dependencies' )), ) + sources: Mapping[str, ParsedSourceDefinition] = field( + metadata=dict(description=( + 'The sources defined in the dbt project and its dependencies', + )) + ) macros: Mapping[str, ParsedMacro] = field( metadata=dict(description=( 'The macros defined in the dbt project and its dependencies' @@ -860,7 +887,7 @@ class WritableManifest(JsonSchemaMixin, Writable): 'The docs defined in the dbt project and its dependencies' )) ) - disabled: Optional[List[ParsedNode]] = field(metadata=dict( + disabled: Optional[List[CompileResultNode]] = field(metadata=dict( description='A list of the disabled nodes in the target' )) generated_at: datetime = field(metadata=dict( diff --git a/core/dbt/contracts/graph/model_config.py b/core/dbt/contracts/graph/model_config.py new file mode 100644 index 00000000000..468695b521e --- /dev/null +++ b/core/dbt/contracts/graph/model_config.py @@ -0,0 +1,532 @@ +from dataclasses import field, Field, dataclass +from enum import Enum +from typing import ( + Any, List, Optional, Dict, MutableMapping, Union, Type, NewType, Tuple, + TypeVar +) + +# TODO: patch+upgrade hologram to avoid this jsonschema import +import jsonschema # type: ignore + +# This is protected, but we really do want to reuse this logic, and the cache! +# It would be nice to move the custom error picking stuff into hologram! +from hologram import _validate_schema +from hologram import JsonSchemaMixin, ValidationError +from hologram.helpers import StrEnum, register_pattern + +from dbt import hooks +from dbt.contracts.graph.unparsed import AdditionalPropertiesAllowed +from dbt.exceptions import CompilationException, InternalException +from dbt.contracts.util import Replaceable, list_str +from dbt.node_types import NodeType + + +def _get_meta_value(cls: Type[Enum], fld: Field, key: str, default: Any): + # a metadata field might exist. If it does, it might have a matching key. + # If it has both, make sure the value is valid and return it. If it + # doesn't, return the default. 
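+    # e.g. a field declared with metadata=MergeBehavior.Append.meta() yields
+    # MergeBehavior.Append here, while a field with no metadata falls back to
+    # the default (MergeBehavior.Clobber for merging, ShowBehavior.Show for
+    # visibility).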
+ if fld.metadata: + value = fld.metadata.get(key, default) + else: + value = default + + try: + return cls(value) + except ValueError as exc: + raise InternalException( + f'Invalid {cls} value: {value}' + ) from exc + + +def _set_meta_value( + obj: Enum, key: str, existing: Optional[Dict[str, Any]] = None +) -> Dict[str, Any]: + if existing is None: + result = {} + else: + result = existing.copy() + result.update({key: obj}) + return result + + +MERGE_KEY = 'merge' + + +class MergeBehavior(Enum): + Append = 1 + Update = 2 + Clobber = 3 + + @classmethod + def from_field(cls, fld: Field) -> 'MergeBehavior': + return _get_meta_value(cls, fld, MERGE_KEY, cls.Clobber) + + def meta(self, existing: Optional[Dict[str, Any]] = None): + return _set_meta_value(self, MERGE_KEY, existing) + + +SHOW_HIDE_KEY = 'show_hide' + + +class ShowBehavior(Enum): + Show = 1 + Hide = 2 + + @classmethod + def from_field(cls, fld: Field) -> 'ShowBehavior': + return _get_meta_value(cls, fld, SHOW_HIDE_KEY, cls.Show) + + def meta(self, existing: Optional[Dict[str, Any]] = None): + return _set_meta_value(self, SHOW_HIDE_KEY, existing) + + +def _listify(value: Any) -> List: + if isinstance(value, list): + return value[:] + else: + return [value] + + +def _merge_field_value( + merge_behavior: MergeBehavior, + self_value: Any, + other_value: Any, +): + if merge_behavior == MergeBehavior.Clobber: + return other_value + elif merge_behavior == MergeBehavior.Append: + return _listify(self_value) + _listify(other_value) + elif merge_behavior == MergeBehavior.Update: + if not isinstance(self_value, dict): + raise InternalException(f'expected dict, got {self_value}') + if not isinstance(other_value, dict): + raise InternalException(f'expected dict, got {other_value}') + value = self_value.copy() + value.update(other_value) + return value + else: + raise InternalException( + f'Got an invalid merge_behavior: {merge_behavior}' + ) + + +def insensitive_patterns(*patterns: str): + lowercased = [] + for pattern in patterns: + lowercased.append( + ''.join('[{}{}]'.format(s.upper(), s.lower()) for s in pattern) + ) + return '^({})$'.format('|'.join(lowercased)) + + +Severity = NewType('Severity', str) +register_pattern(Severity, insensitive_patterns('warn', 'error')) + + +class SnapshotStrategy(StrEnum): + Timestamp = 'timestamp' + Check = 'check' + + +class All(StrEnum): + All = 'all' + + +@dataclass +class Hook(JsonSchemaMixin, Replaceable): + sql: str + transaction: bool = True + index: Optional[int] = None + + +T = TypeVar('T', bound='BaseConfig') + + +@dataclass +class BaseConfig( + AdditionalPropertiesAllowed, Replaceable, MutableMapping[str, Any] +): + # Implement MutableMapping so this config will behave as some macros expect + # during parsing (notably, syntax like `{{ node.config['schema'] }}`) + def __getitem__(self, key): + """Handle parse-time use of `config` as a dictionary, making the extra + values available during parsing. 
+ """ + if hasattr(self, key): + return getattr(self, key) + else: + return self._extra[key] + + def __setitem__(self, key, value): + if hasattr(self, key): + setattr(self, key, value) + else: + self._extra[key] = value + + def __delitem__(self, key): + if hasattr(self, key): + msg = ( + 'Error, tried to delete config key "{}": Cannot delete ' + 'built-in keys' + ).format(key) + raise CompilationException(msg) + else: + del self._extra[key] + + def __iter__(self): + for fld, _ in self._get_fields(): + yield fld.name + + for key in self._extra: + yield key + + def __len__(self): + return len(self._get_fields()) + len(self._extra) + + @classmethod + def _extract_dict( + cls, src: Dict[str, Any], data: Dict[str, Any] + ) -> Dict[str, Any]: + """Find all the items in data that match a target_field on this class, + and merge them with the data found in `src` for target_field, using the + field's specified merge behavior. Matching items will be removed from + `data` (but _not_ `src`!). + + Returns a dict with the merge results. + + That means this method mutates its input! Any remaining values in data + were not merged. + """ + result = {} + + for fld, target_field in cls._get_fields(): + if target_field not in data: + continue + + data_attr = data.pop(target_field) + if target_field not in src: + result[target_field] = data_attr + continue + + merge_behavior = MergeBehavior.from_field(fld) + self_attr = src[target_field] + + result[target_field] = _merge_field_value( + merge_behavior=merge_behavior, + self_value=self_attr, + other_value=data_attr, + ) + return result + + def to_dict( + self, + omit_none: bool = True, + validate: bool = False, + *, + omit_hidden: bool = True, + ) -> Dict[str, Any]: + result = super().to_dict(omit_none=omit_none, validate=validate) + if omit_hidden and not omit_none: + for fld, target_field in self._get_fields(): + if target_field not in result: + continue + + # if the field is not None, preserve it regardless of the + # setting. This is in line with existing behavior, but isn't + # an endorsement of it! 
+ if result[target_field] is not None: + continue + + show_behavior = ShowBehavior.from_field(fld) + if show_behavior == ShowBehavior.Hide: + del result[target_field] + return result + + def update_from( + self: T, data: Dict[str, Any], adapter_type: str, validate: bool = True + ) -> T: + """Given a dict of keys, update the current config from them, validate + it, and return a new config with the updated values + """ + # sadly, this is a circular import + from dbt.adapters.factory import get_config_class_by_name + dct = self.to_dict(omit_none=False, validate=False, omit_hidden=False) + + adapter_config_cls = get_config_class_by_name(adapter_type) + + self_merged = self._extract_dict(dct, data) + dct.update(self_merged) + + adapter_merged = adapter_config_cls._extract_dict(dct, data) + dct.update(adapter_merged) + + # any remaining fields must be "clobber" + dct.update(data) + + # any validation failures must have come from the update + return self.from_dict(dct, validate=validate) + + def finalize_and_validate(self: T) -> T: + self.to_dict(validate=True) + return self.replace() + + +@dataclass +class SourceConfig(BaseConfig): + enabled: bool = True + + +@dataclass +class NodeConfig(BaseConfig): + enabled: bool = True + materialized: str = 'view' + persist_docs: Dict[str, Any] = field(default_factory=dict) + post_hook: List[Hook] = field( + default_factory=list, + metadata=MergeBehavior.Append.meta(), + ) + pre_hook: List[Hook] = field( + default_factory=list, + metadata=MergeBehavior.Append.meta(), + ) + vars: Dict[str, Any] = field( + default_factory=dict, + metadata=MergeBehavior.Update.meta(), + ) + quoting: Dict[str, Any] = field( + default_factory=dict, + metadata=MergeBehavior.Update.meta(), + ) + # This is actually only used by seeds. Should it be available to others? + # That would be a breaking change! + column_types: Dict[str, Any] = field( + default_factory=dict, + metadata=MergeBehavior.Update.meta(), + ) + # these fields are all config-only (they're ultimately applied to the node) + alias: Optional[str] = field( + default=None, + metadata=ShowBehavior.Hide.meta(), + ) + schema: Optional[str] = field( + default=None, + metadata=ShowBehavior.Hide.meta(), + ) + database: Optional[str] = field( + default=None, + metadata=ShowBehavior.Hide.meta(), + ) + tags: Union[List[str], str] = field( + default_factory=list_str, + # TODO: hide this one? + metadata=MergeBehavior.Append.meta(), + ) + + @classmethod + def from_dict(cls, data, validate=True): + for key in hooks.ModelHookType: + if key in data: + data[key] = [hooks.get_hook_dict(h) for h in data[key]] + return super().from_dict(data, validate=validate) + + @classmethod + def field_mapping(cls): + return {'post_hook': 'post-hook', 'pre_hook': 'pre-hook'} + + +@dataclass +class SeedConfig(NodeConfig): + materialized: str = 'seed' + quote_columns: Optional[bool] = None + + +@dataclass +class TestConfig(NodeConfig): + severity: Severity = Severity('ERROR') + + +SnapshotVariants = Union[ + 'TimestampSnapshotConfig', + 'CheckSnapshotConfig', + 'GenericSnapshotConfig', +] + + +def _relevance_without_strategy(error: jsonschema.ValidationError): + # calculate the 'relevance' of an error the normal jsonschema way, except + # if the validator is in the 'strategy' field and its conflicting with the + # 'enum'. 
This suppresses `"'timestamp' is not one of ['check']` and such + if 'strategy' in error.path and error.validator in {'enum', 'not'}: + length = 1 + else: + length = -len(error.path) + validator = error.validator + return length, validator not in {'anyOf', 'oneOf'} + + +@dataclass +class SnapshotWrapper(JsonSchemaMixin): + """This is a little wrapper to let us serialize/deserialize the + SnapshotVariants union. + """ + config: SnapshotVariants # mypy: ignore + + @classmethod + def validate(cls, data: Any): + schema = _validate_schema(cls) + validator = jsonschema.Draft7Validator(schema) + error = jsonschema.exceptions.best_match( + validator.iter_errors(data), + key=_relevance_without_strategy, + ) + if error is not None: + raise ValidationError.create_from(error) from error + + +@dataclass +class EmptySnapshotConfig(NodeConfig): + materialized: str = 'snapshot' + + +@dataclass(init=False) +class SnapshotConfig(EmptySnapshotConfig): + unique_key: str = field(init=False, metadata=dict(init_required=True)) + target_schema: str = field(init=False, metadata=dict(init_required=True)) + target_database: Optional[str] = None + + def __init__( + self, + unique_key: str, + target_schema: str, + target_database: Optional[str] = None, + **kwargs + ) -> None: + self.unique_key = unique_key + self.target_schema = target_schema + self.target_database = target_database + # kwargs['materialized'] = materialized + super().__init__(**kwargs) + + # type hacks... + @classmethod + def _get_fields(cls) -> List[Tuple[Field, str]]: # type: ignore + fields: List[Tuple[Field, str]] = [] + for old_field, name in super()._get_fields(): + new_field = old_field + # tell hologram we're really an initvar + if old_field.metadata and old_field.metadata.get('init_required'): + new_field = field(init=True, metadata=old_field.metadata) + new_field.name = old_field.name + new_field.type = old_field.type + new_field._field_type = old_field._field_type # type: ignore + fields.append((new_field, name)) + return fields + + def finalize_and_validate(self: 'SnapshotConfig') -> SnapshotVariants: + data = self.to_dict() + return SnapshotWrapper.from_dict({'config': data}).config + + +@dataclass(init=False) +class GenericSnapshotConfig(SnapshotConfig): + strategy: str = field(init=False, metadata=dict(init_required=True)) + + def __init__(self, strategy: str, **kwargs) -> None: + self.strategy = strategy + super().__init__(**kwargs) + + @classmethod + def _collect_json_schema( + cls, definitions: Dict[str, Any] + ) -> Dict[str, Any]: + # this is the method you want to override in hologram if you want + # to do clever things about the json schema and have classes that + # contain instances of your JsonSchemaMixin respect the change. + schema = super()._collect_json_schema(definitions) + + # Instead of just the strategy we'd calculate normally, say + # "this strategy except none of our specialization strategies". 
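+        # In effect, GenericSnapshotConfig only matches strategies that are
+        # neither 'timestamp' nor 'check', so those two always deserialize
+        # to their specialized config classes instead.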
+ strategies = [schema['properties']['strategy']] + for specialization in (TimestampSnapshotConfig, CheckSnapshotConfig): + strategies.append( + {'not': specialization.json_schema()['properties']['strategy']} + ) + + schema['properties']['strategy'] = { + 'allOf': strategies + } + return schema + + +@dataclass(init=False) +class TimestampSnapshotConfig(SnapshotConfig): + strategy: str = field( + init=False, + metadata=dict( + restrict=[str(SnapshotStrategy.Timestamp)], + init_required=True, + ), + ) + updated_at: str = field(init=False, metadata=dict(init_required=True)) + + def __init__( + self, strategy: str, updated_at: str, **kwargs + ) -> None: + self.strategy = strategy + self.updated_at = updated_at + super().__init__(**kwargs) + + +@dataclass(init=False) +class CheckSnapshotConfig(SnapshotConfig): + strategy: str = field( + init=False, + metadata=dict( + restrict=[str(SnapshotStrategy.Check)], + init_required=True, + ), + ) + # TODO: is there a way to get this to accept tuples of strings? Adding + # `Tuple[str, ...]` to the list of types results in this: + # ['email'] is valid under each of {'type': 'array', 'items': + # {'type': 'string'}}, {'type': 'array', 'items': {'type': 'string'}} + # but without it, parsing gets upset about values like `('email',)` + # maybe hologram itself should support this behavior? It's not like tuples + # are meaningful in json + check_cols: Union[All, List[str]] = field( + init=False, + metadata=dict(init_required=True), + ) + + def __init__( + self, strategy: str, check_cols: Union[All, List[str]], + **kwargs + ) -> None: + self.strategy = strategy + self.check_cols = check_cols + super().__init__(**kwargs) + + +RESOURCE_TYPES: Dict[NodeType, Type[BaseConfig]] = { + NodeType.Source: SourceConfig, + NodeType.Seed: SeedConfig, + NodeType.Test: TestConfig, + NodeType.Model: NodeConfig, + NodeType.Snapshot: SnapshotConfig, +} + + +# base resource types are like resource types, except nothing has mandatory +# configs. 
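+# (concretely, only snapshots differ: at the "base" stage they use
+# EmptySnapshotConfig, so unique_key/target_schema are not yet required)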
+BASE_RESOURCE_TYPES: Dict[NodeType, Type[BaseConfig]] = RESOURCE_TYPES.copy() +BASE_RESOURCE_TYPES.update({ + NodeType.Snapshot: EmptySnapshotConfig +}) + + +def get_config_for(resource_type: NodeType, base=False) -> Type[BaseConfig]: + if base: + lookup = BASE_RESOURCE_TYPES + else: + lookup = RESOURCE_TYPES + return lookup.get(resource_type, NodeConfig) diff --git a/core/dbt/contracts/graph/parsed.py b/core/dbt/contracts/graph/parsed.py index 965bccf49fe..f395c5c912c 100644 --- a/core/dbt/contracts/graph/parsed.py +++ b/core/dbt/contracts/graph/parsed.py @@ -1,118 +1,46 @@ import os -from dataclasses import dataclass, field, Field +from dataclasses import dataclass, field +from pathlib import Path from typing import ( Optional, Union, List, Dict, Any, - Type, + Sequence, Tuple, - NewType, - MutableMapping, + Iterator, ) from hologram import JsonSchemaMixin -from hologram.helpers import ( - StrEnum, register_pattern -) from dbt.clients.system import write_file import dbt.flags from dbt.contracts.graph.unparsed import ( UnparsedNode, UnparsedDocumentation, Quoting, Docs, UnparsedBaseNode, FreshnessThreshold, ExternalTable, - AdditionalPropertiesAllowed, HasYamlMetadata, MacroArgument + HasYamlMetadata, MacroArgument, UnparsedSourceDefinition, + UnparsedSourceTableDefinition, UnparsedColumn, TestDef ) -from dbt.contracts.util import Replaceable, list_str +from dbt.contracts.util import Replaceable from dbt.logger import GLOBAL_LOGGER as logger # noqa from dbt.node_types import NodeType -class SnapshotStrategy(StrEnum): - Timestamp = 'timestamp' - Check = 'check' - - -class All(StrEnum): - All = 'all' - - -@dataclass -class Hook(JsonSchemaMixin, Replaceable): - sql: str - transaction: bool = True - index: Optional[int] = None - - -def insensitive_patterns(*patterns: str): - lowercased = [] - for pattern in patterns: - lowercased.append( - ''.join('[{}{}]'.format(s.upper(), s.lower()) for s in pattern) - ) - return '^({})$'.format('|'.join(lowercased)) - - -Severity = NewType('Severity', str) -register_pattern(Severity, insensitive_patterns('warn', 'error')) - - -@dataclass -class NodeConfig( - AdditionalPropertiesAllowed, Replaceable, MutableMapping[str, Any] -): - enabled: bool = True - materialized: str = 'view' - persist_docs: Dict[str, Any] = field(default_factory=dict) - post_hook: List[Hook] = field(default_factory=list) - pre_hook: List[Hook] = field(default_factory=list) - vars: Dict[str, Any] = field(default_factory=dict) - quoting: Dict[str, Any] = field(default_factory=dict) - column_types: Dict[str, Any] = field(default_factory=dict) - tags: Union[List[str], str] = field(default_factory=list_str) - - @classmethod - def field_mapping(cls): - return {'post_hook': 'post-hook', 'pre_hook': 'pre-hook'} - - # Implement MutableMapping so this config will behave as some macros expect - # during parsing (notably, syntax like `{{ node.config['schema'] }}`) - - def __getitem__(self, key): - """Handle parse-time use of `config` as a dictionary, making the extra - values available during parsing. 
- """ - if hasattr(self, key): - return getattr(self, key) - else: - return self._extra[key] - - def __setitem__(self, key, value): - if hasattr(self, key): - setattr(self, key, value) - else: - self._extra[key] = value - - def __delitem__(self, key): - if hasattr(self, key): - msg = ( - 'Error, tried to delete config key "{}": Cannot delete ' - 'built-in keys' - ).format(key) - raise dbt.exceptions.CompilationException(msg) - else: - del self._extra[key] - - def __iter__(self): - for fld, _ in self._get_fields(): - yield fld.name - - for key in self._extra: - yield key - - def __len__(self): - return len(self._get_fields()) + len(self._extra) +from .model_config import ( + NodeConfig, + SeedConfig, + TestConfig, + SourceConfig, + EmptySnapshotConfig, + SnapshotVariants, +) +# import these 3 so the SnapshotVariants forward ref works. +from .model_config import ( # noqa + TimestampSnapshotConfig, + CheckSnapshotConfig, + GenericSnapshotConfig, +) @dataclass @@ -209,6 +137,7 @@ class ParsedNodeMandatory( Replaceable ): alias: str + config: NodeConfig = field(default_factory=NodeConfig) @property def identifier(self): @@ -217,7 +146,6 @@ def identifier(self): @dataclass class ParsedNodeDefaults(ParsedNodeMandatory): - config: NodeConfig = field(default_factory=NodeConfig) tags: List[str] = field(default_factory=list) refs: List[List[str]] = field(default_factory=list) sources: List[List[Any]] = field(default_factory=list) @@ -266,10 +194,6 @@ class ParsedRPCNode(ParsedNode): resource_type: NodeType = field(metadata={'restrict': [NodeType.RPCCall]}) -class SeedConfig(NodeConfig): - quote_columns: Optional[bool] = None - - @dataclass class ParsedSeedNode(ParsedNode): resource_type: NodeType = field(metadata={'restrict': [NodeType.Seed]}) @@ -281,11 +205,6 @@ def empty(self): return False -@dataclass -class TestConfig(NodeConfig): - severity: Severity = Severity('error') - - @dataclass class TestMetadata(JsonSchemaMixin): namespace: Optional[str] @@ -311,98 +230,6 @@ class ParsedSchemaTestNode(ParsedNode, HasTestMetadata): config: TestConfig = field(default_factory=TestConfig) -@dataclass(init=False) -class _SnapshotConfig(NodeConfig): - unique_key: str = field(init=False, metadata=dict(init_required=True)) - target_schema: str = field(init=False, metadata=dict(init_required=True)) - target_database: Optional[str] = None - - def __init__( - self, - unique_key: str, - target_schema: str, - target_database: Optional[str] = None, - **kwargs - ) -> None: - self.unique_key = unique_key - self.target_schema = target_schema - self.target_database = target_database - super().__init__(**kwargs) - - # type hacks... 
- @classmethod - def _get_fields(cls) -> List[Tuple[Field, str]]: # type: ignore - fields: List[Tuple[Field, str]] = [] - for old_field, name in super()._get_fields(): - new_field = old_field - # tell hologram we're really an initvar - if old_field.metadata and old_field.metadata.get('init_required'): - new_field = field(init=True, metadata=old_field.metadata) - new_field.name = old_field.name - new_field.type = old_field.type - new_field._field_type = old_field._field_type # type: ignore - fields.append((new_field, name)) - return fields - - -@dataclass(init=False) -class GenericSnapshotConfig(_SnapshotConfig): - strategy: str = field(init=False, metadata=dict(init_required=True)) - - def __init__(self, strategy: str, **kwargs) -> None: - self.strategy = strategy - super().__init__(**kwargs) - - -@dataclass(init=False) -class TimestampSnapshotConfig(_SnapshotConfig): - strategy: str = field( - init=False, - metadata=dict( - restrict=[str(SnapshotStrategy.Timestamp)], - init_required=True, - ), - ) - updated_at: str = field(init=False, metadata=dict(init_required=True)) - - def __init__( - self, strategy: str, updated_at: str, **kwargs - ) -> None: - self.strategy = strategy - self.updated_at = updated_at - super().__init__(**kwargs) - - -@dataclass(init=False) -class CheckSnapshotConfig(_SnapshotConfig): - strategy: str = field( - init=False, - metadata=dict( - restrict=[str(SnapshotStrategy.Check)], - init_required=True, - ), - ) - # TODO: is there a way to get this to accept tuples of strings? Adding - # `Tuple[str, ...]` to the list of types results in this: - # ['email'] is valid under each of {'type': 'array', 'items': - # {'type': 'string'}}, {'type': 'array', 'items': {'type': 'string'}} - # but without it, parsing gets upset about values like `('email',)` - # maybe hologram itself should support this behavior? It's not like tuples - # are meaningful in json - check_cols: Union[All, List[str]] = field( - init=False, - metadata=dict(init_required=True), - ) - - def __init__( - self, strategy: str, check_cols: Union[All, List[str]], - **kwargs - ) -> None: - self.strategy = strategy - self.check_cols = check_cols - super().__init__(**kwargs) - - @dataclass class IntermediateSnapshotNode(ParsedNode): # at an intermediate stage in parsing, where we've built something better @@ -412,59 +239,13 @@ class IntermediateSnapshotNode(ParsedNode): # uses a regular node config, which the snapshot parser will then convert # into a full ParsedSnapshotNode after rendering. resource_type: NodeType = field(metadata={'restrict': [NodeType.Snapshot]}) - - -def _create_if_else_chain( - key: str, - criteria: List[Tuple[str, Type[JsonSchemaMixin]]], - default: Type[JsonSchemaMixin] -) -> Dict[str, Any]: - """Mutate a given schema key that contains a 'oneOf' to instead be an - 'if-then-else' chain. This results is much better/more consistent errors - from jsonschema. 
- """ - schema: Dict[str, Any] = {} - result: Dict[str, Any] = {} - criteria = criteria[:] - while criteria: - if_clause, then_clause = criteria.pop() - schema['if'] = {'properties': { - key: {'enum': [if_clause]} - }} - schema['then'] = then_clause.json_schema() - schema['else'] = {} - schema = schema['else'] - schema.update(default.json_schema()) - return result + config: EmptySnapshotConfig = field(default_factory=EmptySnapshotConfig) @dataclass class ParsedSnapshotNode(ParsedNode): resource_type: NodeType = field(metadata={'restrict': [NodeType.Snapshot]}) - config: Union[ - CheckSnapshotConfig, - TimestampSnapshotConfig, - GenericSnapshotConfig, - ] - - @classmethod - def json_schema(cls, embeddable: bool = False) -> Dict[str, Any]: - schema = super().json_schema(embeddable) - - # mess with config - configs: List[Tuple[str, Type[JsonSchemaMixin]]] = [ - (str(SnapshotStrategy.Check), CheckSnapshotConfig), - (str(SnapshotStrategy.Timestamp), TimestampSnapshotConfig), - ] - - if embeddable: - dest = schema[cls.__name__]['properties'] - else: - dest = schema['properties'] - dest['config'] = _create_if_else_chain( - 'strategy', configs, GenericSnapshotConfig - ) - return schema + config: SnapshotVariants @dataclass @@ -525,12 +306,66 @@ def search_name(self): return self.name +def normalize_test(testdef: TestDef) -> Dict[str, Any]: + if isinstance(testdef, str): + return {testdef: {}} + else: + return testdef + + +@dataclass +class UnpatchedSourceDefinition(UnparsedBaseNode, HasUniqueID, HasFqn): + source: UnparsedSourceDefinition + table: UnparsedSourceTableDefinition + resource_type: NodeType = field(metadata={'restrict': [NodeType.Source]}) + patch_path: Optional[Path] = None + + @property + def name(self) -> str: + return '{0.name}_{1.name}'.format(self.source, self.table) + + @property + def quote_columns(self) -> Optional[bool]: + result = None + if self.source.quoting.column is not None: + result = self.source.quoting.column + if self.table.quoting.column is not None: + result = self.table.quoting.column + return result + + @property + def columns(self) -> Sequence[UnparsedColumn]: + if self.table.columns is None: + return [] + else: + return self.table.columns + + def get_tests( + self + ) -> Iterator[Tuple[Dict[str, Any], Optional[UnparsedColumn]]]: + for test in self.tests: + yield normalize_test(test), None + + for column in self.columns: + if column.tests is not None: + for test in column.tests: + yield normalize_test(test), column + + @property + def tests(self) -> List[TestDef]: + if self.table.tests is None: + return [] + else: + return self.table.tests + + @dataclass class ParsedSourceDefinition( - UnparsedBaseNode, - HasUniqueID, - HasRelationMetadata, - HasFqn): + UnparsedBaseNode, + HasUniqueID, + HasRelationMetadata, + HasFqn +): name: str source_name: str source_description: str @@ -546,6 +381,8 @@ class ParsedSourceDefinition( meta: Dict[str, Any] = field(default_factory=dict) source_meta: Dict[str, Any] = field(default_factory=dict) tags: List[str] = field(default_factory=list) + config: SourceConfig = field(default_factory=SourceConfig) + patch_path: Optional[Path] = None @property def is_refable(self): diff --git a/core/dbt/contracts/graph/searcher.py b/core/dbt/contracts/graph/searcher.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/core/dbt/contracts/graph/unparsed.py b/core/dbt/contracts/graph/unparsed.py index 0997004168d..e354a2a3454 100644 --- a/core/dbt/contracts/graph/unparsed.py +++ b/core/dbt/contracts/graph/unparsed.py @@ -1,5 +1,7 @@ 
from dbt.node_types import NodeType from dbt.contracts.util import Replaceable, Mergeable +# trigger the PathEncoder +import dbt.helper_types # noqa:F401 from dbt.exceptions import CompilationException from hologram import JsonSchemaMixin @@ -7,6 +9,7 @@ from dataclasses import dataclass, field from datetime import timedelta +from pathlib import Path from typing import Optional, List, Union, Dict, Any, Sequence @@ -243,11 +246,15 @@ class UnparsedSourceTableDefinition(HasColumnTests, HasTests): freshness: Optional[FreshnessThreshold] = field( default_factory=FreshnessThreshold ) - external: Optional[ExternalTable] = field( - default_factory=ExternalTable - ) + external: Optional[ExternalTable] = None tags: List[str] = field(default_factory=list) + def to_dict(self, omit_none=True, validate=False): + result = super().to_dict(omit_none=omit_none, validate=validate) + if omit_none and self.freshness is None: + result['freshness'] = None + return result + @dataclass class UnparsedSourceDefinition(JsonSchemaMixin, Replaceable): @@ -269,6 +276,87 @@ class UnparsedSourceDefinition(JsonSchemaMixin, Replaceable): def yaml_key(self) -> 'str': return 'sources' + def to_dict(self, omit_none=True, validate=False): + result = super().to_dict(omit_none=omit_none, validate=validate) + if omit_none and self.freshness is None: + result['freshness'] = None + return result + + +@dataclass +class SourceTablePatch(JsonSchemaMixin): + name: str + description: Optional[str] = None + meta: Optional[Dict[str, Any]] = None + data_type: Optional[str] = None + docs: Optional[Docs] = None + loaded_at_field: Optional[str] = None + identifier: Optional[str] = None + quoting: Quoting = field(default_factory=Quoting) + freshness: Optional[FreshnessThreshold] = field( + default_factory=FreshnessThreshold + ) + external: Optional[ExternalTable] = None + tags: Optional[List[str]] = None + tests: Optional[List[TestDef]] = None + columns: Optional[Sequence[UnparsedColumn]] = None + + def to_patch_dict(self) -> Dict[str, Any]: + dct = self.to_dict(omit_none=True) + remove_keys = ('name') + for key in remove_keys: + if key in dct: + del dct[key] + + if self.freshness is None: + dct['freshness'] = None + + return dct + + +@dataclass +class SourcePatch(JsonSchemaMixin, Replaceable): + name: str = field( + metadata=dict(description='The name of the source to override'), + ) + overrides: str = field( + metadata=dict(description='The package of the source to override'), + ) + path: Path = field( + metadata=dict(description='The path to the patch-defining yml file'), + ) + description: Optional[str] = None + meta: Optional[Dict[str, Any]] = None + database: Optional[str] = None + schema: Optional[str] = None + loader: Optional[str] = None + quoting: Optional[Quoting] = None + freshness: Optional[Optional[FreshnessThreshold]] = field( + default_factory=FreshnessThreshold + ) + loaded_at_field: Optional[str] = None + tables: Optional[List[SourceTablePatch]] = None + tags: Optional[List[str]] = None + + def to_patch_dict(self) -> Dict[str, Any]: + dct = self.to_dict(omit_none=True) + remove_keys = ('name', 'overrides', 'tables', 'path') + for key in remove_keys: + if key in dct: + del dct[key] + + if self.freshness is None: + dct['freshness'] = None + + return dct + + def get_table_named(self, name: str) -> Optional[SourceTablePatch]: + if self.tables is not None: + for table in self.tables: + if table.name == name: + return table + return None + @dataclass class UnparsedDocumentation(JsonSchemaMixin, Replaceable): diff --git 
a/core/dbt/contracts/project.py b/core/dbt/contracts/project.py index 691a4a5ab81..d028a11fbf1 100644 --- a/core/dbt/contracts/project.py +++ b/core/dbt/contracts/project.py @@ -138,7 +138,7 @@ class RegistryPackageMetadata( @dataclass -class Project(HyphenatedJsonSchemaMixin, Replaceable): +class ProjectV1(HyphenatedJsonSchemaMixin, Replaceable): name: Name version: Union[SemverString, float] project_root: Optional[str] = None @@ -163,9 +163,10 @@ class Project(HyphenatedJsonSchemaMixin, Replaceable): snapshots: Dict[str, Any] = field(default_factory=dict) packages: List[PackageSpec] = field(default_factory=list) query_comment: Optional[Union[QueryComment, NoValue, str]] = NoValue() + config_version: int = 1 @classmethod - def from_dict(cls, data, validate=True): + def from_dict(cls, data, validate=True) -> 'ProjectV1': result = super().from_dict(data, validate=validate) if result.name in BANNED_PROJECT_NAMES: raise ValidationError( @@ -175,6 +176,68 @@ def from_dict(cls, data, validate=True): return result +@dataclass +class ProjectV2(HyphenatedJsonSchemaMixin, Replaceable): + name: Name + version: Union[SemverString, float] + config_version: int + project_root: Optional[str] = None + source_paths: Optional[List[str]] = None + macro_paths: Optional[List[str]] = None + data_paths: Optional[List[str]] = None + test_paths: Optional[List[str]] = None + analysis_paths: Optional[List[str]] = None + docs_paths: Optional[List[str]] = None + target_path: Optional[str] = None + snapshot_paths: Optional[List[str]] = None + clean_targets: Optional[List[str]] = None + profile: Optional[str] = None + log_path: Optional[str] = None + modules_path: Optional[str] = None + quoting: Optional[Quoting] = None + on_run_start: Optional[List[str]] = field(default_factory=list_str) + on_run_end: Optional[List[str]] = field(default_factory=list_str) + require_dbt_version: Optional[Union[List[str], str]] = None + models: Dict[str, Any] = field(default_factory=dict) + seeds: Dict[str, Any] = field(default_factory=dict) + snapshots: Dict[str, Any] = field(default_factory=dict) + analyses: Dict[str, Any] = field(default_factory=dict) + sources: Dict[str, Any] = field(default_factory=dict) + vars: Optional[Dict[str, Any]] = field( + default=None, + metadata=dict( + description='map project names to their vars override dicts', + ), + ) + packages: List[PackageSpec] = field(default_factory=list) + query_comment: Optional[Union[QueryComment, NoValue, str]] = NoValue() + + @classmethod + def from_dict(cls, data, validate=True) -> 'ProjectV2': + result = super().from_dict(data, validate=validate) + if result.name in BANNED_PROJECT_NAMES: + raise ValidationError( + f'Invalid project name: {result.name} is a reserved word' + ) + + return result + + +def parse_project_config( + data: Dict[str, Any], validate=True +) -> Union[ProjectV1, ProjectV2]: + config_version = data.get('config-version', 1) + if config_version == 1: + return ProjectV1.from_dict(data, validate=validate) + elif config_version == 2: + return ProjectV2.from_dict(data, validate=validate) + else: + raise ValidationError( + f'Got an unexpected config-version={config_version}, expected ' + f'1 or 2' + ) + + @dataclass class UserConfig(ExtensibleJsonSchemaMixin, Replaceable, UserConfigContract): send_anonymous_usage_stats: bool = DEFAULT_SEND_ANONYMOUS_USAGE_STATS @@ -214,7 +277,7 @@ class ConfiguredQuoting(Quoting, Replaceable): @dataclass -class Configuration(Project, ProfileConfig): +class Configuration(ProjectV2, ProfileConfig): cli_vars: Dict[str, Any] = 
field( default_factory=dict, metadata={'preserve_underscore': True}, @@ -224,4 +287,4 @@ class Configuration(Project, ProfileConfig): @dataclass class ProjectList(JsonSchemaMixin): - projects: Dict[str, Project] + projects: Dict[str, Union[ProjectV2, ProjectV1]] diff --git a/core/dbt/contracts/results.py b/core/dbt/contracts/results.py index 8b3ee378d2a..c9f9b06c97b 100644 --- a/core/dbt/contracts/results.py +++ b/core/dbt/contracts/results.py @@ -294,6 +294,7 @@ def key(self) -> CatalogKey: @dataclass class CatalogResults(JsonSchemaMixin, Writable): nodes: Dict[str, CatalogTable] + sources: Dict[str, CatalogTable] generated_at: datetime errors: Optional[List[str]] _compile_results: Optional[Any] = None diff --git a/core/dbt/contracts/rpc.py b/core/dbt/contracts/rpc.py index 473f9619c97..aab03527009 100644 --- a/core/dbt/contracts/rpc.py +++ b/core/dbt/contracts/rpc.py @@ -520,6 +520,7 @@ def from_result( ) -> 'PollCatalogCompleteResult': return cls( nodes=base.nodes, + sources=base.sources, generated_at=base.generated_at, errors=base.errors, _compile_results=base._compile_results, diff --git a/core/dbt/contracts/util.py b/core/dbt/contracts/util.py index 12e6005b80a..e90694fedaf 100644 --- a/core/dbt/contracts/util.py +++ b/core/dbt/contracts/util.py @@ -5,7 +5,7 @@ def list_str() -> List[str]: - """Mypy gets upset about¸stuff like: + """Mypy gets upset about stuff like: from dataclasses import dataclass, field from typing import Optional, List diff --git a/core/dbt/deprecations.py b/core/dbt/deprecations.py index 20df3919de7..b253e7518f7 100644 --- a/core/dbt/deprecations.py +++ b/core/dbt/deprecations.py @@ -92,23 +92,19 @@ class ModelsKeyNonModelDeprecation(DBTDeprecation): ''' -class BigQueryPartitionByStringDeprecation(DBTDeprecation): - _name = 'bq-partition-by-string' - - _description = ''' - As of dbt v0.16.0, the `partition_by` config in BigQuery accepts a - dictionary containing `field` and `data_type`. - - - - Provided partition_by: {raw_partition_by} - +class DbtProjectYamlDeprecation(DBTDeprecation): + _name = 'dbt-project-yaml-v1' + _description = '''\ + dbt v0.17.0 introduces a new config format for the dbt_project.yml file. + Support for the existing version 1 format will be removed in a future + release of dbt. 
The following packages are currently configured with + config version 1:{project_names} - - dbt inferred: {inferred_partition_by} + For upgrading instructions, consult the documentation: - For more information, see: - https://docs.getdbt.com/docs/upgrading-to-0-16-0 + https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-0-17-0 ''' @@ -154,7 +150,7 @@ def warn(name, *args, **kwargs): NotADictionaryDeprecation(), ColumnQuotingDeprecation(), ModelsKeyNonModelDeprecation(), - BigQueryPartitionByStringDeprecation(), + DbtProjectYamlDeprecation(), ] deprecations: Dict[str, DBTDeprecation] = { diff --git a/core/dbt/deps/resolver.py b/core/dbt/deps/resolver.py index 8f571d35712..f810bab981c 100644 --- a/core/dbt/deps/resolver.py +++ b/core/dbt/deps/resolver.py @@ -4,7 +4,8 @@ from dbt.exceptions import raise_dependency_error, InternalException from dbt.context.target import generate_target_context -from dbt.config import Project, ConfigRenderer, RuntimeConfig +from dbt.config import Project, RuntimeConfig +from dbt.config.renderer import DbtProjectYamlRenderer from dbt.deps.base import BasePackage, PinnedPackage, UnpinnedPackage from dbt.deps.local import LocalUnpinnedPackage from dbt.deps.git import GitUnpinnedPackage @@ -97,7 +98,9 @@ def __iter__(self) -> Iterator[UnpinnedPackage]: def _check_for_duplicate_project_names( - final_deps: List[PinnedPackage], config: Project, renderer: ConfigRenderer + final_deps: List[PinnedPackage], + config: Project, + renderer: DbtProjectYamlRenderer, ): seen: Set[str] = set() for package in final_deps: @@ -123,7 +126,8 @@ def resolve_packages( pending = PackageListing.from_contracts(packages) final = PackageListing() - renderer = ConfigRenderer(generate_target_context(config, config.cli_vars)) + ctx = generate_target_context(config, config.cli_vars) + renderer = DbtProjectYamlRenderer(ctx, config.config_version) while pending: next_pending = PackageListing() diff --git a/core/dbt/exceptions.py b/core/dbt/exceptions.py index 6b4f2f5ed72..3505373bbf0 100644 --- a/core/dbt/exceptions.py +++ b/core/dbt/exceptions.py @@ -1,6 +1,6 @@ import builtins import functools -from typing import NoReturn, Optional +from typing import NoReturn, Optional, Mapping, Any from dbt.logger import GLOBAL_LOGGER as logger from dbt.node_types import NodeType @@ -281,10 +281,20 @@ class DbtConfigError(RuntimeException): CODE = 10007 MESSAGE = "DBT Configuration Error" - def __init__(self, message, project=None, result_type='invalid_project'): + def __init__( + self, message, project=None, result_type='invalid_project', path=None + ): self.project = project super().__init__(message) self.result_type = result_type + self.path = path + + def __str__(self, prefix='! 
') -> str: + msg = super().__str__(prefix) + if self.path is None: + return msg + else: + return f'{msg}\n\nError encountered in {self.path}' class FailFastException(RuntimeException): @@ -462,8 +472,10 @@ def doc_target_not_found( raise_compiler_error(msg, model) -def _get_target_failure_msg(model, target_model_name, target_model_package, - include_path, reason): +def _get_target_failure_msg( + model, target_name: str, target_model_package: Optional[str], + include_path: bool, reason: str, target_kind: str +) -> str: target_package_string = '' if target_model_package is not None: target_package_string = "in package '{}' ".format(target_model_package) @@ -472,52 +484,73 @@ def _get_target_failure_msg(model, target_model_name, target_model_package, if include_path: source_path_string = ' ({})'.format(model.original_file_path) - return "{} '{}'{} depends on a node named '{}' {}which {}".format( + return "{} '{}'{} depends on a {} named '{}' {}which {}".format( model.resource_type.title(), model.unique_id, source_path_string, - target_model_name, + target_kind, + target_name, target_package_string, reason ) -def get_target_disabled_msg(model, target_model_name, target_model_package): - return _get_target_failure_msg(model, target_model_name, - target_model_package, include_path=True, - reason='is disabled') - - -def get_target_not_found_msg(model, target_model_name, target_model_package): - return _get_target_failure_msg(model, target_model_name, - target_model_package, include_path=True, - reason='was not found') - - -def get_target_not_found_or_disabled_msg(model, target_model_name, - target_model_package): - return _get_target_failure_msg(model, target_model_name, - target_model_package, include_path=False, - reason='was not found or is disabled') +def get_target_not_found_or_disabled_msg( + model, target_model_name: str, target_model_package: Optional[str], + disabled: Optional[bool] = None, +) -> str: + if disabled is None: + reason = 'was not found or is disabled' + elif disabled is True: + reason = 'is disabled' + else: + reason = 'was not found' + return _get_target_failure_msg( + model, target_model_name, target_model_package, include_path=True, + reason=reason, target_kind='node' + ) -def ref_target_not_found(model, target_model_name, target_model_package): - msg = get_target_not_found_or_disabled_msg(model, target_model_name, - target_model_package) +def ref_target_not_found( + model, + target_model_name: str, + target_model_package: Optional[str], + disabled: Optional[bool] = None, +) -> NoReturn: + msg = get_target_not_found_or_disabled_msg( + model, target_model_name, target_model_package, disabled + ) raise_compiler_error(msg, model) -def source_disabled_message(model, target_name, target_table_name): - return ("{} '{}' ({}) depends on source '{}.{}' which was not found" - .format(model.resource_type.title(), - model.unique_id, - model.original_file_path, - target_name, - target_table_name)) +def get_source_not_found_or_disabled_msg( + model, + target_name: str, + target_table_name: str, + disabled: Optional[bool] = None, +) -> str: + full_name = f'{target_name}.{target_table_name}' + if disabled is None: + reason = 'was not found or is disabled' + elif disabled is True: + reason = 'is disabled' + else: + reason = 'was not found' + return _get_target_failure_msg( + model, full_name, None, include_path=True, + reason=reason, target_kind='source' + ) -def source_target_not_found(model, target_name, target_table_name) -> NoReturn: - msg = source_disabled_message(model, 
target_name, target_table_name) +def source_target_not_found( + model, + target_name: str, + target_table_name: str, + disabled: Optional[bool] = None +) -> NoReturn: + msg = get_source_not_found_or_disabled_msg( + model, target_name, target_table_name, disabled + ) raise_compiler_error(msg, model) @@ -752,26 +785,62 @@ def raise_patch_targets_not_found(patches): ) +def _fix_dupe_msg(path_1: str, path_2: str, name: str, type_name: str) -> str: + if path_1 == path_2: + return ( + f'remove one of the {type_name} entries for {name} in this file:\n' + f' - {path_1!s}\n' + ) + else: + return ( + f'remove the {type_name} entry for {name} in one of these files:\n' + f' - {path_1!s}\n{path_2!s}' + ) + + def raise_duplicate_patch_name(patch_1, patch_2): name = patch_1.name + fix = _fix_dupe_msg( + patch_1.original_file_path, + patch_2.original_file_path, + name, + 'resource', + ) raise_compiler_error( f'dbt found two schema.yml entries for the same resource named ' f'{name}. Resources and their associated columns may only be ' - f'described a single time. To fix this, remove the resource entry ' - f'for {name} in one of these files:\n - ' - f'{patch_1.original_file_path}\n - {patch_2.original_file_path}' + f'described a single time. To fix this, {fix}' ) def raise_duplicate_macro_patch_name(patch_1, patch_2): package_name = patch_1.package_name name = patch_1.name + fix = _fix_dupe_msg( + patch_1.original_file_path, + patch_2.original_file_path, + name, + 'macros' + ) raise_compiler_error( f'dbt found two schema.yml entries for the same macro in package ' f'{package_name} named {name}. Macros may only be described a single ' - f'time. To fix this, remove the macros entry for {name} in one ' - f'of these files:' - f'\n - {patch_1.original_file_path}\n - {patch_2.original_file_path}' + f'time. To fix this, {fix}' + ) + + +def raise_duplicate_source_patch_name(patch_1, patch_2): + name = f'{patch_1.overrides}.{patch_1.name}' + fix = _fix_dupe_msg( + patch_1.path, + patch_2.path, + name, + 'sources', + ) + raise_compiler_error( + f'dbt found two schema.yml entries for the same source named ' + f'{patch_1.name} in package {patch_1.overrides}. Sources may only be ' + f'overridden a single time. To fix this, {fix}' ) @@ -791,12 +860,45 @@ def raise_unrecognized_credentials_type(typename, supported_types): ) +def raise_invalid_patch( + node, patch_section: str, patch_path: str, +) -> NoReturn: + from dbt.ui.printer import line_wrap_message + msg = line_wrap_message( + f'''\ + '{node.name}' is a {node.resource_type} node, but it is + specified in the {patch_section} section of + {patch_path}. + + + + To fix this error, place the `{node.name}` + specification under the {node.resource_type.pluralize()} key instead. 
+ ''' + ) + raise_compiler_error(msg, node) + + def raise_not_implemented(msg): raise NotImplementedException( "ERROR: {}" .format(msg)) +def raise_duplicate_alias( + kwargs: Mapping[str, Any], aliases: Mapping[str, str], canonical_key: str +) -> NoReturn: + # dupe found: go through the dict so we can have a nice-ish error + key_names = ', '.join( + "{}".format(k) for k in kwargs if + aliases.get(k) == canonical_key + ) + + raise AliasException( + f'Got duplicate keys: ({key_names}) all map to "{canonical_key}"' + ) + + def warn_or_error(msg, node=None, log_fmt=None): if dbt.flags.WARN_ERROR: raise_compiler_error(msg, node) diff --git a/core/dbt/graph/selector.py b/core/dbt/graph/selector.py index e0a3fcb5fe2..4c8e21d4183 100644 --- a/core/dbt/graph/selector.py +++ b/core/dbt/graph/selector.py @@ -166,16 +166,16 @@ def _node_iterator( yield unique_id, node def parsed_nodes(self, included_nodes): - return self._node_iterator( - included_nodes, - exclude=(NodeType.Source,), - include=None) + for unique_id, node in self.manifest.nodes.items(): + if unique_id not in included_nodes: + continue + yield unique_id, node def source_nodes(self, included_nodes): - return self._node_iterator( - included_nodes, - exclude=None, - include=(NodeType.Source,)) + for unique_id, source in self.manifest.sources.items(): + if unique_id not in included_nodes: + continue + yield unique_id, source def search(self, included_nodes, selector): raise NotImplementedError('subclasses should implement this') @@ -396,13 +396,20 @@ def select_nodes(self, graph, raw_include_specs, raw_exclude_specs): return selected_nodes def _is_graph_member(self, node_name): - node = self.manifest.nodes[node_name] - if node.resource_type == NodeType.Source: + if node_name in self.manifest.sources: return True + node = self.manifest.nodes[node_name] return not node.empty and node.config.enabled def _is_match(self, node_name, resource_types, tags, required): - node = self.manifest.nodes[node_name] + if node_name in self.manifest.nodes: + node = self.manifest.nodes[node_name] + elif node_name in self.manifest.sources: + node = self.manifest.sources[node_name] + else: + raise dbt.exceptions.InternalException( + f'Node {node_name} not found in the manifest!' + ) if node.resource_type not in resource_types: return False tags = set(tags) diff --git a/core/dbt/helper_types.py b/core/dbt/helper_types.py index ea415be9f55..ca69e019864 100644 --- a/core/dbt/helper_types.py +++ b/core/dbt/helper_types.py @@ -1,7 +1,8 @@ # never name this package "types", or mypy will crash in ugly ways from dataclasses import dataclass from datetime import timedelta -from typing import NewType +from pathlib import Path +from typing import NewType, Tuple, AbstractSet from hologram import ( FieldEncoder, JsonSchemaMixin, JsonDict, ValidationError @@ -38,6 +39,25 @@ def json_schema(self) -> JsonDict: return {'type': 'number'} +class PathEncoder(FieldEncoder): + def to_wire(self, value: Path) -> str: + return str(value) + + def to_python(self, value) -> Path: + if isinstance(value, Path): + return value + try: + return Path(value) + except TypeError: + raise ValidationError( + 'cannot encode {} into timedelta'.format(value) + ) from None + + @property + def json_schema(self) -> JsonDict: + return {'type': 'string'} + + class NVEnum(StrEnum): novalue = 'novalue' @@ -54,4 +74,9 @@ class NoValue(JsonSchemaMixin): JsonSchemaMixin.register_field_encoders({ Port: PortEncoder(), timedelta: TimeDeltaFieldEncoder(), + Path: PathEncoder(), }) + + +FQNPath = Tuple[str, ...] 
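
# Standalone sketch of the Path round-trip that the new PathEncoder above
# provides once it is registered with hologram's JsonSchemaMixin: Path-typed
# dataclass fields serialize to plain strings in JSON artifacts and come back
# as pathlib.Path objects. The helper names below are illustrative stand-ins,
# not part of dbt or hologram.
from pathlib import Path


def path_to_wire(value: Path) -> str:
    # wire representation: the plain string form of the path
    return str(value)


def path_from_wire(value) -> Path:
    # accept an existing Path unchanged, otherwise build one from the string
    if isinstance(value, Path):
        return value
    return Path(value)


original = Path('models') / 'staging' / 'schema.yml'
assert path_from_wire(path_to_wire(original)) == original
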
+PathSet = AbstractSet[FQNPath] diff --git a/core/dbt/hooks.py b/core/dbt/hooks.py index a5d47f01f67..26403226e3b 100644 --- a/core/dbt/hooks.py +++ b/core/dbt/hooks.py @@ -1,8 +1,6 @@ from hologram.helpers import StrEnum import json -from dbt.contracts.graph.parsed import Hook - from typing import Union, Dict, Any @@ -21,9 +19,3 @@ def get_hook_dict(source: Union[str, Dict[str, Any]]) -> Dict[str, Any]: return json.loads(source) except ValueError: return {'sql': source} - - -def get_hook(source, index): - hook_dict = get_hook_dict(source) - hook_dict.setdefault('index', index) - return Hook.from_dict(hook_dict) diff --git a/core/dbt/include/global_project/dbt_project.yml b/core/dbt/include/global_project/dbt_project.yml index 87bfe0dd8eb..dec6d7d452f 100644 --- a/core/dbt/include/global_project/dbt_project.yml +++ b/core/dbt/include/global_project/dbt_project.yml @@ -1,4 +1,4 @@ - +config-version: 2 name: dbt version: 1.0 diff --git a/core/dbt/include/global_project/macros/materializations/common/merge.sql b/core/dbt/include/global_project/macros/materializations/common/merge.sql index dcbcc1a356d..778fdf8ac72 100644 --- a/core/dbt/include/global_project/macros/materializations/common/merge.sql +++ b/core/dbt/include/global_project/macros/materializations/common/merge.sql @@ -10,14 +10,15 @@ {%- endmacro %} -{% macro get_insert_overwrite_merge_sql(target, source, dest_columns, predicates) -%} - {{ adapter_macro('get_insert_overwrite_merge_sql', target, source, dest_columns, predicates) }} +{% macro get_insert_overwrite_merge_sql(target, source, dest_columns, predicates, include_sql_header=false) -%} + {{ adapter_macro('get_insert_overwrite_merge_sql', target, source, dest_columns, predicates, include_sql_header) }} {%- endmacro %} {% macro default__get_merge_sql(target, source, unique_key, dest_columns, predicates) -%} {%- set predicates = [] if predicates is none else [] + predicates -%} {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%} + {%- set sql_header = config.get('sql_header', none) -%} {% if unique_key %} {% set unique_key_match %} @@ -28,6 +29,8 @@ {% do predicates.append('FALSE') %} {% endif %} + {{ sql_header if sql_header is not none }} + merge into {{ target }} as DBT_INTERNAL_DEST using {{ source }} as DBT_INTERNAL_SOURCE on {{ predicates | join(' and ') }} @@ -84,9 +87,12 @@ {% endmacro %} -{% macro default__get_insert_overwrite_merge_sql(target, source, dest_columns, predicates) -%} +{% macro default__get_insert_overwrite_merge_sql(target, source, dest_columns, predicates, include_sql_header) -%} {%- set predicates = [] if predicates is none else [] + predicates -%} {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%} + {%- set sql_header = config.get('sql_header', none) -%} + + {{ sql_header if sql_header is not none and include_sql_header }} merge into {{ target }} as DBT_INTERNAL_DEST using {{ source }} as DBT_INTERNAL_SOURCE diff --git a/core/dbt/legacy_config_updater.py b/core/dbt/legacy_config_updater.py new file mode 100644 index 00000000000..0e1dd2433df --- /dev/null +++ b/core/dbt/legacy_config_updater.py @@ -0,0 +1,213 @@ +# TODO: rename this module. 
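
# Minimal sketch of the config-version dispatch performed by
# parse_project_config in contracts/project.py (earlier in this diff), now
# that dbt_project.yml carries a 'config-version' key: a missing key or 1
# selects the V1 schema, 2 selects the V2 schema, anything else is rejected.
# The *Stub classes and the project dicts are placeholders; the real
# ProjectV1/ProjectV2 classes additionally validate the full dict.
from typing import Any, Dict


class ProjectV1Stub:
    pass


class ProjectV2Stub:
    pass


def pick_project_schema(data: Dict[str, Any]):
    config_version = data.get('config-version', 1)
    if config_version == 1:
        return ProjectV1Stub
    elif config_version == 2:
        return ProjectV2Stub
    raise ValueError(
        f'Got an unexpected config-version={config_version}, expected 1 or 2'
    )


assert pick_project_schema(
    {'name': 'jaffle_shop', 'version': '1.0'}
) is ProjectV1Stub
assert pick_project_schema(
    {'name': 'jaffle_shop', 'version': '1.0', 'config-version': 2}
) is ProjectV2Stub
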
+from typing import Dict, Any, Mapping, List +from typing_extensions import Protocol + +import dbt.exceptions + +from dbt.utils import deep_merge, fqn_search +from dbt.node_types import NodeType +from dbt.adapters.factory import get_config_class_by_name + + +class HasConfigFields(Protocol): + seeds: Dict[str, Any] + snapshots: Dict[str, Any] + models: Dict[str, Any] + sources: Dict[str, Any] + + +class IsFQNResource(Protocol): + fqn: List[str] + resource_type: NodeType + package_name: str + + +def _listify(value) -> List: + if isinstance(value, tuple): + value = list(value) + elif not isinstance(value, list): + value = [value] + + return value + + +class ConfigUpdater: + AppendListFields = {'pre-hook', 'post-hook', 'tags'} + ExtendDictFields = {'vars', 'column_types', 'quoting', 'persist_docs'} + DefaultClobberFields = { + 'enabled', + 'materialized', + + # these 2 are additional - not defined in the NodeConfig object + 'sql_header', + 'incremental_strategy', + + # these 3 are "special" - not defined in NodeConfig, instead set by + # update_parsed_node_name in parsing + 'alias', + 'schema', + 'database', + + # tests + 'severity', + + # snapshots + 'unique_key', + 'target_database', + 'target_schema', + 'strategy', + 'updated_at', + # this is often a list, but it should replace and not append (sometimes + # it's 'all') + 'check_cols', + # seeds + 'quote_columns', + } + + @property + def ClobberFields(self): + return self.DefaultClobberFields | self.AdapterSpecificConfigs + + @property + def ConfigKeys(self): + return ( + self.AppendListFields | self.ExtendDictFields | self.ClobberFields + ) + + def __init__(self, adapter_type: str): + config_class = get_config_class_by_name(adapter_type) + self.AdapterSpecificConfigs = { + target_name for _, target_name in + config_class._get_fields() + } + + def update_config_keys_into( + self, mutable_config: Dict[str, Any], new_configs: Mapping[str, Any] + ) -> Dict[str, Any]: + """Update mutable_config with the contents of new_configs, but only + include "expected" config values. + + Returns dict where the keys are what was updated and the update values + are what the updates were. 
+ """ + + relevant_configs: Dict[str, Any] = { + key: new_configs[key] for key + in new_configs if key in self.ConfigKeys + } + + for key in self.AppendListFields: + append_fields = _listify(relevant_configs.get(key, [])) + mutable_config[key].extend([ + f for f in append_fields if f not in mutable_config[key] + ]) + + for key in self.ExtendDictFields: + dict_val = relevant_configs.get(key, {}) + try: + mutable_config[key].update(dict_val) + except (ValueError, TypeError, AttributeError): + dbt.exceptions.raise_compiler_error( + 'Invalid config field: "{}" must be a dict'.format(key) + ) + + for key in self.ClobberFields: + if key in relevant_configs: + mutable_config[key] = relevant_configs[key] + + return relevant_configs + + def update_into( + self, mutable_config: Dict[str, Any], new_config: Mapping[str, Any] + ) -> None: + """Update mutable_config with the contents of new_config.""" + for key, value in new_config.items(): + if key in self.AppendListFields: + current_list: List = _listify(mutable_config.get(key, [])) + current_list.extend(_listify(value)) + mutable_config[key] = current_list + elif key in self.ExtendDictFields: + current_dict: Dict = mutable_config.get(key, {}) + try: + current_dict.update(value) + except (ValueError, TypeError, AttributeError): + dbt.exceptions.raise_compiler_error( + 'Invalid config field: "{}" must be a dict'.format(key) + ) + mutable_config[key] = current_dict + else: # key in self.ClobberFields + mutable_config[key] = value + + def get_project_config( + self, model: IsFQNResource, project: HasConfigFields + ) -> Dict[str, Any]: + # most configs are overwritten by a more specific config, but pre/post + # hooks are appended! + config: Dict[str, Any] = {} + for k in self.AppendListFields: + config[k] = [] + for k in self.ExtendDictFields: + config[k] = {} + + if model.resource_type == NodeType.Seed: + model_configs = project.seeds + elif model.resource_type == NodeType.Snapshot: + model_configs = project.snapshots + elif model.resource_type == NodeType.Source: + model_configs = project.sources + else: + model_configs = project.models + + if model_configs is None: + return config + + # mutates config + self.update_config_keys_into(config, model_configs) + + for level_config in fqn_search(model_configs, model.fqn): + relevant_configs = self.update_config_keys_into( + config, level_config + ) + + # mutates config + relevant_configs = self.update_config_keys_into( + config, level_config + ) + + # TODO: does this do anything? Doesn't update_config_keys_into + # handle the clobber case? 
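
# Standalone sketch of the three merge behaviours ConfigUpdater applies when
# folding dbt_project.yml config levels together (see update_config_keys_into
# and update_into above): list-valued hook/tag fields append, dict-valued
# fields are updated key-by-key, and everything else is clobbered by the more
# specific level. Field names and sample configs here are illustrative only.
from typing import Any, Dict

APPEND_FIELDS = {'pre-hook', 'post-hook', 'tags'}
EXTEND_FIELDS = {'vars', 'column_types', 'quoting', 'persist_docs'}


def apply_level(config: Dict[str, Any], level: Dict[str, Any]) -> None:
    # mutates `config` in place, mirroring the updater's in-place style
    for key, value in level.items():
        if key in APPEND_FIELDS:
            current = list(config.get(key, []))
            items = [value] if isinstance(value, str) else value
            current.extend(v for v in items if v not in current)
            config[key] = current
        elif key in EXTEND_FIELDS:
            merged = dict(config.get(key, {}))
            merged.update(value)
            config[key] = merged
        else:
            config[key] = value


config: Dict[str, Any] = {}
apply_level(config, {'materialized': 'view', 'tags': ['nightly']})
apply_level(config, {'materialized': 'table', 'tags': ['finance'],
                     'vars': {'start_date': '2020-01-01'}})
assert config == {
    'materialized': 'table',               # clobbered by the more specific level
    'tags': ['nightly', 'finance'],        # appended
    'vars': {'start_date': '2020-01-01'},  # extended
}
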
+ clobber_configs = { + k: v for (k, v) in relevant_configs.items() + if k not in self.AppendListFields and + k not in self.ExtendDictFields + } + + config.update(clobber_configs) + + return config + + def get_project_vars( + self, project_vars: Dict[str, Any], + ): + config: Dict[str, Any] = {} + # this is pretty trivial, since the new project vars don't care about + # FQNs or resource types + self.update_config_keys_into(config, project_vars) + return config + + def merge(self, *configs: Dict[str, Any]) -> Dict[str, Any]: + merged_config: Dict[str, Any] = {} + for config in configs: + # Do not attempt to deep merge clobber fields + config = config.copy() + clobber = { + key: config.pop(key) for key in list(config.keys()) + if key in self.ClobberFields + } + intermediary_merged = deep_merge( + merged_config, config + ) + intermediary_merged.update(clobber) + + merged_config.update(intermediary_merged) + return merged_config diff --git a/core/dbt/logger.py b/core/dbt/logger.py index dd585b6ab5c..a209dbad31d 100644 --- a/core/dbt/logger.py +++ b/core/dbt/logger.py @@ -298,6 +298,7 @@ def mapping_keys(self): ('original_file_path', 'node_path'), ('name', 'node_name'), ('resource_type', 'resource_type'), + ('depends_on_nodes', 'depends_on'), ] def process_config(self, record): diff --git a/core/dbt/main.py b/core/dbt/main.py index 0da30df2695..f1ca3be0d2f 100644 --- a/core/dbt/main.py +++ b/core/dbt/main.py @@ -536,6 +536,11 @@ def _build_docs_serve_subparser(subparsers, base_subparser): Specify the port number for the docs server. ''' ) + serve_sub.add_argument( + '--no-browser', + dest='open_browser', + action='store_false', + ) serve_sub.set_defaults(cls=serve_task.ServeTask, which='serve', rpc_method=None) return serve_sub diff --git a/core/dbt/parser/base.py b/core/dbt/parser/base.py index 48bc294d17b..03ed3a8a1ab 100644 --- a/core/dbt/parser/base.py +++ b/core/dbt/parser/base.py @@ -15,6 +15,9 @@ from dbt.adapters.factory import get_adapter from dbt.clients.jinja import get_rendered from dbt.config import Project, RuntimeConfig +from dbt.context.context_config import ( + LegacyContextConfig, ContextConfig, ContextConfigType +) from dbt.contracts.graph.manifest import ( Manifest, SourceFile, FilePath, FileHash ) @@ -24,7 +27,6 @@ CompilationException, validator_error_message, InternalException ) from dbt.node_types import NodeType -from dbt.source_config import SourceConfig from dbt.parser.results import ParseResult, ManifestNodes from dbt.parser.search import FileBlock @@ -160,13 +162,17 @@ def default_schema(self): def default_database(self): return self.root_project.credentials.database + def get_fqn_prefix(self, path: str) -> List[str]: + no_ext = os.path.splitext(path)[0] + fqn = [self.project.project_name] + fqn.extend(dbt.utils.split_path(no_ext)[:-1]) + return fqn + def get_fqn(self, path: str, name: str) -> List[str]: """Get the FQN for the node. This impacts node selection and config application. 
""" - no_ext = os.path.splitext(path)[0] - fqn = [self.project.project_name] - fqn.extend(dbt.utils.split_path(no_ext)[:-1]) + fqn = self.get_fqn_prefix(path) fqn.append(name) return fqn @@ -201,7 +207,8 @@ def _create_parsetime_node( self, block: ConfiguredBlockType, path: str, - config: SourceConfig, + config: ContextConfigType, + fqn: List[str], name=None, **kwargs, ) -> IntermediateNode: @@ -215,7 +222,7 @@ def _create_parsetime_node( 'alias': name, 'schema': self.default_schema, 'database': self.default_database, - 'fqn': config.fqn, + 'fqn': fqn, 'name': name, 'root_path': self.project.project_root, 'resource_type': self.resource_type, @@ -242,16 +249,16 @@ def _create_parsetime_node( raise CompilationException(msg, node=node) def _context_for( - self, parsed_node: IntermediateNode, config: SourceConfig + self, parsed_node: IntermediateNode, config: ContextConfigType ) -> Dict[str, Any]: return generate_parser_model( parsed_node, self.root_project, self.macro_manifest, config ) def render_with_context( - self, parsed_node: IntermediateNode, config: SourceConfig + self, parsed_node: IntermediateNode, config: ContextConfigType ) -> None: - """Given the parsed node and a SourceConfig to use during parsing, + """Given the parsed node and a ContextConfigType to use during parsing, render the node's sql wtih macro capture enabled. Note: this mutates the config object when config() calls are rendered. @@ -283,13 +290,13 @@ def update_parsed_node_name( self._update_node_alias(parsed_node, config_dict) def update_parsed_node( - self, parsed_node: IntermediateNode, config: SourceConfig + self, parsed_node: IntermediateNode, config: ContextConfigType ) -> None: - """Given the SourceConfig used for parsing and the parsed node, + """Given the ContextConfigType used for parsing and the parsed node, generate and set the true values to use, overriding the temporary parse values set in _build_intermediate_parsed_node. """ - config_dict = config.config + config_dict = config.build_config_dict() # Set tags on node provided in config blocks model_tags = config_dict.get('tags', []) @@ -313,17 +320,41 @@ def update_parsed_node( for hook in hooks: get_rendered(hook.sql, context, parsed_node, capture_macros=True) - def initial_config(self, fqn: List[str]) -> SourceConfig: - return SourceConfig(self.root_project, self.project, fqn, - self.resource_type) + def initial_config(self, fqn: List[str]) -> ContextConfigType: + config_version = min( + [self.project.config_version, self.root_project.config_version] + ) + # it would be nice to assert that if the main config is v2, the + # dependencies are all v2. or vice-versa. 
+ if config_version == 1: + return LegacyContextConfig( + self.root_project.as_v1(), + self.project.as_v1(), + fqn, + self.resource_type, + ) + elif config_version == 2: + return ContextConfig( + self.root_project, + fqn, + self.resource_type, + self.project.project_name, + ) + else: + raise InternalException( + f'Got an unexpected project version={config_version}, ' + f'expected 1 or 2' + ) - def config_dict(self, config: SourceConfig) -> Dict[str, Any]: - config_dict = config.config + def config_dict( + self, config: ContextConfigType, + ) -> Dict[str, Any]: + config_dict = config.build_config_dict(base=True) self._mangle_hooks(config_dict) return config_dict def render_update( - self, node: IntermediateNode, config: SourceConfig + self, node: IntermediateNode, config: ContextConfigType ) -> None: try: self.render_with_context(node, config) @@ -343,12 +374,13 @@ def parse_node(self, block: ConfiguredBlockType) -> FinalNode: compiled_path: str = self.get_compiled_path(block) fqn = self.get_fqn(compiled_path, block.name) - config: SourceConfig = self.initial_config(fqn) + config: ContextConfigType = self.initial_config(fqn) node = self._create_parsetime_node( block=block, path=compiled_path, - config=config + config=config, + fqn=fqn, ) self.render_update(node, config) result = self.transform(node) diff --git a/core/dbt/parser/hooks.py b/core/dbt/parser/hooks.py index b724c02dffe..b7cb39edd9f 100644 --- a/core/dbt/parser/hooks.py +++ b/core/dbt/parser/hooks.py @@ -1,11 +1,11 @@ from dataclasses import dataclass from typing import Iterable, Iterator, Union, List, Tuple +from dbt.context.context_config import ContextConfigType from dbt.contracts.graph.manifest import FilePath from dbt.contracts.graph.parsed import ParsedHookNode from dbt.exceptions import InternalException from dbt.node_types import NodeType, RunHookType -from dbt.source_config import SourceConfig from dbt.parser.base import SimpleParser from dbt.parser.search import FileBlock from dbt.utils import get_pseudo_hook_path @@ -89,13 +89,14 @@ def _create_parsetime_node( self, block: HookBlock, path: str, - config: SourceConfig, + config: ContextConfigType, + fqn: List[str], name=None, **kwargs, ) -> ParsedHookNode: return super()._create_parsetime_node( - block=block, path=path, config=config, + block=block, path=path, config=config, fqn=fqn, index=block.index, name=name, tags=[str(block.hook_type)] ) diff --git a/core/dbt/parser/manifest.py b/core/dbt/parser/manifest.py index 75f9084b207..45ec2fc7b16 100644 --- a/core/dbt/parser/manifest.py +++ b/core/dbt/parser/manifest.py @@ -1,25 +1,27 @@ -import itertools import os import pickle from datetime import datetime -from typing import Dict, Optional, Mapping, Callable, Any, List, Type, Union +from typing import ( + Dict, Optional, Mapping, Callable, Any, List, Type, Union, MutableMapping +) -from dbt.include.global_project import PACKAGES import dbt.exceptions import dbt.flags +from dbt import deprecations +from dbt.helper_types import PathSet +from dbt.include.global_project import PACKAGES from dbt.logger import GLOBAL_LOGGER as logger, DbtProcessState from dbt.node_types import NodeType from dbt.clients.jinja import get_rendered from dbt.clients.system import make_directory from dbt.config import Project, RuntimeConfig from dbt.context.docs import generate_runtime_docs -from dbt.contracts.graph.compiled import CompileResultNode, NonSourceNode +from dbt.contracts.graph.compiled import NonSourceNode from dbt.contracts.graph.manifest import Manifest, FilePath, FileHash, Disabled 
from dbt.contracts.graph.parsed import ( - ParsedSourceDefinition, ParsedNode, ParsedMacro, ColumnInfo + ParsedSourceDefinition, ParsedNode, ParsedMacro, ColumnInfo, ) -from dbt.exceptions import raise_compiler_error from dbt.parser.base import BaseParser, Parser from dbt.parser.analysis import AnalysisParser from dbt.parser.data_test import DataTestParser @@ -32,6 +34,7 @@ from dbt.parser.search import FileBlock from dbt.parser.seeds import SeedParser from dbt.parser.snapshots import SnapshotParser +from dbt.parser.sources import patch_sources from dbt.version import __version__ @@ -64,7 +67,7 @@ def make_parse_result( """Make a ParseResult from the project configuration and the profile.""" # if any of these change, we need to reject the parser vars_hash = FileHash.from_contents( - '\0'.join([ + '\x00'.join([ getattr(config.args, 'vars', '{}') or '{}', getattr(config.args, 'profile', '') or '', getattr(config.args, 'target', '') or '', @@ -303,14 +306,21 @@ def process_manifest(self, manifest: Manifest): process_docs(manifest, self.root_project) def create_manifest(self) -> Manifest: - nodes: Dict[str, CompileResultNode] = {} - nodes.update(self.results.nodes) - nodes.update(self.results.sources) + # before we do anything else, patch the sources. This mutates + # results.disabled, so it needs to come before the final 'disabled' + # list is created + sources = patch_sources(self.results, self.root_project) disabled = [] for value in self.results.disabled.values(): disabled.extend(value) + + nodes: MutableMapping[str, NonSourceNode] = { + k: v for k, v in self.results.nodes.items() + } + manifest = Manifest( nodes=nodes, + sources=sources, macros=self.results.macros, docs=self.results.docs, generated_at=datetime.utcnow(), @@ -331,7 +341,16 @@ def load_all( macro_hook: Callable[[Manifest], Any], ) -> Manifest: with PARSING_STATE: - projects = load_all_projects(root_config) + projects = root_config.load_dependencies() + v1_configs = [] + for project in projects.values(): + if project.config_version == 1: + v1_configs.append(f'\n\n - {project.project_name}') + if v1_configs: + deprecations.warn( + 'dbt-project-yaml-v1', + project_names=''.join(v1_configs) + ) loader = cls(root_config, projects, macro_hook) loader.load(internal_manifest=internal_manifest) loader.write_parse_results() @@ -348,13 +367,15 @@ def load_internal(cls, root_config: RuntimeConfig) -> Manifest: return loader.load_only_macros() -def _check_resource_uniqueness(manifest): - names_resources: Dict[str, CompileResultNode] = {} - alias_resources: Dict[str, CompileResultNode] = {} +def _check_resource_uniqueness(manifest: Manifest) -> None: + names_resources: Dict[str, NonSourceNode] = {} + alias_resources: Dict[str, NonSourceNode] = {} for resource, node in manifest.nodes.items(): if node.resource_type not in NodeType.refable(): continue + # appease mypy - sources aren't refable! 
+ assert not isinstance(node, ParsedSourceDefinition) name = node.name alias = "{}.{}".format(node.schema, node.alias) @@ -375,13 +396,15 @@ def _check_resource_uniqueness(manifest): alias_resources[alias] = node -def _warn_for_unused_resource_config_paths(manifest, config): - resource_fqns = manifest.get_resource_fqns() - disabled_fqns = [n.fqn for n in manifest.disabled] +def _warn_for_unused_resource_config_paths( + manifest: Manifest, config: RuntimeConfig +) -> None: + resource_fqns: Mapping[str, PathSet] = manifest.get_resource_fqns() + disabled_fqns: PathSet = frozenset(tuple(n.fqn) for n in manifest.disabled) config.warn_for_unused_resource_config_paths(resource_fqns, disabled_fqns) -def _check_manifest(manifest, config): +def _check_manifest(manifest: Manifest, config: RuntimeConfig) -> None: _check_resource_uniqueness(manifest) _warn_for_unused_resource_config_paths(manifest, config) @@ -403,25 +426,6 @@ def _load_projects(config, paths): yield project.project_name, project -def _project_directories(config): - root = os.path.join(config.project_root, config.modules_path) - - dependencies = [] - if os.path.exists(root): - dependencies = os.listdir(root) - - for name in dependencies: - full_obj = os.path.join(root, name) - - if not os.path.isdir(full_obj) or name.startswith('__'): - # exclude non-dirs and dirs that start with __ - # the latter could be something like __pycache__ - # for the global dbt modules dir - continue - - yield full_obj - - def _get_node_column(node, column_name): """Given a ParsedNode, add some fields that might be missing. Return a reference to the dict that refers to the given column, creating it if @@ -484,12 +488,15 @@ def process_docs(manifest: Manifest, config: RuntimeConfig): manifest, config.project_name, ) - if node.resource_type == NodeType.Source: - assert isinstance(node, ParsedSourceDefinition) # appease mypy - _process_docs_for_source(ctx, node) - else: - assert not isinstance(node, ParsedSourceDefinition) - _process_docs_for_node(ctx, node) + _process_docs_for_node(ctx, node) + for source in manifest.sources.values(): + ctx = generate_runtime_docs( + config, + source, + manifest, + config.project_name, + ) + _process_docs_for_source(ctx, source) for macro in manifest.macros.values(): ctx = generate_runtime_docs( config, @@ -547,9 +554,6 @@ def _process_refs_for_node( def process_refs(manifest: Manifest, current_project: str): for node in manifest.nodes.values(): - if node.resource_type == NodeType.Source: - continue - assert not isinstance(node, ParsedSourceDefinition) _process_refs_for_node(manifest, current_project, node) return manifest @@ -557,7 +561,7 @@ def process_refs(manifest: Manifest, current_project: str): def _process_sources_for_node( manifest: Manifest, current_project: str, node: NonSourceNode ): - target_source = None + target_source: Optional[Union[Disabled, ParsedSourceDefinition]] = None for source_name, table_name in node.sources: target_source = manifest.resolve_source( source_name, @@ -566,13 +570,15 @@ def _process_sources_for_node( node.package_name, ) - if target_source is None: + if target_source is None or isinstance(target_source, Disabled): # this folows the same pattern as refs node.config.enabled = False dbt.utils.invalid_source_fail_unless_test( node, source_name, - table_name) + table_name, + disabled=(isinstance(target_source, Disabled)) + ) continue target_source_id = target_source.unique_id node.depends_on.nodes.append(target_source_id) @@ -612,24 +618,6 @@ def process_node( _process_docs_for_node(ctx, 
node) -def load_all_projects(config: RuntimeConfig) -> Mapping[str, Project]: - all_projects = {config.project_name: config} - project_paths = itertools.chain( - internal_project_names(), - _project_directories(config) - ) - for project_name, project in _load_projects(config, project_paths): - if project_name in all_projects: - raise_compiler_error( - f'dbt found more than one package with the name ' - f'"{project_name}" included in this project. Package names ' - f'must be unique in a project. Please rename one of these ' - f'packages.' - ) - all_projects[project_name] = project - return all_projects - - def load_internal_projects(config): return dict(_load_projects(config, internal_project_names())) diff --git a/core/dbt/parser/results.py b/core/dbt/parser/results.py index 7aeab74aa9d..aa7ffca06aa 100644 --- a/core/dbt/parser/results.py +++ b/core/dbt/parser/results.py @@ -4,8 +4,9 @@ from hologram import JsonSchemaMixin from dbt.contracts.graph.manifest import ( - SourceFile, RemoteFile, FileHash, MacroKey + SourceFile, RemoteFile, FileHash, MacroKey, SourceKey ) +from dbt.contracts.graph.compiled import CompileResultNode from dbt.contracts.graph.parsed import ( HasUniqueID, ParsedAnalysisNode, @@ -15,19 +16,19 @@ ParsedMacro, ParsedMacroPatch, ParsedModelNode, - ParsedNode, ParsedNodePatch, ParsedRPCNode, ParsedSeedNode, ParsedSchemaTestNode, ParsedSnapshotNode, - ParsedSourceDefinition, + UnpatchedSourceDefinition, ) +from dbt.contracts.graph.unparsed import SourcePatch from dbt.contracts.util import Writable, Replaceable from dbt.exceptions import ( raise_duplicate_resource_name, raise_duplicate_patch_name, raise_duplicate_macro_patch_name, CompilationException, InternalException, - raise_compiler_error, + raise_compiler_error, raise_duplicate_source_patch_name ) from dbt.node_types import NodeType from dbt.ui import printer @@ -67,13 +68,14 @@ class ParseResult(JsonSchemaMixin, Writable, Replaceable): profile_hash: FileHash project_hashes: MutableMapping[str, FileHash] nodes: MutableMapping[str, ManifestNodes] = dict_field() - sources: MutableMapping[str, ParsedSourceDefinition] = dict_field() + sources: MutableMapping[str, UnpatchedSourceDefinition] = dict_field() docs: MutableMapping[str, ParsedDocumentation] = dict_field() macros: MutableMapping[str, ParsedMacro] = dict_field() macro_patches: MutableMapping[MacroKey, ParsedMacroPatch] = dict_field() patches: MutableMapping[str, ParsedNodePatch] = dict_field() + source_patches: MutableMapping[SourceKey, SourcePatch] = dict_field() files: MutableMapping[str, SourceFile] = dict_field() - disabled: MutableMapping[str, List[ParsedNode]] = dict_field() + disabled: MutableMapping[str, List[CompileResultNode]] = dict_field() dbt_version: str = __version__ def get_file(self, source_file: SourceFile) -> SourceFile: @@ -85,24 +87,30 @@ def get_file(self, source_file: SourceFile) -> SourceFile: return self.files[key] def add_source( - self, source_file: SourceFile, node: ParsedSourceDefinition + self, source_file: SourceFile, source: UnpatchedSourceDefinition ): - # nodes can't be overwritten! - _check_duplicates(node, self.sources) - self.sources[node.unique_id] = node - self.get_file(source_file).sources.append(node.unique_id) + # sources can't be overwritten! 
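
# Sketch of the "register once, never overwrite" bookkeeping that
# ParseResult.add_source and add_node_nofile (above and below) rely on: parsed
# resources are keyed by unique_id, and a second registration under the same
# id is an error rather than a silent overwrite. Names here are stand-ins,
# not dbt APIs.
from typing import Dict


def register_once(registry: Dict[str, object], unique_id: str, node: object) -> None:
    if unique_id in registry:
        raise ValueError(f'duplicate resource: {unique_id} is already registered')
    registry[unique_id] = node


nodes: Dict[str, object] = {}
register_once(nodes, 'model.jaffle_shop.stg_orders', object())
try:
    register_once(nodes, 'model.jaffle_shop.stg_orders', object())
except ValueError:
    pass  # duplicates are rejected rather than silently overwritten
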
+ _check_duplicates(source, self.sources) + self.sources[source.unique_id] = source + self.get_file(source_file).sources.append(source.unique_id) - def add_node(self, source_file: SourceFile, node: ManifestNodes): + def add_node_nofile(self, node: ManifestNodes): # nodes can't be overwritten! _check_duplicates(node, self.nodes) self.nodes[node.unique_id] = node + + def add_node(self, source_file: SourceFile, node: ManifestNodes): + self.add_node_nofile(node) self.get_file(source_file).nodes.append(node.unique_id) - def add_disabled(self, source_file: SourceFile, node: ParsedNode): + def add_disabled_nofile(self, node: CompileResultNode): if node.unique_id in self.disabled: self.disabled[node.unique_id].append(node) else: self.disabled[node.unique_id] = [node] + + def add_disabled(self, source_file: SourceFile, node: CompileResultNode): + self.add_disabled_nofile(node) self.get_file(source_file).nodes.append(node.unique_id) def add_macro(self, source_file: SourceFile, macro: ParsedMacro): @@ -140,7 +148,7 @@ def add_doc(self, source_file: SourceFile, doc: ParsedDocumentation): def add_patch( self, source_file: SourceFile, patch: ParsedNodePatch ) -> None: - # matches can't be overwritten + # patches can't be overwritten if patch.name in self.patches: raise_duplicate_patch_name(patch, self.patches[patch.name]) self.patches[patch.name] = patch @@ -156,11 +164,21 @@ def add_macro_patch( self.macro_patches[key] = patch self.get_file(source_file).macro_patches.append(key) + def add_source_patch( + self, source_file: SourceFile, patch: SourcePatch + ) -> None: + # source patches must be unique + key = (patch.overrides, patch.name) + if key in self.source_patches: + raise_duplicate_source_patch_name(patch, self.source_patches[key]) + self.source_patches[key] = patch + self.get_file(source_file).source_patches.append(key) + def _get_disabled( self, unique_id: str, match_file: SourceFile, - ) -> List[ParsedNode]: + ) -> List[CompileResultNode]: if unique_id not in self.disabled: raise InternalException( 'called _get_disabled with id={}, but it does not exist' diff --git a/core/dbt/parser/schema_test_builders.py b/core/dbt/parser/schema_test_builders.py index 654b5db2d2d..24f649a3eb7 100644 --- a/core/dbt/parser/schema_test_builders.py +++ b/core/dbt/parser/schema_test_builders.py @@ -3,14 +3,13 @@ from copy import deepcopy from dataclasses import dataclass from typing import ( - Generic, TypeVar, Dict, Any, Tuple, Optional, List, Sequence + Generic, TypeVar, Dict, Any, Tuple, Optional, List, ) from dbt.clients.jinja import get_rendered, SCHEMA_TEST_KWARGS_NAME +from dbt.contracts.graph.parsed import UnpatchedSourceDefinition from dbt.contracts.graph.unparsed import ( - UnparsedNodeUpdate, UnparsedSourceDefinition, - UnparsedSourceTableDefinition, UnparsedColumn, UnparsedMacroUpdate, - UnparsedAnalysisUpdate, TestDef + UnparsedNodeUpdate, UnparsedMacroUpdate, UnparsedAnalysisUpdate, TestDef, ) from dbt.exceptions import raise_compiler_error from dbt.parser.search import FileBlock @@ -62,54 +61,23 @@ def from_file_block(cls, src: FileBlock, data: Dict[str, Any]): ) -@dataclass -class SourceTarget: - source: UnparsedSourceDefinition - table: UnparsedSourceTableDefinition - - @property - def name(self) -> str: - return '{0.name}_{1.name}'.format(self.source, self.table) - - @property - def quote_columns(self) -> Optional[bool]: - result = None - if self.source.quoting.column is not None: - result = self.source.quoting.column - if self.table.quoting.column is not None: - result = 
self.table.quoting.column - return result - - @property - def columns(self) -> Sequence[UnparsedColumn]: - if self.table.columns is None: - return [] - else: - return self.table.columns - - @property - def tests(self) -> List[TestDef]: - if self.table.tests is None: - return [] - else: - return self.table.tests - - -Testable = TypeVar('Testable', SourceTarget, UnparsedNodeUpdate) +Testable = TypeVar( + 'Testable', UnparsedNodeUpdate, UnpatchedSourceDefinition +) ColumnTarget = TypeVar( 'ColumnTarget', - SourceTarget, UnparsedNodeUpdate, UnparsedAnalysisUpdate, + UnpatchedSourceDefinition, ) Target = TypeVar( 'Target', - SourceTarget, UnparsedNodeUpdate, UnparsedMacroUpdate, UnparsedAnalysisUpdate, + UnpatchedSourceDefinition, ) @@ -321,7 +289,7 @@ def macro_name(self) -> str: def get_test_name(self) -> Tuple[str, str]: if isinstance(self.target, UnparsedNodeUpdate): name = self.name - elif isinstance(self.target, SourceTarget): + elif isinstance(self.target, UnpatchedSourceDefinition): name = 'source_' + self.name else: raise self._bad_type() @@ -342,7 +310,7 @@ def build_raw_sql(self) -> str: def build_model_str(self): if isinstance(self.target, UnparsedNodeUpdate): fmt = "{{{{ ref('{0.name}') }}}}" - elif isinstance(self.target, SourceTarget): + elif isinstance(self.target, UnpatchedSourceDefinition): fmt = "{{{{ source('{0.source.name}', '{0.table.name}') }}}}" else: raise self._bad_type() diff --git a/core/dbt/parser/schemas.py b/core/dbt/parser/schemas.py index 276e6d45780..42169f1167a 100644 --- a/core/dbt/parser/schemas.py +++ b/core/dbt/parser/schemas.py @@ -1,45 +1,50 @@ import itertools import os -from abc import abstractmethod +from abc import ABCMeta, abstractmethod from typing import ( Iterable, Dict, Any, Union, List, Optional, Generic, TypeVar, Type ) -from hologram import ValidationError +from hologram import ValidationError, JsonSchemaMixin from dbt.adapters.factory import get_adapter from dbt.clients.jinja import get_rendered, add_rendered_test_kwargs from dbt.clients.yaml_helper import load_yaml_text -from dbt.config import RuntimeConfig, ConfigRenderer -from dbt.context.docs import generate_parser_docs +from dbt.config.renderer import SchemaYamlRenderer +from dbt.context.context_config import ( + ContextConfigType, + ContextConfigGenerator, +) +from dbt.context.configured import generate_schema_yml from dbt.context.target import generate_target_context from dbt.contracts.graph.manifest import SourceFile +from dbt.contracts.graph.model_config import SourceConfig from dbt.contracts.graph.parsed import ( ParsedNodePatch, ParsedSourceDefinition, ColumnInfo, ParsedSchemaTestNode, ParsedMacroPatch, + UnpatchedSourceDefinition, ) from dbt.contracts.graph.unparsed import ( UnparsedSourceDefinition, UnparsedNodeUpdate, UnparsedColumn, - UnparsedMacroUpdate, UnparsedAnalysisUpdate, - UnparsedSourceTableDefinition, FreshnessThreshold, + UnparsedMacroUpdate, UnparsedAnalysisUpdate, SourcePatch, + HasDocs, HasColumnDocs, HasColumnTests, FreshnessThreshold, ) from dbt.exceptions import ( validator_error_message, JSONValidationException, raise_invalid_schema_yml_version, ValidationException, - CompilationException, warn_or_error + CompilationException, warn_or_error, InternalException ) from dbt.node_types import NodeType from dbt.parser.base import SimpleParser from dbt.parser.search import FileBlock, FilesystemSearcher from dbt.parser.schema_test_builders import ( - TestBuilder, SourceTarget, Target, SchemaTestBlock, TargetBlock, YamlBlock, - TestBlock, + TestBuilder, 
SchemaTestBlock, TargetBlock, YamlBlock, + TestBlock, Testable ) -from dbt.source_config import SourceConfig from dbt.utils import ( get_pseudo_test_path, coerce_dict_str ) @@ -80,24 +85,34 @@ class ParserRef: def __init__(self): self.column_info: Dict[str, ColumnInfo] = {} - def add(self, column: UnparsedColumn, description, data_type, meta): + def add( + self, + column: Union[HasDocs, UnparsedColumn], + description: str, + data_type: Optional[str], + meta: Dict[str, Any], + ): + tags: List[str] = [] + tags.extend(getattr(column, 'tags', ())) self.column_info[column.name] = ColumnInfo( name=column.name, description=description, data_type=data_type, meta=meta, - tags=column.tags, + tags=tags, ) - -def column_info( - config: RuntimeConfig, - target: UnparsedSchemaYaml, - *descriptions: str, -) -> None: - context = generate_parser_docs(config, target) - for description in descriptions: - get_rendered(description, context) + @classmethod + def from_target( + cls, target: Union[HasColumnDocs, HasColumnTests] + ) -> 'ParserRef': + refs = cls() + for column in target.columns: + description = column.description + data_type = column.data_type + meta = column.meta + refs.add(column, description, data_type, meta) + return refs def _trimmed(inp: str) -> str: @@ -106,21 +121,38 @@ def _trimmed(inp: str) -> str: return inp[:44] + '...' + inp[-3:] +def merge_freshness( + base: Optional[FreshnessThreshold], update: Optional[FreshnessThreshold] +) -> Optional[FreshnessThreshold]: + if base is not None and update is not None: + return base.merged(update) + elif base is None and update is not None: + return update + else: + return None + + class SchemaParser(SimpleParser[SchemaTestBlock, ParsedSchemaTestNode]): - """ - The schema parser is really big because schemas are really complicated! - - There are basically three phases to the schema parser: - - read_yaml_{models,sources}: read in yaml as a dictionary, then - validate it against the basic structures required so we can start - parsing (NodeTarget, SourceTarget) - - these return potentially many Targets per yaml block, since each - source can have multiple tables - - parse_target_{model,source}: Read in the underlying target, parse and - return a list of all its tests (model and column tests), collect - any refs/descriptions, and return a parsed entity with the - appropriate information. - """ + def __init__( + self, results, project, root_project, macro_manifest, + ) -> None: + super().__init__(results, project, root_project, macro_manifest) + all_v_2 = ( + self.root_project.config_version == 2 and + self.project.config_version == 2 + ) + if all_v_2: + ctx = generate_schema_yml( + self.root_project, self.project.project_name + ) + else: + ctx = generate_target_context( + self.root_project, self.root_project.cli_vars + ) + + self.raw_renderer = SchemaYamlRenderer(ctx) + self.config_generator = ContextConfigGenerator(self.root_project) + @classmethod def get_compiled_path(cls, block: FileBlock) -> str: # should this raise an error? 
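
# Standalone sketch of merge_freshness (above). Note the asymmetry in the
# original: when the table-level value is None the merge returns None even if
# the source defines a freshness, which is presumably why the to_dict
# overrides earlier in this diff keep an explicit 'freshness': None. The
# dict-based merge below is a stand-in for FreshnessThreshold.merged, assumed
# to overlay the update's fields on the base.
from typing import Any, Dict, Optional

Freshness = Dict[str, Any]  # stand-in for FreshnessThreshold


def merge_freshness_sketch(
    base: Optional[Freshness], update: Optional[Freshness]
) -> Optional[Freshness]:
    if base is not None and update is not None:
        merged = dict(base)
        merged.update(update)  # assumption: update overlays base field-by-field
        return merged
    elif base is None and update is not None:
        return update
    else:
        return None


source_level = {'warn_after': {'count': 12, 'period': 'hour'}}
table_level = {'error_after': {'count': 24, 'period': 'hour'}}
assert merge_freshness_sketch(source_level, table_level) == {
    'warn_after': {'count': 12, 'period': 'hour'},
    'error_after': {'count': 24, 'period': 'hour'},
}
assert merge_freshness_sketch(source_level, None) is None
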
@@ -196,23 +228,141 @@ def parse_column_tests( for test in column.tests: self.parse_test(block, test, column) - def parse_node(self, block: SchemaTestBlock) -> ParsedSchemaTestNode: - """In schema parsing, we rewrite most of the part of parse_node that - builds the initial node to be parsed, but rendering is basically the - same - """ - render_ctx = generate_target_context( - self.root_project, self.root_project.cli_vars + def parse_source( + self, target: UnpatchedSourceDefinition + ) -> ParsedSourceDefinition: + source = target.source + table = target.table + refs = ParserRef.from_target(table) + unique_id = target.unique_id + description = table.description or '' + meta = table.meta or {} + source_description = source.description or '' + loaded_at_field = table.loaded_at_field or source.loaded_at_field + + freshness = merge_freshness(source.freshness, table.freshness) + quoting = source.quoting.merged(table.quoting) + # path = block.path.original_file_path + source_meta = source.meta or {} + + # make sure we don't do duplicate tags from source + table + tags = sorted(set(itertools.chain(source.tags, table.tags))) + + config = self.config_generator.calculate_node_config( + config_calls=[], + fqn=target.fqn, + resource_type=NodeType.Source, + project_name=self.project.project_name, + base=False, ) - builder = TestBuilder[Target]( - test=block.test, - target=block.target, - column_name=block.column_name, - package_name=self.project.project_name, - render_ctx=render_ctx, + if not isinstance(config, SourceConfig): + raise InternalException( + f'Calculated a {type(config)} for a source, but expected ' + f'a SourceConfig' + ) + + default_database = self.root_project.credentials.database + + return ParsedSourceDefinition( + package_name=target.package_name, + database=(source.database or default_database), + schema=(source.schema or source.name), + identifier=(table.identifier or table.name), + root_path=target.root_path, + path=target.path, + original_file_path=target.original_file_path, + columns=refs.column_info, + unique_id=unique_id, + name=table.name, + description=description, + external=table.external, + source_name=source.name, + source_description=source_description, + source_meta=source_meta, + meta=meta, + loader=source.loader, + loaded_at_field=loaded_at_field, + freshness=freshness, + quoting=quoting, + resource_type=NodeType.Source, + fqn=target.fqn, + tags=tags, + config=config, ) - original_name = os.path.basename(block.path.original_file_path) + def create_test_node( + self, + target: Union[UnpatchedSourceDefinition, UnparsedNodeUpdate], + path: str, + config: ContextConfigType, + tags: List[str], + fqn: List[str], + name: str, + raw_sql: str, + test_metadata: Dict[str, Any], + column_name: Optional[str], + ) -> ParsedSchemaTestNode: + + dct = { + 'alias': name, + 'schema': self.default_schema, + 'database': self.default_database, + 'fqn': fqn, + 'name': name, + 'root_path': self.project.project_root, + 'resource_type': self.resource_type, + 'tags': tags, + 'path': path, + 'original_file_path': target.original_file_path, + 'package_name': self.project.project_name, + 'raw_sql': raw_sql, + 'unique_id': self.generate_unique_id(name), + 'config': self.config_dict(config), + 'test_metadata': test_metadata, + 'column_name': column_name, + } + try: + return self.parse_from_dict(dct) + except ValidationError as exc: + msg = validator_error_message(exc) + # this is a bit silly, but build an UnparsedNode just for error + # message reasons + node = self._create_error_node( + 
name=target.name, + path=path, + original_file_path=target.original_file_path, + raw_sql=raw_sql, + ) + raise CompilationException(msg, node=node) from exc + + def _parse_generic_test( + self, + target: Testable, + test: Dict[str, Any], + tags: List[str], + column_name: Optional[str], + ) -> ParsedSchemaTestNode: + + render_ctx = generate_target_context( + self.root_project, self.root_project.cli_vars + ) + try: + builder = TestBuilder( + test=test, + target=target, + column_name=column_name, + package_name=target.package_name, + render_ctx=render_ctx, + ) + except CompilationException as exc: + context = _trimmed(str(target)) + msg = ( + 'Invalid test config given in {}:' + '\n\t{}\n\t@: {}' + .format(target.original_file_path, exc.msg, context) + ) + raise CompilationException(msg) from exc + original_name = os.path.basename(target.original_file_path) compiled_path = get_pseudo_test_path( builder.compiled_name, original_name, 'schema_test', ) @@ -230,32 +380,79 @@ def parse_node(self, block: SchemaTestBlock) -> ParsedSchemaTestNode: 'name': builder.name, 'kwargs': builder.args, } - - # copy - we don't want to mutate the tags! - tags = block.tags[:] - tags.extend(builder.tags()) + tags = sorted(set(itertools.chain(tags, builder.tags()))) if 'schema' not in tags: tags.append('schema') - node = self._create_parsetime_node( - block=block, + node = self.create_test_node( + target=target, path=compiled_path, config=config, + fqn=fqn, tags=tags, name=builder.fqn_name, raw_sql=builder.build_raw_sql(), - column_name=block.column_name, + column_name=column_name, test_metadata=metadata, ) self.render_update(node, config) + return node + + def parse_source_test( + self, + target: UnpatchedSourceDefinition, + test: Dict[str, Any], + column: Optional[UnparsedColumn], + ) -> ParsedSchemaTestNode: + column_name: Optional[str] + if column is None: + column_name = None + else: + column_name = column.name + should_quote = ( + column.quote or + (column.quote is None and target.quote_columns) + ) + if should_quote: + column_name = get_adapter(self.root_project).quote(column_name) + + tags_sources = [target.source.tags, target.table.tags] + if column is not None: + tags_sources.append(column.tags) + tags = list(itertools.chain.from_iterable(tags_sources)) + + node = self._parse_generic_test( + target=target, + test=test, + tags=tags, + column_name=column_name + ) + # we can't go through result.add_node - no file... instead! + if node.config.enabled: + self.results.add_node_nofile(node) + else: + self.results.add_disabled_nofile(node) + return node + + def parse_node(self, block: SchemaTestBlock) -> ParsedSchemaTestNode: + """In schema parsing, we rewrite most of the part of parse_node that + builds the initial node to be parsed, but rendering is basically the + same + """ + node = self._parse_generic_test( + target=block.target, + test=block.test, + tags=block.tags, + column_name=block.column_name, + ) self.add_result_node(block, node) return node def render_with_context( - self, node: ParsedSchemaTestNode, config: SourceConfig, + self, node: ParsedSchemaTestNode, config: ContextConfigType, ) -> None: - """Given the parsed node and a SourceConfig to use during parsing, - collect all the refs that might be squirreled away in the test + """Given the parsed node and a ContextConfigType to use during + parsing, collect all the refs that might be squirreled away in the test arguments. This includes the implicit "model" argument. 
""" # make a base context that doesn't have the magic kwargs field @@ -296,16 +493,7 @@ def parse_test( column_name=column_name, tags=column_tags, ) - try: - self.parse_node(block) - except CompilationException as exc: - context = _trimmed(str(block.target)) - msg = ( - 'Invalid test config given in {}:' - '\n\t{}\n\t@: {}' - .format(block.path.original_file_path, exc.msg, context) - ) - raise CompilationException(msg) from exc + self.parse_node(block) def parse_tests(self, block: TestBlock) -> None: for column in block.columns: @@ -319,6 +507,7 @@ def parse_file(self, block: FileBlock) -> None: # mark the file as seen, even if there are no macros in it self.results.get_file(block.file) if dct: + dct = self.raw_renderer.render_data(dct) yaml_block = YamlBlock.from_file_block(block, dct) self._parse_format_version(yaml_block) @@ -340,7 +529,7 @@ def parse_file(self, block: FileBlock) -> None: Parsed = TypeVar( 'Parsed', - ParsedSourceDefinition, ParsedNodePatch, ParsedMacroPatch + UnpatchedSourceDefinition, ParsedNodePatch, ParsedMacroPatch ) NodeTarget = TypeVar( 'NodeTarget', @@ -352,7 +541,7 @@ def parse_file(self, block: FileBlock) -> None: ) -class YamlDocsReader(Generic[Target, Parsed]): +class YamlDocsReader(metaclass=ABCMeta): def __init__( self, schema_parser: SchemaParser, yaml: YamlBlock, key: str ) -> None: @@ -394,236 +583,94 @@ def get_key_dicts(self) -> Iterable[Dict[str, Any]]: ) raise CompilationException(msg) - def parse_docs(self, block: TargetBlock) -> ParserRef: - refs = ParserRef() - for column in block.columns: - description = column.description - data_type = column.data_type - meta = column.meta - column_info( - self.root_project, - block.target, - description, - ) - - refs.add(column, description, data_type, meta) - return refs - @abstractmethod - def get_unparsed_target(self) -> Iterable[Target]: - raise NotImplementedError('get_unparsed_target is abstract') - - @abstractmethod - def get_block(self, node: Target) -> TargetBlock: - raise NotImplementedError('get_block is abstract') - - @abstractmethod - def parse_patch( - self, block: TargetBlock[Target], refs: ParserRef - ) -> None: - raise NotImplementedError('parse_patch is abstract') - def parse(self) -> List[TestBlock]: - node: Target - test_blocks: List[TestBlock] = [] - for node in self.get_unparsed_target(): - node_block = self.get_block(node) - if isinstance(node_block, TestBlock): - test_blocks.append(node_block) - refs = self.parse_docs(node_block) - self.parse_patch(node_block, refs) - return test_blocks + raise NotImplementedError('parse is abstract') -class YamlParser(Generic[Target, Parsed]): - def __init__( - self, schema_parser: SchemaParser, yaml: YamlBlock, key: str - ) -> None: - self.schema_parser = schema_parser - self.key = key - self.yaml = yaml +T = TypeVar('T', bound=JsonSchemaMixin) - @property - def results(self): - return self.schema_parser.results - @property - def project(self): - return self.schema_parser.project - - @property - def default_database(self): - return self.schema_parser.default_database - - @property - def root_project(self): - return self.schema_parser.root_project +class SourceParser(YamlDocsReader): + def _target_from_dict(self, cls: Type[T], data: Dict[str, Any]) -> T: + path = self.yaml.path.original_file_path + try: + return cls.from_dict(data) + except (ValidationError, JSONValidationException) as exc: + msg = error_context(path, self.key, data, exc) + raise CompilationException(msg) from exc - def get_key_dicts(self) -> Iterable[Dict[str, Any]]: - data = 
self.yaml.data.get(self.key, []) - if not isinstance(data, list): - raise CompilationException( - '{} must be a list, got {} instead: ({})' - .format(self.key, type(data), _trimmed(str(data))) + def parse(self) -> List[TestBlock]: + for data in self.get_key_dicts(): + data = self.project.credentials.translate_aliases( + data, recurse=True ) - path = self.yaml.path.original_file_path - for entry in data: - if coerce_dict_str(entry) is not None: - yield entry + is_override = 'overrides' in data + if is_override: + data['path'] = self.yaml.path.original_file_path + patch = self._target_from_dict(SourcePatch, data) + self.results.add_source_patch(self.yaml.file, patch) else: - msg = error_context( - path, self.key, data, 'expected a dict with string keys' - ) - raise CompilationException(msg) - - def parse_docs(self, block: TargetBlock) -> ParserRef: - refs = ParserRef() - for column in block.columns: - description = column.description - data_type = column.data_type - meta = column.meta - column_info( - self.root_project, block.target, description + source = self._target_from_dict(UnparsedSourceDefinition, data) + self.add_source_definitions(source) + return [] + + def add_source_definitions(self, source: UnparsedSourceDefinition) -> None: + original_file_path = self.yaml.path.original_file_path + fqn_path = self.yaml.path.relative_path + for table in source.tables: + unique_id = '.'.join([ + NodeType.Source, self.project.project_name, + source.name, table.name + ]) + + # the FQN is project name / path elements /source_name /table_name + fqn = self.schema_parser.get_fqn_prefix(fqn_path) + fqn.extend([source.name, table.name]) + + result = UnpatchedSourceDefinition( + source=source, + table=table, + path=original_file_path, + original_file_path=original_file_path, + root_path=self.project.project_root, + package_name=self.project.project_name, + unique_id=unique_id, + resource_type=NodeType.Source, + fqn=fqn, ) + self.results.add_source(self.yaml.file, result) - refs.add(column, description, data_type, meta) - return refs - - def parse(self): - node: Target - for node in self.get_unparsed_target(): - node_block = TargetBlock.from_yaml_block(self.yaml, node) - refs = self.parse_docs(node_block) - self.parse_tests(node_block) - self.parse_patch(node_block, refs) - def parse_tests(self, target: TargetBlock[Target]) -> None: - # some yaml parsers just don't have tests (macros, analyses) - pass +class NonSourceParser(YamlDocsReader, Generic[NonSourceTarget, Parsed]): + @abstractmethod + def _target_type(self) -> Type[NonSourceTarget]: + raise NotImplementedError('_unsafe_from_dict not implemented') @abstractmethod - def get_unparsed_target(self) -> Iterable[Target]: - raise NotImplementedError('get_unparsed_target is abstract') + def get_block(self, node: NonSourceTarget) -> TargetBlock: + raise NotImplementedError('get_block is abstract') @abstractmethod def parse_patch( - self, block: TargetBlock[Target], refs: ParserRef + self, block: TargetBlock[NonSourceTarget], refs: ParserRef ) -> None: raise NotImplementedError('parse_patch is abstract') - -class SourceParser(YamlDocsReader[SourceTarget, ParsedSourceDefinition]): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self._renderer = ConfigRenderer( - generate_target_context( - self.root_project, self.root_project.cli_vars - ) - ) - - def get_block(self, node: SourceTarget) -> TestBlock: - return TestBlock.from_yaml_block(self.yaml, node) - - def get_unparsed_target(self) -> Iterable[SourceTarget]: - path = 
self.yaml.path.original_file_path - - for data in self.get_key_dicts(): - try: - data = self.project.credentials.translate_aliases(data) - data = self._renderer.render_schema_source(data) - source = UnparsedSourceDefinition.from_dict(data) - except (ValidationError, JSONValidationException) as exc: - msg = error_context(path, self.key, data, exc) - raise CompilationException(msg) from exc + def parse(self) -> List[TestBlock]: + node: NonSourceTarget + test_blocks: List[TestBlock] = [] + for node in self.get_unparsed_target(): + node_block = self.get_block(node) + if isinstance(node_block, TestBlock): + test_blocks.append(node_block) + if isinstance(node, (HasColumnDocs, HasColumnTests)): + refs: ParserRef = ParserRef.from_target(node) else: - for table in source.tables: - yield SourceTarget(source, table) - - def _calculate_freshness( - self, - source: UnparsedSourceDefinition, - table: UnparsedSourceTableDefinition, - ) -> Optional[FreshnessThreshold]: - # if both are non-none, merge them. If both are None, the freshness is - # None. If just table.freshness is None, the user disabled freshness - # for the table. - # the result should be None as the user explicitly disabled freshness. - if source.freshness is not None and table.freshness is not None: - return source.freshness.merged(table.freshness) - elif source.freshness is None and table.freshness is not None: - return table.freshness - else: - return None - - def parse_patch( - self, block: TargetBlock[SourceTarget], refs: ParserRef - ) -> None: - source = block.target.source - table = block.target.table - unique_id = '.'.join([ - NodeType.Source, self.project.project_name, source.name, table.name - ]) - description = table.description or '' - meta = table.meta or {} - source_description = source.description or '' - column_info( - self.root_project, source, description, source_description - ) - - loaded_at_field = table.loaded_at_field or source.loaded_at_field - - freshness = self._calculate_freshness(source, table) - quoting = source.quoting.merged(table.quoting) - path = block.path.original_file_path - source_meta = source.meta or {} - - # make sure we don't do duplicate tags from source + table - tags = sorted(set(itertools.chain(source.tags, table.tags))) - - result = ParsedSourceDefinition( - package_name=self.project.project_name, - database=(source.database or self.default_database), - schema=(source.schema or source.name), - identifier=(table.identifier or table.name), - root_path=self.project.project_root, - path=path, - original_file_path=path, - columns=refs.column_info, - unique_id=unique_id, - name=table.name, - description=description, - external=table.external, - source_name=source.name, - source_description=source_description, - source_meta=source_meta, - meta=meta, - loader=source.loader, - loaded_at_field=loaded_at_field, - freshness=freshness, - quoting=quoting, - resource_type=NodeType.Source, - fqn=[self.project.project_name, source.name, table.name], - tags=tags, - ) - self.results.add_source(self.yaml.file, result) - - -class NonSourceParser( - YamlDocsReader[NonSourceTarget, Parsed], Generic[NonSourceTarget, Parsed] -): - def collect_column_info( - self, block: TargetBlock[NonSourceTarget] - ) -> str: - description = block.target.description - column_info( - self.root_project, block.target, description - ) - return description - - @abstractmethod - def _target_type(self) -> Type[NonSourceTarget]: - raise NotImplementedError('_unsafe_from_dict not implemented') + refs = ParserRef() + self.parse_patch(node_block, 
refs) + return test_blocks def get_unparsed_target(self) -> Iterable[NonSourceTarget]: path = self.yaml.path.original_file_path @@ -650,13 +697,12 @@ class NodePatchParser( def parse_patch( self, block: TargetBlock[NodeTarget], refs: ParserRef ) -> None: - description = self.collect_column_info(block) result = ParsedNodePatch( name=block.target.name, original_file_path=block.target.original_file_path, yaml_key=block.target.yaml_key, package_name=block.target.package_name, - description=description, + description=block.target.description, columns=refs.column_info, meta=block.target.meta, docs=block.target.docs, @@ -681,16 +727,6 @@ def _target_type(self) -> Type[UnparsedAnalysisUpdate]: class MacroPatchParser(NonSourceParser[UnparsedMacroUpdate, ParsedMacroPatch]): - def collect_column_info( - self, block: TargetBlock[UnparsedMacroUpdate] - ) -> str: - description = block.target.description - arg_docs = [arg.description for arg in block.target.arguments] - column_info( - self.root_project, block.target, description, *arg_docs - ) - return description - def get_block(self, node: UnparsedMacroUpdate) -> TargetBlock: return TargetBlock.from_yaml_block(self.yaml, node) @@ -700,15 +736,13 @@ def _target_type(self) -> Type[UnparsedMacroUpdate]: def parse_patch( self, block: TargetBlock[UnparsedMacroUpdate], refs: ParserRef ) -> None: - description = self.collect_column_info(block) - result = ParsedMacroPatch( name=block.target.name, original_file_path=block.target.original_file_path, yaml_key=block.target.yaml_key, package_name=block.target.package_name, arguments=block.target.arguments, - description=description, + description=block.target.description, meta=block.target.meta, docs=block.target.docs, ) diff --git a/core/dbt/parser/seeds.py b/core/dbt/parser/seeds.py index 83027584bd5..041922cf61f 100644 --- a/core/dbt/parser/seeds.py +++ b/core/dbt/parser/seeds.py @@ -1,7 +1,7 @@ +from dbt.context.context_config import ContextConfigType from dbt.contracts.graph.manifest import SourceFile, FilePath from dbt.contracts.graph.parsed import ParsedSeedNode from dbt.node_types import NodeType -from dbt.source_config import SourceConfig from dbt.parser.base import SimpleSQLParser from dbt.parser.search import FileBlock, FilesystemSearcher @@ -24,7 +24,7 @@ def get_compiled_path(cls, block: FileBlock): return block.path.relative_path def render_with_context( - self, parsed_node: ParsedSeedNode, config: SourceConfig + self, parsed_node: ParsedSeedNode, config: ContextConfigType ) -> None: """Seeds don't need to do any rendering.""" diff --git a/core/dbt/parser/sources.py b/core/dbt/parser/sources.py new file mode 100644 index 00000000000..86012c0a4e4 --- /dev/null +++ b/core/dbt/parser/sources.py @@ -0,0 +1,197 @@ +from pathlib import Path +from typing import ( + Iterable, + Dict, + Optional, + Set, +) +from dbt.config import RuntimeConfig +from dbt.contracts.graph.manifest import Manifest, SourceKey +from dbt.contracts.graph.parsed import ( + UnpatchedSourceDefinition, + ParsedSourceDefinition, + ParsedSchemaTestNode, +) +from dbt.contracts.graph.unparsed import ( + UnparsedSourceDefinition, + SourcePatch, + SourceTablePatch, + UnparsedSourceTableDefinition, +) +from dbt.exceptions import warn_or_error + +from dbt.parser.schemas import SchemaParser, ParserRef +from dbt.parser.results import ParseResult +from dbt.ui import printer + + +class SourcePatcher: + def __init__( + self, + results: ParseResult, + root_project: RuntimeConfig, + ) -> None: + self.results = results + self.root_project = root_project 
+ self.macro_manifest = Manifest.from_macros( + macros=self.results.macros, + files=self.results.files + ) + self.schema_parsers: Dict[str, SchemaParser] = {} + self.patches_used: Dict[SourceKey, Set[str]] = {} + self.sources: Dict[str, ParsedSourceDefinition] = {} + + def patch_source( + self, + unpatched: UnpatchedSourceDefinition, + patch: Optional[SourcePatch], + ) -> UnpatchedSourceDefinition: + + source_dct = unpatched.source.to_dict() + table_dct = unpatched.table.to_dict() + patch_path: Optional[Path] = None + + source_table_patch: Optional[SourceTablePatch] = None + + if patch is not None: + source_table_patch = patch.get_table_named(unpatched.table.name) + source_dct.update(patch.to_patch_dict()) + patch_path = patch.path + + if source_table_patch is not None: + table_dct.update(source_table_patch.to_patch_dict()) + + source = UnparsedSourceDefinition.from_dict(source_dct) + table = UnparsedSourceTableDefinition.from_dict(table_dct) + return unpatched.replace( + source=source, table=table, patch_path=patch_path + ) + + def parse_source_docs(self, block: UnpatchedSourceDefinition) -> ParserRef: + refs = ParserRef() + for column in block.columns: + description = column.description + data_type = column.data_type + meta = column.meta + refs.add(column, description, data_type, meta) + return refs + + def get_schema_parser_for(self, package_name: str) -> 'SchemaParser': + if package_name in self.schema_parsers: + schema_parser = self.schema_parsers[package_name] + else: + all_projects = self.root_project.load_dependencies() + project = all_projects[package_name] + schema_parser = SchemaParser( + self.results, project, self.root_project, self.macro_manifest + ) + self.schema_parsers[package_name] = schema_parser + return schema_parser + + def get_source_tests( + self, target: UnpatchedSourceDefinition + ) -> Iterable[ParsedSchemaTestNode]: + schema_parser = self.get_schema_parser_for(target.package_name) + for test, column in target.get_tests(): + yield schema_parser.parse_source_test( + target=target, + test=test, + column=column, + ) + + def get_patch_for( + self, + unpatched: UnpatchedSourceDefinition, + ) -> Optional[SourcePatch]: + key = (unpatched.package_name, unpatched.source.name) + patch: Optional[SourcePatch] = self.results.source_patches.get(key) + if patch is None: + return None + if key not in self.patches_used: + # mark the key as used + self.patches_used[key] = set() + if patch.get_table_named(unpatched.table.name) is not None: + self.patches_used[key].add(unpatched.table.name) + return patch + + def construct_sources(self) -> None: + # given the UnpatchedSourceDefinition and SourcePatches, combine them + # to make a beautiful baby ParsedSourceDefinition. + for unique_id, unpatched in self.results.sources.items(): + patch = self.get_patch_for(unpatched) + + patched = self.patch_source(unpatched, patch) + # now use the patched UnpatchedSourceDefinition to extract test + # data. 
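SourcePatcher.patch_source applies a schema.yml override by dumping the original source and table definitions to dicts, layering the SourcePatch fields on top with dict.update, and rebuilding the unparsed objects from the merged dicts. A rough standalone sketch of that layering follows; the field names are illustrative, not the exact SourcePatch schema.

from typing import Any, Dict, Optional

def apply_patch_sketch(
    source_dct: Dict[str, Any],
    table_dct: Dict[str, Any],
    source_patch: Optional[Dict[str, Any]] = None,
    table_patch: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # any key the patch supplies replaces the original value; everything
    # else keeps the definition from the upstream package
    if source_patch is not None:
        source_dct.update(source_patch)
    if table_patch is not None:
        table_dct.update(table_patch)
    return {'source': source_dct, 'table': table_dct}

print(apply_patch_sketch(
    {'name': 'raw_events', 'schema': 'events', 'loader': 'fivetran'},
    {'name': 'clicks', 'identifier': 'clicks'},
    source_patch={'schema': 'events_v2'},
    table_patch={'identifier': 'clicks_v2'},
))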
+            for test in self.get_source_tests(patched):
+                if test.config.enabled:
+                    self.results.add_node_nofile(test)
+                else:
+                    self.results.add_disabled_nofile(test)
+
+            schema_parser = self.get_schema_parser_for(unpatched.package_name)
+            parsed = schema_parser.parse_source(patched)
+            if parsed.config.enabled:
+                self.sources[unique_id] = parsed
+            else:
+                self.results.add_disabled_nofile(parsed)
+
+        self.warn_unused()
+
+    def warn_unused(self) -> None:
+        unused_tables: Dict[SourceKey, Optional[Set[str]]] = {}
+        for patch in self.results.source_patches.values():
+            key = (patch.overrides, patch.name)
+            if key not in self.patches_used:
+                unused_tables[key] = None
+            elif patch.tables is not None:
+                table_patches = {t.name for t in patch.tables}
+                unused = table_patches - self.patches_used[key]
+                # don't add an entry if there are no unused tables
+                if unused:
+                    # because patches are required to be unique, we can safely
+                    # write without looking
+                    unused_tables[key] = unused
+
+        if unused_tables:
+            msg = self.get_unused_msg(unused_tables)
+            warn_or_error(msg, log_fmt=printer.yellow('WARNING: {}'))
+
+    def get_unused_msg(
+        self,
+        unused_tables: Dict[SourceKey, Optional[Set[str]]],
+    ) -> str:
+        msg = [
+            'During parsing, dbt encountered source overrides that had no '
+            'target:',
+        ]
+        for key, table_names in unused_tables.items():
+            patch = self.results.source_patches[key]
+            patch_name = f'{patch.overrides}.{patch.name}'
+            if table_names is None:
+                msg.append(
+                    f' - Source {patch_name} (in {patch.path})'
+                )
+            else:
+                for table_name in sorted(table_names):
+                    msg.append(
+                        f' - Source table {patch_name}.{table_name} '
+                        f'(in {patch.path})'
+                    )
+        msg.append('')
+        return '\n'.join(msg)
+
+
+def patch_sources(
+    results: ParseResult,
+    root_project: RuntimeConfig,
+) -> Dict[str, ParsedSourceDefinition]:
+    """Patch all the sources found in the results. Updates results.disabled and
+    results.nodes.
+
+    Return a dict of ParsedSourceDefinitions, suitable for use in
+    manifest.sources.
+ """ + patcher = SourcePatcher(results, root_project) + patcher.construct_sources() + return patcher.sources diff --git a/core/dbt/source_config.py b/core/dbt/source_config.py deleted file mode 100644 index 828abe284d6..00000000000 --- a/core/dbt/source_config.py +++ /dev/null @@ -1,220 +0,0 @@ -from typing import Dict, Any - -import dbt.exceptions - -from dbt.utils import deep_merge -from dbt.node_types import NodeType -from dbt.adapters.factory import get_adapter_class_by_name - - -class SourceConfig: - AppendListFields = {'pre-hook', 'post-hook', 'tags'} - ExtendDictFields = {'vars', 'column_types', 'quoting', 'persist_docs'} - ClobberFields = { - 'alias', - 'schema', - 'enabled', - 'materialized', - 'unique_key', - 'database', - 'severity', - 'sql_header', - 'incremental_strategy', - - # snapshots - 'target_database', - 'target_schema', - 'strategy', - 'updated_at', - # this is often a list, but it should replace and not append (sometimes - # it's 'all') - 'check_cols', - # seeds - 'quote_columns', - } - ConfigKeys = AppendListFields | ExtendDictFields | ClobberFields - - def __init__(self, active_project, own_project, fqn, node_type): - self._config = None - # active_project is a RuntimeConfig, not a Project - self.active_project = active_project - self.own_project = own_project - self.fqn = fqn - self.node_type = node_type - - adapter_type = active_project.credentials.type - adapter_class = get_adapter_class_by_name(adapter_type) - self.AdapterSpecificConfigs = adapter_class.AdapterSpecificConfigs - - # the config options defined within the model - self.in_model_config: Dict[str, Any] = {} - - def _merge(self, *configs): - merged_config: Dict[str, Any] = {} - for config in configs: - # Do not attempt to deep merge clobber fields - config = config.copy() - clobber = { - key: config.pop(key) for key in list(config.keys()) - if key in (self.ClobberFields | self.AdapterSpecificConfigs) - } - intermediary_merged = deep_merge( - merged_config, config - ) - intermediary_merged.update(clobber) - - merged_config.update(intermediary_merged) - return merged_config - - # this is re-evaluated every time `config` is called. - # we can cache it, but that complicates things. 
- # TODO : see how this fares performance-wise - @property - def config(self): - """ - Config resolution order: - - if this is a dependency model: - - own project config - - in-model config - - active project config - if this is a top-level model: - - active project config - - in-model config - """ - - defaults = {"enabled": True, "materialized": "view"} - - if self.node_type == NodeType.Seed: - defaults['materialized'] = 'seed' - elif self.node_type == NodeType.Snapshot: - defaults['materialized'] = 'snapshot' - - if self.node_type == NodeType.Test: - defaults['severity'] = 'ERROR' - - active_config = self.load_config_from_active_project() - - if self.active_project.project_name == self.own_project.project_name: - cfg = self._merge(defaults, active_config, - self.in_model_config) - else: - own_config = self.load_config_from_own_project() - - cfg = self._merge( - defaults, own_config, self.in_model_config, active_config - ) - - return cfg - - def _translate_adapter_aliases(self, config): - return self.active_project.credentials.translate_aliases(config) - - def update_in_model_config(self, config): - config = self._translate_adapter_aliases(config) - for key, value in config.items(): - if key in self.AppendListFields: - current = self.in_model_config.get(key, []) - if not isinstance(value, (list, tuple)): - value = [value] - current.extend(value) - self.in_model_config[key] = current - elif key in self.ExtendDictFields: - current = self.in_model_config.get(key, {}) - try: - current.update(value) - except (ValueError, TypeError, AttributeError): - dbt.exceptions.raise_compiler_error( - 'Invalid config field: "{}" must be a dict'.format(key) - ) - self.in_model_config[key] = current - else: # key in self.ClobberFields or self.AdapterSpecificConfigs - self.in_model_config[key] = value - - @staticmethod - def __get_as_list(relevant_configs, key): - if key not in relevant_configs: - return [] - - items = relevant_configs[key] - if not isinstance(items, (list, tuple)): - items = [items] - - return items - - def smart_update(self, mutable_config, new_configs): - config_keys = self.ConfigKeys | self.AdapterSpecificConfigs - - relevant_configs: Dict[str, Any] = { - key: new_configs[key] for key - in new_configs if key in config_keys - } - - for key in self.AppendListFields: - append_fields = self.__get_as_list(relevant_configs, key) - mutable_config[key].extend([ - f for f in append_fields if f not in mutable_config[key] - ]) - - for key in self.ExtendDictFields: - dict_val = relevant_configs.get(key, {}) - try: - mutable_config[key].update(dict_val) - except (ValueError, TypeError, AttributeError): - dbt.exceptions.raise_compiler_error( - 'Invalid config field: "{}" must be a dict'.format(key) - ) - - for key in (self.ClobberFields | self.AdapterSpecificConfigs): - if key in relevant_configs: - mutable_config[key] = relevant_configs[key] - - return relevant_configs - - def get_project_config(self, runtime_config): - # most configs are overwritten by a more specific config, but pre/post - # hooks are appended! 
- config: Dict[str, Any] = {} - for k in self.AppendListFields: - config[k] = [] - for k in self.ExtendDictFields: - config[k] = {} - - if self.node_type == NodeType.Seed: - model_configs = runtime_config.seeds - elif self.node_type == NodeType.Snapshot: - model_configs = runtime_config.snapshots - else: - model_configs = runtime_config.models - - if model_configs is None: - return config - - # mutates config - self.smart_update(config, model_configs) - - fqn = self.fqn[:] - for level in fqn: - level_config = model_configs.get(level, None) - if level_config is None: - break - - # mutates config - relevant_configs = self.smart_update(config, level_config) - - clobber_configs = { - k: v for (k, v) in relevant_configs.items() - if k not in self.AppendListFields and - k not in self.ExtendDictFields - } - - config.update(clobber_configs) - model_configs = model_configs[level] - - return config - - def load_config_from_own_project(self): - return self.get_project_config(self.own_project) - - def load_config_from_active_project(self): - return self.get_project_config(self.active_project) diff --git a/core/dbt/task/debug.py b/core/dbt/task/debug.py index e9204fccf99..14b374e13a6 100644 --- a/core/dbt/task/debug.py +++ b/core/dbt/task/debug.py @@ -12,7 +12,8 @@ from dbt.links import ProfileConfigDocs from dbt.adapters.factory import get_adapter, register_adapter from dbt.version import get_installed_version -from dbt.config import Project, Profile, ConfigRenderer +from dbt.config import Project, Profile +from dbt.config.renderer import DbtProjectYamlRenderer, ProfileRenderer from dbt.context.base import generate_base_context from dbt.context.target import generate_target_context from dbt.clients.yaml_helper import load_yaml_text @@ -142,11 +143,12 @@ def _load_project(self): else: ctx = generate_target_context(self.profile, self.cli_vars) - renderer = ConfigRenderer(ctx) + renderer = DbtProjectYamlRenderer(ctx) try: - self.project = Project.from_project_root(self.project_dir, - renderer) + self.project = Project.from_project_root( + self.project_dir, renderer + ) except dbt.exceptions.DbtConfigError as exc: self.project_fail_details = str(exc) return red('ERROR invalid') @@ -185,7 +187,9 @@ def _choose_profile_names(self) -> Optional[List[str]]: partial = Project.partial_load( os.path.dirname(self.project_path) ) - renderer = ConfigRenderer(generate_base_context(self.cli_vars)) + renderer = DbtProjectYamlRenderer( + generate_base_context(self.cli_vars) + ) project_profile = partial.render_profile_name(renderer) except dbt.exceptions.DbtProjectError: pass @@ -226,7 +230,7 @@ def _choose_target_name(self, profile_name: str): assert self.raw_profile_data is not None raw_profile = self.raw_profile_data[profile_name] - renderer = ConfigRenderer(generate_base_context(self.cli_vars)) + renderer = ProfileRenderer(generate_base_context(self.cli_vars)) target_name, _ = Profile.render_profile( raw_profile=raw_profile, @@ -256,7 +260,7 @@ def _load_profile(self): profile_errors = [] profile_names = self._choose_profile_names() - renderer = ConfigRenderer(generate_base_context(self.cli_vars)) + renderer = ProfileRenderer(generate_base_context(self.cli_vars)) for profile_name in profile_names: try: profile: Profile = QueryCommentedProfile.render_from_args( @@ -372,7 +376,7 @@ def validate_connection(cls, target_dict): raw_profile=profile_data, profile_name='', target_override=target_name, - renderer=ConfigRenderer(generate_base_context({})), + renderer=ProfileRenderer(generate_base_context({})), ) result = 
cls.attempt_connection(profile) if result is not None: diff --git a/core/dbt/task/deps.py b/core/dbt/task/deps.py index f0f4bd8d165..94522e588b6 100644 --- a/core/dbt/task/deps.py +++ b/core/dbt/task/deps.py @@ -2,7 +2,8 @@ import dbt.deprecations import dbt.exceptions -from dbt.config import UnsetProfileConfig, ConfigRenderer +from dbt.config import UnsetProfileConfig +from dbt.config.renderer import DbtProjectYamlRenderer from dbt.context.target import generate_target_context from dbt.deps.base import downloads_directory from dbt.deps.resolver import resolve_packages @@ -43,7 +44,7 @@ def run(self): with downloads_directory(): final_deps = resolve_packages(packages, self.config) - renderer = ConfigRenderer(generate_target_context( + renderer = DbtProjectYamlRenderer(generate_target_context( self.config, self.config.cli_vars )) diff --git a/core/dbt/task/generate.py b/core/dbt/task/generate.py index c1d4b478e4d..50d505c6f10 100644 --- a/core/dbt/task/generate.py +++ b/core/dbt/task/generate.py @@ -1,7 +1,7 @@ import os import shutil from datetime import datetime -from typing import Dict, List, Any, Optional +from typing import Dict, List, Any, Optional, Tuple, Set from hologram import ValidationError @@ -93,20 +93,28 @@ def add_column(self, data: PrimitiveDict): def make_unique_id_map( self, manifest: Manifest - ) -> Dict[str, CatalogTable]: + ) -> Tuple[Dict[str, CatalogTable], Dict[str, CatalogTable]]: nodes: Dict[str, CatalogTable] = {} + sources: Dict[str, CatalogTable] = {} - manifest_mapping = get_unique_id_mapping(manifest) + node_map, source_map = get_unique_id_mapping(manifest) for table in self.values(): - unique_ids = manifest_mapping.get(table.key(), []) + key = table.key() + if key in node_map: + unique_id = node_map[key] + nodes[unique_id] = table.replace(unique_id=unique_id) + + unique_ids = source_map.get(table.key(), set()) for unique_id in unique_ids: - if unique_id in nodes: + if unique_id in sources: dbt.exceptions.raise_ambiguous_catalog_match( - unique_id, nodes[unique_id].to_dict(), table.to_dict() + unique_id, + sources[unique_id].to_dict(), + table.to_dict(), ) else: - nodes[unique_id] = table.replace(unique_id=unique_id) - return nodes + sources[unique_id] = table.replace(unique_id=unique_id) + return nodes, sources def format_stats(stats: PrimitiveDict) -> StatsDict: @@ -160,18 +168,23 @@ def mapping_key(node: CompileResultNode) -> CatalogKey: ) -def get_unique_id_mapping(manifest: Manifest) -> Dict[CatalogKey, List[str]]: +def get_unique_id_mapping( + manifest: Manifest +) -> Tuple[Dict[CatalogKey, str], Dict[CatalogKey, Set[str]]]: # A single relation could have multiple unique IDs pointing to it if a # source were also a node. 
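The catalog changes above split what used to be a single CatalogKey-to-unique-id mapping into two maps: nodes resolve to at most one unique id per relation, while sources keep a set because several source definitions can point at the same relation. A simplified sketch of the new shape, using plain tuples in place of CatalogKey and dicts in place of the manifest:

from typing import Dict, Set, Tuple

CatalogKey = Tuple[str, str, str]  # (database, schema, identifier), simplified

def build_maps_sketch(
    nodes: Dict[str, CatalogKey],
    sources: Dict[str, CatalogKey],
) -> Tuple[Dict[CatalogKey, str], Dict[CatalogKey, Set[str]]]:
    node_map: Dict[CatalogKey, str] = {}
    for unique_id, key in nodes.items():
        node_map[key] = unique_id            # one node per relation key
    source_map: Dict[CatalogKey, Set[str]] = {}
    for unique_id, key in sources.items():
        source_map.setdefault(key, set()).add(unique_id)  # possibly several
    return node_map, source_map

key = ('analytics', 'raw', 'events')
node_map, source_map = build_maps_sketch(
    {'model.my_project.events': key},
    {'source.my_project.raw.events': key, 'source.other_pkg.raw.events': key},
)
print(node_map[key])            # the single matching node
print(sorted(source_map[key]))  # every source mapped to the relation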
- ident_map: Dict[CatalogKey, List[str]] = {} + node_map: Dict[CatalogKey, str] = {} + source_map: Dict[CatalogKey, Set[str]] = {} for unique_id, node in manifest.nodes.items(): key = mapping_key(node) + node_map[key] = unique_id - if key not in ident_map: - ident_map[key] = [] - - ident_map[key].append(unique_id) - return ident_map + for unique_id, source in manifest.sources.items(): + key = mapping_key(source) + if key not in source_map: + source_map[key] = set() + source_map[key].add(unique_id) + return node_map, source_map def _coerce_decimal(value): @@ -198,6 +211,7 @@ def run(self) -> CatalogResults: ) return CatalogResults( nodes={}, + sources={}, generated_at=datetime.utcnow(), errors=None, _compile_results=compile_results @@ -230,8 +244,10 @@ def run(self) -> CatalogResults: if exceptions: errors = [str(e) for e in exceptions] + nodes, sources = catalog.make_unique_id_map(self.manifest) results = self.get_catalog_results( - nodes=catalog.make_unique_id_map(self.manifest), + nodes=nodes, + sources=sources, generated_at=datetime.utcnow(), compile_results=compile_results, errors=errors, @@ -257,12 +273,14 @@ def run(self) -> CatalogResults: def get_catalog_results( self, nodes: Dict[str, CatalogTable], + sources: Dict[str, CatalogTable], generated_at: datetime, compile_results: Optional[Any], errors: Optional[List[str]] ) -> CatalogResults: return CatalogResults( nodes=nodes, + sources=sources, generated_at=generated_at, _compile_results=compile_results, errors=errors, diff --git a/core/dbt/task/list.py b/core/dbt/task/list.py index d6c5ce33896..03b0a3a369e 100644 --- a/core/dbt/task/list.py +++ b/core/dbt/task/list.py @@ -58,7 +58,15 @@ def _iterate_selected_nodes(self): 'manifest is None in _iterate_selected_nodes' ) for node in nodes: - yield self.manifest.nodes[node] + if node in self.manifest.nodes: + yield self.manifest.nodes[node] + elif node in self.manifest.sources: + yield self.manifest.sources[node] + else: + raise RuntimeException( + f'Got an unexpected result from node selection: "{node}"' + f'Expected a source or a node!' + ) def generate_selectors(self): for node in self._iterate_selected_nodes(): diff --git a/core/dbt/task/rpc/cli.py b/core/dbt/task/rpc/cli.py index d5bb61b6d96..622480470f9 100644 --- a/core/dbt/task/rpc/cli.py +++ b/core/dbt/task/rpc/cli.py @@ -13,6 +13,7 @@ Result, ) from dbt.exceptions import InternalException +from dbt.perf_utils import get_full_manifest from dbt.utils import parse_cli_vars from .base import RPCTask @@ -39,14 +40,6 @@ def __init__(self, args, config, manifest): def set_config(self, config): super().set_config(config) - # read any cli vars we got and use it to update cli_vars - self.config.cli_vars.update( - parse_cli_vars(getattr(self.args, 'vars', '{}')) - ) - # rewrite args.vars to reflect our merged vars - self.args.vars = yaml.safe_dump(self.config.cli_vars) - self.config.args = self.args - if self.task_type is None: raise InternalException('task type not set for set_config') if issubclass(self.task_type, RemoteManifestMethod): @@ -96,6 +89,23 @@ def handle_request(self) -> Result: 'CLI task is in a bad state: handle_request called with no ' 'real_task set!' ) + + # It's important to update cli_vars here, because set_config()'s + # `self.config` is before the fork(), so it would alter the behavior of + # future calls. 
+ + # read any cli vars we got and use it to update cli_vars + self.config.cli_vars.update( + parse_cli_vars(getattr(self.args, 'vars', '{}')) + ) + # If this changed the vars, rewrite args.vars to reflect our merged + # vars and reload the manifest. + dumped = yaml.safe_dump(self.config.cli_vars) + if dumped != self.args.vars: + self.real_task.args.vars = dumped + if isinstance(self.real_task, RemoteManifestMethod): + self.real_task.manifest = get_full_manifest(self.config) + # we parsed args from the cli, so we're set on that front return self.real_task.handle_request() diff --git a/core/dbt/task/rpc/project_commands.py b/core/dbt/task/rpc/project_commands.py index 465c228b4e2..99a67f8a36b 100644 --- a/core/dbt/task/rpc/project_commands.py +++ b/core/dbt/task/rpc/project_commands.py @@ -112,10 +112,11 @@ def set_args(self, params: RPCDocsGenerateParameters) -> None: self.args.compile = params.compile def get_catalog_results( - self, nodes, generated_at, compile_results, errors + self, nodes, sources, generated_at, compile_results, errors ) -> RemoteCatalogResults: return RemoteCatalogResults( nodes=nodes, + sources=sources, generated_at=datetime.utcnow(), _compile_results=compile_results, errors=errors, diff --git a/core/dbt/task/run.py b/core/dbt/task/run.py index 5e633f6194d..40ad51585a3 100644 --- a/core/dbt/task/run.py +++ b/core/dbt/task/run.py @@ -16,7 +16,7 @@ import dbt.exceptions import dbt.flags -from dbt.hooks import get_hook +from dbt.hooks import get_hook_dict from dbt.ui.printer import \ print_hook_start_line, \ print_hook_end_line, \ @@ -26,6 +26,7 @@ from dbt.compilation import compile_node from dbt.contracts.graph.compiled import CompileResultNode +from dbt.contracts.graph.model_config import Hook from dbt.contracts.graph.parsed import ParsedHookNode from dbt.task.compile import CompileTask @@ -76,6 +77,12 @@ def get_hooks_by_tags( return matched_nodes +def get_hook(source, index): + hook_dict = get_hook_dict(source) + hook_dict.setdefault('index', index) + return Hook.from_dict(hook_dict) + + class RunTask(CompileTask): def __init__(self, args, config): super().__init__(args, config) diff --git a/core/dbt/task/runnable.py b/core/dbt/task/runnable.py index d23df026f7f..d86bd2c744e 100644 --- a/core/dbt/task/runnable.py +++ b/core/dbt/task/runnable.py @@ -23,6 +23,7 @@ from dbt.contracts.graph.compiled import CompileResultNode from dbt.contracts.graph.manifest import Manifest +from dbt.contracts.graph.parsed import ParsedSourceDefinition from dbt.contracts.results import ExecutionResult from dbt.exceptions import ( InternalException, @@ -117,9 +118,17 @@ def _runtime_initialize(self): selected_nodes) # we use this a couple times. order does not matter. - self._flattened_nodes = [ - self.manifest.nodes[uid] for uid in selected_nodes - ] + self._flattened_nodes = [] + for uid in selected_nodes: + if uid in self.manifest.nodes: + self._flattened_nodes.append(self.manifest.nodes[uid]) + elif uid in self.manifest.sources: + self._flattened_nodes.append(self.manifest.sources[uid]) + else: + raise InternalException( + f'Node selection returned {uid}, expected a node or a ' + f'source' + ) self.num_nodes = len([ n for n in self._flattened_nodes @@ -191,7 +200,7 @@ def _submit(self, pool, args, callback): """If the caller has passed the magic 'single-threaded' flag, call the function directly instead of pool.apply_async. 
The single-threaded flag is intended for gathering more useful performance information about - what appens beneath `call_runner`, since python's default profiling + what happens beneath `call_runner`, since python's default profiling tools ignore child threads. This does still go through the callback path for result collection. @@ -266,7 +275,10 @@ def _handle_result(self, result): if self.manifest is None: raise InternalException('manifest was None in _handle_result') - self.manifest.update_node(node) + if isinstance(node, ParsedSourceDefinition): + self.manifest.update_source(node) + else: + self.manifest.update_node(node) if result.error is not None: if is_ephemeral: diff --git a/core/dbt/task/serve.py b/core/dbt/task/serve.py index 42258eb83c2..4d0e2473498 100644 --- a/core/dbt/task/serve.py +++ b/core/dbt/task/serve.py @@ -31,10 +31,11 @@ def run(self): SimpleHTTPRequestHandler # type: ignore ) # type: ignore - try: - webbrowser.open_new_tab('http://127.0.0.1:{}'.format(port)) - except webbrowser.Error: - pass + if self.args.open_browser: + try: + webbrowser.open_new_tab(f'http://127.0.0.1:{port}') + except webbrowser.Error: + pass try: httpd.serve_forever() # blocks diff --git a/core/dbt/utils.py b/core/dbt/utils.py index bc58f662e92..cacd33edf21 100644 --- a/core/dbt/utils.py +++ b/core/dbt/utils.py @@ -11,7 +11,8 @@ from enum import Enum from typing_extensions import Protocol from typing import ( - Tuple, Type, Any, Optional, TypeVar, Dict, Union, Callable + Tuple, Type, Any, Optional, TypeVar, Dict, Union, Callable, List, Iterator, + Mapping, Iterable, AbstractSet, Set, Sequence ) import dbt.exceptions @@ -247,10 +248,6 @@ def __init__(self, *args, **kwargs): self.__dict__ = self -def is_enabled(node): - return node.config.enabled - - def get_pseudo_test_path(node_name, source_path, test_type): "schema tests all come from schema.yml files. fake a source sql file" source_path_parts = split_path(source_path) @@ -292,7 +289,7 @@ def __init__(self, func): self.cache = {} def __call__(self, *args): - if not isinstance(args, collections.Hashable): + if not isinstance(args, collections.abc.Hashable): # uncacheable. a list, for instance. # better to not cache than blow up. 
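The memoized cache guard now imports Hashable from collections.abc; the top-level collections.Hashable alias has been deprecated since Python 3.3 and was removed in Python 3.10, so only the abc location keeps working on newer interpreters. A quick check of the behavior the guard relies on:

import collections.abc

# tuples of hashable values make valid cache keys, lists do not
assert isinstance((1, 'a'), collections.abc.Hashable)
assert not isinstance([1, 'a'], collections.abc.Hashable)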
return self.func(*args) @@ -311,44 +308,44 @@ def __get__(self, obj, objtype): return functools.partial(self.__call__, obj) -def invalid_ref_test_message(node, target_model_name, target_model_package, - disabled): - if disabled: - msg = dbt.exceptions.get_target_disabled_msg( - node, target_model_name, target_model_package - ) - else: - msg = dbt.exceptions.get_target_not_found_msg( - node, target_model_name, target_model_package - ) - return 'WARNING: {}'.format(msg) - - def invalid_ref_fail_unless_test(node, target_model_name, target_model_package, disabled): if node.resource_type == NodeType.Test: - msg = invalid_ref_test_message(node, target_model_name, - target_model_package, disabled) + msg = dbt.exceptions.get_target_not_found_or_disabled_msg( + node, target_model_name, target_model_package, disabled + ) if disabled: - logger.debug(msg) + logger.debug(f'WARNING: {msg}') else: - dbt.exceptions.warn_or_error(msg) + dbt.exceptions.warn_or_error(msg, log_fmt='WARNING: {}') else: dbt.exceptions.ref_target_not_found( node, target_model_name, - target_model_package) + target_model_package, + disabled=disabled, + ) -def invalid_source_fail_unless_test(node, target_name, target_table_name): +def invalid_source_fail_unless_test( + node, target_name, target_table_name, disabled +): if node.resource_type == NodeType.Test: - msg = dbt.exceptions.source_disabled_message(node, target_name, - target_table_name) - dbt.exceptions.warn_or_error(msg, log_fmt='WARNING: {}') + msg = dbt.exceptions.get_source_not_found_or_disabled_msg( + node, target_name, target_table_name, disabled + ) + if disabled: + logger.debug(f'WARNING: {msg}') + else: + dbt.exceptions.warn_or_error(msg, log_fmt='WARNING: {}') else: - dbt.exceptions.source_target_not_found(node, target_name, - target_table_name) + dbt.exceptions.source_target_not_found( + node, + target_name, + target_table_name, + disabled=disabled + ) def parse_cli_vars(var_string: str) -> Dict[str, Any]: @@ -414,33 +411,61 @@ def default(self, obj): return str(obj) +class Translator: + def __init__(self, aliases: Mapping[str, str], recursive: bool = False): + self.aliases = aliases + self.recursive = recursive + + def translate_mapping( + self, kwargs: Mapping[str, Any] + ) -> Dict[str, Any]: + result: Dict[str, Any] = {} + + for key, value in kwargs.items(): + canonical_key = self.aliases.get(key, key) + if canonical_key in result: + dbt.exceptions.raise_duplicate_alias( + kwargs, self.aliases, canonical_key + ) + result[canonical_key] = self.translate_value(value) + return result + + def translate_sequence(self, value: Sequence[Any]) -> List[Any]: + return [self.translate_value(v) for v in value] + + def translate_value(self, value: Any) -> Any: + if self.recursive: + if isinstance(value, Mapping): + return self.translate_mapping(value) + elif isinstance(value, (list, tuple)): + return self.translate_sequence(value) + return value + + def translate(self, value: Mapping[str, Any]) -> Dict[str, Any]: + try: + return self.translate_mapping(value) + except RuntimeError as exc: + if 'maximum recursion depth exceeded' in str(exc): + raise dbt.exceptions.RecursionException( + 'Cycle detected in a value passed to translate!' + ) + raise + + def translate_aliases( - kwargs: Dict[str, Any], aliases: Dict[str, str] + kwargs: Dict[str, Any], aliases: Dict[str, str], recurse: bool = False, ) -> Dict[str, Any]: """Given a dict of keyword arguments and a dict mapping aliases to their canonical values, canonicalize the keys in the kwargs dict. 
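translate_aliases gains a recurse flag, implemented by the Translator class above: with recurse=True the alias mapping is applied to nested mappings and list items as well as to top-level keys, and keys that collapse onto the same canonical name still raise an error. A usage sketch, assuming a dbt installation that includes this change; the alias table here is illustrative only:

from dbt.utils import translate_aliases

aliases = {'pass': 'password', 'user': 'username'}  # illustrative aliases

# flat translation, as before
print(translate_aliases({'user': 'dbt', 'pass': 'secret'}, aliases))
# -> {'username': 'dbt', 'password': 'secret'}

# recursive translation also canonicalizes keys inside nested structures
nested = {'outputs': [{'user': 'dbt', 'pass': 'secret'}]}
print(translate_aliases(nested, aliases, recurse=True))
# -> {'outputs': [{'username': 'dbt', 'password': 'secret'}]}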
- :return: A dict continaing all the values in kwargs referenced by their + If recurse is True, perform this operation recursively. + + :return: A dict containing all the values in kwargs referenced by their canonical key. :raises: `AliasException`, if a canonical key is defined more than once. """ - result: Dict[str, Any] = {} - - for given_key, value in kwargs.items(): - canonical_key = aliases.get(given_key, given_key) - if canonical_key in result: - # dupe found: go through the dict so we can have a nice-ish error - key_names = ', '.join("{}".format(k) for k in kwargs if - aliases.get(k) == canonical_key) - - raise dbt.exceptions.AliasException( - 'Got duplicate keys: ({}) all map to "{}"' - .format(key_names, canonical_key) - ) - - result[canonical_key] = value - - return result + translator = Translator(aliases, recurse) + return translator.translate(kwargs) def _pluralize(string: Union[str, NodeType]) -> str: @@ -539,3 +564,70 @@ def executor(config: HasThreadingConfig) -> concurrent.futures.Executor: return concurrent.futures.ThreadPoolExecutor( max_workers=config.threads ) + + +def fqn_search( + root: Dict[str, Any], fqn: List[str] +) -> Iterator[Dict[str, Any]]: + """Iterate into a nested dictionary, looking for keys in the fqn as levels. + Yield the level config. + """ + yield root + + for level in fqn: + level_config = root.get(level, None) + if not isinstance(level_config, dict): + break + yield copy.deepcopy(level_config) + root = level_config + + +StringMap = Mapping[str, Any] +StringMapList = List[StringMap] +StringMapIter = Iterable[StringMap] + + +class MultiDict(Mapping[str, Any]): + """Implement the mapping protocol using a list of mappings. The most + recently added mapping "wins". + """ + def __init__(self, sources: Optional[StringMapList] = None) -> None: + super().__init__() + self.sources: StringMapList + + if sources is None: + self.sources = [] + else: + self.sources = sources + + def add_from(self, sources: StringMapIter): + self.sources.extend(sources) + + def add(self, source: StringMap): + self.sources.append(source) + + def _keyset(self) -> AbstractSet[str]: + # return the set of keys + keys: Set[str] = set() + for entry in self._itersource(): + keys.update(entry) + return keys + + def _itersource(self) -> StringMapIter: + return reversed(self.sources) + + def __iter__(self) -> Iterator[str]: + # we need to avoid duplicate keys + return iter(self._keyset()) + + def __len__(self): + return len(self._keyset()) + + def __getitem__(self, name: str) -> Any: + for entry in self._itersource(): + if name in entry: + return entry[name] + raise KeyError(name) + + def __contains__(self, name) -> bool: + return any((name in entry for entry in self._itersource())) diff --git a/core/setup.py b/core/setup.py index b24a6917cf3..7d482a664fc 100644 --- a/core/setup.py +++ b/core/setup.py @@ -87,4 +87,5 @@ def read(fname): 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', ], + python_requires=">=3.6.2", ) diff --git a/plugins/bigquery/dbt/adapters/bigquery/impl.py b/plugins/bigquery/dbt/adapters/bigquery/impl.py index 0ac599ffb8b..a11af60b72a 100644 --- a/plugins/bigquery/dbt/adapters/bigquery/impl.py +++ b/plugins/bigquery/dbt/adapters/bigquery/impl.py @@ -1,5 +1,5 @@ from dataclasses import dataclass -from typing import Dict, List, Optional, Any, Set +from typing import Dict, List, Optional, Any, Set, Union from hologram import JsonSchemaMixin, ValidationError import dbt.deprecations @@ -10,7 +10,7 @@ import dbt.links from dbt.adapters.base 
import ( - BaseAdapter, available, RelationType, SchemaSearchMap + BaseAdapter, available, RelationType, SchemaSearchMap, AdapterConfig ) from dbt.adapters.bigquery.relation import ( BigQueryRelation, BigQueryInformationSchema @@ -30,15 +30,6 @@ import time import agate -import re - - -BQ_INTEGER_RANGE_NOT_SUPPORTED = f""" -BigQuery integer range partitioning is only supported by the -`partition_by` config, which accepts a dictionary. - -See: {dbt.links.BigQueryNewPartitionBy} -""" @dataclass @@ -57,57 +48,17 @@ def render(self, alias: Optional[str] = None): else: return column - @classmethod - def _parse(cls, raw_partition_by) -> Optional['PartitionConfig']: - if isinstance(raw_partition_by, dict): - try: - return cls.from_dict(raw_partition_by) - except ValidationError as exc: - msg = dbt.exceptions.validator_error_message(exc) - dbt.exceptions.raise_compiler_error( - f'Could not parse partition config: {msg}' - ) - - elif isinstance(raw_partition_by, str): - raw_partition_by = raw_partition_by.strip() - if 'range_bucket' in raw_partition_by.lower(): - dbt.exceptions.raise_compiler_error( - BQ_INTEGER_RANGE_NOT_SUPPORTED - ) - - elif raw_partition_by.lower().startswith('date('): - matches = re.match(r'date\((.+)\)', raw_partition_by, - re.IGNORECASE) - if not matches: - dbt.exceptions.raise_compiler_error( - f"Specified partition_by '{raw_partition_by}' " - "is not parseable") - - partition_by = matches.group(1) - data_type = 'timestamp' - - else: - partition_by = raw_partition_by - data_type = 'date' - - inferred_partition_by = cls( - field=partition_by, - data_type=data_type - ) - - dbt.deprecations.warn( - 'bq-partition-by-string', - raw_partition_by=raw_partition_by, - inferred_partition_by=inferred_partition_by.to_dict() - ) - return inferred_partition_by - else: - return None - @classmethod def parse(cls, raw_partition_by) -> Optional['PartitionConfig']: + if raw_partition_by is None: + return None try: - return cls._parse(raw_partition_by) + return cls.from_dict(raw_partition_by) + except ValidationError as exc: + msg = dbt.exceptions.validator_error_message(exc) + dbt.exceptions.raise_compiler_error( + f'Could not parse partition config: {msg}' + ) except TypeError: dbt.exceptions.raise_compiler_error( f'Invalid partition_by config:\n' @@ -126,6 +77,15 @@ def _stub_relation(*args, **kwargs): ) +@dataclass +class BigqueryConfig(AdapterConfig): + cluster_by: Optional[Union[List[str], str]] = None + partition_by: Optional[Dict[str, Any]] = None + kms_key_name: Optional[str] = None + labels: Optional[Dict[str, str]] = None + partitions: Optional[List[str]] = None + + class BigQueryAdapter(BaseAdapter): RELATION_TYPES = { @@ -138,9 +98,7 @@ class BigQueryAdapter(BaseAdapter): Column = BigQueryColumn ConnectionManager = BigQueryConnectionManager - AdapterSpecificConfigs = frozenset({ - "cluster_by", "partition_by", "kms_key_name", "labels", "partitions" - }) + AdapterSpecificConfigs = BigqueryConfig ### # Implementations of abstract methods diff --git a/plugins/bigquery/dbt/include/bigquery/dbt_project.yml b/plugins/bigquery/dbt/include/bigquery/dbt_project.yml index edae5386994..b4e88b7b0a4 100644 --- a/plugins/bigquery/dbt/include/bigquery/dbt_project.yml +++ b/plugins/bigquery/dbt/include/bigquery/dbt_project.yml @@ -1,4 +1,4 @@ - +config-version: 2 name: dbt_bigquery version: 1.0 diff --git a/plugins/bigquery/dbt/include/bigquery/macros/adapters.sql b/plugins/bigquery/dbt/include/bigquery/macros/adapters.sql index a7530d43b2e..a318257e74b 100644 --- 
a/plugins/bigquery/dbt/include/bigquery/macros/adapters.sql +++ b/plugins/bigquery/dbt/include/bigquery/macros/adapters.sql @@ -61,7 +61,7 @@ {%- set raw_cluster_by = config.get('cluster_by', none) -%} {%- set raw_persist_docs = config.get('persist_docs', {}) -%} {%- set raw_kms_key_name = config.get('kms_key_name', none) -%} - {%- set raw_labels = config.get('labels', []) -%} + {%- set raw_labels = config.get('labels', {}) -%} {%- set sql_header = config.get('sql_header', none) -%} {%- set partition_config = adapter.parse_partition_by(raw_partition_by) -%} diff --git a/plugins/bigquery/dbt/include/bigquery/macros/catalog.sql b/plugins/bigquery/dbt/include/bigquery/macros/catalog.sql index d41b03604f4..ed64af88173 100644 --- a/plugins/bigquery/dbt/include/bigquery/macros/catalog.sql +++ b/plugins/bigquery/dbt/include/bigquery/macros/catalog.sql @@ -7,22 +7,7 @@ {%- else -%} {%- set query -%} - with schemas as ( - - select - catalog_name as table_database, - schema_name as table_schema, - location - - from {{ information_schema.replace(information_schema_view='SCHEMATA') }} - where ( - {%- for schema in schemas -%} - upper(schema_name) = upper('{{ schema }}'){%- if not loop.last %} or {% endif -%} - {%- endfor -%} - ) - ), - - tables as ( + with tables as ( select project_id as table_database, dataset_id as table_schema, @@ -43,7 +28,11 @@ REGEXP_EXTRACT(table_id, '^.+([0-9]{8})$') as shard_name from {{ information_schema.replace(information_schema_view='__TABLES__') }} - + where ( + {%- for schema in schemas -%} + upper(dataset_id) = upper('{{ schema }}'){%- if not loop.last %} or {% endif -%} + {%- endfor -%} + ) ), extracted as ( @@ -171,11 +160,6 @@ coalesce(columns.column_type, '') as column_type, columns.column_comment, - 'Location' as `stats__location__label`, - location as `stats__location__value`, - 'The geographic location of this table' as `stats__location__description`, - location is not null as `stats__location__include`, - 'Shard count' as `stats__date_shards__label`, table_shards.shard_count as `stats__date_shards__value`, 'The number of date shards in this table' as `stats__date_shards__description`, @@ -215,7 +199,6 @@ -- sure that column metadata is picked up through the join. This will only -- return the column information for the "max" table in a date-sharded table set from unsharded_tables - left join schemas using(table_database, table_schema) left join columns using (relation_id) left join column_stats using (relation_id) {%- endset -%} diff --git a/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql b/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql index 18a0c0bc350..6aabf8d7521 100644 --- a/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql +++ b/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql @@ -34,7 +34,7 @@ ) {%- endset -%} - {{ get_insert_overwrite_merge_sql(target_relation, source_sql, dest_columns, [predicate]) }} + {{ get_insert_overwrite_merge_sql(target_relation, source_sql, dest_columns, [predicate], include_sql_header=true) }} {% else %} {# dynamic #} @@ -66,8 +66,12 @@ from {{ tmp_relation }} ); + {# + TODO: include_sql_header is a hack; consider a better approach that includes + the sql_header at the materialization-level instead + #} -- 3. 
run the merge statement - {{ get_insert_overwrite_merge_sql(target_relation, source_sql, dest_columns, [predicate]) }}; + {{ get_insert_overwrite_merge_sql(target_relation, source_sql, dest_columns, [predicate], include_sql_header=false) }}; -- 4. clean up the temp table drop table if exists {{ tmp_relation }} diff --git a/plugins/bigquery/setup.py b/plugins/bigquery/setup.py index c5a9ad14c0f..41632b059bd 100644 --- a/plugins/bigquery/setup.py +++ b/plugins/bigquery/setup.py @@ -60,4 +60,5 @@ 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', ], + python_requires=">=3.6.2", ) diff --git a/plugins/postgres/dbt/adapters/postgres/impl.py b/plugins/postgres/dbt/adapters/postgres/impl.py index 9a8e897f31e..7f7fd1d78dd 100644 --- a/plugins/postgres/dbt/adapters/postgres/impl.py +++ b/plugins/postgres/dbt/adapters/postgres/impl.py @@ -1,4 +1,7 @@ +from dataclasses import dataclass +from typing import Optional from dbt.adapters.base.meta import available +from dbt.adapters.base.impl import AdapterConfig from dbt.adapters.sql import SQLAdapter from dbt.adapters.postgres import PostgresConnectionManager from dbt.adapters.postgres import PostgresColumn @@ -9,11 +12,16 @@ GET_RELATIONS_MACRO_NAME = 'postgres_get_relations' +@dataclass +class PostgresConfig(AdapterConfig): + unlogged: Optional[bool] = None + + class PostgresAdapter(SQLAdapter): ConnectionManager = PostgresConnectionManager Column = PostgresColumn - AdapterSpecificConfigs = frozenset({'unlogged'}) + AdapterSpecificConfigs = PostgresConfig @classmethod def date_function(cls): diff --git a/plugins/postgres/dbt/include/postgres/dbt_project.yml b/plugins/postgres/dbt/include/postgres/dbt_project.yml index 266eba33db9..081149f6fd7 100644 --- a/plugins/postgres/dbt/include/postgres/dbt_project.yml +++ b/plugins/postgres/dbt/include/postgres/dbt_project.yml @@ -1,4 +1,4 @@ - +config-version: 2 name: dbt_postgres version: 1.0 diff --git a/plugins/postgres/setup.py b/plugins/postgres/setup.py index c3716e31be8..38ab425646d 100644 --- a/plugins/postgres/setup.py +++ b/plugins/postgres/setup.py @@ -79,4 +79,5 @@ def _dbt_psycopg2_name(): 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', ], + python_requires=">=3.6.2", ) diff --git a/plugins/redshift/dbt/adapters/redshift/impl.py b/plugins/redshift/dbt/adapters/redshift/impl.py index 51deda501f8..da8588df3f2 100644 --- a/plugins/redshift/dbt/adapters/redshift/impl.py +++ b/plugins/redshift/dbt/adapters/redshift/impl.py @@ -1,14 +1,25 @@ +from dataclasses import dataclass +from typing import Optional +from dbt.adapters.base.impl import AdapterConfig from dbt.adapters.postgres import PostgresAdapter from dbt.adapters.redshift import RedshiftConnectionManager from dbt.adapters.redshift import RedshiftColumn from dbt.logger import GLOBAL_LOGGER as logger # noqa +@dataclass +class RedshiftConfig(AdapterConfig): + sort_type: Optional[str] = None + dist: Optional[str] = None + sort: Optional[str] = None + bind: Optional[bool] = None + + class RedshiftAdapter(PostgresAdapter): ConnectionManager = RedshiftConnectionManager Column = RedshiftColumn - AdapterSpecificConfigs = frozenset({"sort_type", "dist", "sort", "bind"}) + AdapterSpecificConfigs = RedshiftConfig @classmethod def date_function(cls): diff --git a/plugins/redshift/dbt/include/redshift/dbt_project.yml b/plugins/redshift/dbt/include/redshift/dbt_project.yml index edcd805ab7a..1efdab2c1b0 100644 --- a/plugins/redshift/dbt/include/redshift/dbt_project.yml +++ 
b/plugins/redshift/dbt/include/redshift/dbt_project.yml @@ -1,4 +1,4 @@ - +config-version: 2 name: dbt_redshift version: 1.0 diff --git a/plugins/redshift/setup.py b/plugins/redshift/setup.py index 6a0a0494028..e3788a944d0 100644 --- a/plugins/redshift/setup.py +++ b/plugins/redshift/setup.py @@ -59,4 +59,5 @@ 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', ], + python_requires=">=3.6.2", ) diff --git a/plugins/snowflake/dbt/adapters/snowflake/impl.py b/plugins/snowflake/dbt/adapters/snowflake/impl.py index 6c726eadc6c..614326bdf99 100644 --- a/plugins/snowflake/dbt/adapters/snowflake/impl.py +++ b/plugins/snowflake/dbt/adapters/snowflake/impl.py @@ -1,7 +1,9 @@ -from typing import Mapping, Any, Optional, List +from dataclasses import dataclass +from typing import Mapping, Any, Optional, List, Union import agate +from dbt.adapters.base.impl import AdapterConfig from dbt.adapters.sql import SQLAdapter from dbt.adapters.sql.impl import ( LIST_SCHEMAS_MACRO_NAME, @@ -15,15 +17,22 @@ from dbt.utils import filter_null_values +@dataclass +class SnowflakeConfig(AdapterConfig): + transient: Optional[bool] = None + cluster_by: Optional[Union[str, List[str]]] = None + automatic_clustering: Optional[bool] = None + secure: Optional[bool] = None + copy_grants: Optional[bool] = None + snowflake_warehouse: Optional[str] = None + + class SnowflakeAdapter(SQLAdapter): Relation = SnowflakeRelation Column = SnowflakeColumn ConnectionManager = SnowflakeConnectionManager - AdapterSpecificConfigs = frozenset( - {"transient", "cluster_by", "automatic_clustering", "secure", - "copy_grants", "snowflake_warehouse"} - ) + AdapterSpecificConfigs = SnowflakeConfig @classmethod def date_function(cls): diff --git a/plugins/snowflake/dbt/include/snowflake/dbt_project.yml b/plugins/snowflake/dbt/include/snowflake/dbt_project.yml index 587a22b5232..fcd2c9a4822 100644 --- a/plugins/snowflake/dbt/include/snowflake/dbt_project.yml +++ b/plugins/snowflake/dbt/include/snowflake/dbt_project.yml @@ -1,4 +1,4 @@ - +config-version: 2 name: dbt_snowflake version: 1.0 diff --git a/plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql b/plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql index e3a5d5cd085..0c48eb9493a 100644 --- a/plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql +++ b/plugins/snowflake/dbt/include/snowflake/macros/materializations/merge.sql @@ -7,9 +7,12 @@ #} {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute='name')) -%} + {%- set sql_header = config.get('sql_header', none) -%} {%- if unique_key is none -%} + {{ sql_header if sql_header is not none }} + insert into {{ target }} ({{ dest_cols_csv }}) ( select {{ dest_cols_csv }} diff --git a/plugins/snowflake/setup.py b/plugins/snowflake/setup.py index baeb82030d2..da80e804044 100644 --- a/plugins/snowflake/setup.py +++ b/plugins/snowflake/setup.py @@ -61,4 +61,5 @@ 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', ], + python_requires=">=3.6.2", ) diff --git a/setup.py b/setup.py index e92788f9888..33c7d4842b8 100644 --- a/setup.py +++ b/setup.py @@ -56,4 +56,5 @@ 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', ], + python_requires=">=3.6.2", ) diff --git a/test/integration/001_simple_copy_test/test_simple_copy.py b/test/integration/001_simple_copy_test/test_simple_copy.py index 4866eb79a6a..0705fbba93f 100644 --- a/test/integration/001_simple_copy_test/test_simple_copy.py +++ 
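Each adapter plugin above swaps its frozenset of `AdapterSpecificConfigs` for a typed `AdapterConfig` dataclass. Roughly, a third-party adapter would follow the same pattern; the `MyAdapter` names and the `distribute_by` option below are hypothetical and exist only to illustrate the shape of the change.

```python
from dataclasses import dataclass
from typing import Optional

from dbt.adapters.base.impl import AdapterConfig
from dbt.adapters.sql import SQLAdapter


@dataclass
class MyAdapterConfig(AdapterConfig):
    # previously: AdapterSpecificConfigs = frozenset({"distribute_by"})
    distribute_by: Optional[str] = None


class MyAdapter(SQLAdapter):
    # ConnectionManager, Column, and the abstract methods are omitted here;
    # only the config declaration changes in this pattern.
    AdapterSpecificConfigs = MyAdapterConfig
```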
b/test/integration/001_simple_copy_test/test_simple_copy.py @@ -26,6 +26,7 @@ def project_config(self): def seed_quote_cfg_with(self, extra): cfg = { + 'config-version': 2, 'seeds': { 'quote_columns': False, } @@ -322,7 +323,9 @@ def test__snowflake__incremental_overwrite(self): # Setting the incremental_strategy should make this succeed self.use_default_project({ - "models": {"incremental_strategy": "delete+insert"}, + "models": { + "incremental_strategy": "delete+insert" + }, "data-paths": [self.dir("snowflake-seed-update")], }) @@ -388,7 +391,7 @@ def postgres_profile(self): @property def project_config(self): - return {} + return {'config-version': 2} @use_profile('postgres') def test_postgres_run_mixed_case(self): diff --git a/test/integration/003_simple_reference_test/test_simple_reference.py b/test/integration/003_simple_reference_test/test_simple_reference.py index b2a4aae3efc..9db7cc5d1ba 100644 --- a/test/integration/003_simple_reference_test/test_simple_reference.py +++ b/test/integration/003_simple_reference_test/test_simple_reference.py @@ -17,11 +17,12 @@ def models(self): @property def project_config(self): return { - 'models': { - 'vars': { + 'config-version': 2, + 'vars': { + 'test': { 'var_ref': '{{ ref("view_copy") }}', - } - } + }, + }, } def setUp(self): diff --git a/test/integration/004_simple_snapshot_test/test_simple_snapshot.py b/test/integration/004_simple_snapshot_test/test_simple_snapshot.py index 1b86de9e21a..290f38d9a44 100644 --- a/test/integration/004_simple_snapshot_test/test_simple_snapshot.py +++ b/test/integration/004_simple_snapshot_test/test_simple_snapshot.py @@ -41,6 +41,7 @@ class TestSimpleSnapshotFiles(BaseSimpleSnapshotTest): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], "snapshot-paths": ['test-snapshots-pg'], 'macro-paths': ['macros'], @@ -108,6 +109,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'macro-paths': ['custom-snapshot-macros', 'macros'], 'snapshot-paths': ['test-snapshots-checkall'], @@ -172,6 +174,7 @@ class TestCustomSnapshotFiles(BaseSimpleSnapshotTest): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'macro-paths': ['custom-snapshot-macros', 'macros'], 'snapshot-paths': ['test-snapshots-pg-custom'], @@ -202,6 +205,7 @@ class TestNamespacedCustomSnapshotFiles(BaseSimpleSnapshotTest): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'macro-paths': ['custom-snapshot-macros', 'macros'], 'snapshot-paths': ['test-snapshots-pg-custom-namespaced'], @@ -226,6 +230,7 @@ class TestInvalidNamespacedCustomSnapshotFiles(BaseSimpleSnapshotTest): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'macro-paths': ['custom-snapshot-macros', 'macros'], 'snapshot-paths': ['test-snapshots-pg-custom-invalid'], @@ -251,6 +256,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], "snapshot-paths": ['test-snapshots-select', 'test-snapshots-pg'], @@ -303,6 +309,7 @@ class TestConfiguredSnapshotFileSelects(TestSimpleSnapshotFileSelects): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], "snapshot-paths": ['test-snapshots-select-noconfig'], "snapshots": { @@ -311,7 +318,7 @@ def project_config(self): "unique_key": "id || '-' || first_name", 'strategy': 'timestamp', 'updated_at': 'updated_at', - } + }, }, 'macro-paths': 
['macros'], } @@ -329,6 +336,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['test-snapshots-bq'], 'macro-paths': ['macros'], } @@ -413,6 +421,7 @@ def project_config(self): else: paths = ['test-snapshots-bq'] return { + 'config-version': 2, 'snapshot-paths': paths, 'macro-paths': ['macros'], } @@ -468,6 +477,7 @@ def models(self): def project_config(self): paths = ['test-snapshots-pg'] return { + 'config-version': 2, 'snapshot-paths': paths, 'macro-paths': ['macros'], } @@ -501,6 +511,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['test-snapshots-invalid'], 'macro-paths': ['macros'], } @@ -530,6 +541,7 @@ def assert_expected(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], "snapshot-paths": ['test-check-col-snapshots'], 'macro-paths': ['macros'], @@ -540,6 +552,7 @@ class TestConfiguredCheckCols(TestCheckCols): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], "snapshot-paths": ['test-check-col-snapshots-noconfig'], "snapshots": { @@ -548,7 +561,7 @@ def project_config(self): "unique_key": "id || '-' || first_name", "strategy": "check", "check_cols": ["email"], - } + }, }, 'macro-paths': ['macros'], } @@ -569,6 +582,7 @@ def assert_expected(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], "snapshot-paths": ['test-check-col-snapshots-bq'], 'macro-paths': ['macros'], @@ -644,6 +658,7 @@ def run_snapshot(self): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['test-snapshots-longtext'], 'macro-paths': ['macros'], } diff --git a/test/integration/004_simple_snapshot_test/test_snapshot_check_cols.py b/test/integration/004_simple_snapshot_test/test_snapshot_check_cols.py index 0416ed97eff..fc07ae0df96 100644 --- a/test/integration/004_simple_snapshot_test/test_snapshot_check_cols.py +++ b/test/integration/004_simple_snapshot_test/test_snapshot_check_cols.py @@ -16,6 +16,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['check-snapshots'], "test-paths": ['check-snapshots-expected'], "source-paths": [], diff --git a/test/integration/005_simple_seed_test/test_seed_type_override.py b/test/integration/005_simple_seed_test/test_seed_type_override.py index a1d1b288820..a6cfd372f4a 100644 --- a/test/integration/005_simple_seed_test/test_seed_type_override.py +++ b/test/integration/005_simple_seed_test/test_seed_type_override.py @@ -10,22 +10,23 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data-config'], 'macro-paths': ['macros'], 'seeds': { 'test': { 'enabled': False, + 'quote_columns': True, 'seed_enabled': { 'enabled': True, - 'column_types': self.seed_enabled_types() + '+column_types': self.seed_enabled_types() }, 'seed_tricky': { 'enabled': True, - 'column_types': self.seed_tricky_types(), - } + '+column_types': self.seed_tricky_types(), + }, }, - 'quote_columns': True, - } + }, } diff --git a/test/integration/005_simple_seed_test/test_simple_seed.py b/test/integration/005_simple_seed_test/test_simple_seed.py index 65fe7287f67..f1c3cd67817 100644 --- a/test/integration/005_simple_seed_test/test_simple_seed.py +++ b/test/integration/005_simple_seed_test/test_simple_seed.py @@ -21,6 +21,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 
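The seed-type-override test above also switches `column_types` to `+column_types`: under `config-version: 2`, configs nested beneath resource paths may be written with a leading `+` to distinguish them from folder names. A small sketch in the same Python-dict style the tests use; the column types shown are hypothetical.

```python
project_config = {
    "config-version": 2,
    "data-paths": ["data-config"],
    "seeds": {
        "test": {
            "enabled": False,
            "quote_columns": True,
            "seed_enabled": {
                "enabled": True,
                # '+' marks this as a config rather than a subdirectory name
                "+column_types": {"id": "text"},  # hypothetical types
            },
        },
    },
}
```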
"data-paths": ['data'], 'seeds': { 'quote_columns': False, @@ -67,11 +68,12 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data'], 'seeds': { "schema": "custom_schema", 'quote_columns': False, - } + }, } @use_profile('postgres') @@ -115,6 +117,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data-config'], 'seeds': { "test": { @@ -126,7 +129,7 @@ def project_config(self): } }, 'quote_columns': False, - } + }, } @use_profile('postgres') @@ -169,10 +172,11 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data-bad'], 'seeds': { 'quote_columns': False, - } + }, } @use_profile('postgres') @@ -201,10 +205,11 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data-bom'], 'seeds': { 'quote_columns': False, - } + }, } @use_profile('postgres') @@ -232,6 +237,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": ['data-unicode'], 'seeds': { 'quote_columns': False, diff --git a/test/integration/006_simple_dependency_test/test_local_dependency.py b/test/integration/006_simple_dependency_test/test_local_dependency.py index 1290217c912..62a103bd71f 100644 --- a/test/integration/006_simple_dependency_test/test_local_dependency.py +++ b/test/integration/006_simple_dependency_test/test_local_dependency.py @@ -35,6 +35,11 @@ def packages_config(self): ] } + def run_dbt(self, *args, **kwargs): + strict = kwargs.pop('strict', False) + kwargs['strict'] = strict + return super().run_dbt(*args, **kwargs) + class TestSimpleDependency(BaseDependencyTest): @@ -76,7 +81,7 @@ def test_postgres_no_dependency_paths(self): # this should work local_path = os.path.join('local_models', 'my_model.sql') results = self.run_dbt( - ['run', '--models', f'+{local_path}'] + ['run', '--models', f'+{local_path}'], ) # should run the dependency and my_model self.assertEqual(len(results), 2) @@ -103,7 +108,7 @@ def models(self): def test_postgres_missing_dependency(self): # dbt should raise a dbt exception, not raise a parse-time TypeError. 
with self.assertRaises(dbt.exceptions.Exception) as exc: - self.run_dbt(['compile']) + self.run_dbt(['compile'], strict=False) message = str(exc.exception) self.assertIn('no_such_dependency', message) self.assertIn('is undefined', message) @@ -121,13 +126,14 @@ def run_dbt(self, cmd, *args, **kwargs): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['schema_override_macros'], 'models': { 'schema': 'dbt_test', }, 'seeds': { 'schema': 'dbt_test', - }, + } } def base_schema(self): @@ -173,6 +179,7 @@ def models(self): def project_config(self): # these hooks should run first, so nothing to drop return { + 'config-version': 2, 'on-run-start': [ "drop table if exists {{ var('test_create_table') }}", "drop table if exists {{ var('test_create_second_table') }}", @@ -229,6 +236,11 @@ def packages_config(self): ] } + def run_dbt(self, *args, **kwargs): + strict = kwargs.pop('strict', False) + kwargs['strict'] = strict + return super().run_dbt(*args, **kwargs) + @use_profile('postgres') def test_postgres_local_dependency_same_name(self): with self.assertRaises(dbt.exceptions.DependencyException): diff --git a/test/integration/006_simple_dependency_test/test_simple_dependency.py b/test/integration/006_simple_dependency_test/test_simple_dependency.py index a7e64821d28..cf525968593 100644 --- a/test/integration/006_simple_dependency_test/test_simple_dependency.py +++ b/test/integration/006_simple_dependency_test/test_simple_dependency.py @@ -30,6 +30,12 @@ def packages_config(self): ] } + def run_dbt(self, cmd=None, *args, **kwargs): + if cmd and cmd[0] != 'deps': + strict = kwargs.pop('strict', False) + kwargs['strict'] = strict + return super().run_dbt(cmd, *args, **kwargs) + def run_deps(self): return self.run_dbt(["deps"]) @@ -194,7 +200,7 @@ def packages_config(self): def deps_run_assert_equality(self): self.run_dbt(["deps"]) - results = self.run_dbt(["run"]) + results = self.run_dbt(["run"], strict=False) self.assertEqual(len(results), 4) self.assertTablesEqual("seed","table_model") diff --git a/test/integration/006_simple_dependency_test/test_simple_dependency_with_configs.py b/test/integration/006_simple_dependency_test/test_simple_dependency_with_configs.py index c9686b4165e..ca05c30109a 100644 --- a/test/integration/006_simple_dependency_test/test_simple_dependency_with_configs.py +++ b/test/integration/006_simple_dependency_test/test_simple_dependency_with_configs.py @@ -1,5 +1,6 @@ from test.integration.base import DBTIntegrationTest, use_profile + class BaseTestSimpleDependencyWithConfigs(DBTIntegrationTest): def setUp(self): @@ -14,6 +15,11 @@ def schema(self): def models(self): return "models" + def run_dbt(self, *args, **kwargs): + strict = kwargs.pop('strict', False) + kwargs['strict'] = strict + return super().run_dbt(*args, **kwargs) + class TestSimpleDependencyWithConfigs(BaseTestSimpleDependencyWithConfigs): @property def packages_config(self): @@ -29,13 +35,11 @@ def packages_config(self): @property def project_config(self): return { - "models": { - "dbt_integration_project": { - 'vars': { - 'bool_config': True - } - } - + 'config-version': 2, + 'vars': { + 'dbt_integration_project': { + 'bool_config': True + }, }, } @@ -67,21 +71,17 @@ def packages_config(self): @property def project_config(self): return { - "models": { + 'config-version': 2, + "vars": { # project-level configs "dbt_integration_project": { - "vars": { - "config_1": "abc", - "config_2": "def", - "bool_config": True - - } - } - + "config_1": "abc", + "config_2": "def", + 
"bool_config": True + }, }, } - @use_profile('postgres') def test_postgres_simple_dependency(self): self.run_dbt(["deps"]) @@ -94,7 +94,6 @@ def test_postgres_simple_dependency(self): self.assertTablesEqual("seed","incremental") - class TestSimpleDependencyWithModelSpecificOverriddenConfigs(BaseTestSimpleDependencyWithConfigs): @property @@ -110,7 +109,9 @@ def packages_config(self): @property def project_config(self): + # This feature doesn't exist in v2! return { + 'config-version': 1, "models": { "dbt_integration_project": { "config": { @@ -156,6 +157,7 @@ def packages_config(self): @property def project_config(self): return { + 'config-version': 1, "models": { "dbt_integration_project": { # disable config model, but supply vars @@ -181,7 +183,6 @@ def project_config(self): }, } - @use_profile('postgres') def test_postgres_simple_dependency(self): self.run_dbt(["deps"]) diff --git a/test/integration/007_graph_selection_tests/test_tag_selection.py b/test/integration/007_graph_selection_tests/test_tag_selection.py index d322d4032cc..1925e281aaa 100644 --- a/test/integration/007_graph_selection_tests/test_tag_selection.py +++ b/test/integration/007_graph_selection_tests/test_tag_selection.py @@ -1,5 +1,6 @@ from test.integration.base import DBTIntegrationTest, use_profile + class TestGraphSelection(DBTIntegrationTest): @property @@ -13,14 +14,14 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "models": { "test": { "users": { "tags": "specified_as_string" }, - "users_rollup": { - "tags": ["specified_in_project"] + "tags": ["specified_in_project"], } } } diff --git a/test/integration/009_data_tests_test/test_data_tests.py b/test/integration/009_data_tests_test/test_data_tests.py index 23e4271b905..c255c11932d 100644 --- a/test/integration/009_data_tests_test/test_data_tests.py +++ b/test/integration/009_data_tests_test/test_data_tests.py @@ -11,6 +11,7 @@ class TestDataTests(DBTIntegrationTest): @property def project_config(self): return { + 'config-version': 2, "test-paths": [self.test_path] } diff --git a/test/integration/011_invalid_model_tests/models-not-found/dependent.sql b/test/integration/011_invalid_model_tests/models-not-found/dependent.sql new file mode 100644 index 00000000000..3f871f8ffb6 --- /dev/null +++ b/test/integration/011_invalid_model_tests/models-not-found/dependent.sql @@ -0,0 +1,2 @@ +-- view does not exist +select * from {{ ref('view_model') }} diff --git a/test/integration/011_invalid_model_tests/sources-disabled/model.sql b/test/integration/011_invalid_model_tests/sources-disabled/model.sql new file mode 100644 index 00000000000..55bbcba67b4 --- /dev/null +++ b/test/integration/011_invalid_model_tests/sources-disabled/model.sql @@ -0,0 +1 @@ +select * from {{ source('test_source', 'test_table') }} diff --git a/test/integration/011_invalid_model_tests/sources-disabled/sources.yml b/test/integration/011_invalid_model_tests/sources-disabled/sources.yml new file mode 100644 index 00000000000..b6755f4bb2c --- /dev/null +++ b/test/integration/011_invalid_model_tests/sources-disabled/sources.yml @@ -0,0 +1,7 @@ +version: 2 +sources: + - name: test_source + schema: "{{ target.schema }}" + tables: + - name: test_table + identifier: seed diff --git a/test/integration/011_invalid_model_tests/sources-missing/model.sql b/test/integration/011_invalid_model_tests/sources-missing/model.sql new file mode 100644 index 00000000000..55bbcba67b4 --- /dev/null +++ b/test/integration/011_invalid_model_tests/sources-missing/model.sql @@ -0,0 +1 @@ 
+select * from {{ source('test_source', 'test_table') }} diff --git a/test/integration/011_invalid_model_tests/test_invalid_models.py b/test/integration/011_invalid_model_tests/test_invalid_models.py index cc1bc93404c..3170e6be509 100644 --- a/test/integration/011_invalid_model_tests/test_invalid_models.py +++ b/test/integration/011_invalid_model_tests/test_invalid_models.py @@ -25,7 +25,7 @@ def test_postgres_view_with_incremental_attributes(self): self.assertIn('enabled', str(exc.exception)) -class TestInvalidModelReference(DBTIntegrationTest): +class TestDisabledModelReference(DBTIntegrationTest): def setUp(self): DBTIntegrationTest.setUp(self) @@ -40,6 +40,23 @@ def schema(self): def models(self): return "models-3" + @use_profile('postgres') + def test_postgres_view_with_incremental_attributes(self): + with self.assertRaises(RuntimeError) as exc: + self.run_dbt() + + self.assertIn('which is disabled', str(exc.exception)) + + +class TestMissingModelReference(DBTIntegrationTest): + @property + def schema(self): + return "invalid_models_011" + + @property + def models(self): + return "models-not-found" + @use_profile('postgres') def test_postgres_view_with_incremental_attributes(self): with self.assertRaises(RuntimeError) as exc: @@ -64,6 +81,7 @@ def dir(path): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': [self.dir('bad-macros')], } @@ -72,9 +90,61 @@ def test_postgres_call_invalid(self): with self.assertRaises(Exception) as exc: self.run_dbt(['compile']) - macro_path = os.path.join('bad-macros', 'macros.sql') model_path = os.path.join('models-4', 'bad_macro.sql') self.assertIn(f'> in macro some_macro ({macro_path})', str(exc.exception)) self.assertIn(f'> called by model bad_macro ({model_path})', str(exc.exception)) + + +class TestInvalidDisabledSource(DBTIntegrationTest): + def setUp(self): + super().setUp() + self.run_sql_file("seed.sql") + + @property + def schema(self): + return "invalid_models_011" + + @property + def models(self): + return 'sources-disabled' + + @property + def project_config(self): + return { + 'config-version': 2, + 'sources': { + 'test': { + 'enabled': False, + } + } + } + + @use_profile('postgres') + def test_postgres_source_disabled(self): + with self.assertRaises(RuntimeError) as exc: + self.run_dbt() + + self.assertIn('which is disabled', str(exc.exception)) + + +class TestInvalidMissingSource(DBTIntegrationTest): + def setUp(self): + super().setUp() + self.run_sql_file("seed.sql") + + @property + def schema(self): + return "invalid_models_011" + + @property + def models(self): + return 'sources-missing' + + @use_profile('postgres') + def test_postgres_source_missing(self): + with self.assertRaises(RuntimeError) as exc: + self.run_dbt() + + self.assertIn('which was not found', str(exc.exception)) diff --git a/test/integration/012_deprecation_tests/bq-partitioned-models/clustered_model.sql b/test/integration/012_deprecation_tests/bq-partitioned-models/clustered_model.sql deleted file mode 100644 index e2caada291a..00000000000 --- a/test/integration/012_deprecation_tests/bq-partitioned-models/clustered_model.sql +++ /dev/null @@ -1,16 +0,0 @@ - -{{ - config( - materialized = "table", - partition_by = "updated_at_date", - cluster_by = "dupe", - ) -}} - -select - - id, - current_date as updated_at_date, - dupe - -from {{ ref('data_seed') }} diff --git a/test/integration/012_deprecation_tests/bq-partitioned-models/multi_clustered_model.sql 
b/test/integration/012_deprecation_tests/bq-partitioned-models/multi_clustered_model.sql deleted file mode 100644 index 5cbd012a272..00000000000 --- a/test/integration/012_deprecation_tests/bq-partitioned-models/multi_clustered_model.sql +++ /dev/null @@ -1,16 +0,0 @@ - -{{ - config( - materialized = "table", - partition_by = "updated_at_date", - cluster_by = ["dupe","id"], - ) -}} - -select - - id, - current_date as updated_at_date, - dupe - -from {{ ref('data_seed') }} diff --git a/test/integration/012_deprecation_tests/bq-partitioned-models/partitioned_model.sql b/test/integration/012_deprecation_tests/bq-partitioned-models/partitioned_model.sql deleted file mode 100644 index 12087fac76a..00000000000 --- a/test/integration/012_deprecation_tests/bq-partitioned-models/partitioned_model.sql +++ /dev/null @@ -1,15 +0,0 @@ - -{{ - config( - materialized = "table", - partition_by = "updated_at_date", - ) -}} - -select - - id, - current_date as updated_at_date, - dupe - -from {{ ref('data_seed') }} diff --git a/test/integration/012_deprecation_tests/bq-partitioned-models/partitioned_ts_model.sql b/test/integration/012_deprecation_tests/bq-partitioned-models/partitioned_ts_model.sql deleted file mode 100644 index 5696b231b9a..00000000000 --- a/test/integration/012_deprecation_tests/bq-partitioned-models/partitioned_ts_model.sql +++ /dev/null @@ -1,15 +0,0 @@ - -{{ - config( - materialized = "table", - partition_by = "date(updated_at_ts)", - ) -}} - -select - - id, - current_timestamp as updated_at_ts, - dupe - -from {{ ref('data_seed') }} diff --git a/test/integration/012_deprecation_tests/test_deprecations.py b/test/integration/012_deprecation_tests/test_deprecations.py index b6d21c9cc45..e11e9866922 100644 --- a/test/integration/012_deprecation_tests/test_deprecations.py +++ b/test/integration/012_deprecation_tests/test_deprecations.py @@ -17,12 +17,12 @@ def schema(self): def dir(path): return path.lstrip("/") + +class TestDeprecations(BaseTestDeprecations): @property def models(self): return self.dir("models") - -class TestDeprecations(BaseTestDeprecations): @use_profile('postgres') def test_postgres_deprecations_fail(self): self.run_dbt(strict=True, expect_pass=False) @@ -43,6 +43,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': [self.dir('custom-materialization-macros')], } @@ -77,22 +78,30 @@ def test_postgres_deprecations(self): self.assertEqual(deprecations.active_deprecations, set()) self.run_dbt(strict=False) expected = {'models-key-mismatch'} + self.assertEqual(expected, deprecations.active_deprecations) -class TestBQPartitionByDeprecation(BaseTestDeprecations): +class TestDbtProjectYamlV1Deprecation(BaseTestDeprecations): @property def models(self): - return self.dir('bq-partitioned-models') - - @use_profile('bigquery') - def test_bigquery_partition_by_fail(self): - self.run_dbt(['seed']) - self.run_dbt(strict=True, expect_pass=False) + return 'boring-models' + + @property + def project_config(self): + # No config-version set! 
+ return {} + + @use_profile('postgres') + def test_postgres_project_deprecations_fail(self): + with self.assertRaises(dbt.exceptions.CompilationException) as exc: + self.run_dbt(strict=True) + + exc_str = ' '.join(str(exc.exception).split()) # flatten all whitespace + self.assertIn('Support for the existing version 1 format will be removed', exc_str) - @use_profile('bigquery') - def test_bigquery_partition_by(self): - self.run_dbt(['seed']) + @use_profile('postgres') + def test_postgres_project_deprecations(self): self.assertEqual(deprecations.active_deprecations, set()) self.run_dbt(strict=False) - expected = {'bq-partition-by-string'} + expected = {'dbt-project-yaml-v1'} self.assertEqual(expected, deprecations.active_deprecations) diff --git a/test/integration/013_context_var_tests/test_context_members.py b/test/integration/013_context_var_tests/test_context_members.py index 8a6e9a27127..502f8afd801 100644 --- a/test/integration/013_context_var_tests/test_context_members.py +++ b/test/integration/013_context_var_tests/test_context_members.py @@ -12,7 +12,10 @@ def models(self): @property def project_config(self): - return {'test-paths': ['tests']} + return { + 'config-version': 2, + 'test-paths': ['tests'], + } @use_profile('postgres') def test_json_data_tests_postgres(self): diff --git a/test/integration/014_hook_tests/test_model_hooks.py b/test/integration/014_hook_tests/test_model_hooks.py index 5e892544d33..9adb74a65ac 100644 --- a/test/integration/014_hook_tests/test_model_hooks.py +++ b/test/integration/014_hook_tests/test_model_hooks.py @@ -120,6 +120,7 @@ class TestPrePostModelHooks(BaseTestPrePost): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], 'models': { 'test': { @@ -130,14 +131,13 @@ def project_config(self): # outside transaction (runs first) {"sql": "vacuum {{ this.schema }}.on_model_hook", "transaction": False}, ], - 'post-hook':[ # outside transaction (runs second) {"sql": "vacuum {{ this.schema }}.on_model_hook", "transaction": False}, # inside transaction (runs first) MODEL_POST_HOOK, - ] + ], } } } @@ -158,6 +158,7 @@ class TestHookRefs(BaseTestPrePost): @property def project_config(self): return { + 'config-version': 2, 'models': { 'test': { 'hooked': { @@ -205,6 +206,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'models': {}, 'seeds': { @@ -236,6 +238,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'snapshot-paths': ['test-snapshots'], 'models': {}, @@ -281,23 +284,21 @@ def test_postgres_pre_and_post_model_hooks_model(self): @use_profile('postgres') def test_postgres_pre_and_post_model_hooks_model_and_project(self): self.use_default_project({ + 'config-version': 2, 'models': { 'test': { 'pre-hook': [ # inside transaction (runs second) MODEL_PRE_HOOK, - # outside transaction (runs first) {"sql": "vacuum {{ this.schema }}.on_model_hook", "transaction": False}, ], - 'post-hook':[ # outside transaction (runs second) {"sql": "vacuum {{ this.schema }}.on_model_hook", "transaction": False}, - # inside transaction (runs first) MODEL_POST_HOOK, - ] + ], } } }) @@ -321,6 +322,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'snapshot-paths': ['test-kwargs-snapshots'], 'models': {}, diff --git a/test/integration/014_hook_tests/test_model_hooks_bq.py b/test/integration/014_hook_tests/test_model_hooks_bq.py index 
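The rewritten deprecation test above replaces the old BigQuery partition-by-string case with a project-file version check. Roughly, after a non-strict parse of a project whose dbt_project.yml has no `config-version` set, the new deprecation name should be recorded; a sketch, assuming `deprecations` is importable from `dbt` as the test module does:

```python
from dbt import deprecations  # import path assumed from the test module


def project_triggered_v1_deprecation() -> bool:
    # Expected to be True once dbt has parsed a v1-format project in
    # non-strict mode; in strict mode the same condition raises a
    # CompilationException instead of recording a deprecation.
    return "dbt-project-yaml-v1" in deprecations.active_deprecations
```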
fbb604f04f9..581b48872af 100644 --- a/test/integration/014_hook_tests/test_model_hooks_bq.py +++ b/test/integration/014_hook_tests/test_model_hooks_bq.py @@ -69,12 +69,12 @@ def profile_config(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], 'models': { 'test': { 'pre-hook': [MODEL_PRE_HOOK], - - 'post-hook':[MODEL_POST_HOOK] + 'post-hook':[MODEL_POST_HOOK], } } } @@ -125,6 +125,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'models': {}, 'seeds': { diff --git a/test/integration/014_hook_tests/test_run_hooks.py b/test/integration/014_hook_tests/test_run_hooks.py index d39dc303ecc..e613e0644d3 100644 --- a/test/integration/014_hook_tests/test_run_hooks.py +++ b/test/integration/014_hook_tests/test_run_hooks.py @@ -30,6 +30,7 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], 'data-paths': ['data'], diff --git a/test/integration/014_hook_tests/test_run_hooks_bq.py b/test/integration/014_hook_tests/test_run_hooks_bq.py index d88834d7bc5..e0edbde1af4 100644 --- a/test/integration/014_hook_tests/test_run_hooks_bq.py +++ b/test/integration/014_hook_tests/test_run_hooks_bq.py @@ -32,6 +32,7 @@ def profile_config(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], 'data-paths': ['data'], diff --git a/test/integration/015_cli_invocation_tests/test_cli_invocation.py b/test/integration/015_cli_invocation_tests/test_cli_invocation.py index c8a0c0d71b2..4465ca3aac6 100644 --- a/test/integration/015_cli_invocation_tests/test_cli_invocation.py +++ b/test/integration/015_cli_invocation_tests/test_cli_invocation.py @@ -1,6 +1,7 @@ from test.integration.base import DBTIntegrationTest, use_profile import os import shutil +import tempfile import yaml @@ -111,3 +112,32 @@ def test_postgres_toplevel_dbt_run_with_profile_dir_arg(self): # make sure the test runs against `custom_schema` for test_result in res: self.assertTrue(self.custom_schema, test_result.node.injected_sql) + + +class TestCLIInvocationWithProjectDir(ModelCopyingIntegrationTest): + + @property + def schema(self): + return "test_cli_invocation_015" + + @property + def models(self): + return "models" + + @use_profile('postgres') + def test_postgres_dbt_commands_with_cwd_as_project_dir(self): + self._run_simple_dbt_commands(os.getcwd()) + + @use_profile('postgres') + def test_postgres_dbt_commands_with_randomdir_as_project_dir(self): + workdir = os.getcwd() + with tempfile.TemporaryDirectory() as tmpdir: + os.chdir(tmpdir) + self._run_simple_dbt_commands(workdir) + os.chdir(workdir) + + def _run_simple_dbt_commands(self, project_dir): + self.run_dbt(['deps', '--project-dir', project_dir]) + self.run_dbt(['seed', '--project-dir', project_dir]) + self.run_dbt(['run', '--project-dir', project_dir]) + self.run_dbt(['test', '--project-dir', project_dir]) diff --git a/test/integration/016_macro_tests/test_macros.py b/test/integration/016_macro_tests/test_macros.py index 9f64d1829fe..37a4b9534ee 100644 --- a/test/integration/016_macro_tests/test_macros.py +++ b/test/integration/016_macro_tests/test_macros.py @@ -26,10 +26,11 @@ def packages_config(self): @property def project_config(self): return { - "models": { - "vars": { - "test": "DUMMY" - } + 'config-version': 2, + 'vars': { + 'test': { + 'test': 'DUMMY', + }, }, "macro-paths": ["macros"], } @@ -60,6 +61,7 @@ def models(self): @property def project_config(self): return 
{ + 'config-version': 2, "macro-paths": ["bad-macros"] } @@ -99,6 +101,7 @@ def packages_config(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ["macros"], } diff --git a/test/integration/017_runtime_materialization_tests/test_runtime_materialization.py b/test/integration/017_runtime_materialization_tests/test_runtime_materialization.py index eae875be394..2e3ba9c1af4 100644 --- a/test/integration/017_runtime_materialization_tests/test_runtime_materialization.py +++ b/test/integration/017_runtime_materialization_tests/test_runtime_materialization.py @@ -10,6 +10,7 @@ def setUp(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'seeds': { 'quote_columns': False, diff --git a/test/integration/019_analysis_tests/test_analyses.py b/test/integration/019_analysis_tests/test_analyses.py index aacabf3bef5..e21e3c5da07 100644 --- a/test/integration/019_analysis_tests/test_analyses.py +++ b/test/integration/019_analysis_tests/test_analyses.py @@ -18,6 +18,7 @@ def analysis_path(self): @property def project_config(self): return { + "config-version": 2, "analysis-paths": [self.analysis_path()] } diff --git a/test/integration/022_bigquery_test/models/sql_header_model_incr.sql b/test/integration/022_bigquery_test/models/sql_header_model_incr.sql new file mode 100644 index 00000000000..f93280a3bfd --- /dev/null +++ b/test/integration/022_bigquery_test/models/sql_header_model_incr.sql @@ -0,0 +1,15 @@ + +{{ config(materialized="incremental") }} + +{# This will fail if it is not extracted correctly #} +{% call set_sql_header(config) %} + CREATE TEMPORARY FUNCTION a_to_b(str STRING) + RETURNS STRING AS ( + CASE + WHEN LOWER(str) = 'a' THEN 'b' + ELSE str + END + ); +{% endcall %} + +select a_to_b(dupe) as dupe from {{ ref('view_model') }} diff --git a/test/integration/022_bigquery_test/models/sql_header_model_incr_insert_overwrite.sql b/test/integration/022_bigquery_test/models/sql_header_model_incr_insert_overwrite.sql new file mode 100644 index 00000000000..467a0d8d7b6 --- /dev/null +++ b/test/integration/022_bigquery_test/models/sql_header_model_incr_insert_overwrite.sql @@ -0,0 +1,31 @@ + +{# + Ensure that the insert overwrite incremental strategy + works correctly when a UDF is used in a sql_header. The + failure mode here is that dbt might inject the UDF header + twice: once for the `create table` and then again for the + merge statement. +#} + +{{ config( + materialized="incremental", + incremental_strategy='insert_overwrite', + partition_by={"field": "dt", "data_type": "date"} +) }} + +{# This will fail if it is not extracted correctly #} +{% call set_sql_header(config) %} + CREATE TEMPORARY FUNCTION a_to_b(str STRING) + RETURNS STRING AS ( + CASE + WHEN LOWER(str) = 'a' THEN 'b' + ELSE str + END + ); +{% endcall %} + +select + current_date() as dt, + a_to_b(dupe) as dupe + +from {{ ref('view_model') }} diff --git a/test/integration/022_bigquery_test/models/sql_header_model_incr_insert_overwrite_static.sql b/test/integration/022_bigquery_test/models/sql_header_model_incr_insert_overwrite_static.sql new file mode 100644 index 00000000000..4d760a0fd96 --- /dev/null +++ b/test/integration/022_bigquery_test/models/sql_header_model_incr_insert_overwrite_static.sql @@ -0,0 +1,32 @@ + +{# + Ensure that the insert overwrite incremental strategy + works correctly when a UDF is used in a sql_header. 
The + failure mode here is that dbt might inject the UDF header + twice: once for the `create table` and then again for the + merge statement. +#} + +{{ config( + materialized="incremental", + incremental_strategy='insert_overwrite', + partition_by={"field": "dt", "data_type": "date"}, + partitions=["'2020-01-1'"] +) }} + +{# This will fail if it is not extracted correctly #} +{% call set_sql_header(config) %} + CREATE TEMPORARY FUNCTION a_to_b(str STRING) + RETURNS STRING AS ( + CASE + WHEN LOWER(str) = 'a' THEN 'b' + ELSE str + END + ); +{% endcall %} + +select + cast('2020-01-01' as date) as dt, + a_to_b(dupe) as dupe + +from {{ ref('view_model') }} diff --git a/test/integration/022_bigquery_test/test_bigquery_changing_partitions.py b/test/integration/022_bigquery_test/test_bigquery_changing_partitions.py index 27547c54a2e..e9f917bc56c 100644 --- a/test/integration/022_bigquery_test/test_bigquery_changing_partitions.py +++ b/test/integration/022_bigquery_test/test_bigquery_changing_partitions.py @@ -12,33 +12,33 @@ def schema(self): def models(self): return "partition-models" - def run_changes(self, before, after, strict=False): - # strict needs to be off because these tests use legacy partition_by clauses - results = self.run_dbt(['run', '--vars', json.dumps(before)], strict=strict) + def run_changes(self, before, after): + results = self.run_dbt(['run', '--vars', json.dumps(before)]) self.assertEqual(len(results), 1) - results = self.run_dbt(['run', '--vars', json.dumps(after)], strict=strict) + results = self.run_dbt(['run', '--vars', json.dumps(after)]) self.assertEqual(len(results), 1) + @use_profile('bigquery') def test_bigquery_add_partition(self): before = {"partition_by": None, "cluster_by": None} - after = {"partition_by": "date(cur_time)", "cluster_by": None} + after = {"partition_by": {'field': 'cur_time', 'data_type': 'timestamp'}, "cluster_by": None} self.run_changes(before, after) @use_profile('bigquery') def test_bigquery_remove_partition(self): - before = {"partition_by": "date(cur_time)", "cluster_by": None} + before = {"partition_by": {'field': 'cur_time', 'data_type': 'timestamp'}, "cluster_by": None} after = {"partition_by": None, "cluster_by": None} self.run_changes(before, after) @use_profile('bigquery') def test_bigquery_change_partitions(self): - before = {"partition_by": "date(cur_time)", "cluster_by": None} - after = {"partition_by": "cur_date", "cluster_by": None} + before = {"partition_by": {'field': 'cur_time', 'data_type': 'timestamp'}, "cluster_by": None} + after = {"partition_by": {'field': "cur_date"}, "cluster_by": None} self.run_changes(before, after) self.run_changes(after, before) - + @use_profile('bigquery') def test_bigquery_change_partitions_from_int(self): before = {"partition_by": {"field": "id", "data_type": "int64", "range": {"start": 0, "end": 10, "interval": 1}}, "cluster_by": None} @@ -48,24 +48,24 @@ def test_bigquery_change_partitions_from_int(self): @use_profile('bigquery') def test_bigquery_add_clustering(self): - before = {"partition_by": "date(cur_time)", "cluster_by": None} - after = {"partition_by": "cur_date", "cluster_by": "id"} + before = {"partition_by": {'field': 'cur_time', 'data_type': 'timestamp'}, "cluster_by": None} + after = {"partition_by": {'field': "cur_date"}, "cluster_by": "id"} self.run_changes(before, after) @use_profile('bigquery') def test_bigquery_remove_clustering(self): - before = {"partition_by": "date(cur_time)", "cluster_by": "id"} - after = {"partition_by": "cur_date", "cluster_by": None} + before = 
{"partition_by": {'field': 'cur_time', 'data_type': 'timestamp'}, "cluster_by": "id"} + after = {"partition_by": {'field': "cur_date"}, "cluster_by": None} self.run_changes(before, after) @use_profile('bigquery') def test_bigquery_change_clustering(self): - before = {"partition_by": "date(cur_time)", "cluster_by": "id"} - after = {"partition_by": "cur_date", "cluster_by": "name"} + before = {"partition_by": {'field': 'cur_time', 'data_type': 'timestamp'}, "cluster_by": "id"} + after = {"partition_by": {'field': "cur_date"}, "cluster_by": "name"} self.run_changes(before, after) @use_profile('bigquery') def test_bigquery_change_clustering_strict(self): before = {'partition_by': {'field': 'cur_time', 'data_type': 'timestamp'}, 'cluster_by': 'id'} after = {'partition_by': {'field': 'cur_date', 'data_type': 'date'}, 'cluster_by': 'name'} - self.run_changes(before, after, strict=True) + self.run_changes(before, after) diff --git a/test/integration/022_bigquery_test/test_bigquery_date_partitioning.py b/test/integration/022_bigquery_test/test_bigquery_date_partitioning.py index fa0fadd1dc0..ae7781ef1a8 100644 --- a/test/integration/022_bigquery_test/test_bigquery_date_partitioning.py +++ b/test/integration/022_bigquery_test/test_bigquery_date_partitioning.py @@ -1,4 +1,4 @@ -from test.integration.base import DBTIntegrationTest, FakeArgs, use_profile +from test.integration.base import DBTIntegrationTest, use_profile import textwrap import yaml @@ -20,6 +20,7 @@ def profile_config(self): @property def project_config(self): return yaml.safe_load(textwrap.dedent('''\ + config-version: 2 models: test: partitioned_noconfig: diff --git a/test/integration/022_bigquery_test/test_bigquery_query_results.py b/test/integration/022_bigquery_test/test_bigquery_query_results.py index 55dc670f6ee..c45ca275faf 100644 --- a/test/integration/022_bigquery_test/test_bigquery_query_results.py +++ b/test/integration/022_bigquery_test/test_bigquery_query_results.py @@ -14,6 +14,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], } diff --git a/test/integration/022_bigquery_test/test_bigquery_repeated_records.py b/test/integration/022_bigquery_test/test_bigquery_repeated_records.py index 17c529291b0..a631cc2deab 100644 --- a/test/integration/022_bigquery_test/test_bigquery_repeated_records.py +++ b/test/integration/022_bigquery_test/test_bigquery_repeated_records.py @@ -14,6 +14,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], } @@ -26,8 +27,8 @@ def test__bigquery_fetch_nested_records(self): cast('Stonebreaker' as string) as lname ) as user, [ - struct(1 as val_1, 2 as val_2), - struct(3 as val_1, 4 as val_2) + struct(1 as val_1, cast(2.12 as numeric) as val_2), + struct(3 as val_1, cast(4.83 as numeric) as val_2) ] as val union all @@ -38,8 +39,8 @@ def test__bigquery_fetch_nested_records(self): cast('Brickmaker' as string) as lname ) as user, [ - struct(7 as val_1, 8 as val_2), - struct(9 as val_1, 0 as val_2) + struct(7 as val_1, cast(8 as numeric) as val_2), + struct(9 as val_1, cast(null as numeric) as val_2) ] as val """ @@ -54,8 +55,8 @@ def test__bigquery_fetch_nested_records(self): '{"fname": "Johnny", "lname": "Brickmaker"}' ], "val": [ - '[{"val_1": 1, "val_2": 2}, {"val_1": 3, "val_2": 4}]', - '[{"val_1": 7, "val_2": 8}, {"val_1": 9, "val_2": 0}]' + '[{"val_1": 1, "val_2": 2.12}, {"val_1": 3, "val_2": 4.83}]', + '[{"val_1": 7, "val_2": 8}, {"val_1": 9, "val_2": null}]' ] } diff 
--git a/test/integration/022_bigquery_test/test_simple_bigquery_view.py b/test/integration/022_bigquery_test/test_simple_bigquery_view.py index e25704ba1b4..690955862a3 100644 --- a/test/integration/022_bigquery_test/test_simple_bigquery_view.py +++ b/test/integration/022_bigquery_test/test_simple_bigquery_view.py @@ -1,7 +1,8 @@ -from test.integration.base import DBTIntegrationTest, FakeArgs, use_profile +from test.integration.base import DBTIntegrationTest, use_profile import random import time + class TestBaseBigQueryRun(DBTIntegrationTest): @property @@ -15,11 +16,12 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'macro-paths': ['macros'], 'seeds': { 'quote_columns': False, - } + }, } @property @@ -53,7 +55,7 @@ def test__bigquery_simple_run(self): self.run_dbt(['seed', '--full-refresh']) results = self.run_dbt() # Bump expected number of results when adding new model - self.assertEqual(len(results), 8) + self.assertEqual(len(results), 11) self.assert_nondupes_pass() @@ -64,7 +66,7 @@ class TestUnderscoreBigQueryRun(TestBaseBigQueryRun): def test_bigquery_run_twice(self): self.run_dbt(['seed']) results = self.run_dbt() - self.assertEqual(len(results), 8) + self.assertEqual(len(results), 11) results = self.run_dbt() - self.assertEqual(len(results), 8) + self.assertEqual(len(results), 11) self.assert_nondupes_pass() diff --git a/test/integration/023_exit_codes_test/test_exit_codes.py b/test/integration/023_exit_codes_test/test_exit_codes.py index 10f70e828f8..4186a32b0e7 100644 --- a/test/integration/023_exit_codes_test/test_exit_codes.py +++ b/test/integration/023_exit_codes_test/test_exit_codes.py @@ -16,6 +16,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['snapshots-good'], } @@ -79,6 +80,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['snapshots-bad'], } @@ -156,10 +158,11 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data-good'], 'seeds': { 'quote_columns': False, - } + }, } @use_profile('postgres') @@ -181,10 +184,11 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data-bad'], 'seeds': { 'quote_columns': False, - } + }, } @use_profile('postgres') diff --git a/test/integration/024_custom_schema_test/test_custom_database.py b/test/integration/024_custom_schema_test/test_custom_database.py index 23a80f81e5f..477d0a1e3a5 100644 --- a/test/integration/024_custom_schema_test/test_custom_database.py +++ b/test/integration/024_custom_schema_test/test_custom_database.py @@ -15,6 +15,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['custom-db-macros'], } diff --git a/test/integration/024_custom_schema_test/test_custom_schema.py b/test/integration/024_custom_schema_test/test_custom_schema.py index 9feda281f6a..065c2fb84d6 100644 --- a/test/integration/024_custom_schema_test/test_custom_schema.py +++ b/test/integration/024_custom_schema_test/test_custom_schema.py @@ -60,9 +60,10 @@ def profile_config(self): @property def project_config(self): return { + 'config-version': 2, "models": { "schema": "dbt_test" - } + }, } @use_profile('postgres') @@ -95,6 +96,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "models": { "schema": "dbt_test" } @@ -150,9 +152,10 @@ def profile_config(self): @property 
def project_config(self): return { + 'config-version': 2, 'macro-paths': ['custom-macros'], 'models': { - 'schema': 'dbt_test' + 'schema': 'dbt_test', } } @@ -178,10 +181,11 @@ class TestCustomSchemaWithCustomMacroConfigs(TestCustomSchemaWithCustomMacro): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['custom-macros-configs'], 'models': { 'schema': 'dbt_test' - } + }, } @use_profile('postgres') diff --git a/test/integration/025_duplicate_model_test/local_dependency/dbt_project.yml b/test/integration/025_duplicate_model_test/local_dependency/dbt_project.yml new file mode 100644 index 00000000000..64bd0728d3f --- /dev/null +++ b/test/integration/025_duplicate_model_test/local_dependency/dbt_project.yml @@ -0,0 +1,10 @@ +name: 'local_dep' +version: '1.0' +config-version: 2 + +profile: 'default' + +source-paths: ["models"] + +seeds: + quote_columns: False diff --git a/test/integration/025_duplicate_model_test/local_dependency/models/table_model.sql b/test/integration/025_duplicate_model_test/local_dependency/models/table_model.sql new file mode 100644 index 00000000000..43258a71464 --- /dev/null +++ b/test/integration/025_duplicate_model_test/local_dependency/models/table_model.sql @@ -0,0 +1 @@ +select 1 as id diff --git a/test/integration/025_duplicate_model_test/test_duplicate_macro.py b/test/integration/025_duplicate_model_test/test_duplicate_macro.py index aa927d2b83c..689ef4e8e63 100644 --- a/test/integration/025_duplicate_model_test/test_duplicate_macro.py +++ b/test/integration/025_duplicate_model_test/test_duplicate_macro.py @@ -16,6 +16,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros-bad-same'] } @@ -40,6 +41,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros-bad-separate'] } diff --git a/test/integration/025_duplicate_model_test/test_duplicate_model.py b/test/integration/025_duplicate_model_test/test_duplicate_model.py index a000973f7d5..42dad92dfc0 100644 --- a/test/integration/025_duplicate_model_test/test_duplicate_model.py +++ b/test/integration/025_duplicate_model_test/test_duplicate_model.py @@ -111,11 +111,9 @@ def packages_config(self): return { "packages": [ { - 'git': 'https://github.com/fishtown-analytics/dbt-integration-project', - 'revision': 'master', - 'warn-unpinned': False, - }, - ], + 'local': 'local_dependency' + } + ] } @use_profile("postgres") @@ -148,11 +146,9 @@ def packages_config(self): return { "packages": [ { - 'git': 'https://github.com/fishtown-analytics/dbt-integration-project', - 'revision': 'master', - 'warn-unpinned': False, - }, - ], + 'local': 'local_dependency' + } + ] } @use_profile("postgres") @@ -181,7 +177,10 @@ def models(self): @property def project_config(self): - return {'test-paths': [self.models]} + return { + 'config-version': 2, + 'test-paths': [self.models], + } @use_profile('postgres') def test_postgres_duplicate_test_model_paths(self): diff --git a/test/integration/026_aliases_test/test_aliases.py b/test/integration/026_aliases_test/test_aliases.py index 1db2be51d3a..0eb06e8588b 100644 --- a/test/integration/026_aliases_test/test_aliases.py +++ b/test/integration/026_aliases_test/test_aliases.py @@ -13,15 +13,16 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ['macros'], "models": { "test": { "alias_in_project": { - "alias" : 'project_alias' + "alias": 'project_alias', }, "alias_in_project_with_override": { - 
"alias" : 'project_alias' - } + "alias": 'project_alias', + }, } } } @@ -57,6 +58,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ['macros'], } @@ -79,6 +81,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ['macros'], } diff --git a/test/integration/028_cli_vars/test_cli_var_override.py b/test/integration/028_cli_vars/test_cli_var_override.py index f8374f698e3..af3fe41d891 100644 --- a/test/integration/028_cli_vars/test_cli_var_override.py +++ b/test/integration/028_cli_vars/test_cli_var_override.py @@ -1,5 +1,4 @@ from test.integration.base import DBTIntegrationTest, use_profile -import yaml class TestCLIVarOverride(DBTIntegrationTest): @@ -14,11 +13,10 @@ def models(self): @property def project_config(self): return { - "models": { - "vars": { - "required": "present" - } - } + 'config-version': 2, + 'vars': { + 'required': 'present', + }, } @use_profile('postgres') @@ -43,13 +41,12 @@ def models(self): @property def project_config(self): return { - "models": { - "test": { - "vars": { - "required": "present" - } - } - } + 'config-version': 2, + 'vars': { + 'test': { + 'required': 'present', + }, + }, } @use_profile('postgres') diff --git a/test/integration/029_docs_generate_tests/test_docs_generate.py b/test/integration/029_docs_generate_tests/test_docs_generate.py index 19650d55dda..04521314553 100644 --- a/test/integration/029_docs_generate_tests/test_docs_generate.py +++ b/test/integration/029_docs_generate_tests/test_docs_generate.py @@ -121,6 +121,7 @@ def packages_config(self): @property def project_config(self): return { + 'config-version': 2, 'quoting': { 'identifier': False } @@ -132,8 +133,8 @@ def run_and_generate(self, extra=None, seed_count=1, model_count=1, alternate_db project = { "data-paths": [self.dir("seed")], 'macro-paths': [self.dir('macros')], - 'models': { - 'vars': {'alternate_db': alternate_db}, + 'vars': { + 'alternate_db': alternate_db, }, 'seeds': { 'quote_columns': True, @@ -252,22 +253,8 @@ def _snowflake_stats(self): } def _bigquery_stats(self, is_table, partition=None, cluster=None): - stats = { - 'has_stats': { - 'id': 'has_stats', - 'label': 'Has Stats?', - 'value': True, - 'description': 'Indicates whether there are statistics for this table', - 'include': False, - }, - 'location': { - 'id': 'location', - 'label': 'Location', - 'value': 'US', - 'description': 'The geographic location of this table', - 'include': True, - } - } + stats = {} + if is_table: stats.update({ 'num_bytes': { @@ -308,6 +295,15 @@ def _bigquery_stats(self, is_table, partition=None, cluster=None): } }) + has_stats = { + 'id': 'has_stats', + 'label': 'Has Stats?', + 'value': bool(stats), + 'description': 'Indicates whether there are statistics for this table', + 'include': False, + } + stats['has_stats'] = has_stats + return stats def _expected_catalog(self, id_type, text_type, time_type, view_type, @@ -357,32 +353,35 @@ def _expected_catalog(self, id_type, text_type, time_type, view_type, }, } return { - 'model.test.model': { - 'unique_id': 'model.test.model', - 'metadata': { - 'schema': my_schema_name, - 'database': model_database, - 'name': case('model'), - 'type': view_type, - 'comment': None, - 'owner': role, + 'nodes': { + 'model.test.model': { + 'unique_id': 'model.test.model', + 'metadata': { + 'schema': my_schema_name, + 'database': model_database, + 'name': case('model'), + 'type': view_type, + 'comment': None, + 'owner': role, + }, + 'stats': model_stats, + 
'columns': expected_cols, }, - 'stats': model_stats, - 'columns': expected_cols, - }, - 'seed.test.seed': { - 'unique_id': 'seed.test.seed', - 'metadata': { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': case('seed'), - 'type': table_type, - 'comment': None, - 'owner': role, + 'seed.test.seed': { + 'unique_id': 'seed.test.seed', + 'metadata': { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': case('seed'), + 'type': table_type, + 'comment': None, + 'owner': role, + }, + 'stats': seed_stats, + 'columns': expected_cols, }, - 'stats': seed_stats, - 'columns': expected_cols, }, + 'sources': {} } def expected_postgres_catalog(self): @@ -458,57 +457,61 @@ def expected_postgres_references_catalog(self): }, } return { - 'seed.test.seed': { - 'unique_id': 'seed.test.seed', - 'metadata': { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': 'seed', - 'type': 'BASE TABLE', - 'comment': None, - 'owner': role, + 'nodes': { + 'seed.test.seed': { + 'unique_id': 'seed.test.seed', + 'metadata': { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': 'seed', + 'type': 'BASE TABLE', + 'comment': None, + 'owner': role, + }, + 'stats': stats, + 'columns': seed_columns }, - 'stats': stats, - 'columns': seed_columns - }, - 'model.test.ephemeral_summary': { - 'unique_id': 'model.test.ephemeral_summary', - 'metadata': { - 'schema': my_schema_name, - 'database': model_database, - 'name': 'ephemeral_summary', - 'type': 'BASE TABLE', - 'comment': None, - 'owner': role, + 'model.test.ephemeral_summary': { + 'unique_id': 'model.test.ephemeral_summary', + 'metadata': { + 'schema': my_schema_name, + 'database': model_database, + 'name': 'ephemeral_summary', + 'type': 'BASE TABLE', + 'comment': None, + 'owner': role, + }, + 'stats': stats, + 'columns': summary_columns, }, - 'stats': stats, - 'columns': summary_columns, - }, - 'model.test.view_summary': { - 'unique_id': 'model.test.view_summary', - 'metadata': { - 'schema': my_schema_name, - 'database': model_database, - 'name': 'view_summary', - 'type': 'VIEW', - 'comment': None, - 'owner': role, + 'model.test.view_summary': { + 'unique_id': 'model.test.view_summary', + 'metadata': { + 'schema': my_schema_name, + 'database': model_database, + 'name': 'view_summary', + 'type': 'VIEW', + 'comment': None, + 'owner': role, + }, + 'stats': stats, + 'columns': summary_columns, }, - 'stats': stats, - 'columns': summary_columns, }, - "source.test.my_source.my_table": { - "unique_id": "source.test.my_source.my_table", - "metadata": { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': 'seed', - 'type': 'BASE TABLE', - 'comment': None, - 'owner': role, + 'sources': { + "source.test.my_source.my_table": { + "unique_id": "source.test.my_source.my_table", + "metadata": { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': 'seed', + 'type': 'BASE TABLE', + 'comment': None, + 'owner': role, + }, + "stats": stats, + 'columns': seed_columns, }, - "stats": stats, - 'columns': seed_columns, }, } @@ -627,71 +630,74 @@ def expected_bigquery_complex_catalog(self): } return { - 'model.test.clustered': { - 'unique_id': 'model.test.clustered', - 'metadata': { - 'comment': None, - 'name': 'clustered', - 'owner': None, - 'schema': my_schema_name, - 'database': self.default_database, - 'type': 'table' + 'nodes': { + 'model.test.clustered': { + 'unique_id': 'model.test.clustered', + 'metadata': { + 'comment': None, + 'name': 'clustered', + 'owner': None, + 
'schema': my_schema_name, + 'database': self.default_database, + 'type': 'table' + }, + 'stats': clustering_stats, + 'columns': self._clustered_bigquery_columns('DATE'), }, - 'stats': clustering_stats, - 'columns': self._clustered_bigquery_columns('DATE'), - }, - 'model.test.multi_clustered': { - 'unique_id': 'model.test.multi_clustered', - 'metadata': { - 'comment': None, - 'name': 'multi_clustered', - 'owner': None, - 'schema': my_schema_name, - 'database': self.default_database, - 'type': 'table' + 'model.test.multi_clustered': { + 'unique_id': 'model.test.multi_clustered', + 'metadata': { + 'comment': None, + 'name': 'multi_clustered', + 'owner': None, + 'schema': my_schema_name, + 'database': self.default_database, + 'type': 'table' + }, + 'stats': multi_clustering_stats, + 'columns': self._clustered_bigquery_columns('DATE'), }, - 'stats': multi_clustering_stats, - 'columns': self._clustered_bigquery_columns('DATE'), - }, - 'seed.test.seed': { - 'unique_id': 'seed.test.seed', - 'metadata': { - 'comment': None, - 'name': 'seed', - 'owner': None, - 'schema': my_schema_name, - 'database': self.default_database, - 'type': 'table', + 'seed.test.seed': { + 'unique_id': 'seed.test.seed', + 'metadata': { + 'comment': None, + 'name': 'seed', + 'owner': None, + 'schema': my_schema_name, + 'database': self.default_database, + 'type': 'table', + }, + 'stats': table_stats, + 'columns': self._clustered_bigquery_columns('DATETIME'), }, - 'stats': table_stats, - 'columns': self._clustered_bigquery_columns('DATETIME'), - }, - 'model.test.nested_view': { - 'unique_id': 'model.test.nested_view', - 'metadata': { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': 'nested_view', - 'type': 'view', - 'owner': role, - 'comment': None + 'model.test.nested_view': { + 'unique_id': 'model.test.nested_view', + 'metadata': { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': 'nested_view', + 'type': 'view', + 'owner': role, + 'comment': None + }, + 'stats': self._bigquery_stats(False), + 'columns': nesting_columns, }, - 'stats': self._bigquery_stats(False), - 'columns': nesting_columns, - }, - 'model.test.nested_table': { - 'unique_id': 'model.test.nested_table', - 'metadata': { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': 'nested_table', - 'type': 'table', - 'owner': role, - 'comment': None + 'model.test.nested_table': { + 'unique_id': 'model.test.nested_table', + 'metadata': { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': 'nested_table', + 'type': 'table', + 'owner': role, + 'comment': None + }, + 'stats': table_stats, + 'columns': nesting_columns, }, - 'stats': table_stats, - 'columns': nesting_columns, - } + }, + 'sources': {}, } def expected_redshift_catalog(self): @@ -709,95 +715,98 @@ def expected_redshift_incremental_catalog(self): my_schema_name = self.unique_schema() role = self.get_role() return { - 'model.test.model': { - 'unique_id': 'model.test.model', - 'metadata': { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': 'model', - 'type': 'LATE BINDING VIEW', - 'comment': None, - 'owner': role, - }, - # incremental views have no stats - 'stats': self._no_stats(), - 'columns': { - 'id': { - 'name': 'id', - 'index': 1, - 'type': 'integer', - 'comment': None, - }, - 'first_name': { - 'name': 'first_name', - 'index': 2, - 'type': 'character varying(5)', - 'comment': None, - }, - 'email': { - 'name': 'email', - 'index': 3, - 'type': 'character varying(23)', - 'comment': None, - 
}, - 'ip_address': { - 'name': 'ip_address', - 'index': 4, - 'type': 'character varying(14)', + 'nodes': { + 'model.test.model': { + 'unique_id': 'model.test.model', + 'metadata': { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': 'model', + 'type': 'LATE BINDING VIEW', 'comment': None, + 'owner': role, }, - 'updated_at': { - 'name': 'updated_at', - 'index': 5, - 'type': 'timestamp without time zone', - 'comment': None, + # incremental views have no stats + 'stats': self._no_stats(), + 'columns': { + 'id': { + 'name': 'id', + 'index': 1, + 'type': 'integer', + 'comment': None, + }, + 'first_name': { + 'name': 'first_name', + 'index': 2, + 'type': 'character varying(5)', + 'comment': None, + }, + 'email': { + 'name': 'email', + 'index': 3, + 'type': 'character varying(23)', + 'comment': None, + }, + 'ip_address': { + 'name': 'ip_address', + 'index': 4, + 'type': 'character varying(14)', + 'comment': None, + }, + 'updated_at': { + 'name': 'updated_at', + 'index': 5, + 'type': 'timestamp without time zone', + 'comment': None, + }, }, }, - }, - 'seed.test.seed': { - 'unique_id': 'seed.test.seed', - 'metadata': { - 'schema': my_schema_name, - 'database': self.default_database, - 'name': 'seed', - 'type': 'BASE TABLE', - 'comment': None, - 'owner': role, - }, - 'stats': self._redshift_stats(), - 'columns': { - 'id': { - 'name': 'id', - 'index': 1, - 'type': 'integer', - 'comment': None, - }, - 'first_name': { - 'name': 'first_name', - 'index': 2, - 'type': 'character varying', - 'comment': None, - }, - 'email': { - 'name': 'email', - 'index': 3, - 'type': 'character varying', - 'comment': None, - }, - 'ip_address': { - 'name': 'ip_address', - 'index': 4, - 'type': 'character varying', + 'seed.test.seed': { + 'unique_id': 'seed.test.seed', + 'metadata': { + 'schema': my_schema_name, + 'database': self.default_database, + 'name': 'seed', + 'type': 'BASE TABLE', 'comment': None, + 'owner': role, }, - 'updated_at': { - 'name': 'updated_at', - 'index': 5, - 'type': 'timestamp without time zone', - 'comment': None, + 'stats': self._redshift_stats(), + 'columns': { + 'id': { + 'name': 'id', + 'index': 1, + 'type': 'integer', + 'comment': None, + }, + 'first_name': { + 'name': 'first_name', + 'index': 2, + 'type': 'character varying', + 'comment': None, + }, + 'email': { + 'name': 'email', + 'index': 3, + 'type': 'character varying', + 'comment': None, + }, + 'ip_address': { + 'name': 'ip_address', + 'index': 4, + 'type': 'character varying', + 'comment': None, + }, + 'updated_at': { + 'name': 'updated_at', + 'index': 5, + 'type': 'timestamp without time zone', + 'comment': None, + }, }, }, }, + 'sources': {}, } def verify_catalog(self, expected): @@ -810,8 +819,8 @@ def verify_catalog(self, expected): catalog.pop('generated_at'), start=self.generate_start_time, ) - actual = catalog['nodes'] - self.assertEqual(expected, actual) + for key in 'nodes', 'sources': + self.assertEqual(catalog[key], expected[key]) def verify_manifest_macros(self, manifest, expected=None): self.assertIn('macros', manifest) @@ -878,15 +887,13 @@ def expected_seeded_manifest(self, model_database=None): if model_database is None: model_database = self.alternative_database - config_vars = {'alternate_db': model_database} - model_config = { 'database': model_database, 'enabled': True, 'materialized': 'view', 'pre-hook': [], 'post-hook': [], - 'vars': config_vars, + 'vars': {}, 'column_types': {}, 'quoting': {}, 'tags': [], @@ -1054,7 +1061,7 @@ def expected_seeded_manifest(self, model_database=None): 
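As the updated `verify_catalog` above shows, catalog.json now splits its entries into separate top-level `nodes` and `sources` maps (the breaking change noted in the changelog). A hedged helper sketch for asserting that shape; the function name is illustrative, not part of the patch:

```python
def check_catalog_shape(catalog):
    """Illustrative check: models/seeds live under 'nodes', source tables under 'sources'."""
    assert {'nodes', 'sources'} <= set(catalog)
    # Source entries no longer appear in 'nodes'; their unique_ids start with 'source.'.
    assert not any(uid.startswith('source.') for uid in catalog['nodes'])
    assert all(uid.startswith('source.') for uid in catalog['sources'])
```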
'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], 'severity': 'ERROR', }, @@ -1107,7 +1114,7 @@ def expected_seeded_manifest(self, model_database=None): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], 'severity': 'ERROR', }, @@ -1159,7 +1166,7 @@ def expected_seeded_manifest(self, model_database=None): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], 'severity': 'ERROR', }, @@ -1200,6 +1207,7 @@ def expected_seeded_manifest(self, model_database=None): }, }, }, + 'sources': {}, 'parent_map': { 'model.test.model': ['seed.test.seed'], 'seed.test.seed': [], @@ -1235,7 +1243,6 @@ def expected_seeded_manifest(self, model_database=None): def expected_postgres_references_manifest(self, model_database=None): if model_database is None: model_database = self.default_database - config_vars = {'alternate_db': model_database} my_schema_name = self.unique_schema() docs_path = self.dir('ref_models/docs.md') @@ -1277,7 +1284,7 @@ def expected_postgres_references_manifest(self, model_database=None): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], }, 'sources': [['my_source', 'my_table']], @@ -1338,7 +1345,7 @@ def expected_postgres_references_manifest(self, model_database=None): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], }, 'sources': [], @@ -1401,7 +1408,7 @@ def expected_postgres_references_manifest(self, model_database=None): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], }, 'database': self.default_database, @@ -1513,6 +1520,8 @@ def expected_postgres_references_manifest(self, model_database=None): 'extra_ctes': [], 'injected_sql': '', }, + }, + 'sources': { 'source.test.my_source.my_table': { 'columns': { 'id': { @@ -1523,6 +1532,9 @@ def expected_postgres_references_manifest(self, model_database=None): 'tags': [], } }, + 'config': { + 'enabled': True, + }, 'quoting': { 'database': False, 'schema': None, @@ -1531,10 +1543,7 @@ def expected_postgres_references_manifest(self, model_database=None): }, 'database': self.default_database, 'description': 'My table', - 'external': { - 'file_format': None, 'location': None, 'partitions': None, - 'row_format': None, 'tbl_properties': None - }, + 'external': None, 'freshness': {'error_after': None, 'warn_after': None, 'filter': None}, 'identifier': 'seed', 'loaded_at_field': None, @@ -1544,6 +1553,7 @@ def expected_postgres_references_manifest(self, model_database=None): 'original_file_path': self.dir('ref_models/schema.yml'), 'package_name': 'test', 'path': self.dir('ref_models/schema.yml'), + 'patch_path': None, 'resource_type': 'source', 'root_path': self.test_root_dir, 'schema': my_schema_name, @@ -1553,7 +1563,7 @@ def expected_postgres_references_manifest(self, model_database=None): 'tags': [], 'unique_id': 'source.test.my_source.my_table', 'fqn': ['test', 'my_source', 'my_table'], - } + }, }, 'docs': { 'dbt.__overview__': ANY, @@ -1699,7 +1709,6 @@ def expected_bigquery_complex_manifest(self): clustered_sql_path = self.dir('bq_models/clustered.sql') multi_clustered_sql_path = self.dir('bq_models/multi_clustered.sql') my_schema_name = self.unique_schema() - config_vars = {'alternate_db': self.alternative_database} return { 'nodes': { 'model.test.clustered': { @@ -1714,7 +1723,7 @@ def expected_bigquery_complex_manifest(self): 'post-hook': [], 'pre-hook': [], 
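The manifest expectations get the same split: `manifest.json` now carries a top-level `sources` map, and each source entry includes a `config` block with `enabled` (alongside the new `external: None` and `patch_path` fields seen above). A small sketch of reading those counts back out of a generated manifest; the helper name is an assumption, the `target/manifest.json` path comes from the tests in this patch:

```python
import json


def manifest_counts(path='target/manifest.json'):
    """Illustrative reader: count nodes and enabled sources in a post-split manifest."""
    with open(path) as fp:
        manifest = json.load(fp)
    nodes = manifest.get('nodes', {})
    sources = manifest.get('sources', {})
    enabled_sources = [
        uid for uid, src in sources.items()
        if src.get('config', {}).get('enabled', True)
    ]
    return len(nodes), len(enabled_sources)
```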
'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], }, 'sources': [], @@ -1794,7 +1803,7 @@ def expected_bigquery_complex_manifest(self): 'pre-hook': [], 'quoting': {}, 'tags': [], - 'vars': config_vars + 'vars': {}, }, 'sources': [], 'depends_on': {'macros': [], 'nodes': ['seed.test.seed']}, @@ -1869,7 +1878,7 @@ def expected_bigquery_complex_manifest(self): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], }, 'sources': [], @@ -1948,7 +1957,7 @@ def expected_bigquery_complex_manifest(self): 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], }, 'sources': [], @@ -2061,6 +2070,7 @@ def expected_bigquery_complex_manifest(self): 'injected_sql': '', }, }, + 'sources': {}, 'child_map': { 'model.test.clustered': [], 'model.test.multi_clustered': [], @@ -2117,7 +2127,6 @@ def _absolute_path_to(self, searched_path: str, relative_path: str): def expected_redshift_incremental_view_manifest(self): model_sql_path = self.dir('rs_models/model.sql') my_schema_name = self.unique_schema() - config_vars = {'alternate_db': self.default_database} return { 'nodes': { @@ -2150,7 +2159,7 @@ def expected_redshift_incremental_view_manifest(self): 'pre-hook': [], 'quoting': {}, 'tags': [], - 'vars': config_vars, + 'vars': {}, }, 'schema': my_schema_name, 'database': self.default_database, @@ -2282,6 +2291,7 @@ def expected_redshift_incremental_view_manifest(self): 'injected_sql': ANY, }, }, + 'sources': {}, 'parent_map': { 'model.test.model': ['seed.test.seed'], 'seed.test.seed': [] @@ -2310,7 +2320,7 @@ def verify_manifest(self, expected_manifest): manifest = _read_json('./target/manifest.json') manifest_keys = frozenset({ - 'nodes', 'macros', 'parent_map', 'child_map', 'generated_at', + 'nodes', 'sources', 'macros', 'parent_map', 'child_map', 'generated_at', 'docs', 'metadata', 'docs', 'disabled' }) @@ -2343,8 +2353,6 @@ def expected_run_results(self, quote_schema=True, quote_model=False, if model_database is None: model_database = self.alternative_database - config_vars = {'alternate_db': model_database} - model_config = { 'database': model_database, 'enabled': True, @@ -2352,7 +2360,7 @@ def expected_run_results(self, quote_schema=True, quote_model=False, 'persist_docs': {}, 'pre-hook': [], 'post-hook': [], - 'vars': config_vars, + 'vars': {}, 'column_types': {}, 'quoting': {}, 'tags': [], @@ -2564,7 +2572,7 @@ def expected_run_results(self, quote_schema=True, quote_model=False, 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], 'severity': 'ERROR', }, @@ -2627,7 +2635,7 @@ def expected_run_results(self, quote_schema=True, quote_model=False, 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], 'severity': 'ERROR', }, @@ -2689,7 +2697,7 @@ def expected_run_results(self, quote_schema=True, quote_model=False, 'post-hook': [], 'pre-hook': [], 'quoting': {}, - 'vars': config_vars, + 'vars': {}, 'tags': [], 'severity': 'ERROR', }, @@ -2736,7 +2744,6 @@ def expected_run_results(self, quote_schema=True, quote_model=False, def expected_postgres_references_run_results(self): my_schema_name = self.unique_schema() - config_vars = {'alternate_db': self.default_database} ephemeral_compiled_sql = ( '\n\nselect first_name, count(*) as ct from ' '__dbt__CTE__ephemeral_copy\ngroup by first_name\n' @@ -2793,7 +2800,7 @@ def expected_postgres_references_run_results(self): 'persist_docs': {}, 'pre-hook': [], 'post-hook': [], - 'vars': 
config_vars, + 'vars': {}, 'column_types': {}, 'quoting': {}, 'tags': [], @@ -2873,7 +2880,7 @@ def expected_postgres_references_run_results(self): 'persist_docs': {}, 'pre-hook': [], 'post-hook': [], - 'vars': config_vars, + 'vars': {}, 'column_types': {}, 'quoting': {}, 'tags': [], @@ -3185,6 +3192,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': [self.dir('fail_macros')], } diff --git a/test/integration/030_statement_test/test_statements.py b/test/integration/030_statement_test/test_statements.py index 3cbdaef6035..a410cd86ab2 100644 --- a/test/integration/030_statement_test/test_statements.py +++ b/test/integration/030_statement_test/test_statements.py @@ -18,6 +18,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'seeds': { 'quote_columns': False, } @@ -74,6 +75,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'seeds': { 'quote_columns': False, } diff --git a/test/integration/031_thread_count_test/test_thread_count.py b/test/integration/031_thread_count_test/test_thread_count.py index 5928782f4ea..042e2cd8a94 100644 --- a/test/integration/031_thread_count_test/test_thread_count.py +++ b/test/integration/031_thread_count_test/test_thread_count.py @@ -6,7 +6,7 @@ class TestThreadCount(DBTIntegrationTest): @property def project_config(self): - return {} + return {'config-version': 2} @property def profile_config(self): diff --git a/test/integration/032_concurrent_transaction_test/test_concurrent_transaction.py b/test/integration/032_concurrent_transaction_test/test_concurrent_transaction.py index eaa7047c3a7..69d2a2af903 100644 --- a/test/integration/032_concurrent_transaction_test/test_concurrent_transaction.py +++ b/test/integration/032_concurrent_transaction_test/test_concurrent_transaction.py @@ -32,6 +32,7 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ["macros"], "on-run-start": [ "{{ create_udfs() }}", diff --git a/test/integration/033_event_tracking_test/test_events.py b/test/integration/033_event_tracking_test/test_events.py index 0489695e679..acb3312333c 100644 --- a/test/integration/033_event_tracking_test/test_events.py +++ b/test/integration/033_event_tracking_test/test_events.py @@ -185,6 +185,7 @@ def packages_config(self): @property def project_config(self): return { + 'config-version': 2, "data-paths": [self.dir("data")], "test-paths": [self.dir("test")], 'seeds': { @@ -462,6 +463,7 @@ class TestEventTrackingCompilationError(TestEventTracking): @property def project_config(self): return { + 'config-version': 2, "source-paths": [self.dir("model-compilation-error")], } @@ -565,6 +567,7 @@ class TestEventTrackingSnapshot(TestEventTracking): @property def project_config(self): return { + 'config-version': 2, "snapshot-paths": ['snapshots'] } diff --git a/test/integration/034_redshift_test/test_late_binding_view.py b/test/integration/034_redshift_test/test_late_binding_view.py index fdb0194e0aa..5f7816e00e9 100644 --- a/test/integration/034_redshift_test/test_late_binding_view.py +++ b/test/integration/034_redshift_test/test_late_binding_view.py @@ -19,6 +19,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': [self.dir('seed')], 'seeds': { 'quote_columns': False, diff --git a/test/integration/036_snowflake_view_dependency_test/test_view_binding_dependency.py 
b/test/integration/036_snowflake_view_dependency_test/test_view_binding_dependency.py index 9b0a31bc23f..61ba9baafae 100644 --- a/test/integration/036_snowflake_view_dependency_test/test_view_binding_dependency.py +++ b/test/integration/036_snowflake_view_dependency_test/test_view_binding_dependency.py @@ -13,6 +13,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'seeds': { 'quote_columns': False, diff --git a/test/integration/038_caching_test/test_caching.py b/test/integration/038_caching_test/test_caching.py index 2953b21bf4f..7ac1cf42e92 100644 --- a/test/integration/038_caching_test/test_caching.py +++ b/test/integration/038_caching_test/test_caching.py @@ -9,6 +9,7 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, 'quoting': { 'identifier': False, 'schema': False, diff --git a/test/integration/039_config_test/models/schema.yml b/test/integration/039_config_test/models/schema.yml new file mode 100644 index 00000000000..f6bc295ae6d --- /dev/null +++ b/test/integration/039_config_test/models/schema.yml @@ -0,0 +1,8 @@ +version: 2 +sources: + - name: raw + database: "{{ target.database }}" + schema: "{{ target.schema }}" + tables: + - name: 'seed' + identifier: "{{ var('seed_name', 'invalid') }}" diff --git a/test/integration/039_config_test/models/model.sql b/test/integration/039_config_test/models/tagged/model.sql similarity index 100% rename from test/integration/039_config_test/models/model.sql rename to test/integration/039_config_test/models/tagged/model.sql diff --git a/test/integration/039_config_test/models/untagged.sql b/test/integration/039_config_test/models/untagged.sql new file mode 100644 index 00000000000..9f9bc85e111 --- /dev/null +++ b/test/integration/039_config_test/models/untagged.sql @@ -0,0 +1,5 @@ +{{ + config(materialized='table') +}} + +select id, value from {{ source('raw', 'seed') }} diff --git a/test/integration/039_config_test/test_configs.py b/test/integration/039_config_test/test_configs.py index b58b9cdf5a7..1246d72be71 100644 --- a/test/integration/039_config_test/test_configs.py +++ b/test/integration/039_config_test/test_configs.py @@ -2,6 +2,7 @@ import shutil from test.integration.base import DBTIntegrationTest, use_profile +from dbt.exceptions import CompilationException class TestConfigs(DBTIntegrationTest): @@ -15,13 +16,16 @@ def unique_schema(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'models': { 'test': { - # the model configs will override this - 'materialized': 'invalid', - # the model configs will append to these - 'tags': ['tag_one'], + 'tagged': { + # the model configs will override this + 'materialized': 'invalid', + # the model configs will append to these + 'tags': ['tag_one'], + } }, }, 'seeds': { @@ -75,6 +79,7 @@ def new_dirs(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'target-path': "target_{{ modules.datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%S') }}", 'seeds': { @@ -95,18 +100,114 @@ class TestDisabledConfigs(DBTIntegrationTest): def schema(self): return "config_039" + def postgres_profile(self): + return { + 'config': { + 'send_anonymous_usage_stats': False + }, + 'test': { + 'outputs': { + 'default2': { + 'type': 'postgres', + # make sure you can do this and get an int out + 'threads': "{{ 1 + 3 }}", + 'host': self.database_host, + 'port': "{{ 5400 + 32 }}", + 'user': 'root', + 'pass': 'password', + 'dbname': 
'dbt', + 'schema': self.unique_schema() + }, + 'disabled': { + 'type': 'postgres', + # make sure you can do this and get an int out + 'threads': "{{ 1 + 3 }}", + 'host': self.database_host, + 'port': "{{ 5400 + 32 }}", + 'user': 'root', + 'pass': 'password', + 'dbname': 'dbt', + 'schema': self.unique_schema() + }, + }, + 'target': 'default2' + } + } + @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], + 'models': { + 'test': { + 'enabled': "{{ target.name == 'default2' }}", + }, + }, + # set the `var` result in schema.yml to be 'seed', so that the + # `source` call can suceed. + 'vars': { + 'test': { + 'seed_name': 'seed', + } + }, 'seeds': { 'quote_columns': False, 'test': { 'seed': { - 'enabled': False, - } + 'enabled': "{{ target.name == 'default2' }}", + }, + }, + }, + } + + @property + def models(self): + return "models" + + @use_profile('postgres') + def test_postgres_disable_seed_partial_parse(self): + self.run_dbt(['--partial-parse', 'seed', '--target', 'disabled']) + self.run_dbt(['--partial-parse', 'seed', '--target', 'disabled']) + + @use_profile('postgres') + def test_postgres_conditional_model(self): + # no seeds/models - enabled should eval to False because of the target + results = self.run_dbt(['seed', '--target', 'disabled'], strict=False) + self.assertEqual(len(results), 0) + results = self.run_dbt(['run', '--target', 'disabled'], strict=False) + self.assertEqual(len(results), 0) + + # has seeds/models - enabled should eval to True because of the target + results = self.run_dbt(['seed']) + self.assertEqual(len(results), 1) + results = self.run_dbt(['run']) + self.assertEqual(len(results), 2) + + +class TestUnusedModelConfigs(DBTIntegrationTest): + @property + def schema(self): + return "config_039" + + @property + def project_config(self): + return { + 'config-version': 2, + 'data-paths': ['data'], + 'models': { + 'test': { + 'enabled': True, } }, + 'seeds': { + 'quote_columns': False, + }, + 'sources': { + 'test': { + 'enabled': True, + } + } } @property @@ -114,6 +215,12 @@ def models(self): return "empty-models" @use_profile('postgres') - def test_postgres_disable_seed_partial_parse(self): - self.run_dbt(['--partial-parse', 'seed']) - self.run_dbt(['--partial-parse', 'seed']) + def test_postgres_warn_unused_configuration_paths(self): + with self.assertRaises(CompilationException) as exc: + self.run_dbt(['seed']) + + self.assertIn('Configuration paths exist', str(exc.exception)) + self.assertIn('- sources.test', str(exc.exception)) + self.assertIn('- models.test', str(exc.exception)) + + self.run_dbt(['seed'], strict=False) diff --git a/test/integration/040_override_database_test/test_override_database.py b/test/integration/040_override_database_test/test_override_database.py index 5e9373f7f6f..37ea59b8eb5 100644 --- a/test/integration/040_override_database_test/test_override_database.py +++ b/test/integration/040_override_database_test/test_override_database.py @@ -55,11 +55,10 @@ def snowflake_profile(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], - 'models': { - 'vars': { - 'alternate_db': self.alternative_database, - }, + 'vars': { + 'alternate_db': self.alternative_database, }, 'quoting': { 'database': True, @@ -108,17 +107,18 @@ def run_database_override(self): func = lambda x: x self.use_default_project({ + 'config-version': 2, + 'vars': { + 'alternate_db': self.alternative_database, + }, 'models': { - 'vars': { - 'alternate_db': self.alternative_database, - }, 
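The `039_config_test` changes above exercise two related config-version 2 features: `enabled` can be a Jinja expression rendered against the active target, and sources can be enabled or disabled from dbt_project.yml (with unused configuration paths raising a `CompilationException` in strict mode). A condensed sketch of such a project config, combining values taken directly from the hunks above:

```python
# Condensed sketch of a v2 project_config exercising conditional enablement.
# All keys and values below come from the test hunks in this patch.
project_config = {
    'config-version': 2,
    'data-paths': ['data'],
    'models': {
        'test': {
            # Rendered against the active target, so the enabled set can differ per target.
            'enabled': "{{ target.name == 'default2' }}",
        },
    },
    'sources': {
        'test': {
            'enabled': True,
        },
    },
    'seeds': {
        'quote_columns': False,
    },
    'vars': {
        'test': {
            'seed_name': 'seed',
        },
    },
}
```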
'database': self.alternative_database, 'test': { 'subfolder': { 'database': self.default_database, - }, - }, - } + } + } + }, }) self.run_dbt_notstrict(['seed']) @@ -148,7 +148,10 @@ def run_database_override(self): func = lambda x: x self.use_default_project({ - 'seeds': {'database': self.alternative_database} + 'config-version': 2, + 'seeds': { + 'database': self.alternative_database + }, }) self.run_dbt_notstrict(['seed']) diff --git a/test/integration/041_presto_test/test_simple_presto_view.py b/test/integration/041_presto_test/test_simple_presto_view.py index ea4412602cf..44190e6b122 100644 --- a/test/integration/041_presto_test/test_simple_presto_view.py +++ b/test/integration/041_presto_test/test_simple_presto_view.py @@ -16,6 +16,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'macro-paths': ['macros'], 'seeds': { diff --git a/test/integration/042_sources_test/test_sources.py b/test/integration/042_sources_test/test_sources.py index 7f124db516f..67be460863c 100644 --- a/test/integration/042_sources_test/test_sources.py +++ b/test/integration/042_sources_test/test_sources.py @@ -21,6 +21,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'quoting': {'database': True, 'schema': True, 'identifier': True}, 'seeds': { diff --git a/test/integration/043_custom_aliases_test/test_custom_aliases.py b/test/integration/043_custom_aliases_test/test_custom_aliases.py index f78cf8efc29..1acc9dd5224 100644 --- a/test/integration/043_custom_aliases_test/test_custom_aliases.py +++ b/test/integration/043_custom_aliases_test/test_custom_aliases.py @@ -13,6 +13,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ['macros'], } @@ -27,6 +28,7 @@ class TestAliasesWithConfig(TestAliases): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ['macros-configs'], } diff --git a/test/integration/044_run_operations_test/test_run_operations.py b/test/integration/044_run_operations_test/test_run_operations.py index 8a436a822cf..98715191e3e 100644 --- a/test/integration/044_run_operations_test/test_run_operations.py +++ b/test/integration/044_run_operations_test/test_run_operations.py @@ -14,6 +14,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, "macro-paths": ['macros'], } diff --git a/test/integration/045_test_severity_tests/test_severity.py b/test/integration/045_test_severity_tests/test_severity.py index 5e79b276cea..66f2a2dbe9e 100644 --- a/test/integration/045_test_severity_tests/test_severity.py +++ b/test/integration/045_test_severity_tests/test_severity.py @@ -12,6 +12,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'test-paths': ['tests'], 'seeds': { diff --git a/test/integration/047_dbt_ls_test/test_ls.py b/test/integration/047_dbt_ls_test/test_ls.py index da0a4bbd13b..9fbacdc5e41 100644 --- a/test/integration/047_dbt_ls_test/test_ls.py +++ b/test/integration/047_dbt_ls_test/test_ls.py @@ -22,6 +22,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'analysis-paths': [self.dir('analyses')], 'snapshot-paths': [self.dir('snapshots')], 'macro-paths': [self.dir('macros')], @@ -220,6 +221,9 @@ def expect_source_output(self): 'name': 'my_source.my_table', 'selector': 'source:test.my_source.my_table', 'json': { + 'config': { + 'enabled': True, + }, 
'package_name': 'test', 'name': 'my_table', 'source_name': 'my_source', diff --git a/test/integration/048_rpc_test/test_execute_fetch_and_serialize.py b/test/integration/048_rpc_test/test_execute_fetch_and_serialize.py index e1d21b2e1ef..bdcceeb16ec 100644 --- a/test/integration/048_rpc_test/test_execute_fetch_and_serialize.py +++ b/test/integration/048_rpc_test/test_execute_fetch_and_serialize.py @@ -16,6 +16,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['macros'], } diff --git a/test/integration/048_rpc_test/test_rpc.py b/test/integration/048_rpc_test/test_rpc.py index b7816448136..d76aaec1a90 100644 --- a/test/integration/048_rpc_test/test_rpc.py +++ b/test/integration/048_rpc_test/test_rpc.py @@ -22,7 +22,7 @@ class ServerProcess(dbt.flags.MP_CONTEXT.Process): def __init__(self, port, profiles_dir, cli_vars=None): self.port = port handle_and_check_args = [ - '--strict', 'rpc', '--log-cache-events', + 'rpc', '--log-cache-events', '--port', str(self.port), '--profiles-dir', profiles_dir ] @@ -144,6 +144,7 @@ def run_dbt_with_vars(self, cmd, *args, **kwargs): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], 'quoting': {'database': True, 'schema': True, 'identifier': True}, 'macro-paths': ['macros'], @@ -893,12 +894,14 @@ def test_test_project_cli_postgres(self): self.assertIn('results', result) self.assertHasTestResults(result['results'], 4) - def assertManifestExists(self, length): + def assertManifestExists(self, nodes_length, sources_length): self.assertTrue(os.path.exists('target/manifest.json')) with open('target/manifest.json') as fp: manifest = json.load(fp) self.assertIn('nodes', manifest) - self.assertEqual(len(manifest['nodes']), length) + self.assertEqual(len(manifest['nodes']), nodes_length) + self.assertIn('sources', manifest) + self.assertEqual(len(manifest['sources']), sources_length) def assertHasDocsGenerated(self, result, expected): dct = self.assertIsResult(result) @@ -906,7 +909,10 @@ def assertHasDocsGenerated(self, result, expected): self.assertTrue(dct['state']) self.assertIn('nodes', dct) nodes = dct['nodes'] - self.assertEqual(set(nodes), expected) + self.assertEqual(set(nodes), expected['nodes']) + self.assertIn('sources', dct) + sources = dct['sources'] + self.assertEqual(set(sources), expected['sources']) def assertCatalogExists(self): self.assertTrue(os.path.exists('target/catalog.json')) @@ -915,20 +921,24 @@ def assertCatalogExists(self): def _correct_docs_generate_result(self, result): expected = { - 'model.test.descendant_model', - 'model.test.multi_source_model', - 'model.test.nonsource_descendant', - 'seed.test.expected_multi_source', - 'seed.test.other_source_table', - 'seed.test.other_table', - 'seed.test.source', - 'source.test.other_source.test_table', - 'source.test.test_source.other_test_table', - 'source.test.test_source.test_table', + 'nodes': { + 'model.test.descendant_model', + 'model.test.multi_source_model', + 'model.test.nonsource_descendant', + 'seed.test.expected_multi_source', + 'seed.test.other_source_table', + 'seed.test.other_table', + 'seed.test.source', + }, + 'sources': { + 'source.test.other_source.test_table', + 'source.test.test_source.other_test_table', + 'source.test.test_source.test_table', + }, } self.assertHasDocsGenerated(result, expected) self.assertCatalogExists() - self.assertManifestExists(17) + self.assertManifestExists(12, 5) @use_profile('postgres') def test_docs_generate_postgres(self): diff --git 
a/test/integration/049_dbt_debug_test/test_debug.py b/test/integration/049_dbt_debug_test/test_debug.py index 4d96bb80ce7..77c4dfc0993 100644 --- a/test/integration/049_dbt_debug_test/test_debug.py +++ b/test/integration/049_dbt_debug_test/test_debug.py @@ -77,7 +77,10 @@ def test_postgres_wronguser(self): class TestDebugProfileVariable(TestDebug): @property def project_config(self): - return {'profile': '{{ "te" ~ "st" }}'} + return { + 'config-version': 2, + 'profile': '{{ "te" ~ "st" }}' + } class TestDebugInvalidProject(DBTIntegrationTest): diff --git a/test/integration/050_warehouse_test/test_warehouses.py b/test/integration/050_warehouse_test/test_warehouses.py index 3ac84333ca9..eaef4ddb8ac 100644 --- a/test/integration/050_warehouse_test/test_warehouses.py +++ b/test/integration/050_warehouse_test/test_warehouses.py @@ -36,6 +36,7 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, 'source-paths': ['project-config-models'], 'models': { 'test': { diff --git a/test/integration/051_query_comments_test/test_query_comments.py b/test/integration/051_query_comments_test/test_query_comments.py index 441112d4d13..ee1d761df32 100644 --- a/test/integration/051_query_comments_test/test_query_comments.py +++ b/test/integration/051_query_comments_test/test_query_comments.py @@ -21,7 +21,10 @@ def matches_comment(self, msg): @property def project_config(self): - return {'macro-paths': ['macros']} + return { + 'config-version': 2, + 'macro-paths': ['macros'] + } @property diff --git a/test/integration/052_column_quoting/test_column_quotes.py b/test/integration/052_column_quoting/test_column_quotes.py index b298df4bfa8..f18ea6ff828 100644 --- a/test/integration/052_column_quoting/test_column_quotes.py +++ b/test/integration/052_column_quoting/test_column_quotes.py @@ -24,7 +24,9 @@ def _run_columnn_quotes(self, strategy='delete+insert'): class TestColumnQuotingDefault(BaseColumnQuotingTest): @property def project_config(self): - return {} + return { + 'config-version': 2 + } @property def models(self): @@ -58,6 +60,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'seeds': { 'quote_columns': False, }, @@ -92,6 +95,7 @@ def models(self): @property def project_config(self): return { + 'config-version': 2, 'seeds': { 'quote_columns': True, }, diff --git a/test/integration/053_custom_materialization/override-view-adapter-dep/dbt_project.yml b/test/integration/053_custom_materialization/override-view-adapter-dep/dbt_project.yml index 2c58789b48e..248abd809ac 100644 --- a/test/integration/053_custom_materialization/override-view-adapter-dep/dbt_project.yml +++ b/test/integration/053_custom_materialization/override-view-adapter-dep/dbt_project.yml @@ -1,3 +1,4 @@ name: view_adapter_override version: '1.0' macro-paths: ['macros'] +config-version: 2 diff --git a/test/integration/053_custom_materialization/override-view-adapter-pass-dep/dbt_project.yml b/test/integration/053_custom_materialization/override-view-adapter-pass-dep/dbt_project.yml index 2c58789b48e..248abd809ac 100644 --- a/test/integration/053_custom_materialization/override-view-adapter-pass-dep/dbt_project.yml +++ b/test/integration/053_custom_materialization/override-view-adapter-pass-dep/dbt_project.yml @@ -1,3 +1,4 @@ name: view_adapter_override version: '1.0' macro-paths: ['macros'] +config-version: 2 diff --git a/test/integration/053_custom_materialization/override-view-default-dep/dbt_project.yml 
b/test/integration/053_custom_materialization/override-view-default-dep/dbt_project.yml index 9b1515079d2..f8fe48084c0 100644 --- a/test/integration/053_custom_materialization/override-view-default-dep/dbt_project.yml +++ b/test/integration/053_custom_materialization/override-view-default-dep/dbt_project.yml @@ -1,3 +1,4 @@ name: view_default_override +config-version: 2 version: '1.0' macro-paths: ['macros'] diff --git a/test/integration/053_custom_materialization/test_custom_materialization.py b/test/integration/053_custom_materialization/test_custom_materialization.py index c594e69699e..0d4e5399934 100644 --- a/test/integration/053_custom_materialization/test_custom_materialization.py +++ b/test/integration/053_custom_materialization/test_custom_materialization.py @@ -89,6 +89,7 @@ def packages_config(self): @property def project_config(self): return { + 'config-version': 2, 'macro-paths': ['override-view-adapter-macros'] } diff --git a/test/integration/055_ref_override_test/test_ref_override.py b/test/integration/055_ref_override_test/test_ref_override.py index 2d0fc3e068d..360bfa64b28 100644 --- a/test/integration/055_ref_override_test/test_ref_override.py +++ b/test/integration/055_ref_override_test/test_ref_override.py @@ -9,11 +9,12 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, 'data-paths': ['data'], "macro-paths": ["macros"], 'seeds': { - 'quote_columns': False - } + 'quote_columns': False, + }, } @property diff --git a/test/integration/058_fail_fast/test_fail_fast_run.py b/test/integration/058_fail_fast/test_fail_fast_run.py index ce99cebe740..d8aa9cd9aa0 100644 --- a/test/integration/058_fail_fast/test_fail_fast_run.py +++ b/test/integration/058_fail_fast/test_fail_fast_run.py @@ -10,23 +10,24 @@ def schema(self): @property def project_config(self): return { + 'config-version': 2, "on-run-start": "create table if not exists {{ target.schema }}.audit (model text)", 'models': { 'test': { 'pre-hook': [ - { - # we depend on non-deterministic nature of tasks execution - # there is possibility to run next task in-between - # first task failure and adapter connections cancellations - # if you encounter any problems with these tests please report - # the sleep command with random time minimize the risk - 'sql': "select pg_sleep(random())", - 'transaction': False - }, - { - 'sql': "insert into {{ target.schema }}.audit values ('{{ this }}')", - 'transaction': False - } + { + # we depend on non-deterministic nature of tasks execution + # there is possibility to run next task in-between + # first task failure and adapter connections cancellations + # if you encounter any problems with these tests please report + # the sleep command with random time minimize the risk + 'sql': "select pg_sleep(random())", + 'transaction': False + }, + { + 'sql': "insert into {{ target.schema }}.audit values ('{{ this }}')", + 'transaction': False + } ], } } diff --git a/test/integration/059_source_overrides_test/data/expected_result.csv b/test/integration/059_source_overrides_test/data/expected_result.csv new file mode 100644 index 00000000000..2d75f7658bb --- /dev/null +++ b/test/integration/059_source_overrides_test/data/expected_result.csv @@ -0,0 +1,5 @@ +letter,color +c,cyan +m,magenta +y,yellow +k,key diff --git a/test/integration/059_source_overrides_test/data/my_real_other_seed.csv b/test/integration/059_source_overrides_test/data/my_real_other_seed.csv new file mode 100644 index 00000000000..defeee5ce23 --- /dev/null +++ 
b/test/integration/059_source_overrides_test/data/my_real_other_seed.csv @@ -0,0 +1,5 @@ +id,letter +1,c +2,m +3,y +4,k diff --git a/test/integration/059_source_overrides_test/data/my_real_seed.csv b/test/integration/059_source_overrides_test/data/my_real_seed.csv new file mode 100644 index 00000000000..ff44257bd6b --- /dev/null +++ b/test/integration/059_source_overrides_test/data/my_real_seed.csv @@ -0,0 +1,6 @@ +id,color +1,cyan +2,magenta +3,yellow +4,key +5,NULL diff --git a/test/integration/059_source_overrides_test/dupe-models/schema1.yml b/test/integration/059_source_overrides_test/dupe-models/schema1.yml new file mode 100644 index 00000000000..778618d51f8 --- /dev/null +++ b/test/integration/059_source_overrides_test/dupe-models/schema1.yml @@ -0,0 +1,26 @@ +version: 2 +sources: + - name: my_source + overrides: localdep + schema: "{{ target.schema }}" + database: "{{ target.database }}" + freshness: + error_after: {count: 3, period: day} + tables: + - name: my_table + identifier: my_real_seed + # on the override, the "color" column is only unique, it can be null! + columns: + - name: id + tests: + - not_null + - unique + - name: color + tests: + - unique + - name: my_other_table + identifier: my_real_other_seed + - name: snapshot_freshness + identifier: snapshot_freshness_base + freshness: + error_after: {count: 1, period: day} diff --git a/test/integration/059_source_overrides_test/dupe-models/schema2.yml b/test/integration/059_source_overrides_test/dupe-models/schema2.yml new file mode 100644 index 00000000000..778618d51f8 --- /dev/null +++ b/test/integration/059_source_overrides_test/dupe-models/schema2.yml @@ -0,0 +1,26 @@ +version: 2 +sources: + - name: my_source + overrides: localdep + schema: "{{ target.schema }}" + database: "{{ target.database }}" + freshness: + error_after: {count: 3, period: day} + tables: + - name: my_table + identifier: my_real_seed + # on the override, the "color" column is only unique, it can be null! 
+ columns: + - name: id + tests: + - not_null + - unique + - name: color + tests: + - unique + - name: my_other_table + identifier: my_real_other_seed + - name: snapshot_freshness + identifier: snapshot_freshness_base + freshness: + error_after: {count: 1, period: day} diff --git a/test/integration/059_source_overrides_test/local_dependency/data/keep/never_fresh.csv b/test/integration/059_source_overrides_test/local_dependency/data/keep/never_fresh.csv new file mode 100644 index 00000000000..d7fd6c7b91d --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/data/keep/never_fresh.csv @@ -0,0 +1,51 @@ +favorite_color,id,first_name,email,ip_address,updated_at +blue,1,Larry,lking0@miitbeian.gov.cn,'69.135.206.194',2008-09-12 19:08:31 +blue,2,Larry,lperkins1@toplist.cz,'64.210.133.162',1978-05-09 04:15:14 +blue,3,Anna,amontgomery2@miitbeian.gov.cn,'168.104.64.114',2011-10-16 04:07:57 +blue,4,Sandra,sgeorge3@livejournal.com,'229.235.252.98',1973-07-19 10:52:43 +blue,5,Fred,fwoods4@google.cn,'78.229.170.124',2012-09-30 16:38:29 +blue,6,Stephen,shanson5@livejournal.com,'182.227.157.105',1995-11-07 21:40:50 +blue,7,William,wmartinez6@upenn.edu,'135.139.249.50',1982-09-05 03:11:59 +blue,8,Jessica,jlong7@hao123.com,'203.62.178.210',1991-10-16 11:03:15 +blue,9,Douglas,dwhite8@tamu.edu,'178.187.247.1',1979-10-01 09:49:48 +blue,10,Lisa,lcoleman9@nydailynews.com,'168.234.128.249',2011-05-26 07:45:49 +blue,11,Ralph,rfieldsa@home.pl,'55.152.163.149',1972-11-18 19:06:11 +blue,12,Louise,lnicholsb@samsung.com,'141.116.153.154',2014-11-25 20:56:14 +blue,13,Clarence,cduncanc@sfgate.com,'81.171.31.133',2011-11-17 07:02:36 +blue,14,Daniel,dfranklind@omniture.com,'8.204.211.37',1980-09-13 00:09:04 +blue,15,Katherine,klanee@auda.org.au,'176.96.134.59',1997-08-22 19:36:56 +blue,16,Billy,bwardf@wikia.com,'214.108.78.85',2003-10-19 02:14:47 +blue,17,Annie,agarzag@ocn.ne.jp,'190.108.42.70',1988-10-28 15:12:35 +blue,18,Shirley,scolemanh@fastcompany.com,'109.251.164.84',1988-08-24 10:50:57 +blue,19,Roger,rfrazieri@scribd.com,'38.145.218.108',1985-12-31 15:17:15 +blue,20,Lillian,lstanleyj@goodreads.com,'47.57.236.17',1970-06-08 02:09:05 +blue,21,Aaron,arodriguezk@nps.gov,'205.245.118.221',1985-10-11 23:07:49 +blue,22,Patrick,pparkerl@techcrunch.com,'19.8.100.182',2006-03-29 12:53:56 +blue,23,Phillip,pmorenom@intel.com,'41.38.254.103',2011-11-07 15:35:43 +blue,24,Henry,hgarcian@newsvine.com,'1.191.216.252',2008-08-28 08:30:44 +blue,25,Irene,iturnero@opera.com,'50.17.60.190',1994-04-01 07:15:02 +blue,26,Andrew,adunnp@pen.io,'123.52.253.176',2000-11-01 06:03:25 +blue,27,David,dgutierrezq@wp.com,'238.23.203.42',1988-01-25 07:29:18 +blue,28,Henry,hsanchezr@cyberchimps.com,'248.102.2.185',1983-01-01 13:36:37 +blue,29,Evelyn,epetersons@gizmodo.com,'32.80.46.119',1979-07-16 17:24:12 +blue,30,Tammy,tmitchellt@purevolume.com,'249.246.167.88',2001-04-03 10:00:23 +blue,31,Jacqueline,jlittleu@domainmarket.com,'127.181.97.47',1986-02-11 21:35:50 +blue,32,Earl,eortizv@opera.com,'166.47.248.240',1996-07-06 08:16:27 +blue,33,Juan,jgordonw@sciencedirect.com,'71.77.2.200',1987-01-31 03:46:44 +blue,34,Diane,dhowellx@nyu.edu,'140.94.133.12',1994-06-11 02:30:05 +blue,35,Randy,rkennedyy@microsoft.com,'73.255.34.196',2005-05-26 20:28:39 +blue,36,Janice,jriveraz@time.com,'22.214.227.32',1990-02-09 04:16:52 +blue,37,Laura,lperry10@diigo.com,'159.148.145.73',2015-03-17 05:59:25 +blue,38,Gary,gray11@statcounter.com,'40.193.124.56',1970-01-27 10:04:51 +blue,39,Jesse,jmcdonald12@typepad.com,'31.7.86.103',2009-03-14 08:14:29 
+blue,40,Sandra,sgonzalez13@goodreads.com,'223.80.168.239',1993-05-21 14:08:54 +blue,41,Scott,smoore14@archive.org,'38.238.46.83',1980-08-30 11:16:56 +blue,42,Phillip,pevans15@cisco.com,'158.234.59.34',2011-12-15 23:26:31 +blue,43,Steven,sriley16@google.ca,'90.247.57.68',2011-10-29 19:03:28 +blue,44,Deborah,dbrown17@hexun.com,'179.125.143.240',1995-04-10 14:36:07 +blue,45,Lori,lross18@ow.ly,'64.80.162.180',1980-12-27 16:49:15 +blue,46,Sean,sjackson19@tumblr.com,'240.116.183.69',1988-06-12 21:24:45 +blue,47,Terry,tbarnes1a@163.com,'118.38.213.137',1997-09-22 16:43:19 +blue,48,Dorothy,dross1b@ebay.com,'116.81.76.49',2005-02-28 13:33:24 +blue,49,Samuel,swashington1c@house.gov,'38.191.253.40',1989-01-19 21:15:48 +blue,50,Ralph,rcarter1d@tinyurl.com,'104.84.60.174',2007-08-11 10:21:49 diff --git a/test/integration/059_source_overrides_test/local_dependency/data/keep/snapshot_freshness_base.csv b/test/integration/059_source_overrides_test/local_dependency/data/keep/snapshot_freshness_base.csv new file mode 100644 index 00000000000..a8f87412ef5 --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/data/keep/snapshot_freshness_base.csv @@ -0,0 +1,101 @@ +favorite_color,id,first_name,email,ip_address,updated_at +blue,1,Larry,lking0@miitbeian.gov.cn,'69.135.206.194',2008-09-12 19:08:31 +blue,2,Larry,lperkins1@toplist.cz,'64.210.133.162',1978-05-09 04:15:14 +blue,3,Anna,amontgomery2@miitbeian.gov.cn,'168.104.64.114',2011-10-16 04:07:57 +blue,4,Sandra,sgeorge3@livejournal.com,'229.235.252.98',1973-07-19 10:52:43 +blue,5,Fred,fwoods4@google.cn,'78.229.170.124',2012-09-30 16:38:29 +blue,6,Stephen,shanson5@livejournal.com,'182.227.157.105',1995-11-07 21:40:50 +blue,7,William,wmartinez6@upenn.edu,'135.139.249.50',1982-09-05 03:11:59 +blue,8,Jessica,jlong7@hao123.com,'203.62.178.210',1991-10-16 11:03:15 +blue,9,Douglas,dwhite8@tamu.edu,'178.187.247.1',1979-10-01 09:49:48 +blue,10,Lisa,lcoleman9@nydailynews.com,'168.234.128.249',2011-05-26 07:45:49 +blue,11,Ralph,rfieldsa@home.pl,'55.152.163.149',1972-11-18 19:06:11 +blue,12,Louise,lnicholsb@samsung.com,'141.116.153.154',2014-11-25 20:56:14 +blue,13,Clarence,cduncanc@sfgate.com,'81.171.31.133',2011-11-17 07:02:36 +blue,14,Daniel,dfranklind@omniture.com,'8.204.211.37',1980-09-13 00:09:04 +blue,15,Katherine,klanee@auda.org.au,'176.96.134.59',1997-08-22 19:36:56 +blue,16,Billy,bwardf@wikia.com,'214.108.78.85',2003-10-19 02:14:47 +blue,17,Annie,agarzag@ocn.ne.jp,'190.108.42.70',1988-10-28 15:12:35 +blue,18,Shirley,scolemanh@fastcompany.com,'109.251.164.84',1988-08-24 10:50:57 +blue,19,Roger,rfrazieri@scribd.com,'38.145.218.108',1985-12-31 15:17:15 +blue,20,Lillian,lstanleyj@goodreads.com,'47.57.236.17',1970-06-08 02:09:05 +blue,21,Aaron,arodriguezk@nps.gov,'205.245.118.221',1985-10-11 23:07:49 +blue,22,Patrick,pparkerl@techcrunch.com,'19.8.100.182',2006-03-29 12:53:56 +blue,23,Phillip,pmorenom@intel.com,'41.38.254.103',2011-11-07 15:35:43 +blue,24,Henry,hgarcian@newsvine.com,'1.191.216.252',2008-08-28 08:30:44 +blue,25,Irene,iturnero@opera.com,'50.17.60.190',1994-04-01 07:15:02 +blue,26,Andrew,adunnp@pen.io,'123.52.253.176',2000-11-01 06:03:25 +blue,27,David,dgutierrezq@wp.com,'238.23.203.42',1988-01-25 07:29:18 +blue,28,Henry,hsanchezr@cyberchimps.com,'248.102.2.185',1983-01-01 13:36:37 +blue,29,Evelyn,epetersons@gizmodo.com,'32.80.46.119',1979-07-16 17:24:12 +blue,30,Tammy,tmitchellt@purevolume.com,'249.246.167.88',2001-04-03 10:00:23 +blue,31,Jacqueline,jlittleu@domainmarket.com,'127.181.97.47',1986-02-11 21:35:50 
+blue,32,Earl,eortizv@opera.com,'166.47.248.240',1996-07-06 08:16:27 +blue,33,Juan,jgordonw@sciencedirect.com,'71.77.2.200',1987-01-31 03:46:44 +blue,34,Diane,dhowellx@nyu.edu,'140.94.133.12',1994-06-11 02:30:05 +blue,35,Randy,rkennedyy@microsoft.com,'73.255.34.196',2005-05-26 20:28:39 +blue,36,Janice,jriveraz@time.com,'22.214.227.32',1990-02-09 04:16:52 +blue,37,Laura,lperry10@diigo.com,'159.148.145.73',2015-03-17 05:59:25 +blue,38,Gary,gray11@statcounter.com,'40.193.124.56',1970-01-27 10:04:51 +blue,39,Jesse,jmcdonald12@typepad.com,'31.7.86.103',2009-03-14 08:14:29 +blue,40,Sandra,sgonzalez13@goodreads.com,'223.80.168.239',1993-05-21 14:08:54 +blue,41,Scott,smoore14@archive.org,'38.238.46.83',1980-08-30 11:16:56 +blue,42,Phillip,pevans15@cisco.com,'158.234.59.34',2011-12-15 23:26:31 +blue,43,Steven,sriley16@google.ca,'90.247.57.68',2011-10-29 19:03:28 +blue,44,Deborah,dbrown17@hexun.com,'179.125.143.240',1995-04-10 14:36:07 +blue,45,Lori,lross18@ow.ly,'64.80.162.180',1980-12-27 16:49:15 +blue,46,Sean,sjackson19@tumblr.com,'240.116.183.69',1988-06-12 21:24:45 +blue,47,Terry,tbarnes1a@163.com,'118.38.213.137',1997-09-22 16:43:19 +blue,48,Dorothy,dross1b@ebay.com,'116.81.76.49',2005-02-28 13:33:24 +blue,49,Samuel,swashington1c@house.gov,'38.191.253.40',1989-01-19 21:15:48 +blue,50,Ralph,rcarter1d@tinyurl.com,'104.84.60.174',2007-08-11 10:21:49 +green,51,Wayne,whudson1e@princeton.edu,'90.61.24.102',1983-07-03 16:58:12 +green,52,Rose,rjames1f@plala.or.jp,'240.83.81.10',1995-06-08 11:46:23 +green,53,Louise,lcox1g@theglobeandmail.com,'105.11.82.145',2016-09-19 14:45:51 +green,54,Kenneth,kjohnson1h@independent.co.uk,'139.5.45.94',1976-08-17 11:26:19 +green,55,Donna,dbrown1i@amazon.co.uk,'19.45.169.45',2006-05-27 16:51:40 +green,56,Johnny,jvasquez1j@trellian.com,'118.202.238.23',1975-11-17 08:42:32 +green,57,Patrick,pramirez1k@tamu.edu,'231.25.153.198',1997-08-06 11:51:09 +green,58,Helen,hlarson1l@prweb.com,'8.40.21.39',1993-08-04 19:53:40 +green,59,Patricia,pspencer1m@gmpg.org,'212.198.40.15',1977-08-03 16:37:27 +green,60,Joseph,jspencer1n@marriott.com,'13.15.63.238',2005-07-23 20:22:06 +green,61,Phillip,pschmidt1o@blogtalkradio.com,'177.98.201.190',1976-05-19 21:47:44 +green,62,Joan,jwebb1p@google.ru,'105.229.170.71',1972-09-07 17:53:47 +green,63,Phyllis,pkennedy1q@imgur.com,'35.145.8.244',2000-01-01 22:33:37 +green,64,Katherine,khunter1r@smh.com.au,'248.168.205.32',1991-01-09 06:40:24 +green,65,Laura,lvasquez1s@wiley.com,'128.129.115.152',1997-10-23 12:04:56 +green,66,Juan,jdunn1t@state.gov,'44.228.124.51',2004-11-10 05:07:35 +green,67,Judith,jholmes1u@wiley.com,'40.227.179.115',1977-08-02 17:01:45 +green,68,Beverly,bbaker1v@wufoo.com,'208.34.84.59',2016-03-06 20:07:23 +green,69,Lawrence,lcarr1w@flickr.com,'59.158.212.223',1988-09-13 06:07:21 +green,70,Gloria,gwilliams1x@mtv.com,'245.231.88.33',1995-03-18 22:32:46 +green,71,Steven,ssims1y@cbslocal.com,'104.50.58.255',2001-08-05 21:26:20 +green,72,Betty,bmills1z@arstechnica.com,'103.177.214.220',1981-12-14 21:26:54 +green,73,Mildred,mfuller20@prnewswire.com,'151.158.8.130',2000-04-19 10:13:55 +green,74,Donald,dday21@icq.com,'9.178.102.255',1972-12-03 00:58:24 +green,75,Eric,ethomas22@addtoany.com,'85.2.241.227',1992-11-01 05:59:30 +green,76,Joyce,jarmstrong23@sitemeter.com,'169.224.20.36',1985-10-24 06:50:01 +green,77,Maria,mmartinez24@amazonaws.com,'143.189.167.135',2005-10-05 05:17:42 +green,78,Harry,hburton25@youtube.com,'156.47.176.237',1978-03-26 05:53:33 +green,79,Kevin,klawrence26@hao123.com,'79.136.183.83',1994-10-12 04:38:52 
+green,80,David,dhall27@prweb.com,'133.149.172.153',1976-12-15 16:24:24 +green,81,Kathy,kperry28@twitter.com,'229.242.72.228',1979-03-04 02:58:56 +green,82,Adam,aprice29@elegantthemes.com,'13.145.21.10',1982-11-07 11:46:59 +green,83,Brandon,bgriffin2a@va.gov,'73.249.128.212',2013-10-30 05:30:36 +green,84,Henry,hnguyen2b@discovery.com,'211.36.214.242',1985-01-09 06:37:27 +green,85,Eric,esanchez2c@edublogs.org,'191.166.188.251',2004-05-01 23:21:42 +green,86,Jason,jlee2d@jimdo.com,'193.92.16.182',1973-01-08 09:05:39 +green,87,Diana,drichards2e@istockphoto.com,'19.130.175.245',1994-10-05 22:50:49 +green,88,Andrea,awelch2f@abc.net.au,'94.155.233.96',2002-04-26 08:41:44 +green,89,Louis,lwagner2g@miitbeian.gov.cn,'26.217.34.111',2003-08-25 07:56:39 +green,90,Jane,jsims2h@seesaa.net,'43.4.220.135',1987-03-20 20:39:04 +green,91,Larry,lgrant2i@si.edu,'97.126.79.34',2000-09-07 20:26:19 +green,92,Louis,ldean2j@prnewswire.com,'37.148.40.127',2011-09-16 20:12:14 +green,93,Jennifer,jcampbell2k@xing.com,'38.106.254.142',1988-07-15 05:06:49 +green,94,Wayne,wcunningham2l@google.com.hk,'223.28.26.187',2009-12-15 06:16:54 +green,95,Lori,lstevens2m@icq.com,'181.250.181.58',1984-10-28 03:29:19 +green,96,Judy,jsimpson2n@marriott.com,'180.121.239.219',1986-02-07 15:18:10 +green,97,Phillip,phoward2o@usa.gov,'255.247.0.175',2002-12-26 08:44:45 +green,98,Gloria,gwalker2p@usa.gov,'156.140.7.128',1997-10-04 07:58:58 +green,99,Paul,pjohnson2q@umn.edu,'183.59.198.197',1991-11-14 12:33:55 +green,100,Frank,fgreene2r@blogspot.com,'150.143.68.121',2010-06-12 23:55:39 diff --git a/test/integration/059_source_overrides_test/local_dependency/data/my_other_seed.csv b/test/integration/059_source_overrides_test/local_dependency/data/my_other_seed.csv new file mode 100644 index 00000000000..ec44ccd4238 --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/data/my_other_seed.csv @@ -0,0 +1,4 @@ +id,letter +1,r +2,g +3,b diff --git a/test/integration/059_source_overrides_test/local_dependency/data/my_seed.csv b/test/integration/059_source_overrides_test/local_dependency/data/my_seed.csv new file mode 100644 index 00000000000..37493c909b8 --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/data/my_seed.csv @@ -0,0 +1,4 @@ +id,color +1,red +2,green +3,blue diff --git a/test/integration/059_source_overrides_test/local_dependency/dbt_project.yml b/test/integration/059_source_overrides_test/local_dependency/dbt_project.yml new file mode 100644 index 00000000000..2f57d5bc31d --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/dbt_project.yml @@ -0,0 +1,11 @@ +config-version: 2 +name: localdep + +version: '1.0' + +profile: 'default' + +seeds: + quote_columns: False + +data-paths: ['data'] diff --git a/test/integration/059_source_overrides_test/local_dependency/models/my_model.sql b/test/integration/059_source_overrides_test/local_dependency/models/my_model.sql new file mode 100644 index 00000000000..5be6422876f --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/models/my_model.sql @@ -0,0 +1,8 @@ +{{ config(materialized='table') }} +with colors as ( + select id, color from {{ source('my_source', 'my_table') }} +), +letters as ( + select id, letter from {{ source('my_source', 'my_other_table') }} +) +select letter, color from colors join letters using (id) diff --git a/test/integration/059_source_overrides_test/local_dependency/models/schema.yml b/test/integration/059_source_overrides_test/local_dependency/models/schema.yml new 
file mode 100644 index 00000000000..d4f8bef6b73 --- /dev/null +++ b/test/integration/059_source_overrides_test/local_dependency/models/schema.yml @@ -0,0 +1,43 @@ +version: 2 +sources: + - name: my_source + schema: invalid_schema + database: invalid_database + freshness: + error_after: {count: 3, period: hour} + tables: + - name: my_table + identifier: my_seed + columns: + - name: id + tests: + - unique + - not_null + - name: color + tests: + - unique + - not_null + - name: my_other_table + identifier: my_other_seed + columns: + - name: id + tests: + - unique + - not_null + - name: letter + tests: + - unique + - not_null + - name: snapshot_freshness + identifier: snapshot_freshness_base + loaded_at_field: updated_at + freshness: + error_after: {count: 1, period: hour} + - name: my_other_source + schema: "{{ target.schema }}" + database: "{{ target.database }}" + freshness: + error_after: {count: 1, period: day} + tables: + - name: never_fresh + loaded_at_field: updated_at diff --git a/test/integration/059_source_overrides_test/models/schema.yml b/test/integration/059_source_overrides_test/models/schema.yml new file mode 100644 index 00000000000..778618d51f8 --- /dev/null +++ b/test/integration/059_source_overrides_test/models/schema.yml @@ -0,0 +1,26 @@ +version: 2 +sources: + - name: my_source + overrides: localdep + schema: "{{ target.schema }}" + database: "{{ target.database }}" + freshness: + error_after: {count: 3, period: day} + tables: + - name: my_table + identifier: my_real_seed + # on the override, the "color" column is only unique, it can be null! + columns: + - name: id + tests: + - not_null + - unique + - name: color + tests: + - unique + - name: my_other_table + identifier: my_real_other_seed + - name: snapshot_freshness + identifier: snapshot_freshness_base + freshness: + error_after: {count: 1, period: day} diff --git a/test/integration/059_source_overrides_test/test_source_overrides.py b/test/integration/059_source_overrides_test/test_source_overrides.py new file mode 100644 index 00000000000..25a3563e9a2 --- /dev/null +++ b/test/integration/059_source_overrides_test/test_source_overrides.py @@ -0,0 +1,191 @@ +import os +from datetime import datetime, timedelta +from test.integration.base import DBTIntegrationTest, use_profile +from dbt.exceptions import CompilationException + + +class TestSourceOverrides(DBTIntegrationTest): + def setUp(self): + super().setUp() + self._id = 101 + + @property + def schema(self): + return "source_overrides_059" + + @property + def models(self): + return 'models' + + @property + def packages_config(self): + return { + 'packages': [ + {'local': 'local_dependency'}, + ], + } + + @property + def project_config(self): + return { + 'config-version': 2, + 'seeds': { + 'localdep': { + 'enabled': False, + 'keep': { + 'enabled': True, + } + }, + 'quote_columns': False, + }, + 'sources': { + 'localdep': { + 'my_other_source': { + 'enabled': False, + } + } + } + } + + def _set_updated_at_to(self, delta): + insert_time = datetime.utcnow() + delta + timestr = insert_time.strftime("%Y-%m-%d %H:%M:%S") + # favorite_color,id,first_name,email,ip_address,updated_at + insert_id = self._id + self._id += 1 + raw_sql = """INSERT INTO {schema}.{source} + ({quoted_columns}) + VALUES ( + 'blue',{id},'Jake','abc@example.com','192.168.1.1','{time}' + )""" + quoted_columns = ','.join( + self.adapter.quote(c) if self.adapter_type != 'bigquery' else c + for c in + ('favorite_color', 'id', 'first_name', 'email', 'ip_address', 'updated_at') + ) + self.run_sql( + raw_sql, 
+ kwargs={ + 'schema': self.unique_schema(), + 'time': timestr, + 'id': insert_id, + 'source': self.adapter.quote('snapshot_freshness_base'), + 'quoted_columns': quoted_columns, + } + ) + + @use_profile('postgres') + def test_postgres_source_overrides(self): + # without running 'deps', our source overrides are invalid + _, stdout = self.run_dbt_and_capture(['compile'], strict=False) + self.assertIn('WARNING: During parsing, dbt encountered source overrides that had no target', stdout) + schema_path = os.path.join('models', 'schema.yml') + self.assertIn(f'Source localdep.my_source (in {schema_path})', stdout) + self.run_dbt(['deps']) + seed_results = self.run_dbt(['seed']) + assert len(seed_results) == 5 + + # There should be 7, as we disabled 1 test of the original 8 + test_results = self.run_dbt(['test']) + assert len(test_results) == 7 + + results = self.run_dbt(['run']) + assert len(results) == 1 + + self.assertTablesEqual('expected_result', 'my_model') + + # set the updated_at field of this seed to last week + self._set_updated_at_to(timedelta(days=-7)) + # if snapshot-freshness fails, freshness just didn't happen! + results = self.run_dbt( + ['source', 'snapshot-freshness'], expect_pass=False + ) + # we disabled my_other_source, so we only run the one freshness check + # in + self.assertEqual(len(results), 1) + # If snapshot-freshness passes, that means error_after was + # applied from the source override but not the source table override + self._set_updated_at_to(timedelta(days=-2)) + results = self.run_dbt( + ['source', 'snapshot-freshness'], expect_pass=False, + ) + self.assertEqual(len(results), 1) + + self._set_updated_at_to(timedelta(hours=-12)) + results = self.run_dbt( + ['source', 'snapshot-freshness'], expect_pass=True + ) + self.assertEqual(len(results), 1) + + self.use_default_project({ + 'sources': { + 'localdep': { + 'my_other_source': { + 'enabled': True, + } + } + } + }) + # enable my_other_source, snapshot freshness should fail due to the new + # not-fresh source + results = self.run_dbt( + ['source', 'snapshot-freshness'], expect_pass=False + ) + self.assertEqual(len(results), 2) + + +class TestSourceDuplicateOverrides(DBTIntegrationTest): + def setUp(self): + super().setUp() + self._id = 101 + + @property + def schema(self): + return "source_overrides_059" + + @property + def models(self): + return 'dupe-models' + + @property + def packages_config(self): + return { + 'packages': [ + {'local': 'local_dependency'}, + ], + } + + @property + def project_config(self): + return { + 'config-version': 2, + 'seeds': { + 'localdep': { + 'enabled': False, + 'keep': { + 'enabled': True, + } + }, + 'quote_columns': False, + }, + 'sources': { + 'localdep': { + 'my_other_source': { + 'enabled': False, + } + } + } + } + + @use_profile('postgres') + def test_postgres_source_duplicate_overrides(self): + self.run_dbt(['deps']) + with self.assertRaises(CompilationException) as exc: + self.run_dbt(['compile']) + + self.assertIn('dbt found two schema.yml entries for the same source named', str(exc.exception)) + self.assertIn('one of these files', str(exc.exception)) + schema1_path = os.path.join('dupe-models', 'schema1.yml') + schema2_path = os.path.join('dupe-models', 'schema2.yml') + self.assertIn(schema1_path, str(exc.exception)) + self.assertIn(schema2_path, str(exc.exception)) diff --git a/test/integration/base.py b/test/integration/base.py index f1e3c56a38f..d8dff6aeaea 100644 --- a/test/integration/base.py +++ b/test/integration/base.py @@ -70,6 +70,7 @@ def __init__(self, 
kwargs): self.which = 'run' self.single_threaded = False self.profiles_dir = None + self.project_dir = None self.__dict__.update(kwargs) @@ -523,7 +524,9 @@ def _drop_schemas(self): @property def project_config(self): - return {} + return { + 'config-version': 2, + } @property def profile_config(self): diff --git a/test/rpc/test_base.py b/test/rpc/test_base.py deleted file mode 100644 index e94f86d49a6..00000000000 --- a/test/rpc/test_base.py +++ /dev/null @@ -1,916 +0,0 @@ -# flake8: disable=redefined-outer-name -from datetime import datetime, timedelta -import time -import yaml -from .util import ( - ProjectDefinition, get_querier, -) - - -def test_rpc_basics( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'} - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - querier.async_wait_for_result(querier.run_sql('select 1 as id')) - - querier.async_wait_for_result(querier.run()) - - querier.async_wait_for_result( - querier.run_sql('select * from {{ ref("my_model") }}') - ) - - querier.async_wait_for_error( - querier.run_sql('select * from {{ reff("my_model") }}') - ) - - -def deps_with_packages(packages, bad_packages, project_dir, profiles_dir, schema): - project = ProjectDefinition( - models={ - 'my_model.sql': 'select 1 as id', - }, - packages={'packages': packages}, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_dir, - profiles_dir=profiles_dir, - schema=schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - # we should be able to run sql queries at startup - querier.async_wait_for_result(querier.run_sql('select 1 as id')) - - # the status should be something positive - querier.is_result(querier.status()) - - # deps should pass - querier.async_wait_for_result(querier.deps()) - - # queries should work after deps - tok1 = querier.is_async_result(querier.run()) - tok2 = querier.is_async_result(querier.run_sql('select 1 as id')) - - querier.is_result(querier.async_wait(tok2)) - querier.is_result(querier.async_wait(tok1)) - - # now break the project - project.packages['packages'] = bad_packages - project.write_packages(project_dir, remove=True) - - # queries should still work because we haven't reloaded - tok1 = querier.is_async_result(querier.run()) - tok2 = querier.is_async_result(querier.run_sql('select 1 as id')) - - querier.is_result(querier.async_wait(tok2)) - querier.is_result(querier.async_wait(tok1)) - - # now run deps again, it should be sad - querier.async_wait_for_error(querier.deps()) - # it should also not be running. 
- result = querier.is_result(querier.ps(active=True, completed=False)) - assert result['rows'] == [] - - # fix packages again - project.packages['packages'] = packages - project.write_packages(project_dir, remove=True) - # keep queries broken, we haven't run deps yet - querier.is_error(querier.run()) - - # deps should pass now - querier.async_wait_for_result(querier.deps()) - querier.is_result(querier.status()) - - tok1 = querier.is_async_result(querier.run()) - tok2 = querier.is_async_result(querier.run_sql('select 1 as id')) - - querier.is_result(querier.async_wait(tok2)) - querier.is_result(querier.async_wait(tok1)) - - -def test_rpc_deps_packages(project_root, profiles_root, postgres_profile, unique_schema): - packages = [{ - 'package': 'fishtown-analytics/dbt_utils', - 'version': '0.2.1', - }] - bad_packages = [{ - 'package': 'fishtown-analytics/dbt_util', - 'version': '0.2.1', - }] - deps_with_packages(packages, bad_packages, project_root, profiles_root, unique_schema) - - -def test_rpc_deps_git(project_root, profiles_root, postgres_profile, unique_schema): - packages = [{ - 'git': 'https://github.com/fishtown-analytics/dbt-utils.git', - 'revision': '0.2.1' - }] - # if you use a bad URL, git thinks it's a private repo and prompts for auth - bad_packages = [{ - 'git': 'https://github.com/fishtown-analytics/dbt-utils.git', - 'revision': 'not-a-real-revision' - }] - deps_with_packages(packages, bad_packages, project_root, profiles_root, unique_schema) - - -bad_schema_yml = ''' -version: 2 -sources: - - name: test_source - loader: custom - schema: "{{ var('test_run_schema') }}" - tables: - - name: test_table - identifier: source - tests: - - relationships: - # this is invalid - - column_name: favorite_color - - to: ref('descendant_model') - - field: favorite_color -''' - -fixed_schema_yml = ''' -version: 2 -sources: - - name: test_source - loader: custom - schema: "{{ var('test_run_schema') }}" - tables: - - name: test_table - identifier: source -''' - - -def test_rpc_status_error(project_root, profiles_root, postgres_profile, unique_schema): - project = ProjectDefinition( - models={ - 'descendant_model.sql': 'select * from {{ source("test_source", "test_table") }}', - 'schema.yml': bad_schema_yml, - } - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - criteria='error', - ) - with querier_ctx as querier: - - # the status should be an error result - result = querier.is_result(querier.status()) - assert 'error' in result - assert 'message' in result['error'] - assert 'Invalid test config' in result['error']['message'] - assert 'state' in result - assert result['state'] == 'error' - assert 'logs' in result - logs = result['logs'] - assert len(logs) > 0 - for key in ('message', 'timestamp', 'levelname', 'level'): - assert key in logs[0] - assert 'pid' in result - assert querier.server.pid == result['pid'] - - error = querier.is_error(querier.compile_sql('select 1 as id')) - assert 'code' in error - assert error['code'] == 10011 - assert 'message' in error - assert error['message'] == 'RPC server failed to compile project, call the "status" method for compile status' - assert 'data' in error - assert 'message' in error['data'] - assert 'Invalid test config' in error['data']['message'] - - # deps should fail because it still can't parse the manifest - querier.async_wait_for_error(querier.deps()) - - # and not resolve the issue - result = querier.is_result(querier.status()) - assert 
'error' in result - assert 'message' in result['error'] - assert 'Invalid test config' in result['error']['message'] - - error = querier.is_error(querier.compile_sql('select 1 as id')) - assert 'code' in error - assert error['code'] == 10011 - - project.models['schema.yml'] = fixed_schema_yml - project.write_models(project_root, remove=True) - - # deps should work - querier.async_wait_for_result(querier.deps()) - - result = querier.is_result(querier.status()) - assert result.get('error') is None - assert 'state' in result - assert result['state'] == 'ready' - - querier.is_result(querier.compile_sql('select 1 as id')) - - -def test_gc_change_interval(project_root, profiles_root, postgres_profile, unique_schema): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'} - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - - for _ in range(10): - querier.async_wait_for_result(querier.run()) - - result = querier.is_result(querier.ps(True, True)) - assert len(result['rows']) == 10 - - result = querier.is_result(querier.gc(settings=dict(maxsize=1000, reapsize=5, auto_reap_age=0.1))) - - for k in ('deleted', 'missing', 'running'): - assert k in result - assert len(result[k]) == 0 - - time.sleep(0.5) - - result = querier.is_result(querier.ps(True, True)) - assert len(result['rows']) == 0 - - result = querier.is_result(querier.gc(settings=dict(maxsize=2, reapsize=5, auto_reap_age=100000))) - for k in ('deleted', 'missing', 'running'): - assert k in result - assert len(result[k]) == 0 - - time.sleep(0.5) - - for _ in range(10): - querier.async_wait_for_result(querier.run()) - - time.sleep(0.5) - result = querier.is_result(querier.ps(True, True)) - assert len(result['rows']) == 2 - - -def test_ps_poll_output_match(project_root, profiles_root, postgres_profile, unique_schema): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'} - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - - poll_result = querier.async_wait_for_result(querier.run()) - - result = querier.is_result(querier.ps(active=True, completed=True)) - assert 'rows' in result - rows = result['rows'] - assert len(rows) == 1 - ps_result = rows[0] - - for key in ('start', 'end', 'elapsed', 'state'): - assert ps_result[key] == poll_result[key] - - -macros_data = ''' -{% macro foo() %} - {{ return(1) }} -{% endmacro %} -{% macro bar(value) %} - {{ return(value + 1) }} -{% endmacro %} -{% macro quux(value) %} - {{ return(asdf) }} -{% endmacro %} -''' - - -def test_run_operation( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'}, - macros={ - 'my_macros.sql': macros_data, - } - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - poll_result = querier.async_wait_for_result( - querier.run_operation(macro='foo', args={}) - ) - - assert 'success' in poll_result - assert poll_result['success'] is True - - poll_result = querier.async_wait_for_result( - querier.run_operation(macro='bar', args={'value': 10}) - ) - - assert 'success' in poll_result - assert poll_result['success'] is True - - poll_result = 
querier.async_wait_for_result( - querier.run_operation(macro='baz', args={}), - state='failed', - ) - assert 'state' in poll_result - assert poll_result['state'] == 'failed' - - poll_result = querier.async_wait_for_result( - querier.run_operation(macro='quux', args={}) - ) - assert 'success' in poll_result - assert poll_result['success'] is True - - -def test_run_operation_cli( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'}, - macros={ - 'my_macros.sql': macros_data, - } - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - poll_result = querier.async_wait_for_result( - querier.cli_args(cli='run-operation foo') - ) - - assert 'success' in poll_result - assert poll_result['success'] is True - - bar_cmd = '''run-operation bar --args="{'value': 10}"''' - poll_result = querier.async_wait_for_result( - querier.cli_args(cli=bar_cmd) - ) - - assert 'success' in poll_result - assert poll_result['success'] is True - - poll_result = querier.async_wait_for_result( - querier.cli_args(cli='run-operation baz'), - state='failed', - ) - assert 'state' in poll_result - assert poll_result['state'] == 'failed' - - poll_result = querier.async_wait_for_result( - querier.cli_args(cli='run-operation quux') - ) - assert 'success' in poll_result - assert poll_result['success'] is True - - -snapshot_data = ''' -{% snapshot snapshot_actual %} - - {{ - config( - target_database=database, - target_schema=schema, - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) - }} - select 1 as id, '2019-10-31 23:59:40' as updated_at - -{% endsnapshot %} -''' - - -def test_snapshots( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - snapshots={'my_snapshots.sql': snapshot_data}, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.snapshot()) - assert len(results['results']) == 1 - - results = querier.async_wait_for_result(querier.snapshot( - exclude=['snapshot_actual']) - ) - - results = querier.async_wait_for_result( - querier.snapshot(select=['snapshot_actual']) - ) - assert len(results['results']) == 1 - - -def test_snapshots_cli( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - snapshots={'my_snapshots.sql': snapshot_data}, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result( - querier.cli_args(cli='snapshot') - ) - assert len(results['results']) == 1 - - results = querier.async_wait_for_result( - querier.cli_args(cli='snapshot --exclude=snapshot_actual') - ) - assert len(results['results']) == 0 - - results = querier.async_wait_for_result( - querier.cli_args(cli='snapshot --select=snapshot_actual') - ) - assert len(results['results']) == 1 - - -def assert_has_threads(results, num_threads): - assert 'logs' in results - c_logs = [l for l in results['logs'] if 'Concurrency' in l['message']] - assert len(c_logs) == 1, \ - f'Got invalid number of concurrency logs ({len(c_logs)})' - assert 'message' in c_logs[0] 
- assert f'Concurrency: {num_threads} threads' in c_logs[0]['message'] - - -def test_rpc_run_threads( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'} - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.run(threads=5)) - assert_has_threads(results, 5) - - results = querier.async_wait_for_result( - querier.cli_args('run --threads=7') - ) - assert_has_threads(results, 7) - - -def test_rpc_compile_threads( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'} - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.compile(threads=5)) - assert_has_threads(results, 5) - - results = querier.async_wait_for_result( - querier.cli_args('compile --threads=7') - ) - assert_has_threads(results, 7) - - -def test_rpc_test_threads( - project_root, profiles_root, postgres_profile, unique_schema -): - schema_yaml = { - 'version': 2, - 'models': [{ - 'name': 'my_model', - 'columns': [ - { - 'name': 'id', - 'tests': ['not_null', 'unique'], - }, - ], - }], - } - project = ProjectDefinition( - models={ - 'my_model.sql': 'select 1 as id', - 'schema.yml': yaml.safe_dump(schema_yaml)} - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - with querier_ctx as querier: - # first run dbt to get the model built - querier.async_wait_for_result(querier.run()) - - results = querier.async_wait_for_result(querier.test(threads=5)) - assert_has_threads(results, 5) - - results = querier.async_wait_for_result( - querier.cli_args('test --threads=7') - ) - assert_has_threads(results, 7) - - -def test_rpc_snapshot_threads( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - snapshots={'my_snapshots.sql': snapshot_data}, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.snapshot(threads=5)) - assert_has_threads(results, 5) - - results = querier.async_wait_for_result( - querier.cli_args('snapshot --threads=7') - ) - assert_has_threads(results, 7) - - -def test_rpc_seed_threads( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - project_data={'seeds': {'quote_columns': False}}, - seeds={'data.csv': 'a,b\n1,hello\n2,goodbye'}, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.seed(threads=5)) - assert_has_threads(results, 5) - - results = querier.async_wait_for_result( - querier.cli_args('seed --threads=7') - ) - assert_has_threads(results, 7) - - -def test_rpc_seed_include_exclude( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - project_data={'seeds': 
{'quote_columns': False}}, - seeds={ - 'data_1.csv': 'a,b\n1,hello\n2,goodbye', - 'data_2.csv': 'a,b\n1,data', - }, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.seed(select=['data_1'])) - assert len(results['results']) == 1 - results = querier.async_wait_for_result(querier.seed(select='data_1')) - assert len(results['results']) == 1 - results = querier.async_wait_for_result(querier.cli_args('seed --select=data_1')) - assert len(results['results']) == 1 - - results = querier.async_wait_for_result(querier.seed(exclude=['data_2'])) - assert len(results['results']) == 1 - results = querier.async_wait_for_result(querier.seed(exclude='data_2')) - assert len(results['results']) == 1 - results = querier.async_wait_for_result(querier.cli_args('seed --exclude=data_2')) - assert len(results['results']) == 1 - - -sleeper_sql = ''' -{{ log('test output', info=True) }} -{{ run_query('select * from pg_sleep(20)') }} -select 1 as id -''' - -logger_sql = ''' -{{ log('test output', info=True) }} -select 1 as id -''' - - -def find_log_ordering(logs, *messages) -> bool: - log_iter = iter(logs) - found = 0 - - while found < len(messages): - try: - log = next(log_iter) - except StopIteration: - return False - if messages[found] in log['message']: - found += 1 - return True - - -def poll_logs(querier, token): - has_log = querier.is_result(querier.poll(token)) - assert 'logs' in has_log - return has_log['logs'] - - -def wait_for_log_ordering(querier, token, attempts, *messages) -> int: - for _ in range(attempts): - time.sleep(1) - logs = poll_logs(querier, token) - if find_log_ordering(logs, *messages): - return len(logs) - - msg = 'Never got expected messages {} in {}'.format( - messages, - [log['message'] for log in logs], - ) - assert False, msg - - -def test_get_status( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={'my_model.sql': 'select 1 as id'}, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - # make sure that logs_start/logs are honored during a task - token = querier.is_async_result(querier.run_sql(sleeper_sql)) - - no_log = querier.is_result(querier.poll(token, logs=False)) - assert 'logs' in no_log - assert len(no_log['logs']) == 0 - - num_logs = wait_for_log_ordering(querier, token, 10) - - trunc_log = querier.is_result(querier.poll(token, logs_start=num_logs)) - assert 'logs' in trunc_log - assert len(trunc_log['logs']) == 0 - - querier.kill(token) - - # make sure that logs_start/logs are honored after a task has finished - token = querier.is_async_result(querier.run_sql(logger_sql)) - result = querier.is_result(querier.async_wait(token)) - assert 'logs' in result - num_logs = len(result['logs']) - assert num_logs > 0 - - result = querier.is_result(querier.poll(token, logs_start=num_logs)) - assert 'logs' in result - assert len(result['logs']) == 0 - - result = querier.is_result(querier.poll(token, logs=False)) - assert 'logs' in result - assert len(result['logs']) == 0 - - -source_freshness_schema_yml = ''' -version: 2 -sources: - - name: test_source - loaded_at_field: b - schema: {schema} - freshness: - warn_after: {{count: 10, period: hour}} - error_after: {{count: 1, period: day}} - tables: - - 
name: test_table - identifier: source - - name: failure_table - identifier: other_source -''' - - -def test_source_freshness( - project_root, profiles_root, postgres_profile, unique_schema -): - start_time = datetime.utcnow() - warn_me = start_time - timedelta(hours=18) - error_me = start_time - timedelta(days=2) - # this should trigger a 'warn' - project = ProjectDefinition( - project_data={'seeds': {'quote_columns': False}}, - seeds={ - 'source.csv': 'a,b\n1,{}\n'.format(error_me.strftime('%Y-%m-%d %H:%M:%S')), - 'other_source.csv': 'a,b\n1,{}\n'.format(error_me.strftime('%Y-%m-%d %H:%M:%S')) - }, - models={ - 'sources.yml': source_freshness_schema_yml.format(schema=unique_schema), - }, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - seeds = querier.async_wait_for_result(querier.seed()) - assert len(seeds['results']) == 2 - # should error - error_results = querier.async_wait_for_result(querier.snapshot_freshness(), state='failed') - assert len(error_results['results']) == 2 - for result in error_results['results']: - assert result['status'] == 'error' - error_results = querier.async_wait_for_result(querier.cli_args('source snapshot-freshness'), state='failed') - assert len(error_results['results']) == 2 - for result in error_results['results']: - assert result['status'] == 'error' - - project.seeds['source.csv'] += '2,{}\n'.format(warn_me.strftime('%Y-%m-%d %H:%M:%S')) - project.write_seeds(project_root, remove=True) - querier.async_wait_for_result(querier.seed()) - # should warn - warn_results = querier.async_wait_for_result(querier.snapshot_freshness(select='test_source.test_table')) - assert len(warn_results['results']) == 1 - assert warn_results['results'][0]['status'] == 'warn' - warn_results = querier.async_wait_for_result(querier.cli_args('source snapshot-freshness -s test_source.test_table')) - assert len(warn_results['results']) == 1 - assert warn_results['results'][0]['status'] == 'warn' - - project.seeds['source.csv'] += '3,{}\n'.format(start_time.strftime('%Y-%m-%d %H:%M:%S')) - project.write_seeds(project_root, remove=True) - querier.async_wait_for_result(querier.seed()) - # should pass! 
- pass_results = querier.async_wait_for_result(querier.snapshot_freshness(select=['test_source.test_table'])) - assert len(pass_results['results']) == 1 - assert pass_results['results'][0]['status'] == 'pass' - pass_results = querier.async_wait_for_result(querier.cli_args('source snapshot-freshness --select test_source.test_table')) - assert len(pass_results['results']) == 1 - assert pass_results['results'][0]['status'] == 'pass' - - -def test_missing_tag_sighup( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={ - 'my_docs.md': '{% docs asdf %}have a close tag{% enddocs %}', - }, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - with querier_ctx as querier: - # everything is fine - assert querier.wait_for_status('ready') is True - - # write a junk docs file - project.models['my_docs.md'] = '{% docs asdf %}do not have a close tag' - project.write_models(project_root, remove=True) - - querier.sighup() - - assert querier.wait_for_status('error') is True - result = querier.is_result(querier.status()) - assert 'error' in result - assert 'message' in result['error'] - assert 'without finding a close tag for docs' in result['error']['message'] - - project.models['my_docs.md'] = '{% docs asdf %}have a close tag again{% enddocs %}' - project.write_models(project_root, remove=True) - - querier.sighup() - - assert querier.wait_for_status('ready') is True - - -def test_rpc_vars( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={ - 'my_model.sql': 'select {{ var("param") }} as id', - }, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.cli_args('run --vars "{param: 100}"')) - assert len(results['results']) == 1 - assert results['results'][0]['node']['compiled_sql'] == 'select 100 as id' - - -def test_get_manifest( - project_root, profiles_root, postgres_profile, unique_schema -): - project = ProjectDefinition( - models={ - 'my_model.sql': 'select 1 as id', - }, - ) - querier_ctx = get_querier( - project_def=project, - project_dir=project_root, - profiles_dir=profiles_root, - schema=unique_schema, - test_kwargs={}, - ) - - with querier_ctx as querier: - results = querier.async_wait_for_result(querier.cli_args('run')) - assert len(results['results']) == 1 - assert results['results'][0]['node']['compiled_sql'] == 'select 1 as id' - result = querier.async_wait_for_result(querier.get_manifest()) - assert 'manifest' in result - manifest = result['manifest'] - assert manifest['nodes']['model.test.my_model']['raw_sql'] == 'select 1 as id' - assert 'manifest' in result - manifest = result['manifest'] - assert manifest['nodes']['model.test.my_model']['compiled_sql'] == 'select 1 as id' diff --git a/test/rpc/test_compile.py b/test/rpc/test_compile.py new file mode 100644 index 00000000000..d6e7e0e4a2a --- /dev/null +++ b/test/rpc/test_compile.py @@ -0,0 +1,28 @@ +from .util import ( + assert_has_threads, + get_querier, + ProjectDefinition, +) + + +def test_rpc_compile_threads( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'} + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, 
+ profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.compile(threads=5)) + assert_has_threads(results, 5) + + results = querier.async_wait_for_result( + querier.cli_args('compile --threads=7') + ) + assert_has_threads(results, 7) diff --git a/test/rpc/test_deps.py b/test/rpc/test_deps.py new file mode 100644 index 00000000000..ea83cfb9ec8 --- /dev/null +++ b/test/rpc/test_deps.py @@ -0,0 +1,96 @@ +from .util import ( + get_querier, + ProjectDefinition, +) + + +def deps_with_packages(packages, bad_packages, project_dir, profiles_dir, schema): + project = ProjectDefinition( + models={ + 'my_model.sql': 'select 1 as id', + }, + packages={'packages': packages}, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_dir, + profiles_dir=profiles_dir, + schema=schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + # we should be able to run sql queries at startup + querier.async_wait_for_result(querier.run_sql('select 1 as id')) + + # the status should be something positive + querier.is_result(querier.status()) + + # deps should pass + querier.async_wait_for_result(querier.deps()) + + # queries should work after deps + tok1 = querier.is_async_result(querier.run()) + tok2 = querier.is_async_result(querier.run_sql('select 1 as id')) + + querier.is_result(querier.async_wait(tok2)) + querier.is_result(querier.async_wait(tok1)) + + # now break the project + project.packages['packages'] = bad_packages + project.write_packages(project_dir, remove=True) + + # queries should still work because we haven't reloaded + tok1 = querier.is_async_result(querier.run()) + tok2 = querier.is_async_result(querier.run_sql('select 1 as id')) + + querier.is_result(querier.async_wait(tok2)) + querier.is_result(querier.async_wait(tok1)) + + # now run deps again, it should be sad + querier.async_wait_for_error(querier.deps()) + # it should also not be running. 
+ result = querier.is_result(querier.ps(active=True, completed=False)) + assert result['rows'] == [] + + # fix packages again + project.packages['packages'] = packages + project.write_packages(project_dir, remove=True) + # keep queries broken, we haven't run deps yet + querier.is_error(querier.run()) + + # deps should pass now + querier.async_wait_for_result(querier.deps()) + querier.is_result(querier.status()) + + tok1 = querier.is_async_result(querier.run()) + tok2 = querier.is_async_result(querier.run_sql('select 1 as id')) + + querier.is_result(querier.async_wait(tok2)) + querier.is_result(querier.async_wait(tok1)) + + +def test_rpc_deps_packages(project_root, profiles_root, postgres_profile, unique_schema): + packages = [{ + 'package': 'fishtown-analytics/dbt_utils', + 'version': '0.2.1', + }] + bad_packages = [{ + 'package': 'fishtown-analytics/dbt_util', + 'version': '0.2.1', + }] + deps_with_packages(packages, bad_packages, project_root, profiles_root, unique_schema) + + +def test_rpc_deps_git(project_root, profiles_root, postgres_profile, unique_schema): + packages = [{ + 'git': 'https://github.com/fishtown-analytics/dbt-utils.git', + 'revision': '0.2.1' + }] + # if you use a bad URL, git thinks it's a private repo and prompts for auth + bad_packages = [{ + 'git': 'https://github.com/fishtown-analytics/dbt-utils.git', + 'revision': 'not-a-real-revision' + }] + deps_with_packages(packages, bad_packages, project_root, profiles_root, unique_schema) + diff --git a/test/rpc/test_management.py b/test/rpc/test_management.py new file mode 100644 index 00000000000..370f629441f --- /dev/null +++ b/test/rpc/test_management.py @@ -0,0 +1,362 @@ +import time +from .util import ( + get_querier, + ProjectDefinition, +) + + +def test_rpc_basics( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'} + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + querier.async_wait_for_result(querier.run_sql('select 1 as id')) + + querier.async_wait_for_result(querier.run()) + + querier.async_wait_for_result( + querier.run_sql('select * from {{ ref("my_model") }}') + ) + + querier.async_wait_for_error( + querier.run_sql('select * from {{ reff("my_model") }}') + ) + + +bad_schema_yml = ''' +version: 2 +sources: + - name: test_source + loader: custom + schema: "{{ var('test_run_schema') }}" + tables: + - name: test_table + identifier: source + tests: + - relationships: + # this is invalid + - column_name: favorite_color + - to: ref('descendant_model') + - field: favorite_color +''' + +fixed_schema_yml = ''' +version: 2 +sources: + - name: test_source + loader: custom + schema: "{{ var('test_run_schema') }}" + tables: + - name: test_table + identifier: source +''' + + +def test_rpc_status_error(project_root, profiles_root, postgres_profile, unique_schema): + project = ProjectDefinition( + models={ + 'descendant_model.sql': 'select * from {{ source("test_source", "test_table") }}', + 'schema.yml': bad_schema_yml, + } + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + criteria='error', + ) + with querier_ctx as querier: + + # the status should be an error result + result = querier.is_result(querier.status()) + assert 'error' in result + assert 'message' in result['error'] + assert 
'Invalid test config' in result['error']['message'] + assert 'state' in result + assert result['state'] == 'error' + assert 'logs' in result + logs = result['logs'] + assert len(logs) > 0 + for key in ('message', 'timestamp', 'levelname', 'level'): + assert key in logs[0] + assert 'pid' in result + assert querier.server.pid == result['pid'] + + error = querier.is_error(querier.compile_sql('select 1 as id')) + assert 'code' in error + assert error['code'] == 10011 + assert 'message' in error + assert error['message'] == 'RPC server failed to compile project, call the "status" method for compile status' + assert 'data' in error + assert 'message' in error['data'] + assert 'Invalid test config' in error['data']['message'] + + # deps should fail because it still can't parse the manifest + querier.async_wait_for_error(querier.deps()) + + # and not resolve the issue + result = querier.is_result(querier.status()) + assert 'error' in result + assert 'message' in result['error'] + assert 'Invalid test config' in result['error']['message'] + + error = querier.is_error(querier.compile_sql('select 1 as id')) + assert 'code' in error + assert error['code'] == 10011 + + project.models['schema.yml'] = fixed_schema_yml + project.write_models(project_root, remove=True) + + # deps should work + querier.async_wait_for_result(querier.deps()) + + result = querier.is_result(querier.status()) + assert result.get('error') is None + assert 'state' in result + assert result['state'] == 'ready' + + querier.is_result(querier.compile_sql('select 1 as id')) + + +def test_gc_change_interval(project_root, profiles_root, postgres_profile, unique_schema): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'} + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + + for _ in range(10): + querier.async_wait_for_result(querier.run()) + + result = querier.is_result(querier.ps(True, True)) + assert len(result['rows']) == 10 + + result = querier.is_result(querier.gc(settings=dict(maxsize=1000, reapsize=5, auto_reap_age=0.1))) + + for k in ('deleted', 'missing', 'running'): + assert k in result + assert len(result[k]) == 0 + + time.sleep(0.5) + + result = querier.is_result(querier.ps(True, True)) + assert len(result['rows']) == 0 + + result = querier.is_result(querier.gc(settings=dict(maxsize=2, reapsize=5, auto_reap_age=100000))) + for k in ('deleted', 'missing', 'running'): + assert k in result + assert len(result[k]) == 0 + + time.sleep(0.5) + + for _ in range(10): + querier.async_wait_for_result(querier.run()) + + time.sleep(0.5) + result = querier.is_result(querier.ps(True, True)) + assert len(result['rows']) == 2 + + +def test_ps_poll_output_match(project_root, profiles_root, postgres_profile, unique_schema): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'} + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + + poll_result = querier.async_wait_for_result(querier.run()) + + result = querier.is_result(querier.ps(active=True, completed=True)) + assert 'rows' in result + rows = result['rows'] + assert len(rows) == 1 + ps_result = rows[0] + + for key in ('start', 'end', 'elapsed', 'state'): + assert ps_result[key] == poll_result[key] + + +sleeper_sql = ''' +{{ log('test output', info=True) }} +{{ 
run_query('select * from pg_sleep(20)') }} +select 1 as id +''' + +logger_sql = ''' +{{ log('test output', info=True) }} +select 1 as id +''' + + +def find_log_ordering(logs, *messages) -> bool: + log_iter = iter(logs) + found = 0 + + while found < len(messages): + try: + log = next(log_iter) + except StopIteration: + return False + if messages[found] in log['message']: + found += 1 + return True + + +def poll_logs(querier, token): + has_log = querier.is_result(querier.poll(token)) + assert 'logs' in has_log + return has_log['logs'] + + +def wait_for_log_ordering(querier, token, attempts, *messages) -> int: + for _ in range(attempts): + time.sleep(1) + logs = poll_logs(querier, token) + if find_log_ordering(logs, *messages): + return len(logs) + + msg = 'Never got expected messages {} in {}'.format( + messages, + [log['message'] for log in logs], + ) + assert False, msg + + +def test_get_status( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'}, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + # make sure that logs_start/logs are honored during a task + token = querier.is_async_result(querier.run_sql(sleeper_sql)) + + no_log = querier.is_result(querier.poll(token, logs=False)) + assert 'logs' in no_log + assert len(no_log['logs']) == 0 + + num_logs = wait_for_log_ordering(querier, token, 10) + + trunc_log = querier.is_result(querier.poll(token, logs_start=num_logs)) + assert 'logs' in trunc_log + assert len(trunc_log['logs']) == 0 + + querier.kill(token) + + # make sure that logs_start/logs are honored after a task has finished + token = querier.is_async_result(querier.run_sql(logger_sql)) + result = querier.is_result(querier.async_wait(token)) + assert 'logs' in result + num_logs = len(result['logs']) + assert num_logs > 0 + + result = querier.is_result(querier.poll(token, logs_start=num_logs)) + assert 'logs' in result + assert len(result['logs']) == 0 + + result = querier.is_result(querier.poll(token, logs=False)) + assert 'logs' in result + assert len(result['logs']) == 0 + + +def test_missing_tag_sighup( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={ + 'my_docs.md': '{% docs asdf %}have a close tag{% enddocs %}', + }, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + with querier_ctx as querier: + # everything is fine + assert querier.wait_for_status('ready') is True + + # write a junk docs file + project.models['my_docs.md'] = '{% docs asdf %}do not have a close tag' + project.write_models(project_root, remove=True) + + querier.sighup() + + assert querier.wait_for_status('error') is True + result = querier.is_result(querier.status()) + assert 'error' in result + assert 'message' in result['error'] + assert 'without finding a close tag for docs' in result['error']['message'] + + project.models['my_docs.md'] = '{% docs asdf %}have a close tag again{% enddocs %}' + project.write_models(project_root, remove=True) + + querier.sighup() + + assert querier.wait_for_status('ready') is True + + +def test_get_manifest( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={ + 'my_model.sql': 'select 1 as id', + }, + ) + querier_ctx = 
get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.cli_args('run')) + assert len(results['results']) == 1 + assert results['results'][0]['node']['compiled_sql'] == 'select 1 as id' + result = querier.async_wait_for_result(querier.get_manifest()) + assert 'manifest' in result + manifest = result['manifest'] + assert manifest['nodes']['model.test.my_model']['raw_sql'] == 'select 1 as id' + assert 'manifest' in result + manifest = result['manifest'] + assert manifest['nodes']['model.test.my_model']['compiled_sql'] == 'select 1 as id' diff --git a/test/rpc/test_run.py b/test/rpc/test_run.py new file mode 100644 index 00000000000..7d1a44c8c01 --- /dev/null +++ b/test/rpc/test_run.py @@ -0,0 +1,77 @@ +from .util import ( + assert_has_threads, + get_querier, + ProjectDefinition, +) + + +def test_rpc_run_threads( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'} + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.run(threads=5)) + assert_has_threads(results, 5) + + results = querier.async_wait_for_result( + querier.cli_args('run --threads=7') + ) + assert_has_threads(results, 7) + + +def test_rpc_run_vars( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={ + 'my_model.sql': 'select {{ var("param") }} as id', + }, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.cli_args('run --vars "{param: 100}"')) + assert len(results['results']) == 1 + assert results['results'][0]['node']['compiled_sql'] == 'select 100 as id' + + +def test_rpc_run_vars_compiled( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={ + 'my_model.sql': '{{ config(materialized=var("materialized_var", "view")) }} select 1 as id', + }, + ) + + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.cli_args('run --vars "{materialized_var: table}"')) + assert len(results['results']) == 1 + assert results['results'][0]['node']['config']['materialized'] == 'table' + # make sure that `--vars` doesn't update global state - if it does, + # this run() will result in a view! 
+ results = querier.async_wait_for_result(querier.cli_args('run')) + assert len(results['results']) == 1 + assert results['results'][0]['node']['config']['materialized'] == 'view' diff --git a/test/rpc/test_run_operation.py b/test/rpc/test_run_operation.py new file mode 100644 index 00000000000..f07f772d35e --- /dev/null +++ b/test/rpc/test_run_operation.py @@ -0,0 +1,110 @@ +from .util import ( + get_querier, + ProjectDefinition, +) + +macros_data = ''' +{% macro foo() %} + {{ return(1) }} +{% endmacro %} +{% macro bar(value) %} + {{ return(value + 1) }} +{% endmacro %} +{% macro quux(value) %} + {{ return(asdf) }} +{% endmacro %} +''' + + +def test_run_operation( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'}, + macros={ + 'my_macros.sql': macros_data, + } + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + poll_result = querier.async_wait_for_result( + querier.run_operation(macro='foo', args={}) + ) + + assert 'success' in poll_result + assert poll_result['success'] is True + + poll_result = querier.async_wait_for_result( + querier.run_operation(macro='bar', args={'value': 10}) + ) + + assert 'success' in poll_result + assert poll_result['success'] is True + + poll_result = querier.async_wait_for_result( + querier.run_operation(macro='baz', args={}), + state='failed', + ) + assert 'state' in poll_result + assert poll_result['state'] == 'failed' + + poll_result = querier.async_wait_for_result( + querier.run_operation(macro='quux', args={}) + ) + assert 'success' in poll_result + assert poll_result['success'] is True + + +def test_run_operation_cli( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + models={'my_model.sql': 'select 1 as id'}, + macros={ + 'my_macros.sql': macros_data, + } + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + poll_result = querier.async_wait_for_result( + querier.cli_args(cli='run-operation foo') + ) + + assert 'success' in poll_result + assert poll_result['success'] is True + + bar_cmd = '''run-operation bar --args="{'value': 10}"''' + poll_result = querier.async_wait_for_result( + querier.cli_args(cli=bar_cmd) + ) + + assert 'success' in poll_result + assert poll_result['success'] is True + + poll_result = querier.async_wait_for_result( + querier.cli_args(cli='run-operation baz'), + state='failed', + ) + assert 'state' in poll_result + assert poll_result['state'] == 'failed' + + poll_result = querier.async_wait_for_result( + querier.cli_args(cli='run-operation quux') + ) + assert 'success' in poll_result + assert poll_result['success'] is True + diff --git a/test/rpc/test_seed.py b/test/rpc/test_seed.py new file mode 100644 index 00000000000..8b8c7dc28c8 --- /dev/null +++ b/test/rpc/test_seed.py @@ -0,0 +1,64 @@ +from .util import ( + assert_has_threads, + get_querier, + ProjectDefinition, +) + + +def test_rpc_seed_threads( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + project_data={'seeds': {'config': {'quote_columns': False}}}, + seeds={'data.csv': 'a,b\n1,hello\n2,goodbye'}, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + 
profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.seed(threads=5)) + assert_has_threads(results, 5) + + results = querier.async_wait_for_result( + querier.cli_args('seed --threads=7') + ) + assert_has_threads(results, 7) + + +def test_rpc_seed_include_exclude( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + project_data={'seeds': {'config': {'quote_columns': False}}}, + seeds={ + 'data_1.csv': 'a,b\n1,hello\n2,goodbye', + 'data_2.csv': 'a,b\n1,data', + }, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.seed(select=['data_1'])) + assert len(results['results']) == 1 + results = querier.async_wait_for_result(querier.seed(select='data_1')) + assert len(results['results']) == 1 + results = querier.async_wait_for_result(querier.cli_args('seed --select=data_1')) + assert len(results['results']) == 1 + + results = querier.async_wait_for_result(querier.seed(exclude=['data_2'])) + assert len(results['results']) == 1 + results = querier.async_wait_for_result(querier.seed(exclude='data_2')) + assert len(results['results']) == 1 + results = querier.async_wait_for_result(querier.cli_args('seed --exclude=data_2')) + assert len(results['results']) == 1 diff --git a/test/rpc/test_snapshots.py b/test/rpc/test_snapshots.py new file mode 100644 index 00000000000..b05758dacfe --- /dev/null +++ b/test/rpc/test_snapshots.py @@ -0,0 +1,106 @@ +from .util import ( + assert_has_threads, + get_querier, + ProjectDefinition, +) + +snapshot_data = ''' +{% snapshot snapshot_actual %} + + {{ + config( + target_database=database, + target_schema=schema, + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) + }} + select 1 as id, '2019-10-31 23:59:40' as updated_at + +{% endsnapshot %} +''' + + +def test_snapshots( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + snapshots={'my_snapshots.sql': snapshot_data}, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.snapshot()) + assert len(results['results']) == 1 + + results = querier.async_wait_for_result(querier.snapshot( + exclude=['snapshot_actual']) + ) + + results = querier.async_wait_for_result( + querier.snapshot(select=['snapshot_actual']) + ) + assert len(results['results']) == 1 + + +def test_snapshots_cli( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + snapshots={'my_snapshots.sql': snapshot_data}, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result( + querier.cli_args(cli='snapshot') + ) + assert len(results['results']) == 1 + + results = querier.async_wait_for_result( + querier.cli_args(cli='snapshot --exclude=snapshot_actual') + ) + assert len(results['results']) == 0 + + results = querier.async_wait_for_result( + querier.cli_args(cli='snapshot --select=snapshot_actual') + ) + assert len(results['results']) == 1 + + +def 
test_rpc_snapshot_threads( + project_root, profiles_root, postgres_profile, unique_schema +): + project = ProjectDefinition( + snapshots={'my_snapshots.sql': snapshot_data}, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + results = querier.async_wait_for_result(querier.snapshot(threads=5)) + assert_has_threads(results, 5) + + results = querier.async_wait_for_result( + querier.cli_args('snapshot --threads=7') + ) + assert_has_threads(results, 7) + diff --git a/test/rpc/test_source_freshness.py b/test/rpc/test_source_freshness.py new file mode 100644 index 00000000000..323b1aeb45a --- /dev/null +++ b/test/rpc/test_source_freshness.py @@ -0,0 +1,82 @@ +from datetime import datetime, timedelta +from .util import ( + get_querier, + ProjectDefinition, +) + +source_freshness_schema_yml = ''' +version: 2 +sources: + - name: test_source + loaded_at_field: b + schema: {schema} + freshness: + warn_after: {{count: 10, period: hour}} + error_after: {{count: 1, period: day}} + tables: + - name: test_table + identifier: source + - name: failure_table + identifier: other_source +''' + + +def test_source_freshness( + project_root, profiles_root, postgres_profile, unique_schema +): + start_time = datetime.utcnow() + warn_me = start_time - timedelta(hours=18) + error_me = start_time - timedelta(days=2) + # this should trigger a 'warn' + project = ProjectDefinition( + project_data={'seeds': {'config': {'quote_columns': False}}}, + seeds={ + 'source.csv': 'a,b\n1,{}\n'.format(error_me.strftime('%Y-%m-%d %H:%M:%S')), + 'other_source.csv': 'a,b\n1,{}\n'.format(error_me.strftime('%Y-%m-%d %H:%M:%S')) + }, + models={ + 'sources.yml': source_freshness_schema_yml.format(schema=unique_schema), + }, + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + + with querier_ctx as querier: + seeds = querier.async_wait_for_result(querier.seed()) + assert len(seeds['results']) == 2 + # should error + error_results = querier.async_wait_for_result(querier.snapshot_freshness(), state='failed') + assert len(error_results['results']) == 2 + for result in error_results['results']: + assert result['status'] == 'error' + error_results = querier.async_wait_for_result(querier.cli_args('source snapshot-freshness'), state='failed') + assert len(error_results['results']) == 2 + for result in error_results['results']: + assert result['status'] == 'error' + + project.seeds['source.csv'] += '2,{}\n'.format(warn_me.strftime('%Y-%m-%d %H:%M:%S')) + project.write_seeds(project_root, remove=True) + querier.async_wait_for_result(querier.seed()) + # should warn + warn_results = querier.async_wait_for_result(querier.snapshot_freshness(select='test_source.test_table')) + assert len(warn_results['results']) == 1 + assert warn_results['results'][0]['status'] == 'warn' + warn_results = querier.async_wait_for_result(querier.cli_args('source snapshot-freshness -s test_source.test_table')) + assert len(warn_results['results']) == 1 + assert warn_results['results'][0]['status'] == 'warn' + + project.seeds['source.csv'] += '3,{}\n'.format(start_time.strftime('%Y-%m-%d %H:%M:%S')) + project.write_seeds(project_root, remove=True) + querier.async_wait_for_result(querier.seed()) + # should pass! 
+ pass_results = querier.async_wait_for_result(querier.snapshot_freshness(select=['test_source.test_table'])) + assert len(pass_results['results']) == 1 + assert pass_results['results'][0]['status'] == 'pass' + pass_results = querier.async_wait_for_result(querier.cli_args('source snapshot-freshness --select test_source.test_table')) + assert len(pass_results['results']) == 1 + assert pass_results['results'][0]['status'] == 'pass' diff --git a/test/rpc/test_test.py b/test/rpc/test_test.py new file mode 100644 index 00000000000..5e2db6c6e18 --- /dev/null +++ b/test/rpc/test_test.py @@ -0,0 +1,46 @@ +import yaml +from .util import ( + assert_has_threads, + get_querier, + ProjectDefinition, +) + + +def test_rpc_test_threads( + project_root, profiles_root, postgres_profile, unique_schema +): + schema_yaml = { + 'version': 2, + 'models': [{ + 'name': 'my_model', + 'columns': [ + { + 'name': 'id', + 'tests': ['not_null', 'unique'], + }, + ], + }], + } + project = ProjectDefinition( + models={ + 'my_model.sql': 'select 1 as id', + 'schema.yml': yaml.safe_dump(schema_yaml)} + ) + querier_ctx = get_querier( + project_def=project, + project_dir=project_root, + profiles_dir=profiles_root, + schema=unique_schema, + test_kwargs={}, + ) + with querier_ctx as querier: + # first run dbt to get the model built + querier.async_wait_for_result(querier.run()) + + results = querier.async_wait_for_result(querier.test(threads=5)) + assert_has_threads(results, 5) + + results = querier.async_wait_for_result( + querier.cli_args('test --threads=7') + ) + assert_has_threads(results, 7) diff --git a/test/rpc/util.py b/test/rpc/util.py index c684238f95c..d2caf7579ef 100644 --- a/test/rpc/util.py +++ b/test/rpc/util.py @@ -37,7 +37,7 @@ def __init__( self.criteria = criteria self.error = None handle_and_check_args = [ - '--strict', 'rpc', '--log-cache-events', + 'rpc', '--log-cache-events', '--port', str(self.port), '--profiles-dir', profiles_dir ] @@ -473,6 +473,7 @@ def __init__( seeds=None, ): self.project = { + 'config-version': 2, 'name': name, 'version': version, 'profile': profile, @@ -550,6 +551,7 @@ def __init__(self, profiles_dir, which='run-operation', kwargs={}): self.which = which self.single_threaded = False self.profiles_dir = profiles_dir + self.project_dir = None self.profile = None self.target = None self.__dict__.update(kwargs) @@ -614,3 +616,12 @@ def get_querier( ) with schema_ctx, server_ctx as server: yield Querier(server) + + +def assert_has_threads(results, num_threads): + assert 'logs' in results + c_logs = [l for l in results['logs'] if 'Concurrency' in l['message']] + assert len(c_logs) == 1, \ + f'Got invalid number of concurrency logs ({len(c_logs)})' + assert 'message' in c_logs[0] + assert f'Concurrency: {num_threads} threads' in c_logs[0]['message'] diff --git a/test/unit/test_bigquery_adapter.py b/test/unit/test_bigquery_adapter.py index 8f88bb97e8f..5fc8fd87e2d 100644 --- a/test/unit/test_bigquery_adapter.py +++ b/test/unit/test_bigquery_adapter.py @@ -447,19 +447,11 @@ class TestBigQueryTableOptions(BaseTestBigQueryAdapter): def test_parse_partition_by(self): adapter = self.get_adapter('oauth') - self.assertEqual( - adapter.parse_partition_by("date(ts)").to_dict(), { - "field": "ts", - "data_type": "timestamp" - } - ) + with self.assertRaises(dbt.exceptions.CompilationException): + adapter.parse_partition_by("date(ts)") - self.assertEqual( - adapter.parse_partition_by("ts").to_dict(), { - "field": "ts", - "data_type": "date" - } - ) + with 
self.assertRaises(dbt.exceptions.CompilationException): + adapter.parse_partition_by("ts") self.assertEqual( adapter.parse_partition_by({ diff --git a/test/unit/test_compiler.py b/test/unit/test_compiler.py index 9fc0519e646..42cb010dd00 100644 --- a/test/unit/test_compiler.py +++ b/test/unit/test_compiler.py @@ -105,6 +105,7 @@ def test__prepend_ctes__already_has_cte(self): injected_sql='' ), }, + sources={}, docs={}, # '2018-02-14T09:15:13Z' generated_at=datetime(2018, 2, 14, 9, 15, 13), @@ -184,6 +185,7 @@ def test__prepend_ctes__no_ctes(self): compiled_sql=('select * from source_table') ), }, + sources={}, docs={}, generated_at='2018-02-14T09:15:13Z', disabled=[], @@ -269,6 +271,7 @@ def test__prepend_ctes(self): compiled_sql='select * from source_table' ), }, + sources={}, docs={}, generated_at='2018-02-14T09:15:13Z', disabled=[], @@ -371,6 +374,7 @@ def test__prepend_ctes__multiple_levels(self): compiled_sql='select * from source_table' ), }, + sources={}, docs={}, generated_at='2018-02-14T09:15:13Z', disabled=[], diff --git a/test/unit/test_config.py b/test/unit/test_config.py index 8a15d2d1a1f..3b5a589c020 100644 --- a/test/unit/test_config.py +++ b/test/unit/test_config.py @@ -34,8 +34,12 @@ def temp_cd(path): os.chdir(current_path) -def empty_renderer(): - return dbt.config.ConfigRenderer(generate_base_context({})) +def empty_profile_renderer(): + return dbt.config.renderer.ProfileRenderer(generate_base_context({})) + + +def empty_project_renderer(): + return dbt.config.renderer.DbtProjectYamlRenderer(generate_base_context({})) model_config = { @@ -224,7 +228,7 @@ def setUp(self): super().setUp() def from_raw_profiles(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() return dbt.config.Profile.from_raw_profiles( self.default_profile_data, 'default', renderer ) @@ -302,7 +306,7 @@ def test_missing_target(self): self.assertEqual(profile.credentials.type, 'postgres') def test_profile_invalid_project(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() with self.assertRaises(dbt.exceptions.DbtProjectError) as exc: dbt.config.Profile.from_raw_profiles( self.default_profile_data, 'invalid-profile', renderer @@ -313,7 +317,7 @@ def test_profile_invalid_project(self): self.assertIn('invalid-profile', str(exc.exception)) def test_profile_invalid_target(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() with self.assertRaises(dbt.exceptions.DbtProfileError) as exc: dbt.config.Profile.from_raw_profiles( self.default_profile_data, 'default', renderer, @@ -326,7 +330,7 @@ def test_profile_invalid_target(self): self.assertIn('- with-vars', str(exc.exception)) def test_no_outputs(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() with self.assertRaises(dbt.exceptions.DbtProfileError) as exc: dbt.config.Profile.from_raw_profiles( @@ -340,7 +344,7 @@ def test_neq(self): self.assertNotEqual(profile, object()) def test_eq(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() profile = dbt.config.Profile.from_raw_profiles( deepcopy(self.default_profile_data), 'default', renderer ) @@ -352,7 +356,7 @@ def test_eq(self): def test_invalid_env_vars(self): self.env_override['env_value_port'] = 'hello' - renderer = empty_renderer() + renderer = empty_profile_renderer() with mock.patch.dict(os.environ, self.env_override): with self.assertRaises(dbt.exceptions.DbtProfileError) as exc: dbt.config.Profile.from_raw_profile_info( @@ -372,7 +376,7 @@ def setUp(self): def from_raw_profile_info(self, 
raw_profile=None, profile_name='default', **kwargs): if raw_profile is None: raw_profile = self.default_profile_data['default'] - renderer = empty_renderer() + renderer = empty_profile_renderer() kw = { 'raw_profile': raw_profile, 'profile_name': profile_name, @@ -385,7 +389,7 @@ def from_args(self, project_profile_name='default', **kwargs): kw = { 'args': self.args, 'project_profile_name': project_profile_name, - 'renderer': empty_renderer() + 'renderer': empty_profile_renderer() } kw.update(kwargs) return dbt.config.Profile.render_from_args(**kw) @@ -510,7 +514,7 @@ def test_invalid_env_vars(self): def test_cli_and_env_vars(self): self.args.target = 'cli-and-env-vars' self.args.vars = '{"cli_value_host": "cli-postgres-host"}' - renderer = dbt.config.ConfigRenderer(generate_base_context({'cli_value_host': 'cli-postgres-host'})) + renderer = dbt.config.renderer.ProfileRenderer(generate_base_context({'cli_value_host': 'cli-postgres-host'})) with mock.patch.dict(os.environ, self.env_override): profile = self.from_args(renderer=renderer) from_raw = self.from_raw_profile_info( @@ -542,7 +546,7 @@ def test_empty_profile(self): self.assertIn('profiles.yml is empty', str(exc.exception)) def test_profile_with_empty_profile_data(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() with self.assertRaises(dbt.exceptions.DbtProfileError) as exc: dbt.config.Profile.from_raw_profiles( self.default_profile_data, 'empty_profile_data', renderer @@ -764,7 +768,7 @@ def test_invalid_project_name(self): self.assertIn('invalid-project-name', str(exc.exception)) def test_no_project(self): - renderer = empty_renderer() + renderer = empty_project_renderer() with self.assertRaises(dbt.exceptions.DbtProjectError) as exc: dbt.config.Project.from_project_root(self.project_dir, renderer) @@ -780,55 +784,6 @@ def test_unsupported_version(self): # allowed, because the RuntimeConfig checks, not the Project itself dbt.config.Project.from_project_config(self.default_project_data, None) - def test__no_unused_resource_config_paths(self): - self.default_project_data.update({ - 'models': model_config, - 'seeds': {}, - }) - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - - resource_fqns = {'models': model_fqns} - unused = project.get_unused_resource_config_paths(resource_fqns, []) - self.assertEqual(len(unused), 0) - - def test__unused_resource_config_paths(self): - self.default_project_data.update({ - 'models': model_config['my_package_name'], - 'seeds': {}, - }) - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - - resource_fqns = {'models': model_fqns} - unused = project.get_unused_resource_config_paths(resource_fqns, []) - self.assertEqual(len(unused), 3) - - def test__get_unused_resource_config_paths_empty(self): - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - unused = project.get_unused_resource_config_paths({'models': frozenset(( - ('my_test_project', 'foo', 'bar'), - ('my_test_project', 'foo', 'baz'), - ))}, []) - self.assertEqual(len(unused), 0) - - def test__warn_for_unused_resource_config_paths_empty(self): - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - dbt.flags.WARN_ERROR = True - try: - unused = project.warn_for_unused_resource_config_paths({'models': frozenset(( - ('my_test_project', 'foo', 'bar'), - ('my_test_project', 'foo', 'baz'), - ))}, []) - finally: - dbt.flags.WARN_ERROR = False - def test_none_values(self): 
self.default_project_data.update({ 'models': None, @@ -906,58 +861,6 @@ def test_custom_query_comment_append(self): self.assertEqual(project.query_comment.comment, 'run by user test') self.assertEqual(project.query_comment.append, True) -class TestProjectWithConfigs(BaseConfigTest): - def setUp(self): - self.profiles_dir = '/invalid-profiles-path' - self.project_dir = '/invalid-root-path' - super().setUp() - self.default_project_data['project-root'] = self.project_dir - self.default_project_data['models'] = { - 'enabled': True, - 'my_test_project': { - 'foo': { - 'materialized': 'view', - 'bar': { - 'materialized': 'table', - } - }, - 'baz': { - 'materialized': 'table', - } - } - } - self.used = {'models': frozenset(( - ('my_test_project', 'foo', 'bar'), - ('my_test_project', 'foo', 'baz'), - ))} - - def test__get_unused_resource_config_paths(self): - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - unused = project.get_unused_resource_config_paths(self.used, []) - self.assertEqual(len(unused), 1) - self.assertEqual(unused[0], ('models', 'my_test_project', 'baz')) - - @mock.patch.object(dbt.config.project, 'warn_or_error') - def test__warn_for_unused_resource_config_paths(self, warn_or_error): - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - unused = project.warn_for_unused_resource_config_paths(self.used, []) - warn_or_error.assert_called_once() - - def test__warn_for_unused_resource_config_paths_disabled(self): - project = dbt.config.Project.from_project_config( - self.default_project_data, None - ) - unused = project.get_unused_resource_config_paths( - self.used, - frozenset([('my_test_project', 'baz')]) - ) - - self.assertEqual(len(unused), 0) - class TestProjectFile(BaseFileTest): def setUp(self): @@ -967,7 +870,7 @@ def setUp(self): self.default_project_data['project-root'] = self.project_dir def test_from_project_root(self): - renderer = empty_renderer() + renderer = empty_project_renderer() project = dbt.config.Project.from_project_root(self.project_dir, renderer) from_config = dbt.config.Project.from_project_config( self.default_project_data, None @@ -977,7 +880,7 @@ def test_from_project_root(self): self.assertEqual(project.project_name, 'my_test_project') def test_with_invalid_package(self): - renderer = empty_renderer() + renderer = empty_project_renderer() self.write_packages({'invalid': ['not a package of any kind']}) with self.assertRaises(dbt.exceptions.DbtProjectError): dbt.config.Project.from_project_root(self.project_dir, renderer) @@ -1019,7 +922,7 @@ def setUp(self): self.default_project_data['project-root'] = self.project_dir def test_cli_and_env_vars(self): - renderer = dbt.config.ConfigRenderer(generate_base_context({'cli_version': '0.1.2'})) + renderer = dbt.config.renderer.DbtProjectYamlRenderer(generate_base_context({'cli_version': '0.1.2'})) with mock.patch.dict(os.environ, self.env_override): project = dbt.config.Project.from_project_root( self.project_dir, @@ -1044,7 +947,7 @@ def get_project(self): ) def get_profile(self): - renderer = empty_renderer() + renderer = empty_profile_renderer() return dbt.config.Profile.from_raw_profiles( self.default_profile_data, self.default_project_data['profile'], renderer ) @@ -1148,6 +1051,116 @@ def test_archive_not_allowed(self): with self.assertRaises(dbt.exceptions.DbtProjectError): self.get_project() + def test__no_unused_resource_config_paths(self): + self.default_project_data.update({ + 'models': model_config, + 'seeds': {}, + }) + 
project = self.from_parts() + + resource_fqns = {'models': model_fqns} + unused = project.get_unused_resource_config_paths(resource_fqns, []) + self.assertEqual(len(unused), 0) + + def test__unused_resource_config_paths(self): + self.default_project_data.update({ + 'models': model_config['my_package_name'], + 'seeds': {}, + }) + project = self.from_parts() + + resource_fqns = {'models': model_fqns} + unused = project.get_unused_resource_config_paths(resource_fqns, []) + self.assertEqual(len(unused), 3) + + def test__get_unused_resource_config_paths_empty(self): + project = self.from_parts() + unused = project.get_unused_resource_config_paths({'models': frozenset(( + ('my_test_project', 'foo', 'bar'), + ('my_test_project', 'foo', 'baz'), + ))}, []) + self.assertEqual(len(unused), 0) + + def test__warn_for_unused_resource_config_paths_empty(self): + project = self.from_parts() + dbt.flags.WARN_ERROR = True + try: + project.warn_for_unused_resource_config_paths({'models': frozenset(( + ('my_test_project', 'foo', 'bar'), + ('my_test_project', 'foo', 'baz'), + ))}, []) + finally: + dbt.flags.WARN_ERROR = False + + +class TestRuntimeConfigWithConfigs(BaseConfigTest): + def setUp(self): + self.profiles_dir = '/invalid-profiles-path' + self.project_dir = '/invalid-root-path' + super().setUp() + self.default_project_data['project-root'] = self.project_dir + self.default_project_data['models'] = { + 'enabled': True, + 'my_test_project': { + 'foo': { + 'materialized': 'view', + 'bar': { + 'materialized': 'table', + } + }, + 'baz': { + 'materialized': 'table', + } + } + } + self.used = {'models': frozenset(( + ('my_test_project', 'foo', 'bar'), + ('my_test_project', 'foo', 'baz'), + ))} + + def get_project(self): + return dbt.config.Project.from_project_config( + self.default_project_data, None + ) + + def get_profile(self): + renderer = empty_profile_renderer() + return dbt.config.Profile.from_raw_profiles( + self.default_profile_data, self.default_project_data['profile'], renderer + ) + + def from_parts(self, exc=None): + project = self.get_project() + profile = self.get_profile() + if exc is None: + return dbt.config.RuntimeConfig.from_parts(project, profile, self.args) + + with self.assertRaises(exc) as err: + dbt.config.RuntimeConfig.from_parts(project, profile, self.args) + return err + + + def test__get_unused_resource_config_paths(self): + project = self.from_parts() + unused = project.get_unused_resource_config_paths(self.used, []) + self.assertEqual(len(unused), 1) + self.assertEqual(unused[0], ('models', 'my_test_project', 'baz')) + + @mock.patch.object(dbt.config.runtime, 'warn_or_error') + def test__warn_for_unused_resource_config_paths(self, warn_or_error): + project = self.from_parts() + project.warn_for_unused_resource_config_paths(self.used, []) + warn_or_error.assert_called_once() + + def test__warn_for_unused_resource_config_paths_disabled(self): + project = self.from_parts() + unused = project.get_unused_resource_config_paths( + self.used, + frozenset([('my_test_project', 'baz')]) + ) + + self.assertEqual(len(unused), 0) + class TestRuntimeConfigFiles(BaseFileTest): def setUp(self): diff --git a/test/unit/test_context.py b/test/unit/test_context.py index 4352bbfa310..2ef99f4e248 100644 --- a/test/unit/test_context.py +++ b/test/unit/test_context.py @@ -7,10 +7,12 @@ # make sure 'postgres' is in PACKAGES from dbt.adapters import postgres # noqa +from dbt.adapters.base import AdapterConfig from dbt.clients.jinja import MacroStack from dbt.contracts.graph.parsed import ( 
ParsedModelNode, NodeConfig, DependsOn, ParsedMacro ) +from dbt.config.project import V1VarProvider from dbt.context import base, target, configured, providers, docs from dbt.node_types import NodeType import dbt.exceptions @@ -52,38 +54,52 @@ def setUp(self): columns={} ) self.context = mock.MagicMock() + self.provider = V1VarProvider({}, {}, {}) + self.config = mock.MagicMock( + config_version=1, vars=self.provider, cli_vars={}, project_name='root' + ) - def test_var_default_something(self): - var = providers.RuntimeVar(self.model, self.context, overrides={'foo': 'baz'}) + @mock.patch('dbt.legacy_config_updater.get_config_class_by_name', return_value=AdapterConfig) + def test_var_default_something(self, mock_get_cls): + self.config.cli_vars = {'foo': 'baz'} + var = providers.RuntimeVar(self.context, self.config, self.model) self.assertEqual(var('foo'), 'baz') self.assertEqual(var('foo', 'bar'), 'baz') - def test_var_default_none(self): - var = providers.RuntimeVar(self.model, self.context, overrides={'foo': None}) + @mock.patch('dbt.legacy_config_updater.get_config_class_by_name', return_value=AdapterConfig) + def test_var_default_none(self, mock_get_cls): + self.config.cli_vars = {'foo': None} + var = providers.RuntimeVar(self.context, self.config, self.model) self.assertEqual(var('foo'), None) self.assertEqual(var('foo', 'bar'), None) - def test_var_not_defined(self): - var = providers.RuntimeVar(self.model, self.context, overrides={}) + @mock.patch('dbt.legacy_config_updater.get_config_class_by_name', return_value=AdapterConfig) + def test_var_not_defined(self, mock_get_cls): + var = providers.RuntimeVar(self.context, self.config, self.model) self.assertEqual(var('foo', 'bar'), 'bar') with self.assertRaises(dbt.exceptions.CompilationException): var('foo') - def test_parser_var_default_something(self): - var = providers.ParseVar(self.model, self.context, overrides={'foo': 'baz'}) + @mock.patch('dbt.legacy_config_updater.get_config_class_by_name', return_value=AdapterConfig) + def test_parser_var_default_something(self, mock_get_cls): + self.config.cli_vars = {'foo': 'baz'} + var = providers.ParseVar(self.context, self.config, self.model) self.assertEqual(var('foo'), 'baz') self.assertEqual(var('foo', 'bar'), 'baz') - def test_parser_var_default_none(self): - var = providers.ParseVar(self.model, self.context, overrides={'foo': None}) + @mock.patch('dbt.legacy_config_updater.get_config_class_by_name', return_value=AdapterConfig) + def test_parser_var_default_none(self, mock_get_cls): + self.config.cli_vars = {'foo': None} + var = providers.ParseVar(self.context, self.config, self.model) self.assertEqual(var('foo'), None) self.assertEqual(var('foo', 'bar'), None) - def test_parser_var_not_defined(self): + @mock.patch('dbt.legacy_config_updater.get_config_class_by_name', return_value=AdapterConfig) + def test_parser_var_not_defined(self, mock_get_cls): # at parse-time, we should not raise if we encounter a missing var # that way disabled models don't get parse errors - var = providers.ParseVar(self.model, self.context, overrides={}) + var = providers.ParseVar(self.context, self.config, self.model) self.assertEqual(var('foo', 'bar'), 'bar') self.assertEqual(var('foo'), None) @@ -373,7 +389,7 @@ def test_model_parse_context(config, manifest, get_adapter): model=mock_model(), config=config, manifest=manifest, - source_config=mock.MagicMock(), + context_config=mock.MagicMock(), ) assert_has_keys(REQUIRED_MODEL_KEYS, MAYBE_KEYS, ctx) @@ -387,11 +403,6 @@ def test_model_runtime_context(config, 
manifest, get_adapter): assert_has_keys(REQUIRED_MODEL_KEYS, MAYBE_KEYS, ctx) -def test_docs_parse_context(config): - ctx = docs.generate_parser_docs(config, mock_model()) - assert_has_keys(REQUIRED_DOCS_KEYS, MAYBE_KEYS, ctx) - - def test_docs_runtime_context(config): ctx = docs.generate_runtime_docs(config, mock_model(), [], 'root') assert_has_keys(REQUIRED_DOCS_KEYS, MAYBE_KEYS, ctx) diff --git a/test/unit/test_contracts_graph_compiled.py b/test/unit/test_contracts_graph_compiled.py index 129e4bcce0e..4dc16616bff 100644 --- a/test/unit/test_contracts_graph_compiled.py +++ b/test/unit/test_contracts_graph_compiled.py @@ -235,7 +235,7 @@ def test_basic_uncompiled(self): 'quoting': {}, 'tags': [], 'vars': {}, - 'severity': 'error', + 'severity': 'ERROR', }, 'docs': {'show': True}, 'columns': {}, diff --git a/test/unit/test_contracts_graph_parsed.py b/test/unit/test_contracts_graph_parsed.py index 12028a66d79..dace3716bbb 100644 --- a/test/unit/test_contracts_graph_parsed.py +++ b/test/unit/test_contracts_graph_parsed.py @@ -1,19 +1,23 @@ import pickle from dbt.node_types import NodeType +from dbt.contracts.graph.model_config import ( + All, + NodeConfig, + TestConfig, + TimestampSnapshotConfig, + CheckSnapshotConfig, + SourceConfig, + EmptySnapshotConfig, + SnapshotStrategy, + Hook, +) from dbt.contracts.graph.parsed import ( ParsedModelNode, DependsOn, - NodeConfig, ColumnInfo, - Hook, ParsedSchemaTestNode, - TestConfig, ParsedSnapshotNode, - TimestampSnapshotConfig, - All, - CheckSnapshotConfig, - SnapshotStrategy, IntermediateSnapshotNode, ParsedNodePatch, ParsedMacro, @@ -654,7 +658,7 @@ def test_ok(self): 'quoting': {}, 'tags': [], 'vars': {}, - 'severity': 'error', + 'severity': 'ERROR', }, 'docs': {'show': True}, 'columns': {}, @@ -752,7 +756,7 @@ def _cfg_basic(self): return { 'column_types': {}, 'enabled': True, - 'materialized': 'view', + 'materialized': 'snapshot', 'persist_docs': {}, 'post-hook': [], 'pre-hook': [], @@ -782,7 +786,7 @@ def test_populated(self): cfg_dict = { 'column_types': {'a': 'text'}, 'enabled': True, - 'materialized': 'table', + 'materialized': 'snapshot', 'persist_docs': {}, 'post-hook': [{'sql': 'insert into blah(a, b) select "1", 1', 'transaction': True}], 'pre-hook': [], @@ -798,7 +802,7 @@ def test_populated(self): } cfg = self.ContractType( column_types={'a': 'text'}, - materialized='table', + materialized='snapshot', post_hook=[Hook(sql='insert into blah(a, b) select "1", 1')], strategy=SnapshotStrategy.Timestamp, target_database='some_snapshot_db', @@ -829,7 +833,7 @@ def _cfg_ok(self): return { 'column_types': {}, 'enabled': True, - 'materialized': 'view', + 'materialized': 'snapshot', 'persist_docs': {}, 'post-hook': [], 'pre-hook': [], @@ -859,7 +863,7 @@ def test_populated(self): cfg_dict = { 'column_types': {'a': 'text'}, 'enabled': True, - 'materialized': 'table', + 'materialized': 'snapshot', 'persist_docs': {}, 'post-hook': [{'sql': 'insert into blah(a, b) select "1", 1', 'transaction': True}], 'pre-hook': [], @@ -875,7 +879,7 @@ def test_populated(self): } cfg = self.ContractType( column_types={'a': 'text'}, - materialized='table', + materialized='snapshot', post_hook=[Hook(sql='insert into blah(a, b) select "1", 1')], strategy=SnapshotStrategy.Check, check_cols=['a', 'b'], @@ -928,7 +932,7 @@ def _ts_ok(self): 'config': { 'column_types': {}, 'enabled': True, - 'materialized': 'view', + 'materialized': 'snapshot', 'persist_docs': {}, 'post-hook': [], 'pre-hook': [], @@ -976,10 +980,10 @@ def test_timestamp_ok(self): ), ) - cfg = 
NodeConfig() + cfg = EmptySnapshotConfig() cfg._extra.update({ - 'unique_key': 'id', 'strategy': 'timestamp', + 'unique_key': 'id', 'updated_at': 'last_update', 'target_database': 'some_snapshot_db', 'target_schema': 'some_snapshot_schema', @@ -1037,7 +1041,7 @@ def test_check_ok(self): 'config': { 'column_types': {}, 'enabled': True, - 'materialized': 'view', + 'materialized': 'snapshot', 'persist_docs': {}, 'post-hook': [], 'pre-hook': [], @@ -1081,7 +1085,7 @@ def test_check_ok(self): target_schema='some_snapshot_schema', ), ) - cfg = NodeConfig() + cfg = EmptySnapshotConfig() cfg._extra.update({ 'unique_key': 'id', 'strategy': 'check', @@ -1317,6 +1321,9 @@ def test_basic(self): 'meta': {}, 'source_meta': {}, 'tags': [], + 'config': { + 'enabled': True, + } } source_def = self.ContractType( columns={}, @@ -1337,6 +1344,7 @@ def test_basic(self): source_name='my_source', unique_id='test.source.my_source.my_source_table', tags=[], + config=SourceConfig(), ) self.assert_symmetric(source_def, source_def_dict) minimum = self._minimum_dict() diff --git a/test/unit/test_contracts_graph_unparsed.py b/test/unit/test_contracts_graph_unparsed.py index 0ae9e83b9dc..1ddd8135822 100644 --- a/test/unit/test_contracts_graph_unparsed.py +++ b/test/unit/test_contracts_graph_unparsed.py @@ -316,7 +316,6 @@ def test_table_defaults(self): 'tests': [], 'columns': [], 'quoting': {}, - 'external': {}, 'freshness': {}, 'meta': {}, 'tags': [], @@ -328,7 +327,6 @@ def test_table_defaults(self): 'tests': [], 'columns': [], 'quoting': {'database': True}, - 'external': {}, 'freshness': {}, 'meta': {}, 'tags': [], diff --git a/test/unit/test_contracts_project.py b/test/unit/test_contracts_project.py index 3dad495460b..5664d68bc0d 100644 --- a/test/unit/test_contracts_project.py +++ b/test/unit/test_contracts_project.py @@ -2,10 +2,10 @@ from hologram import ValidationError -from dbt.contracts.project import Project +from dbt.contracts.project import ProjectV1 -class TestProject(ContractTestCase): - ContractType = Project +class TestProjectV1(ContractTestCase): + ContractType = ProjectV1 def test_minimal(self): dct = { @@ -14,7 +14,7 @@ def test_minimal(self): 'profile': 'test', 'project-root': '/usr/src/app', } - project = Project( + project = ProjectV1( name='test', version='1.0', profile='test', @@ -30,4 +30,4 @@ def test_invalid_name(self): 'project-root': '/usr/src/app', } with self.assertRaises(ValidationError): - Project.from_dict(dct) + ProjectV1.from_dict(dct) diff --git a/test/unit/test_docs_generate.py b/test/unit/test_docs_generate.py index 79c60eab37f..258371b9707 100644 --- a/test/unit/test_docs_generate.py +++ b/test/unit/test_docs_generate.py @@ -20,14 +20,16 @@ def tearDown(self): def map_uids(self, effects): results = { - generate.CatalogKey(db, sch, tbl): [uid] + generate.CatalogKey(db, sch, tbl): uid for db, sch, tbl, uid in effects } - self.mock_get_unique_id_mapping.return_value = results + self.mock_get_unique_id_mapping.return_value = results, {} def generate_catalog_dict(self, columns): + nodes, sources = generate.Catalog(columns).make_unique_id_map(self.manifest) result = generate.CatalogResults( - nodes=generate.Catalog(columns).make_unique_id_map(self.manifest), + nodes=nodes, + sources=sources, generated_at=datetime.utcnow(), errors=None, ) diff --git a/test/unit/test_manifest.py b/test/unit/test_manifest.py index 15d55a94192..d6f77eaf083 100644 --- a/test/unit/test_manifest.py +++ b/test/unit/test_manifest.py @@ -179,18 +179,41 @@ def setUp(self): raw_sql='does not matter' ), } + + 
self.sources = { + 'source.root.my_source.my_table': ParsedSourceDefinition( + database='raw', + schema='analytics', + resource_type=NodeType.Source, + identifier='some_source', + name='my_table', + source_name='my_source', + source_description='My source description', + description='Table description', + loader='a_loader', + unique_id='source.test.my_source.my_table', + fqn=['test', 'my_source', 'my_table'], + package_name='root', + root_path='', + path='schema.yml', + original_file_path='schema.yml', + ), + } for node in self.nested_nodes.values(): node.validate(node.to_dict()) + for source in self.sources.values(): + source.validate(source.to_dict()) @freezegun.freeze_time('2018-02-14T09:15:13Z') def test__no_nodes(self): - manifest = Manifest(nodes={}, macros={}, docs={}, + manifest = Manifest(nodes={}, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) self.assertEqual( manifest.writable_manifest().to_dict(), { 'nodes': {}, + 'sources': {}, 'macros': {}, 'parent_map': {}, 'child_map': {}, @@ -204,7 +227,7 @@ def test__no_nodes(self): @freezegun.freeze_time('2018-02-14T09:15:13Z') def test__nested_nodes(self): nodes = copy.copy(self.nested_nodes) - manifest = Manifest(nodes=nodes, macros={}, docs={}, + manifest = Manifest(nodes=nodes, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) serialized = manifest.writable_manifest().to_dict() @@ -269,14 +292,17 @@ def test__nested_nodes(self): def test__build_flat_graph(self): nodes = copy.copy(self.nested_nodes) - manifest = Manifest(nodes=nodes, macros={}, docs={}, + sources = copy.copy(self.sources) + manifest = Manifest(nodes=nodes, sources=sources, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) manifest.build_flat_graph() flat_graph = manifest.flat_graph flat_nodes = flat_graph['nodes'] - self.assertEqual(set(flat_graph), set(['nodes'])) + flat_sources = flat_graph['sources'] + self.assertEqual(set(flat_graph), set(['nodes', 'sources'])) self.assertEqual(set(flat_nodes), set(self.nested_nodes)) + self.assertEqual(set(flat_sources), set(self.sources)) for node in flat_nodes.values(): self.assertEqual(frozenset(node), REQUIRED_PARSED_NODE_KEYS) @@ -306,7 +332,7 @@ def test_no_nodes_with_metadata(self, mock_user): project_id='098f6bcd4621d373cade4e832627b4f6', adapter_type='postgres', ) - manifest = Manifest(nodes={}, macros={}, docs={}, + manifest = Manifest(nodes={}, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], metadata=metadata, files={}) @@ -314,6 +340,7 @@ def test_no_nodes_with_metadata(self, mock_user): manifest.writable_manifest().to_dict(), { 'nodes': {}, + 'sources': {}, 'macros': {}, 'parent_map': {}, 'child_map': {}, @@ -330,7 +357,7 @@ def test_no_nodes_with_metadata(self, mock_user): ) def test_get_resource_fqns_empty(self): - manifest = Manifest(nodes={}, macros={}, docs={}, + manifest = Manifest(nodes={}, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) self.assertEqual(manifest.get_resource_fqns(), {}) @@ -342,7 +369,7 @@ def test_get_resource_fqns(self): database='dbt', schema='analytics', alias='seed', - resource_type='seed', + resource_type=NodeType.Seed, unique_id='seed.root.seed', fqn=['root', 'seed'], package_name='root', @@ -356,7 +383,7 @@ def test_get_resource_fqns(self): root_path='', raw_sql='-- csv --', ) - manifest = Manifest(nodes=nodes, macros={}, docs={}, + manifest = Manifest(nodes=nodes, sources=self.sources, macros={}, docs={}, 
generated_at=datetime.utcnow(), disabled=[], files={}) expect = { @@ -368,7 +395,12 @@ def test_get_resource_fqns(self): ('root', 'sibling'), ('root', 'multi'), ]), - 'seeds': frozenset([('root', 'seed')]), + 'seeds': frozenset([ + ('root', 'seed') + ]), + 'sources': frozenset([ + ('test', 'my_source', 'my_table') + ]) } resource_fqns = manifest.get_resource_fqns() self.assertEqual(resource_fqns, expect) @@ -527,7 +559,7 @@ def setUp(self): @freezegun.freeze_time('2018-02-14T09:15:13Z') def test__no_nodes(self): - manifest = Manifest(nodes={}, macros={}, docs={}, + manifest = Manifest(nodes={}, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) self.assertEqual( @@ -535,6 +567,7 @@ def test__no_nodes(self): { 'nodes': {}, 'macros': {}, + 'sources': {}, 'parent_map': {}, 'child_map': {}, 'generated_at': '2018-02-14T09:15:13Z', @@ -547,7 +580,7 @@ def test__no_nodes(self): @freezegun.freeze_time('2018-02-14T09:15:13Z') def test__nested_nodes(self): nodes = copy.copy(self.nested_nodes) - manifest = Manifest(nodes=nodes, macros={}, docs={}, + manifest = Manifest(nodes=nodes, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) serialized = manifest.writable_manifest().to_dict() @@ -611,13 +644,13 @@ def test__nested_nodes(self): def test__build_flat_graph(self): nodes = copy.copy(self.nested_nodes) - manifest = Manifest(nodes=nodes, macros={}, docs={}, + manifest = Manifest(nodes=nodes, sources={}, macros={}, docs={}, generated_at=datetime.utcnow(), disabled=[], files={}) manifest.build_flat_graph() flat_graph = manifest.flat_graph flat_nodes = flat_graph['nodes'] - self.assertEqual(set(flat_graph), set(['nodes'])) + self.assertEqual(set(flat_graph), set(['nodes', 'sources'])) self.assertEqual(set(flat_nodes), set(self.nested_nodes)) compiled_count = 0 for node in flat_nodes.values(): @@ -720,7 +753,7 @@ def setUp(self): ) -def make_manifest(nodes=[], macros=[], docs=[]): +def make_manifest(nodes=[], sources=[], macros=[], docs=[]): return Manifest( nodes={ n.unique_id: n for n in nodes @@ -728,6 +761,9 @@ def make_manifest(nodes=[], macros=[], docs=[]): macros={ m.unique_id: m for m in macros }, + sources={ + s.unique_id: s for s in sources + }, docs={ d.unique_id: d for d in docs }, @@ -982,19 +1018,20 @@ def test_find_materialization_by_name(macros, adapter_type, expected): assert result.package_name == expected_package -FindNodeSpec = namedtuple('FindNodeSpec', 'nodes,package,expected') +FindNodeSpec = namedtuple('FindNodeSpec', 'nodes,sources,package,expected') def _refable_parameter_sets(): sets = [ # empties - FindNodeSpec(nodes=[], package=None, expected=None), - FindNodeSpec(nodes=[], package='root', expected=None), + FindNodeSpec(nodes=[], sources=[], package=None, expected=None), + FindNodeSpec(nodes=[], sources=[], package='root', expected=None), ] sets.extend( # only one model, no package specified -> find it in any package FindNodeSpec( nodes=[MockNode(project, 'my_model')], + sources=[], package=None, expected=(project, 'my_model'), ) for project in ['root', 'dep'] @@ -1003,35 +1040,41 @@ def _refable_parameter_sets(): sets.extend([ FindNodeSpec( nodes=[MockNode('root', 'my_model')], + sources=[], package='root', expected=('root', 'my_model'), ), FindNodeSpec( nodes=[MockNode('dep', 'my_model')], + sources=[], package='root', expected=None, ), # a source with that name exists, but not a refable FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_model')], + nodes=[], + sources=[MockSource('root', 
'my_source', 'my_model')], package=None, expected=None ), # a source with that name exists, and a refable FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_model'), MockNode('root', 'my_model')], + nodes=[MockNode('root', 'my_model')], + sources=[MockSource('root', 'my_source', 'my_model')], package=None, expected=('root', 'my_model'), ), FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_model'), MockNode('root', 'my_model')], + nodes=[MockNode('root', 'my_model')], + sources=[MockSource('root', 'my_source', 'my_model')], package='root', expected=('root', 'my_model'), ), FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_model'), MockNode('root', 'my_model')], + nodes=[MockNode('root', 'my_model')], + sources=[MockSource('root', 'my_source', 'my_model')], package='dep', expected=None, ), @@ -1049,12 +1092,12 @@ def id_nodes(arg): @pytest.mark.parametrize( - 'nodes,package,expected', + 'nodes,sources,package,expected', _refable_parameter_sets(), ids=id_nodes, ) -def test_find_refable_by_name(nodes, package, expected): - manifest = make_manifest(nodes=nodes) +def test_find_refable_by_name(nodes, sources, package, expected): + manifest = make_manifest(nodes=nodes, sources=sources) result = manifest.find_refable_by_name(name='my_model', package=package) if expected is None: assert result is expected @@ -1069,13 +1112,14 @@ def test_find_refable_by_name(nodes, package, expected): def _source_parameter_sets(): sets = [ # empties - FindNodeSpec(nodes=[], package=None, expected=None), - FindNodeSpec(nodes=[], package='root', expected=None), + FindNodeSpec(nodes=[], sources=[], package=None, expected=None), + FindNodeSpec(nodes=[], sources=[], package='root', expected=None), ] sets.extend( # models with the name, but not sources FindNodeSpec( nodes=[MockNode('root', name)], + sources=[], package=project, expected=None, ) @@ -1084,7 +1128,8 @@ def _source_parameter_sets(): # exists in root alongside nodes with name parts sets.extend( FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_table'), MockNode('root', 'my_source'), MockNode('root', 'my_table')], + nodes=[MockNode('root', 'my_source'), MockNode('root', 'my_table')], + sources=[MockSource('root', 'my_source', 'my_table')], package=project, expected=('root', 'my_source', 'my_table'), ) @@ -1093,7 +1138,8 @@ def _source_parameter_sets(): sets.extend( # wrong source name FindNodeSpec( - nodes=[MockSource('root', 'my_other_source', 'my_table')], + nodes=[], + sources=[MockSource('root', 'my_other_source', 'my_table')], package=project, expected=None, ) @@ -1102,7 +1148,8 @@ def _source_parameter_sets(): sets.extend( # wrong table name FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_other_table')], + nodes=[], + sources=[MockSource('root', 'my_source', 'my_other_table')], package=project, expected=None, ) @@ -1111,7 +1158,8 @@ def _source_parameter_sets(): sets.append( # wrong project name (should not be found in 'root') FindNodeSpec( - nodes=[MockSource('other', 'my_source', 'my_table')], + nodes=[], + sources=[MockSource('other', 'my_source', 'my_table')], package='root', expected=None, ) @@ -1119,7 +1167,8 @@ def _source_parameter_sets(): sets.extend( # exists in root check various projects (other project -> not found) FindNodeSpec( - nodes=[MockSource('root', 'my_source', 'my_table')], + nodes=[], + sources=[MockSource('root', 'my_source', 'my_table')], package=project, expected=('root', 'my_source', 'my_table'), ) @@ -1130,12 +1179,12 @@ def _source_parameter_sets(): @pytest.mark.parametrize( - 
'nodes,package,expected', + 'nodes,sources,package,expected', _source_parameter_sets(), ids=id_nodes, ) -def test_find_source_by_name(nodes, package, expected): - manifest = make_manifest(nodes=nodes) +def test_find_source_by_name(nodes, sources, package, expected): + manifest = make_manifest(nodes=nodes, sources=sources) result = manifest.find_source_by_name(source_name='my_source', table_name='my_table', package=package) if expected is None: assert result is expected diff --git a/test/unit/test_parser.py b/test/unit/test_parser.py index 5a7c8ad108f..183d1a37e1b 100644 --- a/test/unit/test_parser.py +++ b/test/unit/test_parser.py @@ -22,11 +22,13 @@ from dbt.contracts.graph.manifest import ( Manifest, FilePath, SourceFile, FileHash ) +from dbt.contracts.graph.model_config import ( + NodeConfig, TestConfig, TimestampSnapshotConfig, SnapshotStrategy, +) from dbt.contracts.graph.parsed import ( ParsedModelNode, ParsedMacro, ParsedNodePatch, ParsedSourceDefinition, - NodeConfig, DependsOn, ColumnInfo, ParsedDataTestNode, TestConfig, - ParsedSnapshotNode, TimestampSnapshotConfig, SnapshotStrategy, - ParsedAnalysisNode, ParsedDocumentation + DependsOn, ColumnInfo, ParsedDataTestNode, ParsedSnapshotNode, + ParsedAnalysisNode, ParsedDocumentation, UnpatchedSourceDefinition ) from dbt.contracts.graph.unparsed import ( FreshnessThreshold, ExternalTable, Docs @@ -116,6 +118,9 @@ def setUp(self): 'root': self.root_project_config, 'snowplow': self.snowplow_project_config } + + self.root_project_config.dependencies = self.all_projects + self.snowplow_project_config.dependencies = self.all_projects self.patcher = mock.patch('dbt.context.providers.get_adapter') self.factory = self.patcher.start() @@ -199,6 +204,21 @@ def assert_has_results_length(self, results, files=1, macros=0, nodes=0, ''' +SINGLE_TABLE_SOURCE_PATCH = ''' +version: 2 +sources: + - name: my_source + overrides: snowplow + tables: + - name: my_table + columns: + - name: id + tests: + - not_null + - unique +''' + + class SchemaParserTest(BaseParserTest): def setUp(self): super().setUp() @@ -229,74 +249,66 @@ def test__read_basic_source(self): macro_blocks = MacroPatchParser(self.parser, block, 'macros').parse() self.assertEqual(len(analysis_blocks), 0) self.assertEqual(len(model_blocks), 0) - self.assertEqual(len(source_blocks), 1) + self.assertEqual(len(source_blocks), 0) self.assertEqual(len(macro_blocks), 0) self.assertEqual(len(list(self.parser.results.patches)), 0) self.assertEqual(len(list(self.parser.results.nodes)), 0) results = list(self.parser.results.sources.values()) self.assertEqual(len(results), 1) - self.assertEqual(results[0].source_name, 'my_source') - self.assertEqual(results[0].name, 'my_table') - self.assertEqual(results[0].description, '') - self.assertEqual(len(results[0].columns), 0) + self.assertEqual(results[0].source.name, 'my_source') + self.assertEqual(results[0].table.name, 'my_table') + self.assertEqual(results[0].table.description, '') + self.assertEqual(len(results[0].table.columns), 0) def test__parse_basic_source(self): block = self.file_block_for(SINGLE_TABLE_SOURCE, 'test_one.yml') self.parser.parse_file(block) self.assert_has_results_length(self.parser.results, sources=1) src = list(self.parser.results.sources.values())[0] - expected = ParsedSourceDefinition( - package_name='snowplow', - source_name='my_source', - schema='my_source', - name='my_table', - loader='', - freshness=FreshnessThreshold(), - external=ExternalTable(), - source_description='', - identifier='my_table', - fqn=['snowplow', 
'my_source', 'my_table'], - database='test', - unique_id='source.snowplow.my_source.my_table', - root_path=get_abs_os_path('./dbt_modules/snowplow'), - path=normalize('models/test_one.yml'), - original_file_path=normalize('models/test_one.yml'), - resource_type=NodeType.Source, - ) - self.assertEqual(src, expected) + assert isinstance(src, UnpatchedSourceDefinition) + assert src.package_name == 'snowplow' + assert src.source.name == 'my_source' + assert src.table.name == 'my_table' + assert src.resource_type == NodeType.Source + assert src.fqn == ['snowplow', 'my_source', 'my_table'] def test__read_basic_source_tests(self): block = self.yaml_block_for(SINGLE_TABLE_SOURCE_TESTS, 'test_one.yml') - analysis_blocks = AnalysisPatchParser(self.parser, block, 'analyses').parse() - model_blocks = TestablePatchParser(self.parser, block, 'models').parse() - source_blocks = SourceParser(self.parser, block, 'sources').parse() - macro_blocks = MacroPatchParser(self.parser, block, 'macros').parse() - self.assertEqual(len(analysis_blocks), 0) - self.assertEqual(len(model_blocks), 0) - self.assertEqual(len(source_blocks), 1) - self.assertEqual(len(macro_blocks), 0) + analysis_tests = AnalysisPatchParser(self.parser, block, 'analyses').parse() + model_tests = TestablePatchParser(self.parser, block, 'models').parse() + source_tests = SourceParser(self.parser, block, 'sources').parse() + macro_tests = MacroPatchParser(self.parser, block, 'macros').parse() + self.assertEqual(len(analysis_tests), 0) + self.assertEqual(len(model_tests), 0) + self.assertEqual(len(source_tests), 0) + self.assertEqual(len(macro_tests), 0) self.assertEqual(len(list(self.parser.results.nodes)), 0) self.assertEqual(len(list(self.parser.results.patches)), 0) + self.assertEqual(len(list(self.parser.results.source_patches)), 0) results = list(self.parser.results.sources.values()) self.assertEqual(len(results), 1) - self.assertEqual(results[0].source_name, 'my_source') - self.assertEqual(results[0].name, 'my_table') - self.assertEqual(results[0].description, 'A description of my table') - self.assertEqual(len(results[0].columns), 1) + self.assertEqual(results[0].source.name, 'my_source') + self.assertEqual(results[0].table.name, 'my_table') + self.assertEqual(results[0].table.description, 'A description of my table') + self.assertEqual(len(results[0].table.columns), 1) def test__parse_basic_source_tests(self): block = self.file_block_for(SINGLE_TABLE_SOURCE_TESTS, 'test_one.yml') self.parser.parse_file(block) - self.assertEqual(len(self.parser.results.nodes), 2) + self.assertEqual(len(self.parser.results.nodes), 0) self.assertEqual(len(self.parser.results.sources), 1) self.assertEqual(len(self.parser.results.patches), 0) src = list(self.parser.results.sources.values())[0] - self.assertEqual(src.source_name, 'my_source') - self.assertEqual(src.schema, 'my_source') - self.assertEqual(src.name, 'my_table') - self.assertEqual(src.description, 'A description of my table') + self.assertEqual(src.source.name, 'my_source') + self.assertEqual(src.source.schema, None) + self.assertEqual(src.table.name, 'my_table') + self.assertEqual(src.table.description, 'A description of my table') - tests = sorted(self.parser.results.nodes.values(), key=lambda n: n.unique_id) + tests = [ + self.parser.parse_source_test(src, test, col) + for test, col in src.get_tests() + ] + tests.sort(key=lambda n: n.unique_id) self.assertEqual(tests[0].config.severity, 'ERROR') self.assertEqual(tests[0].tags, ['schema']) @@ -311,11 +323,36 @@ def 
test__parse_basic_source_tests(self): path = get_abs_os_path('./dbt_modules/snowplow/models/test_one.yml') self.assertIn(path, self.parser.results.files) - self.assertEqual(sorted(self.parser.results.files[path].nodes), - [t.unique_id for t in tests]) + self.assertEqual(self.parser.results.files[path].nodes, []) self.assertIn(path, self.parser.results.files) self.assertEqual(self.parser.results.files[path].sources, ['source.snowplow.my_source.my_table']) + self.assertEqual(self.parser.results.files[path].source_patches, []) + + def test__read_source_patch(self): + block = self.yaml_block_for(SINGLE_TABLE_SOURCE_PATCH, 'test_one.yml') + analysis_tests = AnalysisPatchParser(self.parser, block, 'analyses').parse() + model_tests = TestablePatchParser(self.parser, block, 'models').parse() + source_tests = SourceParser(self.parser, block, 'sources').parse() + macro_tests = MacroPatchParser(self.parser, block, 'macros').parse() + self.assertEqual(len(analysis_tests), 0) + self.assertEqual(len(model_tests), 0) + self.assertEqual(len(source_tests), 0) + self.assertEqual(len(macro_tests), 0) + self.assertEqual(len(list(self.parser.results.nodes)), 0) + self.assertEqual(len(list(self.parser.results.patches)), 0) + self.assertEqual(len(list(self.parser.results.sources)), 0) + results = list(self.parser.results.source_patches.values()) + self.assertEqual(len(results), 1) + self.assertEqual(results[0].name, 'my_source') + self.assertEqual(results[0].overrides, 'snowplow') + self.assertIsNone(results[0].description) + self.assertEqual(len(results[0].tables), 1) + table = results[0].tables[0] + self.assertEqual(table.name, 'my_table') + self.assertIsNone(table.description) + self.assertEqual(len(table.columns), 1) + self.assertEqual(len(table.columns[0].tests), 2) class SchemaParserModelsTest(SchemaParserTest): @@ -792,6 +829,8 @@ def setUp(self): nodes = { x_uid: self.x_node, y_uid: self.y_node, + } + sources = { src_uid: self.src_node, } docs = { @@ -804,7 +843,7 @@ def setUp(self): ) } self.manifest = Manifest( - nodes=nodes, macros={}, docs=docs, disabled=[], files={}, generated_at=mock.MagicMock() + nodes=nodes, sources=sources, macros={}, docs=docs, disabled=[], files={}, generated_at=mock.MagicMock() ) def test_process_docs(self): diff --git a/test/unit/test_source_config.py b/test/unit/test_source_config.py index 5b9cac11875..69377551e94 100644 --- a/test/unit/test_source_config.py +++ b/test/unit/test_source_config.py @@ -1,14 +1,16 @@ import os from unittest import TestCase, mock +from dbt.adapters import postgres # we want this available! 
import dbt.flags +from dbt.context.context_config import LegacyContextConfig +from dbt.legacy_config_updater import ConfigUpdater from dbt.node_types import NodeType -from dbt.source_config import SourceConfig from .utils import config_from_parts_or_dicts -class SourceConfigTest(TestCase): +class LegacyContextConfigTest(TestCase): def setUp(self): dbt.flags.STRICT_MODE = True dbt.flags.WARN_ERROR = True @@ -65,9 +67,11 @@ def setUp(self): def tearDown(self): self.patcher.stop() - def test__source_config_single_call(self): - cfg = SourceConfig(self.root_project_config, self.root_project_config, - ['root', 'x'], NodeType.Model) + def test__context_config_single_call(self): + cfg = LegacyContextConfig( + self.root_project_config, self.root_project_config, + ['root', 'x'], NodeType.Model + ) cfg.update_in_model_config({ 'materialized': 'something', 'sort': 'my sort key', @@ -86,11 +90,13 @@ def test__source_config_single_call(self): 'tags': [], 'vars': {'a': 1, 'b': 2}, } - self.assertEqual(cfg.config, expect) + self.assertEqual(cfg.build_config_dict(), expect) - def test__source_config_multiple_calls(self): - cfg = SourceConfig(self.root_project_config, self.root_project_config, - ['root', 'x'], NodeType.Model) + def test__context_config_multiple_calls(self): + cfg = LegacyContextConfig( + self.root_project_config, self.root_project_config, + ['root', 'x'], NodeType.Model + ) cfg.update_in_model_config({ 'materialized': 'something', 'sort': 'my sort key', @@ -118,12 +124,14 @@ def test__source_config_multiple_calls(self): 'tags': [], 'vars': {'a': 4, 'b': 2, 'c': 3}, } - self.assertEqual(cfg.config, expect) + self.assertEqual(cfg.build_config_dict(), expect) - def test__source_config_merge(self): + def test__context_config_merge(self): self.root_project_config.models = {'sort': ['a', 'b']} - cfg = SourceConfig(self.root_project_config, self.root_project_config, - ['root', 'x'], NodeType.Model) + cfg = LegacyContextConfig( + self.root_project_config, self.root_project_config, + ['root', 'x'], NodeType.Model + ) cfg.update_in_model_config({ 'materialized': 'something', 'sort': ['d', 'e'] @@ -140,22 +148,30 @@ def test__source_config_merge(self): 'tags': [], 'vars': {}, } - self.assertEqual(cfg.config, expect) - - def test_source_config_all_keys_accounted_for(self): - used_keys = frozenset(SourceConfig.AppendListFields) | \ - frozenset(SourceConfig.ExtendDictFields) | \ - frozenset(SourceConfig.ClobberFields) + self.assertEqual(cfg.build_config_dict(), expect) + + def test_context_config_all_keys_accounted_for(self): + updater = ConfigUpdater('postgres') + used_keys = ( + frozenset(updater.AppendListFields) | + frozenset(updater.ExtendDictFields) | + frozenset(updater.ClobberFields) | + frozenset({'unlogged'}) + ) - self.assertEqual(used_keys, frozenset(SourceConfig.ConfigKeys)) + self.assertEqual(used_keys, frozenset(updater.ConfigKeys)) - def test__source_config_wrong_type(self): + def test__context_config_wrong_type(self): # ExtendDict fields should handle non-dict inputs gracefully self.root_project_config.models = {'persist_docs': False} - cfg = SourceConfig(self.root_project_config, self.root_project_config, - ['root', 'x'], NodeType.Model) + cfg = LegacyContextConfig( + self.root_project_config, self.root_project_config, + ['root', 'x'], NodeType.Model + ) + + model = mock.MagicMock(resource_type=NodeType.Model, fqn=['root', 'x'], project_name='root') with self.assertRaises(dbt.exceptions.CompilationException) as exc: - cfg.get_project_config(self.root_project_config) + 
cfg.updater.get_project_config(model, self.root_project_config) self.assertIn('must be a dict', str(exc.exception)) diff --git a/test/unit/test_utils.py b/test/unit/test_utils.py index 0f294fa69ca..e624d5da4e0 100644 --- a/test/unit/test_utils.py +++ b/test/unit/test_utils.py @@ -140,7 +140,6 @@ def test_trivial(self): dbt.utils.deep_map(lambda x, _: x, {'foo': object()}) - class TestBytesFormatting(unittest.TestCase): def test__simple_cases(self): @@ -152,3 +151,42 @@ def test__simple_cases(self): self.assertEqual(dbt.utils.format_bytes(1024**3*52.6), '52.6 GB') self.assertEqual(dbt.utils.format_bytes(1024**4*128), '128.0 TB') self.assertEqual(dbt.utils.format_bytes(1024**5+1), '> 1024 TB') + + +class TestMultiDict(unittest.TestCase): + def test_one_member(self): + dct = {'a': 1, 'b': 2, 'c': 3} + md = dbt.utils.MultiDict([dct]) + assert len(md) == 3 + for key in 'abc': + assert key in md + assert md['a'] == 1 + assert md['b'] == 2 + assert md['c'] == 3 + + def test_two_members_no_overlap(self): + first = {'a': 1, 'b': 2, 'c': 3} + second = {'d': 1, 'e': 2, 'f': 3} + md = dbt.utils.MultiDict([first, second]) + assert len(md) == 6 + for key in 'abcdef': + assert key in md + assert md['a'] == 1 + assert md['b'] == 2 + assert md['c'] == 3 + assert md['d'] == 1 + assert md['e'] == 2 + assert md['f'] == 3 + + def test_two_members_overlap(self): + first = {'a': 1, 'b': 2, 'c': 3} + second = {'c': 1, 'd': 2, 'e': 3} + md = dbt.utils.MultiDict([first, second]) + assert len(md) == 5 + for key in 'abcde': + assert key in md + assert md['a'] == 1 + assert md['b'] == 2 + assert md['c'] == 1 + assert md['d'] == 2 + assert md['e'] == 3 diff --git a/test/unit/utils.py b/test/unit/utils.py index fdb80a08001..d27793afd87 100644 --- a/test/unit/utils.py +++ b/test/unit/utils.py @@ -37,13 +37,14 @@ def mock_connection(name): def profile_from_dict(profile, profile_name, cli_vars='{}'): - from dbt.config import Profile, ConfigRenderer + from dbt.config import Profile + from dbt.config.renderer import ProfileRenderer from dbt.context.base import generate_base_context from dbt.utils import parse_cli_vars if not isinstance(cli_vars, dict): cli_vars = parse_cli_vars(cli_vars) - renderer = ConfigRenderer(generate_base_context(cli_vars)) + renderer = ProfileRenderer(generate_base_context(cli_vars)) return Profile.from_raw_profile_info( profile, profile_name, @@ -53,12 +54,13 @@ def profile_from_dict(profile, profile_name, cli_vars='{}'): def project_from_dict(project, profile, packages=None, cli_vars='{}'): from dbt.context.target import generate_target_context - from dbt.config import Project, ConfigRenderer + from dbt.config import Project + from dbt.config.renderer import DbtProjectYamlRenderer from dbt.utils import parse_cli_vars if not isinstance(cli_vars, dict): cli_vars = parse_cli_vars(cli_vars) - renderer = ConfigRenderer(generate_target_context(profile, cli_vars)) + renderer = DbtProjectYamlRenderer(generate_target_context(profile, cli_vars)) project_root = project.pop('project-root', os.getcwd()) diff --git a/third-party-stubs/agate/__init__.pyi b/third-party-stubs/agate/__init__.pyi index 2d1f01f1020..ce27e1fb175 100644 --- a/third-party-stubs/agate/__init__.pyi +++ b/third-party-stubs/agate/__init__.pyi @@ -59,8 +59,17 @@ class Table: def from_csv(cls, path: Iterable[str], *, column_types: Optional['TypeTester'] = None) -> 'Table': ... @classmethod def merge(cls, tables: Iterable['Table']) -> 'Table': ... 
+ def rename(self, column_names: Optional[Iterable[str]] = None, row_names: Optional[Any] = None, slug_columns: bool = False, slug_rows: bool=False, **kwargs: Any) -> 'Table': ... class TypeTester: def __init__(self, force: Any = ..., limit: Optional[Any] = ..., types: Optional[Any] = ...) -> None: ... def run(self, rows: Any, column_names: Any): ... + + +class MaxPrecision: + def __init__(self, column_name: Any) -> None: ... + + +# this is not strictly true, but it's all we care about. +def aggregate(self, aggregations: MaxPrecision) -> int: ...