Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-352] catch and retry malformed json #4982

Merged
merged 14 commits into from
Apr 5, 2022
33 changes: 27 additions & 6 deletions core/dbt/clients/registry.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,12 @@
import functools
import json
import requests
from dbt.events.functions import fire_event
from dbt.events.types import RegistryProgressMakingGETRequest, RegistryProgressGETResponse
from dbt.events.types import (
RegistryProgressMakingGETRequest,
RegistryProgressGETResponse,
RegistryMalformedResponse,
)
from dbt.utils import memoized, _connection_exception_retry as connection_exception_retry
from dbt import deprecations
import os
Expand Down Expand Up @@ -31,18 +36,34 @@ def _get(path, registry_base_url=None):
fire_event(RegistryProgressGETResponse(url=url, resp_code=resp.status_code))
resp.raise_for_status()

# It is unexpected for the content of the response to be None so if it is, raising this error
# will cause this function to retry (if called within _get_with_retries) and hopefully get
# a response. This seems to happen when there's an issue with the Hub.
# It is unexpected for the content of the response to be None or malformed. If it is,
# raising this error will cause this function to retry (if called within _get_with_retries)
# and hopefully get a valid response. This seems to happen when there's an issue with the Hub.
# See https:/dbt-labs/dbt-core/issues/4577
if resp.json() is None:
# and https:/dbt-labs/dbt-core/issues/4849
if not validJson(resp.json()):
raise requests.exceptions.ContentDecodingError(
"Request error: The response is None", response=resp
"Request error: The response is not valid json", response=resp
)
return resp.json()


def validJson(jsonData):
try:
# the quotes in the response were converted to single quotes which is not
# valid json. json.dumps() will fix that.
json.loads(json.dumps(jsonData))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels bad. Any suggestions on a better way to do this welcome!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't state this yet but the api/v1/index.json path also returns a list objects that causes it fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait... I think it should be valid JSON... it's just a list... according to jsonlint dot com it's valid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 my understanding is json.resp() returns a json object which really means it's returning a python list in this case. So json.loads(jsonData) freaks out with TypeError: the JSON object must be str, bytes or bytearray, not list. I've updated all this confusing logic I added to just straight check if json.resp() is a list or dict. That's all we ever expect it to be. Since we control the HUB and the only time it's not one of those two things something else is going wrong, this is safe. It's also much clearer as to what's happening.

# We need to catch both malformed json and json = None
except (ValueError, TypeError) as exc:
# This event will allow us to see the actual json causing issues if more deps
# issues crop up. Otherwise we're flying a little blind.
fire_event(RegistryMalformedResponse(jsonData=jsonData, exc=exc))
return False
return True


def index(registry_base_url=None):
# this returns a list of all packages on the Hub
return _get_with_retries("api/v1/index.json", registry_base_url)


Expand Down
13 changes: 13 additions & 0 deletions core/dbt/events/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -2393,6 +2393,18 @@ def message(self) -> str:
return f"External call exception: {self.exc}"


@dataclass
class RegistryMalformedResponse(DebugLevel):
jsonData: str
exc: Exception
code: str = "M022"

def message(self) -> str:
prefix = f"Malformed json: {self.jsonData}"
error = f"Exception: {self.exc}"
return f"{prefix}\n{error}"


# since mypy doesn't run on every file we need to suggest to mypy that every
# class gets instantiated. But we don't actually want to run this code.
# making the conditional `if False` causes mypy to skip it as dead code so
Expand Down Expand Up @@ -2747,3 +2759,4 @@ def message(self) -> str:
GeneralWarningException(exc=Exception(""), log_fmt="")
EventBufferFull()
RecordRetryException(exc=Exception(""))
RegistryMalformedResponse(jsonData="", exc=Exception(""))
1 change: 1 addition & 0 deletions test/unit/test_events.py
Original file line number Diff line number Diff line change
Expand Up @@ -416,6 +416,7 @@ def MockNode():
EventBufferFull(),
RecordRetryException(exc=Exception('')),
UnitTestInfo(msg=''),
RegistryMalformedResponse(jsonData="",exc=Exception('')),
]


Expand Down