Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should it be possible to pass parameter(s) to the post-init function? #17

Closed
ericvsmith opened this issue Jun 10, 2017 · 27 comments
Closed

Comments

@ericvsmith
Copy link
Owner

Currently the post-init function __dataclass_post_init__ takes no parameters. Should it be possible to pass one or more parameters from __init__ to the post-init function? Is it possible to make the parameter optional?

It would be nice to pass parameters to __init__ which do not initialize fields, but are accessible to the post-init function. But, it might mess up the interface too much, and there are issues with calling __dataclass_post_init__'s super() function if we decide to change the number of parameters.

This issue is a placeholder to decide this issue before the code and PEP are finalized. Any suggestions or thoughts are welcome.

@ericvsmith
Copy link
Owner Author

One option is to inspect __dataclass_post_init__ and add any parameters as optional? keyword-only? parameters to __init__.

Something like:

@dataclass
class C:
    a: int
    b: int
    c: int = field(init=False, default=0)
    def __dataclass_post_init__(self, database=None):
        # perform some database lookup to initialize 'c'
        if database is not None:
            self.c = lookup(database)

c = C(0, 1, database=my_database)

The generated __init__ might look like:

    def __init__(self, x, y, *, database=None)

This could always be added at a later date. I've occasionally needed functionality like this, but I've found ways to work around it.

@ericvsmith
Copy link
Owner Author

On the other hand, finding out the parameter names in the presence of classmethod, staticmethod, and other descriptors might be impossible.

Maybe something like:

@dataclass(post_init_args='database')
class C:
    a: int
    b: int
    c: int = field(init=False, default=0)
    def __dataclass_post_init__(self, database=None):
        ...

@gvanrossum
Copy link

gvanrossum commented Jun 26, 2017 via email

@ericvsmith
Copy link
Owner Author

To my knowledge, attrs doesn't let you pass in parameters to their __attrs_post_init__ function. The closest I could find is python-attrs/attrs#180 which was closed in preference of using classmethods. So maybe that's good enough.

@gvanrossum
Copy link

At least in this stage, and given the discussion about attrs vs. stdlib, let's stick to that.

ericvsmith added a commit that referenced this issue Sep 16, 2017
…of parameterized __dataclass_post_init__. Add a test showing the technique.
@ericvsmith
Copy link
Owner Author

Food for thought, no action item here.

It occurs to me that one way around this is to have init-only members of a dataclass. That is, members that are params to __init__(), and would be passed to __dataclass_post_init__(), but would not be otherwise used. They would not be part of the repr, cmp functions, etc. They are not fields (as defined in the PEP).

I think this best way to do this would be to treat them like we do with typing.ClassVar. Maybe have a dataclasses.InitVar. Like typing.ClassVar, they would not be fields.

@dataclass
class C:
    database: InitVar(DatabaseConnection)
    i: int
    j: str=None

    def __dataclass_post_init__(self, database):
        if self.j is None:
            self.j = database.ReadSomeValue(self.i)

c = C(db_connection, 3)
assert repr(c) == 'C(3, "some value looked up in the database")'
assert len(fields(C) == 2)  # database is not a field

I'm not sure what we'd want to leave in __annotations__ in this case. Maybe just remove the 'database' key entirely.

@ericvsmith ericvsmith reopened this Nov 19, 2017
@gvanrossum
Copy link

gvanrossum commented Nov 20, 2017 via email

@ericvsmith
Copy link
Owner Author

I guess it could be a member of the class, if it had a default value.

Let's first look at how ClassVar works, as an example:

>>> @dataclass
... class C:
...   i: int
...   cls_var: ClassVar[List[int]]
...
>>> c=C(0)
>>> c.i
0
>>> list(fields(c).keys())
['i']
>>> C.cls_var
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'C' has no attribute 'cls_var'

There's never any assignment to C.x, so there's no attribute defined on the class. And, it's not a field, so there's no per-instance attribute. Maybe you should be required to have a default value, but I say leave it up to the class author.

Here's how ClassVar works with a default value:

>>> @dataclass
... class D:
...   i: int
...   cls_var: ClassVar[List[int]] = []
...
>>> d=D(0)
>>> d.cls_var
[]
>>> d.__dict__
{'i': 0}
>>> D.cls_var
[]

So, there is now a class attribute cls_var because the default value assigned to it. Note that the above behavior with respect to cls_var. where it only exists if it has a default value, is the same for a non-dataclass class.

I would propose that an InitVar field would work exactly the same way, as far as how it's handled as a field and as a class attribute. If you gave it a default value, it would be a class attribute, but I don't think anyone would ever look for that class attribute.

Then the only difference between a ClassVar and an InitVar is that InitVar's are passed to __dataclass_post_init__().

@ericvsmith
Copy link
Owner Author

Note that as of #73, __dataclass_post_init__ has been renamed __post_init__.

@gvanrossum
Copy link

gvanrossum commented Nov 20, 2017 via email

@ericvsmith
Copy link
Owner Author

While I like this idea a lot, I'm running in to some complications with base classes that also have InitVar members. I'm going to have to think about how it should work. It also brings up base-class __post_init__() functions, and how they're called in general. I'll also research what attrs does there.

@ericvsmith
Copy link
Owner Author

@ilevkivskyi : Any comments on this design? I want to make sure I'm not doing anything that won't work well with static type checkers.

The only wart on it is that you'd need to repeat typing information if you wanted to fully specify __post_init__()'s params. Here is a class that takes one of it's fields in __init__(), and initialize the other in __post_init__() via a config file:

@dataclass
class C:
    i: int
    j: int = field(init=False, default=None)
    config_file: InitVar[ConfigFileType]
    def __post_init__(self, config_file: ConfigFileType):
        self.j = config_file.read()

c = C(10, my_config_file)

Like ClassVar, the InitVar field (config_file in this case) would not be an instance member. And in this case, not even a class member since it's not initialized.

Later today I'll post a note about how this would work with base classes.

@ilevkivskyi
Copy link
Contributor

It is fine with mypy. We will anyway need some NamedTuple-like special support for dataclasses. Also concerning type duplication, we can allow omitting InitVar argument:

@dataclass
class C:
    i: int
    j: int = field(init=False, default=None)
    config_file: InitVar
    def __post_init__(self, config_file: ConfigFileType):
        self.j = config_file.read()

@ericvsmith
Copy link
Owner Author

@ilevkivskyi : Thanks. Good news.

And omitting the type on the InitVar declaration is a good idea, but could mypy then figure out that it's a parameter to __init__(), and figure out its type from __post_init__()? If so, I'd suggest just always omitting the type, and always have InitVar with no argument.

@ilevkivskyi
Copy link
Contributor

but could mypy then figure out that it's a parameter to __init__(), and figure out its type from __post_init__()?

Yes, as I mentioned we will need some special-casing anyway to construct the __init__ signature, so incorporating __post_init__ is just one small additional step.

ericvsmith added a commit that referenced this issue Nov 21, 2017
… pseudo-fields in __dataclass_fields__, and filter them out as needed.
@ericvsmith
Copy link
Owner Author

Here's an example of how this would work with inheritance. The field ordering is the same as with real fields: base fields (in order) before derived fields (in order):

@dataclass
class Base:
    x: int
    init_base: InitVar[int]

@dataclass
class C(Base):
    y: int
    init_derived: InitVar[int]

    def __post_init__(self, init_base, init_derived):
        self.x = self.x + init_base
        self.y = self.y + init_derived

# order of params is (x, init_base, y, init_derived)
c = C(10, 11, 50, 51)
self.assertEqual(vars(c), {'x': 21, 'y': 101})

Normally, I think you'd end up using named parameters in such a complicated case, but this test doesn't have them to make sure I actually get the fields in the right order.

I can't find where we had the discussion, but we've previously determined that we can't automatically chain __init__ or __post_init__ functions, and that the caller will have to do so if desired. In this simple test I didn't chain them, but I assume in most real-world cases that a derived __post_init__ would call the base class __post_init__ directly.

ericvsmith added a commit that referenced this issue Nov 22, 2017
… simply implement InitVar. An open issue is if I support any type on it at all, or just make it always plain "init_param: InitVar".
@gvanrossum
Copy link

This sounds right. We have regular fields, InitVars, and ClassVars. Only the first kind gets put on the instance and included in the list of fields. The first two kinds get passed to __init__. Only the second kind gets passed to __post_init__. And the latter is excluded from everything.

Agreed on the chaining, but we should make it very clear in the docs that users should call super methods. In fact maybe there should be a no-op __post_init__ with no arguments on class object so you can always call the super method, and cooperative multiple inheritance will work.

@ericvsmith
Copy link
Owner Author

ericvsmith commented Nov 22, 2017

Yes, that description of the kinds of fields is spot-on.

Instead of adding object.__post_init__, I think I could generate code to call super.__post_init__ only if it exists. But as always the problem with cooperative multiple inheritance is knowing what parameters to call the super method with, since with this change we've added params to __post_init__.

I think we had a similar discussion with chaining __init__ calls and decided to leave it up to the user to work it out: we don't call super.__init__ at all. I'd like to do the same here: if you the class author understand the __post_init__ hierarchy enough, just chain the calls yourself with the right params, because we can't get it right.

That said, I think we could inspect super.__post_init__ and find the parameter names. Since the parameter names aren't arbitrary (they're the names of the InitVar fields, after all), we could then build up the list of params and call the method with the right values. But that seems kind of extreme.

@ericvsmith
Copy link
Owner Author

One "interesting" byproduct of having InitVars is the interaction with replace(). Because replace() calls __init__() which calls __post_init__(), the InitVar ends up being required in calls to replace(). This requirement can be satisfied with a default value, or by specifying the InitVar as a replace() parameter.

Here, with a default value for an InitVar:

f@dataclass
class C:
    i: int
    lookup: InitVar[Mapping[int,int]] = None

    def __post_init__(self, lookup):
        if lookup is not None:
            self.i = lookup[self.i]

config = {1: 100,
          2: 200,
          }
c = C(1, lookup=config)
print(c)
print(replace(c, i=2))
print(replace(c, i=2, lookup=config))

Produces:

C(i=100)
C(i=2)
C(i=200)

If we omit the default for lookup, then it's required on calls to replace():

@dataclass
class C:
    i: int
    lookup: InitVar[Mapping[int,int]]

    def __post_init__(self, lookup):
        self.i = lookup[self.i]

config = {1: 100,
          2: 200,
          }
c = C(1, lookup=config)
replace(c, i=2)

Produces:

Traceback (most recent call last):
  File "i.py", line 18, in <module>
    replace(c, i=2)
  File "/cygdrive/c/home/eric/local/dataclasses/dataclasses.py", line 1008, in replace
    new_obj = obj.__class__(**changes)
TypeError: __init__() missing 1 required positional argument: 'lookup'

I think this all follows logically from how the various functions are defined, but it might be surprising to callers. But then InitVar is sort of an advanced topic, and 99% of use cases won't run in to this.

@ilevkivskyi
Copy link
Contributor

Apart from replace() this sounds right, but then there is a possible solution for replace() that I proposed in #74 (comment) or am I missing something?

@ericvsmith
Copy link
Owner Author

Yes, this replace() issue and #74 are closely related. I think my proposed solution in #74 (comment) to both these issues is: dataclass isn't going to handle it. Let the class author decide.

With that, I'd like to refocus this issue on InitVars. I don't think there are any more outstanding issues, and once I'm done with the implementation and PEP updates, this issue is ready to be merged.

@ilevkivskyi
Copy link
Contributor

With that, I'd like to refocus this issue on InitVars. I don't think there are any more outstanding issues, and once I'm done with the implementation and PEP updates, this issue is ready to be merged.

OK.

@gvanrossum
Copy link

gvanrossum commented Nov 22, 2017 via email

@ericvsmith
Copy link
Owner Author

@gvanrossum : agreed.

I lied about there being no more issues. The issue of where to specify the type of the InitVar remains.

It could be:

@dataclass
class C:
    i: int
    db: InitVar[DatabaseType]
    def __post_init_(self, db): pass

Or:

@dataclass
class C:
    i: int
    db: InitVar
    def __post_init__(self, db: DatabaseType): pass

@ilevkivskyi : Any preference?

All things being equal I'd prefer the former because of the parallel with ClassVar, but I could go either way.

@ilevkivskyi
Copy link
Contributor

@ericvsmith
Actually it would be not hard to allow both options in mypy (and even a third one with duplicating DatabaseType). I slightly more prefer the second one, but only slightly.

@ericvsmith
Copy link
Owner Author

ericvsmith commented Nov 22, 2017

I'd like to support just one. It doesn't really make a difference to dataclasses, since it ignore the type. It just needs InitVar to tell it what type of field this is.

@gvanrossum : Do you have an opinion? See #17 (comment) (2 messages up). The goal is to not repeat the type information for the InitVar. Should it appear on the class variable annotation, or on the definition of __post_init__, or both?

Another reason to require it on the class variable annotation (my slight preference) is that this is valid (if ill-advised):

@dataclass
class Base:
    i: int
    db: InitVar[int]

@dataclass
class C(Base):
    j: int = field(init=False, default=None)
    def __post_init__(self, db):
        self.j = db

c = C(1, db=10)
print(c)

Which produces C(i=1, j=10). If the typing information is given on the call to __post_init__ (@ilevkivskyi's slight preference), it's not even on the same class as the class variable!

[Edit: slight wording improvement]

@gvanrossum
Copy link

I would make the former canonical:

@dataclass
class C:
    i: int
    db: InitVar[DatabaseType]
    def __post_init_(self, db): ...

This keeps the types all in one place, and matches what's done for ClassVar.

We can't stop users from also annotating the methods (if they want type checking) so the redundant form must also be allowed:

@dataclass
class C:
    i: int
    db: InitVar[DatabaseType]
    def __post_init_(self, db: DatabaseType) -> None: ...

but that's the case for all sorts of correspondences. Having the form with just InitVar in the class feels weird, we don't have any precedent for that.

ericvsmith added a commit that referenced this issue Nov 23, 2017
… approach is to implement it from scratch, instead of trying to whittle down ClassVar.
ericvsmith added a commit that referenced this issue Nov 25, 2017
…on that InitVar fields must be passed to replace().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants