Equivalence classes on requests. #40

kantp · 2016-01-27T13:13:23Z

Hi,

this pull request proposes a new function, dataFetchEquiv, that lets you define equivalent requests, i.e. classes of requests that are known a priori to yield the same result.

For a given equivalence class, at most one request will be performed, and all equivalent requests will yield the same result.
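To make the "at most one request per class" behaviour concrete, here is a minimal pure sketch. It is not the code from this pull request: fetchAllEquiv is a hypothetical name, and it assumes each request can be mapped to an Ord-comparable representative, which is a simplification of the equiv/representative pair discussed in this thread.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical helper (not part of Haxl): answer a list of requests while
-- calling the fetch function at most once per equivalence class.
-- Assumption: `rep` sends equivalent requests to the same Ord-comparable key.
-- Returns the results in request order plus the number of fetches performed.
fetchAllEquiv :: Ord k => (req -> k) -> (req -> res) -> [req] -> ([res], Int)
fetchAllEquiv rep fetch reqs = (reverse results, Map.size cache)
  where
    (cache, results) = foldl' step (Map.empty, []) reqs
    step (c, acc) r =
      case Map.lookup (rep r) c of
        -- An equivalent request was already fetched: reuse its result.
        Just res -> (c, res : acc)
        -- First request of its class: fetch and remember the result.
        Nothing ->
          let res = fetch r
          in (Map.insert (rep r) res c, res : acc)
```

Every request in a class receives the result of the first request of that class that was seen, so which member is actually fetched depends on the order of the requests.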

I use this facility to stretch Haxl a bit and perform write operations that are disguised as reads. My request type looks like this:

data MyRequest a where
  EnterOrRetrieveData :: MyData -> MyRequest MyData
  RetrieveData :: MyKey -> MyRequest MyData

I also have a function that maps MyData to MyKey:

key :: MyData -> MyKey

The query that I would like to perform is this: given some value x of type MyData, is there already some y in my data source that has the same key as x? If so, give me that y; if not, enter x into my data source and give me x as the result of the query. All future queries for a value z with key z == key x should yield the same result (the goal is to find similar pieces of data, and the function key is chosen so that it maps similar data to equal keys).

This can be achieved with the function dataFetchEquiv:

getSimilar :: MyData -> (GenHaxl UserEnv) MyData
getSimilar =
  let equiv :: MyRequest a -> MyRequest a -> Bool
      equiv (EnterOrRetrieveData x) (EnterOrRetrieveData y) = key x == key y
      representative :: MyRequest a -> MyRequest a
      representative (EnterOrRetrieveData x) = RetrieveData (key x)
  in dataFetchEquiv equiv representative . EnterOrRetrieveData

The write operation is idempotent: Haxl will perform at most one write for each key. It is not observable whether anything was written at all or whether the data was already there in the first place. This ensures that the caching and re-ordering performed by Haxl are safe (as long as it is not important which data is written for a given key, since that may well be affected by the re-ordering).
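The intended semantics of EnterOrRetrieveData can be modelled without Haxl as a pure function over a map. This is only a sketch under assumptions: the concrete MyKey, MyData and key definitions below are hypothetical stand-ins, and Store models the data source as an in-memory map.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the types used in the description above.
type MyKey = Int
newtype MyData = MyData String deriving (Eq, Show)

-- Assumed similarity key for illustration: the length of the payload.
key :: MyData -> MyKey
key (MyData s) = length s

-- The data source, modelled as an in-memory map from key to stored value.
type Store = Map.Map MyKey MyData

-- If a value with the same key is already stored, return it and leave the
-- store untouched; otherwise store x and return x.
enterOrRetrieve :: MyData -> Store -> (MyData, Store)
enterOrRetrieve x store =
  case Map.lookup (key x) store of
    Just y  -> (y, store)
    Nothing -> (x, Map.insert (key x) x store)
```

Applying enterOrRetrieve to two values with the same key returns the first one both times and writes once, so the stored value depends on which request runs first.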

This might be a pretty narrow use case, but since the changes are non-invasive, I thought I'd offer a pull request in case it is useful to somebody else. Any feedback is greatly appreciated.

kantp · 2016-01-27T14:46:02Z

Strange, it gets a kind mismatch on ghc-7.6.3, but is accepted by ghc-7.8.4 and 7.10.2. I'll have to look into this.

This commit introduces a new function, dataFetchEquiv, that lets you define equivalent requests, i.e. classes of requests that are known a priori to yield the same result. For a given equivalence class, at most one request will be performed, and equivalent requests will give the same result.

kantp · 2016-02-04T08:49:46Z

Ok, with 46d9dfd, it works with older ghc versions as well.

Also, for added context (because the example I give in the test is not particularly useful), I described how I use this new functionality to solve a real problem in a lightning talk at the last Haskell eXchange: https://skillsmatter.com/skillscasts/6538-data-deduplication-in-haskell-an-experience-report.

simonmar · 2016-02-05T11:32:47Z

I stared at this for a while and I don't think I understand all the ramifications. You can certainly do very strange things with dataFetchEquiv. What are the properties under which dataFetchEquiv is safe to use?

I'm also concerned that there's an O(n) search in there when the request is not already cached.

This needs more thought IMO. We've come across cases that are similar but I'm not sure if they're catered for by dataFetchEquiv. For example, a query that counts the number of Xs is satisfied by a previous query that fetches all the Xs.

kantp · 2016-02-05T13:30:44Z

Thanks for your feedback, Simon. I think the documentation needs some improvement.

In order to safely use dataFetchEquiv, the following conditions have to hold for the two functions equiv and representative that you supply to it:

1. equiv has to be an equivalence relation (i.e., a equiv b and b equiv c imply a equiv c, a equiv b is the same as b equiv a, and a equiv a always holds).
2. representative a == representative b iff a equiv b.
3. Two requests for which equiv is true always give the same results.

If you can define these two functions on your requests, you can avoid some redundant data fetching, since for any two requests a, b with a equiv b, only one will be performed, and the result used for both.
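The first two conditions can be checked mechanically on a finite sample of requests. The helpers below are a sketch with hypothetical names (isEquivalenceOn, representativeAgrees); they are not part of Haxl or of this pull request, and they assume the representatives can be compared with Eq. The third condition concerns the data source itself and cannot be checked this way.

```haskell
-- Check reflexivity, symmetry and transitivity of `eq` on a finite sample.
isEquivalenceOn :: [a] -> (a -> a -> Bool) -> Bool
isEquivalenceOn xs eq = reflexive && symmetric && transitive
  where
    reflexive  = and [ eq a a | a <- xs ]
    symmetric  = and [ eq a b == eq b a | a <- xs, b <- xs ]
    transitive = and [ eq a c | a <- xs, b <- xs, c <- xs, eq a b, eq b c ]

-- Check that two sampled requests share a representative exactly when
-- they are equivalent.
representativeAgrees :: Eq r => [a] -> (a -> a -> Bool) -> (a -> r) -> Bool
representativeAgrees xs eq rep =
  and [ (rep a == rep b) == eq a b | a <- xs, b <- xs ]
```

A relation such as "differs by at most one" passes the first two checks but fails transitivity, which is the kind of mistake these checks are meant to catch before handing equiv to dataFetchEquiv.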

Regarding the O(n) search: this might be problematic if there are many BlockedFetches in the queue. However, I don't see a way to avoid it without changing the internals of the RequestStore.
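For comparison, the two lookup strategies can be sketched outside of Haxl. This says nothing about how RequestStore is actually organised: both function names are hypothetical, and the keyed variant assumes that representatives have an Ord instance, which the request types in this thread are not stated to have.

```haskell
import Data.List (find)
import qualified Data.Map.Strict as Map

-- Linear scan: compare the new request against every pending one with
-- `equiv`. This takes O(n) comparisons for n pending requests.
lookupByEquiv :: (req -> req -> Bool) -> req -> [(req, res)] -> Maybe res
lookupByEquiv equiv r pending = snd <$> find (equiv r . fst) pending

-- Keyed lookup: index pending results by representative, so finding an
-- equivalent request is a single O(log n) map lookup.
-- Assumption: representatives are Ord-comparable.
lookupByRep :: Ord k => (req -> k) -> req -> Map.Map k res -> Maybe res
lookupByRep rep r = Map.lookup (rep r)
```

The keyed variant only needs representative, not equiv, because condition 2 above makes equal representatives equivalent to a equiv b.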

The case you described is interesting, but I don't think it is covered by this as is. Maybe it could be solved in a somewhat similar fashion, though. You are not trying to identify requests that give the same answer, but rather pairs of requests where the answer to one can be inferred from the answer to the other.

I think it could be possible to write down a function that is similar to dataFetchEquiv, but instead of the function equiv, you would have to provide one function that can identify a request that provides enough information to get the answer to the request you try to perform, and another function that uses the result from the other request to get the answer to this one.

In your example, before entering a request Count xs, it would check if Get xs was already there (in which case it could use the length of the result of Get xs). What do you think?
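A rough sketch of that idea, independent of Haxl and of the draft in #41: one function names a request whose result would be sufficient, and another derives the answer from that result. All names here (Req, Res, subsumedBy, derive, lookupDerived) are hypothetical, and results are put in a plain sum type for brevity instead of being indexed by the request type as in a real Haxl data source.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical requests: fetch all Xs for a name, or only count them.
data Req = GetXs String | CountXs String
  deriving (Eq, Ord, Show)

-- Hypothetical results, untyped for brevity.
data Res = Items [Int] | Count Int
  deriving (Eq, Show)

type Cache = Map.Map Req Res

-- Which other request, if any, would provide enough information?
subsumedBy :: Req -> Maybe Req
subsumedBy (CountXs k) = Just (GetXs k)
subsumedBy _           = Nothing

-- Derive the answer to a request from the result of the subsuming one.
derive :: Req -> Res -> Maybe Res
derive (CountXs _) (Items xs) = Just (Count (length xs))
derive _           _          = Nothing

-- Look in the cache for the request itself first, then for a request
-- that subsumes it.
lookupDerived :: Cache -> Req -> Maybe Res
lookupDerived cache r =
  case Map.lookup r cache of
    Just res -> Just res
    Nothing  -> do
      r'  <- subsumedBy r
      res <- Map.lookup r' cache
      derive r res
```

With GetXs "a" cached as Items [1,2,3], a later CountXs "a" is answered as Count 3 without touching the data source, while CountXs "b" still has to be fetched.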

kantp · 2016-02-05T19:28:38Z

@simonmar I just made a draft of how I think the use case you described can be handled (see #41). It could probably benefit from some refactoring, as it introduces a little code redundancy, but it demonstrates the general idea.

facebook-github-bot added the CLA Signed label Jan 27, 2016

Philipp Kant added 2 commits January 30, 2016 21:30

Fix type signature of requestsOfType.

46d9dfd

The type signature needs to be slightly different on older ghc versions (Typeable vs. Typeable1).

kantp force-pushed the equivalence-classes branch from acff66e to 46d9dfd Compare January 30, 2016 20:31

Added test for dataFetchEquiv.

32827ba

For illustration and testing.

kantp force-pushed the equivalence-classes branch from 3f10d2e to 32827ba Compare February 2, 2016 12:20

Additional comments in dataFetchEquiv.

14df314

kantp mentioned this pull request Feb 5, 2016

Specialised requests #41

Open

ghost added the CLA Signed label Jul 12, 2016

fregante mentioned this pull request May 31, 2023

Show pr-base-commit on more PRs refined-github/refined-github#6574

Open
