Different merge types results in the same logic #14
Comments
Thanks! We are thinking about deprecating this. This had different logic before, but we reverted to the same one because a fuzzy merge is slightly different from a standard merge. However, we are open to suggestions, and if you have a case for keeping these and adjusting the logic appropriately, we are happy to revisit this. 1:m and m:1 would just be symmetric, and 1:1 is a special case of 1:m. Also, fun side note: the standard pandas merge has an option called `validate` where these options can be specified. It only checks for duplication of keys and has no bearing on the match itself. We can probably rename this option to validate as well.
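To illustrate the side note about `validate`, here is a minimal sketch using plain pandas (not this repository's fuzzy merge). The data frames and the `passes` helper are made up for the example. It shows that `validate` only checks key uniqueness and leaves the merged rows unchanged.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: "id" is duplicated on the left, unique on the right.
left = pd.DataFrame({"id": [1, 2, 2], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 2], "city": ["x", "y"]})


def passes(validate):
    """Return True if pandas accepts the merge under the given validate mode."""
    try:
        pd.merge(left, right, on="id", validate=validate)
        return True
    except MergeError:
        return False


# validate only inspects key duplication; it does not change how rows match.
results = {
    mode: passes(mode)
    for mode in ("one_to_one", "one_to_many", "many_to_one", "many_to_many")
}

# The merged rows are identical whether or not validation is requested.
plain = pd.merge(left, right, on="id")
checked = pd.merge(left, right, on="id", validate="many_to_one")
```

Here only `many_to_one` and `many_to_many` are accepted, because the left keys contain a duplicate.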
Interesting, thanks for the explanation and for getting back to me so quickly! I'll describe the type of datasets I am using and how 1:m would be really helpful for me. In dataset one I have about 30,000 users, each with a unique id assigned to them. The problem I'm running into is that I need to map one id from dataset one to multiple ids in dataset two, because in dataset two the company can change an individual's id more than once. The reason I need 1:m is that I want to be able to map 1 id to multiple ids so I can create a universal id for that person. I know I could switch the datasets around, but I need to keep the 30,000 dataset as df1 and the 50,000 dataset as df2. If you could provide the snippet I need for a 1:m mapping, that would be really appreciated 🙏 Thanks once again!
Thanks! Just so I understand this clearly, what is preventing you from switching the two datasets around?
If you think about this a bit more conceptually, when we want a 1:m fuzzy merge we think: "I want to have several suitable merges for each row in the df1 from df2". This goal can be achieved by either 1 or 2 - does this make sense?
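The idea of "several suitable merges for each row in the df1 from df2" can be sketched as follows. This is not this repository's implementation: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and the function names, the `threshold` value, and the sample keys are all assumptions made for illustration. The `invert` helper shows why m:1 is just the symmetric view of the same matches.

```python
from difflib import SequenceMatcher


def fuzzy_one_to_many(left_keys, right_keys, threshold=0.8):
    """Map each left key to every right key whose similarity meets the threshold."""
    matches = {}
    for lk in left_keys:
        matches[lk] = [
            rk
            for rk in right_keys
            # Case-insensitive similarity in [0, 1]; stand-in for a real fuzzy scorer.
            if SequenceMatcher(None, lk.lower(), rk.lower()).ratio() >= threshold
        ]
    return matches


def invert(matches):
    """Flip a left -> [right, ...] mapping into right -> [left, ...] (the m:1 view)."""
    inverted = {}
    for lk, rks in matches.items():
        for rk in rks:
            inverted.setdefault(rk, []).append(lk)
    return inverted


# Hypothetical keys: df1 has one record per person, df2 may hold several variants.
df1_keys = ["john smith", "mary jones"]
df2_keys = ["john smith", "John Smith", "MARY JONES", "zzz"]

one_to_many = fuzzy_one_to_many(df1_keys, df2_keys)
many_to_one = invert(one_to_many)
```

Each df1 key keeps its own list of df2 matches, so df1 stays on the left while one id still maps to several ids.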
Closing due to inactivity; solutions are provided above.
Thanks for such a great repository! I was just going through the code where you're able to run `merge()` on two data frames, and it looks as if the logic is the same for all three merges (1:m, m:1, or 1:1). Could someone explain why this is the case, please? Thanks! 😄