This repository has been archived by the owner on Jan 26, 2023. It is now read-only.

48 HOUR CHALLENGE - Grants Round 3 CLR -- Data Analysis Bounty #40

Closed
owocki opened this issue Oct 6, 2019 · 27 comments

Comments

@owocki
Contributor

owocki commented Oct 6, 2019

This is a data analysis bounty for Gitcoin Grants CLR Round 3 ( https://gitcoin.co/blog/gitcoins-q3-match-100k-to-oss-projects/ )

Gitcoin Grants CLR is based on the Liberal Radicalism paper by Vitalik Buterin, Zoë Hitzig, and Glen Weyl (a breakdown of the paper: https://medium.com/coinmonks/breaking-down-buterin-hitzig-and-weyls-liberal-radicalism-paper-ba5192248b2 ).
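For readers new to CLR, here is a minimal Python sketch of the unconstrained Liberal Radicalism (quadratic funding) match for a single project: the project's ideal total is the square of the sum of the square roots of its contributions, and the match is that total minus what contributors paid directly. Real rounds have a fixed matching pool, so matches are typically scaled to fit it; that step is deliberately omitted here, and the function name is illustrative only.

```python
import math


def clr_match(contributions: list[float]) -> float:
    """Unconstrained quadratic-funding match for one project (illustrative sketch)."""
    # Ideal total funding: (sum of sqrt(c_i))^2
    ideal_total = sum(math.sqrt(c) for c in contributions) ** 2
    # The match is whatever the contributors did not pay themselves
    return ideal_total - sum(contributions)


print(clr_match([1, 1, 1, 1]))  # four $1 contributors -> 12.0
print(clr_match([4]))           # one $4 contributor  -> 0.0
```

The two calls show why CLR rewards breadth: the same $4 from four people earns a $12 match, while $4 from one person earns nothing.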

Round 3 makes use of Pairwise Bonding ( https://ethresear.ch/t/pairwise-coordination-subsidies-a-new-quadratic-funding-design/5553 ) to help prevent collusion.
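A rough sketch of the pairwise idea, based on our reading of the linked ethresear.ch post: the quadratic-funding subsidy can be written as a sum over contributor pairs, and each pair's term is scaled by M / (M + k_ij), where k_ij measures how much contributors i and j co-fund across all projects. The parameter name `M`, the exact form of `k_ij`, and the input layout below are assumptions for illustration; Gitcoin's production implementation may differ.

```python
import math
from itertools import combinations


def pairwise_subsidies(contribs: dict[str, dict[str, float]], M: float) -> dict[str, float]:
    """Pairwise-bounded QF subsidies (sketch; M is an assumed tuning parameter).

    contribs maps project -> {contributor: amount}. As M grows, the result
    approaches the plain quadratic-funding subsidy.
    """
    # k[(i, j)]: coordination between i and j, summed over every project they both fund
    k: dict[tuple[str, str], float] = {}
    for donors in contribs.values():
        for i, j in combinations(sorted(donors), 2):
            k[(i, j)] = k.get((i, j), 0.0) + math.sqrt(donors[i] * donors[j])

    subsidies = {}
    for project, donors in contribs.items():
        total = 0.0
        for i, j in combinations(sorted(donors), 2):
            # Plain QF counts sqrt(c_i * c_j) for both orderings (i, j) and (j, i)
            term = 2 * math.sqrt(donors[i] * donors[j])
            # Discount pairs that co-fund heavily across projects
            total += term * M / (M + k[(i, j)])
        subsidies[project] = total
    return subsidies


print(pairwise_subsidies({"grant_a": {"alice": 1.0, "bob": 4.0}}, M=2.0))  # {'grant_a': 2.0}
```

With a very large `M` the same input returns roughly 4.0, the unbounded QF subsidy, so `M` controls how aggressively repeat co-funding pairs are damped.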

The task for this bounty is to analyze an anonymous dataset of Grants CLR contributions and find:

  • interesting trends
  • evidence of collusion
  • beautiful datavizes

The data set is here:
REDACTED

The Gitcoin team has already done some dataviz of this dataset, which you can find on Twitter: https://twitter.com/gitcoin/status/1179365868628787200. We are also doing our own analysis, but we want to hear what the community can do here.

We need to see a first iteration of the data analysis by 10am JST on 10/8, and final submissions by noon JST on 10/9. This gives the community 48 hours to put something together.

I am posting this as a 7 ETH bounty, which I intend to divide among the top 3 contributors (by my judgement) as follows:
1. 4 ETH
2. 2 ETH
3. 1 ETH

Conditions:

  • Provide an executive summary of your results.
  • Open source any code you used to analyze the results.
@gitcoinbot
Member

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 7.0 ETH (1234.85 USD @ $176.41/ETH) attached to it.

@owocki
Contributor Author

owocki commented Oct 6, 2019

Got an idea for what the researchers should look for? Post a comment below.
Some ideas from me:

  • Does the community have a bias towards certain types of projects?
  • Does the community have a bias towards project leads with a large email/Twitter list, as opposed to the actual importance of the project?
  • Is there on-chain collusion?
  • Is there off-chain collusion?
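One hypothetical way to approach the audience-size question: collect each project lead's follower or mailing-list size (this is not in the dataset and would have to be gathered separately) and compute a Spearman rank correlation against funding received. A strong positive value would suggest funding tracks audience size, but on its own it cannot separate "large audience" from "important project"; some independent measure of importance would still be needed. The sketch below uses only the standard library and made-up numbers.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the tie group while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # 1-based average rank for this tie group
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Made-up example: audience size vs. USD received for four hypothetical grants
audience = [12000, 300, 4500, 800]
funding = [2500.0, 400.0, 900.0, 120.0]
print(round(spearman(audience, funding), 2))  # 0.8
```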

@owocki owocki changed the title Grants Round 3 CLR -- Data Analysis Bounty 48 HOUR CHALLENGE - Grants Round 3 CLR -- Data Analysis Bounty Oct 6, 2019
@mul1sh

mul1sh commented Oct 7, 2019

I'm working on this challenge; so far I've been able to scaffold the viz dashboard with basic stats.

I'll keep updating it over the next few hours and add tables and charts that will help researchers easily answer the questions above from @owocki, thanks 🙂

@owocki
Contributor Author

owocki commented Oct 7, 2019 via email

@gitcoinbot
Member

gitcoinbot commented Oct 8, 2019

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 2 weeks, 6 days from now.
Please review their action plans below:

1) jessemorningstar has started work.

I have spent the last 24 hours researching the problem, developing a prototype, and deploying a first iteration of a solution whose code I will share next.
2) adivyas99 has started work.

Starting work and understanding data
3) mul1sh has started work.

I've been developing this and want to submit now.
4) think-in-universe has started work.

I have done the analysis with Jupyter Notebook.

Learn more on the Gitcoin Issue Details page.

@adivyas99

Hi,
I have started work at 9am IST.

As you said, this community is currently a little biased towards certain kinds of projects.

That's why I didn't hear about this earlier (2 days ago) ;)

But now I have done almost half of the task.

Thanks, @owocki for providing this opportunity.

Let's connect :)
LinkedIn- https://www.linkedin.com/in/adityavyas99/

@gitcoinbot
Member

gitcoinbot commented Oct 8, 2019

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work for 7.0 ETH (1265.15 USD @ $180.74/ETH) has been submitted by:

  1. @adivyas99
  2. @mul1sh
  3. @think-in-universe

@owocki please take a look at the submitted work:


@think-in-universe

think-in-universe commented Oct 9, 2019

Hi @owocki, there are some (maybe) fun things we can find in the dataset, but since the dataset is anonymous, we cannot tell for sure why some patterns occur.

For example, one question described in my project: I want to understand why the accounts with profile fields 775fec778ed2672f511d864e139552a3690de36a93de3a8733773678 and ae03c652db8c8a17ea7a89c0593da5ed6c22598fa7a050210c5feb16 each contributed 5 USD in total, split evenly into 19 and 73 contributions respectively.

Is it possible anyone from Gitcoin team can share more info about the accounts, or figure out why that pattern happens?
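The even-split pattern described above can be flagged systematically. Below is a minimal standard-library sketch, assuming the contribution rows can be reduced to `(profile_hash, amount_usd)` pairs; the function name, the $5 target, and the 0.01 tolerance are illustrative choices, and a looser tolerance may be needed if the export rounds amounts to cents.

```python
import math
from collections import defaultdict


def even_split_accounts(rows, target_total=5.0, min_count=2, tol=0.01):
    """Flag profiles whose contributions are all equal and sum to ~target_total.

    rows: iterable of (profile_hash, amount_usd) pairs (assumed layout).
    Returns {profile_hash: number_of_contributions}.
    """
    by_profile = defaultdict(list)
    for profile, amount in rows:
        by_profile[profile].append(amount)

    flagged = {}
    for profile, amounts in by_profile.items():
        all_equal = all(math.isclose(a, amounts[0], abs_tol=tol) for a in amounts)
        hits_target = math.isclose(sum(amounts), target_total, abs_tol=tol)
        if len(amounts) >= min_count and all_equal and hits_target:
            flagged[profile] = len(amounts)
    return flagged
```

For example, feeding it nineteen rows of `("p1", 5 / 19)` plus a profile that gave $2 and $3 flags only `p1`, since the second profile's amounts are not equal.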

@owocki
Contributor Author

owocki commented Oct 11, 2019

@think-in-universe it's because we gave out $5 vouchers to a handful of people!

@owocki
Contributor Author

owocki commented Oct 11, 2019

Thanks for the submissions, everyone; working with the Gitcoin team on follow-ups this upcoming week.

@think-in-universe

@owocki thanks for the info. So these accounts are controlled by the Gitcoin team, right? I have a few more questions (umm... just curious about the patterns). Could I discuss them with someone from the Gitcoin team on Discord?

@frankchen07
Collaborator

frankchen07 commented Oct 11, 2019

@think-in-universe, heyo, frank from Gitcoin here!

interesting analysis! loving the IPython notebook (not the prettiest data science tool, but it definitely does the job). I had my IPython days too. Some comments below:

  • not sure what the pie chart is supposed to be telling me in this case

  • try not to use pie charts for distributions, because the human eye is poor at comparing areas; try stacked bar charts (better), or break out the significant categories and compare them with normal histograms (even better)

  • for the scatterplot, the dots aren't labeled, which makes it a little confusing (I believe they should be labeled)

  • generally I go for seaborn, or use R and ggplot (but R is annoying too, so tradeoffs)

  • plotly is another one that could be useful, but I think seaborn and clever usage of minimal graphs via matplotlib should be sufficient

  • I like how you broke it down by tags and by extracted title words - great contrasting analysis from multiple perspectives

  • another interesting metric we could've looked at is the ratio of r3 / total funds, as a measure of how popular a fund is during a CLR round - to some extent you showed that in the side by side histogram, but it wasn't displayed as a ratio

  • interesting analysis under "history", how are you calculating the "historical" amounts?

  • I think with the way Grants is set up, it's natural that there's less of a correlation for newer grants with less history. There's definitely an explanation for this; can you take an educated guess as to why? (hint: subscriptions)

  • what kind of collusion analysis would you run if you had the individual contribution data, including IP addresses and timestamps? what additional information would you like to see?

  • 5 USD for a total of 53 projects is also the result of our phantom funding, not collusion (something else we can discuss)
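The r3 / total ratio suggested in the list above is simple to compute once per-grant totals are available. A minimal sketch, assuming two hypothetical mappings from grant id to USD (Round 3 funding and all-time funding); a ratio near 1 means most of a grant's funding arrived during this CLR round.

```python
def round_share(round3_funds, total_funds):
    """Ratio of Round 3 funding to all-time funding per grant.

    Both arguments map grant id -> USD (assumed layout). Grants with a
    zero total are skipped to avoid division by zero.
    """
    return {
        grant: round3_funds.get(grant, 0.0) / total
        for grant, total in total_funds.items()
        if total > 0
    }


# Hypothetical grant ids and amounts, ranked by share raised in Round 3
ratios = round_share({"a": 300.0, "b": 50.0}, {"a": 400.0, "b": 1000.0, "c": 0.0})
for grant, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{grant}: {r:.2f}")
```

Printing the ratio rather than two side-by-side histograms makes the "popular during the round" signal directly comparable across grants of very different sizes.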

you can hit up fronk#2724 on discord if you want to chat more about this! I can add @owocki too

@frankchen07
Collaborator

@adivyas99 - I see your .ipynb notebook, but it's not loading?

In your executive summary, you gave us a detailed breakdown of some simple analytics on the data, but I didn't see any of the four main questions answered - did you leave it out of your executive summary?

some pointers below on those Qs:

The task for this bounty is to analyze an anonymous dataset of grants clr contributions and find:

  • interesting trends
  • evidence of collusion
  • beautiful datavizes

  1. Does the community have a bias towards certain types of projects?
  2. Does the community have a bias towards project leads with a large email/twitter list? Vs. actual importance of project.
  3. Is there on-chain collusion?
  4. Is there off-chain collusion?

@adivyas99

Sorry for the inconvenience. The Python file works correctly; it may just be a little too large to render, so please try downloading it and running it locally.

Hope this will work.
Thanks.

@frankchen07
Collaborator

@mul1sh

interesting dashboard - some additional followup and ways to supercharge the analysis:

  1. how else might you detect bias aside from a rank ordered amount of funding?

  2. how did you go about detecting collusion? can you show your work or walk through how you would do it?

  3. "key trend I noticed is that people are more likely to contribute to a project they know" - how did you come to this conclusion?

  4. How might you go about detecting off-chain collusion (perhaps using the encrypted IP addresses)?

  5. Owocki posted a comment above clarifying the types of questions we want answered: 48 HOUR CHALLENGE - Grants Round 3 CLR -- Data Analysis Bounty #40 (comment)

  6. Don't forget to share your code!
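For question 4, one possible starting point with the encrypted (hashed) IP addresses is to group distinct contributor accounts that share an IP hash and fund the same grant. This is a sketch under an assumed `(ip_hash, contributor, grant)` row layout. A shared IP is a weak signal on its own (offices, NAT, VPNs), so the output is a list of candidates for review, not proof of collusion.

```python
from collections import defaultdict


def shared_ip_clusters(rows, min_accounts=2):
    """Group accounts that share a hashed IP and fund the same grant.

    rows: iterable of (ip_hash, contributor, grant) tuples (assumed layout).
    Returns {(ip_hash, grant): sorted contributors} for groups with at
    least min_accounts distinct contributors.
    """
    groups = defaultdict(set)
    for ip_hash, contributor, grant in rows:
        groups[(ip_hash, grant)].add(contributor)
    return {
        key: sorted(accounts)
        for key, accounts in groups.items()
        if len(accounts) >= min_accounts
    }
```

Combining these clusters with the timing of each contribution would strengthen the signal considerably compared with either feature alone.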

@think-in-universe

wow, thanks for your detailed comments and questions Frank @frankchen07 😄

Yeah, I agree with your comments about the IPython notebook (Jupyter Notebook), though it is more interactive and productive when exploring a new dataset.

Yeah, I have quite a few more questions; let's talk on Discord to be more efficient.

@owocki
Contributor Author

owocki commented Oct 16, 2019

status update:

aiming to pay out end of week!

Still conferring with @frankchen07 on payouts. For the top submission and/or continuation of this work, we're definitely looking for insights (as opposed to just charts). The insights we're looking for are articulated here:

#40 (comment)

If anyone is interested in continuing by, say, adding a chain analysis component to look for on-chain collusion, or time-based collusion (e.g., did a bunch of contributions come in all at once for certain grants, indicating an orchestrated effort to get contributions?), let me know.
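The time-based check described above (many contributions landing on one grant at once) can be sketched with a sliding window over sorted timestamps. The `(grant, timestamp)` row layout, the 10-minute window, and the threshold of 5 are assumptions to tune against the real data. Bursts also happen legitimately, for example right after a tweet, so this is best combined with other signals.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def contribution_bursts(rows, window=timedelta(minutes=10), min_count=5):
    """Flag grants with at least min_count contributions inside one time window.

    rows: iterable of (grant, datetime) pairs (assumed layout).
    Returns {grant: (window_start, count)} for each grant's busiest window.
    """
    times = defaultdict(list)
    for grant, ts in rows:
        times[grant].append(ts)

    bursts = {}
    for grant, stamps in times.items():
        stamps.sort()
        start = 0
        best = (None, 0)
        # Sliding window: keep stamps[start..end] within `window` of each other
        for end, ts in enumerate(stamps):
            while ts - stamps[start] > window:
                start += 1
            count = end - start + 1
            if count > best[1]:
                best = (stamps[start], count)
        if best[1] >= min_count:
            bursts[grant] = best
    return bursts


# Hypothetical data: six contributions one minute apart vs. one hour apart
base = datetime(2019, 10, 1, 12, 0)
rows = [("g1", base + timedelta(minutes=i)) for i in range(6)]
rows += [("g2", base + timedelta(hours=i)) for i in range(6)]
print(contribution_bursts(rows))  # only g1 is flagged
```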

@think-in-universe

@owocki cool. I'm interested in resuming the analysis, as discussed with Frank (@frankchen07).

I think collusion analysis for orchestrated efforts to get contributions is definitely necessary, and it would be more effective if we had access to more granular datasets, as I discussed with Frank.

I agree that time-based collusion analysis would also be meaningful. I didn't include it in my previous analysis because the topic had already been explored in the tweets (https://twitter.com/gitcoin/status/1179365868628787200). Maybe we can add it in the consolidated version.

@owocki
Contributor Author

owocki commented Oct 16, 2019

kewl let me know what exports you need for the next iteration of this @think-in-universe and also what you think it should cost ETH-wise

@gitcoinbot
Member

⚡️ A tip worth 3.00000 ETH (529.5 USD @ $176.5/ETH) has been granted to @think-in-universe for this issue from @owocki. ⚡️

Nice work @think-in-universe! Your tip has automatically been deposited in the ETH address we have on file.

@gitcoinbot
Member

⚡️ A tip worth 1.00000 ETH (176.5 USD @ $176.5/ETH) has been granted to @mul1sh for this issue from @owocki. ⚡️

Nice work @mul1sh! Your tip has automatically been deposited in the ETH address we have on file.

@gitcoinbot
Member

⚡️ A tip worth 1.00000 ETH (176.5 USD @ $176.5/ETH) has been granted to @adivyas99 for this issue from @owocki. ⚡️

Nice work @adivyas99! Your tip has automatically been deposited in the ETH address we have on file.

@gitcoinbot
Member

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This Bounty has been completed.

Additional Tips for this Bounty:

  • owocki tipped 1.0000 ETH worth 173.65 USD to adivyas99.
  • owocki tipped 1.0000 ETH worth 173.65 USD to mul1sh.
  • owocki tipped 3.0000 ETH worth 520.96 USD to think-in-universe.

@mul1sh

mul1sh commented Oct 17, 2019

@owocki same here, I'm interested in continuing the analysis for the long term too :). Sorry my dashboard was a bit sketchy; I was trying to beat the deadline, but with more time I can and will come up with something much better 😄

@frankchen07 thanks for your feedback as well, I really appreciate it. To better answer those questions, I'll keep working on my dashboard and update my executive summary to reflect all this.

Also, is it required to do the analysis in a Jupyter notebook? I just noticed this repo is a Jupyter notebook repo.

@adivyas99

Thanks @owocki for providing this opportunity. To give some context: I work mainly on machine learning and data analysis, and this was my first time on Gitcoin.
Due to the time constraint, I was not able to get to know the dataset properly.
Like @mul1sh, I am also interested in continuing the analysis for the long term ;). Feel free to check out my LinkedIn and GitHub for my skills (not boasting :) ).
https://www.linkedin.com/in/adityavyas99/
https:/adivyas99

I'll be waiting for more such bounties.
Thanks!

@think-in-universe

think-in-universe commented Oct 17, 2019

@owocki we may need more anonymous datasets that show the profiles and behaviors of the contributors and grant administrators/owners across different platforms, such as Gitcoin, Twitter, and GitHub if possible, so we can examine more behaviors to detect collusion.

I want to explore this challenge just for fun if possible. But if we need to go with ETH, we can talk offline on Discord.

BTW, I have communicated with Frank on Discord for the next steps, and feel free to check the details on that channel.

@owocki
Contributor Author

owocki commented Oct 21, 2019

@think-in-universe kewl; will confer with @frankchen07, and if any other datasets are needed, write me up a quick ticket and I'll see what I can do

@owocki owocki closed this as completed Oct 21, 2019
6 participants