API for the lotto numbers of the German lottery (1955-2020)

Background

This repo provides the German lotto numbers from 1955 to today in a single file. Anyone interested in data analysis, or who just wants to “calculate” their chances of winning the lottery, is welcome to use the data.

Two JSON files are provided, Lottonumbers_complete.json and Lottonumbers_tidy_complete.json. Choose the one you can work with :-)
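For orientation, here is a minimal pandas sketch of how the tidy layout (one row per drawn number) can be collapsed into one row per drawing date. It assumes the columns date, variable and value and the labels "Lottozahl"/"Superzahl" that the R code below uses, and it works on a few made-up rows instead of the real file. The helper name draws_per_date is hypothetical.

```python
import pandas as pd

# Made-up rows in the assumed tidy layout (not real draws):
# one row per drawn number, labelled "Lottozahl" or "Superzahl".
tidy = pd.DataFrame({
    "date": ["04.01.2020"] * 7 + ["08.01.2020"] * 7,
    "variable": (["Lottozahl"] * 6 + ["Superzahl"]) * 2,
    "value": [31, 3, 45, 14, 40, 22, 7,
              9, 49, 1, 36, 17, 28, 2],
})


def draws_per_date(tidy: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per date with sorted numbers and Superzahl."""
    # Sort first so that each collected list comes out in ascending order
    numbers = (
        tidy[tidy["variable"] == "Lottozahl"]
        .sort_values("value")
        .groupby("date")["value"]
        .apply(list)
        .rename("numbers")
    )
    superzahl = (
        tidy[tidy["variable"] == "Superzahl"]
        .set_index("date")["value"]
        .rename("superzahl")
    )
    # Align both parts on the shared date index
    return pd.concat([numbers, superzahl], axis=1)


print(draws_per_date(tidy))
```

The long layout is convenient for counting and grouping; the wide layout is easier to read when you want to look at individual draws.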

Data analysis examples

The data are provided as JSON, which can be read by any modern programming language. Two examples follow, one in R and one in Python.

R

In R, the tidyverse packages make it quick and easy to analyse the data.

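In the next chunk, all data are read, filtered to keep only the lotto numbers (variable "Lottozahl"), grouped by value, and the appearances of each value are counted. Lotto number 6 turns out to be the most frequent number. Because top_n() keeps ties, six rows are returned: numbers 26 and 31 share fifth place.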

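library(tidyverse)   # dplyr, ggplot2 and friends
library(jsonlite)    # fromJSON()
library(lubridate)   # dmy(), year()

data <- fromJSON("https://johannesfriedrich.github.io/LottoNumberArchive/Lottonumbers_tidy_complete.json")

# Count how often each Lottozahl (main number) has been drawn
lottonumbers_count <- data$data %>% 
  filter(variable == "Lottozahl") %>% 
  group_by(value) %>% 
  summarise(count = n())

# Five most frequent numbers (top_n() keeps ties)
lottonumbers_count %>% 
  arrange(desc(count)) %>% 
  top_n(5)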
## Selecting by count
## # A tibble: 6 x 2
##   value count
##   <int> <int>
## 1     6   593
## 2    32   575
## 3    49   573
## 4    38   568
## 5    26   567
## 6    31   567

Next, we plot how often each number from 1 to 49 has been drawn.

library(ggplot2)

ggplot(lottonumbers_count, aes(value, count)) +
  geom_bar(stat = "identity") +
  labs(x = "Lottonumber", title = "Lottonumbers in Germany since 1955")
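The count table from the R chunk only contains numbers that were actually drawn. For a short period (a single year, say) some numbers may not appear at all and would then be missing from the plot. A minimal pandas sketch that reindexes to the full range 1 to 49, so missing numbers show up with a count of 0; it assumes the variable and value columns from the R code, uses made-up rows, and the helper name counts_1_to_49 is hypothetical.

```python
import pandas as pd


def counts_1_to_49(tidy: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: draw count for every Lottozahl 1-49, zeros included."""
    lotto = tidy.loc[tidy["variable"] == "Lottozahl", "value"]
    # value_counts() only lists values that occur; reindex adds the rest as 0
    return lotto.value_counts().reindex(range(1, 50), fill_value=0)


# Made-up rows; the Superzahl row must not be counted
tidy = pd.DataFrame({
    "variable": ["Lottozahl"] * 4 + ["Superzahl"],
    "value": [6, 6, 32, 49, 6],
})
counts = counts_1_to_49(tidy)
print(counts.loc[[6, 32, 49]])
```

If matplotlib is installed, counts.plot.bar() draws a bar chart comparable to the ggplot2 one.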

Every Wednesday and Saturday the German lottery also draws the Superzahl, a single digit from 0 to 9. The following graph shows how often each Superzahl has been drawn since 2001, split by drawing day.

# Superzahl draws since 2001, counted per value and drawing day
superzahl <- data$data %>% 
  filter(variable == "Superzahl") %>% 
  mutate(date = dmy(date),
         Day = weekdays(date),   # day names depend on the system locale
         year = year(date)) %>% 
  filter(year >= 2001) %>% 
  group_by(value, Day) %>% 
  summarise(count = n())

ggplot(superzahl, aes(value, count, fill = Day)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_x_continuous(breaks = 0:9) +
  labs(x = "Superzahl", title = "Superzahl since 2001")
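A pandas counterpart of the Superzahl chunk, as a sketch. It assumes dates stored as dd.mm.yyyy strings (dmy() in the R code only tells us the day-month-year order, so the "." separator is a guess), the same variable/value columns, and made-up rows. Unlike R's weekdays(), pandas' dt.day_name() returns English day names by default regardless of the system locale. The helper name superzahl_by_weekday is hypothetical.

```python
import pandas as pd


def superzahl_by_weekday(tidy: pd.DataFrame, since: int = 2001) -> pd.DataFrame:
    """Hypothetical helper: Superzahl counts (rows 0-9) per drawing weekday."""
    sz = tidy[tidy["variable"] == "Superzahl"].copy()
    # Assumed date format dd.mm.yyyy (day-month-year, like lubridate::dmy)
    sz["date"] = pd.to_datetime(sz["date"], format="%d.%m.%Y")
    sz = sz[sz["date"].dt.year >= since]
    table = pd.crosstab(sz["value"], sz["date"].dt.day_name())
    # Make sure every digit 0-9 has a row, even if it was never drawn
    return table.reindex(range(10), fill_value=0)


# Made-up rows: 03.01.2001 and 10.01.2001 are Wednesdays, 06.01.2001 a Saturday;
# the 2000 row is dropped by the year filter
tidy = pd.DataFrame({
    "date": ["03.01.2001", "06.01.2001", "10.01.2001", "30.12.2000"],
    "variable": ["Superzahl"] * 4,
    "value": [3, 3, 5, 3],
})
print(superzahl_by_weekday(tidy))
```

With matplotlib installed, calling .plot.bar() on the returned table gives grouped bars per digit, similar to the dodged ggplot2 version.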

Which numbers were drawn most often in 2019?

data$data %>% 
  filter(variable == "Lottozahl") %>% 
  mutate(date = dmy(date),
         year = year(date)) %>% 
  filter(year == 2019) %>% 
  group_by(value) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count)) %>% 
  top_n(5)
## Selecting by count
## # A tibble: 7 x 2
##   value count
##   <int> <int>
## 1    42    22
## 2    29    20
## 3    36    18
## 4    11    17
## 5    19    17
## 6    31    17
## 7    47    17
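The same yearly question in pandas, as a minimal sketch with made-up rows. It assumes the variable/value/date columns of the R chunk and dd.mm.yyyy dates; the helper name lottozahl_counts_in_year is hypothetical. It returns all counts for the year sorted by frequency, so the top entries can then be taken as needed.

```python
import pandas as pd


def lottozahl_counts_in_year(tidy: pd.DataFrame, year: int) -> pd.Series:
    """Hypothetical helper: how often each Lottozahl was drawn in one year."""
    lotto = tidy[tidy["variable"] == "Lottozahl"]
    # Assumed dd.mm.yyyy dates, mirroring dmy() + year() in the R chunk
    years = pd.to_datetime(lotto["date"], format="%d.%m.%Y").dt.year
    # value_counts() sorts by count, most frequent first
    return lotto.loc[years == year, "value"].value_counts()


# Made-up rows spanning two years
tidy = pd.DataFrame({
    "date": ["05.01.2019"] * 3 + ["09.01.2019"] * 3 + ["28.12.2018"] * 3,
    "variable": ["Lottozahl"] * 9,
    "value": [42, 29, 11, 42, 36, 11, 1, 2, 3],
})
print(lottozahl_counts_in_year(tidy, 2019))
```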

Python

In Python, the pandas library is very handy for data analysis. The following code repeats the first R analysis above: counting how often each lotto number has been drawn.

import pandas as pd

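# Read the tidy JSON file; orient='split' tells pandas how the JSON is laid out
data = pd.read_json("https://johannesfriedrich.github.io/LottoNumberArchive/Lottonumbers_tidy_complete.json", orient='split')

# Keep only the main numbers, count each value, most frequent first
res = (
    data[data.variable == "Lottozahl"]
    .groupby("value")["value"]
    .count()
    .sort_values(ascending=False)
)

print(res.head(5))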
## value
## 6     593
## 32    575
## 49    573
## 38    568
## 26    567
## Name: value, dtype: int64
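The pandas result lists five rows, while top_n(5) in R returned six: head(5) simply cuts after the fifth row and silently drops number 31, which ties with 26 at 567 draws. A small sketch (hypothetical helper top_n_with_ties, made-up data) that keeps ties at the cut-off using Series.nlargest(n, keep="all"):

```python
import pandas as pd


def top_n_with_ties(values: pd.Series, n: int = 5) -> pd.Series:
    """Hypothetical helper: the n most frequent values, keeping ties at rank n."""
    # keep="all" retains every value whose count equals the n-th largest count
    return values.value_counts().nlargest(n, keep="all")


# Made-up draws: numbers 5 and 6 tie for fifth place with two draws each
values = pd.Series([1] * 6 + [2] * 5 + [3] * 4 + [4] * 3 + [5] * 2 + [6] * 2 + [7])
print(top_n_with_ties(values))
```

Applied to data[data.variable == "Lottozahl"]["value"], this should return six rows, matching the R output.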
