JohannesFriedrich/LottoNumberArchive

API for the lotto numbers of the German lottery (1955-2024)

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Background

This repo provides the German lotto numbers from 1955 to today in a single file. Anyone interested in data analysis, or who just wants to “calculate” their chances of winning the lottery, is invited to use the data.

Two JSON files are provided; choose the one you can work with :-)

Data analysis examples

The data is provided as JSON, which can be read from all modern programming languages. Two examples follow, one in R and one in Python.

R

The tidyverse package makes it quick to analyse the data in R.

In the next chunk, the data are read, filtered to keep only the lotto numbers, grouped by value, and counted by number of appearances. We can see that 6 is the most frequently drawn lotto number.

library(tidyverse)
library(jsonlite)

data <- fromJSON("https://johannesfriedrich.github.io/LottoNumberArchive/Lottonumbers_tidy_complete.json")

lottonumbers_count <- data %>% 
  filter(variable == "Lottozahl") %>% 
  group_by(value) %>% 
  summarise(count = n())
lottonumbers_count %>% 
  arrange(desc(count)) %>% 
  top_n(5)
## Selecting by count
## # A tibble: 5 × 2
##   value count
##   <int> <int>
## 1     6   652
## 2    49   641
## 3    32   626
## 4    31   625
## 5    33   625

Next, we plot how often each number from 1 to 49 has been drawn.

library(ggplot2)

ggplot(lottonumbers_count, aes(value, count)) +
  geom_bar(stat = "identity") +
  labs(x = "Lottonumber", title = "Lottonumbers in Germany since 1955")

Since 2001, the German lottery has included an additional number, called “Zusatzzahl” here, which is drawn every Wednesday and Saturday. The following graph shows its distribution by weekday; note that the code selects it through the variable `Superzahl`.

superzahl <- data %>% 
  filter(variable == "Superzahl") %>% 
  mutate(date = dmy(date),
         Day = weekdays(date),
         year = year(date)) %>% 
  filter(year >= 2001) %>% 
  group_by(value, Day) %>% 
  summarise(count = n())
## `summarise()` has grouped output by 'value'. You can override using the
## `.groups` argument.
ggplot(superzahl, aes(value, count, fill = Day)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_x_continuous(breaks = c(0:9)) +
  labs(x = "Zusatzzahl", title = "Zusatzzahl since 2001")

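Which numbers were drawn most often in 2023? Because `slice_max()` keeps ties by default, the result can contain more than the five rows requested.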

data %>% 
  filter(variable == "Lottozahl") %>% 
  mutate(date = dmy(date),
         year = year(date)) %>% 
  filter(year == 2023) %>% 
  group_by(value) %>% 
  summarise(count = n()) %>% 
  slice_max(count, n = 5)
## # A tibble: 8 × 2
##   value count
##   <int> <int>
## 1    19    19
## 2    22    18
## 3    33    18
## 4    25    17
## 5    23    16
## 6    28    16
## 7    42    16
## 8    43    16

Python

In Python, the pandas module is very handy for data analysis. The following repeats the first analysis shown above.

import pandas as pd

data = pd.read_json("https://johannesfriedrich.github.io/LottoNumberArchive/Lottonumbers_tidy_complete.json", convert_dates = False)

res = data[data.variable == "Lottozahl"].groupby("value")["value"].count().sort_values(ascending = False)

print(res.head(5))
## value
## 6     652
## 49    641
## 32    626
## 33    625
## 31    625
## Name: value, dtype: int64
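
The two date-based R analyses above (the most frequent numbers in 2023 and the Superzahl counts by weekday) could be approached in pandas along the following lines. This is only a sketch: the helper names `with_parsed_dates`, `top_numbers_in_year` and `superzahl_by_weekday` are made up for illustration, and it assumes that the `date` column holds day-first strings, as the use of `dmy()` in the R code suggests.

```python
import pandas as pd


def with_parsed_dates(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` whose `date` column is parsed to datetimes."""
    out = data.copy()
    # Assumption: dates are day-first strings (the R code parses them with dmy()).
    out["date"] = pd.to_datetime(out["date"], dayfirst=True)
    return out


def top_numbers_in_year(data: pd.DataFrame, year: int, n: int = 5) -> pd.Series:
    """Count how often each lotto number was drawn in `year`; return the top n."""
    parsed = with_parsed_dates(data)
    mask = (parsed["variable"] == "Lottozahl") & (parsed["date"].dt.year == year)
    counts = parsed.loc[mask, "value"].value_counts()
    # keep="all" retains ties at the cutoff, similar to slice_max() in R.
    return counts.nlargest(n, keep="all")


def superzahl_by_weekday(data: pd.DataFrame, since: int = 2001) -> pd.Series:
    """Count Superzahl values per weekday for all draws from `since` onward."""
    parsed = with_parsed_dates(data)
    mask = (parsed["variable"] == "Superzahl") & (parsed["date"].dt.year >= since)
    # day_name() returns English weekday names by default.
    selected = parsed.loc[mask].assign(Day=lambda d: d["date"].dt.day_name())
    return selected.groupby(["value", "Day"]).size()
```

With `data` loaded as shown earlier, `top_numbers_in_year(data, 2023)` should correspond to the `slice_max()` result in the R section, and `superzahl_by_weekday(data)` returns one count per value and weekday, which can then be plotted.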