Skip to content

Extract & Plot Git Repo Clones/Views Beyond 14-day Insights

License

Notifications You must be signed in to change notification settings

demidenm/git_traffik

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

git traffik

Situation: GitHub Insights → Traffic data only offers 14 days of visit/clone data.
Task: A workflow is needed to access the repository every 10-14 days to pull the data and store it, retaining and expanding visitor/clone history.
Action: git_traffic is a simple workflow that can be integrated directly within a repository, run in a separate repository to track another repo, or run locally.
Result: Historical data >14 days in .csv format and visualized automatically for any repo.

How Does it Work?

1. Running it via cloned git_traffik repo.

.github/workflows/repo.yaml:
Includes fields for repository name [REPO], repository owner [OWNER], and the API key, personal token.
The token provides read/write/pull/push access to the repository [MY_ACCESS_TOKEN]. The token is stored as a SECRET KEY for security.

To run, clone this repository and update the .github/workflows/repo.yaml file with your information, then monitor the actions.

repo info location

You can, at any time, manually trigger the workflow.

Workflow actions example

You can also modify the cron schedule.

cron schedule

When a workflow is completed, it will indicate this with a green checkmark. You can review the steps within it for specific details. Set up some manual and automated (timed) tests to ensure everything is working.

Workflow complete example

The results will then be populated in the ./git_traffik/output folder for the given repo in .csv and .png formats.

results output example

You can download and/or view the compiled data in the .csv file or as an image:

Example PyReliMRI Traffic Data Plot

Note:

  1. For the code to work, the token should have repository privileges.
  2. For the code to run, ensure the repo where actions are being performed (e.g., git_traffik) has Settings → Actions → General → Workflow Permissions set to Read and write permissions.

2. Running Git Traffik within your package/software repo with & without tokens

To run git_traffik within your package repo, copy the git_traffik folder to the root directory and place the .github/workflows/repo.yaml file in your .github/workflows folder. As with #1, update the repo details in the .yaml file and insert the two secret keys with your personal token.

repo info location
git token info

Skip the secret key! When the workflow & code is copied directly within your Github repository, you can skip the creation of the secret key by adding {{ secrets.GITHUB_TOKEN }}. In others works, in repo.yaml: MY_ACCESS_TOKEN: ${{ secrets.REPO_A_ACCESS_TOKEN }} --> MY_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}

For more details, see automatic authentication.

Note:

  1. For the code to work, the token should have repository privileges.
  2. For the code to run, ensure the repo where actions are being performed (e.g., <repo_name>) has Settings → Actions → General → Workflow Permissions set to Read and write permissions.

3. Running Git Traffik locally on your machine

To run it locally, you will only need the ./git_traffik/repo_check_traffic.py script. Since it is running locally, you can avoid secret keys, but you will still need a personal token with repo permissions. Update the owner, repo, and token information with your own:

local run updates

Then, either run the Python script to generate the data or write a bash script to call the Python code. If your machine is on most days of the week, you can set a schedule using crontab to run it on a set schedule. Test to ensure it works.

More Specific Details of the Code

.github/workflows/repo.yaml:

This YAML file defines a GitHub Actions workflow named "Repo Data & Figures" that performs the following tasks:

  • Schedule/Trigger: Runs on the 1st, 11th, and 21st of every month at midnight (UTC) & can be triggered manually. It uses cron syntax.
  • Job: update-data runs on Ubuntu and includes these steps:
    • Checkout Code: Retrieves the repository code within git_traffik.
    • Setup Python: Configures Python 3.9.
    • Install Dependencies: Installs dependencies in ./git_traffik/repo_check_traffic.py, such as requests, pandas, matplotlib, and seaborn.
    • Run Script: Gathers data and generates figures:
      • Setup: Configures owner, repo name, and personal access token based on the repo.yaml file.
      • Data Retrieval: Fetches views and clones data from the GitHub API. Converts data to DataFrames and merges clones/views data. Keeps only unique dates and excludes dates when both clones/views are 0. Ensures unique timestamps and fills missing values.
      • CSV Handling: Updates the existing .csv file if present; otherwise, creates a new .csv file with the traffic data.
      • Plotting: Generates plots for views and clones over time in a two-panel figure.
      • Saving: Saves the plots as .csv and PNG files.
    • List Files: Displays the contents generated in the output directory.
    • Upload Output Files: Uploads output files from the output directory as artifacts (saved as a .zip for each workflow).
    • Configure Git: Configures Git using a generic username and email.
    • Check Git Status: Checks the repository for changes (if hashes are identical, no changes).
    • Add Files: Stages output files for commit to the ./output/ directory.
    • Commit Files: Commits files with a message if any changes exist.

This workflow ensures that data and figures are updated regularly and consistently in the repository.

Example Using PyReliMRI Package

I created a small package that I wanted to observe the fluctuations in usage. It helps me determine whether people are using it and if I should consider maintaining and expanding it. Unfortunately, I discovered very quickly that the first 4-5 months of data were lost. I needed something more consistent.

Essentials to Update in repo.yaml

These are called as variables into the .py code for use with the API:

  • OWNER: Update the GitHub repository owner to your name or whoever has access and has granted you repo privileges.
  • REPO: Update the repository name (in my case, it is PyReliMRI).
  • MY_ACCESS_TOKEN: This is the token to access the data, and it needs to be private. In the repo_trafficplots repo, go to Settings → Secrets & Variables → Actions and create a New Repository Secret with your Personal Token. You can create one for yourself by following these instructions.

Once this is set up, the Actions are triggered via the event trigger (cron details). You can review all runs and the associated logs. When the figures are created, they are updated in ./git_traffik/output/. The figure below is compiled based on the running data.

About

Extract & Plot Git Repo Clones/Views Beyond 14-day Insights

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages