Metagoofil

Metagoofil extracts document metadata: information embedded in files such as the owner, date and time of creation/modification, network location, and geolocation. It runs a Google search for documents hosted on a target website, downloads them, and extracts usernames, software versions, server names, and workstation names.
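
For reference, OOXML documents (.docx/.xlsx/.pptx) carry these fields in docProps/core.xml inside the zip container. A minimal stdlib sketch (separate from metagoofil, using a hypothetical report.docx) that dumps them:

import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OOXML core properties (docProps/core.xml)
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def core_properties(path):
    """Return author/modification metadata embedded in a .docx/.xlsx/.pptx file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
        "created": root.findtext("dcterms:created", default="", namespaces=NS),
        "modified": root.findtext("dcterms:modified", default="", namespaces=NS),
    }

print(core_properties("report.docx"))  # usernames typically show up in creator/last_modified_by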

Installation

apt install metagoofil

Add proxy support (CNTLM)

❗ Not working yet

nano /usr/share/metagoofil/metagoofil.py

class DownloadWorker(threading.Thread):
...
    def run(self):
...
                # Route both HTTP and HTTPS downloads through the local CNTLM proxy
                proxy = {"http": "http://127.0.0.1:3128", "https": "http://127.0.0.1:3128"}
                response = requests.get(url, headers=headers, verify=False, timeout=mg.url_timeout, stream=True, proxies=proxy)
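
Before patching the script, it may help to confirm the proxy itself forwards traffic. A quick sanity check with requests, assuming CNTLM listens on 127.0.0.1:3128 as above:

import requests

# Mirror the proxy settings used in the patch; verify=False matches metagoofil's call
proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:3128",
}

resp = requests.get("https://www.google.com", proxies=proxies, timeout=15, verify=False)
print(resp.status_code)  # 200 means the CNTLM proxy is forwarding HTTPS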

Usage

Download .doc files into dir (-w) and save the discovered links to html_links_<TIMESTAMP>.txt (-f)

metagoofil -d goodhacking.org -t doc -o dir -f -w

All file types

metagoofil -d goodhacking.org -t ALL

Help

metagoofil -h
usage: metagoofil.py [-h] -d DOMAIN [-e DELAY] [-f] [-i URL_TIMEOUT] [-l SEARCH_MAX] [-n DOWNLOAD_FILE_LIMIT] [-o SAVE_DIRECTORY]
                     [-r NUMBER_OF_THREADS] -t FILE_TYPES [-u [USER_AGENT]] [-w]

Metagoofil - Search and download specific filetypes

optional arguments:
  -h, --help            show this help message and exit
  -d DOMAIN             Domain to search.
  -e DELAY              Delay (in seconds) between searches. If it's too small Google may block your IP, too big and your search may take a while.
                        DEFAULT: 30.0
  -f                    Save the html links to html_links_<TIMESTAMP>.txt file.
  -i URL_TIMEOUT        Number of seconds to wait before timeout for unreachable/stale pages. DEFAULT: 15
  -l SEARCH_MAX         Maximum results to search. DEFAULT: 100
  -n DOWNLOAD_FILE_LIMIT
                        Maximum number of files to download per filetype. DEFAULT: 100
  -o SAVE_DIRECTORY     Directory to save downloaded files. DEFAULT is cwd, "."
  -r NUMBER_OF_THREADS  Number of search threads. DEFAULT: 8
  -t FILE_TYPES         file_types to download (pdf,doc,xls,ppt,odp,ods,docx,xlsx,pptx). To search all 17,576 three-letter file extensions, type
                        "ALL"
  -u [USER_AGENT]       User-Agent for file retrieval against -d domain.
                        no -u = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
                        -u = Randomize User-Agent
                        -u "My custom user agent 2.0" = Your customized User-Agent
  -w                    Download the files, instead of just viewing search results.
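
Extract metadata from downloaded files

Once files have been fetched with -w into the -o directory, the embedded usernames can be harvested offline. A hedged sketch (the "dir" path matches the -o example above; only OOXML formats are handled):

import os
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

users = set()
for dirpath, _, filenames in os.walk("dir"):
    for name in filenames:
        if not name.lower().endswith((".docx", ".xlsx", ".pptx")):
            continue
        try:
            with zipfile.ZipFile(os.path.join(dirpath, name)) as zf:
                root = ET.fromstring(zf.read("docProps/core.xml"))
        except (zipfile.BadZipFile, KeyError):
            continue  # skip corrupt archives or files without core properties
        for tag in ("dc:creator", "cp:lastModifiedBy"):
            value = root.findtext(tag, default="", namespaces=NS)
            if value:
                users.add(value)

print("\n".join(sorted(users)))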