Documents metadata. Information appended to files (e.g. owner, date & time of creation/modification, network location, geolocation). Does a google search of documents from a website, and downloads them. Extracts username, software version, server names, workstation names.
Installation
apt install metagoofil
Add proxy support (CNTLM)
Not working yet
nano /usr/share/metagoofil/metagoofil.py
class DownloadWorker(threading.Thread):
...
def run(self):
...
proxy = { "http" : "http://127.0.0.1:3128" }
response = requests.get(url, headers=headers, verify=False, timeout=mg.url_timeout, stream=True, proxies=proxy)
Usage
metagoofil -d goodhacking.org -t doc -o dir -f results.txt
All file types
metagoofil -d goodhacking.org -t ALL
Help
metagoofil -h
usage: metagoofil.py [-h] -d DOMAIN [-e DELAY] [-f] [-i URL_TIMEOUT] [-l SEARCH_MAX] [-n DOWNLOAD_FILE_LIMIT] [-o SAVE_DIRECTORY]
[-r NUMBER_OF_THREADS] -t FILE_TYPES [-u [USER_AGENT]] [-w]
Metagoofil - Search and download specific filetypes
optional arguments:
-h, --help show this help message and exit
-d DOMAIN Domain to search.
-e DELAY Delay (in seconds) between searches. If it's too small Google may block your IP, too big and your search may take a while.
DEFAULT: 30.0
-f Save the html links to html_links_<TIMESTAMP>.txt file.
-i URL_TIMEOUT Number of seconds to wait before timeout for unreachable/stale pages. DEFAULT: 15
-l SEARCH_MAX Maximum results to search. DEFAULT: 100
-n DOWNLOAD_FILE_LIMIT
Maximum number of files to download per filetype. DEFAULT: 100
-o SAVE_DIRECTORY Directory to save downloaded files. DEFAULT is cwd, "."
-r NUMBER_OF_THREADS Number of search threads. DEFAULT: 8
-t FILE_TYPES file_types to download (pdf,doc,xls,ppt,odp,ods,docx,xlsx,pptx). To search all 17,576 three-letter file extensions, type
"ALL"
-u [USER_AGENT] User-Agent for file retrieval against -d domain.
no -u = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
-u = Randomize User-Agent
-u "My custom user agent 2.0" = Your customized User-Agent
-w Download the files, instead of just viewing search results.