Script: [[Scripts/Ciberseguridad/dirscan|dirscan]]
Each prompt produced a response that I used to mix and match parts into the script I ended up building. ChatGPT does not deliver a well-made script on the first try.
Let's make a Bash script for my cybersecurity lab using the appropriate packages:
- Scan the content of a page looking for links found both in the HTML and in the CSS.
- For those links, do the same. It will be a recursive search.
- Output the links and files found.
The syntax will be folderscan.sh <IP> and you will write the comments in English.
Make it so the verbose output is colorful. And also, please make it so the links in the output do not repeat.
Now avoid scanning external websites.
./folderscan.sh: line 22: warning: command substitution: ignored null byte in input
Make it so this does not display.
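For context, the warning comes from Bash dropping null bytes inside a `$(...)` command substitution; the straightforward fix, which the assembled script further down ends up using, is to strip them from the curl output before the substitution happens. A minimal sketch (variable names are placeholders):

```bash
# Strip null bytes from the page before Bash captures it, so the
# "command substitution: ignored null byte in input" warning never appears.
content=$(curl -s "$url" | tr -d '\0')
```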
Okay, now please make another script to process output_links.txt and tell me all the directories found. For example, given this output_links.txt:
http://sea.htb/
http://sea.htb/home
http://sea.htb/how-to-participate
http://sea.htb/contact.php
http://sea.htb/themes/bike/css/style.css
http://sea.htb/themes/bike/css/style.css/../img/1.png
http://sea.htb/themes/bike/img/velik71-new-logotip.png
http://sea.htb/themes/bike/css/style.css/../img/2.png
http://sea.htb/themes/bike/css/style.css/../img/3.png
http://sea.htb/themes/bike/css/style.css/../img/4.png
http://sea.htb/themes/bike/css/style.css/../img/5.png
http://sea.htb/themes/bike/css/style.css/../img/6.png
http://sea.htb/themes/bike/css/style.css/../img/bike.png
I'd like the script to tell me:
Folders found:
http://sea.htb/themes
http://sea.htb/themes/bike
http://sea.htb/themes/bike/img
[...]
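For reference, the core of that request is a one-liner: strip the last path segment of every link and de-duplicate. A minimal sketch, assuming output_links.txt holds one link per line (it still emits the base domain and the unresolved style.css/../img entries, which the next prompts deal with):

```bash
#!/bin/bash
# Print the parent folder of every link in output_links.txt, without duplicates.
echo "Folders found:"
sed 's|/[^/]*$||' output_links.txt | sort -u
```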
Edit the script to avoid listing folders like "http:" and "http://sea.htb". And make the output more colorful.
I'd like to get a recursive output, where instead of showing:
Folders found:
http://sea.htb/themes/bike/css
http://sea.htb/themes/bike/css/style.css/../img
http://sea.htb/themes/bike/img
it shows:
Folders found:
http://sea.htb/themes/
http://sea.htb/themes/bike/
http://sea.htb/themes/bike/css/
http://sea.htb/themes/bike/img
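A sketch of what that hierarchical listing could look like, assuming output_links.txt as input: it walks every directory level of each link, skips the scheme and the bare base domain, adds a trailing slash, and avoids duplicates. It does not resolve "../" segments, so the style.css/../img entries would still need extra normalisation:

```bash
#!/bin/bash
# Hypothetical standalone helper: expand each link in output_links.txt into
# every level of its directory hierarchy, with trailing slashes, no duplicates.
declare -A seen

echo "Folders found:"
while IFS= read -r url; do
    base=$(grep -oP '^https?://[^/]+' <<< "$url") || continue   # scheme + host
    path="${url#"$base"}"          # e.g. /themes/bike/css/style.css
    path="${path%/*}"              # drop the final segment (file name or empty)
    dir="$base"
    IFS='/' read -ra parts <<< "${path#/}"
    for part in "${parts[@]}"; do
        [[ -z "$part" ]] && continue
        dir="$dir/$part"
        if [[ -z "${seen[$dir]}" ]]; then
            seen[$dir]=1
            echo "$dir/"           # folder level, trailing slash included
        fi
    done
done < output_links.txt
```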
Let's make a Bash script for my cybersecurity lab using the appropriate packages:
First, you will make a scan() function to scan a website. Website is provided at $1. scan() will look for any link found both in the HTML and the CSS. External links are discarded, so it will only be a "hit" if scan() finds an internal link or a link that has the base domain/IP.
For each found link, do the same. It will be a recursive search.
I want the script to be verbose. But do not echo the links skipped, just the found.
Also, make a colorful output for all links and files found:
- Links will be in blue
- Folders will be in green
- Verbose output will be in light grey.
Make sure that the links do not repeat. Each link found will be unique, so if there is a duplicate link, skip it.
Filter out null bytes from the content.
And save everything to $OUTPUT_FILE, that will be $(date +%s.txt)
And, later, make a folders() function that processes $OUTPUT_FILE and gives me a hierarchical view of all links and files found, listing each level of the directory hierarchy of each link, skipping duplicates. The output will be a sorted hierarchical directory tree that will include trailing slashes.
Skip the base domain and "http:".
Let's make a Bash script for my cybersecurity lab using the appropriate packages:
## First part of the script:
- First, you will make a scan() function to scan a website. Website is provided at $1. scan() will look for any link found both in the HTML and the CSS.
- External links are discarded, so it will only be a "hit" if scan() finds an internal link or a link that has the base domain/IP.
- Skip external links. Just count links that are internal from the base domain. Example: http://example.org/themes/style.css will count. You will skip links that do not start with "example.org", for example.
- I will talk about counted links like found links. Found links will be unique and must not repeat. So if you make a hit but you already found that link, just skip it.
- It will be a recursive search, so for each found link you will look into the parent folder if it is a file, and also the parent folder of the parent folder if there is any, and so on. For example, if the script found http://example.org/themes/img/4.jpg, you will scan http://example.org/themes and http://example.org/themes/img. (A sketch of this idea follows after this prompt.)
- I want the script to be verbose. But do not echo the links skipped, just the found.
- Filter out null bytes from the content.
- Save everything to $OUTPUT_FILE, that will be $(date +%s.txt). Important to make the variable $OUTPUT_FILE at the start of the script and use that variable, so the filename is persistent.
## Second part of the script:
- You will make a folders() function to output all folders found in the scan. That means that:
- If the script found http://example.org/themes/img/4.jpg, you will output http://example.org/themes and http://example.org/themes/img.
- You will avoid duplicates. If the folder you processed is a duplicate, just skip it.
- Skip the base domain and "http:".
- Output will be like this:
```
Folders found:
http://example.org/themes
http://example.org/themes/img
```
- The script will be as verbose as the first part.
- The output will be sorted alphabetically.
Make a colorful output for all links and files found:
- Links will be in blue
- Folders will be in green
- Verbose output will be in gray.
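A hedged sketch of the "also scan every parent folder" item from the list above, assuming a scan() function like the one in the assembled script below that already skips duplicates; scan_ancestors() and its internals are hypothetical names, not part of the generated script:

```bash
# Hypothetical helper: given a found file such as
# http://example.org/themes/img/4.jpg, also queue every ancestor folder
# (http://example.org/themes/img, then http://example.org/themes) for scanning.
scan_ancestors() {
    local dir=$1
    while [[ "$dir" == "$DOMAIN"/* ]]; do   # stay inside the base domain
        dir="${dir%/*}"                     # drop the last path segment
        [[ "$dir" == "$DOMAIN" ]] && break  # the root is already being scanned
        scan "$dir"                         # scan() is assumed to skip duplicates
    done
}
```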
- *After collecting the parts of the scripts from ChatGPT's replies that work best for what I ask:*
```bash
Fix this script:
#!/bin/bash

# Color definitions
BLUE='\033[34m'
GREEN='\033[32m'
GRAY='\033[90m'
NC='\033[0m' # No Color

# Output file variable
OUTPUT_FILE="$(date +%s).txt"

# Variables
IP=$1
DOMAIN="http://$IP"
SCANNED_URLS=()
UNIQUE_LINKS=()

# Function to extract and scan links from a given URL
scan() {
    local url=$1

    # Add URL to scanned list
    SCANNED_URLS+=("$url")

    # Fetch the content of the URL and remove null bytes
    content=$(curl -s "$url" | tr -d '\0')

    # Extract all URLs from the HTML content (href and src attributes)
    links=$(echo "$content" | grep -oP '(?<=href=")[^"]+|(?<=src=")[^"]+')

    # Extract URLs from CSS content (url() references)
    css_links=$(echo "$content" | grep -oP '(?<=url\().+?(?=\))')

    # Combine links
    all_links=$(echo -e "$links\n$css_links" | sort -u)

    # Process each link found
    for link in $all_links; do
        # If the link is relative, convert it to absolute
        if [[ "$link" != http* ]]; then
            if [[ "$link" == /* ]]; then
                link="$DOMAIN$link"
            else
                link="$url/$link"
            fi
        fi

        # Skip external links (ensure they start with the same domain)
        if [[ "$link" != "$DOMAIN"* ]]; then
            # echo -e "\e[31mSkipping external link: $link\e[0m"
            continue
        fi

        # Check if the link is already in the UNIQUE_LINKS array
        if [[ ! " ${UNIQUE_LINKS[@]} " =~ " $link " ]]; then
            echo -e "\e[32mFound link: $link\e[0m"
            UNIQUE_LINKS+=("$link")
            echo "$link" >> "$OUTPUT_FILE"
            # Recursively scan the link
            scan "$link"
        fi
    done
}

folders() {
    local unique_folders=()
    echo "Folders found:" | tee -a $OUTPUT_FILE

    # Extract all folders from found links in the output file
    folders=$(grep -oP '(?<=Found link: )[^\r\n]*' $OUTPUT_FILE | sed 's|/[^/]*$||' | sort -u)
    for folder in $folders; do
        if [[ ! " ${unique_folders[@]} " =~ " ${folder} " ]]; then
            unique_folders+=("$folder")
            echo -e "${GREEN}${folder}${NC}" | tee -a $OUTPUT_FILE
        fi
    done
}

# Usage
# Provide the website as the first argument to the script
if [[ -z $1 ]]; then
    echo "Usage: $0

# Start scanning
scan "$1"

# Output the folders
folders
```
- dirscan v1 done. Fixing errors as they came up:
```bash
I have this script for my HackTheBox pentesting lab:
#!/bin/bash
BLUE='\033[34m'
GREEN='\033[32m'
GRAY='\033[90m'
NC='\033[0m'
if [[ -z $1 ]]; then
echo "$0: Check for visible directories in the website."
echo "Usage: $0 <website IP or domain>"
exit 1
fi
OUTPUT_FILE="$(date +%s).txt"
# Variables
IP=$1
DOMAIN="http://$IP"
SCANNED_URLS=()
UNIQUE_LINKS=()
# Function to extract and scan links from a given URL
scan() {
local url=$1
# Add URL to scanned list
SCANNED_URLS+=("$url")
# Fetch the content of the URL and remove null bytes
content=$(curl -s "$url" | tr -d '\0')
# Extract all URLs from the HTML content (href and src attributes)
links=$(echo "$content" | grep -oP '(?<=href=")[^"]+|(?<=src=")[^"]+')
# Extract URLs from CSS content (url() references)
css_links=$(echo "$content" | grep -oP '(?<=url\().+?(?=\))')
# Combine links
all_links=$(echo -e "$links\n$css_links" | sort -u)
# Process each link found
for link in $all_links; do
# If the link is relative, convert it to absolute
if [[ "$link" != http* ]]; then
if [[ "$link" == /* ]]; then
link="$DOMAIN$link"
else
link="$url/$link"
fi
fi
# Skip external links (ensure they start with the same domain)
if [[ "$link" != "$DOMAIN"* ]]; then
continue
fi
# Check if the link is already in the UNIQUE_LINKS array
if [[ ! " ${UNIQUE_LINKS[@]} " =~ " ${link} " ]]; then
echo -e "${BLUE}${link}${NC}"
UNIQUE_LINKS+=("$link")
echo "$link" >> "$OUTPUT_FILE"
# Check if the link hasn't been scanned before to avoid infinite loops
if [[ ! " ${SCANNED_URLS[@]} " =~ " ${link} " ]]; then
# Recursively scan the link
scan "$link"
fi
fi
done
}
# Function to output all folders found in the scan
folders() {
local unique_folders=()
# Extract all folders from found links in the output file
folders=$(sed 's|/[^/]*$||' "$OUTPUT_FILE" | sort -u)
for folder in $folders; do
if [[ ! " ${unique_folders[@]} " =~ " ${folder} " ]]; then
unique_folders+=("$folder")
echo -e "${GREEN}${folder}${NC}" | tee -a "$OUTPUT_FILE"
fi
done
}
# Start scanning
echo -e "${GRAY}Scanning $DOMAIN${NC}"
scan "$DOMAIN"
# Output the folders
echo ""
echo -e "${GRAY}Folders found${NC}"
folders
```
However, when I actually run it, I get the following links:
http://instant.htb/#
http://instant.htb/#/#
http://instant.htb/#/#/#
http://instant.htb/#/#/#/#
http://instant.htb/#/#/#/#/#
[...] and so on, in a loop.
Why might this happen, and can you modify it to avoid this loop? That is, skip the hash anchors.
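The loop happens because href="#" is an in-page anchor: each pass glues it onto the current URL, producing a "new" unique link that recurses again. A minimal guard, assuming it sits near the top of the for loop in scan():

```bash
# Drop any #fragment so in-page anchors (href="#", href="#top") cannot
# snowball into http://instant.htb/#/#/#/...; skip links that were only a fragment.
link="${link%%#*}"
[[ -z "$link" ]] && continue
```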
Thanks! Now I have the following problem:
http://instant.htb/css/default.css
http://instant.htb/css/default.css/"data:image/svg+xml,%3Csvg
http://instant.htb/css/default.css/xmlns='http://www.w3.org/2000/svg'
http://instant.htb/css/default.css/width='100%25'
http://instant.htb/css/default.css/height='100%25'
http://instant.htb/css/default.css/viewBox='0
http://instant.htb/css/default.css/0
http://instant.htb/css/default.css/800
http://instant.htb/css/default.css/800'%3E%3Cg
http://instant.htb/css/default.css/%3E%3Ccircle
http://instant.htb/css/default.css/cx='400'
http://instant.htb/css/default.css/cy='400'
http://instant.htb/css/default.css/r='600'/%3E%3Ccircle
http://instant.htb/css/default.css/r='500'/%3E%3Ccircle
http://instant.htb/css/default.css/r='400'/%3E%3Ccircle
http://instant.htb/css/default.css/r='300'/%3E%3Ccircle
http://instant.htb/css/default.css/r='200'/%3E%3Ccircle
http://instant.htb/css/default.css/r='100'/%3E%3C/g%3E%3C/svg%3E"
http://instant.htb/css/default.css/"data:image/svg+xml;charset=utf8,%3Csvg
http://instant.htb/css/default.css/30
http://instant.htb/css/default.css/30'
http://instant.htb/css/default.css/xmlns='http://www.w3.org/2000/svg'%3E%3Cpath
http://instant.htb/css/default.css/stroke='rgba(0,
http://instant.htb/css/default.css/0,
http://instant.htb/css/default.css/0.7
http://instant.htb/css/default.css/stroke='rgba(255,
http://instant.htb/css/default.css/255,
http://instant.htb/css/default.css/fill='%23ffffff'
[...]
It is picking up the content of the CSS files. Can you avoid that? If it is a CSS file, it won't go any further.
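One way to do this is to keep recording asset links but stop recursing into them: a sketch of a guard around the recursive scan "$link" call, with the extension list as an assumption to adjust as needed:

```bash
# Recurse only into links that look like pages or folders; plain asset files
# (stylesheets, scripts, images, the APK) are still echoed and saved above,
# but their contents are not parsed for further links.
case "$link" in
    *.css|*.js|*.png|*.jpg|*.jpeg|*.gif|*.svg|*.ico|*.apk)
        ;;                  # skip the recursion for asset files
    *)
        scan "$link" ;;     # everything else is scanned recursively
esac
```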
Scanning
http://instant.htb
http://instant.htb/css/default.css
http://instant.htb/downloads/instant.apk
http://instant.htb/img/blog-1.jpg
http://instant.htb/img/blog-2.jpg
http://instant.htb/img/blog-3.jpg
http://instant.htb/img/logo.png
http://instant.htb/index.html
http://instant.htb/index.html/css/default.css
http://instant.htb/index.html/img/blog-1.jpg
http://instant.htb/index.html/img/blog-2.jpg
http://instant.htb/index.html/img/blog-3.jpg
http://instant.htb/index.html/img/logo.png
http://instant.htb/index.html/index.html
http://instant.htb/index.html/js/scripts.js
http://instant.htb/index.html/mailto:support@instant.htb
http://instant.htb/js/scripts.js
http://instant.htb/mailto:support@instant.htb
Folders found
http://instant.htb
http://instant.htb/css
http://instant.htb/downloads
http://instant.htb/img
http://instant.htb/index.html
http://instant.htb/index.html/css
http://instant.htb/index.html/img
http://instant.htb/index.html/js
http://instant.htb/js
index.html is usually the default link of a site's main page. However, in this output it has been treated as something different. Make links that point to /index.html, /index.htm, /index.php and other common index pages not be shown, because they have already been scanned through their original link.
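A possible guard for this, again as a sketch meant for the inside of scan()'s for loop, with the list of index file names as an assumption:

```bash
# Skip links whose last segment is a common index page; the folder they live
# in has already been scanned, so index.html would only duplicate the tree.
case "${link##*/}" in
    index.html|index.htm|index.php|index.asp|index.aspx)
        continue ;;
esac
```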