~avz

Administering the site elpolla.net along with its branding, elpolla soluciones y telecomunicaciones 2025.

Script: [[Scripts/Ciberseguridad/dirscan|dirscan]]. Each prompt produced a response that I used to mix together parts of the script I ended up writing. ChatGPT does not give you a well-built script on the first try.

Let's make a Bash script for my cybersecurity lab using the appropriate packages:
- Scan the contents of a page looking for links found in both the HTML and the CSS.
- For each of those links, do the same. It will be a recursive search.
- Output the links and files found.
The syntax will be folderscan.sh <IP> and you will write the comments in English.
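
As a rough sketch of the core idea (my own illustration, not ChatGPT's answer; the target URL is just an example), the link extraction can be done with curl and GNU grep's PCRE mode:

```bash
# Hypothetical target; pull href/src attributes and CSS url() references out of
# a fetched page. Requires GNU grep with -P (PCRE) support.
curl -s "http://10.10.10.10/" \
    | grep -oP '(?<=href=")[^"]+|(?<=src=")[^"]+|(?<=url\()[^)]+'
```
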
Make it so the verbose output is colorful. And also, please make it so links in the output do not repeat.
Now avoid scanning external websites.
I get `./folderscan.sh: line 22: warning: command substitution: ignored null byte in input`. Make it so this does not display.
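
The warning comes from Bash's command substitution reading curl output that contains null bytes. The fix that ended up in the script deletes them inside the pipeline, so the substitution never sees them:

```bash
# Strip null bytes before the command substitution captures the content,
# which silences the "ignored null byte in input" warning.
content=$(curl -s "$url" | tr -d '\0')
```
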
Okay, now please make another script to process output_links.txt and tell me all the directories found. For example, given this output_links.txt:

http://sea.htb/
http://sea.htb/home
http://sea.htb/how-to-participate
http://sea.htb/contact.php
http://sea.htb/themes/bike/css/style.css
http://sea.htb/themes/bike/css/style.css/../img/1.png
http://sea.htb/themes/bike/img/velik71-new-logotip.png
http://sea.htb/themes/bike/css/style.css/../img/2.png
http://sea.htb/themes/bike/css/style.css/../img/3.png
http://sea.htb/themes/bike/css/style.css/../img/4.png
http://sea.htb/themes/bike/css/style.css/../img/5.png
http://sea.htb/themes/bike/css/style.css/../img/6.png
http://sea.htb/themes/bike/css/style.css/../img/bike.png

I'd like the script to tell me: 
Folders found: 
http://sea.htb/themes 
http://sea.htb/themes/bike 
http://sea.htb/themes/bike/img 
[...]
Edit the script to avoid saying folders like "http:" and "http://sea.htb". And make the output more colorful.
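
A minimal sketch of that processing step (my own, not the script ChatGPT produced), assuming output_links.txt holds one absolute URL per line:

```bash
# Parent folder of every link: strip the last path component, drop scheme-only
# and bare-domain results such as "http:" or "http://sea.htb", and deduplicate.
sed 's|/[^/]*$||' output_links.txt | grep -vE '^https?:/*[^/]*$' | sort -u
```
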
I'd like to show a recursive output, where instead of telling:

Folders found:
http://sea.htb/themes/bike/css
http://sea.htb/themes/bike/css/style.css/../img
http://sea.htb/themes/bike/img

it tells:

Folders found:
http://sea.htb/themes/
http://sea.htb/themes/bike/
http://sea.htb/themes/bike/css/
http://sea.htb/themes/bike/img
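
A sketch of the hierarchical expansion being asked for here (an assumption on my part, not the generated code), again taking one URL per line from output_links.txt; note that it does not normalize ../ segments:

```bash
# Hypothetical helper, not the final folders() function: print every ancestor
# directory of each link with a trailing slash, skipping the bare domain.
while IFS= read -r link; do
    path="${link#http://}"      # e.g. sea.htb/themes/bike/css/style.css
    base="${path%%/*}"          # e.g. sea.htb
    rest="${path#"$base"}"      # e.g. /themes/bike/css/style.css
    dir=""
    # Peel off one path component per iteration and print the level reached
    while [[ "$rest" == */*/* ]]; do
        rest="${rest#/}"
        dir="$dir/${rest%%/*}"
        rest="/${rest#*/}"
        echo "http://$base$dir/"
    done
done < output_links.txt | sort -u
```
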
Let's make a Bash script for my cybersecurity lab using the appropriate packages: 
First, you will make a scan() function to scan a website. Website is provided at $1. scan() will look for any link found both in the HTML and the CSS. External links are discarded, so it will only be a "hit" if scan() finds an internal link or a link that has the base domain/IP. 
For each found link, do the same. It will be a recursive search. 
I want the script to be verbose. But do not echo the links skipped, just the found. 
Also, make a colorful output for all links and files found: 
- Links will be in blue 
- Folders will be in green 
- Verbose output will be in light grey. 
Make sure that the links do not repeat. Each link found will be unique, so if there is a duplicate link, skip it. 
Filter out null bytes from the content. 
And save everything to $OUTPUT_FILE, which will be $(date +%s.txt). 
And, later, make a folders() function that processes $OUTPUT_FILE and gives me a hierarchical view of all links and files found, listing each level of the directory hierarchy of each link, skipping duplicates. The output will be a sorted hierarchical directory tree that includes trailing slashes. 
Skip the base domain and "http:".
Let's make a Bash script for my cybersecurity lab using the appropriate packages:

## First part of the script:
- First, you will make a scan() function to scan a website. Website is provided at $1. scan() will look for any link found both in the HTML and the CSS. 
- External links are discarded, so it will only be a "hit" if scan() finds an internal link or a link that has the base domain/IP.
- Skip external links. Just count links that are internal from the base domain. Example: http://example.org/themes/style.css will count. You will skip links that do not start with "example.org", for example.
- I will refer to counted links as found links. Found links will be unique and must not repeat, so if you get a hit on a link you have already found, just skip it.
- It will be a recursive search: for each found link that is a file, you will also look into its parent folder, and the parent folder of that parent folder if there is one, and so on. For example, if the script found http://example.org/themes/img/4.jpg, you will scan http://example.org/themes and http://example.org/themes/img.
- I want the script to be verbose. But do not echo the links skipped, just the found. 
- Filter out null bytes from the content. 
- Save everything to $OUTPUT_FILE, that will be $(date +%s.txt). Important to make the variable $OUTPUT_FILE at the start of the script and use that variable, so the filename is persistent. 

## Second part of the script:
- You will make a folders() function to output all folders found in the scan. That means that:
    - If the script found http://example.org/themes/img/4.jpg, you will output http://example.org/themes and http://example.org/themes/img.
    - You will avoid duplicates. If the folder you processed is a duplicate, just skip it.
    - Skip the base domain and "http:".
- Output will be like this:
    ```
    Folders found:
    http://example.org/themes
    http://example.org/themes/img
    ```

Make a colorful output for all links and files found:

```bash
#!/bin/bash

# Color definitions
BLUE='\033[34m'
GREEN='\033[32m'
GRAY='\033[90m'
NC='\033[0m' # No Color

# Output file variable
OUTPUT_FILE="$(date +%s).txt"

# Variables
IP=$1
DOMAIN="http://$IP"
SCANNED_URLS=()
UNIQUE_LINKS=()

# Function to extract and scan links from a given URL
scan() {
    local url=$1

    # Add URL to scanned list
    SCANNED_URLS+=("$url")

    # Fetch the content of the URL and remove null bytes
    content=$(curl -s "$url" | tr -d '\0')

    # Extract all URLs from the HTML content (href and src attributes)
    links=$(echo "$content" | grep -oP '(?<=href=")[^"]+|(?<=src=")[^"]+')

    # Extract URLs from CSS content (url() references)
    css_links=$(echo "$content" | grep -oP '(?<=url\().+?(?=\))')

    # Combine links
    all_links=$(echo -e "$links\n$css_links" | sort -u)

    # Process each link found
    for link in $all_links; do
        # If the link is relative, convert it to absolute
        if [[ "$link" != http* ]]; then
            if [[ "$link" == /* ]]; then
                link="$DOMAIN$link"
            else
                link="$url/$link"
            fi
        fi

        # Skip external links (ensure they start with the same domain)
        if [[ "$link" != "$DOMAIN"* ]]; then
            # echo -e "\e[31mSkipping external link: $link\e[0m"
            continue
        fi

        # Check if the link is already in the UNIQUE_LINKS array
        if [[ ! " ${UNIQUE_LINKS[@]} " =~ " $link " ]]; then
            echo -e "\e[32mFound link: $link\e[0m"
            UNIQUE_LINKS+=("$link")
            echo "$link" >> "$OUTPUT_FILE"

            # Recursively scan the link
            scan "$link"
        fi
    done
}

folders() {
    local unique_folders=()

    echo "Folders found:" | tee -a "$OUTPUT_FILE"

    # Extract all folders from found links in the output file
    folders=$(grep -oP '(?<=Found link: )[^\r\n]*' "$OUTPUT_FILE" | sed 's|/[^/]*$||' | sort -u)

    for folder in $folders; do
        if [[ ! " ${unique_folders[@]} " =~ " ${folder} " ]]; then
            unique_folders+=("$folder")
            echo -e "${GREEN}${folder}${NC}" | tee -a "$OUTPUT_FILE"
        fi
    done
}

# Usage
# Provide the website as the first argument to the script
if [[ -z $1 ]]; then
    echo "Usage: $0 <IP>"
    exit 1
fi

# Start scanning
scan "$1"

# Output the folders
folders
```


- dirscan v1 done. Fixing errors as they came up:
```bash
I have this script for my HackTheBox pentesting lab:

#!/bin/bash
BLUE='\033[34m'
GREEN='\033[32m'
GRAY='\033[90m'
NC='\033[0m'

if [[ -z $1 ]]; then
    echo "$0: Check for visible directories in the website."
    echo "Usage: $0 <website IP or domain>"
    exit 1
fi

OUTPUT_FILE="$(date +%s).txt"

# Variables
IP=$1
DOMAIN="http://$IP"
SCANNED_URLS=()
UNIQUE_LINKS=()

# Function to extract and scan links from a given URL
scan() {
    local url=$1

    # Add URL to scanned list
    SCANNED_URLS+=("$url")

    # Fetch the content of the URL and remove null bytes
    content=$(curl -s "$url" | tr -d '\0')

    # Extract all URLs from the HTML content (href and src attributes)
    links=$(echo "$content" | grep -oP '(?<=href=")[^"]+|(?<=src=")[^"]+')

    # Extract URLs from CSS content (url() references)
    css_links=$(echo "$content" | grep -oP '(?<=url\().+?(?=\))')

    # Combine links
    all_links=$(echo -e "$links\n$css_links" | sort -u)

    # Process each link found
    for link in $all_links; do
        # If the link is relative, convert it to absolute
        if [[ "$link" != http* ]]; then
            if [[ "$link" == /* ]]; then
                link="$DOMAIN$link"
            else
                link="$url/$link"
            fi
        fi

        # Skip external links (ensure they start with the same domain)
        if [[ "$link" != "$DOMAIN"* ]]; then
            continue
        fi

        # Check if the link is already in the UNIQUE_LINKS array
        if [[ ! " ${UNIQUE_LINKS[@]} " =~ " ${link} " ]]; then
            echo -e "${BLUE}${link}${NC}"
            UNIQUE_LINKS+=("$link")
            echo "$link" >> "$OUTPUT_FILE"

            # Check if the link hasn't been scanned before to avoid infinite loops
            if [[ ! " ${SCANNED_URLS[@]} " =~ " ${link} " ]]; then
                # Recursively scan the link
                scan "$link"
            fi
        fi
    done
}

# Function to output all folders found in the scan
folders() {
    local unique_folders=()

    # Extract all folders from found links in the output file
    folders=$(sed 's|/[^/]*$||' "$OUTPUT_FILE" | sort -u)

    for folder in $folders; do
        if [[ ! " ${unique_folders[@]} " =~ " ${folder} " ]]; then
            unique_folders+=("$folder")
            echo -e "${GREEN}${folder}${NC}" | tee -a "$OUTPUT_FILE"
        fi
    done
}

# Start scanning
echo -e "${GRAY}Scanning $DOMAIN${NC}"
scan "$DOMAIN"

# Output the folders
echo ""
echo -e "${GRAY}Folders found${NC}"
folders

However, when I actually run it I get the following links:
http://instant.htb/# 
http://instant.htb/#/# 
http://instant.htb/#/#/# 
http://instant.htb/#/#/#/# 
http://instant.htb/#/#/#/#/# [...] and so on, in a loop. 
Why might this be happening, and can you modify it to avoid this loop? That is, skip the hash anchors.
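
# One way to handle this (my sketch, not necessarily the fix that was applied):
# strip the fragment from each link before processing it, so "#" anchors never
# turn into new URLs, and skip links that were nothing but an anchor.
link="${link%%#*}"             # drop everything from the first '#'
[[ -z "$link" ]] && continue   # the link was only "#"; nothing left to scan
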
Thanks! Now I have the following problem.

http://instant.htb/css/default.css
http://instant.htb/css/default.css/"data:image/svg+xml,%3Csvg
http://instant.htb/css/default.css/xmlns='http://www.w3.org/2000/svg'
http://instant.htb/css/default.css/width='100%25'
http://instant.htb/css/default.css/height='100%25'
http://instant.htb/css/default.css/viewBox='0
http://instant.htb/css/default.css/0
http://instant.htb/css/default.css/800
http://instant.htb/css/default.css/800'%3E%3Cg
http://instant.htb/css/default.css/%3E%3Ccircle
http://instant.htb/css/default.css/cx='400'
http://instant.htb/css/default.css/cy='400'
http://instant.htb/css/default.css/r='600'/%3E%3Ccircle
http://instant.htb/css/default.css/r='500'/%3E%3Ccircle
http://instant.htb/css/default.css/r='400'/%3E%3Ccircle
http://instant.htb/css/default.css/r='300'/%3E%3Ccircle
http://instant.htb/css/default.css/r='200'/%3E%3Ccircle
http://instant.htb/css/default.css/r='100'/%3E%3C/g%3E%3C/svg%3E"
http://instant.htb/css/default.css/"data:image/svg+xml;charset=utf8,%3Csvg
http://instant.htb/css/default.css/30
http://instant.htb/css/default.css/30'
http://instant.htb/css/default.css/xmlns='http://www.w3.org/2000/svg'%3E%3Cpath
http://instant.htb/css/default.css/stroke='rgba(0,
http://instant.htb/css/default.css/0,
http://instant.htb/css/default.css/0.7
http://instant.htb/css/default.css/stroke='rgba(255,
http://instant.htb/css/default.css/255,
http://instant.htb/css/default.css/fill='%23ffffff'
[...]

It is picking up the contents of the CSS files. Can you avoid that? If it is a CSS file, it should not go any further.
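
# A possible fix (a sketch of the behaviour requested above, not the confirmed
# final code): keep recording .css links, but treat them as leaves and never
# recurse into them, so their url() payloads are not re-parsed as links.
if [[ "$link" != *.css ]]; then
    scan "$link"
fi
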
Scanning 
http://instant.htb 
http://instant.htb/css/default.css 
http://instant.htb/downloads/instant.apk 
http://instant.htb/img/blog-1.jpg 
http://instant.htb/img/blog-2.jpg 
http://instant.htb/img/blog-3.jpg 
http://instant.htb/img/logo.png 
http://instant.htb/index.html 
http://instant.htb/index.html/css/default.css 
http://instant.htb/index.html/img/blog-1.jpg 
http://instant.htb/index.html/img/blog-2.jpg 
http://instant.htb/index.html/img/blog-3.jpg 
http://instant.htb/index.html/img/logo.png 
http://instant.htb/index.html/index.html 
http://instant.htb/index.html/js/scripts.js 
http://instant.htb/index.html/mailto:support@instant.htb 
http://instant.htb/js/scripts.js 
http://instant.htb/mailto:support@instant.htb 

Folders found 
http://instant.htb 
http://instant.htb/css 
http://instant.htb/downloads 
http://instant.htb/img 
http://instant.htb/index.html 
http://instant.htb/index.html/css 
http://instant.htb/index.html/img 
http://instant.htb/index.html/js 
http://instant.htb/js 

index.html is usually the default link for a site's main page. However, in this output it has been treated as something different. Make links that point to /index.html, /index.htm, /index.php and other common index pages not be shown, because they have already been scanned under their original link.
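
# A possible filter (my sketch, not the confirmed final version): skip links whose
# last path component is a common index page, since the parent URL has already
# been scanned and reported. Extend the list as needed.
case "${link##*/}" in
    index.html|index.htm|index.php) continue ;;
esac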