Lab 11: Web Application Reconnaissance

Objective

Perform systematic web application reconnaissance against a live vulnerable server from Kali Linux. You will:

  1. Fingerprint the technology stack from HTTP headers using nmap and whatweb

  2. Enumerate hidden directories and files with gobuster β€” finding admin panels, backups, and config files

  3. Read robots.txt and sitemap.xml to discover intentionally hidden paths

  4. Access sensitive files left exposed: .env, .git/config, backup.sql, phpinfo.php

  5. Reach unauthenticated endpoints for user data, internal config, and admin panels

  6. Build a complete recon report mapping the full attack surface

All phases run from the Kali attacker container against the victim Flask server β€” no simulation, every result is a real HTTP response.


Background

Reconnaissance is the first phase of every penetration test (PTES β€” Penetration Testing Execution Standard, OWASP Testing Guide). Professional attackers spend 60–80% of their engagement time in reconnaissance before touching anything.

Why recon matters:

  • Headers reveal framework and version β†’ maps directly to known CVEs

  • robots.txt is a roadmap of what the developer tried to hide

  • .git/config exposes the private repository URL β€” often git clone-able

  • backup.sql from a web root has ended careers (and companies)

  • An unauthenticated /api/users endpoint is data breach #1 in any assessment

Real-world examples:

  • 2021 Twitch breach (135GB) β€” git repo history accessible via misconfigured S3 bucket discovered through recon

  • 2019 GraphQL introspection β€” leaving introspection enabled on production APIs lets attackers map every query, mutation, and data type β€” found trivially with gobuster

  • WordPress version fingerprinting β€” X-Generator: WordPress/5.8 maps to 200+ known CVEs; attackers automate this with WPScan in under 60 seconds


Architecture

Time

45 minutes

Prerequisites

  • Docker installed and running

Tools

Tool
Container
Purpose

nmap

Kali

Port scan + HTTP header scripts

whatweb

Kali

Technology stack fingerprinting

gobuster

Kali

Directory and file enumeration

curl

Kali

Manual HTTP requests, header inspection

python3

Kali

Parse JSON responses, build recon report


Lab Instructions

Step 1: Environment Setup β€” Launch the Victim Server

πŸ“Έ Verified Output:


Step 2: Launch the Kali Attacker Container

Set your target and confirm connectivity:


Step 3: Service Fingerprinting β€” nmap + HTTP Scripts

πŸ“Έ Verified Output:

πŸ’‘ nmap's http-headers script pulls every response header in one scan. Notice the server is sending contradictory headers β€” it claims to be Apache, PHP, WordPress, AND ASP.NET simultaneously. In a real engagement, mismatched headers like this indicate a reverse proxy in front of a different backend. The X-Debug-Info header is especially dangerous: it confirms the environment is production, leaks the version (2.3.1), and gives a build date β€” all useful for CVE matching.


Step 4: Technology Stack Fingerprinting β€” whatweb

πŸ“Έ Verified Output:

πŸ“Έ Verified Output:


Step 5: Directory Enumeration β€” gobuster

πŸ“Έ Verified Output:

πŸ’‘ gobuster's -x flag is critical for web recon. Developers leave backup files (backup.sql, config.php.bak), environment files (.env), and debug pages (phpinfo.php) in the web root during development and forget to remove them. Status: 200 means the file is readable by anyone. Status: 301/302 means it redirects somewhere interesting. Status: 403 means it exists but is blocked β€” worth noting for later bypass attempts.


Step 6: Read robots.txt and sitemap.xml

πŸ“Έ Verified Output:

πŸ’‘ robots.txt is the attacker's cheat sheet. The Disallow entries are literally a list of "please don't look here" β€” which means they are always the first things an attacker looks at. /.git/ being disallowed means a developer likely committed the .git directory to the web root; git clone http://victim/ would download the entire source history. Never use robots.txt as a security control β€” it only works on compliant crawlers (which attackers are not).


Step 7: Access Sensitive Exposed Files

πŸ“Έ Verified Output:


Step 8: Unauthenticated API Endpoints

πŸ“Έ Verified Output:


Step 9: Build the Recon Report

πŸ“Έ Verified Output:


Step 10: Cleanup


Attack Surface Summary

Finding
Severity
Impact

.env exposed

CRITICAL

All secrets: DB password, AWS key, JWT secret

backup.sql in web root

CRITICAL

Plaintext passwords for all users

/admin unauthenticated

CRITICAL

Full admin panel β€” user management, data export

/api/internal/config public

CRITICAL

DB host, Redis host, all internal addresses

.git/config exposed

HIGH

Private repo URL β€” source code accessible

phpinfo.php live

HIGH

DB connection string, document root, PHP config

/api/users unauthenticated

HIGH

Full user list with emails and roles

Version headers

MEDIUM

Direct CVE matching to known vulnerabilities


Remediation

Finding
Fix

Version disclosure headers

Remove Server, X-Powered-By, X-Generator at nginx/Apache level

.env in web root

Move to parent directory above web root; block \.env at web server: location ~ /\.env { deny all; }

.git in web root

Add to .gitignore at deploy time; block via nginx: location ~ /\.git { deny all; }

Backup files in web root

Never keep *.sql, *.bak in web root; use secure off-site storage

Unauthenticated endpoints

All /api/ endpoints require JWT validation; /admin behind IP allowlist + MFA

Verbose error pages

Return {"error": "Internal Server Error", "id": "ERR-XXXX"} β€” log details server-side only

phpinfo.php

Delete from production entirely; never deploy debug pages to production

Further Reading

Last updated