Robots.txt


A friend of mine was doing some web development for a company in a fairly tightly regulated industry and asked me to test the site to see if I could spot anything he had missed.

I did the usual stuff — changing URL parameters, putting letters and negative numbers into amount fields, etc. Everything seemed to check out.

At one point, though, I decided to check the site’s robots.txt file. Most of it was pretty mundane CMS-related stuff. I had almost given up on finding anything interesting when I went to the last directory listed in the robots.txt file that I hadn’t visited yet: the Images directory.

There were a few folders that looked admin-related, but they were either empty or I didn’t have permission to open them. Dang.

I did, however, find a subdirectory called “Orders” (or something similar)…

The folder contained a massive number of PDFs and images with full order details and customer data, including names, addresses, DOBs, SSNs, and other sensitive information. From what I saw, there weren’t any payment details, but it was still more than enough for identity theft.

Apparently, no one at the company thought to secure that folder because it wasn’t referenced anywhere directly by the website or app. It had been forgotten about. But, hey, who’s going to manually browse through a site’s directories, right…?

While it was good that the robots.txt file was there to make sure (or at least politely request) that search engines didn’t include that directory in their indexes, anyone who spent a few minutes poking around the site like I did could’ve found the same information. In fact, the robots.txt file is what pointed me to it in the first place.
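To illustrate why a Disallow line is only a request: the robots.txt file itself is public, and every entry in it names a path. The sketch below uses Python’s standard urllib.robotparser on a hypothetical robots.txt (the /admin/ and /Images/ entries and the "ExampleBot" user agent are made up for illustration, not taken from the real site). can_fetch only tells a compliant crawler what it should skip; nothing in it stops an ordinary HTTP client from requesting those paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; these paths are illustrative, not the real site's file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /Images/
"""


def disallowed_paths(text: str) -> list[str]:
    """Collect the paths named in Disallow lines, ignoring comments and empty rules."""
    paths = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow value means "allow everything"
                paths.append(path)
    return paths


# A compliant crawler consults the parsed rules before fetching a URL.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.can_fetch("ExampleBot", "https://example.com/Images/Orders/"))  # False

# Anyone else can simply read the same list of paths.
print(disallowed_paths(ROBOTS_TXT))  # ['/admin/', '/Images/']
```

Nothing here is enforced by the web server. Actually keeping people out has to come from server-side access control or from storing the files outside the web root, which is roughly what the company ended up doing.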

Shortly after this was discovered and reported, the company revamped the way they handle storage of order information and made it completely unavailable to the web server. They also apparently took my suggestion to disable directory listings.
