# How to Find and Extract All URLs from a Website Using Olostep Maps API and Streamlit
## Introduction
When building web crawlers, running competitive analysis, auditing SEO, or powering AI agents, one of the **first critical tasks** is finding all the URLs on a website.
While traditional methods like Google search operators, sitemap exploration, and SEO tools work, there's a **faster, modern way**: the **Olostep Maps API**.
In this guide, we'll:
* Introduce the challenge of URL discovery
* Show how to build a **live Streamlit app** to scrape all URLs
* Compare it with traditional techniques (like sitemap.xml and robots.txt)
* Provide complete runnable Python code
> **Target Audience:** Developers, Growth Engineers, Data Scientists, SEO specialists, and Founders who need structured, scalable scraping.
## Why Extract All URLs?
Finding every page on a website can help you:
* **Analyze site structure** (for SEO)
* **Scrape website content** efficiently
* **Find hidden gems** like orphan pages
* **Monitor website changes**
* **Prepare data** for AI agents and automation
## Traditional Methods (Before Olostep)
### 1. Sitemaps (XML Files)
Webmasters often create XML sitemaps to help Google index their sites. Here's an example:
```xml
<urlset>
  <url>
    <loc>https://example.com</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
```
To find sitemaps:
* Visit `/sitemap.xml` (e.g., https://example.com/sitemap.xml)
* Check `/robots.txt` (it usually links to the sitemap)
Other possible sitemap locations:
* `/sitemap.xml.gz`
* `/sitemap_index.xml`
* `/sitemap.php`
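As a quick illustration, here is a minimal Python sketch that probes these common locations and pulls the `<loc>` entries out of whichever sitemap responds. The path list and the simple `<loc>` matching are illustrative assumptions, not an exhaustive crawler (the gzipped variant would need decompression and is skipped here):

```python
import requests
import xml.etree.ElementTree as ET

# Common sitemap paths to probe (illustrative, not exhaustive)
CANDIDATE_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.php"]

def find_sitemap_urls(base_url):
    """Return the <loc> entries from the first sitemap that responds."""
    for path in CANDIDATE_PATHS:
        try:
            response = requests.get(base_url.rstrip("/") + path, timeout=10)
        except requests.RequestException:
            continue
        if response.status_code == 200 and "<loc>" in response.text:
            root = ET.fromstring(response.content)
            # Match <loc> whether or not the sitemaps.org namespace is declared
            return [el.text for el in root.iter() if el.tag.endswith("loc")]
    return []

print(find_sitemap_urls("https://example.com"))
```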
You can also Google:
```
site:example.com filetype:xml
```
**Problems:**
* Some websites don't maintain updated sitemaps.
* Not all pages may be listed.
* Dynamic websites (heavy JavaScript) often leave out many pages.
### 2. Robots.txt
Example:
```
User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin
```
Good for finding disallowed URLs and sitemap links, but again **not comprehensive**.
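To automate that lookup, a short sketch (the helper name is hypothetical) can pull any `Sitemap:` lines out of a site's robots.txt:

```python
import requests

def sitemaps_from_robots(base_url):
    """Extract Sitemap: directives from a site's robots.txt, if present."""
    response = requests.get(base_url.rstrip("/") + "/robots.txt", timeout=10)
    if response.status_code != 200:
        return []
    return [
        line.split(":", 1)[1].strip()
        for line in response.text.splitlines()
        if line.lower().startswith("sitemap:")
    ]

print(sitemaps_from_robots("https://example.com"))
```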
## The Modern Solution: Olostep Maps API
✅ Find **up to 100,000 URLs** in seconds.

✅ No need to manually hunt for sitemaps or robots.txt.

✅ Simple API call.

✅ No server maintenance or IP bans.

👉 Full code Gist
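Before building the UI, here's the call in its simplest form, a minimal sketch using the same `/v1/maps` endpoint and `urls` response field that the Streamlit app below relies on:

```python
import requests

API_KEY = "YOUR_OLOSTEP_API_KEY"  # replace with your own key

response = requests.post(
    "https://api.olostep.com/v1/maps",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"url": "https://example.com"},
)
response.raise_for_status()

# The response JSON contains a "urls" list of discovered pages
for url in response.json().get("urls", []):
    print(url)
```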
Let's **build a full Streamlit app** to demo this!
## 🛠️ Full Project: Website URL Extractor with Olostep Maps API + Streamlit
### 1. Install Requirements
```bash
pip install streamlit requests
```
### 2. Python Code
```python
import streamlit as st
import requests


def fetch_urls(target_url, api_key):
    """Call the Olostep Maps API and return the parsed JSON response."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"url": target_url}
    response = requests.post(
        "https://api.olostep.com/v1/maps", headers=headers, json=payload
    )
    if response.status_code == 200:
        return response.json()
    st.error(f"Failed to fetch URLs. Status code: {response.status_code}")
    return None


st.title("🌐 Website URL Scraper")
st.markdown(
    "Use Olostep Maps API to instantly extract all discovered URLs from any website. "
    "Great for SEO, scraping, site analysis, and more!"
)

api_key = st.text_input("Enter your Olostep API Key", type="password")
url_to_scrape = st.text_input("Enter Website URL (e.g., https://example.com)")

if st.button("Find URLs"):
    if api_key and url_to_scrape:
        with st.spinner("Fetching URLs..."):
            data = fetch_urls(url_to_scrape, api_key)
        if data:
            urls = data.get("urls", [])
            st.success(f"✅ Found {len(urls)} URLs!")
            # Render each discovered URL as a numbered list item
            for idx, u in enumerate(urls, start=1):
                st.markdown(f"{idx}. {u}")
            # Let the user save the results as a plain-text file
            st.download_button(
                "📥 Download URLs as Text File",
                data="\n".join(urls),
                file_name="discovered_urls.txt",
                mime="text/plain",
            )
    else:
        st.warning("Please enter both an API key and a website URL.")
```
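### 3. Run the App
Assuming you saved the code above as `app.py` (the filename is arbitrary), launch the app with:

```bash
streamlit run app.py
```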
## 📸 Example Output
✅ Found 35 URLs from `https://docs.olostep.com`

📥 Saved as `discovered_urls.txt`
## ⚡ Why Olostep Maps API Beats Traditional Methods
| Feature | Sitemap/Robots.txt | SEO Spider | Olostep Maps |
|---|---|---|---|
| Instant Response | ❌ | ❌ | ✅ |
| Handles JS-heavy Sites | ❌ | ⚠️ (Partial) | ✅ |
| Handles Big Sites | ❌ | ❌ (Limit) | ✅ |
| No Setup Needed | ✅ | ❌ | ✅ |
| Easy Pagination | ❌ | ❌ | ✅ |
## 🏁 Conclusion
Using Olostep Maps API + a few lines of Streamlit code, you can build powerful **website discovery tools** in minutes.
No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.
✅ Super fast

✅ Reliable

✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.
## 🚀 Ready to try?
Register at 👉 Olostep.com and start building your own data pipelines today!
**Written by:**
**Mohammad Ehsan Ansari**
Growth Engineer @ Olostep
Happy scraping! 🚀