Website Scraping

Sift can scrape property listings from your estate agency website and automatically import them into your inventory. It detects pagination, extracts UK-specific property fields (price, bedrooms, postcode, tenure, sq ft), and keeps your listings up to date.

What URLs Can I Scrape?

Rightmove, Zoopla, and OnTheMarket explicitly prohibit automated scraping in their terms of service. Rightmove's Terms of Use (clause 5.2) state that you must not "use bots, crawlers, scrapers or other automated programs or means to access or collect data." Zoopla has retired its public listings API; access to its listing data is now available only under a commercial agreement.

The recommended approach is to scrape your own estate agency website. This is always permitted and gives you complete control over your property data.

Source type	Example URL	Permission required
Your own website (recommended)	`https://your-agency.co.uk/properties-for-sale/`	None — you own the data
WordPress + Property Hive	`https://your-domain.co.uk/properties/?status=for-sale`	None
Vebra Alto site	`https://your-agency.co.uk/search/`	None
Rightmove (with consent)	`https://www.rightmove.co.uk/...`	You must confirm you have Rightmove's permission
Zoopla (with consent)	`https://www.zoopla.co.uk/...`	You must confirm you have Zoopla's permission
OnTheMarket (with consent)	`https://www.onthemarket.com/...`	You must confirm you have OTM's permission

If you have a data licence or written permission from a portal, you can enable portal scraping by setting the source type to `rightmove_with_consent`, `zoopla_with_consent`, or `onthemarket_with_consent` and checking the "I confirm I have permission" box.

By checking this box, you declare that you have the portal's permission and accept any risk arising under its terms of service. Sift does not verify your licence; you are responsible for compliance.
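
The consent rule above can be sketched as a simple check. This is an illustrative sketch only, not Sift's actual implementation: the three portal source type values come from this page, while `SourceConfig`, `can_scrape`, the `own_website` value, and the `permission_confirmed` field are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Portal source types named in this guide.
PORTAL_SOURCE_TYPES = {
    "rightmove_with_consent",
    "zoopla_with_consent",
    "onthemarket_with_consent",
}


@dataclass
class SourceConfig:
    url: str
    source_type: str = "own_website"    # hypothetical default value
    permission_confirmed: bool = False  # mirrors the confirmation checkbox


def can_scrape(config: SourceConfig) -> bool:
    """Portal sources need the confirmation box ticked; other sources do not."""
    if config.source_type in PORTAL_SOURCE_TYPES:
        return config.permission_confirmed
    return True
```

In this sketch, a portal profile saved without the box ticked is rejected, while your own website never needs confirmation.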

Fields Extracted from UK Sites

Sift's UK parser recognises the standard formats used by Property Hive, Vebra Alto, Reapit Web, and bespoke agency sites:

Field	Format recognised	Stored value
Price	`£450,000` / `£1,250 pcm`	Sale price or monthly rent
Purpose	`for sale`, `pcm`/`per month` suffix	Buy or rent
Bedrooms	`3 bed`, `3 bedroom`, `3BR`	Integer
Area (sq ft)	`1,200 sq ft`, `1200 sqft`	Stored as sq ft; sq m derived automatically
Postcode	`SW1A 2AA`, `M1 1AE`	Validated and normalised
Tenure	`Freehold`, `Leasehold`, `Share of Freehold`	Stored as `freehold`, `leasehold`, `share_of_freehold`
Property type	`flat`, `house`, `bungalow`, `maisonette`, `detached`, `semi-detached`	Normalised

Fields that can't be extracted from the page structure fall through to an LLM extraction step automatically.
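
To give a feel for the format recognition described in the table, here is a minimal sketch using regular expressions. It is not Sift's parser: the patterns, the `parse_listing` function, and the output key names are assumptions for illustration, and the rule that a price without a `pcm`/`per month` suffix means "buy" is a simplification.

```python
import re

SQFT_PER_SQM = 10.7639  # 1 sq m is approximately 10.7639 sq ft

# Illustrative patterns only; a production parser would need more cases.
PRICE_RE = re.compile(r"£\s*(\d[\d,]*)(?:\s*(pcm|per month))?", re.IGNORECASE)
BEDS_RE = re.compile(r"(\d+)\s*(?:bedrooms?|beds?|BR)\b", re.IGNORECASE)
AREA_RE = re.compile(r"(\d[\d,]*)\s*sq\.?\s*ft", re.IGNORECASE)
TENURE_RE = re.compile(r"\b(share of freehold|freehold|leasehold)\b", re.IGNORECASE)


def parse_listing(text: str) -> dict:
    """Extract the fields that can be recognised; missing fields are omitted."""
    fields = {}
    if m := PRICE_RE.search(text):
        fields["price"] = int(m.group(1).replace(",", ""))
        # Assumption: a rent suffix means "rent", anything else means "buy".
        fields["purpose"] = "rent" if m.group(2) else "buy"
    if m := BEDS_RE.search(text):
        fields["bedrooms"] = int(m.group(1))
    if m := AREA_RE.search(text):
        sq_ft = int(m.group(1).replace(",", ""))
        fields["area_sq_ft"] = sq_ft
        fields["area_sq_m"] = round(sq_ft / SQFT_PER_SQM, 1)  # derived value
    if m := TENURE_RE.search(text):
        fields["tenure"] = m.group(1).lower().replace(" ", "_")
    return fields
```

Any key missing from the returned dictionary corresponds to a field that would be passed on to the LLM extraction step.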

Creating a Scraping Profile

In the dashboard sidebar, click Sources.

Click the Add Profile or Scrape Website button.

Paste the URL of a property listings page — your agency website's "properties for sale" or "properties to let" search results page.

Tip: Use a URL that shows multiple listings on one page, not a single property detail page. Example: https://your-agency.co.uk/properties-for-sale/

Choose how often Sift should re-scrape this source:

Manual — only when you trigger it (all plans)
Daily — once per day (Scale plan only)
Weekly — once per week (Scale plan only)

Automatic scheduling (daily/weekly) requires a Scale plan.

Click Save. The profile appears in your scraping profiles list.

Running a Scrape

Click the Run button on any scraping profile to start a scrape. Sift will:

Fetch the first page of your listings
Detect pagination type (URL-based or JavaScript-based)
Navigate through pages (up to the configured max)
Extract property data using UK-format recognition, falling back to AI extraction for unusual layouts
Import unique properties — duplicates are skipped using URL matching and content similarity
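
The duplicate check in the final step can be pictured roughly as follows. This is a sketch under stated assumptions, not Sift's actual matching logic: the `url` and `description` keys, the 0.9 similarity threshold, and the use of Python's `difflib.SequenceMatcher` for content similarity are all choices made for this example.

```python
from difflib import SequenceMatcher


def normalise_url(url: str) -> str:
    """Drop the fragment and trailing slash so equivalent URLs compare equal."""
    return url.split("#")[0].rstrip("/").lower()


def is_duplicate(candidate: dict, existing: list, threshold: float = 0.9) -> bool:
    """True if the URL matches, or the description is near-identical."""
    cand_url = normalise_url(candidate["url"])
    cand_text = candidate["description"].lower()
    for item in existing:
        if normalise_url(item["url"]) == cand_url:
            return True
        ratio = SequenceMatcher(None, cand_text, item["description"].lower()).ratio()
        if ratio >= threshold:
            return True
    return False


def import_unique(scraped: list, inventory: list) -> list:
    """Return only the scraped listings not already present."""
    imported = []
    for listing in scraped:
        if not is_duplicate(listing, inventory + imported):
            imported.append(listing)
    return imported
```

Checking the URL first keeps the common case cheap; the similarity comparison only matters when the same property appears under a different URL.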

Pagination Detection

Sift automatically detects how a website paginates its listings:

Type	How It Works	Example
URL-based	Follows `?page=2`, `?page=3` or `/page/2/` links	Most agency websites
JavaScript	Clicks "Next" buttons using browser automation	Single-page apps, React-rendered sites
None detected	Only the first page is scraped	Single-page listings

You don't need to configure pagination — Sift figures it out automatically.

For JavaScript-based pagination (common on Reapit Web and modern bespoke sites), Sift uses Firecrawl's browser automation to click through pages. This is slower but handles sites that don't use URL-based pagination.
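
As a rough illustration of the three outcomes in the table above, the sketch below classifies a page from its HTML. It is a simplified heuristic written for this guide, not the detection Sift performs: the regular expressions, the `detect_pagination` function, and the returned labels are assumptions.

```python
import re

# Links such as ?page=2 or /page/2/ suggest URL-based pagination.
URL_PAGE_RE = re.compile(
    r"""href=["'][^"']*(?:[?&]page=\d+|/page/\d+/?)[^"']*["']""",
    re.IGNORECASE,
)
# A "Next" or "Load more" control with no page URL suggests JavaScript paging.
NEXT_BUTTON_RE = re.compile(
    r"<(?:button|a)\b[^>]*>\s*(?:next|load more)\b",
    re.IGNORECASE,
)


def detect_pagination(html: str) -> str:
    """Return 'url', 'javascript', or 'none' for a listings page."""
    if URL_PAGE_RE.search(html):
        return "url"
    if NEXT_BUTTON_RE.search(html):
        return "javascript"
    return "none"
```

URL-based links are checked first because they can be followed with plain HTTP requests, which is faster than driving a browser.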

Viewing Results

After a scrape completes, click the profile to see:

Pages fetched — how many pages were scraped
Properties found — total listings extracted
New properties — listings that weren't already in your database
Pagination type — what method was detected

Scrape Frequency

Frequency	When	Plan Required
Manual	You click "Run"	All plans
Daily	Every 24 hours automatically	Scale
Weekly	Every 7 days automatically	Scale

Automatic scraping (daily/weekly) requires a Scale plan. On Starter and Growth plans, you can only trigger scrapes manually.
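
The plan rule and intervals from the table can be expressed in a few lines. This is a hedged sketch of the logic only; the function names and lowercase plan and frequency identifiers are hypothetical, not part of Sift's API.

```python
from datetime import datetime, timedelta

# Intervals taken from the table above.
INTERVALS = {"daily": timedelta(hours=24), "weekly": timedelta(days=7)}
AUTO_PLANS = {"scale"}  # plans that allow automatic scheduling


def frequency_allowed(plan: str, frequency: str) -> bool:
    """Manual runs are always allowed; daily/weekly need a Scale plan."""
    if frequency == "manual":
        return True
    return frequency in INTERVALS and plan.lower() in AUTO_PLANS


def next_run(last_run: datetime, frequency: str):
    """Return the next scheduled run, or None for manual profiles."""
    interval = INTERVALS.get(frequency)
    return last_run + interval if interval else None
```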

Tips for Better Results

Use filtered search URLs — scraping a pre-filtered page (e.g., "3-bed houses for sale in Manchester") gives more relevant results than scraping your entire site root
Start with fewer pages — test with 2–3 pages first to verify extraction quality before scraping 10+ pages
Listings pages only — use your /properties-for-sale/ or /properties-to-let/ URL, not your homepage
Check postcodes — if your listings include full postcodes (e.g. SW1A 2AA), Sift extracts and validates them automatically for area-matching queries
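
If you want to check in advance that the postcodes on your site are in a recognisable form, a simplified validator might look like the sketch below. The pattern covers only the general shape of a UK postcode (outward code, then a digit and two letters) and is an assumption for illustration; Sift's own validation rules are not documented here and may be stricter.

```python
import re

# Simplified shape check: outward code (e.g. SW1A, M1) + inward code (e.g. 2AA).
POSTCODE_RE = re.compile(r"^([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})$")


def normalise_postcode(raw: str):
    """Return the postcode as 'OUTWARD INWARD' in upper case, or None if invalid."""
    m = POSTCODE_RE.match(raw.strip().upper())
    if not m:
        return None
    return f"{m.group(1)} {m.group(2)}"
```

For example, `normalise_postcode("sw1a2aa")` returns `"SW1A 2AA"`, while a partial postcode such as `"SW1A"` returns `None`.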
