Website Scraping

Automatically import UK property listings from your estate agency website

Sift can scrape property listings from your estate agency website and automatically import them into your inventory. It detects pagination, extracts UK-specific property fields (price, bedrooms, postcode, tenure, sq ft), and keeps your listings up to date.

What URLs Can I Scrape?

Rightmove, Zoopla, and OnTheMarket explicitly prohibit automated scraping in their terms of service. For example, Rightmove's Terms of Use (clause 5.2) state that users must not "use bots, crawlers, scrapers or other automated programs or means to access or collect data." Zoopla has retired its public listings API, and its listings data is now available only under a commercial agreement.

The recommended approach is to scrape your own estate agency website. This is always permitted and gives you complete control over your property data.

Source type | Example URL | Permission required
Your own website (recommended) | https://your-agency.co.uk/properties-for-sale/ | None (you own the data)
WordPress + Property Hive | https://your-domain.co.uk/properties/?status=for-sale | None
Vebra Alto site | https://your-agency.co.uk/search/ | None
Rightmove (with consent) | https://www.rightmove.co.uk/... | You must confirm you have Rightmove's permission
Zoopla (with consent) | https://www.zoopla.co.uk/... | You must confirm you have Zoopla's permission
OnTheMarket (with consent) | https://www.onthemarket.com/... | You must confirm you have OnTheMarket's permission

If you have a data licence or written permission from a portal, you can enable portal scraping: set the source type to rightmove_with_consent, zoopla_with_consent, or onthemarket_with_consent and tick the "I confirm I have permission" box.

By ticking this box, you declare that you accept any risk arising under the portal's terms of service. Sift does not verify your licence; you are responsible for compliance.
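The consent rule above can be pictured as a simple save-time check. This is a minimal sketch, not Sift's code: the three *_with_consent values come from this page, while the function name can_save_profile and the "own_website" source type are hypothetical. Note that it only records the declaration; like Sift, it verifies nothing about your licence.

```python
# Hypothetical sketch: the three *_with_consent source types come from
# this page; "own_website" and can_save_profile() are illustrative names.
PORTAL_SOURCE_TYPES = {
    "rightmove_with_consent",
    "zoopla_with_consent",
    "onthemarket_with_consent",
}


def can_save_profile(source_type: str, permission_confirmed: bool) -> bool:
    """Return True if a profile with this source type may be saved."""
    if source_type in PORTAL_SOURCE_TYPES:
        # Portal sources need the "I confirm I have permission" box ticked.
        # This only records your declaration; no licence is checked.
        return permission_confirmed
    # Other sources, such as your own agency website, need no confirmation.
    return True


print(can_save_profile("zoopla_with_consent", False))  # False
print(can_save_profile("own_website", False))          # True
```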

Fields Extracted from UK Sites

Sift's UK parser recognises the standard formats used by Property Hive, Vebra Alto, Reapit Web, and bespoke agency sites:

Field | Formats recognised | Stored as
Price | £450,000 / £1,250 pcm | Sale price or monthly rent
Purpose | "for sale"; "pcm" or "per month" suffix | Buy or rent
Bedrooms | 3 bed, 3 bedroom, 3BR | Integer
Area (sq ft) | 1,200 sq ft, 1200 sqft | Square feet; square metres derived automatically
Postcode | SW1A 2AA, M1 1AE | Validated and normalised
Tenure | Freehold, Leasehold, Share of Freehold | freehold, leasehold, share_of_freehold
Property type | flat, house, bungalow, maisonette, detached, semi-detached | Normalised
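To make the formats in the table concrete, here is a minimal rule-based sketch, assuming listing text has already been pulled out of the page as plain strings. The function names and the rent-versus-sale heuristic (a "pcm" or "per month" suffix means rent) are illustrative assumptions rather than Sift's parser, and property type normalisation is left out for brevity.

```python
import re
from typing import Optional, Tuple

SQFT_TO_SQM = 0.09290304  # 1 sq ft = 0.09290304 square metres exactly

# Checked in order, so "share of freehold" wins over plain "freehold".
TENURES = [
    ("share of freehold", "share_of_freehold"),
    ("leasehold", "leasehold"),
    ("freehold", "freehold"),
]


def parse_price(text: str) -> Tuple[Optional[int], Optional[str]]:
    """'£450,000' -> (450000, 'buy'); '£1,250 pcm' -> (1250, 'rent')."""
    m = re.search(r"£\s*(\d[\d,]*)", text)
    if not m:
        return None, None
    amount = int(m.group(1).replace(",", ""))
    # Assumption: a pcm / per month suffix means a rental; otherwise a sale.
    is_rent = re.search(r"\b(pcm|per month)\b", text, re.IGNORECASE)
    return amount, ("rent" if is_rent else "buy")


def parse_bedrooms(text: str) -> Optional[int]:
    """'3 bed', '3 bedroom' and '3BR' all -> 3."""
    m = re.search(r"(\d+)\s*(?:bed(?:room)?s?|br)\b", text, re.IGNORECASE)
    return int(m.group(1)) if m else None


def parse_area_sqft(text: str) -> Optional[float]:
    """'1,200 sq ft' and '1200 sqft' -> 1200.0."""
    m = re.search(r"(\d[\d,]*)\s*sq\.?\s*ft\b", text, re.IGNORECASE)
    return float(m.group(1).replace(",", "")) if m else None


def parse_tenure(text: str) -> Optional[str]:
    lowered = text.lower()
    for phrase, stored in TENURES:
        if phrase in lowered:
            return stored
    return None


def extract_fields(text: str) -> dict:
    price, purpose = parse_price(text)
    sqft = parse_area_sqft(text)
    return {
        "price": price,
        "purpose": purpose,
        "bedrooms": parse_bedrooms(text),
        "area_sqft": sqft,
        # Square metres are derived from the stored square-foot value.
        "area_sqm": round(sqft * SQFT_TO_SQM, 1) if sqft is not None else None,
        "tenure": parse_tenure(text),
    }


print(extract_fields("3 bed flat, £1,250 pcm, 1,200 sq ft, Leasehold"))
# {'price': 1250, 'purpose': 'rent', 'bedrooms': 3, 'area_sqft': 1200.0, 'area_sqm': 111.5, 'tenure': 'leasehold'}
```

Any field these rules cannot find comes back as None, which is the natural hand-off point for a fallback extraction step.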

Fields that can't be extracted from the page structure fall through to an LLM extraction step automatically.
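One way to structure this fallback is to call the model only for fields the rule-based pass left empty, and never let it overwrite a value that was parsed deterministically. The sketch below assumes a hypothetical extractor signature (page text plus a list of field names, returning a dict); the real LLM call is replaced by a stub.

```python
from typing import Any, Callable, Dict, List, Optional

Fields = Dict[str, Optional[Any]]
# Hypothetical signature: (page_text, field_names) -> {field_name: value}
LlmExtractor = Callable[[str, List[str]], Dict[str, Any]]


def fill_missing_fields(parsed: Fields, page_text: str, llm_extract: LlmExtractor) -> Fields:
    """Ask the LLM only for fields the rule-based parser left as None."""
    missing = [name for name, value in parsed.items() if value is None]
    if not missing:
        return dict(parsed)  # everything parsed; skip the slower LLM call
    guesses = llm_extract(page_text, missing)
    merged = dict(parsed)
    for name in missing:  # never overwrite a value the parser already found
        if guesses.get(name) is not None:
            merged[name] = guesses[name]
    return merged


def fake_llm(page_text: str, fields: List[str]) -> Dict[str, Any]:
    # Stand-in for a real model call; it also returns a price it was not asked for.
    return {"tenure": "freehold", "price": 999}


print(fill_missing_fields({"price": 450000, "tenure": None}, "...", fake_llm))
# {'price': 450000, 'tenure': 'freehold'}
```

Keeping the parsed value when both sources disagree favours the predictable rule-based result over a model guess.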

Creating a Scraping Profile

  1. In the dashboard sidebar, click Sources.
  2. Click the Add Profile or Scrape Website button.
  3. Paste the URL of a property listings page: your agency website's "properties for sale" or "properties to let" search results page.
     Tip: Use a URL that shows multiple listings on one page, not a single property detail page. Example: https://your-agency.co.uk/properties-for-sale/
  4. Choose how often Sift should re-scrape this source:
       • Manual: only when you trigger it (all plans)
       • Daily: once per day (Scale plan only)
       • Weekly: once per week (Scale plan only)
  5. Click Save. The profile appears in your scraping profiles list.
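The settings chosen in these steps can be modelled as a small record with a save-time check. This is a hedged sketch: ScrapingProfile, Frequency and validate_profile are hypothetical names, and the only rules encoded are the ones stated on this page (Manual works on every plan; Daily and Weekly need Scale), plus a basic http(s) URL check added for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Frequency(Enum):
    MANUAL = "manual"
    DAILY = "daily"
    WEEKLY = "weekly"


@dataclass
class ScrapingProfile:  # hypothetical shape of a saved profile
    url: str
    frequency: Frequency = Frequency.MANUAL


def validate_profile(profile: ScrapingProfile, plan: str) -> List[str]:
    """Return a list of problems; an empty list means the profile can be saved."""
    errors = []
    if not profile.url.startswith(("http://", "https://")):
        errors.append("URL must start with http:// or https://")
    # Daily and weekly scheduling are Scale-only; Manual works on every plan.
    if profile.frequency is not Frequency.MANUAL and plan.lower() != "scale":
        errors.append(f"{profile.frequency.value} scheduling requires a Scale plan")
    return errors


profile = ScrapingProfile("https://your-agency.co.uk/properties-for-sale/", Frequency.DAILY)
print(validate_profile(profile, "Growth"))  # ['daily scheduling requires a Scale plan']
print(validate_profile(profile, "Scale"))   # []
```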

Running a Scrape

Click the Run button on any scraping profile to start a scrape. Sift will:

  1. Fetch the first page of your listings
  2. Detect pagination type (URL-based or JavaScript-based)
  3. Navigate through pages, up to the configured maximum number of pages
  4. Extract property data using UK-format recognition, falling back to AI extraction for unusual layouts
  5. Import unique properties — duplicates are skipped using URL matching and content similarity
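Step 5 combines two duplicate checks. The sketch below shows one way to do that under stated assumptions: listings are hypothetical dicts with "url" and "description" keys, URLs are compared after light normalisation, and "content similarity" is approximated with difflib's SequenceMatcher at an assumed 0.9 threshold. Sift's actual matching method and threshold are not documented here.

```python
from difflib import SequenceMatcher
from typing import Dict, List
from urllib.parse import urlsplit, urlunsplit

Listing = Dict[str, str]  # hypothetical: {"url": ..., "description": ...}
SIMILARITY_THRESHOLD = 0.9  # assumed cut-off, not Sift's actual value


def normalise_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def is_duplicate(candidate: Listing, known: List[Listing]) -> bool:
    url = normalise_url(candidate["url"])
    for item in known:
        if normalise_url(item["url"]) == url:
            return True  # same listing URL
        ratio = SequenceMatcher(None, candidate["description"], item["description"]).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            return True  # near-identical text, e.g. a relisted property
    return False


def import_unique(scraped: List[Listing], existing: List[Listing]) -> List[Listing]:
    """Return scraped listings that match neither the database nor each other."""
    imported: List[Listing] = []
    for listing in scraped:
        if not is_duplicate(listing, existing + imported):
            imported.append(listing)
    return imported
```

For example, a listing at https://YOUR-AGENCY.co.uk/property/12 is treated as the same record as https://your-agency.co.uk/property/12/, and a listing with an identical description under a new URL is also skipped. Pairwise text comparison is quadratic, so a production system would likely use a cheaper fingerprint first.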

Pagination Detection

Sift automatically detects how a website paginates its listings:

Type | How It Works | Example
URL-based | Follows ?page=2, ?page=3 or /page/2/ links | Most agency websites
JavaScript | Clicks "Next" buttons using browser automation | Single-page apps, React-rendered sites
None detected | Only the first page is scraped | Single-page listings

You don't need to configure pagination — Sift figures it out automatically.
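For the URL-based case, detection can be as simple as inspecting the links on the first page. This minimal sketch assumes the link hrefs have already been extracted from the HTML; the function names are hypothetical, max_pages stands in for the configured page limit, and JavaScript pagination (clicking "Next") is out of scope. It generates page URLs up to the limit rather than stopping early when a page comes back empty, which a real crawler would also do.

```python
import re
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit, urlunsplit

PATH_PAGE_RE = re.compile(r"/page/\d+/?$")


def detect_url_pagination(base_url: str, links: List[str]) -> Optional[str]:
    """Return 'query' for ?page=N links, 'path' for /page/N/ links, else None."""
    for href in links:
        parts = urlsplit(urljoin(base_url, href))
        if "page" in parse_qs(parts.query):
            return "query"
        if PATH_PAGE_RE.search(parts.path):
            return "path"
    return None


def page_url(base_url: str, style: str, n: int) -> str:
    parts = urlsplit(base_url)
    if style == "query":
        query = parse_qs(parts.query)  # keeps filters such as ?status=for-sale
        query["page"] = [str(n)]
        return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
    # Path style: replace or append a /page/N/ segment, keeping any query string.
    base_path = PATH_PAGE_RE.sub("", parts.path).rstrip("/")
    return urlunsplit(parts._replace(path=f"{base_path}/page/{n}/"))


def pages_to_fetch(base_url: str, links: List[str], max_pages: int) -> List[str]:
    style = detect_url_pagination(base_url, links)
    if style is None:
        return [base_url]  # no pagination detected: first page only
    return [base_url] + [page_url(base_url, style, n) for n in range(2, max_pages + 1)]


print(pages_to_fetch("https://your-agency.co.uk/properties-for-sale/",
                     ["/properties-for-sale/?page=2", "/contact/"], 3))
```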

For JavaScript-based pagination (common on Reapit Web and modern bespoke sites), Sift uses Firecrawl's browser automation to click through pages. This is slower but handles sites that don't use URL-based pagination.

Viewing Results

After a scrape completes, click the profile to see:

  • Pages fetched — how many pages were scraped
  • Properties found — total listings extracted
  • New properties — listings that weren't already in your database
  • Pagination type — what method was detected
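The relationship between these counters can be sketched as follows. This is an illustration only: ScrapeSummary and summarise_run are hypothetical names, and "new" is keyed on listing URL alone here, whereas Sift's import step also uses content similarity.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ScrapeSummary:
    pages_fetched: int
    properties_found: int
    new_properties: int
    pagination_type: str


def summarise_run(pages_fetched: int, found_urls: List[str],
                  existing_urls: Set[str], pagination_type: str) -> ScrapeSummary:
    """Count extracted listings and the subset not already in the database."""
    new = {url for url in found_urls if url not in existing_urls}
    return ScrapeSummary(pages_fetched, len(found_urls), len(new), pagination_type)


print(summarise_run(3, ["/p/1", "/p/2", "/p/3"], {"/p/2"}, "URL-based"))
# ScrapeSummary(pages_fetched=3, properties_found=3, new_properties=2, pagination_type='URL-based')
```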

Scrape Frequency

Frequency | When | Plan Required
Manual | You click "Run" | All plans
Daily | Every 24 hours, automatically | Scale
Weekly | Every 7 days, automatically | Scale
Automatic scraping (daily/weekly) requires a Scale plan. On Starter and Growth plans, you can only trigger scrapes manually.
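The schedule in this table reduces to adding a fixed interval to the last run, gated on plan. A minimal sketch, assuming timezone-aware timestamps; next_scheduled_run is a hypothetical helper, not part of Sift.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

INTERVALS = {"daily": timedelta(hours=24), "weekly": timedelta(days=7)}


def next_scheduled_run(frequency: str, plan: str, last_run: datetime) -> Optional[datetime]:
    """When the next automatic scrape is due, or None if only manual runs apply."""
    if frequency == "manual" or plan.lower() != "scale":
        return None  # Starter and Growth: scrapes start only when you click "Run"
    return last_run + INTERVALS[frequency]


last = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(next_scheduled_run("weekly", "Scale", last))  # 2024-01-08 09:00:00+00:00
print(next_scheduled_run("daily", "Growth", last))  # None
```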

Tips for Better Results

  • Use filtered search URLs — scraping a pre-filtered page (e.g., "3-bed houses for sale in Manchester") gives more relevant results than scraping your entire site root
  • Start with fewer pages — test with 2–3 pages first to verify extraction quality before scraping 10+ pages
  • Listings pages only — use your /properties-for-sale/ or /properties-to-let/ URL, not your homepage
  • Check postcodes — if your listings include full postcodes (e.g. SW1A 2AA), Sift extracts and validates them automatically for area-matching queries
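As a rough picture of what postcode extraction and normalisation involve, the sketch below finds the first UK-shaped postcode in free text and rewrites it as upper-case "OUTWARD INWARD". It is a simplified shape check written for illustration: it does not confirm the postcode exists and ignores special cases such as GIR 0AA, so it is not Sift's validator.

```python
import re
from typing import Optional

# Simplified shape check: outward code (SW1A, M1, EC1A) + inward code (2AA, 1AE).
# It does not check the postcode exists, and skips special cases such as GIR 0AA.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def extract_postcode(text: str) -> Optional[str]:
    """Find the first UK-style postcode and normalise it to 'OUTWARD INWARD'."""
    m = POSTCODE_RE.search(text)
    if not m:
        return None
    return f"{m.group(1).upper()} {m.group(2).upper()}"


print(extract_postcode("Whitehall, London sw1a2aa"))      # SW1A 2AA
print(extract_postcode("Piccadilly, Manchester M1 1AE"))  # M1 1AE
```

Normalising to a single canonical form is what makes postcode-based area matching reliable, since "sw1a2aa" and "SW1A 2AA" then compare equal.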
