Website Scraping
Automatically import UK property listings from your estate agency website
Sift can scrape property listings from your estate agency website and automatically import them into your inventory. It detects pagination, extracts UK-specific property fields (price, bedrooms, postcode, tenure, sq ft), and keeps your listings up to date.
What URLs Can I Scrape?
Rightmove, Zoopla, and OnTheMarket explicitly prohibit automated scraping in their terms of service. Rightmove's Terms of Use (clause 5.2) list the following among prohibited activities: "use bots, crawlers, scrapers or other automated programs or means to access or collect data." Zoopla's public listings API has been retired, and API access is now available only under a commercial agreement.
The recommended approach is to scrape your own estate agency website. This is always permitted and gives you complete control over your property data.
| Source type | Example URL | Permission required |
|---|---|---|
| Your own website (recommended) | https://your-agency.co.uk/properties-for-sale/ | None — you own the data |
| WordPress + Property Hive | https://your-domain.co.uk/properties/?status=for-sale | None |
| Vebra Alto site | https://your-agency.co.uk/search/ | None |
| Rightmove (with consent) | https://www.rightmove.co.uk/... | You must confirm you have Rightmove's permission |
| Zoopla (with consent) | https://www.zoopla.co.uk/... | You must confirm you have Zoopla's permission |
| OnTheMarket (with consent) | https://www.onthemarket.com/... | You must confirm you have OTM's permission |
Portal Scraping — Consent Required
If you have a data licence or written permission from a portal, you can enable portal scraping by setting the source type to rightmove_with_consent, zoopla_with_consent, or onthemarket_with_consent and checking the "I confirm I have permission" box.
By checking this box you declare that you have the portal's permission and accept any risk arising under its terms of service. Sift does not verify your licence, so you are responsible for compliance.
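The consent rule above can be pictured as a small validation step. This is only an illustrative sketch: the three source type values come from this page, but the function names, the domain mapping, and the error messages are hypothetical and do not describe Sift's internal checks.

```python
from urllib.parse import urlsplit

# Source type values are taken from this page; the domain mapping is an assumption.
PORTAL_SOURCE_TYPES = {
    "rightmove.co.uk": "rightmove_with_consent",
    "zoopla.co.uk": "zoopla_with_consent",
    "onthemarket.com": "onthemarket_with_consent",
}


def required_source_type(url):
    """Return the consent source type a portal URL needs, or None for other sites."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, source_type in PORTAL_SOURCE_TYPES.items():
        if host == domain or host.endswith("." + domain):
            return source_type
    return None


def validate_profile(url, source_type, consent_confirmed):
    """Raise ValueError if a portal URL is used without the matching consent setup."""
    required = required_source_type(url)
    if required is None:
        return  # not a portal URL, so no consent is needed
    if source_type != required:
        raise ValueError(f"source type must be {required} for this URL")
    if not consent_confirmed:
        raise ValueError("the permission box must be checked for portal sources")
```

For example, `validate_profile("https://www.zoopla.co.uk/...", "zoopla_with_consent", True)` passes, while the same call with `consent_confirmed=False` raises an error.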
Fields Extracted from UK Sites
Sift's UK parser recognises the standard formats used by Property Hive, Vebra Alto, Reapit Web, and bespoke agency sites:
| Field | Format recognised | Notes |
|---|---|---|
| Price | £450,000 / £1,250 pcm | Sale price or monthly rent |
| Purpose | for sale, pcm/per month suffix | Buy or rent |
| Bedrooms | 3 bed, 3 bedroom, 3BR | Integer |
| Area (sq ft) | 1,200 sq ft, 1200 sqft | Stored as sq ft; sq m derived automatically |
| Postcode | SW1A 2AA, M1 1AE | Validated and normalised |
| Tenure | Freehold, Leasehold, Share of Freehold | Stored as freehold, leasehold, share_of_freehold |
| Property type | flat, house, bungalow, maisonette, detached, semi-detached | Normalised |
Fields that can't be extracted from the page structure fall through to an LLM extraction step automatically.
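To give a feel for the formats in the table, here is a minimal sketch of pattern-based extraction. The regular expressions, field names, and the `parse_listing` function are assumptions made for illustration, not Sift's actual parser; the postcode pattern in particular is a simplified shape check rather than full validation.

```python
import re

# Hypothetical patterns covering the formats listed in the table above.
PRICE_RE = re.compile(r"£\s*(\d+(?:,\d{3})*)(?:\s*(pcm|per month))?", re.IGNORECASE)
BEDS_RE = re.compile(r"(\d+)\s*(?:bedrooms?|beds?|br)\b", re.IGNORECASE)
AREA_RE = re.compile(r"(\d+(?:,\d{3})*)\s*sq\.?\s*ft", re.IGNORECASE)
# Simplified UK postcode shape: outward code, optional space, inward code.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)
TENURE_RE = re.compile(r"\b(share of freehold|freehold|leasehold)\b", re.IGNORECASE)

SQ_FT_TO_SQ_M = 0.092903  # 1 sq ft in square metres


def parse_listing(text):
    """Extract whichever fields the patterns can find; missing fields are omitted."""
    fields = {}
    if m := PRICE_RE.search(text):
        fields["price"] = int(m.group(1).replace(",", ""))
        # A pcm / per month suffix marks a rental; otherwise assume a sale.
        fields["purpose"] = "rent" if m.group(2) else "buy"
    if m := BEDS_RE.search(text):
        fields["bedrooms"] = int(m.group(1))
    if m := AREA_RE.search(text):
        sq_ft = int(m.group(1).replace(",", ""))
        fields["area_sq_ft"] = sq_ft
        fields["area_sq_m"] = round(sq_ft * SQ_FT_TO_SQ_M, 1)
    if m := POSTCODE_RE.search(text):
        # Normalise to upper case with a single space before the inward code.
        fields["postcode"] = f"{m.group(1).upper()} {m.group(2).upper()}"
    if m := TENURE_RE.search(text):
        fields["tenure"] = m.group(1).lower().replace(" ", "_")
    return fields
```

In this sketch, any field missing from the returned dictionary is the kind of gap that would be handed to the LLM extraction step.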
Creating a Scraping Profile
In the dashboard sidebar, click Sources.
Click the Add Profile or Scrape Website button.
Paste the URL of a property listings page — your agency website's "properties for sale" or "properties to let" search results page.
Tip: for example, https://your-agency.co.uk/properties-for-sale/
Choose how often Sift should re-scrape this source:
- Manual — only when you trigger it (all plans)
- Daily — once per day (Scale plan only)
- Weekly — once per week (Scale plan only)
Click Save. The profile appears in your scraping profiles list.
Running a Scrape
Click the Run button on any scraping profile to start a scrape. Sift will:
- Fetch the first page of your listings
- Detect pagination type (URL-based or JavaScript-based)
- Navigate through pages (up to the configured max)
- Extract property data using UK-format recognition, falling back to AI extraction for unusual layouts
- Import unique properties — duplicates are skipped using URL matching and content similarity
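The final step, skipping duplicates by URL matching and content similarity, could look roughly like the sketch below. The `url` and `summary` keys, the 0.9 threshold, and the use of Python's `difflib` are assumptions chosen for illustration; Sift's real matching logic is not documented on this page.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

SIMILARITY_THRESHOLD = 0.9  # assumed cut-off, not a documented Sift value


def normalise_url(url):
    """Ignore scheme, host case, fragment, and trailing slash when comparing URLs."""
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def is_duplicate(candidate, existing):
    """True if the candidate matches an existing listing by URL or by similar text."""
    cand_url = normalise_url(candidate["url"])
    cand_text = candidate["summary"].lower()
    for item in existing:
        if normalise_url(item["url"]) == cand_url:
            return True
        ratio = SequenceMatcher(None, cand_text, item["summary"].lower()).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            return True
    return False


def import_unique(scraped, inventory):
    """Return only the scraped listings that are not already known."""
    added = []
    for listing in scraped:
        if not is_duplicate(listing, inventory + added):
            added.append(listing)
    return added
```

Checking against `inventory + added` means a listing that appears twice within the same scrape is also imported only once.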
Pagination Detection
Sift automatically detects how a website paginates its listings:
| Type | How It Works | Example |
|---|---|---|
| URL-based | Follows ?page=2, ?page=3 or /page/2/ links | Most agency websites |
| JavaScript | Clicks "Next" buttons using browser automation | Single-page apps, React-rendered sites |
| None detected | Only the first page is scraped | Single-page listings |
You don't need to configure pagination — Sift figures it out automatically.
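For readers curious how such detection might work, the sketch below classifies a page using only the Python standard library. It is a simplified assumption of the idea in the table (look for ?page=N or /page/N/ links, otherwise look for a "Next" button), not Sift's implementation; the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches ?page=2, &page=2, or /page/2/ style links.
PAGE_LINK_RE = re.compile(r"(?:[?&]page=|/page/)(\d+)")


class LinkCollector(HTMLParser):
    """Collect <a href> values and note whether a <button> labelled 'Next' exists."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.has_next_button = False
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        elif tag == "button":
            self._in_button = True

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button and data.strip().lower() == "next":
            self.has_next_button = True


def detect_pagination(html, base_url):
    """Return (pagination_type, page_urls) for the first page of listings."""
    collector = LinkCollector()
    collector.feed(html)
    page_urls = sorted(
        {urljoin(base_url, h) for h in collector.hrefs if PAGE_LINK_RE.search(h)},
        key=lambda u: int(PAGE_LINK_RE.search(u).group(1)),
    )
    if page_urls:
        return "url", page_urls
    if collector.has_next_button:
        return "javascript", []  # would need browser automation to click through
    return "none", []
```

A real detector would need to handle many more markup variations; the point here is only the order of checks: URL links first, then a clickable "Next" control, then fall back to a single page.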
Viewing Results
After a scrape completes, click the profile to see:
- Pages fetched — how many pages were scraped
- Properties found — total listings extracted
- New properties — listings that weren't already in your database
- Pagination type — what method was detected
Scrape Frequency
| Frequency | When | Plan Required |
|---|---|---|
| Manual | You click "Run" | All plans |
| Daily | Every 24 hours automatically | Scale |
| Weekly | Every 7 days automatically | Scale |
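The table above amounts to a simple scheduling rule, sketched here for illustration. The intervals and the Scale plan requirement come from the table; the function name, the lowercase frequency and plan strings, and the error handling are assumptions rather than Sift's actual behaviour.

```python
from datetime import datetime, timedelta

# Intervals from the table above; "manual" has no automatic interval.
INTERVALS = {"daily": timedelta(hours=24), "weekly": timedelta(days=7)}
SCHEDULED_PLANS = {"scale"}  # plans allowed to use automatic frequencies


def next_run(frequency, last_run, plan):
    """Return when the next automatic scrape is due, or None for manual profiles."""
    if frequency == "manual":
        return None  # runs only when the user clicks "Run"
    if frequency not in INTERVALS:
        raise ValueError(f"unknown frequency: {frequency}")
    if plan.lower() not in SCHEDULED_PLANS:
        raise ValueError(f"{frequency} scraping requires the Scale plan")
    return last_run + INTERVALS[frequency]
```

For instance, a daily profile on the Scale plan that last ran at 09:00 on 1 January would next be due at 09:00 on 2 January.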
Tips for Better Results
- Use filtered search URLs — scraping a pre-filtered page (e.g., "3-bed houses for sale in Manchester") gives more relevant results than scraping your entire site root
- Start with fewer pages — test with 2–3 pages first to verify extraction quality before scraping 10+ pages
- Listings pages only: use your /properties-for-sale/ or /properties-to-let/ URL, not your homepage
- Check postcodes: if your listings include full postcodes (e.g. SW1A 2AA), Sift extracts and validates them automatically for area-matching queries