Scrap Immo

This repo is a scraper that keeps us up to date with the prices of new houses in our region.

It is made of two components:

  • vm-scrapper: a virtual machine running Firefox with the Greasemonkey extension and a userscript that grabs the full-page HTML of the search result pages
  • scrap-immo: a webserver in Rust that receives the scraped HTML, parses and organizes the data, sends notifications and presents it all as web pages

VM Scrapper

In my setup, I have a homelab that continuously runs a virtual machine. This step is optional: you can, for example, run Firefox on your own local machine instead.

My setup is:

  • homelab "host": Ubuntu Server 24.04
    • guest "vm": Ubuntu 24.04
      • Firefox
        • Greasemonkey
    • scrap-immo server

Install KVM and libvirt on the host:

sudo apt update
sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients virtinst ovmf
sudo systemctl enable --now libvirtd
sudo systemctl status libvirtd

I noticed a problem: libvirtd runs its own dnsmasq, which conflicts with my pihole. To fix this, I configured pihole to listen only on the "public" interface with:

Environment = FTLCONF_dns_listeningMode=BIND
Environment = FTLCONF_dns_interface=enp3s0

Now back to the server config:

# Download ISO
sudo mkdir -p /var/lib/libvirt/isos
cd /var/lib/libvirt/isos
sudo wget https://releases.ubuntu.com/24.04.4/ubuntu-24.04.4-desktop-amd64.iso

# Prepare disk
sudo mkdir -p /var/lib/libvirt/images

# Create VM
sudo virt-install \
  --name scrap-immo \
  --memory 4096 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/scrap-immo.qcow2,size=30,format=qcow2 \
  --os-variant ubuntu24.04 \
  --cdrom /var/lib/libvirt/isos/ubuntu-24.04.4-desktop-amd64.iso \
  --network network=default,model=virtio \
  --graphics vnc,listen=127.0.0.1 \
  --video virtio \
  --cpu host \
  --boot uefi

# On my desktop, create an SSH tunnel
ssh -L 5900:127.0.0.1:5900 sitegui@192.168.1.51
# Then open Remmina and connect via VNC to localhost:5900

# Manually install Firefox

Management:

# Shut down and start like a normal machine
# Good: uses no memory and no CPU while off
# Bad: Firefox must be reopened manually after each start
sudo virsh shutdown scrap-immo
sudo virsh start scrap-immo

# Save the VM state to disk and restore it
# Good: uses no memory and no CPU while off
# Bad: stopping and starting require writing and reading the memory contents to and from disk
sudo virsh managedsave scrap-immo
sudo virsh start scrap-immo

# Pause and resume the VM in memory
# Good: uses no CPU while paused; pausing and resuming are very fast
# Bad: the memory stays allocated
sudo virsh suspend scrap-immo
sudo virsh resume scrap-immo
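
If you want to drive these modes from a program instead of typing them, the three options can be modeled as a small enum. This is only a sketch: `PauseMode`, `stop_command` and `start_command` are hypothetical names that are not part of this repo, and the functions only build the command lines listed above without running them.

```rust
/// The three ways of pausing the VM described above.
/// Hypothetical helper type, not part of the scrap-immo code base.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PauseMode {
    /// Frees memory and CPU; Firefox must be reopened by hand afterwards.
    Shutdown,
    /// Frees memory and CPU; the memory contents go through the disk.
    ManagedSave,
    /// Frees CPU only; the memory stays allocated, resuming is fast.
    Suspend,
}

impl PauseMode {
    /// The `virsh` subcommands used to (stop, start) the VM in this mode.
    pub fn subcommands(self) -> (&'static str, &'static str) {
        match self {
            PauseMode::Shutdown => ("shutdown", "start"),
            PauseMode::ManagedSave => ("managedsave", "start"),
            PauseMode::Suspend => ("suspend", "resume"),
        }
    }

    /// Command line that stops the VM named `vm` (built, not executed).
    pub fn stop_command(self, vm: &str) -> Vec<String> {
        build_command(self.subcommands().0, vm)
    }

    /// Command line that brings the VM named `vm` back (built, not executed).
    pub fn start_command(self, vm: &str) -> Vec<String> {
        build_command(self.subcommands().1, vm)
    }
}

/// Assembles `sudo virsh <subcommand> <vm>` as an argument list.
fn build_command(subcommand: &str, vm: &str) -> Vec<String> {
    ["sudo", "virsh", subcommand, vm]
        .iter()
        .map(|part| part.to_string())
        .collect()
}
```

For example, `PauseMode::Suspend.start_command("scrap-immo")` yields the arguments of `sudo virsh resume scrap-immo`, which could then be handed to `std::process::Command`.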

Preparing Firefox for scraping

  1. add Greasemonkey to Firefox
  2. navigate to Leboncoin in a new tab and run your desired search
  3. add a new script and copy the source from this script
  4. change the variables at the top of the file to match your setup
  5. refresh the page to let the script be injected and start running

Run server

  1. Install Rust
  2. Run cargo run --release
  3. Open http://localhost:8080

TODO: OSRM

# Prepare cartography
wget https://download.geofabrik.de/europe/france/pays-de-la-loire-latest.osm.pbf
podman run -v "${PWD}:/data" docker.io/osrm/osrm-backend osrm-extract -p /opt/bicycle.lua /data/pays-de-la-loire-latest.osm.pbf
podman run -v "${PWD}:/data" docker.io/osrm/osrm-backend osrm-partition /data/pays-de-la-loire-latest.osrm
podman run -v "${PWD}:/data" docker.io/osrm/osrm-backend osrm-customize /data/pays-de-la-loire-latest.osrm

# Run OSRM service
podman run -p 5000:5000 -v "${PWD}:/data" docker.io/osrm/osrm-backend osrm-routed --algorithm mld /data/pays-de-la-loire-latest.osrm
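
Once the service is up, travel times can be requested through OSRM's HTTP `route` endpoint. The sketch below only builds the request URL; it is not taken from the server code, and `Point` and `osrm_route_url` are hypothetical names. It assumes the service listens on the port published above and that the data was extracted with the bicycle profile.

```rust
/// A geographic point in decimal degrees.
/// Hypothetical helper type, not part of the scrap-immo code base.
#[derive(Clone, Copy, Debug)]
pub struct Point {
    pub lat: f64,
    pub lon: f64,
}

/// Builds the URL of an OSRM `route` request between two points.
///
/// OSRM expects every coordinate as `longitude,latitude`, in that order.
/// `overview=false` skips the route geometry, which is not needed when
/// only the duration matters.
pub fn osrm_route_url(base_url: &str, from: Point, to: Point) -> String {
    format!(
        "{}/route/v1/bicycle/{},{};{},{}?overview=false",
        // Tolerate a trailing slash in the configured base URL.
        base_url.trim_end_matches('/'),
        from.lon,
        from.lat,
        to.lon,
        to.lat
    )
}
```

With a point of interest at `Point { lat: 47.2184, lon: -1.5536 }` (roughly Nantes, chosen only for illustration) and a base URL of `http://localhost:5000`, a GET request to the resulting URL returns JSON in which `routes[0].duration` holds the travel time in seconds.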

Develop server locally

cargo install watchexec-cli
watchexec --restart --socket 8080 --interactive --debounce 1s -- cargo run

Deploy

Set a git alias with:

git config alias.deploy '!git push && ssh -4 sitegui@ssh.sitegui.dev ./deploy sitegui/scrap-immo'

Then simply run git deploy to push and deploy.

TODO

  • move rate and hidden into dynamic fields
  • save events into the database