Distributed, Fault-tolerant Web Crawling with RasPi



classic project


Abstract

The project aims to build a distributed web-crawling infrastructure that fetches publication metadata from the APICe online repository, given a set of search keywords.

As far as non-functional properties are concerned, the infrastructure should be:

  • distributed and open, that is, any number of web crawlers may be deployed on any number of networked machines, possibly even at run-time
  • fault-tolerant to disconnections and crashes, that is, both disconnections and crashes should be (i) detected as soon as possible and (ii) properly managed, e.g., by re-assigning the crawling tasks of the disconnected or crashed crawler
  • resource-efficient, that is, the infrastructure should run smoothly on resource-constrained devices, e.g., RasPi systems
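The fault-tolerance and openness requirements can be illustrated with a heartbeat-based failure detector. The following is a minimal sketch under stated assumptions, not a prescribed design: it uses neither TuCSoN nor JADE, and the class `CrawlCoordinator`, its methods, the 3000 ms timeout and the sample keywords are all hypothetical. Crawlers may join at any time by sending a heartbeat; a crawler that stays silent longer than the timeout is presumed disconnected or crashed, and its unfinished keyword tasks are put back in the pending queue so another crawler can pick them up.

```java
import java.util.*;

/**
 * Hypothetical coordinator sketch (not TuCSoN/JADE API): crawlers send
 * heartbeats; a crawler silent for longer than timeoutMs is suspected of
 * having disconnected or crashed, and its unfinished tasks are re-queued.
 */
public class CrawlCoordinator {
    private final long timeoutMs;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Set<String>> assigned = new HashMap<>();
    private final Deque<String> pending = new ArrayDeque<>();

    public CrawlCoordinator(long timeoutMs, Collection<String> keywords) {
        this.timeoutMs = timeoutMs;
        pending.addAll(keywords);
    }

    /** A crawler joins (possibly at run time) or reports that it is still alive. */
    public synchronized void heartbeat(String crawlerId, long nowMs) {
        lastHeartbeat.put(crawlerId, nowMs);
        assigned.putIfAbsent(crawlerId, new HashSet<>());
    }

    /** Hands the next pending keyword to a crawler, or returns null if none is left. */
    public synchronized String nextTask(String crawlerId, long nowMs) {
        heartbeat(crawlerId, nowMs);
        String keyword = pending.pollFirst();
        if (keyword != null) assigned.get(crawlerId).add(keyword);
        return keyword;
    }

    /** Marks a keyword as done so it is never re-assigned. */
    public synchronized void complete(String crawlerId, String keyword) {
        Set<String> tasks = assigned.get(crawlerId);
        if (tasks != null) tasks.remove(keyword);
    }

    /** Suspects silent crawlers and re-queues their unfinished tasks; returns the suspected ids. */
    public synchronized List<String> detectFailures(long nowMs) {
        List<String> suspected = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) suspected.add(e.getKey());
        }
        for (String id : suspected) {
            lastHeartbeat.remove(id);
            pending.addAll(assigned.remove(id));
        }
        return suspected;
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        CrawlCoordinator c = new CrawlCoordinator(3000, List.of("agents", "coordination", "tuple centres"));
        String t1 = c.nextTask("raspi-1", 0);        // "agents"
        String t2 = c.nextTask("raspi-2", 0);        // "coordination"
        c.complete("raspi-1", t1);
        c.heartbeat("raspi-1", 2500);                // raspi-2 stays silent
        System.out.println(c.detectFailures(4000));  // [raspi-2]
        System.out.println(c.pendingCount());        // 2: "tuple centres" plus re-queued "coordination"
    }
}
```

Time is passed in explicitly rather than read from the system clock, which keeps the logic deterministic and easy to test. With a timeout-based detector, a crash and a network disconnection look identical from the coordinator's side, and detection latency is bounded by the timeout plus the interval at which `detectFailures` is invoked, so a shorter timeout detects failures sooner at the cost of more false suspicions on slow RasPi nodes.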

Using either (i) the TuCSoN middleware to coordinate crawlers or (ii) the JADE framework to program crawlers is mandatory; using both is considered a plus.
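TuCSoN coordinates agents through tuple centres, that is, tuple spaces accessed with Linda-style primitives such as `out`, `rd` and `in`, whose behaviour can be programmed in ReSpecT. To give an idea of how crawlers could share work through such a space, here is a toy in-memory tuple space in plain Java. It is only an illustration of the coordination style and deliberately NOT the TuCSoN API (which uses logic tuples and its own agent interface); the class `MiniTupleSpace`, the `"task"`/`"result"` tuple shapes and the thread names are hypothetical.

```java
import java.util.*;

/**
 * Toy in-memory Linda-style tuple space (out / rd / in / inp), for illustration only.
 * NOT the TuCSoN API: TuCSoN uses logic tuples, tuple centres and ReSpecT.
 * Tuples are lists of values; in templates, null matches any value.
 */
public class MiniTupleSpace {
    private final List<List<Object>> tuples = new ArrayList<>();

    /** Inserts a tuple and wakes up any blocked readers. */
    public synchronized void out(Object... fields) {
        tuples.add(Arrays.asList(fields.clone()));
        notifyAll();
    }

    /** Blocks until a matching tuple exists, then removes and returns it. */
    public synchronized List<Object> in(Object... template) throws InterruptedException {
        List<Object> t;
        while ((t = find(template)) == null) wait();
        tuples.remove(t);
        return t;
    }

    /** Blocks until a matching tuple exists, then returns it without removing it. */
    public synchronized List<Object> rd(Object... template) throws InterruptedException {
        List<Object> t;
        while ((t = find(template)) == null) wait();
        return t;
    }

    /** Non-blocking variant of in: returns null when nothing matches. */
    public synchronized List<Object> inp(Object... template) {
        List<Object> t = find(template);
        if (t != null) tuples.remove(t);
        return t;
    }

    /** Returns the first tuple with the same arity whose fields match the template. */
    private List<Object> find(Object[] template) {
        for (List<Object> t : tuples) {
            if (t.size() != template.length) continue;
            boolean match = true;
            for (int i = 0; i < template.length && match; i++) {
                match = template[i] == null || template[i].equals(t.get(i));
            }
            if (match) return t;
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        MiniTupleSpace space = new MiniTupleSpace();
        List<String> keywords = List.of("agents", "coordination", "tuple centres");
        for (String k : keywords) space.out("task", k);

        Runnable crawler = () -> {
            List<Object> task;
            // Take tasks until none is left; a real crawler would fetch metadata here.
            while ((task = space.inp("task", null)) != null) {
                space.out("result", task.get(1), Thread.currentThread().getName());
            }
        };
        Thread a = new Thread(crawler, "raspi-1");
        Thread b = new Thread(crawler, "raspi-2");
        a.start();
        b.start();

        for (int i = 0; i < keywords.size(); i++) {
            List<Object> r = space.in("result", null, null); // blocks until a result arrives
            System.out.println(r.get(1) + " crawled by " + r.get(2));
        }
        a.join();
        b.join();
    }
}
```

Because each `inp` is atomic, every task tuple is consumed by exactly one crawler, and any number of crawlers can be added without changing the coordinator; which crawler handles which keyword varies from run to run.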


Courses / Personal

Course

— a.y.

2015/2016

— credits

9

— cycle

2nd cycle

— language

Italian

Teachers

— professor

Andrea Omicini

Context

— university

Alma Mater Studiorum-Università di Bologna

— campus

Cesena

— department / faculty / school

DISI

— 2nd cycle

8614 Ingegneria e scienze informatiche (Computer Science and Engineering)

URLs & IDs


— course ID

58260

Partita IVA: 01131710376 — Copyright © 2008–2023 APICe@DISI – PRIVACY