Distributed, Fault-tolerant Web Crawling with RasPi

Elisabetta Ramilli • Riccardo Benedetti

abstract

The project aims to build a distributed web crawling infrastructure that fetches publication metadata from the APICe online repository, given a set of search keywords.

As far as non-functional properties are concerned, the infrastructure should be:

distributed and open, that is, any number of web crawlers may be deployed on any number of networked machines, possibly even at run-time
fault-tolerant to disconnections and crashes, that is, both disconnections and crashes should be (i) detected as soon as possible and (ii) properly managed, e.g., by re-assigning the crawling tasks of the disconnected/crashed crawler
resource-efficient, that is, the infrastructure should be able to execute smoothly on resource-constrained devices, e.g., RasPi systems
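
The fault-tolerance requirement above (detect a silent crawler, then re-assign its tasks) can be sketched with a heartbeat timeout. This is only an illustrative sketch in plain Java, not the project's design: the class `CrawlerRegistry`, its methods, the crawler ids, and the 3000 ms timeout are all hypothetical, and no TuCSoN or JADE API is used. Time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical coordinator-side bookkeeping; not part of TuCSoN or JADE. */
class CrawlerRegistry {
    private final long timeoutMillis;
    // Last heartbeat timestamp per crawler id.
    private final Map<String, Long> lastSeen = new HashMap<>();
    // Keywords currently assigned to each crawler.
    private final Map<String, List<String>> assigned = new HashMap<>();
    // Keywords waiting for a crawler.
    private final Queue<String> pending = new ArrayDeque<>();

    CrawlerRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat; an unknown crawler id joins at run-time. */
    void heartbeat(String crawlerId, long now) {
        lastSeen.put(crawlerId, now);
        assigned.computeIfAbsent(crawlerId, id -> new ArrayList<>());
    }

    /** Queues a search keyword to be handed out later. */
    void submit(String keyword) {
        pending.add(keyword);
    }

    /** Hands the next pending keyword to a known crawler, or returns null. */
    String assignNext(String crawlerId) {
        if (!lastSeen.containsKey(crawlerId)) {
            return null;
        }
        String keyword = pending.poll();
        if (keyword != null) {
            assigned.get(crawlerId).add(keyword);
        }
        return keyword;
    }

    /** Removes crawlers silent for longer than the timeout and re-queues their tasks. */
    List<String> sweep(long now) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
                pending.addAll(assigned.remove(entry.getKey()));
                it.remove();
            }
        }
        return dead;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        CrawlerRegistry registry = new CrawlerRegistry(3000);
        registry.submit("coordination");
        registry.submit("tuple spaces");
        registry.heartbeat("pi-1", 0);
        registry.heartbeat("pi-2", 0);
        registry.assignNext("pi-1");
        registry.assignNext("pi-2");
        registry.heartbeat("pi-2", 2500);   // pi-1 stays silent
        System.out.println("crashed: " + registry.sweep(4000));
        System.out.println("reassigned: " + registry.assignNext("pi-2"));
    }
}
```

A shorter timeout detects failures sooner but requires more frequent heartbeats, which competes with the resource-efficiency requirement on RasPi-class devices; the actual trade-off would depend on the chosen middleware.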

Usage of either (i) the TuCSoN middleware for coordinating crawlers or (ii) the JADE framework for programming crawlers is mandatory. Usage of both is considered a plus.

keywords

WebCrawler, MAS, Distributed, Fault-tolerance, Resource-efficient, Raspberry Pi, TuCSoN, JADE, T4J

references

Laboratory lessons on TuCSoN and JADE
TuCSoN4JADE integration library: https://bitbucket.org/smariani/tucson4jade/wiki/Home
JADE
- home: http://jade.tilab.com
- add-ons page: http://jade.tilab.com/download/add-ons/
- 3rd party contributions page: http://jade.tilab.com/download/third-party-contributions/

outcomes

final report