Distributed, Fault-tolerant Web Crawling with RasPi

abstract

The project aims at building a distributed web-crawling infrastructure whose purpose is to fetch publication metadata from the APICe online repository, given a set of search keywords.

As far as non-functional properties are concerned, the infrastructure should be:

  • distributed and open, that is, any number of web crawlers may be deployed on any number of networked machines, possibly even at run-time
  • fault-tolerant to disconnections and crashes, that is, both disconnections and crashes should be (i) detected as soon as possible and (ii) properly managed (e.g., by re-assigning the crawling tasks of the disconnected/crashed crawler)
  • resource-efficient, that is, the infrastructure should run smoothly on resource-constrained devices (e.g., RasPi systems)
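
The fault-tolerance requirement above can be sketched as a lease-with-heartbeat scheme: the master leases a crawling task to a worker, tracks its heartbeats, and re-queues the task when the worker goes silent. All class and method names below (`CrawlScheduler`, `lease`, `reapStale`, etc.) are illustrative assumptions, not part of the TuCSoN or JADE API:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch of crash/disconnection handling: a worker that stops
// sending heartbeats past the timeout is presumed dead and its leased
// crawling task is put back in the pending queue for re-assignment.
class CrawlScheduler {
    private final Queue<String> pending = new ArrayDeque<>();   // keywords to crawl
    private final Map<String, String> leased = new HashMap<>(); // workerId -> task
    private final Map<String, Long> lastBeat = new HashMap<>(); // workerId -> last heartbeat
    private final long timeoutMillis;

    CrawlScheduler(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    void submit(String keyword) { pending.add(keyword); }

    // A worker polls for a task; taking a lease also counts as a heartbeat.
    String lease(String workerId, long now) {
        String task = pending.poll();
        if (task != null) {
            leased.put(workerId, task);
            lastBeat.put(workerId, now);
        }
        return task;
    }

    void heartbeat(String workerId, long now) { lastBeat.put(workerId, now); }

    void complete(String workerId) {
        leased.remove(workerId);
        lastBeat.remove(workerId);
    }

    // Called periodically: re-queue the tasks of workers whose last
    // heartbeat is older than the timeout; returns how many were re-assigned.
    int reapStale(long now) {
        int reassigned = 0;
        for (var it = lastBeat.entrySet().iterator(); it.hasNext(); ) {
            var entry = it.next();
            if (now - entry.getValue() > timeoutMillis) {
                String task = leased.remove(entry.getKey());
                it.remove();
                if (task != null) { pending.add(task); reassigned++; }
            }
        }
        return reassigned;
    }

    int pendingCount() { return pending.size(); }
}
```

In a real deployment the timestamps would come from the clock and the reaper would run on a timer; the sketch keeps time explicit so the detection/re-assignment logic is easy to follow.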

Usage of either (i) the TuCSoN middleware for coordinating crawlers, or (ii) the JADE framework for programming crawlers is mandatory. Usage of both is considered a plus.
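
With TuCSoN, crawlers would coordinate through Linda-style tuple-centre primitives (out/in/rd), which decouple the agent emitting search tasks from the crawlers consuming them. As a rough stand-in for that pattern, not the TuCSoN API itself, the producer/consumer coordination can be sketched with plain `java.util.concurrent` blocking queues:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for two tuple centres: one holding search tasks, one holding
// fetched metadata. A crawler blocks taking a task (like a Linda "in")
// and emits its result (like a Linda "out"); the fetch itself is faked.
class TupleSpaceSketch {
    static final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
    static final BlockingQueue<String> results = new LinkedBlockingQueue<>();

    static Thread spawnCrawler() {
        Thread crawler = new Thread(() -> {
            try {
                while (true) {
                    String keyword = tasks.take();          // blocks until a task appears
                    results.put("metadata-for:" + keyword); // placeholder for a real fetch
                }
            } catch (InterruptedException e) {
                // interrupted: shut the crawler down
            }
        });
        crawler.setDaemon(true);
        crawler.start();
        return crawler;
    }
}
```

Because any number of such crawlers can block on the same task space, this shape also matches the openness requirement: new crawlers can be spawned at run-time and simply start taking tasks.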

keywords
WebCrawler, MAS, Distributed, Fault-tolerance, Resource-efficient, Raspberry Pi, TuCSoN, JADE, T4J
references
outcomes