abstract
The project aims to build a distributed web-crawling infrastructure that fetches publication metadata from the APICe online repository, given a set of search keywords.
As far as non-functional properties are concerned, the infrastructure should be:
- distributed and open, that is, any number of web crawlers may be deployed on any number of networked machines, possibly even at run-time
- fault-tolerant to disconnections and crashes, that is, both disconnections and crashes should be (i) detected as soon as possible and (ii) properly managed, e.g., by re-assigning the crawling tasks of the disconnected/crashed crawler
- resource-efficient, that is, the infrastructure should be able to execute smoothly on resource-constrained devices, e.g., Raspberry Pi systems
Usage of either (i) the TuCSoN middleware for coordinating crawlers or (ii) the JADE framework for programming crawlers is mandatory. Usage of both is considered a plus.
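As an illustration of the fault-tolerance requirement above, the following is a minimal sketch in plain Java of one possible approach: crawlers send periodic heartbeats, and a coordinator treats a crawler as disconnected or crashed once no heartbeat has arrived within a timeout, then re-assigns its keyword tasks to the remaining crawlers. The class `CrawlerRegistry`, its methods, and the timeout policy are hypothetical names chosen for this sketch. It deliberately uses neither the TuCSoN nor the JADE API, so in an actual solution the same bookkeeping would have to be mapped onto tuple-space operations or agent messages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical coordinator-side bookkeeping: heartbeats in, orphaned tasks re-assigned. */
final class CrawlerRegistry {
    private final long timeoutMillis;
    // Insertion-ordered maps keep tie-breaking deterministic.
    private final Map<String, Long> lastSeen = new LinkedHashMap<>();
    private final Map<String, List<String>> assigned = new LinkedHashMap<>();

    CrawlerRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called when a crawler joins (possibly at run-time) or reports that it is alive. */
    synchronized void heartbeat(String crawlerId, long nowMillis) {
        lastSeen.put(crawlerId, nowMillis);
        assigned.computeIfAbsent(crawlerId, id -> new ArrayList<>());
    }

    /** Gives a keyword task to the known crawler with the fewest tasks; returns its id, or null if none. */
    synchronized String assign(String keyword) {
        String best = null;
        for (String id : lastSeen.keySet()) {
            if (best == null || assigned.get(id).size() < assigned.get(best).size()) {
                best = id;
            }
        }
        if (best != null) {
            assigned.get(best).add(keyword);
        }
        return best;
    }

    /**
     * Drops every crawler whose last heartbeat is older than the timeout and
     * re-assigns its tasks. Returns the tasks that could not be re-assigned
     * because no crawler is left.
     */
    synchronized List<String> sweep(long nowMillis) {
        List<String> orphaned = new ArrayList<>();
        lastSeen.entrySet().removeIf(entry -> {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                orphaned.addAll(assigned.remove(entry.getKey()));
                return true;
            }
            return false;
        });
        List<String> unassigned = new ArrayList<>();
        for (String keyword : orphaned) {
            if (assign(keyword) == null) {
                unassigned.add(keyword);
            }
        }
        return unassigned;
    }

    /** Returns a copy of the tasks currently held by the given crawler. */
    synchronized List<String> tasksOf(String crawlerId) {
        return new ArrayList<>(assigned.getOrDefault(crawlerId, Collections.<String>emptyList()));
    }
}
```

In this sketch the timeout trades detection speed against false positives: a shorter timeout detects crashes sooner but may drop a crawler that is merely slow, which matters on resource-constrained devices. Timestamps are passed in explicitly so that the logic can be exercised without real clocks or network traffic.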
keywords
WebCrawler, MAS, Distributed, Fault-tolerance, Resource-efficient, Raspberry Pi, TuCSoN, JADE, T4J
references
outcomes