#XYWYCrawler, crawler in action!

Description: This application collects data from a website (question lists organized by day) that holds more than 100 million records, so several strategies are needed to ensure that all of the data can be crawled in an acceptable amount of time. The strategies are as follows:

Strategies

  • Multithreading
  • Multiprocessing
  • Redis as the task queue
  • RPC to share the message source
  • DBHelper to maintain a connection pool
  • Message consumers running on 4 machines
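
The queue-plus-workers idea behind these strategies can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `InMemoryTaskQueue` is a hypothetical in-process stand-in for the Redis task queue, `fetch_question_list` is a hypothetical placeholder for the real download and parsing step, and the choice of one task per day is an assumption based on the "question list by day" layout described above.

```python
import queue
import threading
from datetime import date, timedelta


class InMemoryTaskQueue:
    """Hypothetical stand-in for a Redis list used as a task queue."""

    def __init__(self):
        self._q = queue.Queue()

    def push(self, task):
        self._q.put(task)

    def pop(self, timeout=0.1):
        # Return the next task, or None if none arrives within `timeout` seconds.
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            return None


def daily_tasks(start, end):
    """Yield one task (an ISO date string) per day in [start, end]."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def fetch_question_list(day):
    """Hypothetical placeholder for fetching and parsing one day's question list."""
    return {"day": day, "questions": []}


def worker(tasks, results, lock):
    # Each thread keeps pulling tasks until the queue is drained.
    while True:
        day = tasks.pop()
        if day is None:
            return
        record = fetch_question_list(day)
        with lock:
            results.append(record)


def crawl(start, end, num_threads=4):
    """Fill the queue with daily tasks, then drain it with worker threads."""
    tasks = InMemoryTaskQueue()
    for task in daily_tasks(start, end):
        tasks.push(task)

    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `crawl(date(2016, 1, 1), date(2016, 1, 10))` processes ten daily tasks across four threads. In a multi-machine setup, the in-memory queue would presumably be replaced by a shared Redis list so that consumers on every machine pull from the same source.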

FAQ

Feel free to contact me at hit_oak_tree@126.com to discuss any questions.