Scraping consists of retrieving information from the ordinary web. Bots are programs or scripts (small pieces of executable code) with a specific purpose. Bots used for scraping are usually called crawlers or even spiders (because they move over the “web”). Crawling and scraping are often used as synonyms.
[Googlebot is the most famous crawler in the world; it caches a very large share of the public web]
Scraping data from the internet and indexing it is part of Big Data and data analytics. For example, it is useful for social sentiment analysis, which is used in investment.
Python is a multi-purpose scripting language, and I use it often. It makes some tasks easy that are quite complex in other programming languages.
Automation is music to my ears. Yesterday I was trying to learn some phrasal verbs and I wanted a list of them. I found a list of thousands of phrasal verbs, each with a link to its description.
How to retrieve the whole list? Easy: once I have the webpage, I can parse the HTML and keep only the verbs. I did it. Once that was done, I went further: my program followed the thousands of links to retrieve their meanings, examples and some notes. Now I have a huge list with all the required information in tabular form, and I can search it offline and filter separately by whether a verb is international, American or British English.
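To give an idea of what "parse the HTML getting only the verbs" can look like, here is a minimal sketch using only the Python standard library. It is not the script I actually wrote. The sample HTML, the example.com URLs and the names LinkCollector and extract_verbs are all made up for illustration, and it assumes every verb on the list page is the text of a plain link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs from every <a href="..."> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []     # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute ones so they can be followed.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text:
                self.links.append((text, self._href))
            self._href = None


def extract_verbs(html, base_url):
    """Return a list of (verb, description URL) pairs found in the page."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical fragment of a list page; the real page layout is an assumption.
SAMPLE = """
<ul>
  <li><a href="/verbs/give-up.html">give up</a></li>
  <li><a href="/verbs/look-after.html">look after</a></li>
</ul>
"""

verbs = extract_verbs(SAMPLE, "https://example.com/list.html")
for verb, url in verbs:
    print(verb, url)
```

On a real page you would probably need to narrow this down to the links inside the list itself, otherwise menus and footers end up in the result too.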
It only took me a few minutes to write the code, and a while longer to retrieve the information from more than 2300 description pages. Now I have it forever in one small file.
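The slow part, visiting every description page and saving the result, could be sketched roughly like this. Again, this is an illustration under assumptions and not my original script: the column names (verb, meaning, variety), the fake pages and the parse_fake helper are invented, and I write CSV here instead of ods because the standard library can do it without extra packages. The pause between requests is there to avoid hammering the server.

```python
import csv
import io
import time
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its text (assumes the page is UTF-8)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(links, parse_detail, fetch_page=fetch, delay=1.0):
    """Visit each (verb, url) pair and build one row (a dict) per verb."""
    rows = []
    for verb, url in links:
        details = parse_detail(fetch_page(url))
        rows.append({"verb": verb, **details})
        time.sleep(delay)  # be polite: wait a little between requests
    return rows


def write_csv(rows, out, fields=("verb", "meaning", "variety")):
    """Write the rows as a table with a header line."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


def filter_by_variety(rows, variety):
    """Keep only the rows for one variety of English."""
    return [row for row in rows if row["variety"] == variety]


# Offline demo with fake pages, so nothing is downloaded here.
FAKE_PAGES = {
    "https://example.com/verbs/give-up.html": "stop trying|international",
    "https://example.com/verbs/ring-up.html": "telephone someone|british",
}


def parse_fake(page):
    """Hypothetical detail parser; a real one would parse the page's HTML."""
    meaning, variety = page.split("|")
    return {"meaning": meaning, "variety": variety}


links = [
    ("give up", "https://example.com/verbs/give-up.html"),
    ("ring up", "https://example.com/verbs/ring-up.html"),
]
rows = crawl(links, parse_fake, fetch_page=FAKE_PAGES.get, delay=0)

buffer = io.StringIO()
write_csv(rows, buffer)
british = filter_by_variety(rows, "british")
```

To write a real file instead of the in-memory buffer, open it with open("phrasal_verbs.csv", "w", newline="", encoding="utf-8") and pass that to write_csv; the file name is just an example.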
- You can download the complete phrasal verb list in ods format.
- You can see the script I wrote.