![]() |
Web scraper robot, illustration. |
Good morning! I hope you're in good shape and healthy. What I'm going to do now with Python is create a simple web scraper targeting this blog (my own blog) to extract the post datetime, title, and article snippet. Let's go, it will be fun!
Designing The Script
The intention is simple, I want to know which posts from "https://myportfolioreview13.blogspot.com/" have been published, along with their datetime and title, as well as a snippet of the article. This way, if any of them interest me, I will know which ones to read.
The scraped data will be saved in JSON file format for easy distribution, and I might want to process that data later in another program. Who knows?
![]() |
Scraped data will be saved in JSON file format. |
ScrapBS4 Class
Using the powerful libraries 'BeautifulSoup' and 'requests', I have created a simple script designed to scrape basic data from a specified URL. This script allows me to efficiently extract information, such as titles, links, and snippets of text, making the data collection process straightforward and effective.
Why should I use a class in my code? Utilizing a class allows me to encapsulate related data and functionality, promoting better organization and reusability of my code. By defining a class, I can create multiple instances of objects that share common properties and methods, which simplifies the development process and enhances maintainability. Moreover, using classes facilitates the implementation of object-oriented programming principles, such as inheritance and polymorphism, making my code more modular and easier to understand.
![]() |
Class of ScrapBS4. |
Looking For Data
A simple function designed to find and extract data from a given HTML link. This function, find_data, leverages the BeautifulSoup library, which is commonly used for web scraping in Python.
![]() |
The function to look/find the data. |
Debugging!!!
Now its time for debugging! The use of the if __name__ == '__main__': construct in Python scripts is a powerful way to control the execution of code. It allows developers to isolate debugging and testing processes while ensuring that the core functionality of their programs remains intact and reusable. In the context of web scraping, this design pattern not only enhances code clarity but also facilitates efficient data extraction from online sources.
![]() |
Debugging code. |
Finally, Resulted Data
As I mentioned earlier, the resulting data will be in JSON file format, structured as follows:
![]() |
The resulted data in JSON file format. |
End...
Thank you very much for visiting my blog! I hope you have a wonderful day, and I look forward to seeing you in the next article.