[Image: Web scraper illustration]
Hello everyone! Welcome to my blog. I hope you are all in good health. Today, I’m excited to introduce a new tool: an image downloader with configurable XPath expressions, which means you can easily adapt it to different web pages.
While it’s a simple program, it has been incredibly helpful whenever I need to save a large number of images from the web. Instead of opening and saving each image by hand, I let the program crawl the page and download the images while I focus on other tasks.
Let me show you how it works. Let's get started!
Program Design
[Image: Program design]
For this simple program, I chose not to use the MVC design pattern. Instead, I kept everything in a single file rather than splitting it into separate modules, which keeps the code simple and saves time on the design process.
Here, you can see the storage class, which contains the default configuration and several methods.
[Image: Storage class]
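The class itself is only shown as a screenshot, so the following is a hypothetical reconstruction of the idea rather than the program's actual code: a `Storage` class that holds default settings, with `get`, `set`, and `output_path` helpers. All key names and default values here are assumptions.

```python
import copy
from pathlib import Path


class Storage:
    """Hypothetical storage class: default settings plus small helper methods."""

    # Assumed defaults; the real program's keys and values may differ.
    DEFAULT_CONFIG = {
        "image_xpath": "//img/@src",
        "output_dir": "downloads",
        "timeout": 30,
    }

    def __init__(self, overrides=None):
        # Copy the defaults so an instance never mutates the class-level dict.
        self.config = copy.deepcopy(self.DEFAULT_CONFIG)
        for key, value in (overrides or {}).items():
            self.set(key, value)

    def get(self, key):
        return self.config[key]

    def set(self, key, value):
        # Only known settings may be changed, which catches typos early.
        if key not in self.config:
            raise KeyError(f"unknown setting: {key}")
        self.config[key] = value

    def output_path(self, filename):
        # Create the output folder on demand and return the file's full path.
        folder = Path(self.config["output_dir"])
        folder.mkdir(parents=True, exist_ok=True)
        return folder / filename
```

Keeping the defaults in one class-level dictionary makes it easy to see every setting the program depends on in a single place.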
JSON Format For XPath Values
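My initial goal when writing the code was simplicity, so the XPath values lived in the code itself. However, I realized that once the program is converted to a binary, it becomes impossible to change those XPath values for different web pages. To solve this, the XPath values are stored in a JSON file that the program reads, so they can be edited without rebuilding the binary.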
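A minimal sketch of that idea, assuming a file named `config.json` placed next to the executable. The file name, key names, and default XPath are hypothetical. The `sys.frozen` attribute is set by bundlers such as PyInstaller, which is how the sketch tells a built binary apart from a plain script run.

```python
import json
import sys
from pathlib import Path

# Illustrative defaults; the key names are assumptions, not the program's own.
DEFAULT_CONFIG = {
    "image_xpath": "//img/@src",
    "output_dir": "downloads",
}
CONFIG_NAME = "config.json"  # hypothetical file name


def config_path():
    """Locate the JSON file next to the binary when frozen, else next to the script."""
    if getattr(sys, "frozen", False):  # set by PyInstaller and similar bundlers
        base = Path(sys.executable).resolve().parent
    else:
        base = Path(__file__).resolve().parent
    return base / CONFIG_NAME


def merge_config(raw_json):
    """Overlay the values from a JSON document onto the defaults."""
    user_values = json.loads(raw_json)
    if not isinstance(user_values, dict):
        raise ValueError("config.json must contain a JSON object")
    config = dict(DEFAULT_CONFIG)
    config.update(user_values)
    return config


def load_config(path=None):
    """Read the config file, creating it from the defaults on first run."""
    path = Path(path) if path else config_path()
    if not path.exists():
        path.write_text(json.dumps(DEFAULT_CONFIG, indent=2), encoding="utf-8")
    return merge_config(path.read_text(encoding="utf-8"))
```

On the first run, `load_config()` writes the defaults to `config.json`; after that, pointing the program at a new site is just a text edit, for example changing `image_xpath` to `//article//img/@src`.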
Using The requests_html Module As The Page Scraper
I could achieve the same result with Beautiful Soup (bs4), but because Beautiful Soup does not support XPath, I would also need lxml's etree module. requests_html is simpler because it evaluates XPath expressions on the page directly, without etree.
[Image: Implementation of requests_html in the code]
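A minimal sketch of the scraping step, assuming requests_html is installed (`pip install requests-html`). The function names `fetch_image_urls` and `normalize_image_urls` and the default XPath `//img/@src` are hypothetical, not taken from the program. Only the URL clean-up is pure Python, so it can be tried without network access.

```python
from urllib.parse import urljoin


def normalize_image_urls(srcs, base_url):
    """Turn raw src values into absolute, de-duplicated URLs (order preserved)."""
    seen = set()
    urls = []
    for src in srcs:
        src = src.strip()
        if not src or src.startswith("data:"):
            continue  # skip empty values and inline base64 images
        absolute = urljoin(base_url, src)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


def fetch_image_urls(page_url, image_xpath="//img/@src"):
    """Fetch a page with requests_html and evaluate the XPath directly on it."""
    from requests_html import HTMLSession  # third-party dependency

    with HTMLSession() as session:
        response = session.get(page_url, timeout=30)
        response.raise_for_status()
        results = response.html.xpath(image_xpath)
    # An XPath ending in /@src yields strings; one selecting <img> yields Elements.
    srcs = [r if isinstance(r, str) else r.attrs.get("src", "") for r in results]
    return normalize_image_urls(srcs, response.url)
```

For the blog post used as an example in this article, calling `fetch_image_urls("https://myportfolioreview13.blogspot.com/2024/10/linux-os-hardening-your-ssh-server.html")` would return the absolute image URLs matched by the XPath. Resolving with `urljoin` matters because many pages use relative `src` paths.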
Program Interface
As I mentioned earlier, I prefer a text-based interface because it is efficient: it lets me focus on the code and the program's functionality instead of getting distracted by interface design, which saves time and effort.
[Image: Program interface]
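A minimal sketch of such a text menu. The menu labels, the `q` quit key, and the function names are hypothetical; `input_fn` and `print_fn` are injectable so the loop can be exercised without a terminal.

```python
def parse_choice(raw, options):
    """Return the option key the user typed, or None if it is not valid."""
    key = raw.strip().lower()
    return key if key in options else None


def run_menu(options, input_fn=input, print_fn=print):
    """Show a text menu until the user enters 'q'.

    options maps a key (e.g. "1") to a (label, callback) pair.
    """
    while True:
        for key, (label, _) in options.items():
            print_fn(f"[{key}] {label}")
        print_fn("[q] Quit")
        raw = input_fn("Choose an option: ")
        if raw.strip().lower() == "q":
            return
        key = parse_choice(raw, options)
        if key is None:
            print_fn("Invalid choice, try again.")
            continue
        options[key][1]()  # run the callback for the chosen option
```

In a real program the callbacks would trigger the scraping and download steps, for example `run_menu({"1": ("Download images", start_download)})`, where `start_download` is a hypothetical function.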
Debugging Time
I believe it took me about half a day to create this program, including research and related tasks. It didn't take long because the program needs no input validation, no MVC structure, and no complex logic.
For example, suppose I want to download all the images from this post: https://myportfolioreview13.blogspot.com/2024/10/linux-os-hardening-your-ssh-server.html
[Image: Download process]
The program then downloads the images automatically, one by one, and saves them to its default output location.
[Image: All images downloaded]
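A sketch of a one-by-one download loop, using only the standard library (`urllib.request`) for the HTTP requests. The `downloads` folder, the numbered file names, and the function names are assumptions for illustration; the program's actual default location is not specified in the post.

```python
import os
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url, index):
    """Build a local file name from the URL path, prefixed with a counter."""
    name = os.path.basename(urlparse(url).path) or "image"
    return f"{index:03d}_{name}"


def http_fetch(url):
    """Default fetcher: download the raw bytes with the standard library."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_all(urls, output_dir="downloads", fetch=http_fetch):
    """Download images one by one into output_dir; return the saved paths."""
    folder = Path(output_dir)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = folder / filename_for(url, index)
        try:
            data = fetch(url)
        except OSError as error:  # URLError and HTTPError subclass OSError
            print(f"[{index}/{len(urls)}] failed: {url} ({error})")
            continue
        target.write_bytes(data)
        saved.append(target)
        print(f"[{index}/{len(urls)}] saved {target.name}")
    return saved
```

Prefixing each file with a counter keeps the page order and avoids overwriting images that share a file name. A failed download is reported and skipped, so one broken link does not stop the whole run.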
The Program Is Converted To Binary For Easy Distribution
By converting it to a binary, I can use it on another VM or device without installing any additional modules, because all the necessary modules are bundled into the binary.
[Image: Program in binary format]
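The post does not say which tool produced the binary. Assuming PyInstaller, a common choice for Python, a one-file build can be scripted through its documented Python entry point `PyInstaller.__main__.run`. The script name `image_downloader.py` and the output name are hypothetical.

```python
# Assumption: PyInstaller is the bundler (the post does not name one).


def pyinstaller_args(script="image_downloader.py", name="image_downloader"):
    """Arguments for a one-file PyInstaller build of a single-file script."""
    return [
        script,
        "--onefile",     # bundle the interpreter and all modules into one executable
        "--name", name,  # name of the produced binary
        "--clean",       # clear PyInstaller's cache before building
    ]


def build():
    # Imported here so this file can be loaded without PyInstaller installed.
    import PyInstaller.__main__

    PyInstaller.__main__.run(pyinstaller_args())
```

Calling `build()` writes the executable to the `dist/` folder. PyInstaller does not cross-compile, so the binary runs only on the operating system and CPU architecture it was built on.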
You can watch the complete demo in the video below:
End...
That's all for now. I'm grateful for this opportunity to share with you. I hope to see you again soon. Good night!