.st0{fill:#FFFFFF;}

Scaping Web Data Using Power BI 

 August 22, 2020

By  Christopher Bird

If you have not heard of web scrapping, it is the term used to extract data from websites. You may have visited a website and they kindly provide data tables for you to download. On other occasions, you may have thought. As you manually copied and pasted the data into your database or spreadsheet there must be a better way.

Warning: Before extracting any data, check you won't be breaking any copyright infringement rules. 

Fortunately, there is an array of software tools available. 

Sequentum (desktop or server) aimed at the enterprise market. Pricing you need to contact Sequentum. I have no experience with this software. The fact they will not provide pricing on their website suggests it is not for everyone. 

Data Miner, a Chrome browser extension. It is free if you scrape below 500 pages a month. I have used this tool and found it to be intuitive and extracts the data required without too much effort. 

Dexio is a suite of software to meet all your extraction transform, load (ETL) needs. Again, frightened to provide pricing and I have no experience of using Dexio. 

UniPath provides robotic (algorithm) automation  process software. An example job candidate submits their resume, a robot automates resume data extraction. Most end-users should be able to use this software. 

Web Scraper is a free browser extension with paid options. To collect web data, it has a point and clicks interface. They call this a scraper. Once you have configured a scraper and ran your scraper you can download the data in CSV format. 

Web scraping R (open source) used by data scientists. R requires coding knowledge. I have used this with Microsoft Power BI and found R to be robust. R objective is to process large data sets. This you just could not do with other software other than Python.

Web scraping in Python (open source). Again a professional data extraction tool mostly used by data scientists.

To use either R or Python you would have to invest time learning how. Without question R and Python are the most robust solutions but not for everyone. 

Web scraping with Power BI, Power Query

As an example, let’s suppose I wanted to extract data from Amazon. I seek books to do with ‘collect combine and transform data.’ I require the book title. Author, the number of reviews, average rating and published date.

Power Query

Open Microsoft Power BI. Click the Get Data icon to reveal the drop-down and select web. 

Web scraping

Paste the URL

https://www.amazon.co.uk/s?k=collect+combine+and+transform+data&i=stripbooks&crid=2O64DMA4JS8G8&sprefix=Collect+Combine+and+%2Caps%2C158&ref=nb_sb_ss_fb_1_20

Amazon Book URL

You should see the next window. Click ‘Add Table Using Examples.’

MS Power BI

You should see the example window below. 

Web scraping

Tip: Maximise the window.

In the top half of the window you should be able to scroll through the list to view the available books.

In the bottom half of the window. On the left you should see a table with one column. Named ‘column 1.’ Double click the ‘column 1.’ Rename the column to ‘Title.’ Below the title row click the empty row. Type the first few letters of the book title. In this case ‘Learn Microsoft PowerApps.’ You should not need to type as many letters. Before the title becomes obvious to you. Click book title to add to the column. Add the next book title. On rare occasions you may need to add another book title. To have the column fully populate with webpage book titles.

MS Power Query table

On the right next to ‘Title’ column is a thin column with an asterisk. Click to add another column. Name the column ‘Author.’ Repeat the previous steps you did to add book title but this time author name. Repeat the same steps to add ‘Average Rating’ and ‘Release Date.’ 

You should find the table rows self-populate. 

Once you completed click ‘OK.’ 

data mining

Power BI should open with the table populated. If you do not see the table. On the left hand of the screen you should see three icons. Click the table icon see below. 

Data Mining

You can now manipulate the data to suit your needs see below. 

Conclusion

Power Query offers you a lot of flexibility and for many Power Query is all they will ever need. 


__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"ee969":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"ee969":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"ee969":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"ee969":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"ee969":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"ee969":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article

Christopher Bird


Building your own Power App, BI solution, or automated workflow can be a mind-blowing experience. It can also be a nightmare. Particularly when you begin with a blank screen. My advice, get professional help as and when you need it. That's what successful people do.

Christopher Bird

related posts:



{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

Get in touch

>