Python Project for Data Engineer
The aim of this project is to extract data from a website and transform it to fit your requirements. There are several ways to get data from a website, depending on what you need: you can scrape the pages with a library such as Beautiful Soup, or you can call an API that returns the data directly.
In this module, you will have a chance to use a currency exchange API. You will extract the data as JSON and then use pandas to transform it so that the exchange rates are expressed relative to British pounds instead of US dollars.
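To make the transformation concrete, here is a minimal sketch of re-basing rates from US dollars to British pounds with pandas. The payload shape (a "base" key and a "rates" mapping), the rate values, and the helper name rebase_rates are all assumptions made for illustration; the real API response may be structured differently.

```python
import pandas as pd

# Hypothetical API payload: the structure and the rate values are made up
# for illustration and are not taken from any real exchange-rate API.
sample_response = {
    "base": "USD",
    "rates": {"USD": 1.0, "GBP": 0.8, "EUR": 0.9, "JPY": 150.0},
}


def rebase_rates(payload: dict, new_base: str) -> pd.DataFrame:
    """Return the rates re-expressed relative to new_base (hypothetical helper)."""
    rates = pd.Series(payload["rates"], name="rate", dtype="float64")
    # Dividing every rate by the new base's rate makes the new base equal to 1.
    rebased = rates / rates[new_base]
    return rebased.rename_axis("currency").to_frame()


gbp_rates = rebase_rates(sample_response, "GBP")
print(gbp_rates)
```

Dividing by the GBP rate works because every rate in the payload is quoted against the same original base, so the ratio of two rates cancels that base out.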
You will also get a chance to use the Beautiful Soup and requests libraries to scrape a Wikipedia page.
Implementing web scraping
In this section, you will use the pandas, bs4 and requests libraries to get information from a Wikipedia page that lists the largest banks by market capitalization in a table. You will extract the table and then use pandas to perform any necessary transformations. Let us go through the steps involved.
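As a rough preview of the approach, the sketch below parses an HTML table with Beautiful Soup and loads it into a pandas DataFrame. To keep it runnable offline, it uses a small inline HTML string instead of the real page; the bank names, the figures, the column headers, and the helper name table_to_frame are placeholders I made up. In the actual lab, the HTML would come from something like requests.get(url).text.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the downloaded page. The rows and headers are invented
# placeholders, not data from the Wikipedia article.
html = """
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td>Example Bank A</td><td>100.5</td></tr>
  <tr><td>2</td><td>Example Bank B</td><td>90.25</td></tr>
</table>
"""


def table_to_frame(page_html: str) -> pd.DataFrame:
    """Convert the first <table> in the HTML into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(page_html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column headers; the rest hold the data cells.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=headers)


banks = table_to_frame(html)
# Scraped cells arrive as strings, so numeric columns need converting.
banks["Market cap (US$ billion)"] = banks["Market cap (US$ billion)"].astype(float)
print(banks)
```

A real Wikipedia page usually contains several tables, so you would likely need to pick the right one (for example by position or by a class attribute) rather than taking the first.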
Step 1
Installing the required libraries. If you are performing this lab in an IBM lab environment, these libraries will be pre-installed. I also wanted to try this on my personal laptop, so, in order to have most of the libraries that a data engineer uses, I installed Anaconda to make my life easier. Alternatively, you could install the individual libraries with pip.
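If you are not sure whether your environment already has everything, a quick check like the one below can tell you. This is only a sketch: the list of libraries is my assumption based on the ones named in this post. Note that the bs4 module is distributed on PyPI as beautifulsoup4.

```python
from importlib.util import find_spec

# Import name -> name of the package to install with pip.
required = {
    "pandas": "pandas",
    "bs4": "beautifulsoup4",
    "requests": "requests",
}

# find_spec returns None when a top-level module cannot be found.
missing = [pip_name for module, pip_name in required.items() if find_spec(module) is None]

if missing:
    print("Missing libraries. Install them with: pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```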