Python Project for Data Engineer

Module Overview

Krishna
4 min read · Apr 30, 2022

The aim is to extract data from a website and transform it to suit your requirements. There are several ways to get data from a website: you can scrape pages with a library such as Beautiful Soup, or you can call an API and get the data directly.

In this module, you will have a chance to use a currency exchange API. You will extract the data as JSON, then transform it with pandas so that the exchange rates are expressed relative to British Pounds instead of US Dollars.
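To make the rebasing idea concrete, here is a minimal sketch of the transformation step. It assumes the API returns JSON shaped like `{"base": "USD", "rates": {...}}`; that shape, the sample numbers, and the helper name `rebase_rates` are all illustrative assumptions, not the actual API response or the lab's code.

```python
import pandas as pd

# Hypothetical payload: the real API response may be shaped differently,
# and these rates are made-up round numbers, not real market data.
sample_payload = {
    "base": "USD",
    "rates": {"USD": 1.0, "GBP": 0.8, "EUR": 0.9, "INR": 80.0},
}


def rebase_rates(payload, new_base="GBP"):
    """Return a DataFrame of rates expressed relative to new_base."""
    rates = pd.Series(payload["rates"], name="rate", dtype="float64")
    # Dividing every rate by the new base's rate changes the reference
    # currency: the new base becomes 1.0 and the others scale with it.
    rebased = rates / rates[new_base]
    return rebased.to_frame()


gbp_rates = rebase_rates(sample_payload)
print(gbp_rates)
```

In the real workflow, the payload would come from something like `requests.get(url).json()` rather than a hard-coded dictionary.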

You will also get a chance to use the Beautiful Soup and requests libraries to perform web scraping on a Wikipedia page.

Implementing web scraping

In this section, you will use the pandas, bs4 and requests libraries to get information from a Wikipedia page that lists the largest banks by market capitalization in a table. You will extract the table and then use pandas to perform any necessary transformations. Let us see the steps involved.
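As a rough preview of where these steps lead, the sketch below parses an HTML table with Beautiful Soup and loads it into a pandas DataFrame. To keep it runnable offline, it works on a small inline HTML string; the column names, the placeholder banks, and the helper name `table_to_dataframe` are assumptions for illustration and do not reflect the real Wikipedia table.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder HTML standing in for the downloaded page. In the real workflow
# this text would come from requests.get(url).text. The values are invented.
sample_html = """
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td>Bank A</td><td>100.5</td></tr>
  <tr><td>2</td><td>Bank B</td><td>90.25</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Parse the first <table> in html into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column headers.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip rows that do not match the header width (e.g. spacer rows).
        if len(cells) == len(headers):
            records.append(cells)
    return pd.DataFrame(records, columns=headers)


banks = table_to_dataframe(sample_html)
# Scraped cells are text, so convert the numeric columns explicitly.
banks["Rank"] = banks["Rank"].astype(int)
banks["Market cap (US$ billion)"] = banks["Market cap (US$ billion)"].astype(float)
print(banks)
```

A real page usually contains several tables and extra markup such as footnote links, so the table selection and cell cleaning would likely need adjusting.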

Step 1

Install the required libraries. If you perform this lab in an IBM lab environment, these libraries will be pre-installed. I also wanted to try this on my personal laptop, so I installed Anaconda, which bundles most of the libraries a Data Engineer uses. Alternatively, you could pip install the individual libraries.
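If you are unsure whether your own machine already has everything, a quick check along these lines may help. It is only a sketch: the helper name `missing_libraries` is hypothetical, and it assumes the three libraries named in this section. Note that Beautiful Soup is imported as `bs4` but is published on PyPI as `beautifulsoup4`.

```python
import importlib.util


def missing_libraries(import_names):
    """Return the import names that cannot be found in this environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Import names of the libraries used in this section.
required = ["pandas", "bs4", "requests"]
missing = missing_libraries(required)

if missing:
    print("Missing:", ", ".join(missing))
    print("Install with pip, e.g.: pip install pandas beautifulsoup4 requests")
else:
    print("All required libraries are available.")
```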
