Web Scraping with Python (Pandas and Matplotlib)

Best companies in Germany – Revenue & Market Cap

Overview

Data Source
: Wikipedia page listing major German companies by revenue and Global Rank lists for Market Cap.
ETL Process: Extraction of company data, followed by transformation for cleaning and formatting, particularly handling special characters and financial figures.

Features

Data Cleaning: Conversion of string-formatted financial data into numeric types, handling negative values and special characters.
Data Visualization: Graphical representation of key data points, like profits, with features such as bar graphs displaying top companies.


I used Jupyter Notebooks that serves as a practical tool for data extraction and visualization, demonstrating essential Python techniques for data analysis.

Because I wanted to keep this Jupyter notebook format, the content is not adapted to mobile views.

Revenue


The revenue number is the income a company generates before any expenses are taken out. Therefore, when a company has top-line growth, the company is experiencing an increase in gross sales or revenue.

Both revenue and net income are useful in determining the financial strength of a company, but they are not interchangeable. Revenue only indicates how effective a company is at generating sales and revenue and does not take into consideration operating efficiencies which could have a dramatic impact on the bottom line. Reference

Market Cap


Market cap—or market capitalization—refers to the total value of all a company’s shares of stock. It is calculated by multiplying the price of a stock by its total number of outstanding shares. For example, a company with 20 million shares selling at $50 a share would have a market cap of $1 billion. Reference

SQL

Data Cleansing & Transformation


This SQL query is designed to retrieve and present specific financial information about German companies from two tables in your database. Here’s a breakdown of what each part of the query does:

  1. SELECT Clause: This is the part of the query that specifies the columns to be displayed in the final result. The columns included are:
    • t1.Name: The name of the company.
    • CAST(t2.revenue_eur AS FLOAT) AS Revenue_EUR: The revenue in euros, converted to a float type.
    • CAST(t1.market_cap_eur AS FLOAT) AS Market_Cap_EUR: The market capitalization in euros, converted to a float type.
    • CAST(t2.revenue_usd AS FLOAT) AS Revenue_USD: The revenue in USD, converted to a float type.
    • CAST(t1.market_cap_usd AS FLOAT) AS Market_Cap_USD: The market capitalization in USD, converted to a float type.
    • ROUND(CAST(t2.revenue_eur AS FLOAT) / NULLIF(CAST(t1.market_cap_eur AS FLOAT), 0), 2) AS RATIO_EUR: A calculated ratio of revenue to market cap in euros, rounded to two decimal places. The NULLIF function is used to avoid division by zero.
    • ROUND(CAST(t2.revenue_usd AS FLOAT) / NULLIF(CAST(t1.market_cap_usd AS FLOAT), 0), 2) AS RATIO_USD: A similar ratio as above, but for USD.
    • t2.Headquarters AS HQ: The location of the company’s headquarters.
    • t2.Industry: The industry in which the company operates.
  2. FROM Clause: This specifies the source tables of the data.
    • [filipedb].[dbo].[companies_marketcap] AS t1: Refers to a table containing market capitalization data, aliased as t1.
    • [filipedb].[dbo].[companies_revenue] AS t2: Refers to a table containing revenue data, aliased as t2.
  3. JOIN Clause: This part of the query combines the two tables t1 and t2 based on a common column.
    • ON t1.NAME = t2.NAME: This condition specifies that the join should be made where the NAME column in table t1 matches the NAME column in table t2. Essentially, it pairs the market cap and revenue data for the same company.

In summary, this query joins two tables, companies_marketcap and companies_revenue, based on the company name.

It then selects and processes certain financial data, including revenue and market cap in both EUR and USD, along with calculated ratio, and information about the headquarters and industry of each company. The result is a comprehensive dataset that combines your web-scraped data into a meaningful format for analysis.

SELECT 
    t1.Name,
    CAST(t2.revenue_eur AS FLOAT) AS Revenue_EUR,
    CAST(t1.market_cap_eur AS FLOAT) AS Market_Cap_EUR,
    CAST(t2.revenue_usd AS FLOAT) AS Revenue_USD,
    CAST(t1.market_cap_usd AS FLOAT) AS Market_Cap_USD,
    ROUND(CAST(t2.revenue_eur AS FLOAT) / NULLIF(CAST(t1.market_cap_eur AS FLOAT), 0), 2) AS RATIO,
    t2.Headquarters AS HQ,
    t2.Industry
FROM 
    [filipedb].[dbo].[companies_marketcap] AS t1
    JOIN [filipedb].[dbo].[companies_revenue] AS t2 ON t1.NAME = t2.NAME
ORDER BY RATIO DESC;

Result Ordered by Ratio