Data,
read carefully.

I'm Shoug Aljedani — a data analyst who turns messy tables and noisy logs into clear answers. I work in Python, SQL, Tableau, and Power BI.

Toolkit

Languages · Analysis · Visual

Languages 01

  • Python
  • SQL (Postgres)

Analysis 02

  • pandas
  • NumPy
  • scikit-learn
  • SciPy

Visual 03

  • Tableau
  • Power BI
  • Matplotlib
  • Seaborn

Also 04

  • Excel
  • NLTK
  • Jupyter
  • Git

Projects

Five selected case studies
01 · Python · pandas

US Bikeshare Explorer

Python · pandas · NumPy

Problem

Bikeshare data for Chicago, New York, and Washington lived in three separate CSVs and answered no question on its own. The brief: build an interactive tool that lets a non-technical user pick a city, a month, and a day, and get back the patterns hidden in millions of rides.

Approach

I built a Python program around pandas. It parses each city's CSV, converts Start Time into datetime, and engineers month, weekday, and hour columns up front so every later query is one filter away. The user is walked through three prompts; the program then computes most-common travel times, top stations and station pairs, total and mean trip duration, and rider-type breakdowns. A paged-display loop lets the user inspect raw rows five at a time.
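The loading step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it takes a file-like source rather than a city name, the inline CSV rows are made up, and only the Start Time column name comes from the write-up.

```python
import io

import pandas as pd

# Tiny inline stand-in for one city's CSV (hypothetical rows, not real ride data).
SAMPLE_CSV = """Start Time,Trip Duration,Start Station,End Station
2017-01-02 08:05:00,600,A,B
2017-01-07 14:30:00,1200,B,C
2017-06-05 17:10:00,300,A,B
2017-06-10 09:00:00,900,C,A
"""


def load_data(source, month="all", day="all"):
    """Read a CSV, build time features up front, then apply optional filters."""
    df = pd.read_csv(source)
    df["Start Time"] = pd.to_datetime(df["Start Time"])

    # Engineer the columns once so later queries are a single filter away.
    df["month"] = df["Start Time"].dt.month_name()
    df["weekday"] = df["Start Time"].dt.day_name()
    df["hour"] = df["Start Time"].dt.hour

    if month != "all":
        df = df[df["month"] == month.title()]
    if day != "all":
        df = df[df["weekday"] == day.title()]
    return df


mondays = load_data(io.StringIO(SAMPLE_CSV), day="monday")
most_common_hour = load_data(io.StringIO(SAMPLE_CSV))["hour"].mode()
```

Because the time features exist before any filtering, a statistic such as the most common start hour reduces to a single call on one column.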

Code structure

def get_filters(): ...                  # prompt the user for city, month, day
def load_data(city, month, day): ...    # read CSV, build time features, apply filters
def time_stats(df): ...                 # most common month / day / hour
def station_stats(df): ...              # top start, end, and trip pairs
def trip_duration_stats(df): ...        # total + mean travel time
def user_stats(df): ...                 # user type / gender / birth-year breakdown
def display_data(df): ...               # paged raw-row preview, 5 rows at a time
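The paged preview is the least obvious piece of that outline, so here is one way it could work. This is a sketch only: iter_pages is a hypothetical helper name, and the demo frame is made up. An interactive version would presumably ask the user whether to continue between pages.

```python
import pandas as pd


def iter_pages(df, page_size=5):
    """Yield consecutive slices of df, page_size rows at a time."""
    for start in range(0, len(df), page_size):
        yield df.iloc[start:start + page_size]


# Hypothetical 12-row frame standing in for the filtered ride data.
demo = pd.DataFrame({"ride_id": range(12)})
page_lengths = [len(page) for page in iter_pages(demo)]
```

Keeping the slicing in a generator separates it from the prompt loop, so the paging logic can be checked without simulating keyboard input.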

Insight

Usage shape varies sharply by city. The two larger systems show clean commuter peaks at 8am and 5pm on weekdays; Washington's hour distribution is flatter and weekend-heavy, consistent with tourist use. The most common station pairs cluster around financial districts on weekdays and around parks on weekends — the same bikes serve two different cities depending on the day.

02 · SQL · Postgres

DVD Rental Database Investigation

SQL · Window Functions · PostgreSQL · Reporting

Problem

Two business questions from a fictional video-rental chain: what are families actually renting, and how do the two stores compare month-on-month? The data lived across six normalised tables — rentals, inventory, films, categories, staff, stores.

Approach

I wrote a series of PostgreSQL queries joining the six tables and filtering to six family-friendly categories (Animation, Children, Classics, Comedy, Family, Music). An NTILE(4) window function bucketed each film's rental duration into quartiles, then an outer query rolled the per-film result up into a category-by-quartile count table. A separate query used DATE_PART to compare per-store rentals across months in summer 2005.
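The quartile roll-up can be sketched in runnable form. This is an illustration, not the original queries: it swaps PostgreSQL for Python's built-in sqlite3 module (NTILE needs a bundled SQLite of 3.25 or newer), uses a single hypothetical film table in place of the six joined tables, and runs on a handful of made-up rows. The title tie-breaker in ORDER BY is an added assumption that keeps bucket assignment deterministic.

```python
import sqlite3

# Toy rows (hypothetical): (title, category, rental_duration in days).
FILMS = [
    ("A", "Animation", 3), ("B", "Animation", 7),
    ("C", "Family", 5), ("D", "Family", 7),
    ("E", "Music", 3), ("F", "Music", 4),
    ("G", "Comedy", 6), ("H", "Comedy", 5),
]

QUERY = """
WITH per_film AS (
    SELECT title,
           category,
           NTILE(4) OVER (ORDER BY rental_duration, title) AS quartile
    FROM film
)
SELECT category, quartile, COUNT(*) AS films
FROM per_film
GROUP BY category, quartile
ORDER BY category, quartile
"""


def category_quartile_counts(rows):
    """Bucket films into rental-duration quartiles, then count per category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE film (title TEXT, category TEXT, rental_duration INTEGER)"
    )
    conn.executemany("INSERT INTO film VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


counts = category_quartile_counts(FILMS)
```

The inner step assigns one quartile per film; the outer step only counts, which is why the per-film result has to exist before the category roll-up.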

Insight

Animation led family rentals at 1,166 rentals; Music was the lowest at 830. Family and Animation both skewed toward the longest rental-duration quartile (Q4): these are the films households keep for a week. Between the two stores, the difference in any given month was tiny (≈25–98 rentals), but both stores dropped roughly 900 rentals from July to August, a seasonal signal worth investigating rather than a store-quality difference.

03 · Tableau

US Flight Delays, Mapped

Tableau · Geospatial · Dashboarding · Accessibility

Problem

One year of US domestic flight data (2015) — thousands of routes, hundreds of airports, every airline. The viewer wants to know, in under a minute, where delays cluster, which carriers cancel the most, and when.

Approach

I built a three-view Tableau dashboard. The first is a US map where each state is a circle: size encodes departure delay, colour encodes arrival delay, so the eye gets both dimensions at once. The second is a horizontal bar chart of arrival vs. departure delay per airline. The third splits cancellations into a pie by carrier and a month-by-month bar — the same data, two granularities. The palette is drawn from a colour-blind-safe ramp so the encoding survives a deuteranopic viewer.

Insight

Three carriers — Southwest, Atlantic Southeast, and American Eagle — account for the bulk of cancellations. Cancellations are sharply seasonal: February alone had 1,058 cancelled flights, more than any summer month. Departure and arrival delays correlate strongly at the state level but diverge for a handful of southern airports where ground delays don't translate into late arrivals.

View on Tableau Public
04 · Power BI

Imports & Exports Dashboard

Power BI · Bilingual · Executive Reporting

Problem

A trade business needed one screen showing where their imports come from, what they sell, how revenue moves year-on-year, and how much of each invoice is still outstanding. The deck had to read natively in Arabic and be usable by executives who don't write SQL.

Approach

I built the dashboard in Power BI with cross-filtering wired between every visual: a country pie chart, a bar of top countries by net revenue, an annual trend line, and a 100%-stacked bar splitting paid vs. outstanding amounts by category. A detail table breaks revenue down by product type. The whole deck is right-to-left, Arabic-typeset, with sliceable category buttons across the top.
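The arithmetic behind the 100%-stacked bar is simple enough to show outside Power BI. The sketch below uses pandas rather than the dashboard's own measures, and the invoice amounts and column names are hypothetical, chosen only to make the percentages easy to check by hand.

```python
import pandas as pd

# Hypothetical invoice amounts, not the client's figures.
invoices = pd.DataFrame({
    "category": ["Home appliances", "Home appliances",
                 "Office supplies", "Office supplies"],
    "paid": [80.0, 40.0, 30.0, 30.0],
    "outstanding": [20.0, 20.0, 10.0, 30.0],
})

# Sum per category, then divide each row by its own total so every bar
# adds up to 100%.
totals = invoices.groupby("category")[["paid", "outstanding"]].sum()
shares = totals.div(totals.sum(axis=1), axis=0) * 100
```

Normalising per category is what makes the bars comparable: a small category with a high outstanding share stands out just as clearly as a large one.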

Insight

Total revenue across the period was ~SAR 320M spread over nine export destinations. China led on both volume and revenue. Home appliances had the cleanest payment cycle — only 33% outstanding versus 36% for office supplies. Annual revenue peaked in 2017 and dropped sharply in 2018; that one fact is the single most important thing on the dashboard, and it's where the eye lands first by design.

05 · scikit-learn

Predicting Sales from Ad Spend

Machine Learning · Linear Regression · scikit-learn

Problem

A 200-observation dataset of TV, radio, and newspaper ad budgets paired with weekly sales. The question was simple but the kind that decides budgets: which channel actually moves the needle per dollar?

Approach

I split the data 75/25 with train_test_split, fit a multiple linear regression on the three ad channels, and evaluated on the held-out set with RMSE. I read the coefficients in their original units, not standardised — because in this case the units are dollars and the audience cares about them.
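The workflow above can be sketched end to end. The data here is synthetic: the budgets, the "true" coefficients, and the noise level are assumptions made up for the demo, so the fitted numbers will not match the results reported for the real dataset. The random_state values are also illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 200-row ad dataset (made-up values throughout).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),  # TV budget
    rng.uniform(0, 50, n),   # radio budget
    rng.uniform(0, 100, n),  # newspaper budget
])
true_coefs = np.array([0.05, 0.20, 0.00])  # assumed for the demo only
y = 3.0 + X @ true_coefs + rng.normal(0, 1.0, n)

# 75/25 split, fit on the training part, score on the held-out part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

# Coefficients stay in the original units: change in sales per unit of budget.
coefs = dict(zip(["TV", "radio", "newspaper"], model.coef_))
```

Leaving the features unscaled means each coefficient reads directly as sales per unit of spend, which is the comparison the budget question needs.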

Insight

The model fit cleanly (RMSE ≈ 1.40 on sales values ranging roughly 3–25). The coefficient on radio (0.179) was roughly 60× the TV coefficient (0.003) and roughly 4× newspaper (0.047). Read plainly: an extra dollar of radio budget is associated with far more incremental sales than the same dollar elsewhere, at least within the range observed. Worth A/B-testing in a real campaign before reallocating.