How to handle missing values in a dataset using pandas.
Data Before Processing
Name       Age  Salary  Join_Date
Alice      25   50000   2023-01-01
(missing)  NaN  60000   (missing)
Charlie    35   NaN     2021-07-30
David      -1   45000   2020-12-20
Eve        NaN  70000   (missing)
Code
import pandas as pd
import numpy as np

# Sample data
data = {
    "Name": ["Alice", None, "Charlie", "David", "Eve"],
    "Age": [25, None, 35, -1, None],
    "Salary": [50000, 60000, None, 45000, 70000],
    "Join_Date": ["2023-01-01", None, "2021-07-30", "2020-12-20", None],
}
df = pd.DataFrame(data)

# Handling missing values

## 1. Fill missing values in strings with "Unknown"
df["Name"] = df["Name"].fillna("Unknown")

## 2. Fill missing values in Age with the mean (treat -1 as missing)
df["Age"] = df["Age"].replace(-1, np.nan)
df["Age"] = df["Age"].fillna(df["Age"].mean())

## 3. Fill missing values in Salary with the median
df["Salary"] = df["Salary"].fillna(df["Salary"].median())

## 4. Fill missing dates with a specific default date
df["Join_Date"] = pd.to_datetime(df["Join_Date"])  # Convert to datetime
df["Join_Date"] = df["Join_Date"].fillna(pd.Timestamp("2022-01-01"))

# Display the result
print(df)
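
To confirm that the four steps leave no gaps, it can help to count missing values per column before and after cleaning. The following is a minimal sketch rather than part of the original example: the helper name `clean` is hypothetical, and it repeats the same fill steps on a copy of the same sample data.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the four fill steps to a copy of df."""
    out = df.copy()
    # Strings: replace missing names with a placeholder
    out["Name"] = out["Name"].fillna("Unknown")
    # Age: treat the sentinel -1 as missing, then fill with the mean
    out["Age"] = out["Age"].replace(-1, np.nan)
    out["Age"] = out["Age"].fillna(out["Age"].mean())
    # Salary: fill with the median
    out["Salary"] = out["Salary"].fillna(out["Salary"].median())
    # Dates: parse to datetime, then fill with a default date
    out["Join_Date"] = pd.to_datetime(out["Join_Date"])
    out["Join_Date"] = out["Join_Date"].fillna(pd.Timestamp("2022-01-01"))
    return out


raw = pd.DataFrame(
    {
        "Name": ["Alice", None, "Charlie", "David", "Eve"],
        "Age": [25, None, 35, -1, None],
        "Salary": [50000, 60000, None, 45000, 70000],
        "Join_Date": ["2023-01-01", None, "2021-07-30", "2020-12-20", None],
    }
)

# Count missing values per column before and after cleaning
missing_before = raw.isna().sum()
cleaned = clean(raw)
missing_after = cleaned.isna().sum()

print(missing_before)
print(missing_after)
```

Note that `isna()` does not count the sentinel `-1` in Age as missing, so that value only becomes a gap once it has been replaced with `np.nan`. After the replacement, the mean is taken over the two remaining valid ages (25 and 35), which is why David's age is filled with the same value as the originally missing ones.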