Count the unique entries in a column of df.

len(df['column_name'].unique())
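One detail worth checking is how missing values are counted. The sketch below uses a small made-up DataFrame (the data and the name `column_name` are placeholders, not from a real dataset) to compare `unique()` with `nunique()`:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration; 'column_name' is a placeholder.
df = pd.DataFrame({'column_name': ['a', 'b', 'a', np.nan]})

# unique() keeps NaN as its own entry, so NaN is included in this count.
n_with_nan = len(df['column_name'].unique())

# nunique() skips NaN by default (dropna=True).
n_without_nan = df['column_name'].nunique()

print(n_with_nan, n_without_nan)
```

If NaN should not count as a distinct entry, `nunique()` is probably the more convenient choice.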

Select rows where a column has a specific value

df[df['column_name'] == 'name']
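The comparison inside the brackets builds a boolean mask, and indexing with that mask keeps only the matching rows. A minimal sketch on made-up data (the `value` column and the row contents are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data; only 'column_name' appears in the snippet above.
df = pd.DataFrame({'column_name': ['name', 'other', 'name'],
                   'value': [1, 2, 3]})

# The comparison yields a boolean Series, one True/False per row.
mask = df['column_name'] == 'name'

# Indexing with the mask keeps the rows where it is True.
matches = df[mask]

# isin() does the same for several candidate values at once.
matches_any = df[df['column_name'].isin(['name', 'other'])]

print(matches)
```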

Count the total number of missing (NaN) values in df.

df.isnull().sum().sum()
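The first `sum()` counts missing values per column and the second adds those counts together. A small sketch with made-up columns `a` and `b` shows both steps:

```python
import numpy as np
import pandas as pd

# Hypothetical data with three missing cells in total.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 'x']})

# isnull() marks each missing cell as True; sum() counts them per column.
per_column = df.isnull().sum()

# Summing the per-column counts gives the total for the whole DataFrame.
total = int(per_column.sum())

print(per_column)
print(total)
```

The per-column counts are often more useful than the total when deciding which columns need cleaning.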

Concatenate two arrays

res_3 = np.concatenate((res_1, res_2))
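Note that `np.concatenate` takes the arrays as a single tuple (or list) and joins them along an existing axis. A short sketch with made-up values, including a 2-D case where the axis matters:

```python
import numpy as np

# Hypothetical 1-D inputs.
res_1 = np.array([1, 2, 3])
res_2 = np.array([4, 5])

# 1-D arrays are joined end to end.
res_3 = np.concatenate((res_1, res_2))

# For 2-D arrays, axis=0 stacks rows; the column counts must match.
top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])
stacked = np.concatenate((top, bottom), axis=0)

print(res_3)
print(stacked.shape)
```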

Correct misspelled names

df_cars['name'] = df_cars['name'].str.replace('chevroelt|chevrolet|chevy', 'chevrolet', regex=True)
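Since pandas 2.0, `str.replace` treats the pattern as a literal string unless `regex=True` is passed, so an alternation like this needs the flag. The sketch below is a slight variation on made-up names: it adds word boundaries (an assumption on my part, not part of the snippet above) so the pattern cannot match inside a longer word:

```python
import pandas as pd

# Hypothetical car names for illustration.
names = pd.Series(['chevroelt malibu', 'chevy s-10',
                   'chevrolet impala', 'ford pinto'])

# regex=True enables the '|' alternation; \b limits matches to whole words.
fixed = names.str.replace(r'\b(?:chevroelt|chevy)\b', 'chevrolet', regex=True)

print(fixed.tolist())
```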

Replace '?' placeholders with NaN

df_cars['horsepower'] = df_cars['horsepower'].str.replace('?', 'NaN', regex=False).astype(float)
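An alternative worth knowing, sketched here on made-up values, is `pd.to_numeric` with `errors='coerce'`. It converts anything that cannot be parsed as a number to NaN, so it also handles placeholders other than '?':

```python
import pandas as pd

# Hypothetical horsepower column read in as strings, with '?' for unknowns.
horsepower = pd.Series(['130', '?', '95'])

# errors='coerce' turns unparseable entries into NaN instead of raising.
as_float = pd.to_numeric(horsepower, errors='coerce')

print(as_float)
```

The trade-off is that coercion silently hides any unexpected junk values, so it may be worth inspecting the column first.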

Fill missing values with the column mean

meanhp = df_cars['horsepower'].mean()
df_cars['horsepower'] = df_cars['horsepower'].fillna(meanhp)
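This works because `mean()` skips NaN by default, so the average is computed from the known values only. A minimal sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry.
df_cars = pd.DataFrame({'horsepower': [100.0, np.nan, 200.0]})

# mean() ignores NaN (skipna=True), so this averages 100 and 200.
meanhp = df_cars['horsepower'].mean()

# fillna() returns a new Series with NaN replaced by the given value.
df_cars['horsepower'] = df_cars['horsepower'].fillna(meanhp)

print(df_cars)
```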

Create Dummy Variables

Values like ‘america’ cannot be used directly in a regression equation. So we create three simple true-or-false columns with titles equivalent to “Is this car American?”, “Is this car European?” and “Is this car Asian?”. These are used as independent variables without imposing any ordering between the three regions. Let’s apply the code below.

cData = pd.get_dummies(df_cars, columns=['origin'])
cData
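To see what the resulting columns look like, here is a small sketch on made-up data. The `mpg` values and the exact region labels are assumptions for illustration; `dtype=int` is added only so the indicators print as 0/1 rather than True/False:

```python
import pandas as pd

# Hypothetical three-row version of the cars data.
df_cars = pd.DataFrame({'mpg': [18.0, 26.0, 31.0],
                        'origin': ['america', 'europe', 'asia']})

# Each distinct value of 'origin' becomes its own indicator column,
# named '<column>_<value>'; the original 'origin' column is dropped.
cData = pd.get_dummies(df_cars, columns=['origin'], dtype=int)

print(cData)
```

Each row has exactly one indicator set to 1. For a linear model with an intercept, passing `drop_first=True` would drop one of the three columns to avoid perfectly collinear predictors.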