0 votes
2 views
in Data Science by (18.4k points)

I come from a statistics background rather than a data science one. I am working on a data science problem in Python: predicting whether customers will skip their scheduled appointments. My data set consists of 10,000 observations. One of the columns is Neighbourhood, and it has many different levels. I want to remove the levels that appear fewer than 50 times in the Neighbourhood column, because there are only a few such levels and they are misleading my model.

I have been trying for the past few hours, but I have not been able to get the result I want.

This is the code I used to remove the levels in the Neighbourhood column:

my_df = my_df.drop(my_df["Neighbourhood"].value_counts() < 50, axis = 0)

But I am getting this error:

KeyError: '[False False ...  True  True] not found in axis'

Can anyone help me solve this?

1 Answer

0 votes
by (36.8k points)

The code below uses the .loc indexer to select rows that meet a condition. Here the condition keeps only the rows whose Neighbourhood value has a high enough count.

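# Count how many times each neighbourhood appears
counts = my_df['Neighbourhood'].value_counts()

# Keep rows whose neighbourhood appears at least 50 times (drops those below 50)
new_df = my_df.loc[my_df['Neighbourhood'].isin(counts.index[counts >= 50])]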
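
To make the approach easier to check, here is a minimal self-contained sketch on toy data. The DataFrame contents, the NoShow column, and the min_count threshold of 2 (standing in for 50) are all made up for illustration and are not from the original data set.

```python
import pandas as pd

# Hypothetical toy data: "A" appears 3 times, "B" twice, "C" once.
my_df = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "C"],
    "NoShow": [0, 1, 0, 1, 0, 1],  # made-up target column
})

min_count = 2  # assumption: stands in for the threshold of 50

# Frequency of each level, indexed by the level name.
counts = my_df["Neighbourhood"].value_counts()

# Names of the levels that occur often enough to keep.
frequent = counts.index[counts >= min_count]

# Boolean mask over the rows, then row selection with .loc.
new_df = my_df.loc[my_df["Neighbourhood"].isin(frequent)]
```

With this toy input, the single "C" row is filtered out and the rows for "A" and "B" remain.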

.loc is an indexer (not a function) that selects rows and columns by label or by a boolean mask. Here it receives a boolean Series and returns the rows where that Series is True.

.isin() returns a boolean Series that marks, for each value in the column, whether that value is among the values passed to it.
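
As for the KeyError in the question: drop() expects row labels, but value_counts() < 50 produces a boolean Series indexed by neighbourhood name, so pandas looks for labels named True and False and does not find them. If you prefer to keep using drop(), a sketch along the following lines should work. The toy data and min_count are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: "C" is the only rare level.
my_df = pd.DataFrame({"Neighbourhood": ["A", "A", "A", "B", "B", "C"]})
min_count = 2  # assumption: stands in for the threshold of 50

counts = my_df["Neighbourhood"].value_counts()

# Map each row to the count of its own neighbourhood,
# so the result is aligned with my_df's row index.
row_counts = my_df["Neighbourhood"].map(counts)

# Row labels of the rare rows: this is what drop() can accept.
rare_rows = row_counts[row_counts < min_count].index

dropped_df = my_df.drop(rare_rows, axis=0)
```

This gives the same rows as the .loc version. The .loc form is usually shorter, since it avoids building the list of labels to drop.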

