Selection Bias
Selection bias is the bias introduced when a study's sample is not chosen by true randomization. As a result, the sample drawn from the population under study is not a true representation of that population.
The following example illustrates selection bias:
Let us assume a company wants to determine what percentage of people in the city like watching movies in cinema halls. To find out, the company has to run a survey and analyze the results. It decides to run the survey in a cinema hall close to its office. It gathers about 1,000 responses and believes this number is more than enough to tell with certainty what percentage of people like watching movies in cinema halls.
That might seem reasonable, since 1,000 responses is a fairly large sample. The problem is that all 1,000 respondents were people who were watching a movie in a cinema hall at the time, so their views are clearly biased towards liking cinema halls. More broadly, people who go to cinema halls to watch movies are not an accurate representation of the whole city's population, and collecting more responses in the same place would not remove this bias.
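The cinema-hall survey can be sketched as a small simulation. This is a minimal illustration, not the company's actual method: the city size, the 30% true share, and the attendance probabilities (`visit_prob_if_like`, `visit_prob_if_not`) are hypothetical numbers chosen only to show the effect. People who like cinema halls are assumed to be ten times more likely to be in one on the survey night.

```python
import random


def simulate_survey(population_size=100_000, like_rate=0.30,
                    visit_prob_if_like=0.10, visit_prob_if_not=0.01,
                    sample_size=1000, seed=42):
    """Compare a cinema-hall survey with a simple random survey.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)

    # Each resident is True if they like watching movies in cinema halls.
    population = [rng.random() < like_rate for _ in range(population_size)]
    true_rate = sum(population) / population_size

    # Assumption: residents who like cinema halls are far more likely
    # to be inside one when the survey takes place.
    attendees = [
        likes for likes in population
        if rng.random() < (visit_prob_if_like if likes else visit_prob_if_not)
    ]

    # Biased survey: only people found at the cinema hall are asked.
    cinema_sample = rng.sample(attendees, min(sample_size, len(attendees)))
    cinema_estimate = sum(cinema_sample) / len(cinema_sample)

    # Simple random survey: every resident has the same chance of being asked.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(random_sample) / sample_size

    return true_rate, cinema_estimate, random_estimate


true_rate, cinema_estimate, random_estimate = simulate_survey()
print(f"True share who like cinema halls: {true_rate:.1%}")
print(f"Estimate from cinema-hall survey:  {cinema_estimate:.1%}")
print(f"Estimate from random survey:       {random_estimate:.1%}")
```

Both surveys ask 1,000 people, yet under these assumptions only the random survey lands near the true share. Raising `sample_size` for the cinema-hall survey makes its estimate more precise but not more accurate, because every respondent still comes from the same unrepresentative group.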
Selection bias can appear in many scenarios, and it is almost impossible to obtain a sample of items or data that perfectly represents the whole population. The aim is to remove as much selection bias as possible by making sure the sample captures as many of the population's characteristics as possible.
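One common way to make a sample carry more of the population's characteristics is stratified sampling: split the population into groups (strata) by a known feature, then sample each group in proportion to its share of the population. The sketch below assumes a hypothetical city whose residents are tagged with one of three made-up districts; `stratified_sample` is an illustrative helper, not a library function.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, key, sample_size, rng):
    """Draw a sample whose strata proportions match the population's.

    `key` maps a record to its stratum. Allocation is proportional and
    rounded, so the total can differ from `sample_size` by a few records.
    """
    strata = defaultdict(list)
    for record in population:
        strata[key(record)].append(record)

    sample = []
    for members in strata.values():
        # Each stratum contributes in proportion to its population share.
        n = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, n))
    return sample


rng = random.Random(0)
# Hypothetical city: (resident_id, district) records with made-up sizes.
city = ([(i, "north") for i in range(50_000)]
        + [(i, "centre") for i in range(50_000, 80_000)]
        + [(i, "south") for i in range(80_000, 100_000)])

sample = stratified_sample(city, key=lambda r: r[1], sample_size=1000, rng=rng)
counts = Counter(district for _, district in sample)
print(dict(counts))  # {'north': 500, 'centre': 300, 'south': 200}
```

Stratification only balances the features you stratify on, and it still needs a sampling frame that covers the whole population. Stratifying respondents found inside the cinema hall would leave the original bias in place.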
Fact: The patterns in a biased sample usually differ from those in the population as a whole, so they are hard to generalize. Because such data does not reflect the nature of the whole population, we learn the wrong patterns from it, which leads to misleading conclusions.