
# Selection Bias

**Selection bias** is the bias introduced when true randomization is not achieved in a trial. The resulting sample is not a true representation of the population being studied.

* **An example of selection bias:**

Suppose a company wants to determine what percentage of people in a city like watching movies in cinema halls. To find out, the company has to run a survey and analyze the results. It decides to run the survey in a cinema hall close to its office. It gathers about 1000 responses and believes this number is more than enough to state with certainty the percentage of people who like watching movies in cinema halls.

That might be true: 1000 responses is a reasonably large sample. The problem is that all 1000 respondents were surveyed while they were at a cinema hall to watch a movie, so their views are clearly biased towards liking movies in cinema halls. More broadly, people who go to cinema halls to watch movies are not an accurate representation of the whole city's population.
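The cinema hall survey can be sketched as a small simulation. This is only an illustration under made-up assumptions: the 40% share of people who like cinema halls, the probabilities of finding someone in a cinema hall, and the function name `simulate_survey` are all hypothetical and do not come from real data.

```python
import random


def simulate_survey(seed=0, population_size=100_000, sample_size=1000):
    """Compare a cinema-hall survey with a truly random survey.

    All numbers below are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)

    # Assumption: 40% of the city likes watching movies in cinema halls.
    likes = [rng.random() < 0.40 for _ in range(population_size)]

    # Assumption: people who like cinema halls are much more likely
    # to be inside one when the survey is run (30% versus 2%).
    in_cinema = [rng.random() < (0.30 if like else 0.02) for like in likes]

    true_rate = sum(likes) / population_size

    # Biased survey: only people found in the cinema hall are asked.
    cinema_goers = [like for like, present in zip(likes, in_cinema) if present]
    biased_sample = rng.sample(cinema_goers, sample_size)
    biased_rate = sum(biased_sample) / sample_size

    # Randomized survey: every person in the city is equally likely to be asked.
    random_sample = rng.sample(likes, sample_size)
    random_rate = sum(random_sample) / sample_size

    return true_rate, biased_rate, random_rate


if __name__ == "__main__":
    true_rate, biased_rate, random_rate = simulate_survey()
    print(f"true share of the city:      {true_rate:.2f}")
    print(f"cinema hall survey estimate: {biased_rate:.2f}")
    print(f"random survey estimate:      {random_rate:.2f}")
```

Both surveys use the same sample size of 1000, yet under these assumptions only the random one lands near the true share. A large sample does not compensate for a sample that was selected in a biased way.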

Selection bias can arise in many scenarios, and it is almost impossible to obtain a perfect sample that represents the whole population. The aim is to remove as much selection bias as possible by making the sample reflect as many of the population's features as possible.
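One possible way to bring a population feature into the sample is to reweight the responses so that each group counts in proportion to its share of the population (often called post-stratification). The sketch below is a minimal illustration, not a method prescribed by this page: the groups `"frequent"` and `"rare"`, their population shares, the survey counts, and the function name `poststratified_rate` are all hypothetical.

```python
def poststratified_rate(sample, population_shares):
    """Reweight survey answers by each group's share of the population.

    sample: list of (group, likes_cinema) pairs from the survey.
    population_shares: dict mapping group -> share of the whole population
        (assumed to be known from an external source and to sum to 1).
    """
    estimate = 0.0
    for group, share in population_shares.items():
        answers = [likes for g, likes in sample if g == group]
        if not answers:
            raise ValueError(f"no respondents from group {group!r}")
        # Rate inside the group, weighted by how common the group really is.
        estimate += share * (sum(answers) / len(answers))
    return estimate


# Hypothetical survey: frequent cinema-goers are heavily over-represented.
sample = (
    [("frequent", True)] * 72 + [("frequent", False)] * 8
    + [("rare", True)] * 4 + [("rare", False)] * 16
)

# Assumption: only 20% of the city goes to the cinema frequently.
population_shares = {"frequent": 0.2, "rare": 0.8}

naive_rate = sum(likes for _, likes in sample) / len(sample)
adjusted_rate = poststratified_rate(sample, population_shares)
```

Reweighting only corrects for the features used to define the groups. If respondents differ from non-respondents in other ways, some selection bias remains.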

**Fact**: Patterns in a biased dataset are usually not the same as the patterns in the whole population, so it is hard to generalize from them. Because the data does not reflect the nature of the whole population, we learn the wrong patterns from it, which leads to misleading conclusions.

