Semester 2: Probability and Distributions
Random Experiment and Probability: Sample space, types of events, laws of probability, Bayes' theorem
Random Experiment and Probability
Sample Space
The sample space is the set of all possible outcomes of a random experiment. It can be finite or infinite depending on the nature of the experiment. For instance, if a die is rolled, the sample space is {1, 2, 3, 4, 5, 6}; if two coins are tossed, the sample space is {HH, HT, TH, TT}. The sample space is denoted by S.
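As a small illustrative sketch, both of these sample spaces can be enumerated in Python; the two-coin space is the Cartesian product of {H, T} with itself:

```python
from itertools import product

# Sample space for rolling one die
die = {1, 2, 3, 4, 5, 6}

# Sample space for tossing two coins: ordered pairs over {H, T}
coins = {"".join(outcome) for outcome in product("HT", repeat=2)}

print(die)    # {1, 2, 3, 4, 5, 6}
print(coins)  # {'HH', 'HT', 'TH', 'TT'}
```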
Types of Events
Events can be classified into several types: 1) Simple Events - Events that consist of a single outcome, such as rolling a 3 on a die. 2) Compound Events - Events that consist of multiple outcomes, such as rolling an even number. 3) Independent Events - Events where the outcome of one does not affect the other. 4) Dependent Events - Events where the outcome of one affects the other.
Laws of Probability
The laws of probability outline the rules that govern the likelihood of events occurring. Key rules include: 1) The probability of an event is always between 0 and 1, inclusive. 2) The sum of the probabilities of all possible outcomes in a sample space is 1. 3) For mutually exclusive events, the probability that either event occurs is the sum of their individual probabilities.
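These three rules can be checked by direct enumeration on a fair die; the sketch below assumes equally likely outcomes (the classical definition of probability):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space of a fair die

def P(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

even, odd = {2, 4, 6}, {1, 3, 5}

assert 0 <= P(even) <= 1                  # rule 1: P(A) lies in [0, 1]
assert P(S) == 1                          # rule 2: probabilities over S sum to 1
assert P(even | odd) == P(even) + P(odd)  # rule 3: addition rule (mutually exclusive events)
```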
Bayes' Theorem
Bayes' theorem relates the conditional and marginal probabilities of random events. It provides a way to update the probability of a hypothesis based on new evidence. The formula is P(A|B) = [P(B|A) * P(A)] / P(B), where P(A|B) is the probability of event A given that event B has occurred, and P(B) > 0. In practice, P(B) is often expanded using the law of total probability.
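A sketch of the classic diagnostic-test calculation applies the formula directly; the prevalence and test accuracies below are made-up numbers chosen purely for illustration:

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 10% false-positive rate
p_disease = 0.01              # P(A): prior probability of disease
p_pos_given_disease = 0.95    # P(B|A): test positive given disease
p_pos_given_healthy = 0.10    # P(B|not A): test positive given no disease

# P(B) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.4f}")  # ~0.0876
```

Note how a positive result raises the probability of disease from 1% to only about 9%, because false positives from the large healthy population dominate.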
Random variables: Discrete and continuous, distribution functions, expectation, moment generating functions
Random Variables and Distributions
Random Variables
Random variables are variables whose possible values are numerical outcomes of a random phenomenon. They can be classified into two main types: discrete and continuous.
Discrete Random Variables
Discrete random variables can take on a countable number of values. Examples include the number of heads in a series of coin tosses or the number of students in a classroom. The probability mass function (PMF) defines the probabilities for discrete variables.
Continuous Random Variables
Continuous random variables can take any value within a given range. Examples include height, weight or temperature. The probability density function (PDF) describes the distribution of continuous variables.
Distribution Functions
Distribution functions describe the probabilities of a random variable. The cumulative distribution function (CDF), F(x) = P(X ≤ x), gives the probability that the variable takes a value at most x. For discrete variables the CDF is a step function obtained by summing the PMF up to x; for continuous variables it is the integral of the PDF up to x.
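A short sketch using scipy.stats shows the CDF arising from summed PMF values in the discrete case and from integrating the PDF in the continuous case (a fair die and the standard normal are arbitrary illustrative choices):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Discrete: fair die. CDF at 4 = P(X <= 4) = sum of PMF over {1, 2, 3, 4}
die = stats.randint(1, 7)                 # uniform on the integers 1..6
pmf_sum = sum(die.pmf(k) for k in range(1, 5))
print(pmf_sum, die.cdf(4))                # both ~0.6667

# Continuous: standard normal. CDF at 1 = integral of the PDF from -inf to 1
area, _ = quad(stats.norm.pdf, -np.inf, 1.0)
print(area, stats.norm.cdf(1.0))          # both ~0.8413
```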
Expectation
The expectation or expected value of a random variable is a measure of the center of its distribution. For a discrete variable it is E(X) = Σ x * P(X = x), the sum of each possible value times its probability; for a continuous variable it is E(X) = ∫ x * f(x) dx, the integral of the variable times the PDF.
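A brief sketch of both calculations, using a fair die for the discrete case and an exponential PDF for the continuous case (the rate λ = 2 is chosen arbitrarily):

```python
import numpy as np
from scipy.integrate import quad

# Discrete: E(X) = sum of x * P(X = x) over a fair die
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)
print(np.sum(values * probs))  # 3.5

# Continuous: E(X) = integral of x * f(x) dx, with f(x) = 2 * exp(-2x) on [0, inf)
lam = 2.0
e_continuous, _ = quad(lambda x: x * lam * np.exp(-lam * x), 0, np.inf)
print(e_continuous)            # 0.5, matching 1 / lambda
```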
Moment Generating Functions
Moment generating functions (MGFs) summarize all moments of a probability distribution and simplify calculating the expected value, variance, and higher moments. The MGF of a random variable X is defined as M(t) = E[e^(tX)], where t is a real parameter, provided the expectation exists for t in a neighborhood of 0; the n-th moment of X is then the n-th derivative of M at t = 0.
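As a hedged numerical check: the MGF of a standard normal is known in closed form to be e^(t^2/2), and a Monte Carlo estimate of E[e^(tX)] should agree (sample size and t values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # samples of X ~ N(0, 1)

for t in (0.5, 1.0):
    mc = np.mean(np.exp(t * x))      # Monte Carlo estimate of E[e^(tX)]
    exact = np.exp(t**2 / 2)         # closed-form MGF of the standard normal
    print(t, mc, exact)              # estimates agree to roughly 2 decimals
```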
Discrete and Continuous Distributions: Binomial, Poisson, Normal, Exponential and related properties
Discrete and Continuous Distributions
Binomial Distribution
The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. It is characterized by two parameters: n (the number of trials) and p (the probability of success). The probability mass function is given by P(X=k) = (n choose k) * p^k * (1-p)^(n-k) for k = 0, 1, ..., n. This distribution is discrete.
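The PMF above can be evaluated by hand and cross-checked against scipy.stats; n = 10, p = 0.3, and k = 3 are arbitrary illustrative values:

```python
from math import comb
from scipy import stats

n, p, k = 10, 0.3, 3

# PMF by the formula: (n choose k) * p^k * (1-p)^(n-k)
by_hand = comb(n, k) * p**k * (1 - p) ** (n - k)
print(by_hand, stats.binom.pmf(k, n, p))  # both ~0.2668
```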
Poisson Distribution
The Poisson distribution is used for modeling the number of events that occur in a fixed interval of time or space, under the conditions that events occur independently and at a constant average rate. It is characterized by a single parameter λ (lambda), which represents the average number of events in the interval. The probability mass function is P(X=k) = (e^(-λ) * λ^k) / k! for k = 0, 1, 2, .... This distribution is also discrete.
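The same by-hand versus library check works here; λ = 4 and k = 2 are illustrative values only:

```python
from math import exp, factorial
from scipy import stats

lam, k = 4.0, 2   # e.g. an average of 4 events per interval

# PMF by the formula: e^(-lambda) * lambda^k / k!
by_hand = exp(-lam) * lam**k / factorial(k)
print(by_hand, stats.poisson.pmf(k, lam))  # both ~0.1465
```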
Normal Distribution
The normal distribution is a continuous probability distribution characterized by its symmetric bell-shaped curve. It is defined by two parameters: mean (μ) and standard deviation (σ). The probability density function is given by f(x) = (1 / (σ√(2π))) * e^(-0.5 * ((x-μ)/σ)^2). Notably, many phenomena in nature tend to follow this distribution due to the Central Limit Theorem.
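A sketch evaluating the PDF formula at one point against scipy.stats, plus the familiar one-standard-deviation probability (μ = 0, σ = 1 chosen for simplicity):

```python
import numpy as np
from scipy import stats

mu, sigma, x = 0.0, 1.0, 1.0   # standard normal, evaluated at x = 1

# PDF by the formula: (1 / (sigma * sqrt(2*pi))) * e^(-0.5 * ((x-mu)/sigma)^2)
by_hand = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
print(by_hand, stats.norm.pdf(x, mu, sigma))   # both ~0.2420

# About 68% of the probability mass lies within one sigma of the mean
print(stats.norm.cdf(1) - stats.norm.cdf(-1))  # ~0.6827
```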
Exponential Distribution
The exponential distribution is a continuous distribution used to model the time between events in a Poisson process. It is characterized by its rate parameter λ. The probability density function is f(x) = λ * e^(-λx) for x ≥ 0. This distribution is often used in survival analysis and reliability studies.
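One caveat worth showing in code: scipy parameterizes the exponential by scale = 1/λ rather than by the rate itself. The rate λ = 0.5 and point x = 2 below are illustrative:

```python
import numpy as np
from scipy import stats

lam, x = 0.5, 2.0

# PDF by the formula: lambda * e^(-lambda * x)
by_hand = lam * np.exp(-lam * x)
print(by_hand, stats.expon.pdf(x, scale=1 / lam))  # both ~0.1839

# Survival function: P(time between events > x) = e^(-lambda * x)
print(stats.expon.sf(x, scale=1 / lam))            # ~0.3679
```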
Related Properties
Various properties are associated with these distributions, such as their expectations and variances. For instance, the binomial distribution has E(X) = n * p and Var(X) = n * p * (1-p); the Poisson distribution has E(X) = Var(X) = λ; the normal distribution has E(X) = μ and Var(X) = σ^2; and the exponential distribution has E(X) = 1/λ and Var(X) = 1/λ^2. Understanding these properties is crucial for statistical analysis.
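These formulas can be cross-checked against scipy's built-in moments; the parameter values here are arbitrary:

```python
from scipy import stats

n, p = 20, 0.4
mean, var = stats.binom.stats(n, p, moments="mv")
print(mean, n * p)            # both 8.0
print(var, n * p * (1 - p))   # both 4.8

mu, sigma = 5.0, 2.0
mean, var = stats.norm.stats(loc=mu, scale=sigma, moments="mv")
print(mean, var)              # 5.0 and 4.0, i.e. mu and sigma^2
```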
Central Limit Theorem and Confidence Intervals: Hypothesis testing basics and classification
Central Limit Theorem and Confidence Intervals
Central Limit Theorem (CLT)
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the population's distribution, provided the samples are independent and identically distributed with finite mean and variance. More precisely, the mean of n observations from a population with mean μ and standard deviation σ is approximately normal with mean μ and variance σ^2/n for large n.
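A simulation sketch illustrates this with a deliberately skewed population; the exponential distribution, the sample size of 50, and the 10,000 replications are all arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Skewed population: exponential with mean 1 (and standard deviation 1)
n, reps = 50, 10_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Standardized sample means should be close to N(0, 1)
z = (means - 1.0) / (1.0 / np.sqrt(n))
print(z.mean(), z.std())   # ~0 and ~1
print(stats.kurtosis(z))   # excess kurtosis near 0, as for a normal distribution
```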
Importance of CLT in Statistics
CLT is crucial for making inferences about population parameters. It allows statisticians to use normal probability models for hypothesis testing and constructing confidence intervals even when the population distribution is not known.
Confidence Intervals
A confidence interval is a range of values, computed from sample data, that is constructed to contain the value of a population parameter. It is expressed at a confidence level, such as 95% or 99%, meaning that over repeated sampling the stated proportion of intervals built this way would contain the true parameter.
Calculating Confidence Intervals
To calculate a confidence interval for a population mean, use the formula: CI = x̄ ± z*(σ/√n), where x̄ is the sample mean, z is the z-score corresponding to the desired confidence level, σ is the population standard deviation, and n is the sample size. When σ is unknown and the sample is small, the t-distribution replaces the z-score.
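A direct application of the formula; the sample mean, σ, and n below are hypothetical numbers for illustration:

```python
import numpy as np
from scipy import stats

x_bar, sigma, n = 52.3, 4.0, 36   # hypothetical sample mean, known sigma, sample size
conf = 0.95

z = stats.norm.ppf(1 - (1 - conf) / 2)  # z-score for 95% confidence: ~1.96
margin = z * sigma / np.sqrt(n)
print(f"{conf:.0%} CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
# 95% CI: (50.99, 53.61)
```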
Hypothesis Testing Basics
Hypothesis testing is a statistical method used to make decisions about population parameters based on sample data. It involves formulating a null hypothesis and an alternative hypothesis, conducting a test, and making a decision, typically rejecting the null hypothesis when the p-value falls below a chosen significance level α.
Types of Hypothesis Tests
Common types of hypothesis tests include t-tests, chi-square tests, and ANOVA. Each test has its own assumptions and is used in different contexts depending on the data and research questions.
Classification in the Context of Hypothesis Testing
Classification involves assigning items to predefined categories based on their features. In the context of hypothesis testing, machine learning algorithms can be used to classify data and test hypotheses about the relationships between variables.
Small Sample Tests: t-tests, F-tests, Chi-square tests for goodness of fit, independence, and homogeneity
Small Sample Tests
t-tests
t-tests are used to compare means when sample sizes are small and the population standard deviation is unknown. There are three common types: the independent t-test compares the means of two different groups, the paired t-test compares means from the same group measured at two different times, and the one-sample t-test compares a sample mean to a known value.
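All three variants are available in scipy.stats; the simulated samples below are illustrative only (for a real paired test, a and b would be before/after measurements on the same subjects):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10.0, 2.0, size=12)   # small illustrative samples
b = rng.normal(11.0, 2.0, size=12)

print(stats.ttest_ind(a, b))     # independent two-sample t-test
print(stats.ttest_rel(a, b))     # paired t-test, treating a and b as pairs
print(stats.ttest_1samp(a, 10))  # one-sample t-test against a known value
```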
F-tests
F-tests are used to compare the variances of two or more groups. With small samples, an F-test often serves to determine whether the groups' variances differ significantly. The test statistic is the ratio of the two sample variances, which follows an F-distribution under the null hypothesis.
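scipy has no single-call two-sample variance F-test, so a sketch computes the statistic and p-value directly from the F-distribution; the simulated data are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(0, 1.0, size=15)
b = rng.normal(0, 2.0, size=15)

f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)  # ratio of sample variances
dfn, dfd = len(a) - 1, len(b) - 1               # numerator/denominator df

# Two-sided p-value from the F-distribution
p = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
print(f_stat, p)
```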
Chi-square tests
Chi-square tests are non-parametric tests used to assess the association between categorical variables. The chi-square goodness of fit test determines whether a sample distribution fits a hypothesized population distribution. The chi-square test for independence assesses whether two categorical variables are independent or related. The chi-square test for homogeneity assesses whether different populations share the same distribution of a categorical variable.
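The goodness of fit and independence tests have direct scipy.stats calls; the counts below are made-up numbers for illustration:

```python
import numpy as np
from scipy import stats

# Goodness of fit: is a die fair? Observed counts from 60 hypothetical rolls
observed = np.array([8, 9, 13, 7, 12, 11])
print(stats.chisquare(observed))   # expected counts default to uniform

# Independence: 2x2 contingency table of two categorical variables
table = np.array([[20, 15],
                  [10, 25]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)
```

The homogeneity test uses the same chi2_contingency computation, with rows interpreted as samples from different populations rather than levels of a second variable.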
Applications and Considerations
In small sample tests, it is crucial to verify that the assumptions of the tests are met, such as normality and homogeneity of variance. The small sample size can affect the reliability and validity of the tests. It is often recommended to use non-parametric tests when sample sizes are extremely small or when assumptions are violated.
