Semester 4: M.Sc. Biotechnology Syllabus 2023-2024
Statistics: scope; collection, classification and tabulation of statistical data; diagrammatic representation; graphs, graph drawing, graph paper, plotted curves; sampling methods and standard errors; random sampling; use of random numbers; expectation of sample estimates; means, confidence limits, standard errors, variance.
Statistics Scope and Collection
Collection of Statistical Data
The process of gathering quantitative and qualitative data from various sources. Techniques include surveys, experiments, and observational studies. Effective collection ensures accurate analysis.
Classification of Data
Data is organized into categories for analysis. The two main types are qualitative (categorical) and quantitative (numerical). Classification enhances data understanding and aids in comparison.
Tabulation of Data
The systematic arrangement of data in tables to summarize and facilitate analysis. Tables can be one-way or two-way, and they provide a clear overview of the data.
Diagrammatic Representation
Visual tools like graphs and charts are used to illustrate data relationships and trends. This method simplifies complex information for better comprehension.
Graphs and Graph Drawing
Graphs such as bar graphs, histograms, and pie charts visually represent data. Proper scaling and labeling are essential for accuracy.
Graph Paper and Plotted Curves
Graph paper is used for accurately plotting data points. Plotted curves help in visualizing trends within the data.
Sampling Methods
Sampling techniques allow data collection from a subset of the population. Major types include random sampling, systematic sampling, and stratified sampling.
Standard Errors
Standard error measures the accuracy of sample estimates. It indicates the extent to which a sample statistic is expected to vary from the population parameter.
Random Sampling
A technique where every individual has an equal chance of being selected. This method reduces bias and improves the representativeness of the sample.
Use of Random Numbers
Random numbers are utilized in sampling to ensure that selections are unbiased. Tools and software can generate random numbers for various applications.
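Random sampling with generated random numbers can be sketched in a few lines. The sampling frame below (100 numbered experimental units) and the seed are made up for illustration:

```python
import random

# Hypothetical sampling frame: 100 numbered experimental units.
population = list(range(1, 101))

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, 10)  # simple random sample without replacement

# Every unit had an equal chance of selection; the 10 IDs are distinct.
print(sample)
```

In practice the seed is omitted (or drawn from a hardware source) so the selection is unpredictable; a fixed seed is useful only for reproducible demonstrations.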
Expectation of Sample Estimates
Estimation involves determining a population parameter from sample data. The expected value of an estimator provides a measure of the central tendency of sample estimates.
Means and Confidence Limits
The mean is a measure of central tendency, while confidence limits provide a range within which the true population parameter lies, based on sample data.
Variance
Variance measures the dispersion of a set of values. It quantifies the average squared deviation of the values from their mean.
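The quantities above (mean, variance, standard error, confidence limits) can be computed with the standard library. The eight seedling heights below are illustrative values, not real data:

```python
import math
import statistics

# Hypothetical sample of 8 seedling heights (cm); values are illustrative only.
heights = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 11.9, 13.3]

n = len(heights)
mean = statistics.fmean(heights)
var = statistics.variance(heights)   # sample variance, divisor n - 1
sd = math.sqrt(var)
se = sd / math.sqrt(n)               # standard error of the mean

# Approximate 95% confidence limits using the normal critical value 1.96;
# for a sample this small a t critical value would be more appropriate.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```

Note the sample variance divides by n − 1 (not n), which makes it an unbiased estimator of the population variance.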
Correlation and regression: correlation table; coefficient of correlation; Z transformation; regression; relation between regression and correlation. Probability: Markov chains and applications. Probability distributions: binomial, Gaussian (normal), negative binomial, compound and multinomial distributions; Poisson distribution.
Correlation and Regression in Biostatistics
Correlation Coefficient
The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from −1 to +1: a value close to 1 indicates a strong positive correlation, while a value close to −1 indicates a strong negative correlation.
Coefficient of Correlation (r)
The coefficient of correlation (denoted as r) quantifies the degree of correlation between two variables. It is calculated using the formula r = cov(X,Y) / (σX * σY), where cov is covariance and σ is the standard deviation.
Z Transformation in Correlation
Fisher's Z transformation converts the correlation coefficient r into the quantity z = ½ ln((1 + r) / (1 − r)), which is approximately normally distributed. This makes it possible to test hypotheses about r and to construct confidence intervals for correlations.
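Pearson's r (using the formula r = cov(X, Y) / (σX · σY)) and Fisher's z transformation of r can be computed directly. The paired dose–yield values below are made up for illustration:

```python
import math

# Hypothetical paired measurements: fertilizer dose vs. plant yield.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

r = cov / (sx * sy)                      # r = cov(X, Y) / (sigma_X * sigma_Y)
z = 0.5 * math.log((1 + r) / (1 - r))    # Fisher's z transformation
```

Because the n in the covariance and the two standard deviations cancels, using the population divisor n here gives the same r as the sample divisor n − 1 would.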
Regression Analysis
Regression analysis is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It helps in predicting outcomes and understanding relationships.
Relation between Correlation and Regression
While correlation measures the strength of the association between two variables, regression models how one variable changes with another and is used for prediction. Neither establishes causation by itself; causal claims require an appropriate study design.
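Simple linear regression fits a line y = a + b·x by least squares. A minimal sketch with made-up data (chosen to be exactly linear so the fit is easy to check by hand):

```python
# Hypothetical dose-response data; the relationship is exactly linear here.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Least-squares estimates for y = a + b * x:
# slope b = sum((x - mx)(y - my)) / sum((x - mx)^2), intercept a = my - b * mx
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

predicted = a + b * 6.0   # prediction at a new x value
```

Here b = 2 and a = 0, so the prediction at x = 6 is 12, illustrating how regression is used for prediction rather than merely describing association strength.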
Probability Concepts
Probability is the measure of the likelihood that an event will occur. It forms the basis for statistical inference and decision-making.
Markov Chains
Markov chains are stochastic processes involving transitions from one state to another on a state space. They are used in various applications such as queueing theory and genetics.
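A Markov chain is defined by its transition matrix; repeatedly applying it to a state distribution converges (for a regular chain) to the stationary distribution. A sketch with a hypothetical two-state chain (the transition probabilities are invented for illustration):

```python
# Hypothetical two-state chain, e.g. 0 = sunny, 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(dist, P):
    # One transition: new_j = sum_i dist_i * P[i][j]
    return [sum(dist[i] * P[i][j] for i in range(len(dist)))
            for j in range(len(P[0]))]

dist = [1.0, 0.0]          # start in state 0 with certainty
for _ in range(50):
    dist = step(dist, P)
# dist converges to the stationary distribution [5/6, 1/6],
# the solution of pi = pi * P.
```

The stationary distribution can be verified by hand: π₀ = 0.9π₀ + 0.5π₁ gives π₀ = 5π₁, hence [5/6, 1/6].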
Probability Distributions
Probability distributions describe how the values of a random variable are distributed. They are crucial in understanding the behavior of random variables.
Binomial Distribution
The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials. It is characterized by parameters n (number of trials) and p (probability of success).
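The binomial probability mass function follows directly from the parameters n and p. A minimal sketch:

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# e.g. probability of exactly 3 successes in 10 trials with p = 0.5:
# C(10, 3) / 2^10 = 120 / 1024 = 0.1171875
p3 = binom_pmf(3, 10, 0.5)
```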
Gaussian Distribution
The Gaussian distribution, or normal distribution, is symmetric and defined by its mean and standard deviation. Many biological phenomena and measurement errors follow this distribution.
Negative Binomial Distribution
The negative binomial distribution represents the number of failures before a specified number of successes occurs in a sequence of Bernoulli trials.
Compound Distributions
Compound distributions arise when a parameter of a distribution is itself treated as a random variable, or when a random number of random variables is summed. They allow modeling of more complex phenomena whose outcome depends on multiple sources of variation.
Multinomial Distribution
The multinomial distribution generalizes the binomial distribution for scenarios with more than two possible outcomes. It is used in categorical data analysis.
Poisson Distribution
The Poisson distribution models the number of events that occur in a fixed interval of time or space. It is useful for count-based data, especially in biology and ecology.
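The Poisson probability mass function depends on a single rate parameter λ (the mean number of events per interval). A minimal sketch:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

# e.g. with lam = 2 events per interval, the chance of observing none
# in a given interval is e^(-2) ~ 0.135
p0 = poisson_pmf(0, 2.0)
```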
Normal distribution: graphic representation; frequency curve and its characteristics; measures of central value, dispersion, coefficient of variation and methods of computation. Basis of statistical inference: sampling distribution, standard error; testing of hypothesis; null hypothesis; Type I and Type II errors.
Normal distribution graphic representation
Normal Distribution Overview
Normal distribution is a continuous probability distribution characterized by its bell-shaped curve. It is defined by two parameters: mean and standard deviation.
Graphic Representation
The graphic representation of a normal distribution shows a symmetric curve centered around the mean. The area under the curve represents the total probability of the distribution.
Frequency Curve
A frequency curve is a smooth curve that represents the frequency of data points in a dataset, allowing for visual identification of trends within a normal distribution.
Characteristics
Measures of Central Value
In a normal distribution, the mean, median, and mode are all equal and located at the center of the distribution.
Measures of Dispersion
Standard deviation is a key measure of dispersion in normal distribution, indicating how much data varies from the mean.
Coefficient of Variation
The coefficient of variation is the ratio of the standard deviation to the mean, useful for comparing variability between different datasets.
Methods of Computation
Methods include calculating mean and standard deviation directly from data, or using software for larger datasets.
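The coefficient of variation makes variability comparable across datasets measured on different scales, as a short sketch with invented measurements shows:

```python
import statistics

# Two hypothetical datasets on different scales.
weights_g = [50, 52, 48, 51, 49]
lengths_cm = [5.0, 5.4, 4.6, 5.2, 4.8]

def cv(data):
    # Coefficient of variation: sd / mean, reported here as a percentage.
    return statistics.stdev(data) / statistics.fmean(data) * 100

# The raw standard deviations are not comparable (grams vs. cm),
# but the CVs are: here the lengths are relatively more variable.
cv_w = cv(weights_g)
cv_l = cv(lengths_cm)
```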
Basis of Statistical Inference
Normal distribution forms the foundation for many statistical inference methods and is crucial for hypothesis testing.
Sampling Distribution
The sampling distribution is the distribution of sample means. When samples are taken from a population, the sampling distribution of the mean can often be approximated by a normal distribution.
Standard Error
The standard error measures how far the sample mean of the data is likely to be from the true population mean, decreasing as sample size increases.
Testing of Hypothesis
Statistical hypothesis testing utilizes normal distribution to determine the validity of a null hypothesis based on sample data.
Null Hypothesis
The null hypothesis is a statement that there is no effect or difference, serving as the basis for statistical testing.
Type I and Type II Errors
A Type I error occurs when a true null hypothesis is rejected (a false positive, with probability α, the significance level). A Type II error occurs when a false null hypothesis is not rejected (a false negative, with probability β); the power of a test is 1 − β.
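Because the null hypothesis is true by construction in the simulation below, every rejection is a Type I error, and the long-run rejection rate should approximate α = 0.05. The data are simulated, not from the syllabus:

```python
import math
import random

random.seed(1)  # fixed seed so the simulation is reproducible

def z_test_rejects(n, crit=1.96):
    # Draw a sample from N(0, 1), where H0 (mu = 0) is true,
    # and check whether the two-sided z-test rejects it at alpha = 0.05.
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)   # known sigma = 1
    return abs(z) > crit

trials = 2000
rejections = sum(z_test_rejects(30) for _ in range(trials))
type_i_rate = rejections / trials   # should be close to alpha = 0.05
```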
Tests of significance for large and small samples based on normal, t and z distributions with regard to mean, variance, proportions and correlation coefficient; chi-square test of goodness of fit; contingency tables; χ² test for independence of two attributes; Fisher and Behrens' d test; 2 × 2 table; testing heterogeneity; r × c table; chi-square test in genetic experiments; partition of χ²; Emerson's method.
Tests of significance for large and small samples
Normal Distribution
The normal distribution is a continuous probability distribution characterized by its bell-shaped curve. Tests of significance using this distribution involve z-tests for large sample sizes (n > 30) to determine if sample means significantly differ from population means.
t Distribution
The t distribution is used for smaller sample sizes (n ≤ 30) and is characterized by its heavier tails compared to the normal distribution. t-tests are employed to compare sample means and assess significance in mean differences.
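The one-sample t statistic for a small sample can be computed directly: t = (x̄ − μ₀) / (s / √n), compared against a t critical value with n − 1 degrees of freedom. The enzyme-activity values below are hypothetical:

```python
import math
import statistics

# Hypothetical small sample (n = 6) of enzyme activities; test H0: mu = 10.
sample = [10.8, 11.2, 9.9, 10.5, 11.0, 10.6]
mu0 = 10.0

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
t = (mean - mu0) / (sd / math.sqrt(n))   # compare with t critical value, df = n - 1
```

Here t ≈ 3.59 exceeds the two-sided 5% critical value for df = 5 (2.571), so H0 would be rejected at the 5% level.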
Z Distribution
Z tests are applicable when the population variance is known. The method involves calculating the z statistic to test hypotheses about population means or proportions.
Variance Testing
Tests such as the F-test assess the equality of variances of two populations. The F-test is also central to ANOVA, where it determines whether means from different groups differ significantly.
Proportions
Proportion tests, such as z-tests for proportions, evaluate if the observed proportion in a sample significantly differs from a hypothesized proportion.
Correlation Coefficient
The significance of the correlation coefficient (r) can be evaluated using statistical tests that determine if the observed correlation reflects a true relationship in the population.
Chi-square Test of Goodness of Fit
This test evaluates how well observed categorical data fit an expected distribution. It compares the frequencies of observed categories to those expected.
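The goodness-of-fit statistic is χ² = Σ (O − E)² / E over the categories. A sketch with a hypothetical Mendelian 3 : 1 segregation:

```python
# Hypothetical F2 counts: 290 dominant, 110 recessive out of 400,
# tested against the expected 3 : 1 ratio (300 : 100).
observed = [290, 110]
expected = [300.0, 100.0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi2 = 100/300 + 100/100 = 1.333, below the critical value 3.841
# (df = categories - 1 = 1, alpha = 0.05), so the 3:1 ratio is not rejected.
```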
Contingency Tables
Contingency tables summarize the relationship between two categorical variables. The chi-square test assesses independence between these variables.
Chi-square Test for Independence
This procedure determines whether two categorical variables are independent by comparing observed frequencies with expected frequencies.
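For a contingency table, the expected count in each cell under independence is (row total × column total) / grand total. A sketch with a hypothetical 2 × 2 table:

```python
# Hypothetical 2 x 2 table: treatment (rows) vs. recovery outcome (columns).
table = [[30, 20],
         [10, 40]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
grand = sum(row_tot)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_tot[i] * col_tot[j] / grand   # expected count under independence
        chi2 += (table[i][j] - e) ** 2 / e
# Compare with 3.841 (df = (2 - 1)(2 - 1) = 1, alpha = 0.05):
# here chi2 ~ 16.7, so independence of the two attributes is rejected.
```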
Fisher's Exact Test
Utilized for small sample sizes, Fisher's Exact Test provides a method for determining the significance of the association between two categorical variables.
Behrens-Fisher Problem
This problem arises when comparing means from two populations with unknown and possibly unequal variances. Approximate solutions, such as Welch's t-test, address it.
Heterogeneity Testing
Utilized to assess whether observed differences across groups or studies are greater than would be expected by chance. Relevant for meta-analysis.
X² Test in Genetic Experiments
The chi-square test is useful in genetics for comparing expected frequencies with observed frequencies in trait inheritance.
Emerson's Method
A statistical method for genetic data analysis that accounts for complex inheritance patterns and provides a framework for hypothesis testing.
Tests of significance: t tests, F tests; analysis of variance: one-way classification, two-way classification; CRD, RBD, LSD. Spreadsheets: data entry, mathematical functions, statistical functions, graphics display, printing spreadsheets, use as a database; word processors; databases; statistical analysis packages; graphics/presentation packages.
Tests of Significance
t-Tests
Used to determine if there is a significant difference between the means of two groups. The main forms are the independent samples t-test and the paired samples t-test. Commonly used in medical research, psychology, and other fields to compare groups.
F-Tests
Used to compare variances between two populations and to assess the overall fit of a model. Frequently used in ANOVA and regression analysis.
Analysis of Variance (ANOVA)
A statistical method used to determine if there are significant differences between the means of three or more groups.
One-Way Classification
Involves one independent variable or factor. Used when comparing means across multiple groups.
Two-Way Classification
Involves two independent variables or factors. Enables understanding of interaction effects between factors.
Completely Randomized Design (CRD)
An experimental design where subjects are randomly assigned to different treatments. Ensures that treatment effects can be attributed to the treatments themselves.
Randomized Block Design (RBD)
An experimental design that groups similar experimental units into blocks and randomizes treatments within blocks. Used to reduce variability among experimental units.
Least Significant Difference (LSD)
A post-hoc test used after ANOVA to determine which means are significantly different. Helps in identifying specific group differences.
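The one-way ANOVA described above partitions the total variation into between-group and within-group sums of squares and forms F = MS_between / MS_within. A minimal sketch with made-up yields under three treatments:

```python
import statistics

# Hypothetical yields under three treatments (one-way, equal group sizes).
groups = [
    [5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0],
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)          # df = k - 1
ms_within = ss_within / (n_total - k)      # df = n - k
F = ms_between / ms_within                 # compare with F critical value (k - 1, n - k)
```

With these numbers the group means are 6, 9 and 12, giving F = 27, far above typical critical values, so the group means differ significantly.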
Spreadsheets
Data Entry
Input of data into a spreadsheet application, e.g. Excel or Google Sheets.
Mathematical Functions
Functions to perform basic arithmetic and calculations, e.g. SUM, AVERAGE, COUNT.
Statistical Functions
Functions that provide statistical analyses, e.g. AVERAGE, STDEV, TTEST.
Graphics Display
Visualization tools within spreadsheets: charts, graphs, and plots.
Printing Spreadsheets
Outputting spreadsheet data in printed form for reports and presentations.
Use as a Database
Utilizing spreadsheets for database functions: storing, sorting, and analyzing data.
Word Processors
Software for creating and editing text documents such as reports, documentation, and essays.
Databases
Structured collections of data, used for data management and retrieval.
Statistical Analysis Packages
Software specifically designed for statistical analysis, e.g. SPSS, R, SAS.
Graphics/Presentation Packages
Tools for creating visual representations of data, e.g. PowerPoint, Canva.
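The common spreadsheet functions mentioned above (SUM, AVERAGE, COUNT, STDEV) have direct equivalents in statistical software. A minimal Python sketch over a hypothetical column of values:

```python
import statistics

# Hypothetical column of values, as in a spreadsheet range A1:A5.
column = [4, 8, 15, 16, 23]

total = sum(column)                  # spreadsheet SUM
average = statistics.fmean(column)   # AVERAGE
count = len(column)                  # COUNT
sd = statistics.stdev(column)        # STDEV (sample standard deviation)
```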
