Semester 4: M.Sc. Biotechnology Syllabus 2023-2024
Statistics: scope; collection, classification and tabulation of statistical data; diagrammatic representation; graphs, graph drawing, graph paper, plotted curves; sampling methods and standard errors; random sampling; use of random numbers; expectation of sample estimates; means, confidence limits, standard errors, variance.
Statistics Scope and Collection
Collection of Statistical Data
The process of gathering quantitative and qualitative data from various sources. Techniques include surveys, experiments, and observational studies. Effective collection ensures accurate analysis.
Classification of Data
Data is organized into categories for analysis. The two main types are qualitative (categorical) and quantitative (numerical). Classification enhances data understanding and aids in comparison.
Tabulation of Data
The systematic arrangement of data in tables to summarize and facilitate analysis. Tables can be one-way or two-way, and they provide a clear overview of the data.
Diagrammatic Representation
Visual tools like graphs and charts are used to illustrate data relationships and trends. This method simplifies complex information for better comprehension.
Graphs and Graph Drawing
Graphs such as bar graphs, histograms, and pie charts visually represent data. Proper scaling and labeling are essential for accuracy.
Graph Paper and Plotted Curves
Graph paper is used for accurately plotting data points. Plotted curves help in visualizing trends within the data.
Sampling Methods
Sampling techniques allow data collection from a subset of the population. Major types include random sampling, systematic sampling, and stratified sampling.
Standard Errors
Standard error measures the accuracy of sample estimates. It indicates the extent to which a sample statistic is expected to vary from the population parameter.
Random Sampling
A technique where every individual has an equal chance of being selected. This method reduces bias and improves the representativeness of the sample.
Use of Random Numbers
Random numbers are utilized in sampling to ensure that selections are unbiased. Tools and software can generate random numbers for various applications.
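Random sampling with generated random numbers can be sketched in a few lines. The sampling frame below (100 numbered experimental units) and the seed are made up for illustration:

```python
import random

# Hypothetical sampling frame: 100 numbered experimental units.
population = list(range(1, 101))

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, 10)  # simple random sample without replacement

# Every unit had an equal chance of selection; the 10 IDs are distinct.
print(sample)
```

In practice the seed is omitted (or drawn from a hardware source) so the selection is unpredictable; a fixed seed is useful only for reproducible demonstrations.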
Expectation of Sample Estimates
Estimation involves determining a population parameter from sample data. The expected value of an estimator provides a measure of the central tendency of sample estimates.
Means and Confidence Limits
The mean is a measure of central tendency, while confidence limits provide a range within which the true population parameter lies, based on sample data.
Variance
Variance measures the dispersion of a set of values. It quantifies the average squared deviation of the values from their mean.
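The quantities above (mean, variance, standard error, confidence limits) can be computed with the standard library. The eight seedling heights below are illustrative values, not real data:

```python
import math
import statistics

# Hypothetical sample of 8 seedling heights (cm); values are illustrative only.
heights = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 11.9, 13.3]

n = len(heights)
mean = statistics.fmean(heights)
var = statistics.variance(heights)   # sample variance, divisor n - 1
sd = math.sqrt(var)
se = sd / math.sqrt(n)               # standard error of the mean

# Approximate 95% confidence limits using the normal critical value 1.96;
# for a sample this small a t critical value would be more appropriate.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```

Note the sample variance divides by n − 1 (not n), which makes it an unbiased estimator of the population variance.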
Correlation and regression: correlation table; coefficient of correlation; Z transformation; regression; relation between regression and correlation. Probability: Markov chains and applications. Probability distributions: binomial, Gaussian (normal), negative binomial, compound and multinomial distributions; Poisson distribution.
Correlation and Regression in Biostatistics
Correlation Coefficient
The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from −1 to +1: a value close to 1 indicates a strong positive correlation, while a value close to −1 indicates a strong negative correlation.
Coefficient of Correlation (r)
The coefficient of correlation (denoted as r) quantifies the degree of correlation between two variables. It is calculated using the formula r = cov(X,Y) / (σX * σY), where cov is covariance and σ is the standard deviation.
Z Transformation in Correlation
Fisher's Z transformation converts the correlation coefficient r into the quantity z = ½ ln((1 + r) / (1 − r)), which is approximately normally distributed. This makes it possible to test hypotheses about r and to construct confidence intervals for correlations.
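Pearson's r (using the formula r = cov(X, Y) / (σX · σY)) and Fisher's z transformation of r can be computed directly. The paired dose–yield values below are made up for illustration:

```python
import math

# Hypothetical paired measurements: fertilizer dose vs. plant yield.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

r = cov / (sx * sy)                      # r = cov(X, Y) / (sigma_X * sigma_Y)
z = 0.5 * math.log((1 + r) / (1 - r))    # Fisher's z transformation
```

Because the n in the covariance and the two standard deviations cancels, using the population divisor n here gives the same r as the sample divisor n − 1 would.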
Regression Analysis
Regression analysis is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It helps in predicting outcomes and understanding relationships.
Relation between Correlation and Regression
While correlation measures the strength of the association between two variables, regression models how one variable changes with another and is used for prediction. Neither establishes causation by itself; causal claims require an appropriate study design.
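Simple linear regression fits a line y = a + b·x by least squares. A minimal sketch with made-up data (chosen to be exactly linear so the fit is easy to check by hand):

```python
# Hypothetical dose-response data; the relationship is exactly linear here.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Least-squares estimates for y = a + b * x:
# slope b = sum((x - mx)(y - my)) / sum((x - mx)^2), intercept a = my - b * mx
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

predicted = a + b * 6.0   # prediction at a new x value
```

Here b = 2 and a = 0, so the prediction at x = 6 is 12, illustrating how regression is used for prediction rather than merely describing association strength.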
Probability Concepts
Probability is the measure of the likelihood that an event will occur. It forms the basis for statistical inference and decision-making.
Markov Chains
Markov chains are stochastic processes involving transitions from one state to another on a state space. They are used in various applications such as queueing theory and genetics.
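A Markov chain is defined by its transition matrix; repeatedly applying it to a state distribution converges (for a regular chain) to the stationary distribution. A sketch with a hypothetical two-state chain (the transition probabilities are invented for illustration):

```python
# Hypothetical two-state chain, e.g. 0 = sunny, 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(dist, P):
    # One transition: new_j = sum_i dist_i * P[i][j]
    return [sum(dist[i] * P[i][j] for i in range(len(dist)))
            for j in range(len(P[0]))]

dist = [1.0, 0.0]          # start in state 0 with certainty
for _ in range(50):
    dist = step(dist, P)
# dist converges to the stationary distribution [5/6, 1/6],
# the solution of pi = pi * P.
```

The stationary distribution can be verified by hand: π₀ = 0.9π₀ + 0.5π₁ gives π₀ = 5π₁, hence [5/6, 1/6].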
Probability Distributions
Probability distributions describe how the values of a random variable are distributed. They are crucial in understanding the behavior of random variables.
Binomial Distribution
The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials. It is characterized by parameters n (number of trials) and p (probability of success).
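The binomial probability mass function follows directly from the parameters n and p. A minimal sketch:

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# e.g. probability of exactly 3 successes in 10 trials with p = 0.5:
# C(10, 3) / 2^10 = 120 / 1024 = 0.1171875
p3 = binom_pmf(3, 10, 0.5)
```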
Gaussian Distribution
The Gaussian distribution, or normal distribution, is symmetric and defined by its mean and standard deviation. Many biological phenomena and measurement errors follow this distribution.
Negative Binomial Distribution
The negative binomial distribution represents the number of failures before a specified number of successes occurs in a sequence of Bernoulli trials.
Compound Distributions
Compound distributions arise when a parameter of a distribution is itself treated as a random variable, or when a random number of random variables is summed. They allow modeling of more complex phenomena whose outcome depends on multiple sources of variation.
Multinomial Distribution
The multinomial distribution generalizes the binomial distribution for scenarios with more than two possible outcomes. It is used in categorical data analysis.
Poisson Distribution
The Poisson distribution models the number of events that occur in a fixed interval of time or space. It is useful for count-based data, especially in biology and ecology.
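The Poisson probability mass function depends on a single rate parameter λ (the mean number of events per interval). A minimal sketch:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

# e.g. with lam = 2 events per interval, the chance of observing none
# in a given interval is e^(-2) ~ 0.135
p0 = poisson_pmf(0, 2.0)
```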
Normal distribution: graphic representation; frequency curve and its characteristics; measures of central value, dispersion, coefficient of variation and methods of computation. Basis of statistical inference: sampling distribution, standard error; testing of hypothesis; null hypothesis; Type I and Type II errors.
Normal distribution graphic representation
Normal Distribution Overview
Normal distribution is a continuous probability distribution characterized by its bell-shaped curve. It is defined by two parameters: mean and standard deviation.
Graphic Representation
The graphic representation of a normal distribution shows a symmetric curve centered around the mean. The area under the curve represents the total probability of the distribution.
Frequency Curve
A frequency curve is a smooth curve that represents the frequency of data points in a dataset, allowing for visual identification of trends within a normal distribution.
Characteristics
Measures of Central Value
In a normal distribution, the mean, median, and mode are all equal and located at the center of the distribution.
Measures of Dispersion
Standard deviation is a key measure of dispersion in normal distribution, indicating how much data varies from the mean.
Coefficient of Variation
The coefficient of variation is the ratio of the standard deviation to the mean, useful for comparing variability between different datasets.
Methods of Computation
Methods include calculating mean and standard deviation directly from data, or using software for larger datasets.
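The coefficient of variation makes variability comparable across datasets measured on different scales, as a short sketch with invented measurements shows:

```python
import statistics

# Two hypothetical datasets on different scales.
weights_g = [50, 52, 48, 51, 49]
lengths_cm = [5.0, 5.4, 4.6, 5.2, 4.8]

def cv(data):
    # Coefficient of variation: sd / mean, reported here as a percentage.
    return statistics.stdev(data) / statistics.fmean(data) * 100

# The raw standard deviations are not comparable (grams vs. cm),
# but the CVs are: here the lengths are relatively more variable.
cv_w = cv(weights_g)
cv_l = cv(lengths_cm)
```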
Basis of Statistical Inference
Normal distribution forms the foundation for many statistical inference methods and is crucial for hypothesis testing.
Sampling Distribution
The sampling distribution is the distribution of sample means. When samples are taken from a population, the sampling distribution of the mean can often be approximated by a normal distribution.
Standard Error
The standard error measures how far the sample mean of the data is likely to be from the true population mean, decreasing as sample size increases.
Testing of Hypothesis
Statistical hypothesis testing utilizes normal distribution to determine the validity of a null hypothesis based on sample data.
Null Hypothesis
The null hypothesis is a statement that there is no effect or difference, serving as the basis for statistical testing.
Type I and Type II Errors
A Type I error occurs when a true null hypothesis is rejected (a false positive, with probability α, the significance level). A Type II error occurs when a false null hypothesis is not rejected (a false negative, with probability β); the power of a test is 1 − β.
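Because the null hypothesis is true by construction in the simulation below, every rejection is a Type I error, and the long-run rejection rate should approximate α = 0.05. The data are simulated, not from the syllabus:

```python
import math
import random

random.seed(1)  # fixed seed so the simulation is reproducible

def z_test_rejects(n, crit=1.96):
    # Draw a sample from N(0, 1), where H0 (mu = 0) is true,
    # and check whether the two-sided z-test rejects it at alpha = 0.05.
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)   # known sigma = 1
    return abs(z) > crit

trials = 2000
rejections = sum(z_test_rejects(30) for _ in range(trials))
type_i_rate = rejections / trials   # should be close to alpha = 0.05
```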
Tests of significance for large and small samples based on normal, t and z distributions with regard to mean, variance, proportions and correlation coefficient; chi-square test of goodness of fit; contingency tables; χ² test for independence of two attributes; Fisher and Behrens' d test; 2 × 2 table; testing heterogeneity; r × c table; chi-square test in genetic experiments; partition of χ²; Emerson's method.
Tests of significance for large and small samples
Normal Distribution
The normal distribution is a continuous probability distribution characterized by its bell-shaped curve. Tests of significance using this distribution involve z-tests for large sample sizes (n > 30) to determine if sample means significantly differ from population means.
t Distribution
The t distribution is used for smaller sample sizes (n ≤ 30) and is characterized by its heavier tails compared to the normal distribution. t-tests are employed to compare sample means and assess significance in mean differences.
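The one-sample t statistic for a small sample can be computed directly: t = (x̄ − μ₀) / (s / √n), compared against a t critical value with n − 1 degrees of freedom. The enzyme-activity values below are hypothetical:

```python
import math
import statistics

# Hypothetical small sample (n = 6) of enzyme activities; test H0: mu = 10.
sample = [10.8, 11.2, 9.9, 10.5, 11.0, 10.6]
mu0 = 10.0

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
t = (mean - mu0) / (sd / math.sqrt(n))   # compare with t critical value, df = n - 1
```

Here t ≈ 3.59 exceeds the two-sided 5% critical value for df = 5 (2.571), so H0 would be rejected at the 5% level.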
Z Distribution
Z tests are applicable when the population variance is known. The method involves calculating the z statistic to test hypotheses about population means or proportions.
Variance Testing
Tests such as the F-test assess the equality of variances of two populations. The F-test is also central to ANOVA, where it determines whether means from different groups differ significantly.
Proportions
Proportion tests, such as z-tests for proportions, evaluate if the observed proportion in a sample significantly differs from a hypothesized proportion.
Correlation Coefficient
The significance of the correlation coefficient (r) can be evaluated using statistical tests that determine if the observed correlation reflects a true relationship in the population.
Chi-square Test of Goodness of Fit
This test evaluates how well observed categorical data fit an expected distribution. It compares the frequencies of observed categories to those expected.
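The goodness-of-fit statistic is χ² = Σ (O − E)² / E over the categories. A sketch with a hypothetical Mendelian 3 : 1 segregation:

```python
# Hypothetical F2 counts: 290 dominant, 110 recessive out of 400,
# tested against the expected 3 : 1 ratio (300 : 100).
observed = [290, 110]
expected = [300.0, 100.0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi2 = 100/300 + 100/100 = 1.333, below the critical value 3.841
# (df = categories - 1 = 1, alpha = 0.05), so the 3:1 ratio is not rejected.
```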
Contingency Tables
Contingency tables summarize the relationship between two categorical variables. The chi-square test assesses independence between these variables.
Chi-square Test for Independence
This procedure determines whether two categorical variables are independent by comparing observed frequencies with expected frequencies.
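For a contingency table, the expected count in each cell under independence is (row total × column total) / grand total. A sketch with a hypothetical 2 × 2 table:

```python
# Hypothetical 2 x 2 table: treatment (rows) vs. recovery outcome (columns).
table = [[30, 20],
         [10, 40]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
grand = sum(row_tot)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_tot[i] * col_tot[j] / grand   # expected count under independence
        chi2 += (table[i][j] - e) ** 2 / e
# Compare with 3.841 (df = (2 - 1)(2 - 1) = 1, alpha = 0.05):
# here chi2 ~ 16.7, so independence of the two attributes is rejected.
```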
Fisher's Exact Test
Utilized for small sample sizes, Fisher's Exact Test provides a method for determining the significance of the association between two categorical variables.
Behrens-Fisher Problem
This problem arises when comparing means from two populations with unknown and possibly unequal variances. Approximate solutions, such as Welch's t-test, address it.
Heterogeneity Testing
Utilized to assess whether observed differences across groups or studies are greater than would be expected by chance. Relevant for meta-analysis.
X² Test in Genetic Experiments
The chi-square test is useful in genetics for comparing expected frequencies with observed frequencies in trait inheritance.
Emerson's Method
A statistical method for genetic data analysis that accounts for complex inheritance patterns and provides a framework for hypothesis testing.
Tests of significance: t tests, F tests; analysis of variance: one-way classification, two-way classification; CRD, RBD, LSD. Spreadsheets: data entry, mathematical functions, statistical functions, graphics display, printing spreadsheets, use as a database; word processors; databases; statistical analysis packages; graphics/presentation packages.
Tests of Significance
t-Tests
Used to determine if there is a significant difference between the means of two groups. The main forms are the independent samples t-test and the paired samples t-test. Commonly used in medical research, psychology, and other fields to compare groups.
F-Tests
Used to compare variances between two populations and to assess the overall fit of a model. Frequently used in ANOVA and regression analysis.
Analysis of Variance (ANOVA)
A statistical method used to determine if there are significant differences between the means of three or more groups.
One-Way Classification
Involves one independent variable or factor. Used when comparing means across multiple groups.
Two-Way Classification
Involves two independent variables or factors. Enables understanding of interaction effects between factors.
Completely Randomized Design (CRD)
An experimental design where subjects are randomly assigned to different treatments. Ensures that treatment effects can be attributed to the treatments themselves.
Randomized Block Design (RBD)
An experimental design that groups similar experimental units into blocks and randomizes treatments within blocks. Used to reduce variability among experimental units.
Least Significant Difference (LSD)
A post-hoc test used after ANOVA to determine which means are significantly different. Helps in identifying specific group differences.
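The one-way ANOVA described above partitions the total variation into between-group and within-group sums of squares and forms F = MS_between / MS_within. A minimal sketch with made-up yields under three treatments:

```python
import statistics

# Hypothetical yields under three treatments (one-way, equal group sizes).
groups = [
    [5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0],
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)          # df = k - 1
ms_within = ss_within / (n_total - k)      # df = n - k
F = ms_between / ms_within                 # compare with F critical value (k - 1, n - k)
```

With these numbers the group means are 6, 9 and 12, giving F = 27, far above typical critical values, so the group means differ significantly.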
Spreadsheets
Data Entry
Input of data into a spreadsheet application, e.g. Excel or Google Sheets.
Mathematical Functions
Functions to perform basic arithmetic and calculations, e.g. SUM, AVERAGE, COUNT.
Statistical Functions
Functions that provide statistical analyses, e.g. AVERAGE, STDEV, TTEST.
Graphics Display
Visualization tools within spreadsheets: charts, graphs, and plots.
Printing Spreadsheets
Outputting spreadsheet data in printed form for reports and presentations.
Use as a Database
Utilizing spreadsheets for database functions: storing, sorting, and analyzing data.
Word Processors
Software for creating and editing text documents such as reports, documentation, and essays.
Databases
Structured collections of data, used for data management and retrieval.
Statistical Analysis Packages
Software specifically designed for statistical analysis, e.g. SPSS, R, SAS.
Graphics/Presentation Packages
Tools for creating visual representations of data, e.g. PowerPoint, Canva.
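The common spreadsheet functions mentioned above (SUM, AVERAGE, COUNT, STDEV) have direct equivalents in statistical software. A minimal Python sketch over a hypothetical column of values:

```python
import statistics

# Hypothetical column of values, as in a spreadsheet range A1:A5.
column = [4, 8, 15, 16, 23]

total = sum(column)                  # spreadsheet SUM
average = statistics.fmean(column)   # AVERAGE
count = len(column)                  # COUNT
sd = statistics.stdev(column)        # STDEV (sample standard deviation)
```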
