Semester 3: M.Sc. Biotechnology Syllabus 2023-2024

Database concepts, Introduction to internet and its application, Introduction to bioinformatics, Protein and nucleotide databases, Information retrieval from biological databases, Sequence alignment and database searching-similarity searches using BLAST and FASTA
Core Paper-7 BIOINFORMATICS
3
Periyar University
BIOINFORMATICS
- Database Concepts
  - Definition of Database
    A database is an organized collection of data that can be easily accessed, managed, and updated.
  - Types of Databases
    Includes relational databases, NoSQL databases, graph databases, and object-oriented databases.
  - Database Management System (DBMS)
    Software that interacts with end users, applications, and the database.
  - Normalization and Data Integrity
    Techniques to reduce data redundancy and maintain data integrity.
- Introduction to Internet and its Application
  - Definition of Internet
    A global network of interconnected computers that communicate through standardized protocols.
  - Internet Protocols
    Includes TCP/IP, HTTP, FTP, and others that facilitate data exchange.
  - Applications of Internet
    Email, social media, e-commerce, online education, and information retrieval.
- Introduction to Bioinformatics
  - Definition and Scope
    Combines biology, computer science, and mathematics to analyze biological data.
  - Applications of Bioinformatics
    Used in genomics, proteomics, drug discovery, and personalized medicine.
- Protein and Nucleotide Databases
  - Types of Biological Databases
    Includes sequence databases, structural databases, and functional databases.
  - Popular Databases
    Examples include GenBank, UniProt, and PDB.
- Information Retrieval from Biological Databases
  - Retrieval Techniques
    Methods include keyword search, sequence search, and structural search.
  - Tools for Retrieval
    Text-based systems such as Entrez handle keyword retrieval, while BLAST and FASTA retrieve entries by sequence similarity and support further analysis.
- Sequence Alignment and Database Searching
  - Importance of Sequence Alignment
    Used to identify similarities and differences between biological sequences.
  - BLAST and FASTA
    Tools for comparing sequences to databases to find similar sequences.
  - Similarity Searches
    Used to assess evolutionary relationships and functional predictions of sequences.
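BLAST and FASTA both speed up database searches by first looking for short words shared between the query and each database sequence. The sketch below illustrates only that seeding idea and is not how either tool is implemented. The function names, the word length k=3, and the toy database are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_words(query, database, k=3):
    """Rank database entries by the number of k-letter words shared with the query."""
    query_words = kmers(query, k)
    hits = []
    for name, seq in database.items():
        shared = len(query_words & kmers(seq, k))
        if shared > 0:
            hits.append((name, shared))
    # Highest number of shared words first; ties broken by name for stable output.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical toy "database"; real searches run against resources such as GenBank or UniProt.
toy_db = {
    "seq_A": "ATGGCGTACGTTAGC",
    "seq_B": "TTTTTTTTTTTTTTT",
    "seq_C": "GGCGTACGAAACCCG",
}
hits = rank_by_shared_words("GCGTACGTT", toy_db, k=3)
print(hits)
```

Real tools go further than counting words: they extend promising word matches into scored alignments and report statistical significance for each hit.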
Artificial Intelligence Introduction to biological neural network, motivation for artificial neural network ANN, Big data analysis - DNA RNA protein sequence or structure data, gene expression data, protein-protein interaction PPI data, pathway data and gene ontology GO data
Artificial Intelligence in Biotechnology
- Introduction to Biological Neural Networks
  Biological neural networks are composed of interconnected neurons that process and transmit information through electrochemical signals. They function through synapses, where neurotransmitters are released and received, enabling communication between neurons. The understanding of these networks provides insight into their structure and function, which forms a basis for developing artificial neural networks.
- Motivation for Artificial Neural Networks (ANN)
  The development of artificial neural networks is inspired by the functioning of biological neural networks. ANNs aim to replicate the learning, adaptation, and generalization capabilities of biological systems. Their use in pattern recognition, classification, and data prediction shows why they matter: they can process complex data and contribute to advances in artificial intelligence technologies.
- Big Data Analysis in Biotechnology
  Big data analysis involves handling vast amounts of information generated in biological research. It includes diverse datasets such as DNA, RNA, and protein sequences, gene expression profiles, and pathways. Data analysis facilitates understanding biological processes, disease mechanisms, and the discovery of new drugs. Leveraging big data through bioinformatics enhances the accuracy and efficiency of research.
- DNA, RNA, and Protein Sequence Analysis
  Analyzing DNA, RNA, and protein sequences provides critical information on genetic makeup and functional expressions. Techniques such as sequence alignment and similarity searches identify relationships and evolutionary patterns. This analysis is fundamental for understanding gene functions, mutations, and their implications in biotechnology.
- Gene Expression Data
  Gene expression data reveal the activity levels of genes under various conditions, helping researchers understand biological responses and regulatory mechanisms. Techniques like microarrays and RNA sequencing are used for quantifying expression levels and comparing them across different samples.
- Protein-Protein Interaction (PPI) Data
  PPI data are essential for understanding cellular functions and pathways. Interactions between proteins are critical for various biological processes. Tools for predicting and analyzing PPI networks help elucidate the roles of proteins in pathways and identify potential therapeutic targets.
- Pathway Data
  Pathway data outline the biochemical processes involving multiple molecular entities within a cell. By analyzing these pathways, researchers can identify the roles of specific genes and proteins in disease states, providing insights for targeted therapies and drug development.
- Gene Ontology (GO) Data
  Gene Ontology data standardize the representation of gene and gene product attributes across species. GO terms provide insight into the biological processes, cellular components, and molecular functions associated with genes, facilitating effective communication and data comparison in bioinformatics.
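To make the link between biological and artificial neurons concrete, here is a minimal sketch of a single artificial neuron (a perceptron) that learns the logical AND function. It assumes a step activation and the classic perceptron update rule. The function names, learning rate, and epoch count are illustrative choices, not part of the syllabus.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Adjust weights and bias whenever the neuron's output differs from the target."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical AND: (inputs, expected output).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(predictions)
```

The weights play the role of synaptic strengths: learning changes them until the neuron's output matches the examples. Networks used on sequence or expression data stack many such units and use smoother activation functions.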
Sequence alignment basics, match, mismatch, similarity, scoring an alignment, gap penalty, protein vs DNA alignments, Dot-matrix alignment, pairwise alignment. Global and local alignment algorithms, multiple sequence alignment-progressive alignment and Iterative alignment algorithms, consensus sequence, patterns and profiles, Database searching Pairwise alignment based rigorous algorithm
Sequence alignment basics
- Introduction to Sequence Alignment
  Sequence alignment is a method used in bioinformatics to arrange sequences of DNA, RNA, or protein to identify regions of similarity. These similarities may indicate functional, structural, or evolutionary relationships.
- Match, Mismatch, and Similarity
  In sequence alignment, a match occurs when two aligned positions are identical. A mismatch occurs when they are different. Similarity considers a broader context, where some mismatches may still indicate functional or evolutionary relevance.
- Scoring an Alignment
  Alignments can be scored using various metrics that account for matches, mismatches, and gaps. Each match is given a positive score, while mismatches and gaps have penalties, affecting the overall alignment score.
- Gap Penalty
  Gap penalties are crucial in scoring alignments, representing the cost of introducing gaps in the alignment. Different strategies for gap penalties exist, including constant penalties and affine penalties where opening and extending gaps have separate costs.
- Protein vs DNA Alignments
  Protein alignments often rely on scoring matrices like BLOSUM and PAM due to the complexity of amino acid substitutions, while DNA alignments generally use simpler scoring methods based on nucleotide matches.
- Dot-Matrix Alignment
  Dot-matrix alignment is a graphical method for visualizing sequence similarity. It involves plotting two sequences on a matrix and marking matches, allowing for a quick visual comparison.
- Pairwise Alignment
  Pairwise alignment compares two sequences, determining the best way to align them. Algorithms for pairwise alignment include dynamic programming methods such as Needleman-Wunsch for global alignment and Smith-Waterman for local alignment.
- Global and Local Alignment Algorithms
  Global alignment aims to align sequences across their entire length, while local alignment focuses on finding the highest scoring subalignments within the sequences. Each method has its applications depending on the sequences being analyzed.
- Multiple Sequence Alignment
  Multiple sequence alignment (MSA) extends pairwise alignment to compare three or more sequences simultaneously. Algorithms include progressive alignment methods like Clustal and T-Coffee, and iterative methods like MUSCLE and MAFFT, which repeatedly refine an initial alignment.
- Consensus Sequence
  A consensus sequence is derived from multiple aligned sequences, representing the most common base or amino acid at each position. It provides a summary of the genetic information from the aligned sequences.
- Patterns and Profiles
  Patterns represent conserved sequences, while profiles can describe the statistical properties of a set of aligned sequences. These tools help identify functional motifs in large databases.
- Database Searching
  Database searching involves using sequence alignment algorithms to identify similar sequences within large biological databases, facilitating tasks like gene identification and functional annotation.
- Rigorous Pairwise Alignment Algorithms
  Rigorous algorithms for pairwise alignment often involve dynamic programming and backtracking, enabling the identification of optimal alignments. They are foundational for many bioinformatics applications.
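The scoring and dynamic programming ideas above can be sketched as a small Needleman-Wunsch global alignment. This is a minimal illustration that assumes a linear gap penalty and simple match/mismatch scores rather than a substitution matrix such as BLOSUM. The default scores (+1, -1, -2) are arbitrary example values.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell is the best of a (mis)match, a gap in b, or a gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Smith-Waterman local alignment follows the same recurrence with two changes: cell scores are never allowed to drop below zero, and the traceback starts from the highest-scoring cell instead of the corner.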
Bioinformatics for genome sequencing, EST Clustering and analyses, Finding genes in prokaryotic and eukaryotic genomes, Regulatory sequence analysis, Bioinformatics for Genome maps and markers, Bioinformatics for understanding Genome Variation. protein databank and the PDB Sum-SCOP, CATH, DALI and HSSP Visualization of molecular structures- RasMol and Pymol Protein secondary structure prediction, Fold Recognition Transmembrane topology prediction
Bioinformatics for Genome Sequencing
- Genome Sequencing
  Genome sequencing involves determining the complete DNA sequence of an organism's genome. Techniques like Sanger sequencing, next-generation sequencing (NGS), and third-generation sequencing are widely used.
- EST Clustering and Analyses
  EST clustering groups similar expressed sequence tags (ESTs) together, which helps in gene discovery and quantitative expression analysis. This is critical for understanding gene expression in various tissues and developmental stages.
- Finding Genes in Prokaryotic and Eukaryotic Genomes
  Gene finding algorithms identify gene locations in genomic sequences. Prokaryotic genomes are generally smaller and simpler, which makes gene identification more straightforward than in eukaryotic genomes, where genes are interrupted by introns.
- Regulatory Sequence Analysis
  Regulatory sequence analysis studies the sequences that control the expression of genes. Key elements include promoters, enhancers, silencers, and transcription factor binding sites.
- Genome Maps and Markers
  Genome maps provide information on the physical and functional organization of genomes. Bioinformatics tools help annotate genomes and identify genetic markers, facilitating studies in genetics and genomics.
- Genome Variation
  Bioinformatics aids in analyzing variations within genomes, including single nucleotide polymorphisms (SNPs), insertions, deletions, and structural variations. This understanding is crucial for disease studies and personalized medicine.
- Protein Data Bank, PDBsum, SCOP, CATH, DALI and HSSP
  The Protein Data Bank (PDB) is a repository for 3D structures of proteins and other biological macromolecules. Resources such as PDBsum, SCOP, CATH, DALI, and HSSP help summarize, classify, and compare protein structures.
- Visualization of Molecular Structures
  Software such as RasMol and PyMOL allows visualization of protein structures in three dimensions. This is vital for understanding structural biology and the function of proteins.
- Protein Secondary Structure Prediction
  Predicting secondary structures like alpha-helices and beta-sheets is essential for understanding protein function and folding. Various algorithms and machine learning models are employed for predictions.
- Fold Recognition
  Fold recognition predicts the three-dimensional fold of a protein by testing how well its sequence fits known structures. It is useful when the protein of interest has no close sequence homologs of known structure but may still adopt a fold that has already been observed.
- Transmembrane Topology Prediction
  Transmembrane topology prediction estimates the number and orientation of transmembrane segments in membrane proteins. Accurate prediction is important for understanding membrane protein functions.
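As a rough illustration of why gene finding is simpler in prokaryotes, the sketch below scans a DNA string for open reading frames (ORFs): an ATG start codon followed in the same frame by a stop codon. It is a simplification that assumes the forward strand only and ignores alternative start codons, the reverse strand, and statistical gene models. The function name and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ATG...stop ORFs in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this reading frame.
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                # Keep the ORF only if it spans enough codons (stop codon included).
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None
    return sorted(orfs)


# Toy sequence, not a real gene.
print(find_orfs("CCATGAAATTTTAGGG"))
```

In eukaryotic genomes an uninterrupted ORF scan like this is not enough, because coding regions are split into exons separated by introns, so gene finders must also model splice sites.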
Molecular visualization tools. Rasmol, Chime and Spdb viewer. Structure analysis tools. VAST and DALI, Structural biology - Homology modeling, Bioinformatics for micro array designing and transcriptional profiling, Bioinformatics for metabolic reconstruction, Bioinformatics for phylogenetic analysis
Molecular Visualization Tools and Bioinformatics in Biotechnology
- Molecular Visualization Tools
  Molecular visualization tools help in the graphical representation of molecular structures. They are essential for understanding the configurations and behaviors of biomolecules. Tools like RasMol, Chime, and SPDB Viewer are popular in the field.
- RasMol
  RasMol is a molecular visualization program that displays proteins, nucleic acids, and other molecular structures. It allows users to manipulate and rotate the molecule for better spatial understanding.
- Chime
  Chime is a browser plugin that enables the visualization of molecular structures. It supports interactive 3D views and has been widely used for educational purposes and presentations.
- SPDB Viewer
  SPDB Viewer (Swiss-PdbViewer, also known as DeepView) is a desktop application that lets users visualize and analyze protein structures interactively. It offers a range of features, including molecular surface representation and the ability to analyze ligand binding.
- Structure Analysis Tools
  Structure analysis tools assess the similarities and differences between biological structures. VAST and DALI are notable for their role in comparing protein structures.
- VAST
  VAST (Vector Alignment Search Tool) is used to identify structural similarities between proteins by analyzing the 3D geometry of their structures.
- DALI
  DALI (Distance-matrix Alignment) evaluates structural similarity by comparing distance matrices of protein structures, facilitating the understanding of evolutionary relationships.
- Structural Biology
  Structural biology focuses on understanding the molecular structure and functions of biomolecules. Techniques like homology modeling are essential for predicting the structure of unknown proteins.
- Homology Modeling
  Homology modeling involves building a 3D model of a protein based on its similarity to a known structure. It is fundamental in computational biology to study protein function.
- Bioinformatics for Microarray Designing
  Bioinformatics plays a crucial role in designing microarrays, which are used to analyze gene expression patterns across different conditions.
- Transcriptional Profiling
  Transcriptional profiling involves the analysis of gene expression levels in various conditions, helping in the understanding of cellular responses and gene regulation.
- Bioinformatics for Metabolic Reconstruction
  Metabolic reconstruction in bioinformatics allows for the modeling of metabolic pathways in organisms, aiding in the understanding of metabolism and potential biotechnological applications.
- Bioinformatics for Phylogenetic Analysis
  Phylogenetic analysis in bioinformatics utilizes genetic data to construct evolutionary trees, helping scientists understand the evolutionary relationships among species.
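Distance-based tree building starts from pairwise distances between aligned sequences. The sketch below computes the proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which accounts for multiple substitutions at the same site. It assumes the sequences are already aligned and skips any column that contains a gap. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed nucleotide p-distance."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences differing at one of ten sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jukes_cantor(p)
print(p, round(d, 4))
```

A matrix of such distances for all sequence pairs is the input to clustering methods such as UPGMA or neighbor joining, which produce the evolutionary tree.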
Medical application of Bioinformatics. Disease genes, Drug Discovery. History. Steps in drug discovery. Target Identification. Target Validation. QSAR. Lead Identification. Preclinical pharmacology and toxicology. ADME. Drug designing. Rational drug design. Computer aided drug design. Ligand based approach. Target based approach
Medical Applications of Bioinformatics
- Disease Genes
  Bioinformatics plays a crucial role in identifying and analyzing disease-related genes. By comparing genomic data from healthy and affected individuals, researchers can pinpoint mutations that contribute to diseases. This helps in understanding the genetic basis of disorders and can lead to the development of targeted therapies.
- Drug Discovery
  The drug discovery process is significantly enhanced by bioinformatics through the analysis of biological data. It allows for the identification of potential drug targets and the design of effective compounds that can interact with these targets.
- History
  The integration of bioinformatics into medicine began in the late 20th century, with advancements in genomics and proteomics. The Human Genome Project significantly accelerated the use of bioinformatics tools in medical research.
- Steps in Drug Discovery
  The drug discovery process typically follows several key steps including target identification, target validation, lead identification, and preclinical development.
- Target Identification
  This initial step involves determining a biological molecule that plays a key role in a disease process. Bioinformatics algorithms help sift through data to identify suitable targets such as proteins or genes.
- Target Validation
  Once a potential target is identified, it must be validated to ensure its role in the disease. This often involves further experimentation and the use of computational models.
- QSAR (Quantitative Structure-Activity Relationship)
  QSAR models correlate chemical structure with biological activity, helping in predicting the effects of new compounds based on their molecular structure.
- Lead Identification
  In this stage, potential drug candidates are identified that show promising activity against the validated target. High-throughput screening and computational docking are commonly used techniques.
- Preclinical Pharmacology and Toxicology
  This step evaluates the drug candidates' efficacy and safety in laboratory settings before they can proceed to clinical trials. Bioinformatics tools can model interactions and predict toxicity.
- ADME (Absorption, Distribution, Metabolism, Excretion)
  Understanding how a drug is absorbed, distributed, metabolized, and excreted is crucial. Bioinformatics helps in modeling these processes, providing insights into drug behavior in the body.
- Drug Designing
  The design of new drugs involves creating compounds based on biological data. Modern techniques are heavily reliant on computer simulations to optimize drug candidates.
- Rational Drug Design
  This approach uses knowledge of a target's structure and its role in the disease to design drugs that act specifically at that site, rather than relying on trial-and-error screening alone.
- Computer-Aided Drug Design (CADD)
  CADD involves using computational methods to facilitate the drug design process, improving efficiency in developing new therapeutic agents.
- Ligand-Based Approach
  This approach relies on the knowledge of existing drug-like compounds to develop new drugs by modifying their structures to improve efficacy and safety.
- Target-Based Approach
  In this method, the design and discovery of drugs are informed by understanding the drug targets on a molecular level, which is essential for creating more effective treatments.
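A QSAR model in its simplest form is a regression of biological activity on a molecular descriptor. The sketch below fits a straight line by ordinary least squares with one descriptor. The descriptor (a logP-like value), the activity values, and the function names are hypothetical illustration data, not measurements, and real QSAR models use many descriptors and proper validation.

```python
def fit_line(descriptor, activity):
    """Ordinary least squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(slope, intercept, value):
    """Predict activity for a new compound from its descriptor value."""
    return slope * value + intercept


# Hypothetical training set: descriptor values and activities for four compounds.
toy_descriptor = [1.0, 2.0, 3.0, 4.0]
toy_activity = [5.1, 5.9, 7.2, 7.8]
slope, intercept = fit_line(toy_descriptor, toy_activity)
print(slope, intercept, predict_activity(slope, intercept, 2.5))
```

Once fitted, the model is used to estimate the activity of compounds that have not yet been synthesized or tested, which is how QSAR supports lead identification.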
