Semester 3: M.Sc. Biotechnology Syllabus 2023-2024
Database concepts, Introduction to internet and its application, Introduction to bioinformatics, Protein and nucleotide databases, Information retrieval from biological databases, Sequence alignment and database searching-similarity searches using BLAST and FASTA
Core Paper-7 BIOINFORMATICS
3
Periyar University
BIOINFORMATICS
Database Concepts
Definition of Database
A database is an organized collection of data that can be easily accessed, managed, and updated.
Types of Databases
Includes relational databases, NoSQL databases, graph databases, and object-oriented databases.
Database Management System (DBMS)
Software that interacts with end users, applications, and the database.
Normalization and Data Integrity
Techniques to reduce data redundancy and maintain data integrity.
Introduction to Internet and its Application
Definition of Internet
A global network of interconnected computers that communicate through standardized protocols.
Internet Protocols
Includes TCP/IP, HTTP, FTP, and others that facilitate data exchange.
Applications of Internet
Email, social media, e-commerce, online education, and information retrieval.
Introduction to Bioinformatics
Definition and Scope
Combines biology, computer science, and mathematics to analyze biological data.
Applications of Bioinformatics
Used in genomics, proteomics, drug discovery, and personalized medicine.
Protein and Nucleotide Databases
Types of Biological Databases
Includes sequence databases, structural databases, and functional databases.
Popular Databases
Examples include GenBank, UniProt, and PDB.
Information Retrieval from Biological Databases
Retrieval Techniques
Methods include keyword search, sequence search, and structural search.
Tools for Retrieval
BLAST, FASTA, and other bioinformatics tools used for data retrieval and analysis.
Sequence Alignment and Database Searching
Importance of Sequence Alignment
Used to identify similarities and differences between biological sequences.
BLAST and FASTA
Tools for comparing sequences to databases to find similar sequences.
Similarity Searches
Used to assess evolutionary relationships and functional predictions of sequences.
Artificial Intelligence: Introduction to biological neural network, motivation for artificial neural network (ANN), Big data analysis: DNA, RNA, and protein sequence or structure data, gene expression data, protein-protein interaction (PPI) data, pathway data, and gene ontology (GO) data
Artificial Intelligence in Biotechnology
Introduction to Biological Neural Networks
Biological neural networks are composed of interconnected neurons that process and transmit information through electrochemical signals. They function through synapses, where neurotransmitters are released and received, enabling communication between neurons. The understanding of these networks provides insight into their structure and function, which forms a basis for developing artificial neural networks.
Motivation for Artificial Neural Networks (ANN)
The development of artificial neural networks (ANNs) is inspired by the functioning of biological neural networks. ANNs aim to replicate the learning, adaptation, and generalization capabilities of biological systems. Their use in pattern recognition, classification, and data prediction shows why they matter: they can process complex data and contribute to advances in artificial intelligence.
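The learning behaviour described above can be sketched with a single artificial neuron. The example below assumes a perceptron with a step activation and the classic perceptron update rule; the function names, the learning rate, and the logical-AND training set are illustrative choices, not part of the syllabus.

```python
def predict(inputs, weights, bias):
    """Fire (1) when the weighted sum of the inputs exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=10, learning_rate=1.0):
    """Adjust weights and bias whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy training set (an assumption for illustration): logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "->", predict(inputs, weights, bias), "expected", target)
```

The weights play the role of synaptic strengths: they change only when the neuron makes an error, which is a very simplified form of learning from examples.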
Big Data Analysis in Biotechnology
Big data analysis involves handling the vast amounts of information generated in biological research. It covers diverse datasets such as DNA, RNA, and protein sequence or structure data, gene expression data, protein-protein interaction data, pathway data, and gene ontology data. Analyzing these data helps researchers understand biological processes and disease mechanisms and supports the discovery of new drugs. Handling big data with bioinformatics methods improves the accuracy and efficiency of research.
DNA, RNA, and Protein Sequence Analysis
Analyzing DNA, RNA, and protein sequences provides critical information on genetic makeup and functional expressions. Techniques such as sequence alignment and similarity searches identify relationships and evolutionary patterns. This analysis is fundamental for understanding gene functions, mutations, and their implications in biotechnology.
Gene Expression Data
Gene expression data reveal the activity levels of genes under various conditions, helping researchers understand biological responses and regulatory mechanisms. Techniques like microarrays and RNA sequencing are used for quantifying expression levels and comparing them across different samples.
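Comparing expression levels across samples is often summarized as a log2 fold change. The sketch below assumes simple count-like expression values and adds a pseudocount to avoid division by zero; the function name, the pseudocount, and the example values are hypothetical.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control), with a pseudocount added to both values.

    A positive result means higher expression in the treated sample,
    a negative result means lower expression.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values for one gene in two conditions.
print(log2_fold_change(control=3, treated=7))   # up-regulated
print(log2_fold_change(control=7, treated=3))   # down-regulated
```

Real analyses also normalize for sequencing depth and test statistical significance, which this sketch leaves out.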
Protein-Protein Interaction (PPI) Data
PPI data are essential for understanding cellular functions and pathways. Interactions between proteins are critical for various biological processes. Tools for predicting and analyzing PPI networks help elucidate the roles of proteins in pathways and identify potential therapeutic targets.
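A PPI network is commonly represented as an undirected graph in which proteins are nodes and interactions are edges. The sketch below assumes that representation and ranks proteins by their number of partners; the protein names are placeholders, not real data.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def rank_by_degree(network):
    """List proteins from most to fewest interaction partners."""
    return sorted(network, key=lambda p: (-len(network[p]), p))


# Placeholder interaction list (hypothetical identifiers).
pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
network = build_network(pairs)
print(rank_by_degree(network))
```

Highly connected proteins ("hubs") are often the first candidates examined when looking for proteins with central roles in a pathway.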
Pathway Data
Pathway data outline the biochemical processes involving multiple molecular entities within a cell. By analyzing these pathways, researchers can identify the roles of specific genes and proteins in disease states, providing insights for targeted therapies and drug development.
Gene Ontology (GO) Data
Gene Ontology data standardize the representation of gene and gene product attributes across species. GO terms provide insight into the biological processes, cellular components, and molecular functions associated with genes, facilitating effective communication and data comparison in bioinformatics.
Sequence alignment basics, match, mismatch, similarity, scoring an alignment, gap penalty, protein vs DNA alignments, Dot-matrix alignment, pairwise alignment. Global and local alignment algorithms, multiple sequence alignment: progressive alignment and iterative alignment algorithms, consensus sequence, patterns and profiles. Database searching: pairwise alignment-based rigorous algorithms
Sequence alignment basics
Introduction to Sequence Alignment
Sequence alignment is a method used in bioinformatics to arrange sequences of DNA, RNA, or protein to identify regions of similarity. These similarities may indicate functional, structural, or evolutionary relationships.
Match, Mismatch, and Similarity
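In sequence alignment, a match occurs when two aligned positions hold the same residue, and a mismatch occurs when they differ. Similarity is a broader measure: in protein alignments it counts identical residues together with conservative substitutions, where one amino acid is replaced by another with similar physicochemical properties (for example, leucine and isoleucine). Such mismatches may leave function intact and can still indicate an evolutionary relationship.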
Scoring an Alignment
Alignments can be scored using various metrics that account for matches, mismatches, and gaps. Each match is given a positive score, while mismatches and gaps have penalties, affecting the overall alignment score.
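As a minimal sketch of this scoring idea, the function below sums a score over the columns of an already aligned pair. The values +1, -1, and -2 and the use of "-" as the gap character are assumptions chosen for illustration.

```python
def score_alignment(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length column by column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap          # a gap in either sequence
        elif a == b:
            score += match        # identical residues
        else:
            score += mismatch     # different residues
    return score


# Five matches, one mismatch, one gap: 5*1 + (-1) + (-2) = 2
print(score_alignment("GATTACA", "GA-TGCA"))
```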
Gap Penalty
Gap penalties represent the cost of introducing gaps into an alignment and are a key part of the scoring scheme. Different strategies exist, including a linear penalty, in which every gap position costs the same, and an affine penalty, in which opening a gap and extending it have separate costs.
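The difference between a per-position penalty and an affine penalty can be sketched as follows. The convention used here (the first gap position pays the opening cost, each further position pays the extension cost) and the default values are assumptions; alignment programs differ in how they define these parameters.

```python
def linear_gap_penalty(length, per_position=2):
    """Every gap position costs the same amount."""
    return per_position * length


def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Opening a gap is expensive; extending it is cheap."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


for k in (1, 2, 5):
    print(k, linear_gap_penalty(k), affine_gap_penalty(k))
```

With the affine scheme, one long gap costs less than several short gaps of the same total length, which matches the idea that a single insertion or deletion event can span many residues.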
Protein vs DNA Alignments
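Protein alignments usually rely on substitution matrices such as BLOSUM and PAM, because the 20 amino acids differ in how readily they replace one another and each substitution needs its own score. DNA alignments generally use simpler schemes that assign one score to a nucleotide match and one penalty to a mismatch.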
Dot-Matrix Alignment
Dot-matrix alignment is a graphical method for visualizing sequence similarity. It involves plotting two sequences on a matrix and marking matches, allowing for a quick visual comparison.
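A minimal sketch of the dot-matrix idea, assuming the simplest rule of marking every identical pair of residues (practical tools usually add a sliding window and a threshold to reduce noise). The function names are illustrative.

```python
def dot_matrix(seq_a, seq_b):
    """Return a grid where cell [i][j] is True if seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a, seq_b):
    """Draw the matrix as text: '*' marks a match, '.' a non-match."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("GATTACA", "GATGACA"))
```

Regions of similarity show up as diagonal runs of marks; a shifted diagonal suggests an insertion or deletion.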
Pairwise Alignment
Pairwise alignment compares two sequences, determining the best way to align them. Algorithms for pairwise alignment include dynamic programming methods such as Needleman-Wunsch for global alignment and Smith-Waterman for local alignment.
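The global dynamic programming method named above can be sketched as follows. This is a simplified Needleman-Wunsch implementation that assumes a single linear gap penalty and match/mismatch scores of +1/-1; when several alignments are optimal, it returns just one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the score matrix; row 0 and column 0 hold pure-gap prefixes.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```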
Global and Local Alignment Algorithms
Global alignment aims to align sequences across their entire length, while local alignment focuses on finding the highest scoring subalignments within the sequences. Each method has its applications depending on the sequences being analyzed.
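To contrast with global alignment, the sketch below computes only the best local alignment score in the style of Smith-Waterman. It assumes a linear gap penalty and scores of +2 for a match and -1 for a mismatch or gap; the key difference from the global method is that cell values are never allowed to fall below zero.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the score of the best-scoring local alignment of a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# Only the shared "ACG" core aligns well; the flanking regions are ignored.
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

Local alignment is the usual choice when two sequences share only a domain or motif, whereas global alignment suits sequences of similar length that are related along their whole length.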
Multiple Sequence Alignment
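Multiple sequence alignment (MSA) extends pairwise alignment to compare three or more sequences simultaneously. Progressive methods such as Clustal and T-Coffee build the alignment by adding sequences in the order given by a guide tree, while iterative methods such as MUSCLE and MAFFT repeatedly realign subsets of sequences to refine an initial alignment.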
Consensus Sequence
A consensus sequence is derived from multiple aligned sequences, representing the most common base or amino acid at each position. It provides a summary of the genetic information from the aligned sequences.
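Deriving a consensus can be sketched as a majority vote per alignment column. The example assumes the sequences are already aligned to equal length and breaks ties alphabetically, which is an arbitrary choice made only to keep the output deterministic.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most common character in each alignment column."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        highest = max(counts.values())
        # Tie-break alphabetically (an assumption, not a standard rule).
        result.append(min(c for c, n in counts.items() if n == highest))
    return "".join(result)


print(consensus(["ACGT", "ACGA", "ACTT"]))
```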
Patterns and Profiles
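Patterns describe a conserved motif as a short expression listing the residues allowed at each position, as in PROSITE patterns. Profiles describe the statistical properties of a set of aligned sequences, typically as position-specific scores or residue frequencies for every alignment column. Both help identify functional motifs in large databases.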
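The two ideas can be sketched side by side. The pattern below is the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P} written as a regular expression; the profile is a plain per-column frequency table, which is a simplification of the weighted profiles used by real tools. The function names and example sequences are illustrative.

```python
import re
from collections import Counter

# N, then any residue except P, then S or T, then any residue except P.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_motif(protein):
    """Return 0-based start positions where the pattern matches."""
    return [m.start() for m in N_GLYCOSYLATION.finditer(protein)]


def frequency_profile(aligned_sequences):
    """Return, for each column, the fraction of sequences with each residue."""
    total = len(aligned_sequences)
    return [{residue: count / total for residue, count in Counter(column).items()}
            for column in zip(*aligned_sequences)]


print(find_motif("MKNGSAPNPSA"))
print(frequency_profile(["AC", "AT"]))
```

A pattern either matches or does not, while a profile can score how well a new sequence fits each column, which makes profiles more sensitive to distant relatives.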
Database Searching
Database searching involves using sequence alignment algorithms to identify similar sequences within large biological databases, facilitating tasks like gene identification and functional annotation.
Rigorous Pairwise Alignment Algorithms
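Rigorous algorithms for pairwise alignment, such as Needleman-Wunsch and Smith-Waterman, use dynamic programming followed by a traceback step and are guaranteed to find an optimal alignment under the chosen scoring scheme. They are slower than heuristic tools such as BLAST and FASTA, but they are foundational for many bioinformatics applications.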
Bioinformatics for genome sequencing, EST clustering and analyses, Finding genes in prokaryotic and eukaryotic genomes, Regulatory sequence analysis, Bioinformatics for genome maps and markers, Bioinformatics for understanding genome variation. Protein Data Bank and PDBsum, SCOP, CATH, DALI and HSSP. Visualization of molecular structures: RasMol and PyMOL. Protein secondary structure prediction, Fold recognition, Transmembrane topology prediction
Bioinformatics for Genome Sequencing
Genome sequencing involves determining the complete DNA sequence of an organism's genome. Techniques like Sanger sequencing, next-generation sequencing (NGS), and third-generation sequencing are widely used.
EST Clustering and Analyses
EST clustering groups together expressed sequence tags (ESTs) that are similar and therefore likely to derive from the same gene. It helps in gene discovery and quantitative expression analysis, and it is important for understanding gene expression in different tissues and developmental stages.
Finding Genes in Prokaryotic and Eukaryotic Genomes
Gene finding algorithms identify gene locations in genomic sequences. Prokaryotic genomes are generally simpler and smaller, making gene identification more straightforward compared to eukaryotic genomes, which contain introns and are more complex.
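For intron-free prokaryotic sequences, the simplest form of gene finding is a scan for open reading frames (ORFs). The sketch below assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA, and searches the three forward frames only; real gene finders also scan the reverse strand and use statistical models. The function name and minimum length are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon,
    stop codon included; 'end' is exclusive, as in Python slices.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGCC"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

This approach breaks down for eukaryotic genes, where introns interrupt the coding sequence and splice sites have to be predicted as well.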
Regulatory Sequence Analysis
Regulatory sequence analysis studies the sequences that control the expression of genes. Key elements include promoters, enhancers, silencers, and transcription factor binding sites.
Bioinformatics for Genome Maps and Markers
Genome maps provide information on the physical and functional organization of genomes. Bioinformatics tools help annotate genomes and identify genetic markers, facilitating studies in genetics and genomics.
Bioinformatics for Understanding Genome Variation
Bioinformatics aids in analyzing variations within genomes, including single nucleotide polymorphisms (SNPs), insertions, deletions, and structural variations. This understanding is crucial for disease studies and personalized medicine.
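Detecting single-nucleotide differences can be sketched as a column-by-column comparison. The example assumes the sample has already been aligned to the reference with "-" marking gaps, and it reports 1-based positions; the function name and sequences are hypothetical.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each substitution.

    Columns containing a gap are skipped because they represent
    insertions or deletions rather than single-nucleotide changes.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(index + 1, ref, alt)
            for index, (ref, alt) in enumerate(zip(reference, sample))
            if ref != alt and "-" not in (ref, alt)]


print(find_snps("ACGTAC", "ACCTAT"))
```

Calling variants from real sequencing reads also requires handling base quality and read depth, which this sketch does not attempt.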
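Protein Data Bank and PDBsum, SCOP, CATH, DALI, and HSSP
The Protein Data Bank (PDB) is a repository for experimentally determined 3D structures of proteins and other biological macromolecules. Related resources build on it: PDBsum gives pictorial summaries of PDB entries, SCOP and CATH classify protein structures into hierarchies, DALI compares structures with one another, and HSSP (homology-derived secondary structure of proteins) links each PDB structure to an alignment of its homologous sequences.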
Visualization of Molecular Structures
Software such as RasMol and PyMOL allows visualization of protein structures in three dimensions. This is vital for understanding structural biology and the function of proteins.
Protein Secondary Structure Prediction
Predicting secondary structures like alpha-helices and beta-sheets is essential for understanding protein function and folding. Various algorithms and machine learning models are employed for predictions.
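Fold Recognition
Fold recognition (threading) predicts the three-dimensional fold of a protein by fitting its sequence to a library of known structures. It is useful when the protein of interest has no close homolog of known structure that sequence comparison can detect, yet may still adopt a fold that has already been observed.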
Transmembrane Topology Prediction
Transmembrane topology prediction estimates the number and orientation of transmembrane segments in membrane proteins. Accurate prediction is important for understanding membrane protein functions.
Molecular visualization tools. Rasmol, Chime and Spdb viewer. Structure analysis tools. VAST and DALI, Structural biology - Homology modeling, Bioinformatics for micro array designing and transcriptional profiling, Bioinformatics for metabolic reconstruction, Bioinformatics for phylogenetic analysis
Molecular Visualization Tools and Bioinformatics in Biotechnology
Molecular Visualization Tools
Molecular visualization tools provide graphical representations of molecular structures. They are essential for understanding the configurations and behaviors of biomolecules. Tools like RasMol, Chime, and Swiss-PdbViewer (SPDB Viewer) are popular in the field.
RasMol
RasMol is a molecular visualization program that displays proteins, nucleic acids, and other molecular structures. It allows users to manipulate and rotate the molecule for better spatial understanding.
Chime
Chime is a browser plugin that enables the visualization of molecular structures within web pages. It supports interactive 3D views and has been widely used for teaching and presentations.
SPDB Viewer
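SPDB Viewer (Swiss-PdbViewer, also known as DeepView) is a desktop application for interactive visualization and analysis of protein structures. It offers a range of features, including superimposing several structures, molecular surface representation, inspecting ligand binding sites, and preparing homology modeling projects for the SWISS-MODEL server.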
Structure Analysis Tools
Structure analysis tools assess the similarities and differences between biological structures. VAST and DALI are notable for their role in comparing protein structures.
VAST
VAST (Vector Alignment Search Tool) is used to identify structural similarities between proteins by analyzing the 3D geometry of their structures.
DALI
DALI (Distance-matrix Alignment) evaluates structural similarity by comparing distance matrices of protein structures, facilitating the understanding of evolutionary relationships.
Structural Biology
Structural biology focuses on understanding the molecular structure and functions of biomolecules. Techniques like homology modeling are essential for predicting the structure of unknown proteins.
Homology Modeling
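Homology modeling builds a 3D model of a protein from its sequence similarity to a homologous protein of known structure, which serves as the template. It is a fundamental method in computational biology for studying protein function when no experimental structure is available.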
Bioinformatics for Microarray Designing
Bioinformatics plays a crucial role in designing microarrays, which are used to analyze gene expression patterns across different conditions. A central task is selecting probe sequences that are specific to their target genes.
Transcriptional Profiling
Transcriptional profiling involves the analysis of gene expression levels in various conditions, helping in the understanding of cellular responses and gene regulation.
Bioinformatics for Metabolic Reconstruction
Metabolic reconstruction in bioinformatics allows for the modeling of metabolic pathways in organisms, aiding in the understanding of metabolism and potential biotechnological applications.
Bioinformatics for Phylogenetic Analysis
Phylogenetic analysis in bioinformatics utilizes genetic data to construct evolutionary trees, helping scientists understand the evolutionary relationships among species.
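Many tree-building methods start from a matrix of pairwise distances between aligned sequences. The sketch below assumes the simplest measure, the p-distance (the proportion of compared sites that differ), and ignores columns with gaps; the species names and sequences are placeholders.

```python
def p_distance(seq_a, seq_b):
    """Proportion of gap-free aligned sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(sequences):
    """Return {(name_1, name_2): distance} for every pair of named sequences."""
    names = sorted(sequences)
    return {(m, n): p_distance(sequences[m], sequences[n])
            for m in names for n in names}


# Placeholder aligned sequences (not real data).
aligned = {"species_A": "ACGTACGT", "species_B": "ACGTACGA", "species_C": "ACCTTCGA"}
for pair, distance in distance_matrix(aligned).items():
    print(pair, distance)
```

Distance-based methods such as UPGMA and neighbor joining then join the closest sequences first to build the tree.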
Medical application of Bioinformatics. Disease genes, Drug Discovery. History. Steps in drug discovery. Target Identification. Target Validation. QSAR. Lead Identification. Preclinical pharmacology and toxicology. ADME. Drug designing. Rational drug design. Computer aided drug design. Ligand based approach. Target based approach
Medical Applications of Bioinformatics
Disease Genes
Bioinformatics plays a crucial role in identifying and analyzing disease-related genes. By comparing genomic data from healthy and affected individuals, researchers can pinpoint mutations that contribute to diseases. This helps in understanding the genetic basis of disorders and can lead to the development of targeted therapies.
Drug Discovery
The drug discovery process is significantly enhanced by bioinformatics through the analysis of biological data. It allows for the identification of potential drug targets and the design of effective compounds that can interact with these targets.
History
The integration of bioinformatics into medicine began in the late 20th century, with advancements in genomics and proteomics. The Human Genome Project significantly accelerated the use of bioinformatics tools in medical research.
Steps in Drug Discovery
The drug discovery process typically follows several key steps including target identification, target validation, lead identification, and preclinical development.
Target Identification
This initial step involves determining a biological molecule that plays a key role in a disease process. Bioinformatics algorithms help sift through data to identify suitable targets such as proteins or genes.
Target Validation
Once a potential target is identified, it must be validated to ensure its role in the disease. This often involves further experimentation and the use of computational models.
QSAR (Quantitative Structure-Activity Relationship)
QSAR models correlate chemical structure with biological activity, helping in predicting the effects of new compounds based on their molecular structure.
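In its simplest form, a QSAR model is a straight-line fit between one molecular descriptor and a measured activity. The sketch below assumes ordinary least squares with a single descriptor; the data points are synthetic, generated from the line y = 2x + 1 purely to show the mechanics, and do not represent real compounds.

```python
def fit_line(descriptor_values, activities):
    """Least-squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor_values)
    mean_x = sum(descriptor_values) / n
    mean_y = sum(activities) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(descriptor_values, activities))
    variance = sum((x - mean_x) ** 2 for x in descriptor_values)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def predict_activity(descriptor, slope, intercept):
    """Predict the activity of a new compound from its descriptor value."""
    return slope * descriptor + intercept


# Synthetic training data (an assumption for illustration only).
descriptors = [1.0, 2.0, 3.0, 4.0]
activities = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(descriptors, activities)
print(slope, intercept)
print(predict_activity(5.0, slope, intercept))
```

Practical QSAR models use many descriptors and validate predictions on compounds that were held out of the fit.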
Lead Identification
In this stage, potential drug candidates are identified that show promising activity against the validated target. High-throughput screening and computational docking are commonly used techniques.
Preclinical Pharmacology and Toxicology
This step evaluates the drug candidates' efficacy and safety in laboratory settings before they can proceed to clinical trials. Bioinformatics tools can model interactions and predict toxicity.
ADME (Absorption, Distribution, Metabolism, Excretion)
Understanding how a drug is absorbed, distributed, metabolized, and excreted is crucial. Bioinformatics helps in modeling these processes, providing insights into drug behavior in the body.
Drug Designing
The design of new drugs involves creating compounds based on biological data. Modern techniques are heavily reliant on computer simulations to optimize drug candidates.
Rational Drug Design
This approach uses the knowledge of molecular biology and biochemistry to design drugs that are specifically targeted to their action sites.
Computer-Aided Drug Design (CADD)
CADD involves using computational methods to facilitate the drug design process, improving efficiency in developing new therapeutic agents.
Ligand-Based Approach
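This approach relies on knowledge of existing compounds that are active against the target and is used especially when the 3D structure of the target is unknown. New drugs are developed by modifying the structures of known compounds to improve efficacy and safety, often guided by QSAR models.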
Target-Based Approach
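In this method, the design and discovery of drugs are guided by knowledge of the drug target at the molecular level, in particular its 3D structure. Candidate compounds are designed or selected to fit the target's binding site, for example by computational docking, which supports the creation of more effective treatments.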
