Large Data Sets

Datasets collect the following:

  • DNA sequences
  • Protein Sequences
  • Protein Structure


  • Gather data 
  • Store data
  • Merge meaningful data
  • Extract information to prove findings 


Applications in the 21st Century 

  • Decipher genomes
  • Medicine
  • Agriculture
  • Environment
  • Synthetic biology

Bioinformatics has replaced much old-fashioned laboratory work, saving time and money.



  • Primary - DNA sequences 
  • Secondary - results of analysis 

This information is held in online databases such as GenBank, EMBL-EBI and DDBJ. These databases exchange data daily.


Protein sequences

Protein sequences are stored as one-letter codes. The sequences fold to form structures. There are now over 100,000 known structures. Programs can now predict and visualise molecules.
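As a small illustration of the one-letter code, the sketch below expands a sequence into the standard three-letter residue names. The function name expand_sequence and the example sequence are made up for illustration.

```python
# Standard one-letter to three-letter amino acid codes (the 20 common residues).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def expand_sequence(sequence):
    """Return the three-letter residue names for a one-letter protein sequence."""
    # A letter outside the 20 standard codes raises KeyError.
    return [ONE_TO_THREE[residue] for residue in sequence.upper()]


# Hypothetical example sequence, not taken from any real protein record.
print("-".join(expand_sequence("MKTAY")))
```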


Next Generation Sequencing

Identify genes and predict structures 



Complete structures:

  • name 
  • protein sequences 
  • protein function & structure 

Unknown/ incomplete structures:

  • similar structure
  • common ancestors 

BLAST is commonly used to rapidly compare new sequences against known ones.
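BLAST starts by looking for short "words" shared between a query and the database entries. The sketch below shows only that word-matching idea, not BLAST itself (there is no scoring matrix, hit extension or statistics). The function names and the toy database are assumptions for illustration.

```python
def kmers(sequence, k=3):
    """Return the set of all length-k words in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_by_shared_words(query, database, k=3):
    """Rank database entries by how many length-k words they share with the query."""
    query_words = kmers(query, k)
    hits = []
    for name, sequence in database.items():
        shared = len(query_words & kmers(sequence, k))
        if shared > 0:
            hits.append((shared, name))
    # Highest number of shared words first.
    return sorted(hits, reverse=True)


# Toy database with made-up sequences.
toy_database = {"seqA": "MKTAYIAKQR", "seqB": "GGGGGGGG", "seqC": "TAYIAK"}
print(rank_by_shared_words("KTAYIA", toy_database))
```

Entries that share no words with the query are dropped, which is what makes this kind of search fast on large databases.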



Used to discover more about genes.

  • Location
  • Function
  • Expression 
  • Similar proteins
  • Disease relationships 
  • Literature


A common genome browser. Primary sequences are used to calculate or predict protein information, including simple characteristics such as weight. Secondary sequences are used to identify patterns, related proteins and enzyme active sites.
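Weight is one of the simple characteristics that can be calculated directly from a primary sequence. A minimal sketch, assuming average residue masses in daltons and an unmodified chain: add up the residue masses and add one water for the free ends. The function name molecular_weight is made up for illustration.

```python
# Average masses (Da) of amino acid residues, i.e. each amino acid minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the two chain ends


def molecular_weight(sequence):
    """Estimate the average molecular weight (Da) of an unmodified protein chain."""
    residues = sequence.upper()
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in residues) + WATER_MASS


# Hypothetical example sequence.
print(round(molecular_weight("MKTAY"), 2))
```

Real proteins often carry modifications, so this should be read as an estimate only.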


Comparing Protein Sequences

Used to see how sequences change during evolution.


Sequence Alignment

Alignment programs line up sequences to identify similar features in columns

  • Not aligned
  • Aligned
  • Alternative alignments

Examples: ClustalW2 and Clustal Omega
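To show what lining up two sequences in columns means, here is a minimal sketch of a pairwise global alignment (the Needleman-Wunsch method). This is not how the multiple-alignment tools named above work internally. The scores (match +1, mismatch -1, gap -1) and the function name global_align are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Align two sequences end to end and return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        diagonal = None
        if i > 0 and j > 0:
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if diagonal is not None and score[i][j] == diagonal:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")  # gap in the second sequence
            i -= 1
        else:
            top.append("-")  # gap in the first sequence
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[len(a)][len(b)]


aligned_a, aligned_b, total = global_align("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(total)
```

When several alignments share the best score, the traceback returns just one of them, which is why alternative alignments of the same sequences can exist.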


Applications of Bioinformatics

Determining sequences - shotgun sequencing 

  • Copy DNA lots of times 
  • Chop up the copies
  • Assemble fragments when they overlap
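The assembly step above can be sketched as a greedy merge: repeatedly join the two fragments with the longest overlap. This is a toy illustration under simplifying assumptions (no sequencing errors, no repeats, one strand only). Real assemblers are far more sophisticated, and the function names and fragments are made up.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Merge fragments pairwise, always taking the largest overlap first."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the remaining contigs apart
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up fragments chopped from one short sequence.
print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]))
```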

Meaning from genome sequence - gene annotation

  • Where are the genes
  • What do they do 

This is done rapidly and on a large scale 
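One simple way to ask "where are the genes" is to look for open reading frames: a start codon (ATG) followed in the same frame by a stop codon. The sketch below scans only the forward strand in three frames and ignores introns, so it is a rough illustration rather than a real annotation pipeline. The function name find_orfs, the min_codons cut-off and the example DNA are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for open reading frames on the forward strand.

    `start` and `end` are 0-based, end-exclusive positions; the stop codon is
    included in the returned sequence and counted towards `min_codons`.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate gene
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next start codon in this frame
    return orfs


# Made-up DNA string containing one short open reading frame.
print(find_orfs("CCATGAAATTTTAGGG"))
```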


Functional Annotation

You need a large number of proteins; from these you can predict 3D structures.


Genome Sequences

Sequences from a single genome are useful for discovering

  • Necessities of life
  • How to control economically important traits of crops
  • Personalised medicine 

Only 1% of our genes have not been found in other species.


1000 Genomes Project

An extensive catalogue of human genetic variation


Drug discovery

It costs on average $800 million to develop a drug. All currently marketed drugs target around 500 gene products. Bioinformatics can help to evaluate new potential targets and antibiotics. This information is easily accessed by pharmaceutical companies. When designing a drug, there are programs which "dock" the proteins and drugs together; this is done through high-throughput screening.
