- Created by: ellenrobinson
- Created on: 07-02-19 17:29
Large Data Sets
Datasets collect the following:
- DNA sequences
- Protein sequences
- Protein structure
- Gather data
- Store data
- Merge meaningful data
- Extract information to prove findings
Applications in the 21st Century
- Decipher genomes
- Synthetic biology
Bioinformatics has replaced much old-fashioned lab work, saving time and money.
- Primary - DNA sequences
- Secondary - results of analysis
This information has allowed the creation of online databases served over the web, such as GenBank, EMBL-EBI and DDBJ. These resources exchange data daily.
Protein sequences are stored as one-letter codes. The sequences fold to form structures. There are now over 100,000 known structures. Programs can now predict and visualise molecules.
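As a minimal sketch of the one-letter encoding, the snippet below converts three-letter amino acid abbreviations into the standard one-letter codes. The helper name `to_one_letter` is hypothetical and only for illustration.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string."""
    # capitalize() lets "MET", "met" and "Met" all match the table keys.
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


print(to_one_letter(["Met", "Lys", "Trp", "Val"]))  # MKWV
```

Storing one character per residue keeps sequences compact and makes them easy to search and compare as ordinary strings.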
Next Generation Sequencing
Identify genes and predict structures
- protein sequences
- protein function & structure
Unknown/incomplete structures:
- similar structure
- common ancestors
BLAST is commonly used to rapidly compare new sequences against known ones
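BLAST gets its speed by first looking for short exact "word" matches between sequences before examining them more closely. The sketch below illustrates only that word-matching idea; it is not BLAST itself, and the function name `shared_words` and the word size `k=3` are assumptions chosen for illustration.

```python
def shared_words(query, subject, k=3):
    """Return (query_pos, subject_pos, word) for each k-letter word found in both."""
    # Index every k-letter word in the subject by its start positions.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)

    # Look up each word of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


print(shared_words("MKWVTF", "AAKWVTA"))
# [(1, 2, 'KWV'), (2, 3, 'WVT')]
```

Looking words up in an index is much faster than comparing every position of one sequence against every position of the other, which is why this kind of shortcut suits very large databases.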
Used to discover more about genes.
- Similar proteins
- Disease relationships
A common genome browser. Primary sequences are used to calculate or predict simple protein characteristics such as weight. Secondary sequences are used to identify patterns, related proteins and enzyme active sites.
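As an illustration of calculating a simple characteristic from a primary sequence, the sketch below estimates the molecular weight of a protein from its one-letter sequence. The residue masses are commonly quoted average values in daltons and should be checked against a reference table before serious use; the function name `protein_weight` is hypothetical.

```python
# Average residue masses in daltons (each amino acid minus one water).
# Assumed values for illustration; verify against a reference table.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free chain ends


def protein_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified protein chain."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(round(protein_weight("MKWVTF"), 1))
```

This ignores modifications made to the protein after it is built, so it is only a first estimate.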
Comparing protein Sequences
Used to compare how sequences change during evolution.
Databases line up sequences to identify similar features in columns
- Not aligned
- Alternative alignments
Examples: ClustalW2 and Clustal Omega
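Once sequences are lined up, each column can be checked for shared features. The sketch below assumes the sequences have already been aligned (with "-" marking gaps) and simply marks the columns where every sequence has the same residue. It does not compute the alignment itself, which is the job of tools such as Clustal Omega. The function name `column_summary` is hypothetical.

```python
def column_summary(aligned):
    """Mark each column '*' if all sequences share one residue there, else ' '."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    marks = []
    for column in zip(*aligned):
        # Conserved means one distinct character in the column and it is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["MKW-TF", "MKWVTF", "MRW-TF"]
for row in alignment:
    print(row)
print(column_summary(alignment))
```

Columns marked with "*" are unchanged across all the sequences, which hints that those positions may be important to the protein.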
Applications of Bioinformatics
Determining sequences - shotgun sequencing
- Copy DNA lots of times
- Chop up the copies
- Assemble fragments when they overlap
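The assembly step above can be sketched as a simple greedy merge: repeatedly join the two fragments with the longest overlap. This is a toy illustration under stated assumptions (error-free reads, one strand, a minimum overlap of 3 letters), not how real assemblers work at genome scale. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a matching a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
# ['ATGGCGTACCTGA']
```

Copying the DNA many times before chopping it up matters because it makes it likely that fragments from different copies overlap, which is what lets them be stitched back together.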
Meaning from genome sequence - gene annotation
- Where are the genes
- What do they do
This is done rapidly and on a large scale
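One simple way to start answering "where are the genes" is to look for open reading frames: stretches that begin with a start codon (ATG) and run to a stop codon. The sketch below scans only the forward strand in all three reading frames. Real annotation pipelines do far more than this; the function name `find_orfs` and the `min_codons` cut-off are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for ATG...stop stretches on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon (three letters) at a time.
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if enough codons precede the stop codon.
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None
    return orfs


print(find_orfs("CCATGGCGTACTGACC"))
# [(2, 14, 'ATGGCGTACTGA')]
```

Because it is just string scanning, a search like this can be run rapidly across a whole genome, and the candidate genes it finds can then be compared with known sequences to suggest what they do.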
You need a large number of proteins; from these you can predict 3D structures
Sequences from a single genome are useful for discovering
- Necessities of life
- Control economically important traits of crops
- Personalised medicine
Only 1% of our genes have not been found in other species.
1000 Genomes Project
An extensive catalogue of genetic variation
It costs on average $800 million to develop a drug. All currently marketed drugs target around 500 gene products. Bioinformatics can help to evaluate new potential targets and antibiotics. This information is easily accessed by pharmaceutical companies. When designing a drug, there are programs which "dock" proteins and drugs together; this is done through high-throughput screening.