Bioinformatics

  • Created by: rosieevie
  • Created on: 22-01-17 19:33

Uses of Bioinformatics

Bioinformatics - storage, manipulation and analysis of biological sequences e.g. DNA, proteins

  • Sequence databases and analytical tools
  • Genome assembly, gene finding and gene structure
  • Evolutionary relationships

Computers required because of:

  • Quantity of info - (complexity of species doesn't equal number of base pairs)
  • Complexity of analysis - shotgun sequencing
    • Chop chromosome into fragments of ~500bp - manageable
    • Areas of overlapping sequence form contigs - continuous sequences
    • Consensus sequence (agreed) worked out
    • Can fail if multiple near-identical sections 
  • Time - compare human genome to original consensus sequence (reference)
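The overlap step of shotgun assembly can be sketched roughly as below. This is a minimal greedy illustration, not how real assemblers work: it assumes error-free fragments that all come from the same strand, and the names `overlap`, `greedy_assemble` and `min_len` are made up for this example. Because the fragments are assumed error-free, the consensus step is skipped.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap into a contig."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))  # ['ATGCGTTACGGA']
```

With near-identical repeated sections, several fragments overlap equally well, so a greedy merge like this can join the wrong pieces - the failure case noted above.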

Mapping the Genome

Genome = genes + "junk" DNA

Genes located by mapping known gene sequences

Gene characteristics:

  • Identify repeats e.g. satellites - not genes (too simple and repetitive)
  • Locate ORFs, promoters - genes

Open Reading Frames (ORFs)

ORFs - potential to be translated (continuous stretch of codons without a stop codon)

Genes can be on either strand of DNA and read in either direction

Longer ORF = more likely it's a gene

Introns chop up ORFs = harder to distinguish coincidental fragments

CDS - coding sequence of a gene (treated here as the same as the ORF)

Promoter - characteristic sequence and positioning e.g. TATA box

Find an ORF and look upstream for promoters to see if it is actually a gene
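A rough sketch of scanning for ORFs on both strands and in all three reading frames. It assumes an ORF starts at ATG and runs to the first in-frame stop codon (a common working definition, narrower than the one above), and it ignores introns. The names `find_orfs` and `min_codons` are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for each ATG...stop stretch in all six frames.

    min_codons counts the codons before the stop; shorter stretches are
    skipped because they are more likely to be coincidental.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon in this stretch
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [('+', 2, 'ATGAAATTTGGGTAA')]
```

Raising `min_codons` reflects the rule of thumb above: the longer the ORF, the less likely it is a coincidence.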


Gene Discoveries

  • More than one copy for some genes e.g. salivary amylase

Junk DNA:

  • Pseudogenes - look like a gene that has mutated and no longer codes
  • Viral genes
  • Processed RNA genes - mRNA converted to dsDNA then incorporated into chromosomes - no introns, but with a poly-A tail region
  • ~4000 coincidental ORFs - look like genes but are not

1.5% of the genome is coding


Sequencing Genomes

  • Sequences are digital - all bases are the same distance apart, so they can be aligned position by position, with a limited set of possible matches (4 nucleotides/20 amino acids)
  • Alignments are scored (% identity) - 1 point for each match, 0 for each mismatch
  • Best alignment = highest score
  • Allowing gaps improves the alignment and the score
  • Justification for gaps - insertion/deletion mutations, or an alternatively spliced exon in mRNA
  • Multiple related sequences compared and aligned to deduce ancestral sequence
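The scoring scheme above (1 per match, 0 per mismatch) can be tried out with a small dynamic-programming sketch. The gap cost defaults to 0 here purely as an assumption to match these notes; real alignment tools normally penalise gaps. Function and parameter names are made up for this example.

```python
def ungapped_score(a: str, b: str) -> int:
    """Count matches when the sequences are compared position by position."""
    return sum(x == y for x, y in zip(a, b))


def gapped_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = 0) -> int:
    """Best global alignment score when gaps may be inserted in either sequence."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two characters, gap in b, gap in a
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


a, b = "ACGTACGT", "ACGACGT"  # b has lost one T (a deletion)
print(ungapped_score(a, b))         # 3
print(gapped_score(a, b))           # 7
print(gapped_score(a, b, gap=-1))   # 6
```

One gap realigns everything after the deletion, which is why the gapped score is so much higher than the position-by-position score.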

Homology

Homology - similarity due to descent from a common ancestor

Orthologues - same gene in different species

Paralogues - gene duplication within species = new functions evolve

Homologues - genes that share a common ancestor

Genome similarity may not be reflected in gene similarity


Family Trees

Genetic family trees - compare orthologues

"Junk" DNA family trees - can be better = not affected by natural selection, so changes simply accumulate over time

  • Retroviral DNA inserts in genome - number and where they are
  • Inherited mutations/new mutations

Single nucleotide polymorphisms (SNPs) - particular genome positions that vary
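As a simple illustration of "positions that vary", the sketch below lists the columns of a set of already-aligned, equal-length sequences where more than one base is seen. It is only a toy: it does not consider how common each variant is, and the name `variable_positions` is made up for this example.

```python
def variable_positions(aligned: list[str]) -> dict[int, set[str]]:
    """Map each 0-based position that varies to the set of bases seen there."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return {
        i: set(column)
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) > 1
    }


individuals = ["ACGTAC", "ACGTAC", "ACTTAC", "ACGTAA"]
print(sorted(variable_positions(individuals)))  # [2, 5]
```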


Bioinformatics Sequence Comparisons

Detect:

  • Coding sequence changes (different protein - different function)
  • Gene duplication (new protein - new function)
  • Regulatory region changes (how much protein is made)

Humans - 100 genes with nonsense mutations without consequence (av. person has 30)

49 regions between us and chimps that are different

However, many of the differences are in regulatory regions = genes aren't different, their expression is


Searching Databases

Return highest scoring alignments of query sequence 

e.g. BLAST - Basic Local Alignment Search Tool

Bits score - similarity score (higher = better)

Expect score - number of matches expected by chance
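The bit score and the Expect score are linked: for a bit score S', the number of chance hits is roughly E = m × n × 2^(-S'), where m is the query length and n is the total database length. The sketch below uses that simplified relation; real BLAST uses corrected "effective" lengths, so reported values will differ a little. The function name is made up for this example.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate number of chance hits scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


# A 100-residue query against a 1,000,000-residue database
print(expect_value(20, 100, 1_000_000))  # about 95 chance hits: not significant
print(expect_value(60, 100, 1_000_000))  # far below 1: unlikely to be chance
```

Each extra bit of score halves the number of matches expected by chance, and a larger database raises it.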

* = Match

: = Conservative (change but similar biological properties)

Blank ( ) = Non-conservative (missense mutation)
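A small sketch of how a symbol line like this could be built for two aligned protein sequences. The `SIMILAR_GROUPS` table is a simplified grouping of amino acids with similar properties chosen for this example only; alignment tools use their own tables. The function name is also made up.

```python
# Illustrative groups only (hydrophobic, aromatic, basic, acidic, hydroxyl, amide)
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def symbol_line(a: str, b: str) -> str:
    """Return '*' for a match, ':' for a conservative change, ' ' otherwise."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    symbols = []
    for x, y in zip(a, b):
        if x == y and x != "-":
            symbols.append("*")  # identical residue
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            symbols.append(":")  # different residue, similar properties
        else:
            symbols.append(" ")  # non-conservative change or gap
    return "".join(symbols)


print("MKVLD")
print("MRVGE")
print(symbol_line("MKVLD", "MRVGE"))  # *:* :
```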

