Research Guides: BIO372H5: Molecular Biology: Sequences

Proteins

Search these websites for a specific amino acid sequence and for information on specific proteins.

UniProt Knowledge Database
This is an annotated protein sequence database. Each entry includes the amino acid sequence along with annotations such as associated diseases and the biological processes the protein is involved in.
NCBI Protein Databases
Prosite
Database of protein domains, families and functional sites.
Reference Sequence (RefSeq)
A comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq genomes are copies of selected assembled genomes available in GenBank.

Nucleotides

The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort of three centers: the European Bioinformatics Institute (EBI), the DNA Data Bank of Japan (DDBJ), and GenBank, which is run by the National Center for Biotechnology Information (NCBI) at the NIH.

Nucleotide
A collection of nucleotide sequences gathered from several sources, including GenBank, RefSeq, and the Protein Data Bank (PDB).
GenBank Sample Record
Click on any link in this sample record to see a detailed description of that data element or field.

Genes

Gene
Gene is a database of known and predicted genes. It focuses on completely sequenced genomes and on genes and genomes that are under active research, and it integrates information from a wide range of species.
BioGene
A web-based tool for learning about gene function. Enter a gene symbol or gene name, for example "CDK4" or "cyclin dependent kinase 4", and BioGene will retrieve a summary of the gene's function along with references describing that function. BioGene was produced in affiliation with the Computational Biology Center at Memorial Sloan-Kettering Cancer Center, using primary information from Entrez Gene at NCBI.

Genetic Code

A few examples of genetic code apps for different platforms. Note that these are different apps, not versions of the same app.
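Whatever their interface, genetic code apps look up each three-nucleotide codon in a codon table. As a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and translation in the first reading frame, the Python below builds that 64-entry table and translates a DNA or RNA sequence. The names `CODON_TABLE` and `translate` are hypothetical, and returning `X` for an unrecognized codon is an illustrative choice rather than the behavior of any particular app.

```python
# Standard genetic code (NCBI translation table 1), assumed here.
# Codons are enumerated in T, C, A, G order, first base outermost.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate a DNA or RNA sequence in the first reading frame.

    '*' marks a stop codon; 'X' marks a codon not in the table
    (e.g. one containing N). A trailing partial codon is ignored.
    """
    dna = seq.upper().replace("U", "T")  # treat RNA input as DNA
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if to_stop and amino_acid == "*":
            break  # stop translating at the first stop codon
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))                # MAIVMGR*KGAR*
print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", to_stop=True))  # MAIVMGR
```

Packing the 64 amino acids into one string keeps the table compact; other organisms and organelles use variant codes, which would need a different `AMINO_ACIDS` string.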

microRNA

miRDB
An online database for miRNA target prediction and functional annotations. miRDB hosts predicted miRNA targets in five species: human, mouse, rat, dog and chicken.
miRBase
A searchable database of published miRNA sequences and annotation.

FASTA

Most sequence tools require the sequence to be entered in FASTA format. This text-based format uses single-letter codes to represent nucleotides or amino acids. The first line, called the header, always starts with a greater-than symbol (>) and describes the sequence. The remaining lines contain the sequence itself. Click here for a short description of the format and a list of the single-letter codes.
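A minimal sketch of reading FASTA text in Python, assuming the simple layout described above: one `>` header line per record, followed by one or more sequence lines. The function name `parse_fasta` and the example records are hypothetical; for real work, a maintained library such as Biopython's `SeqIO` module is likely a safer choice.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    The leading '>' is removed from each header, and the sequence lines
    that follow a header are joined into one string.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))  # finish previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))  # finish the last record
    return records


# Hypothetical example records (not real database entries).
example = """>seq1 hypothetical protein fragment
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA fragment
ATGCGT
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
# seq1 hypothetical protein fragment 20
# seq2 hypothetical DNA fragment 6
```

Joining the sequence lines matters because FASTA files usually wrap long sequences over many lines.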

Sequence Analysis Tools

BLAST a Sequence
The Basic Local Alignment Search Tool (BLAST) compares a query sequence against the sequences in a database. A BLAST search returns sequences similar to the query, along with the statistical significance of each match.
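BLAST is fast because it does not run a full alignment of the query against every database sequence. It first looks for short word matches ("seeds") and only extends alignments around those hits. The Python below is a simplified sketch of the seeding step alone, assuming exact word matches; real BLAST also uses scoring matrices, neighborhood words for proteins, seed extension, and significance statistics. The function name `find_seeds` and `word_size=3` are hypothetical, illustrative choices.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, word_size: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the two sequences share
    an identical word of length word_size."""
    # Index every word in the subject by its start position.
    index = defaultdict(list)
    for s in range(len(subject) - word_size + 1):
        index[subject[s:s + word_size]].append(s)
    # Look up every query word in the index.
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


seeds = find_seeds("ACGTAC", "TTACGTT", word_size=3)
print(seeds)  # [(0, 2), (1, 3), (3, 1)]

# Seeds on the same diagonal (subject_pos - query_pos) are candidates
# for being extended into one longer local alignment.
print(sorted({s - q for q, s in seeds}))  # [-2, 2]
```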
Ensembl Exon View
Ensembl is a genome browser for eukaryotic organisms. Its exon view lets users look at a gene's transcripts and their exon/intron structure.
Open Reading Frame Finder & Restriction Enzyme Cutter
This tool finds the open reading frames (ORFs) in a nucleotide sequence and identifies the restriction enzymes that can cut it.
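Both tasks come down to scanning a sequence. As a minimal sketch, assuming an ORF runs from an ATG start codon to the first in-frame stop codon (TAA, TAG, or TGA) on either strand, and that a restriction site is an exact, non-degenerate recognition sequence, the Python below finds both. The function names, the toy `min_codons` threshold, and the test sequence are hypothetical; dedicated ORF finders typically use much larger minimum lengths and may allow alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[str, int, int, int]]:
    """Return ORFs as (strand, frame, start, end) tuples, frames 0-based.

    An ORF runs from an ATG to the first in-frame stop codon; end is
    exclusive and includes the stop. For the '-' strand, positions refer
    to the reverse-complement sequence. min_codons counts the stop codon.
    """
    orfs = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG
    return orfs


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of an exact recognition site on the given strand."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


dna = "CCATGAAATTTTAGGAATTCAA"  # hypothetical test sequence
print(find_orfs(dna))             # [('+', 2, 2, 14)]
print(find_sites(dna, "GAATTC"))  # [14]  (EcoRI recognition site)
```

Searching one strand is enough for a palindromic site such as EcoRI's GAATTC; a non-palindromic site would also need a search of the reverse complement.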
BRENDA
An enzyme information system representing one of the most comprehensive enzyme repositories.

Sequence Alignment Tools

Multiple Sequence Alignment (MSA) Tools
MSA is the alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences can be studied.
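Building an MSA is the job of tools such as Clustal Omega or MUSCLE, but reading the output can be illustrated simply: strongly conserved columns hint at positions that matter for function or shared ancestry. The Python below is a minimal sketch that scores each column of an already-aligned set of sequences, assuming gaps are written as `-`. The function name `column_conservation`, the scoring rule (fraction of sequences sharing the most common residue), and the example alignment are all hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """For each alignment column, the fraction of sequences that carry the
    most common residue. Gaps ('-') never count as the common residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / n)
    return scores


# Hypothetical, already-aligned sequences.
aligned = ["ACG-T", "ACGAT", "AAG-T"]
print([round(x, 2) for x in column_conservation(aligned)])  # [1.0, 0.67, 1.0, 0.33, 1.0]
```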
Pairwise Sequence Alignment Tools
Pairwise Sequence Alignment is used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences (protein or nucleic acid).
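One classic method for pairwise alignment is the Needleman-Wunsch algorithm for global alignment (used, for example, by EMBOSS Needle). The Python below is a minimal sketch of it, assuming simple match/mismatch scores and a linear gap penalty; real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties. The function name and the default scores are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 2
print(top)     # ACGT
print(bottom)  # A-GT
```

When several alignments share the best score, the traceback order (diagonal first, then a gap in the second sequence) decides which one is reported.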