Blitvak Week 11

From LMU BioDB 2015

Revision as of 08:26, 15 November 2015

Initial Project Work

Work done on 11/10

  • A possible MOD covering B. cenocepacia was found: Burkholderia Genome Database
  • A test search was conducted for "Burkholderia cenocepacia J2315"

11/12

  • The possible MOD was accessed and a test search was conducted:

Test search BL.png

  • It was noticed that most "locus tags" were in the format BCAL####
  • Some of the tags included an A at the end
  • It was noticed that some genes had the prefix BCAM or BCAS instead of BCAL
  • Another test search was performed to observe the total number of genes with an ID that begins with "BCA":

BL testSearch 2.png

    • This search yielded 7341 records (potentially, 7341 different genes)
  • A similar search was conducted using the term "BCAL", which should cover all genes with IDs that start with "BCAL"
    • This search yielded 3603 records
  • Another search was done using "BCAM"
    • This search yielded 2859 records
  • A search was done which covered the term "BCAS"
    • This search yielded 779 records
  • The last three searches were summed, yielding 7241 records (100 are unaccounted for)
  • An advanced search was done in order to find the 100 records with an unknown starting pattern:

Final TestSearch BL.png

    • It was found that the remaining 100 records started with pBCA (a search for this prefix returned exactly 100 results)
    • In this last search, it was noticed that most of the tags had only three digits and that some ended with a lowercase a
  • Reviewing the website, it was noticed that the database interprets "%" as a wildcard. A search was then done using the term "BCA%a" to count the records that ended with an "A". Some of these results included a lowercase r before the numbers; looking at the other columns in the search results, such as "Product name", it was realized that the tags containing the r corresponded to genes encoding the different types of tRNA (each corresponding to a particular amino acid)
  • In the latest news section of the database it was stated that there is an updated beta version of the MOD available (http://beta.burkholderia.com/); this updated website will be used in future work
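The prefix bookkeeping and the "%" wildcard described above can be sketched in Python. This is a minimal illustration, not the database's own query code: it assumes the site's "%" behaves like the SQL LIKE wildcard (any run of characters), that "_" matches a single character as in SQL, and that matching is case-insensitive. It also assumes the plain "BCA" search behaves like "%BCA%", which would explain why the pBCA tags were counted in it. The helper `like_to_regex` and every locus tag in the `tags` list are hypothetical examples in the formats noted above, not real search results; only the record counts come from the notes.

```python
import re

# Record counts reported by the searches above.
reported = {"BCA": 7341, "BCAL": 3603, "BCAM": 2859, "BCAS": 779}
unaccounted = reported["BCA"] - (reported["BCAL"] + reported["BCAM"] + reported["BCAS"])
print(unaccounted)  # 100


def like_to_regex(pattern: str, case_insensitive: bool = True) -> re.Pattern:
    """Translate a LIKE-style pattern into a regex ('%' = any run, '_' = one character).

    Assumption: the database follows SQL LIKE semantics; only '%' was observed in the notes.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE if case_insensitive else 0)


# Hypothetical locus tags in the formats noted above (not real database records).
tags = ["BCAL0001", "BCAL0002a", "BCAM0856", "BCAS0001", "BCAMr001a", "pBCA001"]

ends_in_a = like_to_regex("BCA%a")
print([t for t in tags if ends_in_a.fullmatch(t)])  # ['BCAL0002a', 'BCAMr001a']

# A plain "BCA" search behaving like '%BCA%' would also pick up the pBCA tags.
contains_bca = like_to_regex("%BCA%")
print([t for t in tags if contains_bca.fullmatch(t)])
```

Using `fullmatch` makes the pattern anchored at both ends, which is how LIKE patterns are applied to a whole field value.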

Preparation for Genome Paper Presentation

Unfamiliar Biology Terms from the Manuscript

  • Saprophyte: An organism that absorbs, feeds upon, or grows upon decaying organic matter or waste (matter could be originally from animal or plant sources)
  • Orthologous gene/Ortholog: A gene found in two or more species that descends from a single gene in their common ancestor (orthologous genes can be traced back to that common ancestor)
  • CDSs: abbreviation for "coding sequences" (the region of a gene's DNA, from start codon to stop codon, that is translated into protein)
  • Replicon: Unit of DNA that contains a DNA replication origin, a termination point, and the potential for self-replication OR a linear or circular segment of DNA/RNA which replicates (sequentially) as a unit
  • Concatenation (in relation to genetics): The joining of two DNA fragments (in a lab, physically, or in software)
  • Genomic Island: A gene cluster (>8 kb in size) in a bacterial or archaeal genome that likely originated through horizontal gene transfer. Genomic islands carry genes that represent notable adaptations of environmental and medical interest (these genes play an important role in the evolution, or population change, of the microbes that carry the islands)
  • Mobile genetic elements (MGEs): DNA sequences that can move to new sites within a genome or between genomes; transposons, which can insert themselves at other sites within chromosomes, are a common example. In bacteria, transposable elements come in simple forms (which carry only the genes needed for insertion) and complex forms (which carry genes in addition to those needed for insertion).
  • Rearrangement: Structural change of a chromosome that leads to a change in the loci order
  • Prophage: Genome of a lysogenic bacteriophage that has come to be incorporated into the chromosome of the bacterial host (prophage is replicated along with the host chromosome)
  • Efflux system: An active transport system (located in the cytoplasmic membrane) that pumps substrates, such as antibiotics, out of the bacterial cell
  • Peritrichous: Having appendages (in bacteria, flagella) distributed uniformly over the entire cell surface
  • Fimbrial: Relating to fimbriae, short filamentous projections on a bacterial cell that are used for adherence to other bacterial cells or to animal cells (not for motility)

Article Outline

Link to Article: http://jb.asm.org/content/191/1/261.long

Importance of the work

  • B. cenocepacia is a highly clinically relevant member of the B. cepacia complex (BCC), a group of hardy (highly antibiotic-resistant) gram-negative bacteria that typically reside in water or soil (18 different species, some of which are plant or human pathogens). B. cenocepacia is an opportunistic pathogen that causes lung infections in CF (cystic fibrosis) patients; these infections are extremely difficult to treat because of the organism's high level of antibiotic resistance, and infection is therefore tied to increased mortality and a decline in lung function. The manuscript covers the genome of B. cenocepacia J2315, a member of an epidemic lineage that emerged recently (1990s) and was extremely transmissible, especially between people with CF. This lineage, known as the ET12 epidemic strain, belongs to the IIIA subgroup of B. cenocepacia (subgroups were defined phylogenetically through analysis of the recA gene). Unlike strains from the other subgroups, IIIA strains are rarely encountered in the natural environment, suggesting that they have strongly adapted to a host-associated pathogenic lifestyle rather than that of a soil saprophyte. Many virulence markers are also encountered more frequently in IIIA strains than in other subgroups; in addition, ET12 isolates are known to have a cable pilus that allows binding to molecules in the host environment, such as mucins (which are abundant in the lung). J2315 itself is an isolate derived from a CF patient, and it exhibits high levels of antibiotic resistance.
The value of the genomic analysis of J2315 is that it should help elucidate the factors responsible for the strain's success in infecting CF patients; it should also help explain how members of the ET12 lineage recently adapted to occupying a niche in human hosts instead of living in the soil as saprophytes. In short, J2315 is a unique and highly significant pathogen in the context of CF, as it possesses properties that allow it to thrive in the lung environment even better than related strains and subgroups; its genome sequence will serve as an essential resource for future investigations into J2315 and the disease caused by Burkholderia cenocepacia.

Methods Employed in the Study

Sequencing
  • B. cenocepacia strains used in this study: K56-2, BC7, LMG 13307 (BCC0162), CEP0791 (BCC0077), LMG 13320 (BCC0179), FC0504 (BCC0313), LMG 18827 (BCC0016), BCC1261, CEP0826 (BCC0222).
  • J2315 was grown in broth culture and harvested by centrifugation. Bacterial pellets were resuspended in a lysis solution; the lysate was incubated, and proteins and polysaccharides were precipitated and removed by centrifugation. DNA was then recovered from the cleared lysate by ethanol precipitation
    • Note: The DNA extraction protocol was not described directly in the genome paper; the authors stated that "DNA was extracted exactly as described previously" and cited: Identification and characterization of a novel DNA marker associated with epidemic Burkholderia cepacia strains recovered from patients with cystic fibrosis
  • Sequence data were generated from genomic shotgun libraries (M13mp18 and pUC18 libraries); shotgun sequencing produced 215,165 end sequences, representing 11.9-fold coverage.
  • The sequence was annotated using Artemis, and initial coding sequence predictions were made with Orpheus, Glimmer2, and EasyGene. These predictions were combined and further refined by comparison to nonredundant protein databases using BLAST/FASTA, positional base preference methods, and codon usage analysis
  • The whole DNA sequence was also translated in all six reading frames and compared against UniProt using BLASTX, in order to identify any coding sequences that earlier steps had missed
  • Protein structural motifs were identified using Pfam and Prosite, transmembrane domains using TMHMM, and signal sequences using SignalP version 2.0
  • Stable RNAs and tRNAs were identified through the use of Rfam and tRNAscan-SE, respectively
  • rRNAs were identified by BLASTN alignment against defined rRNAs from the EMBL nucleotide database
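The six-frame step mentioned above (BLASTX translates a nucleotide query in all six reading frames before comparing it against proteins) can be illustrated with a short Python sketch. This only shows the translation step, not BLASTX itself or the authors' pipeline. It assumes the standard genetic code (NCBI translation table 1); the bacterial code (table 11) assigns the same amino acids to every codon and differs only in which codons can serve as alternative starts. Function names here are illustrative.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, to read the opposite strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stop codons, 'X' any codon with non-ACGT bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Return translations of frames +1..+3 (given strand) and -1..-3 (reverse strand)."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


for frame, protein in six_frame_translation("ATGAAATAG").items():
    print(frame, protein)
```

For the toy input, frame +1 reads ATG AAA TAG and gives "MK*"; a real search would then compare each frame's peptides against a protein database such as UniProt.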
Genome Sequence Comparison
  • The J2315 genome was compared to B. vietnamiensis strain G4, B. ambifaria strain AMMD, Ralstonia solanacearum strain GMI1000, B. thailandensis strain E264, B. mallei strain ATCC 23344, B. pseudomallei strain K96243, B. contaminans strain 383, B. xenovorans strain LB400, and B. cenocepacia strains AU1054 and HI2424
  • Artemis Comparison Tool was used to support the comparison of genome sequences; it allowed the visualization of TBLASTX and BLASTN comparisons.
  • FASTA, with manual curation, was utilized to identify orthologous proteins as "reciprocal best matches"
  • Inactivating mutations in pseudogenes were checked against the original sequencing data
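The "reciprocal best match" criterion can be sketched as follows: gene a in genome A and gene b in genome B are treated as candidate orthologues when b is a's top-scoring hit and a is b's top-scoring hit. This is a minimal sketch of the general idea, not the authors' FASTA workflow (which also involved manual curation); the similarity scores and gene names below are hypothetical placeholders.

```python
def best_hits(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject (ties go to the first listed)."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_matches(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Keep (a, b) pairs where each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (e.g. alignment scores) in both search directions.
a_vs_b = {
    "geneA1": {"geneB1": 950, "geneB2": 120},
    "geneA2": {"geneB1": 400, "geneB2": 80},  # also hits geneB1, but less strongly
    "geneA3": {"geneB2": 300},
}
b_vs_a = {
    "geneB1": {"geneA1": 940, "geneA2": 410},
    "geneB2": {"geneA3": 290, "geneA1": 110},
}

print(reciprocal_best_matches(a_vs_b, b_vs_a))  # [('geneA1', 'geneB1'), ('geneA3', 'geneB2')]
```

Here geneA2 is excluded even though its best hit is geneB1, because geneB1's own best hit is geneA1; requiring the match in both directions is what filters out such paralogue-like hits.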
PCR
  • PCR amplification was conducted using primers targeting BCAL3517 (annealing temperature 63 to 68°C), BCAL3223 (60 to 68°C), BCAL3125 (60°C), BCAM2228 (68°C), and BCAM0856 (68°C)
  • PCR was done using Platinum Pfx DNA polymerase with 1/10 enhancer solution
  • Amplification: First 94°C for 10 min, then 40 cycles of 94°C for 30 seconds, and 68°C for 1 min per kilobase, then a final extension of 10 min at 68°C
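To make the "1 min per kilobase" extension rule concrete, the cycling program above can be written out as data and its total hold time estimated for a given amplicon length. This is a rough sketch that models only the program as written in these notes; it ignores ramp times between temperatures, and the amplicon lengths used below are hypothetical, since the notes do not give them.

```python
# Thermocycling program as written above, as (temperature in °C, hold time in seconds).
INITIAL_DENATURATION = (94, 10 * 60)
FINAL_EXTENSION = (68, 10 * 60)
CYCLES = 40


def cycle_steps(amplicon_bp: int) -> list[tuple[int, float]]:
    """One cycle: 94 °C for 30 s, then 68 °C for 1 min per kilobase of amplicon."""
    extension_s = 60 * amplicon_bp / 1000
    return [(94, 30), (68, extension_s)]


def total_hold_seconds(amplicon_bp: int) -> float:
    """Sum of all hold times in the program; ramping between temperatures is ignored."""
    per_cycle = sum(seconds for _, seconds in cycle_steps(amplicon_bp))
    return INITIAL_DENATURATION[1] + CYCLES * per_cycle + FINAL_EXTENSION[1]


for length_bp in (500, 1000, 2500):  # hypothetical amplicon lengths
    print(length_bp, "bp:", total_hold_seconds(length_bp) / 60, "min")
```

For a 1 kb amplicon this gives 10 min + 40 × 1.5 min + 10 min = 80 min of hold time, so longer targets are dominated by the per-cycle extension step.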
Sequence and Annotation Deposit
  • Complete genomic sequence of B. cenocepacia strain J2315 was placed in the EMBL database (accession numbers: AM747720, AM747721, AM747722, and AM747723)

Figure/Table Results

FIG. 1: The complete genome of J2315 is composed of three circular chromosomes and a plasmid. The chromosomes are 3,870,082, 3,217,062, and 875,977 bp; the plasmid is 92,661 bp. There are several relatively large RODs (regions of difference) and genomic islands (many other B. cenocepacia strains lack orthologues of the genes in these genomic islands). Chromosome 1 appears to have a moderate number of genomic islands and many RODs; chromosome 2 appears to have the fewest and smallest genomic islands; chromosome 3 appears to have the largest number of genomic islands and RODs (and, consequently, the smallest percentage of genes with orthologues in the other strains and in Ralstonia solanacearum).

FIG. 2: Chromosomes 2 and 3 contain a greater proportion of coding sequences with accessory roles (functions such as horizontal gene transfer and protective responses) or unknown functions. Chromosome 1 has a greater proportion of coding sequences related to core cellular functions, such as cell division, chromosome replication, and macromolecule, amino acid, and nucleotide biosynthesis (genes related to metabolism and division).

FIG. 3: The number and percentage of J2315 coding sequences with orthologues was greatest in groups more closely related taxonomically to J2315: 63 to 78% of J2315 coding sequences had orthologues in other BCC members, 50 to 56% in other Burkholderia species, and 37% in Ralstonia solanacearum. Chromosome 1 has the highest degree of conservation, chromosome 2 slightly less, and chromosome 3 the lowest.

TABLE 1: 8,055,782 bp in total; chromosome 1 is the largest, chromosome 2 is similar in size but smaller, and chromosome 3 is the smallest (there is also a small plasmid). G+C content is similar across all four replicons (lowest in the plasmid, followed by chromosome 1). There are 7,261 coding sequences in total (85.9% of the DNA is coding). The plasmid has the smallest average gene length. Chromosome 1 possesses the vast majority of tRNA genes (66, compared with 6 and 2 on chromosomes 2 and 3, respectively). Chromosome 1 also holds most of the IS elements (the vast majority), pseudogenes/partial genes (a moderate majority), and miscellaneous RNAs (the vast majority).

TABLE 2: 15 genomic islands (most, and the largest, appear on chromosome 1); many genomic islands also carry IS elements, which are integrated at various sites in the bacterial genome (all islands except one contain one putative integrase). Many islands are prophages or have phage origins, and most are miscellaneous islands. Many resistances are linked to the cenocepacia island (antibiotic and arsenic resistance, along with stress response coding sequences).

TABLE 3: A variety of virulence functions are encoded in the J2315 genome; several genes related to these functions are absent from other strains of B. cenocepacia. Cable pilus coding sequences are unique to J2315 compared with the other strains; five BuHA family proteins are also unique to J2315.

TABLE 4: Many drug resistance determinants target unknown antibiotics or antimicrobial compounds. Some coding sequences are strain-specific (J2315 has elevated drug resistance compared with other strains). Six families of transport systems were identified (ABC, MFS, MATE, RND, SMR, and fusaric acid resistance family proteins).

TABLE 5: Many virulence determinants found in other B. cenocepacia strains were found to be pseudogenes in J2315.

TABLE 6: All tested ET12 strains possess pseudogenes that disrupt cepacian capsule functions and pyochelin biosynthesis. Only J2315 has a pseudogene at BCAL3517 (T2SS), with a 110 bp deletion. The O antigen is also interrupted in J2315, and similarly in BCC0016 (an ET12 strain); all other strains are uninterrupted or gave no product. BCAL3223 is also interrupted (differently from the K56-2 strain).
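The replicon sizes in FIG. 1 can be checked against the TABLE 1 total, and combined with the read count from the Sequencing section to get a rough sense of the reads behind the 11.9-fold coverage. This is a back-of-envelope sketch, not a figure from the paper: it assumes coverage was computed Lander-Waterman style as C = N × L / G (reads × mean read length / genome size) over all 215,165 end sequences, so the mean read length it prints is a derived estimate rather than a reported value.

```python
# Replicon sizes (bp) as reported in FIG. 1.
replicon_bp = {
    "chromosome 1": 3_870_082,
    "chromosome 2": 3_217_062,
    "chromosome 3": 875_977,
    "plasmid": 92_661,
}
genome_bp = sum(replicon_bp.values())
print(genome_bp)  # 8055782, matching the TABLE 1 total

# Share of the genome contributed by each replicon.
for name, size in replicon_bp.items():
    print(f"{name}: {100 * size / genome_bp:.1f}% of the genome")

# Lander-Waterman style coverage C = N * L / G, solved here for the mean read length L.
reads = 215_165   # end sequences, from the Sequencing section
coverage = 11.9   # reported fold coverage
implied_mean_read_bp = coverage * genome_bp / reads
print(f"implied mean read length: about {implied_mean_read_bp:.0f} bp")
```

The estimate lands in the few-hundred-base range, which is what one would expect for end reads from plasmid and M13 shotgun libraries, but it depends entirely on the stated assumption about how coverage was calculated.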