Imagine you are a research organization that works with data files in a specialized format. A genetics lab working with GenBank (.GBK) or SnapGene (.DNA) sequence files is a good example. Now imagine your software engineers have written a custom app that performs a calculation or processing task on those data files and writes the result, or a summary, to a new file.
Let's further imagine that, as a data manager, you need a good record of when analyses were performed, who performed them, and precisely where the results are stored. As it happens, CERF ELN is already very well suited to this workflow.
Most CERF users work with the default, industry-standard applications on their local computer. An MS Word file, for example, opens in MS Word, while your .DNA files might open in SnapGene. This workflow illustrates one of the unique advantages of a combined ELN and document management system that uses a desktop application to process your files: CERF logs every interaction between users and the files stored on the CERF server and displays it in a secure audit trail, so managers can see both current and past activity and access.
In some cases, it is useful to work with highly specialized applications that you have written yourself for specific tasks on data stored in CERF. CERF ELN lets users specify which local applications they want to use to check out and edit particular file types, so files can optionally be checked out from CERF and opened in a non-default local application.
Lab-Ally has been working with bioinformatics students at the University of Maryland to create a toolbox of small accessory applications for processing various data files stored in CERF. Each academic term, as part of a capstone bioinformatics class, small groups of students (supervised by Lab-Ally) design, build and test an application of their choice that solves a common bioinformatics problem. One example is described below.
One team of four students recently built a GenBank parser that makes genomic data easier to work with. It produces a simplified summary of a GenBank file that is readily compatible with CERF and its search feature. The application can be used as a standalone tool or integrated with CERF ELN for superior record keeping, better efficiency and improved organization-wide collaboration. The parser extracts essential information from GenBank files and outputs it as a readable .rtf file.
What Does the GenBank Parser Do?
The parser extracts important data from GenBank files, such as:
- Accession
- Organism (Genus species)
- Taxon data
- Gene(s)
- Genetic sequence
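To make the extraction step concrete, here is a minimal sketch of how these fields could be pulled from a GenBank flat file using only the Python standard library. It is not the students' implementation: `GenBankSummary` and `parse_genbank` are hypothetical names, only the first record in a file is read, and "taxon data" is assumed to mean the lineage listed under ORGANISM plus the NCBI taxon ID from the `/db_xref="taxon:..."` qualifier.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GenBankSummary:
    """Hypothetical container for the fields listed above."""
    accession: str = ""
    organism: str = ""
    # Assumption: "taxon data" = lineage under ORGANISM + NCBI taxon ID.
    lineage: str = ""
    taxon_id: str = ""
    genes: list[str] = field(default_factory=list)
    sequence: str = ""


def parse_genbank(text: str) -> GenBankSummary:
    """Extract summary fields from the first record of a GenBank flat file."""
    record = text.split("\n//", 1)[0]  # a "//" line terminates each record
    lines = record.splitlines()
    summary = GenBankSummary()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("ACCESSION"):
            # The primary accession is the first token after the keyword.
            summary.accession = line.split()[1]
        elif line.startswith("  ORGANISM"):
            summary.organism = line.split(None, 1)[1].strip()
            # The lineage continues on lines indented by 12 spaces.
            lineage = []
            while i + 1 < len(lines) and lines[i + 1].startswith(" " * 12):
                i += 1
                lineage.append(lines[i].strip())
            summary.lineage = " ".join(lineage).rstrip(".")
        elif line.startswith("ORIGIN"):
            # Each sequence line is a position number followed by blocks of bases.
            summary.sequence = "".join(
                "".join(row.split()[1:]) for row in lines[i + 1:]
            ).upper()
            break  # ORIGIN is the last section of a record
        i += 1

    # Qualifiers from the FEATURES table.
    taxon = re.search(r'/db_xref="taxon:(\d+)"', record)
    if taxon:
        summary.taxon_id = taxon.group(1)
    # A gene name usually repeats on gene, mRNA and CDS features; keep first occurrences.
    summary.genes = list(dict.fromkeys(re.findall(r'/gene="([^"]+)"', record)))
    return summary
```

Reading only the first record keeps the sketch short; multi-record .gbff files would need a loop over the `//`-separated records. A maintained library such as Biopython (`SeqIO.parse(handle, "genbank")`) handles the many edge cases of the format more robustly.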
It then organizes this data into an .rtf file, which is easy to read and compatible with most platforms. Below is an example of what you will find in the output:
How to Use the Parser as a Standalone Application
Install and Run the Application:
- Run the installation file on your computer.
- Go to C:\Program Files\GenBankParser and double-click GenBankParser.exe.
- A window should pop up with an "Open File" button.
- If the window doesn’t display properly, try resizing it. Some users have experienced this issue, and resizing the window usually solves it.
- After clicking the Open File button, choose a .gb or .gbff GenBank file from your system.
- The application will process the file and save an .rtf file to your desktop.
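The final step, writing the summary as an .rtf file, can be sketched as follows. RTF is plain text with control words, so a minimal document needs only a header, escaped text and `\par` paragraph breaks. This is an illustration, not the parser's actual code: the function names, the Courier New font and the `<input name>.rtf` naming scheme are assumptions, and the output folder defaults to the desktop as described in the steps above.

```python
from pathlib import Path
from typing import Optional


def rtf_escape(text: str) -> str:
    """Escape RTF special characters; encode non-ASCII as \\uN? control words."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # backslash and braces are RTF syntax
        elif ch == "\n":
            out.append("\\line ")  # manual line break inside a paragraph
        elif ord(ch) > 127:
            # \u takes signed 16-bit UTF-16 code units; "?" is the ASCII fallback.
            data = ch.encode("utf-16-le")
            for j in range(0, len(data), 2):
                unit = int.from_bytes(data[j:j + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
        else:
            out.append(ch)
    return "".join(out)


def summary_to_rtf(fields: dict[str, str]) -> str:
    """Render label/value pairs as a minimal RTF document, one bold label per paragraph."""
    lines = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Courier New;}}\f0\fs20"]
    for label, value in fields.items():
        lines.append(rf"{{\b {rtf_escape(label)}:}} {rtf_escape(value)}\par")
    lines.append("}")  # close the outer {\rtf1 ...} group
    return "\n".join(lines)


def save_rtf(source_name: str, fields: dict[str, str],
             out_dir: Optional[Path] = None) -> Path:
    """Write <stem>.rtf to out_dir (default: the user's Desktop). Naming is hypothetical."""
    folder = out_dir if out_dir is not None else Path.home() / "Desktop"
    out_path = folder / (Path(source_name).stem + ".rtf")
    # rtf_escape emits only ASCII, so an ASCII encoding is safe here.
    out_path.write_text(summary_to_rtf(fields), encoding="ascii")
    return out_path
```

Because every non-ASCII character is escaped, the resulting file stays plain ASCII, which is one reason RTF opens cleanly in most word processors.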
How to Use the Parser with CERF
- Once the parser is installed, configure CERF so that it can retrieve files from the CERF server on demand and open them in the parser. Without this step, CERF simply opens GenBank files in the default sequence-editing application on the user's local machine.
- In CERF, navigate to Tools > Options > Applications.
- Add GenBankParser by pointing to GenBankParser.exe in C:\Program Files\GenBankParser.
- Set the MIME type to chemical/x-genbank. This tells CERF which file types you would like to open with the specified application.
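The MIME type is what links a file type to an application. As an illustration only (CERF's internal mechanism is not described here), the sketch below uses Python's standard `mimetypes` module to map the .gb and .gbff extensions to `chemical/x-genbank`, then looks that type up in a hypothetical `APPLICATIONS` table standing in for the Tools > Options > Applications settings. A `None` result corresponds to falling back to the default application.

```python
import mimetypes
from pathlib import PureWindowsPath
from typing import Optional

# Hypothetical: map the extensions the parser accepts to the MIME type
# entered in Tools > Options > Applications.
for extension in (".gb", ".gbff"):
    mimetypes.add_type("chemical/x-genbank", extension)

# Hypothetical stand-in for CERF's Applications settings:
# MIME type -> local application used to check out and open the file.
APPLICATIONS = {
    "chemical/x-genbank": PureWindowsPath(r"C:\Program Files\GenBankParser\GenBankParser.exe"),
}


def application_for(filename: str) -> Optional[PureWindowsPath]:
    """Return the configured application for a file, or None to use the default."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return APPLICATIONS.get(mime_type) if mime_type else None
```

Keying the table on the MIME type rather than on individual extensions means one entry covers every extension registered for that type.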
This is how Tools > Options > Applications should look once it's set up:
Viewing GenBank Files with CERF:
- Locate any .gb or .gbff file in CERF’s collections.
- Right-click the file, select View-in, and choose GenBankParser from the list.
- The parser will open, allowing you to process the file.
- The application will process the file and save an .rtf file containing the results of the parser analysis to a specified local location.
- Drag the .rtf file from the desktop into CERF and drop it onto the original GenBank file to attach it as a relation. Once added to CERF, the .rtf file is immediately indexed for searching, so users with the correct access permissions can search for text in the .rtf and, once they find that file, also locate the parent file containing the original raw sequence data.