Jonathan L. Jacobs, Jonathan D. Dinman
Department of Cell Biology & Molecular Genetics
2135 Microbiology Building
University of Maryland
College Park, MD 20742.
This tutorial is intended to assist you in the analysis of bicistronic reporter data and to guide you in the creation of an Excel spreadsheet that performs the necessary calculations. We therefore assume that you have read the original publication.
The following tutorial is directed specifically towards analysis of bicistronic reporter data from the dual luciferase assay (DLA) system (references1 & 2) for the purpose of studying programmed ribosomal frameshifting. It can, however, be easily adapted to any data set that represents any bicistronic reporter assay system.
Our DLA system requires the collection of data from two different plasmids; a “zero-frame” control plasmid and a “frameshift” plasmid which are each incipiently transformed into identical yeast strains. Both plasmids encode tandem Renilla and luciferase firefly genes which are separated by a multiple cloning site (MCS). The only difference between the two plasmids is the reading frame of Fluc relative to Rluc. In the zero-frame control, Fluc and Rluc are in the same reading frame. Translation of these two ORFs produces a fusion protein that has both Rluc and Fluc activity. Our frameshift reporters differ in that Fluc is out of frame relative to Rluc and a frameshift signal (either a +1 PRF or -1 PRF signal) has been cloned into the MCS. This allows us to study both +1 and -1 frameshifting using the same system. What we are interested in is the ratio of firefly to renilla expression in an experimental reporter normalized to the same ratio in the zero frame control reporter. A schematic of this DLA system is outlined below in Figure 1. More detailed information on our DLA system can be found in Reference #2.
|
The main point here is that, once the data has been collected, you have 2 data points from each trial representing the expression levels of each of two genes. In addition, after the data collection is complete, you will also have a two sets of paired data at a minimum: one set of paired data representing the zero-frame control; and a second of paired data representing the experimental reporter (in our case, a frameshift reporter). In the next section we will briefly outline what this data might look like in an Excel spreadsheet and suggest a few good way to organize it.
For the purposes of this tutorial, we will be analyzing EIGHT data sets and, in the end, will have recreated the same data set that was presented in the NAR publication. A summary of the eight data set is presented below.
| Data set | Plasmid | Description |
|---|---|---|
F1
|
pJD376
|
A DLA reporter with the L-A virus -1 PRF frameshift signal. |
F2
|
A DLA reporter with a putative -1 PRF signal cloned from the yeast
gene. |
|
F3
|
A DLA reporter with a putative -1 PRF signal cloned from the yeast
gene. |
|
F4+
|
A DLA reporter with the SARS virus -1 PRF frameshift signal in the
presence of 20 µg of Anisomycin. |
|
F4-
|
A DLA reporter with the SARS virus -1 PRF frameshift signal without
Anisomycin. |
|
C1
|
pJD375
|
Our standard DLA zero-frame control for yeast. |
C2+
|
Our standard DLA zero-frame control for mammalian cell culture in
the presence of 20 µg of Anisomycin. |
|
C2-
|
Our standard DLA zero-frame control for mammalian cell culture without Anisomycin. |
As a final note: the following sections will show screen shots taken from Microsoft Excel 2004 on a Mac running OS X. While the individual screens may look aesthetically different from similar screen shots taken using other versions of Excel on other operating systems, the functions, techniques and charting strategies will be the same.
While you are collecting your paired data sets, it’s important to organize it efficiently from the start. Below is a screen shot from MS Excel 2004 on OS X of Data set F1 collected from pJD376 (see above).

In total, there are 43 data points and each represents a the expression of
firefly and Renilla from an independent cell lysate. The first step is to calculate
the ratio of Fluc to Rluc and then rank order the
entire set from lowest to highest Fluc/Rluc ratio.
Throughout this tutorial, we will present a screen shot of the Excel formula
and the matching formula from the publication any time we present a new calculation.
It’s obvious in this case, but here they are just for consistency.
![]() |
[1] |

This calculation is repeated for each of the eight data sets for every data point.
The very first step in any analysis pipeline is to first identify and remove outliers from your data set. We will again be using Data set F1 as an example, but these steps are done for each of the other seven data sets as well.
Below is a screen shot of F1 data charted as Fluc vs Rluc expression values (e.g. reporter gene 1 vs reporter gene 2). This is done simply using the Excel Chart Wizard, choosing XY (Scatter) as your chart type and using the Fluc and Rluc values as the input data. In the context of the Dual Luciferase Assay system, the relationship between Fluc and Rluc should be linear across 10 orders of magnitude. This highly regular linearity in the data can be exploited to give us a first hand, rough estimate of the quality of our data. Your mileage may vary, however, depending on the details of your assay system in use.
First, chart your data to get a qualitative view of all your data points. Keep in mind, we are looking for linearity.


Your new chart should look something like the following one. The F1 data set is generally very linear, as we would expect. If, however, you were to create the same chart with the F3 Data set, you would immediately recognize that this data set has some problems.


You might be tempted to immediately throw away the data from the F3 set, but in general this is not a good practice because this technique is more of a qualitative measure of our Data set's linearity. There are other quantitative methods we will use later on in this tutorial that will formally exclude F3 as being a valid Data set
Once your data has been charted in this way, you should next quantitatively set the boundaries for excluding outliers in your data set. Outliers can be a major source of skewness in any Data set that is assumed to be normally distributed (an assumption we will test later). Therefore, it's important that we exclude out outliers using a method that is specifically "distribution free" so that we don't introduce possible bias into our analysis early on.
The method we use is commonly referred to as the Quartile or Fourth-Spread method (Devore, 2002). Essentially, you identify the boundaries of each of the quartiles in your data set, measure the fourth-spread (fs, which is the distance between the lower and upper quartiles), and set the upper and lower outlier boundaries as a function of fs. Below are both the formulas (from the publication) and Excel screen shots for doing this. Also, we'll need to create a new sheet in the Excel file ("Summary") for collected statistics. And again, we'll start by working on ratio values (Fluc/ Rluc) from Data set F1.
maximum
|
0.0327
|
|
75th percentile
|
0.0278
|
|
median
|
0.0268
|
|
25th percentile
|
0.0251
|
|
minimum value
|
0.0231
|
| fourth spread (fs) | 0.0027
|
| Upper OB | 0.0309
|
|
| Lower OB | 0.0227
|
A quick look at our Summary sheet revels the following.

At this point you can remove any outliers from the Data set by hand, or if you choose you can have excel automatically point them out to you using a simple formula. To have your outliers spotted and counted by Excel, continue to Step 4.


Finally, once you have done this for all your data sets, your Summary Page should look something like the screen shot below:

On a final note about outliers and charting you data; these two initial steps can be tied together so that on your XY scattergram you can visually see which data points along your linear spread are actually being treated as outliers. In the figures below, I have changed the color of points in the chart to black if they are outliers. Notice how this method pulls out outliers that might have otherwise gone unnoticed (in the case of F1), but does not "fix" the data in F3.


The next step is to simply generate the standard batch of statistical metrics (for the Fluc/Rluc ratios) commonly seen in research papers. The following table summarizes the metrics needed for this tutorial, the formulas used to calculate them, and how you might use excel to find these values. The sample Excel formulas are directed towards the F1 Data set, excluding the three outliers we previously identified. It's important to remember that, once you have designated a data point as an outlier it should not be included in with your statistical analysis.
| Metric | Formula | Expression # | Excel Sample Formula (F1 Data set) |
| mean | ![]() |
[5]
|
|
| variance | ![]() |
[7]
|
|
standard deviation |
![]() |
[8]
|
|
| standard error | ![]() |
[9]
|
|
sample size |
N
|
-
|
Once these formulas are applied to each Data set, your Summary sheet should look something like the follow screen shot The five values for Data set F1 from the table above and their respective formulas are shown highlighted in blue.

By this point you have qualitatively checked your data for linearity, identified and excluded the outliers from each Data set, and generated the standard statistical measurements most often seen in research papers. However, the assumptions of these statistical measures include: 1) that the data is normally distributed (fits a "bell curve"); and 2) that you have done enough replicates for a given degree of statistical confidence. This section deals with verifying the first assumption: that your data is indeed normally distributed.
It should be noted that several methods can be found in the statistical literature that can test for whether or not a particular Data set fits a given distribution. In this tutorial (and in our publication) we suggest the creation of a normal probability plot and to find the probability plot correlation coefficient (PPCC). We have found that this method works well for the Dual Luciferase system in use in our laboratory. However, interested readers are directed to the NIST/SEMATECH e-Handbook of Statistical Methods for a broader discussion of other possible solutions. The original publication was by Filliben, 1975.
Finding the PPCC of a given data set involves rank ordering the data and generating expected z-scores for each data point. A scattergram (XY) plotting the expected z-score on the x-axis against the actual Fluc/Rluc ratio on the y-axis. Linear least squares regression fits the data to a linear trendline and the correlation coefficient between the two values is the PPCC.
Once you have the PPCC in hand, you can check your value against a table of critical values. If your PPCC exceeds the critical value for your sample size and desired level of confidence, then your data can be considered "approximately normal". If it does not, then your data is not normally distributed; i.e. you shouldn't continue with any statistical analysis that assume a normal distribution (t-tests, etc).
Fortunately, with the help of Excel, this process is fairly straight forward. We'll calculate the PPCC of Data set F1 using these steps:

| z-score | ![]() |
[10] |
![]() |
[11] |
As you can see from the above charts, F1 is very linear even when compared to an "idealized" or expected set of values. Furthermore, the values are more heavily distributed around the middle range; much like a bell curve. The F3 data is again shown to be poor quality and it fails it's PPCC test.
So in the end all the data sets in this tutorial pass the test for normal distribution except for F3. The next step is the see if we carried out enough replicates in order to get a good estimate of our sample mean.
It's typical to simply want to do three replicate experiments. Unfortunately, the number of replicates you need to do for any experiment is directly related to 1) the degree of confidence you want to have in your estimate of the sample mean and 2) the degree of variability present in your samples (i.e. the variance). The second point is commonly overlooked (read: ignored) by most molecular biologists because 3 replicates is usually enough, right?
The most common method to estimate the "minimum sample size" for any experimental data set is shown below and can be found in most statistics handbooks. The problem with the formula is that it often underestimates the true minimum number of replicates needed in practical situations. In other words, it is an idealized uncorrected minimum sample size estimator. Kupper & Hafner published a conversion table that will provide a researcher with a "corrected minimum sample size" for a given confidence level and error rate.
So, to get started follow the steps below. I'll be using the F1 and F3 data sets from the publication as examples of "good" and "bad" data sets. Regardless, "three's a charm" does not apply for either of them.
![]() |
[12] |

This concludes this section. If you also complete the previous section "Is my data normally distributed?", then two fundamental assumptions about the data have been tested and we can continue on to calculating the ratiometric data and how to test to see if two Data set are statistically different or not.
In this section we are going to calculate the ratiometric statistics for each of the experimental data sets The term "ratiometric" is jargon that simply means: the ratio of the experimental data relative to some control data. Some labs simply refer this as "fold - change" statistics, % frameshifting, recoding efficiency, etc. Calculating ratiometric means and variances are not, however, as simply as you may expect. Finding the correct estimate for the variance of your ratio, for example, does not use the same formula for calculating variance of primary observational data. This is primarily because of "propagation of error" (PoE), a problem that arise whenever you are combining two sets of data together. PoE arises because the statistics from each Data set are estimates of their true values; e.g. the sample mean and sample variance of the F1 Data set is an estimate of the true population mean and population variance. When you combine two data sets together you end up compounding the problem because you are also combing these estimates together. In a way, making an estimate of an estimate. A better discussion of this topic can be found HERE.
The relative expression of your experimental reporter should be normalized to some control reporter. In the context of our dual luciferase system, this is termed % frameshifting. Below is the formula from the publication, the expression number, and a screen shot of the Excel cell formula.
![]() |
[13] |

For F1, xbar_r is 0.0798 and is simply the ratio 0.0263 / 0.3297.
Finding the relative sample variance, var(xbar_r), requires the use of a specialized formula (see Kendall & Stuart, 1994) that properly accounts for the relative contribution of the variances from each data set (F1 and C1 in this example).
![]() |
[14] |
|
The values
and
are
the sample variance from the F1 and C1 data set respectively (cells C13 & C12).
Similarly, the standard deviation of a combined ratiometric mean is found using the following formula (a derivation of the above formula from reference #4):
![]() |
[15] |
|
This value is commonly used to determine the size of the error bars on charts reporting xbar_r (i.e. % frameshifting). A bit more formally, it is an measure of the accuracy of your estimate of the sample mean.The Summary Page in the Excel spreadsheet uses the following formula (from a personal communication with Dr. Ray Koopman):
![]() |
[16] |
|
Quote from the paper:
| ... Comparing Data sets: The final stage is to determine if two experiments, each with their own respective values of xbar_r and var(xbar_r) are statistically different. The published record of studies utilizing various bicistronic reporters shows a wide variety of methods including fold-change, z-tests, or chi-square tests. For comparisons between data sets, a z-test is appropriate only for larger data sets with at least 40 samples each. Data sets for bicistronic reporter systems are usually not this large. Furthermore, a chi-square test is inappropriate as it requires both large sample sizes and that the data be separated of into discrete categorical values. We instead use the unpaired two-sample t-test (see expressions [17] and [18]) since it is more appropriate for smaller continuous data sets The requirements of this test are that the data must be normally distributed and independent, which are satisfied by the bicistronic assay data sets presented here. The hypothesis tested against states that two data sets (X & Y) come from the same population. A rejected hypothesis therefore affirms that the two data sets are indeed statistically different at some predefined confidence level (e.g. 95%, alpha = 0.05). The p-value obtained from this test is an estimation of the probability of an incorrect conclusion... |
So, an unpaired two-sample t-test is carried out by first determining the degrees of freedom (no guessing!) and then calculating the t statistic. The value of t is then compared to a critical value on a lookup table(this table can be found in nearly all statistics textbooks) for a given confidence interval and Degrees of Freedom. If the t-statistic is greater than the critical value, then your two data sets are statistically different under the assumptions of this test (normally distributed, sufficient samples, etc). This test is precisely the reason why we go to such length is calculating an accurate value for the variance of each of the experiments; var(xbar_r). Without an accurate measure of the variance, the results of this t-test will be incorrect.
In the following example we will be comparing the two F4 data sets with and without the inclusion of the drug Anisomycin (see the paper for more details).
From the sample Excel spreadsheets summary page, v is the value in cell I29:

Is is found, v=18, using the following calculation:
![]() |
[17] |
On the Summary Page, this is t = 8.92
![]() |
[18] |
![]()
The TDIST function in Excel uses numerical computation to give a very good estimation of the correct p-value for your test.
![]()
where x is the t statistic, deg_freedom is v, and tails is 2 (for a two-tailed test). A quote from the Excel help file on this function:
| TDIST Returns the Percentage Points (probability) for the Student t-distribution where a numeric value (x) is a calculated value of t for which the Percentage Points are to be computed. The t-distribution is used in the hypothesis testing of small sample data sets. Use this function in place of a table of critical values for the t-distribution. |
In our sample data set, the final p-value is 5.04E-08; a very significant value that says "Yes, these two Data set are very likely different.".
Thank you.
-- Jonathan Jacobs
-- jacobsjo@umd.edu