Thank you Nic, James and Salvador Pueyo for your comments about the influence of the prior distribution on the estimate of ECS.

James and Salvador both mention that what is commonly referred to as an “objective” prior isn’t really objective in the common usage of the word. That is corroborated by the fact that Salvador and Nic come to different conclusions based on what each regards as an objective prior.

In the exchange between Nic and Salvador (in public comments), the two disagree about the most appropriate non-informative (aka “objective”) prior to use. Salvador claims that what Nic uses is not a truly non-informative prior but rather a reference prior, and is not the optimal choice. Salvador uses Jaynes’ non-informative prior; Nic, on the other hand, claims that this is not appropriate.

According to Nic, the choice of prior “depends what is measured”. Salvador criticizes this: if the prior depends on the experiment, it is not strictly speaking a prior but rather a reference distribution, which, in the absence of strong constraints from the data (as is the case for ECS), yields a meaningless posterior distribution. **Question to Nic: Could you reply to this specific criticism?** Nic claims that the Jaynes prior is not suited to the “continuous parameter case”. **Question to Salvador: Could you reply to this specific criticism?**

Both Nic and Salvador choose their priors to avoid a problem common to uniform priors (which Nic, Salvador and James have all criticized, as e.g. quoted in Ch 10 of AR5): the shape of the uniform prior distribution is very different in ECS than in 1/ECS (the climate feedback parameter, lambda), and a uniform prior in ECS can lead to an overestimate of ECS (though Salvador argues that in practice the uniform prior is only assumed over a certain range of ECS, so not all uniform priors necessarily result in overestimation). **Question to John: In light of these criticisms, is the use of uniform priors suitable for the estimation of ECS?**
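
The change-of-variables effect described here can be illustrated numerically. The range [0.5, 10] K and the sample size below are arbitrary choices for the sketch; the point is only that a prior flat in S is strongly non-flat in lambda = 1/S.

```python
import numpy as np

rng = np.random.default_rng(0)

# A prior uniform in sensitivity S (here on [0.5, 10] K, an arbitrary range)
# is far from uniform in the feedback parameter lambda = 1/S.
S = rng.uniform(0.5, 10.0, size=1_000_000)
lam = 1.0 / S

# Change of variables: p(lambda) = p(S) * |dS/dlambda| = const / lambda^2,
# so the density ratio between two lambda values should be (lam2/lam1)^2.
def density(x, samples, width=0.005):
    return np.mean(np.abs(samples - x) < width) / (2 * width)

ratio_empirical = density(0.2, lam) / density(0.4, lam)
ratio_theory = (0.4 / 0.2) ** 2
print(ratio_empirical, ratio_theory)   # ~4.0 vs 4.0
```

The same algebra run in reverse shows why a prior flat in lambda looks like 1/S² in S, which is the shape at issue in the discussion below.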

James claims that what Nic uses as a prior can cause erroneous results: in the example explained at http://julesandjames.blogspot.fi/2014/04/objective-probability-or-automatic.html (a reply to Nic’s post at http://climateaudit.org/2014/04/17/radiocarbon-calibration-and-bayesian-inference/), the posterior pdf shows zero probability density at locations where the data show substantial likelihood but the prior pdf is zero, i.e. the prior pdf prevents the data from being properly reflected in the posterior pdf (“[it] automatically assign[s] zero probability to the truth”). **Question to Nic: Could you reply to this specific criticism?**

Primary scientific results are normally stated in terms of a best estimate and an uncertainty range, with any PDF underlying the best estimate and uncertainty range being secondary. When a frequentist statistical method is used – as it is in a large proportion of cases – the uncertainty range is usually designed to be a confidence interval whose boundaries the true value of the (fixed but uncertain) parameter involved would fall below in the specified proportions of cases (e.g., 5% and 95%) upon repeating the experiment many times using random drawings from the data etc. uncertainty distributions involved. The method may or may not accurately achieve that aim (known as probability matching), but in most cases there is little disagreement about its desirability. When the IPCC AR5 scientific report states that it is 95% certain that more than half the global warming over 1951–2010 was anthropogenic in origin, that is based on a frequentist confidence interval, not derived from a Bayesian PDF. If most scientists were told that an archaeological artefact had been shown by radiocarbon dating to be at least 3000 years old with 95% certainty, I think they also would expect the statement to reflect a confidence interval bound, with at least approximately probability matching properties, not a subjective Bayesian PDF.

As I showed in my radiocarbon dating blog post, the use, in the case considered, of the noninformative Jeffreys’ prior provided uncertainty ranges that in all cases gave *exact* probability matching, no matter what percentage boundaries were specified or from what probability distribution and within what range the sample being dated was picked, unlike whatever method James prefers. Not my idea of failure!
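
Probability matching is easiest to see in a textbook case rather than the radiocarbon one. The sketch below is a standard Gaussian location example of my own construction, not Nic’s actual calculation: for y ~ N(theta, 1), Jeffreys’ prior is uniform, so the 5%–95% credible interval is y ± 1.645, and exact matching means it covers the true theta 90% of the time however theta was picked.

```python
import numpy as np

rng = np.random.default_rng(1)

# Probability-matching check in the simplest setting: y ~ N(theta, 1).
# Jeffreys' prior for a location parameter is uniform, so the posterior is
# N(y, 1) and the 5%-95% credible interval is y +/- 1.6449.
n_trials = 200_000
theta_true = rng.uniform(-5, 5, size=n_trials)   # any way of picking theta
y = theta_true + rng.standard_normal(n_trials)

z = 1.6449  # 95th percentile of the standard normal
covered = np.abs(y - theta_true) < z
print(covered.mean())   # ~0.90 regardless of how theta_true was drawn
```

The same kind of repeated-sampling coverage check is what the radiocarbon post applied to a far less symmetric problem.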

James considers that a near zero probability density over 1200-1300 AD – a calendar period over which the radiocarbon age hardly changes, so that the data is very uninformative about the calendar age – is unrealistic. I suggest that view can only come from prior knowledge of the probability characteristics of calendar ages of samples. The method I was criticising was put forward in a paper that explicitly assumed that no prior knowledge about the probability characteristics of calendar ages of samples existed. But even if some such knowledge does exist, it does not follow that incorporating such knowledge into calendar age estimation (by multiplying an estimated PDF reflecting it, used as an informative prior, by the data likelihood function in an application of Bayes’ theorem) will improve results, even if the PDFs look more believable. As my Climate Audit post showed, doing so and then drawing samples from a segment of the known true calendar age probability distribution often produced estimated uncertainty ranges with probability matching characteristics that were not just worse than when using Jeffreys’ prior (inevitably, as that gave perfect matching), but substantially worse. It should be noted that although the Jeffreys’ prior will assign low PDF values in a range where likelihood is substantial but the data variable is insensitive to the parameter value, the uncertainty ranges the resulting PDF gives rise to will normally include that range.

It is important to understand the meaning of the very low (not zero) value of the prior, and hence of the posterior PDF, over 1200-1300 AD, or over any other period where the radiocarbon age, whilst consistent with the data in terms of having a significant likelihood, varies little with calendar age. It simply reflects that over the interval concerned the data is very uninformative about the parameter of interest, because the interval corresponds to a small fraction of the data error distribution. If some non-radiocarbon data that is sensitive to calendar ages between 1200 and 1300 AD is obtained, then the noninformative prior for inference from the combined data would cease to be low in that region, and the posterior PDF would become substantial in the calendar region consistent with the new data, resulting in a much tighter uncertainty range.

James’ statement that “It is clear that, despite many decades of trying, no-one has come up with a universal automatic method that actually generates sensible probabilities in all applications.” is true. But it masks the fact that in very many cases – probably the vast bulk of practical parameter inference problems – Berger and Bernardo’s reference prior approach does, in many people’s view, do so. In the one-dimensional case, Jeffreys’ prior is the reference prior.

James refers to probabilities produced using an objective Bayesian approach as having some “intuitively appealing mathematical properties”. I will single one of these out as a property that the vast bulk of physicists would support. Jeffreys’ prior, and some of the more sophisticated priors that remain noninformative for marginal inference about one parameter out of many in circumstances when Jeffreys’ prior does not do so, are invariant under one-to-one transformations of data and parameter variables. That means, for instance, that if a PDF is estimated for the reciprocal of climate sensitivity, 1/ECS, rather than for ECS itself, and the resulting posterior PDF for 1/ECS is then converted into a PDF for ECS by using the standard transformation-of-variables formula, the PDF thus obtained will be identical to that resulting from estimating ECS directly (from the same data). The construction of the noninformative prior (which will differ greatly in shape between the two cases, when both expressed in terms of ECS) guarantees that this invariance property obtains. A subjective Bayesian approach does not respect it, at least when data variables are transformed.
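
The invariance property can be checked numerically in a toy setting. The model below is an assumption for illustration only (data y measuring lambda = 1/S with Gaussian error, on an arbitrary grid of S values); it shows the posterior for S computed directly with the prior 1/S² coinciding with the posterior obtained by estimating lambda under a uniform prior and transforming back to S.

```python
import numpy as np

# Toy invariance check. Assumed model: y ~ N(lambda, sigma^2), lambda = 1/S.
y, sigma = 0.5, 0.1
S_grid = np.linspace(0.8, 10.0, 20_000)
lam_grid = 1.0 / S_grid
dS = S_grid[1] - S_grid[0]

def norm_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2)   # unnormalised Gaussian

# (a) Direct inference about S: the noninformative prior here is
#     |d lambda / dS| = 1/S^2.
post_S = norm_pdf(lam_grid, y, sigma) / S_grid**2
post_S /= post_S.sum() * dS

# (b) Inference about lambda (uniform prior in lambda), then the standard
#     change of variables back to S: p_S(S) = p_lam(1/S) / S^2.
post_lam_on_grid = norm_pdf(lam_grid, y, sigma)   # up to a constant
post_S_via_lam = post_lam_on_grid / S_grid**2
post_S_via_lam /= post_S_via_lam.sum() * dS

print(np.max(np.abs(post_S - post_S_via_lam)))   # the two PDFs coincide
```

By contrast, a prior chosen subjectively in S and left unchanged when working in lambda would not reproduce this agreement.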

James claims that there is nothing in “Nic’s approach that provides for any testing of the method, i.e. to identify in which cases it might give useful results, and when it fails abysmally.” I beg to differ. I think most statisticians (and scientists) would regard the accuracy of probability matching as a very useful – and widely used – way of identifying when a statistical method gives useful results. There is a large literature on probability-matching priors, and the performance of noninformative priors is often judged by their probability matching (Kass and Wasserman, 1996). Indeed, Berger and Bernardo (1992) refer to the commonly used safeguard of frequentist evaluation of the performance of noninformative priors in repeated use as being historically the most effective approach to discriminating among possible noninformative priors.

References

Kass RE, Wasserman L (1996): The Selection of Prior Distributions by Formal Rules. J Amer Statist Assoc 91(435):1343–1370

Berger JO, Bernardo JM (1992): On the development of reference priors (with discussion). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian Statistics 4. Oxford University Press, pp 35–60

Whereas the Ferrel cell sits poleward of the Hadley cell, as a much smaller doughnut sitting on the ground at 30N-60N side by side with the larger doughnut at 0N-30N, the Stratocell is also at 0N-30N but as a very slightly bigger doughnut in the stratosphere encircling the Hadley cell (i.e. above it from the point of view of an observer on the ground at 15N looking up vertically).

Just as the Hadley cell drives the Ferrel cell like one gearwheel driving another touching it, so does it also drive the Stratocell. The Stratocell’s bottom just above the tropopause is driven poleward accompanying (and driven by) the poleward flow of the Hadley cell’s top.

As the top of the Hadley cell approaches 30N it finds territory getting scarce (decreasing perimeter of the increasing latitudes), so to keep Navier and Stokes happy it dives down and flows back to the equator.

The bottom of the Stratocell encounters the same problem but it can’t solve it by diving down the way the Hadley cell does because the Hadley cell is selfish: it needs every bit of room it can get at 30N, in fact the pressure there should be getting larger on that account. So instead the Stratocell solves its space crisis by shooting up where there is no opposition, then over the top and back to the equator.

So now we have one Hadley cell driving two neighbors like touching gearwheels, one beside it, the Ferrel cell, and one sitting on top, the Stratocell. (Actually 6 touching gearwheels altogether when you include the SH, or 8 when the polar cells are counted.)

For the duration of the Stratocell’s ride where it is in contact with the Hadley cell, it continually picks up heat from the top of the Hadley cell. At 30N this heated stratospheric air then rises, bringing the heat with it, though losing temperature due to the lapse rate. On the way back, with no further heat input, it loses heat. By the time it dives down to the equator it has become a refreshing breeze, cooling what theory would otherwise have predicted would be the tropical hot spot.

Since this mechanism seems pretty obvious I assume it was considered and discarded decades ago on account of some fatal flaw, such as evidence against any significant poleward flow up there. Nevertheless I’d be interested in seeing the literature where this mechanism is discussed. And if there isn’t any it would be interesting to know what’s wrong with this theory.

wiljan, the concept of back pressure exists any time there is resistance to a flow from high pressure to low pressure, whether the pressure be radiation pressure, air pressure, voltage, whatever. It is the high pressure end that experiences the back pressure, reducing the flow by reducing the pressure gradient at that end. The notion of back pressure entails no contradiction to the relevant laws, whether applied to the flow of photons, air molecules, electrons, cars driving into a bottleneck, or people walking into a store.

So if I may I would like to challenge it here.

On the face of it, it seems quite reasonable to assume that CO2 will be compounding annually at a CAGR of 1% by 2050. Taking that as a lumped value applicable to the century as a whole, this would make estimation of TCR invaluable for forecasting global mean surface temperature in 2100.
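
As a back-of-envelope check, the 1%/yr compounding assumption ties directly to the definition of TCR, since concentrations then double in about 70 years. The 1.8 K figure below is an arbitrary assumed value, used only to show the arithmetic.

```python
import math

# At a CAGR of 1%/yr, CO2 concentration doubles in ln(2)/ln(1.01) years,
# which is the classic "1%/yr to doubling" experiment that defines TCR.
years_to_double = math.log(2) / math.log(1.01)
print(round(years_to_double))   # ~70 years

# If TCR were, say, 1.8 K (an assumed value, not an estimate), the warming
# at the time of doubling would by definition be about 1.8 K.
assumed_tcr = 1.8
```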

But what is the basis for estimates of TCR? Nic rightly focuses on Box 12.2 of AR5, which is where the current report examines this question most closely, along with estimating both equilibrium climate sensitivity and effective climate sensitivity defined as varying inversely with the climate feedback parameter.

I had a very hard time following how the behavior of 20th C global surface temperature could be used to estimate any of those three measures of climate response to CO2. Problem 1 is that CO2 was rising last year at only 0.5%, at 0.25% in 1960, and even less before then. Problem 2 is that CO2 has risen only 43% since the onset of industrial CO2. And Problem 3 is that ocean delay, long recognized as a source of uncertainty, may be an even bigger source of uncertainty than assumed in interpreting historical climate data.

I do not mean to imply that these are inconsequential numbers, quite the contrary in fact, but rather that they invalidate overly naïve extrapolation from the previous century to this one.

A pathologically extreme example of how badly things can go when you neglect the changing CAGR of CO2 can be seen in the 2011 paper of Loehle and Scafetta on “Climate Change Attribution”. They analyze climate as a sum of two cycles, a linear “natural warming” trend, and a steeper linear anthropogenic trend. Setting aside the cycles, the trends purport to model rising temperature before and after 1942, rising (in their Model 2) at respectively 0.016 C and 0.082 C per decade, obtained by linear regression against the respective halves of HadCRUT3.

The following argument justifies their attribution of pre-1942 warming to natural causes.

“A key to the analysis is the assumption that anthropogenic forcings become dominant only during the second half of the 20th century with a net forcing of about 1.6 W/m2 since 1950 (e.g., Hegerl et al. [23]; Thompson et al. [24]). This assumption is based on figure 1A in Hansen et al. [25] which shows that before 1970 the effective positive forcing due to a natural plus anthropogenic increase of greenhouse gases is mostly compensated by the aerosol indirect and tropospheric cooling effects. Before about 1950 (although we estimate a more precise date) the climate effect of elevated greenhouse gases was no doubt small (IPCC [2]).”

For reasons I will give below it is not clear to me that the influence of pre-1942 CO2 was so minor, but set that aside for the moment. Their justification for their linear model of post-1942 warming is as follows.

“Note that given a roughly exponential rate of CO2 increase (Loehle [31]) and a logarithmic saturation effect of GHG concentration on forcing, a quasi-linear climatic effect of rising GHG could be expected.”

The relevant passage from [31] is,

“An important question relative to climate change forecasts is the future trajectory of CO2. The Intergovernmental Panel on Climate Change (IPCC, 2007) has used scenarios for extrapolating CO2 levels, with low and high scenarios by 2100 of 730 and 1020 ppmv (or 1051 ppmv from certain earlier scenarios: Govindan et al., 2002), and a central “best estimate” of 836 ppmv. Saying that growth increases at a constant percent per year, which is often how the IPCC discusses CO2 increases and how certain scenarios for GCMs are generated (see Govindan et al., 2002), is equivalent to assuming an exponential model.”

In effect L&S have based their model of 20th century climate on TCR.

So how bad can this get? Well, ln(1 + x) is close to x for x much less than 1, but becomes ln(x) for x much larger than 1. The knee of the transition is at x = 1. Taking preindustrial CO2 to be 1, today we have a CO2 level for which x = (400 – 280)/280 = 0.43, and with business as usual should reach 1 (double preindustrial) around 2050.
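
The arithmetic in the paragraph above is easy to verify:

```python
import math

# The linear approximation ln(1+x) ~ x at today's x = (400 - 280)/280.
x = (400 - 280) / 280
print(x, math.log(1 + x))   # 0.4286  0.3567 -- still close to linear
```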

So for the 19th and much of the 20th century ln(1 + x) can be taken to be essentially x. Since x is the product of population and per-capita energy consumption we can assume with Hofmann, Butler and Tans, 2009, that up to now anthropogenic CO2 and hence forcing has been growing exponentially. (Actually the CDIAC data show that the CAGR of CO2 emissions for much of the 19th century held steady at 15%, declining to its modern-day value of around 4-6%, but the impact of anthropogenic CO2 was so small in the 19th C that approximating it with modern-day CAGR of CO2 emissions may not make an appreciable difference. When I spoke to Pieter Tans in 2012 about extrapolating their formula to 2100 he thought a lower estimate might be more appropriate, which is consistent with the declining CAGR of emissions between 1850 and now, but estimating peak coal/oil/NG is far from easy, a big uncertainty.)

It follows that CO2 forcing to date has been growing essentially exponentially, not linearly, but that it will gradually switch to linear (or even sublinear) during the present century. Hence extrapolating 20th century global warming to the 21st century and beyond cannot be done on the basis of either a linear or logarithmic response to growing CO2, but must respect the fact that over the current century forcing will be making the transition from one to the other.

The sharp transition at 1942 in Loehle and Scafetta’s model is in this light better understood as the flattening out (as you go from 1950 to 1930) of an exponential curve. Even if aerosol forcing happened to approximately cancel the left half of the exponential, it would be preferable to estimate the aerosol contribution independently of the CO2 forcing. Moreover if the feedbacks are capable of doubling or tripling the no-feedback response then this would entail aerosol forcing driving CO2, raising the possibility of estimating aerosols around 1900 by comparing the Law Dome estimates of CO2 with the CDIAC’s estimates of CO2 emissions, provided the difference is sufficiently significant.

There is also the matter of any delay in the impact of radiative forcing on surface temperature while the oceans take their time responding to the former (Hansen et al 1985). If forcing grows as exp(t) with time t, any delay d means that temperature actually grows as exp(t – d) = exp(t)exp(-d), introducing a constant factor of exp(-d) into observation-based estimates of climate response. In particular if exp(-d) = ½, as it might well, then failure to take this delay into account will result in underestimating the prevailing climate response by a factor of two. This on its own would entirely account for misreading a sensitivity of 3.6 as 1.8. That’s a huge contribution to uncertainty. If furthermore the delay varies with time (as it may well, given the complexities of ocean heat transport) then so does the factor exp(-d), making the uncertainty itself a function of time. One might hope that d varied, if at all, very slowly with time, and preferably monotonically, say linearly to a first approximation.
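
The exp(-d) argument can be made concrete in a few lines. Here d is measured in e-folding times of the forcing, and 3.6 is just the sensitivity figure used in the paragraph above:

```python
import math

# If forcing grows as exp(t) and the surface response lags by d (in units
# of the forcing's e-folding time), the response is scaled by exp(-d).
# For the factor-of-two misreading discussed above, exp(-d) = 1/2:
d = math.log(2)                 # ~0.693 e-folding times of delay
apparent = 3.6 * math.exp(-d)   # a true sensitivity of 3.6 would read as...
print(apparent)                 # 1.8
```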

For such reasons I feel that if climate projections are to be based on climate observations, a third notion of climate response is needed, one that differs from TCR along the above lines, taking into account both the manner in which CO2 grows and the extent to which the ocean delays the response of global mean surface temperature (in degrees) to forcing (in W/m2).

Nic and Salvador have both discussed so-called “objective” approaches to Bayesian probability. It is important to clearly understand what this means, and its limitations. These “objective” probabilities do not represent some truth about the state of reality. They are merely an (at best) automatic way of converting uncertain information into a probability distribution which has some intuitively appealing mathematical properties. Intuition can be misleading, however, and despite these properties, there is no guarantee that the results will be useful, sensible, or even remotely plausible.

Conveniently, Nic provides a good example of a catastrophic failure of his approach in the example that he explains in some detail in his Climate Audit blog post. The topic in that case is carbon dating, but the point is a general one. In his example, his “objective” algorithm returns a probability distribution that assigns essentially zero probability to the interval 1200-1300 AD. That is, it asserts with great confidence that the object being dated does not date from that interval, even when the object does in fact date from that interval and despite the observation indicating high likelihood (in the Bayesian sense) over that interval. This result is entirely due to the so-called “objective” prior (“automatic” might be less susceptible to misinterpretation), irrespective of the data obtained.

Now, Nic asserts that any real physicist will agree with his method. If he can show me a scientist from any field who is happy to assert that a false statement concerning physical reality is true, then I’ll show him a poor scientist.

It is clear that, despite many decades of trying, no-one has come up with a universal automatic method that actually generates sensible probabilities in all applications. Moreover, there is nothing in Nic’s approach that provides for any testing of the method, i.e. to identify in which cases it might give useful results, and when it fails abysmally. Indeed, Nic appears to still think that his method presented in the climateaudit post is appropriate, despite it automatically assigning zero probability to the truth in the case that the item under study actually does date from the interval 1200-1300 AD. But I would hope that most readers – and most scientists aiming to understand reality – would agree that assigning zero probability to true events is not a good way to start, irrespective of the appealing mathematical properties of the method used to perform the calculations. Therefore, little purpose seems to be served by debating which particular mathematical properties are most ideal in abstract situations. The purpose of scientific research is to understand the world as it really is, and the methods can only be evaluated in terms of how they might help or hinder in that endeavour.

Nic refers to Bernardo and Smith’s authority to support the methods that he uses to obtain the “non-informative prior” for each dataset. However, Bernardo was careful enough to coin a new expression for what he (and now, Nic) was using: “reference prior”. Even though there is some confusion between both concepts in the statistical literature, they are quite different. The most important difference does not lie in how you calculate each of these “priors”, but in the meaning that you give to them. In the context of climate sensitivity, we might be able to progress more quickly in our discussions if, in his papers and posts, Nic says that he has been using the “reference prior” and that I sought (or that I found, but he does not seem to agree with this) the “non-informative prior”.

A non-informative prior distribution “sensu stricto” plays the original role of any prior distribution in Bayesian theory: it intends to tell how likely different options are (e.g. different values of climate sensitivity) without considering some given data (in the “non-informative” case, without considering any data at all). When you introduce the data, the prior probability distribution is updated and gives rise to the posterior distribution.

The reference distribution does not tell you the same. The reference distribution is a function that you can use in place of the prior distribution “sensu stricto” when you cannot determine the latter. It is intended just as a convention, as something that everybody is supposed to use when they don’t know what to use, so that everybody’s results are comparable (and, since the reference prior has several good statistical properties, you avoid some types of “accident”). This is a practical option when the posterior distribution is strongly constrained by the data. However, this is not the case for climate sensitivity. In the case of sensitivity, small differences in the prior can have a visible impact on the posterior. Since the reference prior cannot be given the strict meaning of a prior probability distribution, what you obtain by updating it cannot be given the meaning of a posterior probability distribution either. In fact, it is meaningless.

That the reference prior is not, strictly speaking, a prior probability distribution, is apparent from the fact that, as Nic emphasizes, it depends on the experiment. The probability that climate sensitivity is large cannot depend on some experiment that I am planning to do to measure it. Otherwise, climate policy would be much easier: rather than reducing emissions, just plan the right experiment to be carried out in a distant future: once you have it in mind, it should be unlikely that global warming will be severe. Well, at least this is what we would think if we interpreted the reference prior as a prior probability distribution “sensu stricto”, but this is not the right interpretation.

The confusion between reference prior and non-informative prior causes two serious problems. We have already seen one: the final result (the posterior distribution) is given an unwarranted meaning. The second problem is that, since reference priors differ between experiments, using them prevents you from combining different types of data. This is especially unfortunate in our case because, without combining different data types (as Annan and Hargreaves 2006 began to do), it will be difficult for the data to constrain the posterior distribution enough that we can set aside our discussions about the choice of prior (also, we will be more vulnerable to possible biases inherent to specific types of data).

In Pueyo (2012) I had already given an alternative: seek the actual non-informative prior based on Jaynes’ logic, and enrich it with well-justified pieces of prior information. Nic says that Jaynes’ approach “failed save in certain cases”, but I don’t know how he decides that it “failed”. However, even if we accepted that neither Jaynes’ nor any other method allows us to determine a true non-informative prior, there would still be something we could do: go ahead by putting together increasingly large and heterogeneous sets of data, up to the point at which the posterior is sufficiently robust to our choice of prior. However, we cannot do this in the framework of reference priors.

Taking all of this into account, I invite Nic to rethink his current approach and his conclusion that climate sensitivity should be so low, and to consider exploring these other approaches.

Salvador writes in his second comic: “One of the main differences is that my method follows Edwin T. Jaynes’ criterion (Jaynes is best known for having introduced the maximum entropy principle), while Lewis (like Jewson et al.) follows Harold Jeffreys’ criterion.”

I would certainly follow Jeffreys’ criterion (setting the prior equal to the square root of the determinant of the Fisher information matrix) in the simple one-dimensional case considered in Pueyo (2012), where climate sensitivity is the only parameter being estimated. I think it is quite well established that doing so is appropriate when inference about S is to be made purely on the basis of the data being analysed, without assuming any prior knowledge about it. The authoritative textbook Bayesian Theory (Bernardo and Smith, 1994/2000), in summarising the quest for noninformative priors, states baldly that:

“In one-dimensional continuous regular problems, Jeffreys’ prior is appropriate”

Jeffreys’ prior has the very desirable property (for a physicist or anyone else seeking objective estimation, if not for a Subjective Bayesian) that if the data variables and/or the parameters undergo some smooth monotonic transformation(s) (e.g., by replacing a data variable by its square), the Jeffreys’ prior will change in such a way that the inferred posterior PDF for the (original) parameter remains as it was before the transformation.
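
The “square root of the Fisher information” recipe, and the invariance it buys, can be illustrated with a standard textbook model. The exponential distribution below is chosen only as a simple example, not as a climate model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Jeffreys' prior = sqrt(Fisher information). For x ~ Exponential(rate=lam)
# the score is d/dlam log p(x) = 1/lam - x, so the Fisher information is
# I(lam) = E[(1/lam - x)^2] = Var(x) = 1/lam^2, giving prior ∝ 1/lam.
lam = 2.5
x = rng.exponential(scale=1 / lam, size=2_000_000)
fisher_mc = np.mean((1 / lam - x) ** 2)   # Monte Carlo estimate of I(lam)
print(fisher_mc, 1 / lam**2)              # ~0.16 vs 0.16

# Invariance: in terms of mu = 1/lam the prior transforms to
# p(mu) = p(lam(mu)) * |dlam/dmu| = mu * mu**-2 = 1/mu, which is exactly
# sqrt of the Fisher information computed directly for mu.
```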

I am a fan of Jaynes, but his maximum entropy principle was developed for the finite case. Unfortunately, Jaynes’ attempts to extend it to the continuous parameter case failed save in certain cases (notably where a transformation group exists).

Most experts in climate sensitivity think that the “non-informative prior distribution” of climate sensitivity is what Nic Lewis uses, and that it results in a low climate sensitivity. I do not agree. Some time before Nic published his paper, I also published a paper on the non-informative prior distribution of climate sensitivity (Pueyo, S. 2012. Climatic Change 113: 163-179), and my conclusions were very different:

http://www.springerlink.com/content/3p8486p83141k7m8/

Unfortunately, estimates of climate sensitivity are very sensitive to methodological choices. When adopting a given methodology, climatologists are implicitly positioning themselves on issues about which there is no unanimity even among the experts in probability theory themselves. This means that, if we want our estimates to be realistic, we have a difficult challenge ahead, which we cannot address in the usual ways, e.g. by increasing computing power. However, I hope the climatological community ends up addressing this challenge fully, and does so as soon as possible. To help climatologists bypass some hard texts, I once wrote a comic version of my paper on non-informative priors, featuring a dialogue between two aliens named Koku and Toku.

Also, some time ago, motivated by a conversation with Dr. Forest, I “transcribed” another dialogue between Koku and Toku, which sheds light on the difference between Nic’s and my own view of non-informative priors (I strongly recommend reading the comic above before reading this second dialogue; the comic is short):

There is one thing on which Nic, myself and many others agree: that the uniform prior vastly overestimates climate sensitivity S. However, this does not mean that many estimates in the literature should be overestimates. The overestimation resulting from this prior is so obvious that, in practice, the uniform is assumed only between S=0 and some Smax, and a zero probability is assumed above Smax, with no explicit criterion to choose Smax (discussed in Annan & Hargreaves 2011, Climatic Change 104:423–436). With this correction, it is not so obvious that this method should overestimate sensitivity, but it is obvious that it is inappropriate. The conclusion of my paper was that the non-informative prior of climate sensitivity is proportional to 1/S. In contrast, Nic maintains that the non-informative prior depends on the dataset but that it will often be roughly proportional to 1/S^2 (see his comment 1048). My prior, S^(-1), is midway between the uniform S^0 and Nic’s S^(-2). If using my prior results in a probability distribution f(S), Nic’s will often give a distribution f’(S) proportional to f(S)/S. My conclusions are that Nic’s is not the correct non-informative prior and that, at least for some datasets, it results in a vast underestimation of climate sensitivity.
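
The practical difference between the three priors can be sketched with a toy calculation. All numbers below are illustrative assumptions of mine, not a real ECS analysis; the point is only the direction of the effect, with the posterior median shrinking as the prior moves from S^0 to S^(-1) to S^(-2).

```python
import numpy as np

# Assumed toy likelihood: Gaussian in the feedback parameter lambda = 1/S,
# with arbitrary illustrative values lam0 and sigma.
S = np.linspace(0.1, 20.0, 40_000)
lam0, sigma = 0.35, 0.12
like = np.exp(-0.5 * ((1 / S - lam0) / sigma) ** 2)

def post_median(prior):
    post = like * prior
    cdf = np.cumsum(post) / post.sum()
    return S[np.searchsorted(cdf, 0.5)]

meds = {name: post_median(p) for name, p in
        [("S^0", np.ones_like(S)), ("S^-1", 1 / S), ("S^-2", 1 / S**2)]}
print(meds)   # median estimate of S shrinks as the prior exponent falls
```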

Let me add that, in fact, my proposal in Pueyo (2012) was not a direct use of 1/S. I proposed a middle way between the non-informative prior (proportional to 1/S) and subjective priors. My proposal was to start from the non-informative prior, and, then, to introduce explicit and well-justified modifications (e.g. based on physics) before feeding in the data. I hope someone tries this.

Let me start by making two general points.

First, in general, standard frequentist statistical methods such as ordinary least squares (OLS) regression can be interpreted from a Bayesian viewpoint and, when doing so, involve use of a prior that is implicit in the method and the data error distributions. That prior is necessarily objective – it emerges from the statistical model involved, not from any subjective choice by the investigator. For instance, when OLS regression is used, Gaussian data error distributions are assumed, and uncertainty in the regressor (x) variable is negligible relative to that in the regressand (y) variable, the implicit prior for the regression coefficient (slope) is uniform. That prior is completely noninformative in this case.
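A quick numerical sketch of that correspondence, using synthetic data (all numbers here are assumed purely for illustration): with Gaussian errors of known variance and a uniform prior on the slope, the Bayesian posterior mode coincides with the OLS estimate.

```python
import numpy as np

# Synthetic no-intercept regression data with Gaussian errors (known sigma = 1)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(0.0, 1.0, x.size)

# OLS slope (regression through the origin)
beta_ols = np.sum(x * y) / np.sum(x * x)

# Bayesian posterior on a grid: uniform prior means posterior ∝ likelihood
beta_grid = np.linspace(1.5, 2.5, 10001)
loglike = np.array([-0.5 * np.sum((y - b * x) ** 2) for b in beta_grid])
beta_mode = beta_grid[np.argmax(loglike)]

print(beta_ols, beta_mode)   # the two estimates agree to grid precision
```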

Secondly, in many studies climate sensitivity is not the only unknown parameter being estimated. In such cases, where a Bayesian approach is used, a joint likelihood function is derived and multiplied by a joint prior distribution to give a joint estimated posterior PDF, from which marginal PDFs for each parameter of interest are obtained by integrating out the other parameters. The joint prior that gives rise to a marginal PDF for climate sensitivity (or another parameter of interest) that properly reflects the information provided by the data will not necessarily be the product of individual priors for each parameter – it may well be a non-separable function of all the parameters.
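As a sketch of that marginalization step, under entirely made-up assumptions (a two-parameter Gaussian joint likelihood in sensitivity S and a nuisance parameter a, and a simple separable 1/S × uniform joint prior – a real joint prior need not factorize like this), the marginal PDF for S can be obtained numerically on a grid:

```python
import numpy as np

# Grids for sensitivity S and an illustrative nuisance parameter a
S = np.linspace(0.5, 10.0, 400)
a = np.linspace(-2.0, 2.0, 400)
SS, AA = np.meshgrid(S, a, indexing="ij")

# Made-up correlated joint likelihood (nuisance parameter shifts the S estimate)
joint_like = np.exp(-0.5 * ((SS - 3.0 - AA) ** 2 + (AA / 0.5) ** 2))

# Joint posterior with a separable 1/S x uniform(a) prior (illustrative choice)
joint_post = joint_like * (1.0 / SS)

# Marginal PDF for S: integrate out the nuisance parameter (uniform grid spacing)
marginal_S = joint_post.sum(axis=1)
mode_S = S[np.argmax(marginal_S)]
print(mode_S)
```

For this toy setup the marginal mode lands near S = 2.5, pulled below the likelihood peak at 3 by the 1/S prior.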

James is correct to say that use of Jeffreys’ prior can give rise to substantial problems, although I am satisfied that it has not done so in the cases where I have used it for estimating climate sensitivity. Problems generally do not arise unless there are multiple parameters and marginal posterior parameter PDFs are required, not just a joint PDF for all parameters. It is well known that Jeffreys’ prior often needs modifying when a parameter’s uncertainty is being estimated as well as its central value. An example is the simultaneous estimation of a population’s mean and standard deviation from a sample. But in most studies uncertainty is not estimated simultaneously with climate sensitivity, and this problem tends not to arise. When Jeffreys’ prior is not suitable, the so-called “reference prior” method, developed by Bernardo and Berger, often provides a satisfactory noninformative prior.

An expert prior is a particular type of informative prior – one might say it is an intentionally informative prior, derived from subjective opinions rather than only from data. Investigators often use uniform priors for climate sensitivity (and other parameters). Uniform priors are typically informative, biasing estimation towards higher sensitivity values and greatly increasing the apparent probability of sensitivity being very high, relative to what the data values and data error assumptions imply. But I do not imagine that reflects a genuine prior belief on the investigator’s part that sensitivity is high and an intention to reflect that belief in the prior. Rather, I think it reflects ignorance about Bayesian inference and, in some cases, the inappropriate advice in the widely-cited Frame et al (2005) paper to use a uniform prior in the parameter that is the target of the estimate, advice which was adopted in AR4 in relation to climate sensitivity.

There are two problems with using expert priors, even assuming that genuine prior information about parameter values exists and that one wishes the results to reflect that information rather than (as is usual in scientific studies) only the data obtained and used in the experiment involved.

The first problem is that where the data only weakly constrains the parameter, as is the case for climate sensitivity, the results will be strongly influenced, and may even be dominated, by the expert prior used. That appears to be the case for several of the climate sensitivity estimates presented in AR5: Tomassini et al (2007), Olson et al (2012) and Libardoni and Forest (2011/13).

The second problem is more subtle: the posterior PDF resulting from use of an expert prior may not correctly reflect the combined information embodied in that prior and the data used in the study. That is because, if the expert prior distribution is thought of as arising from multiplying a data-likelihood function by a prior that is noninformative for inference from the statistical model involved, that prior is unlikely also to be noninformative for inference from the product of that notional likelihood function and the likelihood function for the study’s actual data.

I would therefore not recommend using any sort of informative prior, expert or otherwise, for climate sensitivity when estimating that parameter. A noninformative (joint) prior should always be used IMO; Jeffreys’ prior is a good one to start with and is likely to be satisfactory for the purpose.

It may well be appropriate to use data-based informative priors, and sometimes expert priors, for parameters that are not of interest and/or that the study does not constrain well. Indeed, in some studies many variables that would often be treated as uncertain data (e.g., the strengths of various forcings) are estimated as unknown parameters, using priors that reflect the uncertainty distributions of current estimates for those variables.

By and large the same considerations apply to paleoclimate as to instrumental period studies. However, as paleo studies generally involve higher uncertainty, the importance of using a noninformative prior is greater. If climate sensitivity is the only parameter being estimated in a paleo study and, as with instrumental-period warming based studies, fractional (%) uncertainty in forcing changes dominates that in temperature changes, a uniform prior in the climate feedback parameter, the reciprocal of climate sensitivity, will generally be noninformative for estimating that parameter. It follows mathematically that a prior of the form 1/Sensitivity^2 will be noninformative for estimating climate sensitivity.
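The last step is the standard change-of-variables rule for probability densities; writing the feedback parameter as lambda = 1/S, a uniform prior in lambda transforms as:

```latex
% Uniform (noninformative) prior in the feedback parameter \lambda = 1/S:
p_\lambda(\lambda) \propto 1 .
% Transforming to S = 1/\lambda brings in the Jacobian
% |d\lambda/dS| = 1/S^2, so
p_S(S) = p_\lambda(1/S)\,\left|\frac{d\lambda}{dS}\right| \propto \frac{1}{S^2} .
```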

Whatever prior is used, I recommend comparing the resulting best estimate (the median should be used) and uncertainty ranges with those derived from using a frequentist profile likelihood method. The signed root likelihood ratio (SRLR) method is the simplest to apply. Although the confidence intervals the SRLR method gives are generally only approximate and may well be a bit narrow, they provide an excellent check on whether the credible intervals derived from a Bayesian marginal posterior PDF are realistic. And the median estimate from that PDF should, if it is realistic, be very close to the maximum of the profile likelihood.
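As a minimal illustration of the SRLR check (a toy Gaussian-mean example with synthetic data, not a climate analysis), one computes r(theta) = sign(theta_hat - theta) * sqrt(2 * (l(theta_hat) - l(theta))) and takes the interval where |r| is below the normal quantile; for this simple model it reproduces the exact z interval.

```python
import numpy as np

# Toy data: 25 Gaussian observations with known sigma = 1 (made-up example)
rng = np.random.default_rng(0)
sigma, n = 1.0, 25
data = rng.normal(3.0, sigma, n)

def loglik(mu):
    return -0.5 * np.sum((data - mu) ** 2) / sigma ** 2

mu_grid = np.linspace(0.0, 6.0, 6001)
ll = np.array([loglik(m) for m in mu_grid])
mu_hat = mu_grid[np.argmax(ll)]           # maximum likelihood estimate (on the grid)

# Signed root likelihood ratio r(mu)
r = np.sign(mu_hat - mu_grid) * np.sqrt(2.0 * (ll.max() - ll))

# ~95% confidence interval: the set where |r| <= 1.96
inside = np.abs(r) <= 1.96
lo, hi = mu_grid[inside][0], mu_grid[inside][-1]

z_lo = data.mean() - 1.96 * sigma / np.sqrt(n)
z_hi = data.mean() + 1.96 * sigma / np.sqrt(n)
print(f"SRLR 95% CI: [{lo:.2f}, {hi:.2f}]")
print(f"exact z CI:  [{z_lo:.2f}, {z_hi:.2f}]")
```

For curved likelihoods the two intervals differ, which is exactly why the SRLR result is only an approximate but useful cross-check on a Bayesian credible interval.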
