Bart has over-interpreted my position. I don’t exactly reject paleo-estimates of ECS. Rather, I agree with AR5′s caveats, and broadly accept their 1–6°C range, although I would be inclined to treat it as a 17–83% likely range rather than a 10–90% range.

However, AR5 gives little indication of the shape of the uncertainty distribution involved in paleo estimates. My view is that for paleo estimates the fractional uncertainty as to forcing changes and as to the relationship of climate feedbacks in different climate states (which AR5 does highlight) is likely in reality to be considerably greater than fractional uncertainty as to temperature changes. Assuming so, the overall PDF for ECS from paleoclimate studies should have a rather similar skew to that derived from instrumental period warming based studies, implying a median estimate far below the midpoint of the 1–6°C (or whatever) range.
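The skew argument can be illustrated with a quick Monte Carlo sketch. The numbers below (an LGM-like 5°C cooling known to roughly 10% and a 6.5 W/m2 forcing known only to roughly 25%) are illustrative assumptions, not values from any particular study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative LGM-like numbers, assumed for this sketch rather than taken
# from any particular study: a comparatively well-constrained temperature
# change and a poorly constrained forcing change.
dT = rng.normal(5.0, 0.5, 1_000_000)   # cooling, deg C: ~10% fractional error
dF = rng.normal(6.5, 1.6, 1_000_000)   # forcing, W/m^2: ~25% fractional error
F2x = 3.7                              # forcing from doubled CO2, W/m^2

ok = dF > 0.5                          # discard unphysically small forcings
ecs = F2x * dT[ok] / dF[ok]
lo, med, hi = np.percentile(ecs, [10, 50, 90])
print(lo, med, hi)
print("median", med, "vs range midpoint", (lo + hi) / 2)
```

Because the poorly known forcing sits in the denominator, the resulting ECS distribution is right-skewed and its median falls well below the midpoint of its 10–90% range, which is the point being made about the paleo PDF.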

That being so, even if the paleoclimate studies provide a completely independent overall estimate of ECS to that from warming over the instrumental period, the paleo estimate should not greatly affect the overall median and likely range derived from warming over the instrumental period. I’ve done some calculations based on the 1.7°C median estimate and 1.2–3°C likely range I put forward, which corresponds to uncertainty in the climate feedback parameter (the scaled reciprocal of ECS) having a normal distribution. If the overall 1–6°C paleo estimate shares that characteristic, then incorporating it would do little other than narrow my 1.2–3°C likely range at both ends. This is perhaps an extreme case, but it does illustrate my point.
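A sketch of that calculation, under the stated assumption that the feedback parameter 1/ECS is normally distributed (the 0.954 factor below is the 83% quantile of the standard normal):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch with the numbers quoted above. Treat lam = 1/ECS as Gaussian:
# ECS median 1.7 -> lam median 1/1.7, and the 1.2-3.0 likely (17-83%)
# ECS range maps to lam quantiles 1/3.0 and 1/1.2, which sit nearly
# symmetrically about the median, as a normal distribution requires.
z83 = 0.954                      # 83% quantile of the standard normal
mu = 1 / 1.7
sigma = (1 / 1.2 - 1 / 3.0) / (2 * z83)

lam = rng.normal(mu, sigma, 1_000_000)
ecs = 1 / lam[lam > 0]           # keep physically meaningful values
q_single = np.percentile(ecs, [17, 50, 83])
print("one estimate:", q_single)

# If an independent paleo estimate shared the same Gaussian form in lam,
# combining the two (precision weighting) would halve the variance,
# narrowing the likely ECS range at both ends while barely moving the median.
lam2 = rng.normal(mu, sigma / np.sqrt(2), 1_000_000)
ecs2 = 1 / lam2[lam2 > 0]
q_combined = np.percentile(ecs2, [17, 50, 83])
print("combined:", q_combined)
```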

I would now like to discuss another important line of evidence: paleoclimate. Below I try to summarize the arguments that have been brought up in the guest blogs and in the discussion so far.

James argues that when averaged over a sufficiently long period of time, the earth must be in radiative balance or else it would warm or cool massively. This enables us to use paleoclimatic evidence to estimate ECS. Non-linearities in the temperature response complicate the comparison of paleo-climate to the current changes in climate, but James argues that paleoclimate evidence can nevertheless offer useful constraints on ECS, due to the relatively large changes in temperature and forcing. The evidence rules out both very high and very low sensitivities and provides a figure around the IPCC range which could be used as a prior for Bayesian analyses.

Nic largely rejects paleoclimatic approaches, based on the last sentence of paragraph 10.8.2.4 in AR5, which concludes that paleo studies support a wide 10–90% range for ECS of 1.0–6°C. Nic points to the fact that AR5 states that the uncertainties in paleo studies are generally underestimated, because 1) changes in forcing and temperature are difficult to estimate, and 2) past climate states were very different, so the climate feedbacks of the Earth system then may differ from those measured by ECS today; widening the uncertainty range (i.e. flattening the PDF) therefore seems reasonable. Nic thinks the uncertainties are simply too great to support the narrower 2–4.5°C range mentioned by James.

John argues that paleo studies benefit from the large climate signals that can occur over millennia and that the paleo record provides a vital perspective for evaluating the slowest climate feedbacks. He emphasizes, however, that sensitivity to nonlinearities, major uncertainty in proxy records (Rohling 2012), data problems, and uncertainty in forcing undermine any strong constraint on ECS, and that it is unclear whether progress on these fronts offers an immediate opportunity to reduce uncertainty in ECS.

A general question for all would be to discuss the pros and cons of paleo-estimates of ECS, in light of the arguments brought forward by the others.

Specifically:

James: Could you respond to the issues raised by Nic and indicate why you think the uncertainties aren’t too great to support the 2–4.5°C range?

John: What range of ECS estimates do you think can be derived from paleo-studies?

Nic: AR5 is full of caveats (including, e.g., about Bayesian priors), so why should the caveated language about paleo-estimates of ECS be translated into them being rejected?

I not only made the distinction between reference priors and non-informative priors, but went beyond and stated that “the reference prior cannot be given the strict meaning of a prior probability distribution”. Nic writes: “I think Salvador is arguing that such a prior is not actually a prior distribution in the strict sense of representing a genuine probability distribution for the parameter(s). I wouldn’t disagree. But so what?”. His interpretation is correct, but let me answer the “so what?”. Bayes’ theorem establishes a mathematical relationship between probabilities. Therefore, if you purport to apply Bayes’ theorem but your input is not a probability distribution, then you cannot use Bayes’ theorem to state that your output is a probability distribution. You are free to equate this output to a probability distribution, but this results from an extra step: a subjective decision. It is not an objective result.

Then, what is the usefulness of the “reference posterior” that you obtain using a “reference prior”? I had already stated two points: as a conventional way to express the information in your sample, and as a way to avoid some technical problems that some other priors pose in some cases. Bernardo mentions these uses, but actually, he emphasizes another one: it “is just a part – an important part, I believe – of a healthy sensitivity analysis to the prior choice” (Bernardo et al. 1997, p. 163). He means that the result of applying a reference prior is useful because it can be compared with the result of applying your subjective prior of choice, to check if the posterior distribution is sensitive to the prior.

We are having this lively discussion because of the consequences that different choices of prior – including Nic’s choice of a reference prior – may have for decisions in climate policy. If the usefulness of reference priors is limited to the points that I described above, what are the implications of taking the resulting posterior distribution at face value for policy decisions? Bernardo (1979b, p. 140) was admirably honest: “it would certainly be foolish to use it in lieu of the personal posterior which describes the decision-maker opinions”. Of course, this assumes that we can reach well-founded opinions in other ways, probably with a sound expert prior, which is no less problematic in the case of climate sensitivity. So, which other alternatives do we have?

I mentioned two alternatives. For those who, unlike Bernardo and many others, think that non-informative priors do exist, the alternative is clear: use them (corrected with pieces of well-founded knowledge; Pueyo 2012). This is my opinion, and I posit that such priors can be found by applying Jaynes’ logic. The problem that Nic sees in this option is that Jaynes admitted being only “able to resolve the measure problem in special cases, in particular when a transformation group existed.” In the case of climate sensitivity, we can consider the transformation group whose elements are changes in measurement units. A non-informative prior should be unaffected by such changes. This leads to the result in Pueyo (2012).
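The unit-change argument can be made concrete. Under a rescaling S → cS, a density p(S) transforms to p(S/c)/c; the log-uniform density p(S) ∝ 1/S keeps its form under every such rescaling, which is the sense in which it carries no information about the measurement units. A minimal numerical check:

```python
import numpy as np

# Sketch of the unit-change argument: rescaling S -> c*S (a change of
# measurement units) maps a density p(S) to p(S/c)/c. The log-uniform
# (improper) density p(S) ∝ 1/S is form-invariant under any such rescaling.
S = np.linspace(0.1, 10, 1000)
c = 2.5                                   # arbitrary unit-conversion factor

p = 1 / S                                 # log-uniform density
p_rescaled = (1 / (S / c)) / c            # the same density in the new units

print(np.allclose(p, p_rescaled))         # identical: the prior is unit-blind
```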

The second alternative is accepting a posterior distribution only when it proves mostly insensitive to the prior (so we do not need to decide which prior is the correct one). Probably, we will need to combine different types of data to reach this point. Such combinations are forbidden when using reference priors, but are perfectly correct when assuming that we have a prior probability distribution sensu stricto, either non-informative or informative, whether or not we specify it.

References

Bernardo, J.M. 1979a. Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society B 41: 113-128.

Bernardo, J.M. 1979b. Author’s reply. Journal of the Royal Statistical Society B 41: 139-147.

Bernardo, J.M., Irony, T.Z. & Singpurwalla, N.D. 1997. Non-informative priors do not exist. A dialogue with José M. Bernardo. Journal of Statistical Planning and Inference 65: 159-177.

Ghosh, J.K. 1997. Non-informative priors do not exist – discussion of a discussion. Journal of Statistical Planning and Inference 65: 180-181.

Pueyo, S. 2012. Solution to the paradox of climate sensitivity. Climatic Change 113: 163-179.

Bart states that the fact that Salvador Pueyo and I come to different conclusions based on what we both regard as being an objective prior provides corroborative evidence that an objective prior is not objective in the common use of that word. Although in most cases a completely noninformative prior may not exist, so that conclusions will depend to at least a modest extent on a subjective choice of prior, the main reason that Salvador and I come to different conclusions is that we have completely different views on what makes a prior uninformative, resulting in us selecting different priors. One of us must be wrong! (Of course, even where there is a unique fully noninformative prior, parameter estimation involves other subjective choices.)

Bart asks me to respond to Salvador’s criticism that if the prior depends on the experiment, it’s not strictly speaking a prior, but rather a reference distribution, which, in the absence of strong constraints by data (as is the case for ECS) causes a meaningless posterior distribution.

Where a prior is intended to be noninformative, not reflecting any prior knowledge of the parameter(s) being estimated, then it should depend on the experiment and nothing else. It is best regarded as a mathematical tool or weighting function, designed to produce a posterior PDF that reflects the data not the prior. Such a prior has no direct probabilistic interpretation: it should not be regarded as a probability density. A prior that is noninformative in this sense may or may not be a “reference prior” in Berger and Bernardo’s sense. I think Salvador is arguing that such a prior is not actually a prior distribution in the strict sense of representing a genuine probability distribution for the parameter(s). I wouldn’t disagree. But so what? It doesn’t follow that, in the absence of strong constraints by data, the resulting posterior distribution is meaningless. Quite the contrary. And the distinction that Salvador is making between what he calls respectively “reference priors” and “noninformative priors” makes no sense to me.

The whole point about a noninformative prior is that it is constructed so that only weak constraints by the data are required in order for the resulting posterior PDF for the parameter(s) to be dominated by (correctly-reflected) information from the data rather than information from the prior. Indeed, Berger and Bernardo show that reference priors have a minimal influence on inference, in the sense of maximising the missing information about the parameters.

Salvador’s arguments about a uniform distribution being noninformative for positions, and relating to symmetries, may be valid when parameters only have a finite number of possible values, but they fail in the continuous case because the relevant invariant measure is unspecified. Jaynes recognised this point (Section 12.3 of Probability Theory: The Logic of Science), and was only able to resolve the measure problem in special cases, in particular when a transformation group existed.

Bart also asks me to respond to James’ criticism that (in a radiocarbon dating example) the posterior PDF shows zero probability density at locations where the data show substantial likelihood but the prior pdf is zero. My 23 July comment already deals with most of what James said. When Jeffreys’ prior (the original noninformative prior) is used, the prior, and hence the posterior, is very low (not zero) in regions where the data are very uninformative about – change little with – the parameter(s). If no existing knowledge about the parameters is to be incorporated, the resulting PDF is correct, however odd it may look.

Suppose the data measures a variable (here radiocarbon age of an artefact), with known-variance Gaussian-distributed random errors. In the absence of prior knowledge about the artefact’s radiocarbon age or calendar age, use of a uniform prior for inferring the radiocarbon age of the artefact is both natural and noninformative. Use of a uniform prior results in a Gaussian posterior PDF, credible intervals from which exactly match frequentist confidence intervals for the same measurement. What’s not to like about that? But if one accepts that posterior PDF for radiocarbon age, one necessarily accepts the sort of odd-shaped posterior for the artefact’s calendar age that James rejects, since it follows from applying the standard transformation of variables formula.
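The transformation just described can be sketched with a toy calibration curve (an assumed piecewise-linear shape for illustration, not real calibration data): a flat prior on radiocarbon age, pushed through the standard change-of-variables formula, yields near-zero calendar-age density over a plateau even though the likelihood there is at its maximum.

```python
import numpy as np

# Toy calibration curve (an assumption for illustration, not real data):
# radiocarbon age g(t) as a function of calendar age t, with a plateau
# where radiocarbon age barely changes with calendar age.
t = np.linspace(0, 1000, 10001)
g = np.where(t < 400, 1000.0 - t,              # steep segment
     np.where(t < 600, 600.0,                  # plateau: dg/dt = 0
              1200.0 - t))                     # steep segment again
y_obs, sigma = 600.0, 30.0                     # measurement and its error

# A flat (noninformative) prior on radiocarbon age gives a Gaussian
# posterior in y; the change-of-variables formula then gives the
# calendar-age posterior density N(g(t); y_obs, sigma) * |dg/dt|.
like = np.exp(-0.5 * ((g - y_obs) / sigma) ** 2)
post = like * np.abs(np.gradient(g, t))
post /= post.sum() * (t[1] - t[0])             # normalize on the grid

plateau = (t > 420) & (t < 580)
print(post[plateau].max(), post.max())  # near-zero density on the plateau
```

On the plateau the likelihood is maximal but the Jacobian |dg/dt| vanishes, so the posterior density is essentially zero there and piles up on the steep segments: exactly the "odd-shaped" posterior under discussion.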

James prefers what he views as realistic-looking posterior PDFs for parameters, even if the uncertainty ranges they produce disagree substantially with relative frequencies in the long run – and the posterior PDFs they imply for radiocarbon ages are most unrealistic looking.

I, on the other hand, prefer – certainly for reporting scientific results – posterior PDFs that produce uncertainty ranges which are at least approximately valid in frequentist coverage (probability matching) terms upon (hypothetically) repeated independent measurements, so that they represent approximate confidence intervals. (Exact matching of confidence intervals is not generally possible using a Bayesian approach.) Of course, if there is genuine prior knowledge about the distribution of an artefact’s calendar age, then the position is different. But I don’t think a wide uniform distribution would convey such knowledge in any case.

I am grateful that Bart clearly separates my approach from Nic’s. Quite confusingly, they are often lumped together, e.g. in the recent post by James (perhaps because he posted it while my previous post, in which I emphasize the differences, was awaiting approval). As I said in my previous post, Nic’s would be more properly called “reference prior” than “non-informative prior”, while I did seek a “non-informative prior”. From my point of view, the concepts of “non-informative prior” and “reference prior” differ as much from each other as each of them differs from the concept of “expert prior”.

James and Nic have engaged in a discussion on whether or not it is a “catastrophic failure” that, in a given example (unrelated to climate sensitivity), Nic’s method assigns an almost-zero probability to the interval in which the only measurement lies, and larger probabilities to neighboring intervals. Behaviors like this occur because “reference priors” like Nic’s can be very complex, which makes clear that they are not “non-informative”. In fact, they contain a lot of information, but it is information about the measurement technique, which is completely unrelated to the original concept of “prior probability” (i.e. the probability to be assigned to the property to be measured before the measurement takes place).

James uses this example to criticize so-called “objective priors” in general, an expression that encompasses both reference priors and non-informative priors, and which he suggests replacing with “automatic priors”. I think my comment above makes clear that his point is valid only for reference priors, and cannot be extrapolated to non-informative priors. Furthermore, I will argue that, in some sense, non-informative priors are indeed “objective”, but are not “automatic” (unlike reference priors).

I will introduce my argument with the help of a thought experiment. Consider a set of objects whose positions have been decided by algorithms that are completely different and unrelated to each other. My intuition tells me that the frequency distribution of the coordinates of these objects will be uniform. Besides intuition, the rational argument for expecting a uniform distribution is that it is the only one that preserves the symmetries of the problem. Therefore, the uniform should be the non-informative probability distribution in this case (according to Jaynes’ invariant-groups logic), and it is so “objective” that it results in an observable frequency distribution.

I have suggested that there are several frequency distributions in nature that are almost non-informative because they result from putting together values with completely different, “idiosyncratic” origins (see a synthesis of previous papers in http://arxiv.org/abs/1312.3583). Often, the variable of interest is not a position, so the symmetries to be conserved are not the same, and give rise to a distribution that is not uniform. To predict this objective distribution, one needs to identify the symmetries of the problem, which is an objective but not automatic process.

We do not have a sample of “idiosyncratic” climate sensitivities allowing us to observe a frequency distribution, but we can still treat the non-informative distribution similarly to a frequency distribution, as we would treat the result of tossing a coin only once. As soon as we have this prior distribution (log-uniform, according to my results) and some measurements defining a likelihood function, Bayes’ theorem is clear that we have to combine the first with the second to know how likely different values of sensitivity are. The way measurements are taken affects only the likelihood function, not the prior probability distribution.

Nic’s technique does not deal with actual non-informative priors. Whether or not one agrees with my claim that non-informative priors do exist, one has to concede that using a different type of prior (reference prior) as if it were non-informative can cause serious trouble unless the amount of data makes the result quite insensitive to the prior, which is rarely the case with climate sensitivity. As I said, underestimation of climate sensitivity is especially likely when using Nic’s method.

Bart asked me to reply to Nic’s claim that Jaynes’ prior is not suited to the “continuous parameter case”. In my previous post, I stated: “Nic says that Jaynes’ approach ‘failed save in certain cases’, but I don’t know how he decides that it ‘failed’”. Unless he clarifies this, I cannot fully answer his criticism. However, I have already given some reasons to think that Jaynes’ logic is valid.

Thank you Nic, James and Salvador Pueyo for your comments about the influence of the prior distribution on the estimate of ECS.

James and Salvador both mention that what is commonly referred to as an “objective” prior isn’t really objective in the common usage of the word. That is corroborated by the fact that Salvador and Nic come to different conclusions based on what both regard as being an objective prior.

In the exchange between Nic and Salvador (in public comments) both seem to disagree about what’s the most appropriate non-informative (aka “objective”) prior to use. Salvador claims that what Nic uses is not a truly non-informative prior but rather a reference prior, and that it is not the optimal choice to make. Salvador uses Jaynes’ non-informative prior; Nic on the other hand claims that this is not appropriate.

According to Nic, the choice of prior “depends what is measured”. That is criticized by Salvador: If the prior depends on the experiment, it’s not strictly speaking a prior, but rather a reference distribution, which, in the absence of strong constraints by data (as is the case for ECS) causes a meaningless posterior distribution. **Question to Nic: Could you reply to this specific criticism?** Nic claims that the Jaynes prior is not suited to the “continuous parameter case”. **Question to Salvador: Could you reply to this specific criticism?**

Both Nic and Salvador come to their choice of prior to prevent the problem that’s common with uniform priors (which Nic, Salvador and James have all criticized, as e.g. quoted in Ch 10 of AR5), namely that the shape of the uniform prior distribution is very different in ECS versus in 1/ECS (the so-called climate feedback parameter, lambda) and that a uniform prior in ECS could lead to an overestimate of ECS (though Salvador argues that in practice the uniform prior is only assumed for a certain range of ECS, so not all uniform priors necessarily result in overestimations). **Question to John: In light of these criticisms, is the use of uniform priors suitable for the estimation of ECS?**
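The asymmetry behind this criticism is easy to quantify. A uniform prior on ECS over an assumed 0.1–10°C range implies, via the change of variables, a prior ∝ 1/lambda² on the feedback parameter, and places most of its weight on high sensitivities before any data are seen:

```python
import numpy as np

# A uniform prior on ECS over an assumed range (0.1-10 deg C here) is far
# from uniform in the feedback parameter lam = 1/ECS: the change of
# variables gives p(lam) ∝ 1/lam^2, concentrating weight on small lam,
# i.e. on high sensitivities.
ecs = np.random.default_rng(2).uniform(0.1, 10.0, 1_000_000)

frac_above_45 = np.mean(ecs > 4.5)
frac_above_60 = np.mean(ecs > 6.0)
print(frac_above_45)   # over half the prior mass on ECS above 4.5 deg C
print(frac_above_60)   # ~40% above 6 deg C, before any data are seen
```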

James claims that what Nic uses as a prior can cause erroneous results: In the example as explained in http://julesandjames.blogspot.fi/2014/04/objective-probability-or-automatic.html (a reply to Nic’s post at http://climateaudit.org/2014/04/17/radiocarbon-calibration-and-bayesian-inference/) the posterior pdf shows zero probability density at locations where the data show substantial likelihood but the prior pdf is zero, i.e. the prior pdf prevents the data from being properly reflected in the posterior pdf (“[it] automatically assign[s] zero probability to the truth”). **Question to Nic: Could you reply to this specific criticism?**

Primary scientific results are normally stated in terms of a best estimate and an uncertainty range, with any PDF underlying the best estimate and uncertainty range being secondary. When a frequentist statistical method is used – as it is in a large proportion of cases – the uncertainty range is usually designed to be a confidence interval whose boundaries the true value of the (fixed but uncertain) parameter involved would fall below in specified proportions of cases (e.g., 5% and 95%) upon repeating the experiment many times using random drawings from the data etc. uncertainty distributions involved. The method may or may not accurately achieve that aim (known as probability matching), but in most cases there is little disagreement about its desirability. When the IPCC AR5 scientific report states that it is 95% certain that more than half the global warming over 1951–2010 was anthropogenic in origin, that is based on a frequentist confidence interval, not derived from a Bayesian PDF. If most scientists were told that an archaeological artefact had been shown by radiocarbon dating to be at least 3000 years old with 95% certainty, I think they also would expect the statement to reflect a confidence interval bound, with at least approximately probability matching properties, not a subjective Bayesian PDF.

As I showed in my radiocarbon dating blog post, the use in the case considered of the noninformative Jeffreys’ prior provided uncertainty ranges that in all cases gave *exact* probability matching no matter what percentage boundaries were specified or from what probability distribution and within what range the sample being dated was picked, unlike whatever method James prefers. Not my idea of failure!

James considers that a near zero probability density over 1200-1300 AD – a calendar period over which the radiocarbon age hardly changes, so that the data is very uninformative about the calendar age – is unrealistic. I suggest that view can only come from prior knowledge of the probability characteristics of calendar ages of samples. The method I was criticising was put forward in a paper that explicitly assumed that no prior knowledge about the probability characteristics of calendar ages of samples existed. But even if some such knowledge does exist, it does not follow that incorporating such knowledge into calendar age estimation (by multiplying an estimated PDF reflecting it, used as an informative prior, by the data likelihood function in an application of Bayes’ theorem) will improve results, even if the PDFs look more believable. As my Climate Audit post showed, doing so and then drawing samples from a segment of the known true calendar age probability distribution often produced estimated uncertainty ranges with probability matching characteristics that were not just worse than when using Jeffreys’ prior (inevitably, as that gave perfect matching), but substantially worse. It should be noted that although the Jeffreys’ prior will assign low PDF values in a range where likelihood is substantial but the data variable is insensitive to the parameter value, the uncertainty ranges the resulting PDF gives rise to will normally include that range.

It is important to understand the meaning of the very low (not zero) value of the prior, and hence of the posterior PDF, over 1200-1300 AD, or over any other period where the radiocarbon age, whilst consistent with the data in terms of having a significant likelihood, varies little with calendar age. It simply reflects that over the interval concerned the data is very uninformative about the parameter of interest, because the interval corresponds to a small fraction of the data error distribution. If some non-radiocarbon data that is sensitive to calendar ages between 1200 and 1300 AD is obtained, then the noninformative prior for inference from the combined data would cease to be low in that region, and the posterior PDF would become substantial in the calendar region consistent with the new data, resulting in a much tighter uncertainty range.

James’ statement that “It is clear that, despite many decades of trying, no-one has come up with a universal automatic method that actually generates sensible probabilities in all applications.” is true. But it masks the fact that in very many cases – probably the vast bulk of practical parameter inference problems – Berger and Bernardo’s reference prior approach does, in many peoples’ view, do so. In the one-dimensional case, Jeffreys’ prior is the reference prior.

James refers to probabilities produced using an objective Bayesian approach as having some “intuitively appealing mathematical properties”. I will single one of these out as a property that the vast bulk of physicists would support. Jeffreys’ prior, and some of the more sophisticated priors that remain noninformative for marginal inference about one parameter out of many in circumstances when Jeffreys’ prior does not do so, are invariant under one-to-one transformations of data and parameter variables. That means, for instance, that if a PDF is estimated for the reciprocal of climate sensitivity, 1/ECS, rather than for ECS itself, and the resulting posterior PDF for 1/ECS is then converted into a PDF for ECS by using the standard transformation-of-variables formula, the PDF thus obtained will be identical to that resulting from estimating ECS directly (from the same data). The construction of the noninformative prior (which will differ greatly in shape between the two cases, when both expressed in terms of ECS) guarantees that this invariance property obtains. A subjective Bayesian approach does not respect it, at least when data variables are transformed.
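The invariance property can be checked numerically in a toy one-observation model (an assumption for illustration, not any actual ECS analysis), where Jeffreys' prior is flat in the parameter S and picks up the Jacobian factor 1/lambda² when the problem is re-expressed in lambda = 1/S:

```python
import numpy as np

# Toy model (assumed for illustration): a single observation y ~ N(S, sig)
# of parameter S, for which Jeffreys' prior is flat in S.
S = np.linspace(0.5, 10.0, 2000)
y, sig = 3.0, 1.0

post_S = np.exp(-0.5 * ((y - S) / sig) ** 2)          # flat prior in S
post_S /= post_S.sum() * (S[1] - S[0])

# Same inference in lam = 1/S: Jeffreys' prior transforms with the
# Jacobian, prior(lam) ∝ 1/lam^2 (unnormalized here).
lam = 1 / S
post_lam = np.exp(-0.5 * ((y - 1 / lam) / sig) ** 2) / lam**2

# Mapping the lam-posterior back to S (Jacobian |dlam/dS| = 1/S^2)
# recovers the direct S-posterior exactly: the invariance property.
back = post_lam / S**2
back /= back.sum() * (S[1] - S[0])
print(np.allclose(back, post_S))
```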

James claims that there is nothing in “Nic’s approach that provides for any testing of the method, i.e. to identify in which cases it might give useful results, and when it fails abysmally.” I beg to differ. I think most statisticians (and scientists) would regard the accuracy of probability matching as a very useful – and widely used – way of identifying when a statistical method gives useful results. There is a large literature on probability-matching priors, and the performance of noninformative priors is often judged by their probability matching (Kass and Wasserman, 1996). Indeed, Berger and Bernardo (1992) refer to the commonly used safeguard of frequentist evaluation of the performance of noninformative priors in repeated use as being historically the most effective approach to discriminating among possible noninformative priors.

References

Kass RE, Wasserman L (1996): The Selection of Prior Distributions by Formal Rules. J Amer Stat Assoc 91(435): 1343-1370

Berger J O and J. M. Bernardo, 1992: On the development of reference priors (with discussion). In: Bernardo J. M., Berger J. O., Dawid A. P., Smith A. F. M. (eds) Bayesian Statistics 4. Oxford University Press, pp 35–60

Whereas the Ferrel cell sits poleward of the Hadley cell, as a much smaller doughnut sitting on the ground at 30N-60N side by side with the larger doughnut at 0N-30N, the Stratocell is also at 0N-30N but as a very slightly bigger doughnut in the stratosphere encircling the Hadley cell (i.e. above it from the point of view of an observer on the ground at 15N looking up vertically).

Just as the Hadley cell drives the Ferrel cell like one gearwheel driving another touching it, so does it also drive the Stratocell. The Stratocell’s bottom just above the tropopause is driven poleward accompanying (and driven by) the poleward flow of the Hadley cell’s top.

As the top of the Hadley cell approaches 30N it finds territory getting scarce (decreasing perimeter of the increasing latitudes), so to keep Navier and Stokes happy it dives down and flows back to the equator.

The bottom of the Stratocell encounters the same problem but it can’t solve it by diving down the way the Hadley cell does because the Hadley cell is selfish: it needs every bit of room it can get at 30N, in fact the pressure there should be getting larger on that account. So instead the Stratocell solves its space crisis by shooting up where there is no opposition, then over the top and back to the equator.

So now we have one Hadley cell driving two neighbors like touching gearwheels, one beside it, the Ferrel cell, and one sitting on top, the Stratocell. (Actually 6 touching gearwheels altogether when you include the SH, or 8 when the polar cells are counted.)

For the duration of the Stratocell’s ride where it is in contact with the Hadley cell, it continually picks up heat from the top of the Hadley cell. At 30N this heated stratospheric air then rises, bringing the heat with it, though losing temperature due to lapse rate. On the way back, with no further heat input, it loses heat. By the time it dives down to the equator it has become a refreshing breeze, cooling what theory would otherwise have predicted would be the tropical hot spot.

Since this mechanism seems pretty obvious I assume it was considered and discarded decades ago on account of some fatal flaw, such as evidence against any significant poleward flow up there. Nevertheless I’d be interested in seeing the literature where this mechanism is discussed. And if there isn’t any it would be interesting to know what’s wrong with this theory.

wiljan, the concept of back pressure exists any time there is resistance to a flow from high pressure to low pressure, whether the pressure be radiation pressure, air pressure, voltage, whatever. It is the high pressure end that experiences the back pressure, reducing the flow by reducing the pressure gradient at that end. The notion of back pressure entails no contradiction to the relevant laws, whether applied to the flow of photons, air molecules, electrons, cars driving into a bottleneck, or people walking into a store.

So if I may I would like to challenge it here.

On the face of it, it seems quite reasonable to assume that CO2 will be rising at a compound annual growth rate (CAGR) of 1% by 2050. Taking that as a lumped value applicable to the century as a whole, this would make estimation of TCR invaluable for forecasting global mean surface temperature in 2100.

But what is the basis for estimates of TCR? Nic rightly focuses on Box 12.2 of AR5, which is where the current report examines this question most closely, along with estimating both equilibrium climate sensitivity and effective climate sensitivity defined as varying inversely with the climate feedback parameter.

I had a very hard time following how the behavior of 20th C global surface temperature could be used to estimate any of those three measures of climate response to CO2. Problem 1 is that CO2 was rising last year at only 0.5%, at 0.25% in 1960, and even less before then. Problem 2 is that CO2 has risen only 43% since the onset of industrial CO2. And Problem 3 is that ocean delay, long recognized as a source of uncertainty, may be an even bigger source of uncertainty than assumed in interpreting historical climate data.
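Some of the arithmetic behind Problems 1 and 2, under the standard definition of TCR (the warming at the time CO2 doubles while compounding at 1% per year):

```python
import math

# TCR is defined for CO2 compounding at 1%/yr, which doubles the
# concentration in ln(2)/ln(1.01) ~ 70 years. At the ~0.5%/yr rate noted
# above, doubling takes roughly twice as long, and at 0.25%/yr four times.
for r in (0.01, 0.005, 0.0025):
    print(f"{r:.2%}/yr -> doubling in {math.log(2) / math.log(1 + r):.0f} yr")

# And a 43% rise in CO2 is only about half a doubling in forcing terms,
# given the logarithmic dependence of forcing on concentration.
fraction_of_doubling = math.log(1.43) / math.log(2)
print(fraction_of_doubling)
```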

I do not mean to imply that these are inconsequential numbers; quite the contrary. Rather, they invalidate overly naïve extrapolation from the previous century to this one.

A pathologically extreme example of how badly things can go when you neglect changing CAGR of CO2 can be seen in the 2011 paper of Loehle and Scafetta on “Climate Change Attribution”. They analyze climate as a sum of two cycles, a linear “natural warming” trend, and a steeper linear anthropogenic trend. Setting aside the cycles, the trends purport to model rising temperature before and after 1942, rising (in their Model 2) at 0.016°C and 0.082°C per decade respectively, obtained by linear regression against the respective halves of HadCRUT3.

The following argument justifies their attribution of pre-1942 warming to natural causes.

“A key to the analysis is the assumption that anthropogenic forcings become dominant only during the second half of the 20th century with a net forcing of about 1.6 W/m2 since 1950 (e.g., Hegerl et al. [23]; Thompson et al. [24]). This assumption is based on figure 1A in Hansen et al. [25] which shows that before 1970 the effective positive forcing due to a natural plus anthropogenic increase of greenhouse gases is mostly compensated by the aerosol indirect and tropospheric cooling effects. Before about 1950 (although we estimate a more precise date) the climate effect of elevated greenhouse gases was no doubt small (IPCC [2]).”

For reasons I will give below it is not clear to me that the influence of pre-1942 CO2 was so minor, but set that aside for the moment. Their justification for their linear model of post-1942 warming is as follows.

“Note that given a roughly exponential rate of CO2 increase (Loehle [31]) and a logarithmic saturation effect of GHG concentration on forcing, a quasi-linear climatic effect of rising GHG could be expected.”

The relevant passage from [31] is,

“An important question relative to climate change forecasts is the future trajectory of CO2. The Intergovernmental Panel on Climate Change (IPCC, 2007) has used scenarios for extrapolating CO2 levels, with low and high scenarios by 2100 of 730 and 1020 ppmv (or 1051 ppmv from certain earlier scenarios: Govindan et al., 2002), and a central “best estimate” of 836 ppmv. Saying that growth increases at a constant percent per year, which is often how the IPCC discusses CO2 increases and how certain scenarios for GCMs are generated (see Govindan et al., 2002), is equivalent to assuming an exponential model.”

In effect L&S have based their model of 20th century climate on TCR.

So how bad can this get? Well, ln(1 + x) is close to x for x much less than 1, but becomes ln(x) for x much larger than 1. The knee of the transition is at x = 1. Taking preindustrial CO2 to be 1, today we have a CO2 level for which x = (400 – 280)/280 = 0.43, and with business as usual should reach 1 (double preindustrial) around 2050.
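The size of the small-x error can be checked directly. A minimal sketch, assuming the standard figure of about 3.7 W/m2 of forcing per CO2 doubling (that number is a common convention, not one given in the text):

```python
import math

F2X = 3.7  # W/m2 per CO2 doubling; a standard assumption, not from the text

def forcing(x):
    """Logarithmic forcing for fractional CO2 increase x over preindustrial."""
    return F2X * math.log(1 + x) / math.log(2)

def forcing_linear(x):
    """Small-x approximation using ln(1 + x) ~ x."""
    return F2X * x / math.log(2)

x_today = (400 - 280) / 280  # ~0.43, as in the text
# At today's x the linear approximation already overshoots the true
# logarithmic forcing by roughly 20%; at x = 1 (doubling) the gap widens
# to ~44%, so "essentially linear" is running out of road.
```

This is why the approximation was harmless in the 19th century but starts to matter precisely over the period used to fit 20th-century trends.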

So for the 19th and much of the 20th century ln(1 + x) can be taken to be essentially x. Since emissions scale roughly as the product of population and per-capita energy consumption, we can assume with Hofmann, Butler and Tans, 2009, that up to now anthropogenic CO2, and hence forcing, has been growing exponentially. (Actually the CDIAC data show that the CAGR of CO2 emissions for much of the 19th century held steady at 15%, declining to its modern-day value of around 4–6%, but the impact of anthropogenic CO2 was so small in the 19th C that approximating it with the modern-day CAGR of emissions may not make an appreciable difference. When I spoke to Pieter Tans in 2012 about extrapolating their formula to 2100 he thought a lower estimate might be more appropriate, which is consistent with the declining CAGR of emissions between 1850 and now; but estimating peak coal/oil/NG is far from easy, a big uncertainty.)

It follows that CO2 forcing to date has been growing essentially exponentially, not linearly, but that it will gradually switch to linear (or even sublinear) during the present century. Hence extrapolating 20th century global warming to the 21st century and beyond cannot be done on the basis of either a linear or logarithmic response to growing CO2, but must respect the fact that over the current century forcing will be making the transition from one to the other.
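To see the exponential-to-linear transition numerically, one can push an assumed exponential CO2 trajectory through the logarithmic forcing law. Both the growth rate and the per-doubling forcing below are illustrative assumptions, not figures from the text:

```python
import math

def excess_co2(t, x0=0.43, growth=0.01):
    """Fractional excess CO2 at year t from now, assumed to keep
    compounding at `growth` per year (illustrative)."""
    return x0 * (1 + growth) ** t

def forcing(x, f2x=3.7):
    """Logarithmic forcing, with an assumed 3.7 W/m2 per doubling."""
    return f2x * math.log(1 + x) / math.log(2)

# While x << 1, forcing grows nearly exponentially in t (ln(1+x) ~ x);
# once x >> 1 it grows only linearly in t (ln(x0 * e^(kt)) ~ kt + const).
for t in (0, 50, 100, 200):
    print(t, round(forcing(excess_co2(t)), 2))
```

Far enough out, the forcing trend settles toward a constant slope of f2x·ln(1+growth)/ln 2 per year, which is the linear regime the text describes.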

The sharp transition at 1942 in Loehle and Scafetta’s model is in this light better understood as the flattening out (going back from 1950 to 1930) of an exponential curve. Even if aerosol forcing happened to approximately cancel the left half of the exponential, it would be preferable to estimate the aerosol contribution independently of the CO2 forcing. Moreover if the feedbacks are capable of doubling or tripling the no-feedback response then this would entail aerosol forcing driving CO2, raising the possibility of estimating aerosols around 1900 by comparing the Law Dome estimates of CO2 with the CDIAC’s estimates of CO2 emissions, provided the difference is sufficiently significant.

There is also the matter of any delay in the impact of radiative forcing on surface temperature while the oceans take their time responding to it (Hansen et al 1985). If forcing grows as exp(t) with time t, any delay d means that temperature actually grows as exp(t - d) = exp(t)exp(-d), introducing a constant factor of exp(-d) into observation-based estimates of climate response. In particular if exp(-d) = ½, as it well might, then failure to take this delay into account will result in underestimating the prevailing climate response by a factor of two. This on its own would entirely account for misreading a sensitivity of 3.6 as 1.8. That is a huge contribution to uncertainty. If furthermore the delay varies with time (as it may well, given the complexities of ocean heat transport) then so does the factor exp(-d), making the uncertainty itself a function of time. One might hope that d varies only slowly with time, if at all, and preferably monotonically, say linearly to a first approximation.
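The constant-factor claim follows directly from the algebra and is easy to verify. A sketch in which time is in years and the forcing's e-folding time is an assumed 50 years (illustrative only):

```python
import math

TAU = 50.0  # assumed e-folding time of forcing growth, in years (illustrative)

def response(t, delay=0.0):
    """Response tracking forcing exp(t/TAU), lagged by `delay` years."""
    return math.exp((t - delay) / TAU)

# A lag d attenuates the response by the constant factor exp(-d/TAU),
# independent of t.  The factor reaches 1/2 at d = TAU*ln(2), ~35 years here:
d_half = TAU * math.log(2)
ratio = response(100.0, d_half) / response(100.0)
print(round(ratio, 3))  # ~0.5: the apparent sensitivity is halved
```

So with these assumed numbers, a ~35-year ocean lag against a 50-year e-folding time would make a true sensitivity of 3.6 look like 1.8, exactly the misreading described above.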

For such reasons I feel that if climate projections are to be based on climate observations, a third notion of climate response is needed, one that differs from TCR along the above lines, taking into account both the manner in which CO2 grows and the extent to which the ocean delays the response of global mean surface temperature (in degrees) to forcing (in W/m2).

]]>