Arranging a Beta Distribution into Exponential Family Form

November 20th, 2016

A family of PMFs or PDFs is an exponential family if it can be arranged into the form

$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)$

where $\theta$ is the vector of parameters.

The Beta distribution PDF takes the form

$f_X(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{\textrm{B}(\alpha, \beta)}$

Treating both $\alpha$ and $\beta$ as unknown, we now arrange $f_X(x)$ into exponential family form:

$f(x|\alpha,\beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{\textrm{B}(\alpha, \beta)}$
$= \frac{e^{(\alpha - 1)\ln(x)}e^{(\beta - 1)\ln(1 - x)}}{\textrm{B}(\alpha, \beta)}$
$= \frac{e^{(\alpha - 1)\ln(x) + (\beta - 1)\ln(1 - x)}}{\textrm{B}(\alpha, \beta)}$
$= \frac{1}{\textrm{B}(\alpha, \beta)}e^{(\alpha - 1)\ln(x) + (\beta - 1)\ln(1 - x)}$

The identity $x^b = e^{b\ln(x)}$ (valid for $x > 0$) is very useful to remember when arranging PDFs into exponential family form.

We observe:
$h(x) = I_{x \in (0,1)}(x)$ (If $h(x)$ appears to be 1, that is a cue to use an indicator function over the support of $x$.)
$c(\alpha,\beta) = \frac{1}{\textrm{B}(\alpha, \beta)}$
$w_1(\alpha,\beta) = \alpha - 1$
$w_2(\alpha,\beta) = \beta - 1$
$t_1(x) = \ln(x)$
$t_2(x) = \ln(1-x)$

Hence, the Beta distribution given unknown parameters $\alpha$ and $\beta$ is an exponential family with a two-dimensional parameter vector $\theta$.
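As a quick numerical sanity check, the sketch below rebuilds the Beta density from $h$, $c$, $w_i$, and $t_i$ and compares it with the usual closed form at a few points. It uses only Python's standard library, computing $\textrm{B}(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$ with `math.gamma`; the function names and test values are illustrative choices, not part of the derivation itself.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf_direct(x, a, b):
    """Usual form: x^(a-1) (1-x)^(b-1) / B(a, b) on 0 < x < 1, else 0."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


def beta_pdf_expfam(x, a, b):
    """Exponential family form: h(x) c(a, b) exp(w1 t1(x) + w2 t2(x))."""
    h = 1.0 if 0.0 < x < 1.0 else 0.0  # indicator of the support (0, 1)
    if h == 0.0:
        return 0.0  # outside the support; also avoids taking log(0)
    c = 1.0 / beta_fn(a, b)
    w = (a - 1, b - 1)                   # w_1, w_2
    t = (math.log(x), math.log(1 - x))   # t_1, t_2
    return h * c * math.exp(sum(wi * ti for wi, ti in zip(w, t)))


# Compare both forms over a few points and (alpha, beta) pairs.
for x in (0.1, 0.25, 0.5, 0.9):
    for a, b in ((2.0, 5.0), (0.5, 0.5), (3.0, 1.5)):
        direct = beta_pdf_direct(x, a, b)
        expfam = beta_pdf_expfam(x, a, b)
        assert math.isclose(direct, expfam, rel_tol=1e-9), (x, a, b)
print("Both forms of the Beta PDF agree.")
```

The two forms differ only in floating-point rounding, which is why the comparison uses a relative tolerance instead of exact equality.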

A similar process shows that a Beta PDF with only one unknown parameter, $\alpha$ or $\beta$, is also an exponential family.
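For example, one way to carry this out (other groupings of the factors are equally valid): with $\alpha$ known and only $\beta$ unknown, the factor $x^{\alpha - 1}$ no longer depends on an unknown parameter and can be absorbed into $h(x)$:

$f(x|\beta) = x^{\alpha - 1}I_{x \in (0,1)}(x) \cdot \frac{1}{\textrm{B}(\alpha, \beta)} \cdot e^{(\beta - 1)\ln(1 - x)}$

so $h(x) = x^{\alpha - 1}I_{x \in (0,1)}(x)$, $c(\beta) = \frac{1}{\textrm{B}(\alpha, \beta)}$, $w_1(\beta) = \beta - 1$, and $t_1(x) = \ln(1 - x)$, a one-parameter exponential family. The case with only $\alpha$ unknown is symmetric, with $h(x) = (1 - x)^{\beta - 1}I_{x \in (0,1)}(x)$, $c(\alpha) = \frac{1}{\textrm{B}(\alpha, \beta)}$, $w_1(\alpha) = \alpha - 1$, and $t_1(x) = \ln(x)$.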

Regarding Identity Politics And Their Guaranteed Persistence

November 20th, 2016

I absolutely hate this talk of “we must end identity politics” that has become pervasive since the election, especially as people try to prescribe solutions for how Democrats can recover from this election. First off, how are people defining “identity politics”? A lot of people appear to be using it as a blanket term for political organization and mobilization by gender, race, or ethnic group. Identity politics exist and will not go away because your geographic origins, gender, phenotype, and genotype determine your lot in life, the hand you’ve been dealt, et cetera (insert favored idiom here). Your predetermined traits (like those of an RPG character class) come with a set of pros and cons that affect your probabilities of success in life as defined by societal and cultural values. People are motivated to maintain or improve their probabilities, so it is easy to motivate people to advocate for the blocs they belong to. Yes, identity politics make it easy to hate on another group of people as The Others, but they have also mobilized large sectors of the American population to advocate for egalitarian goals, such as women’s suffrage, desegregation, and the elimination of sodomy laws, which allowed members of one American bloc to live significantly more comfortably at little or no cost (and perhaps some benefit) to opposing blocs.

As an alternative, people are promoting an emphasis on “class politics.” But how are identity politics so different from class politics? You are still slicing a population into sub-groups and then trying to get them to oppose the interests of another group. Yes, having no hive-mind sub-group jousting would be ideal, but it is a Platonic ideal that is impossible given the heuristics of the human brain. Identity politics will survive the 2016 election and persist for as long as humans continue to exist in their current form.

Arranging a Gamma Distribution Family into Exponential Family Form

November 20th, 2016

A family of PMFs or PDFs is an exponential family if it can be arranged into the form

$f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x)\right)$

where $\theta$ is the vector of parameters.

Now, the shape-scale parameterization of the Gamma distribution PDF takes the form

$f_X(x) = \frac{1}{\Gamma(k)\beta^{k}}x^{k-1}e^{-\frac{x}{\beta}}$

Here $\beta$ denotes the scale parameter. $\beta$ more commonly denotes the rate parameter in the shape-rate parameterization of the Gamma PDF, but the usual symbol for the scale parameter, $\theta$, would conflict with the symbol for our parameter vector.

Can we arrange that PDF into an exponential family form? Spoiler: yes.

Here, we demonstrate that a Gamma PDF given two unknown parameters, $\beta$ and $k$, is an exponential family.

$f(x|k,\beta) = \frac{1}{\Gamma(k)\beta^{k}}x^{k-1}e^{-\frac{x}{\beta}}$
$ = \frac{1}{\Gamma(k)\beta^{k}}e^{(k-1)\ln(x)}e^{-\frac{x}{\beta}}$
$ = \frac{1}{\Gamma(k)\beta^{k}}e^{(k-1)\ln(x) - \frac{x}{\beta}}$

The identity $x^b = e^{b\ln(x)}$ (valid for $x > 0$) is very useful to remember when arranging PDFs into exponential family form.

We observe:
$h(x) = I_{x>0}(x)$ (If $h(x)$ appears to be 1, that is a cue to use an indicator function over the support of $x$.)
$c(k,\beta) = \frac{1}{\Gamma(k)\beta^{k}}$
$w_1(k,\beta) = k - 1$
$w_2(k,\beta) = -\frac{1}{\beta}$
$t_1(x) = \ln(x)$
$t_2(x) = x$

Hence, the Gamma distribution given unknown parameters $\beta$ and $k$ is an exponential family with a two-dimensional parameter vector $\theta$.
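The same numerical check works here. The sketch below uses the standard shape-scale density, whose normalizing constant is $\Gamma(k)\beta^{k}$, and compares the usual form against $h(x)c(k,\beta)\exp(w_1(k,\beta)t_1(x) + w_2(k,\beta)t_2(x))$. The function names and test points are illustrative choices.

```python
import math


def gamma_pdf_direct(x, k, beta):
    """Shape-scale Gamma PDF: x^(k-1) e^(-x/beta) / (Gamma(k) beta^k) for x > 0."""
    if x <= 0.0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / beta) / (math.gamma(k) * beta ** k)


def gamma_pdf_expfam(x, k, beta):
    """Exponential family form: h(x) c(k, beta) exp(w1 t1(x) + w2 t2(x))."""
    h = 1.0 if x > 0.0 else 0.0  # indicator of the support x > 0
    if h == 0.0:
        return 0.0  # outside the support; also avoids log of non-positive x
    c = 1.0 / (math.gamma(k) * beta ** k)
    w = (k - 1, -1.0 / beta)  # w_1, w_2
    t = (math.log(x), x)      # t_1, t_2
    return h * c * math.exp(sum(wi * ti for wi, ti in zip(w, t)))


# Compare both forms over a few points and (k, beta) pairs.
for x in (0.3, 1.0, 2.5, 7.0):
    for k, beta in ((1.0, 2.0), (2.0, 1.0), (3.5, 0.7)):
        direct = gamma_pdf_direct(x, k, beta)
        expfam = gamma_pdf_expfam(x, k, beta)
        assert math.isclose(direct, expfam, rel_tol=1e-9), (x, k, beta)
print("Both forms of the Gamma PDF agree.")
```

With $k = 1$ the density reduces to the exponential distribution with mean $\beta$, which makes a convenient spot check.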

A similar process shows that a Gamma PDF with only one unknown parameter, $\beta$ or $k$, is also an exponential family.
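For example, one way to group the factors (other groupings are equally valid): with $k$ known and only $\beta$ unknown,

$f(x|\beta) = \frac{x^{k-1}}{\Gamma(k)}I_{x>0}(x) \cdot \frac{1}{\beta^{k}} \cdot e^{-\frac{1}{\beta}x}$

giving $h(x) = \frac{x^{k-1}}{\Gamma(k)}I_{x>0}(x)$, $c(\beta) = \frac{1}{\beta^{k}}$, $w_1(\beta) = -\frac{1}{\beta}$, and $t_1(x) = x$. With $\beta$ known and only $k$ unknown,

$f(x|k) = e^{-\frac{x}{\beta}}I_{x>0}(x) \cdot \frac{1}{\Gamma(k)\beta^{k}} \cdot e^{(k-1)\ln(x)}$

giving $h(x) = e^{-\frac{x}{\beta}}I_{x>0}(x)$, $c(k) = \frac{1}{\Gamma(k)\beta^{k}}$, $w_1(k) = k - 1$, and $t_1(x) = \ln(x)$.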

Thoughts on Lal 2008, “Carbon Sequestration.”

August 1st, 2016

From Phil. Trans. R. Soc. B

Carbon sequestration is defined as the transfer of carbon from the atmospheric carbon pool to other carbon pools. Including the atmospheric pool, there are five carbon pools, the largest being the oceanic pool at an estimated 38,000 Pg C. The pedologic pool is the third largest at 2,500 Pg C and is further subdivided into the soil organic carbon (SOC) and soil inorganic carbon (SIC) pools.

There are abiotic techniques for carbon sequestration, comprising engineering methods and chemical processes; many of them involve injecting carbon into non-atmospheric pools. There are also biotic techniques that rely on organisms, primarily plants and microbiota, to remove CO2 from the atmosphere. In theory, abiotic techniques can store more carbon, but there are questions about their safety and reliability: the risk of carbon leakage and the effects of leakage on ecosystems are still uncertain. The high cost of geo-engineering is an additional limitation. By comparison, biotic techniques are more cost-effective and less risky, and they provide accompanying benefits absent from abiotic methods, such as improved soil and water quality and ecosystem preservation. Biotic approaches do, however, have a smaller cumulative carbon sink capacity than abiotic approaches.

Biotic techniques can be subdivided into oceanic and terrestrial sequestration methods. Among terrestrial methods, afforestation in the U.S. alone can sink up to 117 Tg C per year (IPCC 1999). One cost of afforestation is its drain on water resources, which can make the practice prohibitive in drought-stricken regions like California. The family of techniques focusing on SOC and SIC sequestration can also cumulatively sink a significant amount of carbon. Land use conversion and restoration of degraded soils can increase overall microbial concentrations and diversity in soils. Restoration of degraded soils and habitats in the tropics can potentially sequester an additional 1.1 Pg C per year (Grainger 1995). What constitutes “degraded lands … with potential for afforestation and soil quality enhancement” is something I am not clear on, as I have not read the Grainger paper. Moving from monocultures to polycultures for agricultural crops can mitigate SOC losses and improve the ability of agricultural land to sequester carbon.

A lingering question: how many acres could be converted from agricultural use, and how much soil could be restored, if food waste were better controlled throughout the world?

Thoughts on Li et al. 2014, “Soil carbon sensitivity to temperature and carbon use efficiency compared across microbial-ecosystem models of varying complexity.”

July 9th, 2016

Ha, that is a long auto-generated URL. Given what has happened in the past few days, one has to chuckle at and cherish the little harmless things.

I read Li et al. 2014 from Biogeochemistry. This paper compares the output of several Earth system models, including the “conventional model,” German (German et al., 2012), AWB (Allison et al., 2010), and MEND (Wang et al., 2013). The models differ in carbon pool structure and interactions, parameter values, and complexity: German has the fewest parameters and pools, while MEND has the most. Each model was run under three separate microbial carbon use efficiency (CUE) scenarios. CUE is an important parameter in describing microbial function, and the effect of rising temperatures on the CUE of global microbial populations will be a key determinant of changes to the size of the soil organic carbon (SOC) pool in the coming century.

The three CUE scenarios tested were:

  1. A constant CUE scenario in which the CUE parameter stayed at 0.31 and did not depend on temperature
  2. A varied CUE scenario in which CUE monotonically decreases with temperature increase
  3. A varied CUE scenario with thermal acclimation, in which CUE still decreases with warming but only half as steeply

In all of the models in Li et al. except the conventional model, CUE depends on another key parameter, $m$, the CUE temperature response coefficient. CUE is given by

$\textrm{CUE}(T) = \textrm{CUE}_{ref} + m(T - T_{ref})$

where $T$ is temperature, $\textrm{CUE}_{ref}$ is a fixed reference CUE value, and $T_{ref}$ is a fixed reference temperature, in this case 298.15 K (25 °C).

The models were run at their initial temperatures until they reached equilibrium and were then perturbed with a 5 °C temperature increase.
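The linear CUE relation and the 5 °C perturbation can be sketched in a few lines of Python, with each scenario expressed as a choice of $m$. This is an illustration under stated assumptions, not the paper's parameterization: $\textrm{CUE}_{ref}$ is taken to be 0.31 (the constant-scenario value), the varied-scenario $m$ of $-0.016$ per kelvin is a hypothetical number chosen only for illustration, and thermal acclimation is represented by halving $m$.

```python
# Linear CUE temperature response: CUE(T) = CUE_ref + m * (T - T_ref).
# CUE_REF and M_VARIED are assumptions for illustration, not values from Li et al. 2014.
T_REF = 298.15      # reference temperature in kelvin (25 °C)
CUE_REF = 0.31      # assumed equal to the constant-scenario CUE
M_VARIED = -0.016   # hypothetical temperature response coefficient (per kelvin)


def cue(T, m, cue_ref=CUE_REF, t_ref=T_REF):
    """Carbon use efficiency at temperature T (kelvin) for response coefficient m."""
    return cue_ref + m * (T - t_ref)


# The three scenarios, expressed as choices of m.
SCENARIOS = {
    "constant": 0.0,             # CUE does not depend on temperature
    "varied": M_VARIED,          # CUE decreases linearly with warming
    "acclimated": M_VARIED / 2,  # 50% thermal acclimation: m halved
}

WARMED = T_REF + 5.0  # a 5 °C warming is also a 5 K warming

for name, m in SCENARIOS.items():
    print(f"{name:>10}: CUE at T_ref = {cue(T_REF, m):.3f}, "
          f"after +5 K = {cue(WARMED, m):.3f}")
```

Under these assumed values, warming by 5 K lowers CUE from 0.31 to 0.23 in the varied scenario and to 0.27 with acclimation, while the constant scenario stays at 0.31.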

Now, I won’t go into too much detail since I need to go to bed at some point, but several results in this paper piqued my interest. For one, for regions initialized at low temperatures, the German, AWB, and MEND models predicted decreases in SOC pool size, while regions initialized at higher temperatures saw smaller SOC losses or even modest gains. This prediction aligns with experimental results showing SOC losses in Arctic soils (Xue et al., 2016; Natali et al., 2011). Additionally, the damped oscillations observed in the paper matched my own observations in simulations I have run. This makes sense, because the interactive coupling between the SOC and microbial biomass carbon (MBC) pools resembles the coupling in predator-prey models, so I was glad to see it confirmed.
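The predator-prey analogy can be made concrete with a linear stability check. The sketch below assumes a hypothetical minimal two-pool model (SOC and MBC) in which microbes take up SOC with Michaelis-Menten kinetics and dead biomass returns to SOC; the equations and all parameter values are illustrative assumptions, not the ones used by Li et al. If the Jacobian at equilibrium has complex eigenvalues with negative real parts, a perturbation decays through damped oscillations.

```python
import cmath

# Hypothetical two-pool (SOC and MBC) microbial model, for illustration only:
#   dS/dt = I - U + D*B       (dead microbial biomass returns to SOC)
#   dB/dt = CUE*U - D*B,      where U = V*B*S / (K + S)
# Parameter values are arbitrary and NOT taken from Li et al. 2014.
I = 0.5     # external carbon input to SOC
V = 1.0     # maximum uptake rate per unit microbial biomass
K = 100.0   # half-saturation constant of uptake
D = 0.1     # microbial death rate
CUE = 0.31  # carbon use efficiency


def equilibrium():
    """Steady state (S*, B*); requires CUE * V > D."""
    s_star = D * K / (CUE * V - D)        # from dB/dt = 0
    b_star = I * CUE / (D * (1.0 - CUE))  # from dS/dt = 0
    return s_star, b_star


def jacobian_eigenvalues(s, b):
    """Eigenvalues of the 2x2 Jacobian of (dS/dt, dB/dt) at (s, b)."""
    f = V * s / (K + s)          # uptake per unit biomass
    df = V * K / (K + s) ** 2    # derivative of f with respect to S
    j11, j12 = -b * df, -f + D              # partials of dS/dt
    j21, j22 = CUE * b * df, CUE * f - D    # partials of dB/dt
    half_tr = (j11 + j22) / 2.0
    det = j11 * j22 - j12 * j21
    disc = cmath.sqrt(half_tr ** 2 - det)
    return half_tr + disc, half_tr - disc


s_star, b_star = equilibrium()
lam1, lam2 = jacobian_eigenvalues(s_star, b_star)
print(f"S* = {s_star:.2f}, B* = {b_star:.3f}")
print(f"eigenvalues: {lam1:.4f}, {lam2:.4f}")
```

With these assumed values the eigenvalues form a complex-conjugate pair with negative real parts (a stable spiral). In this toy model that holds whenever $B^* f'(S^*) < 4D(1 - \textrm{CUE})$, where $f(S) = VS/(K + S)$; above that threshold the eigenvalues become real and the return to equilibrium is monotone.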

Since I really have to go to bed now, I’ll jump straight to the questions and future research directions this paper has raised. First, instead of a constant 50% thermal acclimation scenario (where $m$ is halved relative to the varied CUE scenario), I wonder how making $m$ a function of time (representing adaptive mutations) would change the results. Second, a question less related to this paper: in these Earth system biogeochemistry models, the carbon dioxide flux does not feed back into the pools in any way and is entirely separate from the input. How could the atmospheric carbon pool be incorporated into these models? As someone new to this sub-field of Earth system biogeochemistry, I’m wondering why atmospheric carbon is not accounted for as an interactive pool in these models.