The Planetary News Radio – Episode 2: Linguistics and Genetics

Hello. Welcome to The Planetary News Radio, Episode 2, with Bryan White. I’m outside today, so I don’t know what this is going to sound like. I apologize if there are any strange car noises in the background; hopefully there are nice bird noises instead. Let’s talk a little bit about projects, and let’s talk about news. The date is Sunday, May 26. I’m in Corvallis, Oregon, and I’m taking the opportunity to be outside while there is currently no rain. This year there was an ENSO [El Niño–Southern Oscillation] event, and I don’t remember if it was El Niño or La Niña. Either way, the rain patterns in this part of the world, the Pacific Northwest, have been altered. What we’re getting is a longer, warmer, wetter winter, [with] less snow but much more rainfall, and you’ll see that translated across the United States in different ways.

California, for example, is officially out of drought for the first time in seven years, so they got a lot more water [than usual]. But anyway, the point is that here [in Oregon] we are at May 26 and just barely getting a week without rain, so I’m happy to be outside talking about science and projects. As I mentioned last time, I’m developing software to support this science news project. One of the interesting things I’m looking at is the similarity between genetics and linguistics, or really between computational linguistics and computational genetics. Why are these things similar? It’s just an interesting idea to talk about.

In human language, [sentences have] a grammar. A document is a collection of sentences, and sentences have small ideas in them. Paragraphs have greater ideas: when you’re constructing a paragraph, you’re weaving together a more complex idea. You could think of a paragraph as a unit, a sentence as a unit, and a word within a sentence as a unit, and then you have the document as a whole. So you have this multilayered system of grammar [and language] that humans have developed and evolved, and it is a very rigid, structural part of our brains. Language is not completely abstract. It is bound by our physical constraints to process information, so we have specific areas in the brain for language, and that is reflected in our writing systems. Maybe another time we’ll talk more about the biology of language.

But let’s take what I just said about human language and compare it to genetics. In genetics, you have DNA, a long chain molecule built on a sugar-phosphate backbone, which can form very stable structures that last potentially a million years. In fact, DNA breaks down at such a rate that, [according to a widely cited estimate,] even under ideal conditions no readable DNA would survive more than about 1.5 million years. So DNA is a very stable molecule, and a lot of molecular properties go into that. The interesting corollary is that it would be impossible for us to find dinosaur DNA, because dinosaurs went extinct about 65 million years ago. Even under the best preservation conditions, say a woolly mammoth in Siberia that died in the ice and has been frozen for a million years, its DNA would be degraded beyond reading after about 1.5 million years. So we’re stuck without dinosaurs, unfortunately, but we do have this very stable molecule, DNA, [the base unit in the language of genetics].

Now, there are four letters in this language. If we think of DNA as a language, there’s A [adenine], T [thymine], G [guanine], and C [cytosine]. I don’t have a map of the genetic code in front of me; I’m going from memory, so forgive me on the details of base pairing. [A pairs with T, and G pairs with C.] DNA is a double helix: the four letters are paired with each other in such a way that every DNA molecule carries a matching [complementary] copy of its information on the opposite strand. That’s why you have this zipper effect. If you unzip the DNA molecule, on one side you have version A of the information, and on the other side you have version B, sort of like an inverse [copy]. So the information in DNA is always duplicated; there’s redundancy. If you lose one half, the other half can be used to repair it. And that repair is one way we get mutations.
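The pairing rule can be shown in a few lines of Python. This is a minimal sketch, not a model of real DNA chemistry: it treats a strand as a plain string of A, T, G, and C, and it marks damaged positions with the letter N, a convention assumed here purely for illustration. What it shows is the redundancy described above: either strand is enough to rebuild the other.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner strand, written base by base underneath the original."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read in its own direction (the two strands run antiparallel)."""
    return complement(strand)[::-1]


def repair(damaged: str, partner: str) -> str:
    """Rebuild positions marked "N" (a hypothetical damage marker) from the intact partner.

    `partner` is the position-aligned complement, as returned by complement().
    """
    return "".join(PAIR[p] if d == "N" else d for d, p in zip(damaged, partner))


if __name__ == "__main__":
    top = "ATGCGTAC"
    bottom = complement(top)
    print(top)
    print(bottom)
    print(reverse_complement(top))
    print(repair("ATGNNTAC", bottom))  # restores the original top strand
```

In a cell, a wrong letter inserted at the repair step would be left behind as a changed base, which is the kind of copying mistake the next paragraph turns to.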

When pieces of DNA are damaged and our cellular machinery goes in and repairs them, it sometimes makes mistakes. When DNA is copied during gametogenesis, mistakes happen too. So mutations can happen, say, in a skin cell due to ultraviolet radiation, or when a [gamete] is developing during gametogenesis. Those [(germline)] mutations [will be] carried on into the next generation, whereas somatic mutations can cause things like cancer, or be removed. That’s the brief intro to the molecular side of DNA. Now, what about the hierarchy of information I talked about with human language? You have DNA as the base molecule, but how is DNA arranged? In the living organisms that we know of, DNA has evolved to be arranged in units that we call genes, but we could also just call them the “basic genetic unit”. A gene is really the sentence of genetics, and in colloquial terms we’ll often talk about [having] a gene for red hair, a gene for [hair] color, a gene for height, running ability, or skin color.

But really, a lot of the time there are multiple genes involved, [particularly in the cases of height or skin color]. And not only that: even when there’s only one gene, that one gene [potentially] has many pieces. Genes themselves can have multiple subunits. There are really [(at least)] two layers to a gene. There are the base units of genes, which can be found next to each other [(introns/exons/open reading frames)]. These are consecutive sequences of DNA, and these consecutive sequences are [part of] a chromosome, so a chromosome could be thought of as a chapter in a book. It’s a very complex document.
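The sub-units in the bracketed note can be made concrete with a toy example of splicing, in which the intron stretches of a gene are cut out and the exons are joined. This is a minimal Python sketch under stated assumptions: the sequence is invented, and exon positions are given as half-open (start, end) index pairs, a convention chosen here for illustration rather than taken from any genomics tool.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join a gene's exon segments in order, dropping the introns between them.

    Each exon is a half-open (start, end) index pair into `gene`
    (a hypothetical coordinate convention for this sketch).
    """
    return "".join(gene[start:end] for start, end in sorted(exons))


# An invented gene laid out as exon, intron, exon, intron, exon.
# Both introns begin with GT and end with AG, a common pattern at real intron boundaries.
gene = "ATGGCC" + "GTAAGTTTAG" + "GAATTC" + "GTGAGCAG" + "TGA"
exons = [(0, 6), (16, 22), (30, 33)]

if __name__ == "__main__":
    print(splice(gene, exons))
```

Passing a different exon list to `splice()` yields a different product from the same gene, a toy version of alternative splicing.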

If we think of a chromosome as a chapter in a book and a gene as having both paragraphs and sentences, what’s the intermediate level? What’s the page in a chapter of the book? This is where genetics differs slightly from human language: the information hierarchy from a gene up to the chromosome is a little fuzzy. [One way to think of it is that] genes have parts that are like sentences, parts that are like paragraphs, and parts that are like entire pages. Some genes are very simple. Maybe a gene is only one paragraph with 10 sentences. In [terms of language, that might be] a complex idea; [in terms of genetics, it might be] a complex protein, but not a super complex one. Other genes are extremely complex and could take up many pages, so it might take many paragraphs to describe the protein. I could talk more about how that process [(transcription and translation)] happens at a molecular level, but that could be later.

[Essentially,] the genome is arranged in a similar hierarchy [(sentences, to paragraphs, to pages, to chapters, to documents)]. [For example,] if [a genome were a book] and you were to turn to a [page in] chapter one, chromosome one, page one might start with a simple paragraph that describes eye color. But then it might very quickly go into an extremely complex set of paragraphs about how to build muscle proteins, and building muscle proteins then becomes an epic poem, [but just for that section of the chapter/chromosome]. That’s how the genome goes.

We have linear chromosomes; humans have 23 pairs of them, so we have 23 chapters in our book. You go on to chapter two, and chapter two might focus more on hair structure. You might start with the paragraph on hair structure, and then go into a paragraph on fingernails and [the outer layer of skin], things [also built from keratin]. You could spend 30 pages talking about how to build keratin proteins: when to activate them, where to put them, and how to make them. That’s how the human genome can be thought of as a book.

And so you go on through chapters one through 22, and then you have what we could call an appendix, [which is] the sex chromosomes. The X chromosome and the Y chromosome are the sex-determining chromosomes in humans, so we have 22 pairs of autosomes [(the non-sex chromosomes)] and one pair of sex chromosomes. That’s our book: 23 chapters, duplicated. You get one set of chapters from your mom and one set from your dad. And there’s a little mini guide on the side: the mitochondrial genome. The mitochondrion is an organelle, like the nucleus, and it has its own short piece of DNA. And so that’s how I think of the similarity between language and... sorry, I’m just checking my time here.
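The chapter count can be written down as a small data structure. This Python sketch is only an illustration of the book analogy: the chapter labels follow the standard naming of human chromosomes (1 to 22, then X and Y), while the function names and the `mtDNA` appendix label are hypothetical names chosen for this example.

```python
# Human chromosome "chapters": autosomes 1-22, plus the sex chromosomes X and Y.
AUTOSOMES = [str(n) for n in range(1, 23)]


def one_parent_set(sex_chromosome: str) -> list[str]:
    """The 23 chapters from one parent: 22 autosomes plus one sex chromosome."""
    if sex_chromosome not in ("X", "Y"):
        raise ValueError("sex chromosome must be 'X' or 'Y'")
    return AUTOSOMES + [sex_chromosome]


def genome_book(from_father: str) -> dict:
    """Both sets of chapters plus the mitochondrial 'appendix'.

    The mother always passes on an X; the father passes on an X or a Y.
    The mitochondrial genome is inherited from the mother.
    """
    return {
        "maternal": one_parent_set("X"),
        "paternal": one_parent_set(from_father),
        "appendix": "mtDNA",  # placeholder label for the mitochondrial genome
    }


if __name__ == "__main__":
    book = genome_book(from_father="Y")
    pairs = list(zip(book["maternal"], book["paternal"]))
    print(len(pairs), "chapter pairs,", 2 * len(pairs), "chromosomes in total")
```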

Language and genetics. So when I think about how to use these [two concepts] in the news, what I’m looking at is taking algorithms that were developed to understand genetics and applying them to human [written] language. We have spent a lot of time, especially since the Human Genome Project, creating very sophisticated computer algorithms for comparing DNA sequences. Now what I want to do is go back and use those same algorithms to compare human written text. That’s the beginning of the idea [for this news project].
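As an illustration of the idea, here is one classic DNA-comparison technique, Needleman-Wunsch global alignment, applied to words instead of bases. It is a minimal Python sketch, not the software described in this episode: the scoring values (match +1, mismatch -1, gap -1), the `-` gap marker, and the two example headlines are all assumptions invented for illustration.

```python
def align_words(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token lists.

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # line up two words
                score[i - 1][j] + gap,             # word in a, gap in b
                score[i][j - 1] + gap,             # gap in a, word in b
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


if __name__ == "__main__":
    first = "NASA awards contract for lunar orbital platform".split()
    second = "NASA awards Maxar contract for Moon orbital platform".split()
    total, top, bottom = align_words(first, second)
    print("score:", total)
    for x, y in zip(top, bottom):
        print(f"{x:>10} {y}")
```

Global alignment compares two texts end to end. For finding a shared passage buried inside two longer articles, a local variant such as Smith-Waterman would likely be the better fit.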

Let’s talk briefly about the news. Sorry, I paused for a moment there. There’s one recent, interesting development in science news. There’s a debate right now in the United States, in Congress, at NASA, and probably in the [scientific/space exploration] community, about whether we should go to Mars or the Moon first. This has largely been settled, with the United States government supporting the Moon [first] and private industry supporting Mars. That doesn’t mean private industry isn’t still doing both; it’s just that the current priority of the US federal government is the Moon, which means sending humans to the Moon again. Recently we saw the first major step here: the award of a contract to a company called Maxar to develop [the power and propulsion element of] an orbital platform, a stopping point for sending astronauts, and presumably cosmonauts and any other international collaborators, to the Moon. So we’re seeing headway there.

One of the reasons I’m doing this show is that, [for example], the top science news today, or this week, is this Moon mission. So my question is: why is space science always the most popular science news on the Internet? By that I mean that on Google News, or on websites that aggregate news and science news, spaceflight announcements are usually very popular; you see them capturing the biggest audience. The question I want to answer is: why is [space the most popular science in the news]? Why is space so captivating in terms of Internet popularity? I’m going to explore that, and my first intuition is that people like space. I like space. I like the fact that we’re developing a Moon mission. I don’t know how useful going to the Moon is, aside from the fact that it forces us to develop technologies that I think we should have. So I like it, and I support it; I just don’t know that it’s the most important scientific development happening on planet Earth right now. Understanding science popularity is something to focus on, but I’ll leave listeners there to think about it. Why is the Moon so popular? Why is space so popular? This is Bryan White with The Planetary News Radio, signing off. Have a good day, and thanks for listening.

Join the discussion on Discord at: https://discord.gg/5HQj8eC.

Support on Patreon at: https://www.patreon.com/planetarynews.