Understanding Topological Data Analysis
What do we think of when we hear the word data? We might think of spreadsheets, we might think of cell phone or GPS records, we might think of Internet traffic, or we might even think of DNA sequences.
All these sources provide very large and complicated datasets to analyze. How do we make sense of large arrays of numbers in columns and rows?
Existing methods for studying datasets typically proceed by asking very specific questions of your data. But let's think about the whole problem a little bit differently. Suppose that our data is divided into groups. Let's represent each group by a node, and let's also represent the relationships between the groups by connections between the nodes.
Now we see emerging here, instead of an unstructured mass of numbers, a kind of shape or network which really represents the shape of our data. We can now use our visual system to look at the data and identify features in the network which correspond to patterns in the data. These patterns are what we mean when we say extracting knowledge from data. The methodology I've been describing for you is called topology.
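To make the idea of groups and connections concrete, here is a minimal sketch. It assumes that the groups may overlap and that two groups are "related" when they share at least one data point. That rule, the function name build_network and the toy groups are illustrative assumptions, not something stated in the talk.

```python
from itertools import combinations

def build_network(groups):
    """Turn named groups of data-point ids into a network.

    groups: dict mapping a group name to a set of data-point ids.
    Returns (nodes, edges); an edge joins two groups that share a point
    (an assumed notion of "relationship", chosen for illustration).
    """
    nodes = sorted(groups)
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if groups[a] & groups[b]  # non-empty intersection
    ]
    return nodes, edges

# Hypothetical toy data: four groups over points numbered 0..8.
groups = {
    "g0": {0, 1, 2},
    "g1": {2, 3, 4},
    "g2": {4, 5},
    "g3": {7, 8},
}
nodes, edges = build_network(groups)
print(nodes)
print(edges)
```

In this toy example the first three groups form a chain and the last group stands alone, which is the kind of feature one would hope to spot by looking at the network.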
Topology is the subfield of mathematics that concerns itself with the study of shape, and it has its origins in the 18th century with the Swiss mathematician Leonhard Euler. Euler became aware of a challenge problem concerning the seven bridges crossing the river Pregel.
The question was: can you stand at the end of one of the bridges, walk across each of the bridges in succession, and return to where you started, having crossed each bridge exactly once?
Now what Euler did was really extraordinary. He took all the information about the bridges, the river, the islands and land masses and converted it all into a simple network. In doing so he found that in fact it's not possible to walk across all the bridges exactly once.
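Euler's network can be written down in a few lines. The sketch below uses the standard criterion that a connected network has a round trip using every connection exactly once only if every node has an even number of connections. The land mass labels and function names are illustrative assumptions, and the code takes connectivity for granted rather than checking it.

```python
from collections import Counter

# The seven bridges as connections between four land masses.
# Labels are illustrative; two pairs of land masses are joined by two bridges each.
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def round_trip_possible(edges):
    """Even-degree test for a closed walk using each edge exactly once.

    Assumes the network is connected; that is not checked here.
    """
    return all(d % 2 == 0 for d in degrees(edges).values())

print(dict(degrees(BRIDGES)))
print(round_trip_possible(BRIDGES))
```

Every land mass touches an odd number of bridges, so the test fails, which matches Euler's conclusion.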
Topology has been studied as part of math for its own sake for the last 250 years. But what is very exciting is that in the last fifteen years we have found that it has applications to many different real-world problems. One of those is the analysis and understanding of high-dimensional and complex datasets. This new area of study is called topological data analysis, and it's changing the way that people are able to understand and analyze their data.
There are three big concepts about topology that give it its power for analyzing and understanding shapes.
The first one of these properties is called coordinate invariance. It refers to the fact that topology measures properties of shapes that don't change even as you rotate the shape or maybe change the coordinate system in which you're viewing the shape.
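As a small illustration of coordinate invariance, the sketch below rotates a handful of points and checks that the distances between them are unchanged. The points, the angle and the function names are illustrative assumptions. Pairwise distance is used here as one simple example of a quantity that does not depend on how the shape is oriented.

```python
import math

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Distances between every pair of points, in a fixed order."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

# Hypothetical toy shape: three points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
rotated = rotate(points, math.radians(37))

unchanged = all(
    math.isclose(a, b)
    for a, b in zip(pairwise_distances(points), pairwise_distances(rotated))
)
print(unchanged)
```

The coordinates of every point change under the rotation, but the distances that describe the shape do not.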
The second big property is called deformation invariance. We say a property of a shape is deformation invariant if that property doesn't change even though I might stretch or squash the shape without tearing it.
Humans are really good at recognizing deformation invariant properties. That ability is what allows us to recognize that the letter A is a letter A no matter what font that letter is written in. Thinking this way can also lead you to some surprising conclusions, for example that a doughnut is very much like a coffee cup.
The final property is that of compressed representations. Suppose we have in front of us a sphere. That's an infinite amount of information, which is very hard for us to process and understand. On the other hand, suppose I approximate the shape by an icosahedron. It is still very much like a sphere, but now it's represented by a simple list consisting of 12 nodes, 30 edges and 20 faces.
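One way to see that the short list still captures something of the sphere is the Euler characteristic: nodes minus edges plus faces. The sketch below is a minimal illustration. The function name is an assumption, and the cube counts are added only for comparison. The counts 12, 30 and 20 are those of a regular icosahedron.

```python
def euler_characteristic(nodes, edges, faces):
    """Nodes minus edges plus faces for a polyhedral surface."""
    return nodes - edges + faces

# Counts quoted in the text (a regular icosahedron).
icosahedron = euler_characteristic(12, 30, 20)

# A cube, a much coarser approximation of the sphere, for comparison.
cube = euler_characteristic(8, 12, 6)

print(icosahedron, cube)
```

Both approximations give the same number, 2, which is the value associated with the sphere itself, so the compressed list keeps this piece of shape information.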
The focus on connectivity and continuity information allows topology to recognize patterns in data which make that data relevant.
The three fundamental properties combine in very striking ways to allow us to analyze and understand very large and complicated datasets. But this is only the beginning. Topological data analysis represents fundamental advances in machine learning. In the near future, machines will help humans organize, simplify and understand their very large and complicated datasets. This partnership between man and machine will have impact for all areas of human endeavor.