
Tuesday, December 22, 2009

Pitman Shorthand


[Image: a sample of Pitman shorthand (source)]
Pitman shorthand is a system of shorthand for the English language developed by Englishman Sir Isaac Pitman (1813–1897), who first presented it in 1837. Like most systems of shorthand, it is a phonetic system; the symbols do not represent letters, but rather sounds, and words are, for the most part, written as they are spoken.

Tuesday, December 15, 2009

Algorithm Design

In scientific computing, the optimality of algorithms is not always something that receives full consideration from users-- if you want to run a sort or solve some combinatorial problem, you are usually more concerned with making the algorithm work than with looking into how fast it runs. But in dealing with very large data sets, reducing the limiting behavior of your algorithm from O(N^2) to O(N log N) can cut your runtime from something on the order of years to something on the order of seconds. So while there are many computational problems out there for which solutions are known to exist, actually running those solutions is so expensive that they are effectively useless-- but if a new algorithm could be found which would reduce their runtime, we might suddenly be able to use them.
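As a minimal illustration of that gap (just a sketch; the list sizes and the choice of insertion sort are arbitrary), here's a quick Python timing comparison of a quadratic sort against Python's built-in O(N log N) sort:

```python
# Time a quadratic insertion sort against Python's built-in Timsort
# (O(N log N)) on progressively larger random lists. The quadratic time
# grows roughly 16x each time N quadruples; the built-in grows far slower.
import random
import time

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

for n in (1_000, 4_000, 16_000):
    data = [random.random() for _ in range(n)]

    t0 = time.perf_counter()
    insertion_sort(data)
    quadratic = time.perf_counter() - t0

    t0 = time.perf_counter()
    sorted(data)
    nlogn = time.perf_counter() - t0

    print(f"N={n}: insertion sort {quadratic:.3f}s, built-in sort {nlogn:.4f}s")
```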

This is the driving motivation behind much of the work that goes into quantum computing. Because a quantum bit can sit in a superposition of states rather than being a definite 0 or 1, the computer operates in a fundamentally different way, and we can design algorithms which take such differences into account. One vivid example is Grover's algorithm for searching an unsorted array. Here's a good description from Google labs:
Assume I hide a ball in a cabinet with a million drawers. How many drawers do you have to open to find the ball? Sometimes you may get lucky and find the ball in the first few drawers but at other times you have to inspect almost all of them. So on average it will take you 500,000 peeks to find the ball. Now a quantum computer can perform such a search looking only into 1000 drawers.

So if you were opening one drawer a second, the traditional algorithm would take you an average of six days to run, while the quantum algorithm would take you a little under 17 minutes.
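Here's a throwaway Python check of those numbers, with the one-peek-per-second rate as the only assumption:

```python
# Classical search of N unsorted drawers averages N/2 peeks; Grover's
# algorithm needs on the order of sqrt(N) quantum queries.
import math

N = 1_000_000
classical_peeks = N / 2          # 500,000
quantum_peeks = math.isqrt(N)    # 1,000

print(f"classical: {classical_peeks / 86_400:.1f} days at one peek per second")   # ~5.8 days
print(f"quantum:   {quantum_peeks / 60:.1f} minutes at one query per second")     # ~16.7 minutes
```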

(Now, say each of those million drawers represented a different combination of letters and numbers, and you were trying to find the drawer/combination which corresponded to the password to someone's email account. An encryption scheme whose security rests on an attacker having to try every combination on a traditional computer becomes far weaker once a quantum search can get by with roughly the square root of that many tries.)

While quantum computing still has a ways to go, parallel programming already provides another alternative to traditional computer architecture. In parallel programming, you split your code up and send it to a number of computers running simultaneously (for our million-drawer problem: if you had 9 other people to help you, each of you could search a different set of 100,000 drawers, and on average only 50,000 peeks would be needed before someone found the ball). So the trick in parallel programming is to figure out the right way to eliminate the bottlenecks in your code and split up your task across processors as efficiently as possible.
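Here's a toy sketch of that ten-searchers idea using Python's multiprocessing module (the drawer encoding, worker count, and chunking are all invented for illustration; a real version would also want to cancel the other workers as soon as one of them finds the ball):

```python
# Split a million "drawers" across 10 worker processes, each scanning its own
# chunk; pool.map waits for every chunk to finish before returning.
from multiprocessing import Pool
import random

N_DRAWERS = 1_000_000
N_WORKERS = 10

def search_chunk(args):
    start, stop, ball = args
    for drawer in range(start, stop):   # linear scan of this worker's share
        if drawer == ball:
            return drawer
    return None

if __name__ == "__main__":
    ball = random.randrange(N_DRAWERS)
    chunk = N_DRAWERS // N_WORKERS
    tasks = [(i * chunk, (i + 1) * chunk, ball) for i in range(N_WORKERS)]
    with Pool(N_WORKERS) as pool:
        results = pool.map(search_chunk, tasks)
    print("ball was in drawer", next(r for r in results if r is not None))
```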

Now, what about a task like image recognition? If you had a couple thousand processors at your disposal and a single image to feed them, what is the most efficient way for your processors to break up that image so that between them they can reconstruct an understanding of what it depicts? You might decide to give each computer a different small piece of the image, and tell it to describe what it sees there-- maybe by indicating the presence or absence of certain shapes within that piece. Then have another round of computers look at the output of this first batch and draw more abstract conclusions-- say computers 3, 19, and 24 all detected their target shape, so that means there's a curve shaped like such-and-such. And continue upwards with more and more tiers representing higher and higher levels of abstraction in analysis, until you reach some level which effectively "knows" what is in the picture. This is how our current understanding of the visual cortex goes-- you have cells with different receptive fields, tuned to different stimulus orientations and movements, which all process the incoming scene in parallel, and in communication with higher-level regions of the brain.
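Here's a very loose sketch of that tiered scheme (nothing like a real vision system; the tile size, the gradient-based "receptive fields", and the threshold are all made up): a first tier of functions each reports on edge content in one patch, and a second tier summarizes those reports.

```python
# Tier 1: score each image tile for horizontal/vertical edge content.
# Tier 2: combine the tile reports into a coarser description of the image.
import numpy as np

def tier1_tile_report(tile):
    # Crude "receptive field": respond to gradients along each axis.
    gy, gx = np.gradient(tile.astype(float))
    return {"horizontal": np.abs(gy).mean(), "vertical": np.abs(gx).mean()}

def tier2_combine(reports, threshold=50.0):
    # Higher-level unit: which orientations dominate, and in how many tiles?
    strong_h = sum(r["horizontal"] > threshold for r in reports)
    strong_v = sum(r["vertical"] > threshold for r in reports)
    return {"tiles_with_horizontal_edges": strong_h,
            "tiles_with_vertical_edges": strong_v}

image = np.random.randint(0, 256, (128, 128))   # stand-in for a real image
tile_size = 32
reports = [tier1_tile_report(image[i:i+tile_size, j:j+tile_size])
           for i in range(0, 128, tile_size)
           for j in range(0, 128, tile_size)]
print(tier2_combine(reports))
```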

It would be interesting, then, to see what sensory-processing neuroscience and parallel programming could lend one another. Could the architecture of the visual cortex be used to guide design of a parallel architecture for image recognition? Assuming regions like the visual cortex have been evolutionarily optimized, an examination of the parallel architecture of the visual processing system could tell us a lot about how to best organize information flow in parallel computers, and how to format the information which passes between them. Or in the other direction, could design of image-recognition algorithms for massively parallel computers guide experimental analysis of the visual cortex? If we tried to solve for the optimal massively-parallel system for image processing, what computational tasks would the subunits perform, and what would their hierarchy look like-- and could we then look for these computational tasks and overarching structure in the less-understood higher regions of the visual processing stream? It's a bit of a mess because the problem of image processing isn't solved from either end, but that just means each field could benefit from and help guide the efforts of the other.

So! Brains are awesome, and Google should hire neuroscientists. Further reading:

NIPS: Neural Information Processing Systems Foundation
Cosyne: Computational and Systems Neuroscience conference
Introduction to High-Performance Scientific Computing (textbook download)
Message-Passing Interface Standards for Parallel Machines
Google on Machine Learning with Quantum Algorithms
Quantum Adiabatic Algorithms employed by Google
Optimal Coding of Sound

Friday, November 20, 2009

High Speed Sequencing


This video is dedicated to my undergraduate degree in biology, in which it was never deemed necessary to introduce the fact that sequencing technology more sophisticated than the Sanger method exists. This is an animation explaining the process behind Helicos's new single-molecule sequencing technology. Like all other modern sequencing methods, this technique is based on short reads-- DNA is replicated and then broken into millions of tiny fragments (25-50 base pairs at the low end), all of which are sequenced simultaneously. Given about 30-fold coverage of your genome, you can align these fragments to confidently reconstruct it as a single sequence.
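Rough arithmetic behind that coverage figure, with made-up but plausible numbers (a ~3-gigabase genome and reads at the 30 bp low end):

```python
# How many short reads does 30x coverage take?
# coverage = reads * read_length / genome_size
genome_size = 3_000_000_000   # base pairs, roughly a human genome
read_length = 30              # low end of the short-read range above
coverage = 30

reads_needed = coverage * genome_size / read_length
print(f"{reads_needed:.1e} reads")   # ~3e9 fragments, hence sequencing them all at once
```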

Also of note, Velvet is one cool sequence assembly program which, instead of aligning DNA fragments by simply looking for overlapping regions between them, plots all the generated fragment sequences onto a De Bruijn graph, and then uses principles of graph theory to condense them into a single sequence. Yay math!
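To make that concrete, here's a minimal sketch (nothing like Velvet's actual implementation) that chops a few toy reads into k-mers and wires up the corresponding De Bruijn graph; an assembled sequence then corresponds to a walk through this graph:

```python
# Build a De Bruijn graph: each (k-1)-mer is a node, and every k-mer in a
# read contributes an edge from its prefix to its suffix.
from collections import defaultdict

def de_bruijn_graph(reads, k=4):
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])   # edge: prefix -> suffix
    return graph

reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]       # toy overlapping fragments
for node, neighbors in de_bruijn_graph(reads).items():
    print(node, "->", neighbors)
```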

Thursday, August 20, 2009

Google News timelines

So I've just discovered that Google News has an Archive Search feature, which lets you search historical news articles for a given phrase and returns a histogram of hits. It makes for an interesting way to map the rise and fall of concepts, events, and phrases in the public mind. Here are some queries I've come across that show interesting patterns:

First, some normalization: a search for the, and, a, etc. gives us an estimate of the number of articles on record-- gradual uphill increases like those seen here should be attributed to the nature of the data set and not to the data itself. (Science!)
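The same normalization in miniature, with invented counts: divide each year's hits for a term by that year's hits for a ubiquitous word like the, so that growth of the archive itself doesn't masquerade as a trend.

```python
# Hypothetical yearly hit counts for a search term and for a baseline word.
term_hits     = {1940: 12, 1950: 45, 1960: 210, 1970: 340}
baseline_hits = {1940: 9_000, 1950: 20_000, 1960: 60_000, 1970: 80_000}

for year in term_hits:
    print(year, f"{term_hits[year] / baseline_hits[year]:.5f}")   # normalized frequency
```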

There are lots of modern words and phrases we can watch grow into popularity, like outer space and DNA. More subtly, we see the emergence of the adjective global starting in the 1940's, and a sudden rise in popularity of the word deadly in the 1980's (wut?). Robot grows gradually in use over the 20th century, though there is a funny spike in the summer of 1944, which coincides with German use of "robotic" planes to bomb Britain during WW2. And atom shows a boom midway through 1945, of course, though it's curious to note that its appearance in the news is diminished prior to that, during the war-- this could be a result of wartime news censorship, but if you search science itself, you see that science reporting in general tends to drop during wartime, which could also be a factor.

Then some words are tied to a certain time period-- like fallout shelter and elixir. Others fall from popularity: for some odd reason, the word obituary became wildly unpopular in 1986, while the civil rights movement (I assume) soundly quashed use of the word negro after the late 60's. And lipstick, after rising in popularity starting in the roaring 20's (a phrase which didn't actually take off 'til the 60's-- does that mean 20's culture was to the 60's what baby boomer culture is to the 90's/today?), suffered a temporary blow in the 1970's, either from the growth of the feminist movement or simply from the fashion of the time.

What other trends are out there?

Wednesday, May 6, 2009

Visualizing Music and Looking for Patterns



Found this and many other impressive videos on one Stephen Malinowski's YouTube channel. I really like the way the colored bar visualization separates out the different voices in a piece, especially the fugues.

The opening to Gödel, Escher, Bach has a fun discussion on the structure of the fugue-- the gist of it is that the composer develops the piece out of one short theme (a few measures of some simple melody), carried by a fixed number of voices. Starting with one voice expressing the theme, each additional voice chimes in repeating the theme until all are present. The theme is further explored and varied throughout the piece via transformations of the original melody: inverting, reversing, transposing, compressing. Soooo the video above is really cool, because the visualizations make it that much easier to pick out all the transformations that are taking place. Yay!

I wonder if you could make other visualization methods which help you pick out recurring themes in a piece, and which are robust to transformations of the original theme (or measure distance from the original). It seems like a problem that crops up a lot, in areas from network analysis to predicting structural motifs in proteins. For instance, all integral membrane proteins have a hydrophobic region which crosses the lipid bilayer-- this requires an extended stretch of hydrophobic amino acids, which will be reflected in the genetic code. There would be variation in sequence (not all membrane proteins would have the same arrangement of hydrophobic amino acids), but there might still be trends that could be picked up. Fourier/Laplace transforms can break a signal down into its periodic components; is there some way to transform a signal to visualize it in the space of its recurrent themes and their variations?
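As one half-baked sketch of that idea: slide a window over a toy pitch sequence and compare every window to every other after subtracting each window's mean, so that a transposed repeat of a theme still counts as a match. Recurring themes then show up as (near-)zero distances in a self-similarity matrix. The melody, window length, and distance measure here are all invented.

```python
# Self-similarity of a made-up melody (MIDI pitch numbers), transposition-invariant.
import numpy as np

melody = np.array([60, 62, 64, 65, 67,      # theme
                   72, 71, 69, 67, 65,      # something else
                   67, 69, 71, 72, 74,      # theme transposed up a fifth
                   60, 62, 64, 65, 67])     # theme repeated exactly
w = 5                                       # window length, in notes

def contour_distance(a, b):
    # Subtract each window's mean so transposed versions of a contour match.
    return np.abs((a - a.mean()) - (b - b.mean())).mean()

n = len(melody) - w + 1
dist = np.array([[contour_distance(melody[i:i+w], melody[j:j+w])
                  for j in range(n)] for i in range(n)])
print(np.round(dist, 1))   # zeros off the diagonal mark recurring (or transposed) themes
```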