Tuesday, May 3, 2011

Solution for @googlenexus puzzle number 5 in Python, in less than a tweet.

b='tcag'
s=''
for c in i.lower():
	if c in b:s+=c
	if len(s)>2:print dict(zip([x+y for x in b for y in b],'***WL*HRIT*SVAE'))[s[:2]];s=''


That's it! Now the explanation:



The variable i is assumed to hold the INPUT of the puzzle: a string with all the cities, with one modification. The letter 'ã' in São Paulo has been replaced by 'a'.
- But why? Isn't that cheating? The character 'ã' is not the character 'a'!
- Well, I justify the change on the grounds that the letter 'ã' does not represent Adenine, so if this solution gets taken down for that, the problem has to go down too... :o)
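For anyone who would rather not edit the input by hand, here is a minimal sketch of doing the same replacement in code. It is not part of the 137-character solution, it is written for Python 3, and the helper name strip_accents is made up for this illustration.

```python
import unicodedata

def strip_accents(text):
    # NFD splits 'ã' into a plain 'a' followed by a combining tilde,
    # and the combining marks are then dropped.
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents('São Paulo'))  # Sao Paulo
```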


First: the dict(zip(...)) expression is a Python dictionary mapping the first two letters of a codon (three bases) to its one-letter code (see the second table here: http://www.hgvs.org/mutnomen/codon.html ).


>>> b='tcag' ; dict(zip([x+y for x in b for y in b],'***WL*HRIT*SVAE'))
{'aa': '*', 'ac': 'T', 'gt': 'V', 'ag': 'S', 'cc': '*', 'tt': '*', 'cg': 'R', 'gc': 'A', 'at': 'I', 'ga': 'E', 'tg': 'W', 'ca': 'H', 'ta': '*', 'tc': '*', 'ct': 'L'}

From that it can be seen that every relevant pair of characters has an associated one-letter code. Irrelevant pairs get asterisks. For example, the entry 'gt': 'V' means that when I find a codon of the form gtX, X being any of {A, T, C, G}, I know that the one-letter code will be 'V'. A while ago I mapped all three letters of a codon, but I realized that I didn't need that much precision: with only the first two letters I could get a pretty decent mapping, not one-to-one but with only one ambiguity that actually shows up (that's nothing for an average human like me). The third letter was almost superfluous. Almost.
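To see where the ambiguity comes from, here is a rough sketch of the full three-letter mapping, grouped by the first two letters. It is written for Python 3 and is not part of the golfed solution. The names AMINO, full_table and by_prefix are mine, and the 64-letter string is the standard genetic code listed in t, c, a, g order.

```python
from collections import defaultdict

BASES = 'tcag'
# Standard genetic code, one letter per codon in t, c, a, g order.
# Here '*' marks a stop codon, unlike the "irrelevant" asterisks above.
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

codons = [x + y + z for x in BASES for y in BASES for z in BASES]
full_table = dict(zip(codons, AMINO))

# Collect every letter that a two-base prefix can stand for.
by_prefix = defaultdict(set)
for codon, letter in full_table.items():
    by_prefix[codon[:2]].add(letter)

print(sorted(by_prefix['gt']))  # ['V']
print(sorted(by_prefix['ag']))  # ['R', 'S']
```

A prefix like gt has a single letter behind it, so dropping the third base costs nothing. A prefix like ag has two, and the short table has to pick one of them.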

Now line by line:



Line 1: These are the bases.
Line 2: s collects the bases of the current codon.
Line 3: I iterate over the lowercased input string...
Line 4: ...looking for interesting characters ('t', 'c', 'a', 'g') and storing them in s.
Line 5: Once I've got three interesting characters, I take the first two (ignoring the third one), look them up in the dictionary, print the associated one-letter code and reset s.
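The same steps, spelled out as an ungolfed sketch for Python 3. The function name decode, the sample sentence and the '?' fallback are my own additions for illustration. The original simply prints, and would raise a KeyError on a gg pair, because zip() stops at the shorter input and leaves gg out of the dictionary.

```python
BASES = 'tcag'
# Same 15-entry table as in the golfed version; 'gg' gets no entry.
TABLE = dict(zip([x + y for x in BASES for y in BASES], '***WL*HRIT*SVAE'))

def decode(text):
    out = []
    codon = ''
    for ch in text.lower():
        if ch in BASES:          # keep only t, c, a, g
            codon += ch
        if len(codon) > 2:       # a full codon: look up its first two bases
            out.append(TABLE.get(codon[:2], '?'))
            codon = ''
    return ''.join(out)

# Made-up input, not the puzzle's: "act" gives T, "cag" gives H,
# and the trailing "t" never completes a codon.
print(decode('Act now, cage it!'))  # TH
```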


Running it, this happens:


$ python nexus.py 
T
R
A
V
E
L
T
H
E
E
A
R
T
H
W
I
T
H
S
T
S       <-- Should be R. This is the ugly but expected consequence of the untreated ambiguity.
E
E
T
V
I
E
W



The challenge was to solve the puzzle in less than a tweet. In the end I got it down to 137 characters (tabs, \n and spaces included!).
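The count can be checked with a quick sketch (Python 3, just for the arithmetic). It assumes each indented line starts with a single tab and that there is no newline after the last line. The 140 is the tweet limit at the time.

```python
# The five lines of the solution, with \n and \t written out explicitly.
source = (
    "b='tcag'\n"
    "s=''\n"
    "for c in i.lower():\n"
    "\tif c in b:s+=c\n"
    "\tif len(s)>2:print dict(zip([x+y for x in b for y in b],"
    "'***WL*HRIT*SVAE'))[s[:2]];s=''"
)

TWEET_LIMIT = 140  # characters allowed in a tweet in 2011
print(len(source))                # 137
print(len(source) < TWEET_LIMIT)  # True
```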


EDIT: Thanks to this guy: http://www.petercollingridge.co.uk/python-bioinformatics-tools/codon-table . I took the idea for a short mapping from him!