CS
Biomolecular computers and programming
Jun 7th
I recently handed in a report as part of my master's (candidate) degree in Computer Science at the University of Copenhagen: “Visualizing blobs and computation in a biomolecular computation model”. It sounds very fancy, and I would like to introduce the subject and my report here in a hopefully less dry way than in the report itself. (This post is written in an “anecdotal” style and will not contain citations for all the facts. The report above should be up to academic par on citations.)
Biomolecular computers and computation
Biomolecular computers, or “biocomputers”, are an area that has been researched for the last 20 years. At first, much hype and hope was attached to the idea that they would provide some kind of breakthrough to overcome the limitations of the normal silicon-based chip computers we know and use today: your average PC and every microchip-controlled device around. This is the same hope that surrounds quantum computers: a new approach to computations that currently take very long, for example factoring the product of two large primes within feasible time. The jury is still out on whether biomolecular computing can deliver on that hope and potential. Other research has also shown “niche” interest in “DNA Doctor” usage, where a biocomputer is implemented to interact directly with cells, for example in humans.
A biomolecular computer can be seen as a computer that “…use systems of biologically derived molecules, such as DNA and proteins, to perform computational calculations involving storing, retrieving, and processing data” (from Wikipedia). Why is this interesting?
First of all: why not? Tinkering and playing around with things is interesting, IMHO. By drawing a parallel between a biocomputer and a (human) brain, you can say that it is a way to learn about how nature works.
Second, a biocomputer has different properties from a conventional computer. One of the first ideas explored was to use DNA interactions to solve computationally very hard problems (NP-complete, traveling-salesman-like problems; Adleman 1994). This is interesting because a lab tube can hold millions and millions of molecules, which allows massively parallel computation, whereas a conventional computer might have 4 (or at least not millions of) cores for parallel computation.
The Blob programming model
When I contacted my supervisor about writing a project this winter I was introduced to “the blob programming model”. At that time it was mostly an article in progress. Just now it has been accepted for the CS2Bio workshop as “Programming in Biomolecular computation” in Amsterdam, June 10, 2010.
The authors, Neil Jones in particular, read lots of articles on biomolecular computation, Turing universality of the models, and formal algebras for describing molecular interactions (like Kappa calculus and Biochemical Ground Form), but with his background as a computer scientist he found something missing: where are the programs?
Lots of interesting computational properties were proven, but for a programmer there was no way to write a program as we know it.
Based on that “hole”, a machine language was developed and described in the article, one that could theoretically be used on a biocomputer. The model was dubbed “The blob programming model”, and the article can be found at http://blobvis.appspot.com/blob
My Project – Visualizations of Blob programs
Based on this article I defined a project consisting of a literature review of biocomputing literature as well as of visualization theory applicable to the visualization of blob programs. Normal programming visualizations exist and have been used for many years, but in this case there was a special angle to the visualizations. The blob model has a potential physical analog, as it might be possible to create a “biomolecular computer” that can execute the instructions. And since the instructions are shaped somewhat like abstract molecules, a visualization of the blob instruction set could, and should, reveal interesting properties of blob programs with regard to their physical presence.
At http://blobvis.appspot.com my report is available for download, as well as the BlobVis visualization tool I developed. From there you can play around with a few simple “Blob Programs”, for example a “list append” program, and see a video of a program executing in BlobVis. As I focused on physical properties, the tool uses a physics-based algorithm for laying out the blob programs (via prefuse), which allows you to drag programs and data around in a way that looks like they are immersed in water or similar. That gives an interesting effect and is fun to watch.
Reading up on “Dynamic Graph Layout”
Apr 15th
I’ve recently been reading up on Dynamic Graph Layout specifically with regards to “preserve the mental map” or “maintaining dynamic stability”.
It looks like I’ve gotten all the way around the currently published articles with this list:
What does CodingHorror have in common with Apache H-Base?
Feb 18th
As I was in my “Advanced data management” class today I realized that CodingHorror, aka Jeff Atwood of stackoverflow fame, agrees quite a lot with Apache H-base. Apache H-base is a column-store database system based on the Google BigTable ideas leveraging Apache Hadoop.
Jeff Atwood is not very happy with the default “incredibly pessimistic out of the box” setup of databases in his “Deadlocked!” post. Along the same lines, the H-store people published an article[1]:
The End of an Architectural Era (It’s Time for a Complete Rewrite).
They claim that the time is up for “legacy systems” like the model used by current RDBMSs such as MySQL and SQL Server. The assumption of the H-store people that resonates with Jeff Atwood, I believe, is:
Every effort should be made to eliminate the cost of traditional dynamic locking for concurrency control, which will also be a bottleneck.
I don’t know if it is something Jeff Atwood is aware of, and I don’t know if that is a sign that a traditional RDBMS is maybe not what he wants, but it is worth thinking about, IMHO.
[1] Stonebraker, M., Madden, S., Abadi, D. J., Harizopoulos, S., Hachem, N., Helland, P., 2007. The end of an architectural era: (it’s time for a complete rewrite). In: VLDB ’07: Proceedings of the 33rd international conference on Very large data bases. VLDB Endowment, pp. 1150-1160. URL: http://portal.acm.org/citation.cfm?id=1325981
Logging vs. debugging
Jan 18th
It is probably no news that measurements in an experiment almost by definition affect the experiment.
This is also true for IT systems, where logging is a widely used way of observing running code.
From Jeff Atwood’s Coding Horror blog, about a problem during the beta of stackoverflow.com:
We spent days troubleshooting these deadlocks by .. wait for it .. adding more logging! Which naturally made the problem worse and even harder to figure out.
This illustrates my point exactly. 99% of the time logging affects the system in a negligible way. But you have to keep in mind that it actually could affect the “experiment”, in this case the performance or the behavior of your program.
My job is “Production Support” in the ITIL sense, i.e. ensuring the quality of running services through Incident Management and Problem Management. That means that for anything not related to normal operations we often need the logs.
Also, in these days of powerful IDEs and remote debugging, is that much logging really necessary?
Yes it is! We are supporting an ESB developed and customized over the last 3-5 years, and attaching a debugger isn’t really an option.
So we are basically totally dependent on good logging for troubleshooting.
We have been dealing with a problem relating to big batches of large messages and the 2GB limit of our JVMs. We started by initiating a project to throw some dedicated hardware at the code handling these large messages so it wouldn’t affect the rest of the “stuff” running on the same application server. But unfortunately hardware acquisition and machine setup can be a slow process, and worst of all: it’s out of our hands!
So our current guru took a quick look at the code and started grabbing for some low hanging fruit:

[Memory graph: before adding isDebugEnabled]
[Memory graph: after adding isDebugEnabled]
It turns out that the developer had used the same policy as mentioned in the stackoverflow post:
<…
DEBUG Level
- Any parameters passed into the method
…>
The parameter in this case was 1-2MB of XML data, which was logged like this:
LOG.debug("Entering part 2.1 of method MyMethod with msg: "+msg.toXml());
In production only INFO and above is logged, so the debug message was discarded. But the concatenation of 2MB of data was still performed several times per message (10-12 as far as I recall). So the difference between the two memory graphs above is:
if (LOG.isDebugEnabled()) {
    LOG.debug("Entering part 2.1 of method MyMethod with msg: " + msg.toXml());
}
(as also mentioned here: http://wordstoday.wordpress.com/2007/11/26/log4j-why-use-isdebugenabled-in-your-code/)
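To make the effect easy to reproduce, here is a minimal sketch. It uses java.util.logging instead of log4j so that it runs without extra dependencies (FINE plays roughly the role of DEBUG here), and the Message class with its call counter is a hypothetical stand-in for our real message type. It counts how often toXml() is actually invoked for an unguarded call, a guarded call, and a lazy Supplier-based call.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(GuardedLoggingSketch.class.getName());

    static {
        // Mimic production: only INFO and above is logged.
        LOG.setLevel(Level.INFO);
    }

    /** Hypothetical stand-in for the real message class; counts serializations. */
    static final class Message {
        int serializations = 0;

        String toXml() {
            serializations++;
            return "<msg>...</msg>"; // imagine 1-2MB of XML here
        }
    }

    /** Unguarded: the argument string is built before the level is checked. */
    static int unguarded(Message msg, int calls) {
        for (int i = 0; i < calls; i++) {
            LOG.fine("Entering part 2.1 of method MyMethod with msg: " + msg.toXml());
        }
        return msg.serializations;
    }

    /** Guarded: the level check comes first, so nothing is built when FINE is off. */
    static int guarded(Message msg, int calls) {
        for (int i = 0; i < calls; i++) {
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("Entering part 2.1 of method MyMethod with msg: " + msg.toXml());
            }
        }
        return msg.serializations;
    }

    /** Lazy: the Supplier is only evaluated if the record would be logged. */
    static int lazy(Message msg, int calls) {
        for (int i = 0; i < calls; i++) {
            LOG.fine(() -> "Entering part 2.1 of method MyMethod with msg: " + msg.toXml());
        }
        return msg.serializations;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + unguarded(new Message(), 12));
        System.out.println("guarded:   " + guarded(new Message(), 12));
        System.out.println("lazy:      " + lazy(new Message(), 12));
    }
}
```

With the level at INFO, only the unguarded variant serializes the message at all, which is the wasted work we saw in the memory graphs. The Supplier variant needs Java 8 or later; on older code bases the explicit guard is the way to go.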
This should buy us some time before we can isolate this process on its own hardware/JVM.
The guru is currently looking into writing an XPath expression for PMD to run over our giant codebase. Could be fun to see what it’ll dig up.