CS

What does CodingHorror have in common with Apache H-Base?

As I was in my “Advanced data management” class today I realized that CodingHorror aka Jeff Atwood of stackoverflow fame agree quite alot with Apache H-base. Apache H-base is a column-store database system based on the Google BigTable ideas leveraging Apache Hadoop.

Jeff Atwood was is not very happy with the default “incredibly pessimistic out of the box” setup of databases in his “Deadlocked!” post.  Along the same lines, the H-store people published an article[1]:

The End of an Architectural Era (It’s Time for a Complete Rewrite).

They claim that the time is up for “legacy systems” like the model used by current RDMBS’ like MySQL and SQL Server. The assumption of the H-base people that reasonates with Jeff Atwood – I believe – is:

Every effort should be made to eliminate the cost of traditional dynamic locking for concurrency control, which will also be a bottleneck.

I don’t know if it is something Jeff Atwood is aware of, and I don’t know if that is a sign that a traditional RDBMS is maybe not what he wants, but it is worth think about – IMHO.


[1]Stonebraker, M., Madden, S., Abadi, D. J., Harizopoulos, S., Hachem, N., Helland, P., 2007. The end of an architectural era: (it’s time for a complete rewrite). In: VLDB ‘07: Proceedings of the 33rd international conference on Very large data bases. VLDB Endowment, pp. 1150-1160. URL:http://portal.acm.org/citation.cfm?id=1325981 pdfBibtex

Logging vs. debugging

It is probably no news that measurements in an experiment almost by definition affects the experiment. 

This is true also for IT systems, where logging is a very used way for observing running code.

From Jeff Atwoods Coding Horror blog about a problem during the beta of stackoverflow.com:

We spent days troubleshooting these deadlocks by .. wait for it .. adding more logging! Which naturally made the problem worse and even harder to figure out. 

[link: The problem with logging]

This illustrates my point exactly. 99% of the time logging affect the system in a neglible way. But you have to keep in mind that it actually could affect the “experiment”. In this case the performance or the behavior of your program.

 

My job is “Production Support” in the ITIL way. IE. ensuring quality of running services with  Incident Management and Problem Management. That means that for anything not related to normal operations we often need the logs. 

Also in these days of powerful IDEs and remote debugging is that much logging really nescisary?

[link: Stackoverflow: Code to logging ratio]

Yes it is! We are supporting an ESB developed and customized during the last 3-5years and attaching a debugger isn’t really an option. 

So we are basically totally dependent on good logging for troubleshooting.

We have been dealing with a problem relating to big batches of large messages and the 2GB limit of our JVM’s. We started by initiating a project to throw some dedicated hardware after the code handing these large messages so it wouldnt affect the rest of the “stuff” running on the same application server. But unfortunately hardware acquirement and machine setup can be a slow process, and worst of all: It’s out of our hands! :)

So our current guru took a quick look at the code and started grabbing for some low hanging fruit:

Before ifDebugEnabled

Before ifDebugEnabled

After using if debugEnabled

After using if debugEnabled

 It turns out that developer had used the same politic as mentioned in the stackoverflow post:

<…
DEBUG Level

  • Any parameters passed into the method

…>
 

The parameter in this case was 1-2MB xml-data which was logged like this:

LOG.debug("Entering part 2.1 of method MyMethod with msg: "+msg.toXml());

In production only INFO and above is logged so the debug message was discarded. But the concat of 2MB data was still performed several times per message (10-12 as far as I recall). So the difference between the above to memory graphs is:

if (LOG.isDebugEnabled()){
    LOG.debug("Entering part 2.1 of method MyMethod with msg: "+msg.toXml());
}

(as also mentioned here: http://wordstoday.wordpress.com/2007/11/26/log4j-why-use-isdebugenabled-in-your-code/)

This should buy us some time before we can isolate this process on it’s own hardware/jvm.

The guru is currently looking into writing a xpath expression for PMD run over our giant codebase. Could be fun to see what it’ll dig up