Newton's Law of Citation

More references mean more citations, according to an analysis of papers published in Science.

Growth of Sequence and 3D Structure Databases

Periodic Table of Irrational Nonsense

Numbers in social networking around the world

General data set curation workflow.

Map-shuffle-scan framework used by Crossbow
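
The caption names the three stages of Crossbow's Hadoop pipeline. The sketch below runs those stages in memory to illustrate the dataflow only; it is not Crossbow's code. Map turns each input record into (key, value) pairs. Shuffle groups the pairs by key and sorts each group. Scan then makes one ordered pass over each sorted partition. The functions `toy_align` and `toy_scan`, the 1 kb binning, and the "chrom:pos:base" read format are all hypothetical stand-ins. In Crossbow the real work is done by the Bowtie aligner (map) and the SOAPsnp SNP caller (scan).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, int]    # hypothetical partition key: (chromosome, 1 kb bin)
Value = Tuple[int, str]  # hypothetical value: (position, observed base)


def map_shuffle_scan(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[Key, Value]]],
    scanner: Callable[[Key, List[Value]], List[str]],
) -> Dict[Key, List[str]]:
    # Map: each record independently yields (key, value) pairs, which are
    # collected per key (the grouping half of the shuffle).
    groups: Dict[Key, List[Value]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    # Shuffle (sorting half) and scan: visit partitions in key order, sort each
    # partition by position, and hand it to the scanner for one ordered pass.
    results: Dict[Key, List[str]] = {}
    for key in sorted(groups):
        results[key] = scanner(key, sorted(groups[key]))
    return results


def toy_align(read: str) -> Iterable[Tuple[Key, Value]]:
    # Hypothetical stand-in for an aligner: each "read" is already "chrom:pos:base".
    chrom, pos, base = read.split(":")
    position = int(pos)
    yield (chrom, position // 1000), (position, base)


def toy_scan(key: Key, partition: List[Value]) -> List[str]:
    # Hypothetical stand-in for a variant caller: report read depth per position.
    depth: Dict[int, int] = defaultdict(int)
    for position, _base in partition:
        depth[position] += 1
    return [f"{key[0]}:{p} depth={n}" for p, n in sorted(depth.items())]


reads = ["chr1:1500:A", "chr1:1200:C", "chr2:10:G", "chr1:1500:T"]
print(map_shuffle_scan(reads, toy_align, toy_scan))
# {('chr1', 1): ['chr1:1200 depth=1', 'chr1:1500 depth=2'], ('chr2', 0): ['chr2:10 depth=1']}
```

Sorting inside each partition lets the scan stage walk positions in order without random access. This ordered pass is what the shuffle buys the scan stage.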

Unexpected entrepreneurs

Cassandra vs HBase

| Cassandra | HBase |
|---|---|
| Lacks the concept of a table. All the documentation will tell you that it's not common to have multiple keyspaces, which means you have to share a keyspace across the cluster. Furthermore, adding a keyspace requires a cluster restart! | The concept of a table exists, and each table has its own key space. This was a big win for us: you can add and remove tables as easily as in an RDBMS. |
| Uses string keys. It is very common to use UUIDs as keys; you can use TimeUUID if you want your data to be sorted by time. | Uses binary keys. It's common to combine three different items to form a key, which means you can search by more than one key in a given table. |
| Even if you use TimeUUID keys, hot spotting (all client requests going to one server in the cluster) won't occur, because Cassandra spreads rows across the cluster by a hash of the key (with its default random partitioner) and load balances client requests. | If the first component of your key is a time or a sequential number, hot spotting occurs: all new keys are inserted into one region until it fills up. |
| Offers sorting of columns, with a configurable comparator. | Does not offer configurable sorting of columns; column names are simply kept in byte order. |
| The concept of a supercolumn allows you to design very flexible, very complex schemas. | Does not have supercolumns, but you can design a supercolumn-like structure, because column names and values are binary. |
| Does not have any convenience method to increment a column value. In fact, the very nature of eventual consistency makes it difficult to write a record and read it back immediately after the update: you have to make sure that R + W > N to achieve strong consistency, where N is the number of replicas, W the number that must acknowledge a write, and R the number consulted by a read. | Consistent by design. Offers a nice convenience method to increment counters, which makes it very suitable for data aggregation. |
| MapReduce support is new. You will need a Hadoop cluster to run it, and data will be transferred from the Cassandra cluster to the Hadoop cluster, so it is not suitable for running MapReduce jobs over large data. | MapReduce support is native: HBase is built on Hadoop, so data does not get transferred. |
| Comparatively simpler to maintain if you don't need Hadoop. | Comparatively complicated, as it has many moving pieces, such as ZooKeeper, Hadoop, and HBase itself. |
| Does not have a native Java API as of now, and there is no Javadoc. Even though it is written in Java, you have to use Thrift to communicate with the cluster. | Has a nice native Java API and feels much more like a Java system than Cassandra. Since we are a Java shop, this was important for us. HBase also has a Thrift interface for other languages. |
| No master server, hence no single point of failure. | Although there is a master server, HBase does not depend on it heavily: an HBase cluster can keep serving data even if the master goes down. The Hadoop namenode, however, is a single point of failure. |
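
The comparison notes that HBase keys are binary and often combine three items. Because rows are kept sorted by key, any leading part of such a key can be searched with a range scan. The following is a minimal sketch, assuming a hypothetical (user id, event type, timestamp) key with fixed-width big-endian fields. A sorted Python list stands in for the table's sorted row keys. Against a real cluster the lookup would be a range scan between start and stop rows, not `bisect`.

```python
import bisect
import struct
from typing import List


def make_row_key(user_id: int, event_type: str, timestamp: int) -> bytes:
    # Hypothetical three-part key. Fixed-width, big-endian fields make byte
    # order match (user_id, event_type, timestamp) order:
    # 8-byte user id + event type padded/truncated to 8 bytes + 8-byte timestamp.
    type_bytes = event_type.encode("ascii")[:8].ljust(8, b"\x00")
    return struct.pack(">Q", user_id) + type_bytes + struct.pack(">Q", timestamp)


def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    # Keys sharing a prefix are contiguous in a sorted keyspace, so a scan is a
    # binary search for the first candidate followed by a forward walk.
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# A sorted list stands in for the table's sorted row keys.
keys = sorted([
    make_row_key(2, "view", 100),
    make_row_key(1, "view", 300),
    make_row_key(1, "click", 200),
    make_row_key(1, "click", 100),
])
user_1 = struct.pack(">Q", 1)
print(len(prefix_scan(keys, user_1)))                               # 3: all of user 1's rows
print(len(prefix_scan(keys, user_1 + b"click".ljust(8, b"\x00"))))  # 2: user 1's clicks
```

Fixed-width fields are the design choice that makes this work. With variable-length strings, a prefix such as "click" could also match "clicks".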
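
The hot spotting row says that time-first HBase keys all land in one region. The sketch below reproduces this with hypothetical region split points and a range-partitioning lookup. It then applies salting, a common workaround that is not mentioned in the original comparison. Salting prefixes each key with a short hash-derived bucket. The split points, key format, and one-hex-digit salt are all assumptions chosen for illustration.

```python
import bisect
import hashlib
from collections import Counter
from typing import List

# Hypothetical region split points: region i holds row keys from
# REGION_STARTS[i] (inclusive) up to REGION_STARTS[i + 1] (exclusive).
REGION_STARTS: List[bytes] = [b"", b"4", b"8", b"c"]


def region_for(row_key: bytes) -> int:
    # Range partitioning: pick the last region whose start key is <= row_key.
    return bisect.bisect_right(REGION_STARTS, row_key) - 1


def sequential_key(ts: int) -> bytes:
    # Time-first key, zero-padded so byte order equals numeric order.
    return b"%016d" % ts


def salted_key(ts: int) -> bytes:
    # Salting (hypothetical scheme): prefix one hex digit of an MD5 hash of the
    # key, spreading consecutive timestamps over 16 buckets and hence regions.
    base = sequential_key(ts)
    return hashlib.md5(base).hexdigest()[0].encode("ascii") + base


timestamps = range(1_300_000_000, 1_300_001_000)
sequential_counts = Counter(region_for(sequential_key(t)) for t in timestamps)
salted_counts = Counter(region_for(salted_key(t)) for t in timestamps)
print(sequential_counts)  # Counter({0: 1000}): every write hits region 0
print(salted_counts)
```

Salting has a cost: a time-range scan must be issued once per salt prefix, and the results merged. Cassandra's default random partitioner avoids hot spots in a similar way, by placing every row by an MD5 hash of its key. The trade-off is that it gives up key-ordered range scans.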
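
The supercolumn row says HBase has no supercolumns but that its binary column names allow a supercolumn-like structure. One way to get that structure is sketched below, as an assumption rather than the post's own design. Each supercolumn name and subcolumn name are encoded into a single qualifier with a separator, and a "supercolumn" is read back by prefix. A Python dict stands in for one row of one column family. The separator and the helper names are hypothetical. Against a real cluster, HBase's ColumnPrefixFilter could perform the prefix selection server-side.

```python
from typing import Dict

# A plain dict stands in for one row of one column family: qualifier -> value.
Row = Dict[bytes, bytes]

SEP = b"\x00"  # assumption: never appears inside supercolumn or subcolumn names


def qualifier(super_name: str, sub_name: str) -> bytes:
    # Flatten a (supercolumn, subcolumn) pair into a single binary column name.
    return super_name.encode("utf-8") + SEP + sub_name.encode("utf-8")


def put(row: Row, super_name: str, sub_name: str, value: bytes) -> None:
    row[qualifier(super_name, sub_name)] = value


def get_super(row: Row, super_name: str) -> Dict[str, bytes]:
    # Read back one "supercolumn": all qualifiers starting with name + SEP,
    # in byte order, with the prefix stripped off.
    prefix = super_name.encode("utf-8") + SEP
    return {
        q[len(prefix):].decode("utf-8"): v
        for q, v in sorted(row.items())
        if q.startswith(prefix)
    }


row: Row = {}
put(row, "home", "city", b"Paris")
put(row, "home", "zip", b"75001")
put(row, "homepage", "url", b"example.org")
put(row, "work", "city", b"Berlin")
print(get_super(row, "home"))  # {'city': b'Paris', 'zip': b'75001'}
```

The separator is included in the prefix so that "home" does not also match "homepage".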

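The R + W > N rule in the comparison follows from a counting argument. Suppose there are N replicas, a write is acknowledged by W of them, and a read consults R of them. Those two sets must share at least R + W − N replicas. So when R + W > N, the read always reaches at least one replica that acknowledged the latest write.

The sketch below checks the rule against a brute-force enumeration of every possible pair of replica sets. The function names are illustrative. For example, with N = 3, reading and writing at a quorum of 2 satisfies the rule, while R = W = 1 does not.

```python
from itertools import combinations


def quorum_rule(n: int, r: int, w: int) -> bool:
    # The condition from the comparison: R + W > N.
    return r + w > n


def read_always_meets_write(n: int, r: int, w: int) -> bool:
    # Brute force: does every possible set of W replicas that acknowledged a
    # write share at least one replica with every possible set of R replicas
    # that a read might consult?
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: rule={quorum_rule(n, r, w)}, "
          f"overlap={read_always_meets_write(n, r, w)}")
```

In each listed case the two checks agree. The first two cases satisfy the rule. The last two do not, because a read can be served entirely by replicas that never saw the write.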