WEEK ONE MS-1

This week, I started working on my GSoC project. First, I did findings on how CQL differs from SQL and what query clauses work in MariaDB and not in Cassandra.
CQL is purposefully similar to SQL used in relational databases like MariaDB and MySQL. This similarity lowers the barrier of entry for users familiar with relational databases. Many queries are very similar between the two. In fact, a lot of basic things are even exactly the same. The following is both valid CQL and SQL

Switching databases

USE testatebase;

Creating Tables

CREATE TABLE IF NOT EXISTS testTable (id INT PRIMARY KEY);

Altering Tables

ALTER TABLE testTable ADD newField INT;

Creating Indexes

CREATE INDEX myIndex ON testTable (newField);

Inserting Data

INSERT    INTO    testTable    (id, newField) VALUES (1, 7);

Selecting Data

SELECT * FROM testTable WHERE newField = 7;

Counting Data

SELECT COUNT(*) FROM myTable;

Deleting Data

DELETE FROM testTable WHERE newField = 7;

 

Differences

Cassandra is a non-relational database, and so uses different concepts to store and retrieve data. In simple terms, a Cassandra keyspace is a SQL database, and a Cassandra column family is a SQL table (CQL allows you to interchange the words “TABLE” and “COLUMNFAMILY” for convenience). This difference necessitates a different syntax for creating and manipulating data:
-Creating databases:

/* Create a new keyspace in CQL */
CREATE KEYSPACE myDatabase WITH replication = {‘class’: ‘SimpleStrategy’, ‘replication_factor’: 1};

/* Create a new database in SQL */
CREATE DATABASE myDatabase;

Cassandra’s keyspaces require more specifications than a standard relational database. Note that the example above is the simplest form. In a production environment, likely spread across several data-centers, a keyspace would be created with a different strategy and replication factor.

  -Organizing Data

None of the standard relational stuff is going to work in CQL. There is no support for things like JOIN, GROUP BY, or FOREIGN KEY. Leaving these features out is important because it makes writing and retrieving data from Cassandra much more efficient.But sets of data tend to have relationships with one another. So without the relational tools of SQL, we need another strategy to represent these relationships. The problem is that reads in Cassandra tend to be more expensive than you might be used to when dealing with a relational database. Also with CQL, any query you plan to do in the future should already be organized into a column family. You want to look up addresses that correspond to a person? Then make a column family that consists of people and addresses. You can still store them separately, but you should also store them together. Having a column family to represent every query makes reads much more efficient. You’re not just going to have to de-normalize, you’re going to have to forget about normalization all together.

-Inserts vs Updates

The concept of cheap writes extends to updating data. This is because, unlike in SQL, a read is not performed during the update. The syntax is the same in CQL and SQL.

/* Updating data */
UPDATE    testTable    SET    newField = 2    WHERE   id  =  6;

However, if the row does not exist, it will still get created. Similarly as unintuitive, an INSERT statement will actually replace data if it exists. Because again, CQL does not perform a read while inserting. Without a read, there is no way to know if the data being inserted is replacing an existing record. This means that both inserts and updates are extremely fast.

-Time to Live

CQL enables you to set a TTL on a row. Meaning that you can set a row to expire 24 hours from the time it gets created. This is accomplished with the USING TTL command (values are in seconds).

/* Expiring Data in 24 Hours */
INSERT    INTO    testTable (id, newField)    VALUES (2, 9)     USING    TTL   86400;

Twenty-four hours after that query is executed, the data will be deleted.

These are some of the differences between CQL and SQL and cleary CQL is something less powerful and lack most complex operations that can be performed with SQL.


		
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s