Apache Phoenix - QuickTechie | Software & IT Professional Network

About Apache Phoenix

Apache Phoenix is a SQL layer for Apache HBase that provides an ANSI SQL programming interface. Using Apache Phoenix, you can create and interact with tables through typical DDL/DML statements issued over the standard JDBC API. In effect, Phoenix lets you work with HBase as if it were a SQL database. HBase is a distributed NoSQL store; if you need OLTP and analytics over HBase, Phoenix is the tool for the job.

Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds:

The power of standard SQL and JDBC APIs with full ACID transaction capabilities.
The flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store.
Apache Phoenix has an embedded JDBC driver that implements the majority of the java.sql interfaces, including the metadata APIs.
Apache Phoenix allows columns to be modelled as a multi-part row key or key/value cells.
Full query support with predicate push down and optimal scan key formation.
DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
Versioned schema repository. Snapshot queries use the schema that was in place when data was written.
DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
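
The DDL and DML features listed above can be sketched as a short round trip. This is a minimal illustration, not an official example: it uses Python with a generic DB-API connection, and assumes the third-party phoenixdb driver and a Phoenix Query Server at a made-up URL. The table server_metrics, its columns, and the helper names are hypothetical. The composite PRIMARY KEY shows how two columns together form a multi-part row key.

```python
"""Sketch: Phoenix DDL/DML issued through a Python DB-API connection."""

# Hypothetical table: (host, sample_ts) together form the multi-part row key.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS server_metrics (
    host      VARCHAR NOT NULL,
    sample_ts BIGINT  NOT NULL,
    cpu_pct   DOUBLE,
    CONSTRAINT pk PRIMARY KEY (host, sample_ts)
)
"""

# Phoenix uses UPSERT rather than INSERT: the row is created or overwritten.
UPSERT_ROW = "UPSERT INTO server_metrics (host, sample_ts, cpu_pct) VALUES (?, ?, ?)"

SELECT_AVG = (
    "SELECT host, AVG(cpu_pct) FROM server_metrics "
    "WHERE host = ? GROUP BY host"
)


def connect_to_phoenix(url="http://localhost:8765/"):
    """Open a connection via phoenixdb (assumed installed; the URL is an assumption)."""
    import phoenixdb  # imported lazily so the rest of the sketch runs without it

    return phoenixdb.connect(url, autocommit=True)


def run_statements(conn):
    """Run the DDL/DML round trip on any DB-API connection and return the rows."""
    cur = conn.cursor()
    cur.execute(CREATE_TABLE)
    cur.execute(UPSERT_ROW, ("web-01", 1700000000000, 41.5))
    cur.execute(UPSERT_ROW, ("web-01", 1700000060000, 47.0))
    cur.execute(SELECT_AVG, ("web-01",))
    return cur.fetchall()
```

Against a real cluster, this would be called as run_statements(connect_to_phoenix()). From Java the same statements would go through the Phoenix JDBC driver instead.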

Relational Layer

Apache Phoenix is a relational layer for Apache HBase. It consists of the following parts.
Query Engine:
- Parses SQL queries and transforms them into native HBase API calls. No MapReduce jobs are involved.
- Pushes as much work as possible onto the cluster for parallel execution.
Metadata Repository:
- This is itself a Phoenix table, and it provides typed access to data stored in HBase tables. It stores the definitions of tables, views, sequences, and secondary indexes.
JDBC Driver:
- From the application's perspective, Phoenix is simply a JDBC driver.

Apache Phoenix Integration with Hadoop

Apache Phoenix integrates with other Hadoop ecosystem products such as Spark, Hive, Pig, Flume, and MapReduce. From your application's perspective, Apache Phoenix is just a JDBC driver.

Apache HBase

Apache HBase is a high-performance, horizontally scalable datastore engine for big data, suitable as the store of record for mission-critical data.

Phoenix and SQL

Accessing HBase data with Phoenix can be substantially faster than direct HBase API use.
Phoenix parallelizes queries based on statistics. HBase on its own does not know how to chunk queries beyond scanning an entire region.
Phoenix pushes processing to the server.
If you write your own HBase API calls, they may not make use of coprocessors.
The performance difference is largest for aggregations, compared with direct HBase API calls.
Phoenix supports and uses secondary indexes.

Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
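
To make the idea of chunked parallel scans with server-side aggregation concrete, here is a toy in-memory model in Python. It is not Phoenix code and does not touch HBase. The sorted list stands in for a region, the guideposts argument loosely imitates statistics-derived split points, and all function names are invented for this sketch.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def split_by_guideposts(rows, guideposts):
    """Cut key-sorted (key, value) rows into chunks at the given split keys."""
    keys = [k for k, _ in rows]
    cuts = [0] + [bisect_left(keys, g) for g in sorted(guideposts)] + [len(rows)]
    # Drop empty chunks that arise when two split keys land on the same position.
    return [rows[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]


def partial_aggregate(chunk):
    """'Server-side' step: reduce one chunk to a small (count, sum) pair."""
    return len(chunk), sum(v for _, v in chunk)


def parallel_average(rows, guideposts, workers=4):
    """'Client-side' step: run the chunk scans concurrently and merge the partials."""
    chunks = split_by_guideposts(rows, guideposts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count if count else None
```

The point of the model is that only the small (count, sum) pairs travel back to the client, rather than every row. That is the intuition behind the advantage for aggregations described above.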

Standard SQL query constructs are supported, including SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.

Apache Phoenix also supports a full set of DML commands, as well as table creation and versioned incremental alterations through its DDL commands.

SQL Constructs Not Supported by Phoenix

Below is the list of constructs which are currently not supported.

Relational Operators: Intersect, Minus
Miscellaneous Built-In functions:

Phoenix Knobs and Dials

Phoenix provides many different knobs and dials to configure and tune the system to run more optimally on your cluster. The configuration is done through a series of Phoenix-specific properties specified for the most part in your client-side hbase-site.xml file. In addition to these properties, there are of course all the HBase configuration properties.
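
As a rough illustration of what such client-side settings look like, the Python sketch below renders an hbase-site.xml fragment from a dictionary. The two property names are taken from the Phoenix tuning documentation. The values are arbitrary examples and not recommendations, and the helper render_hbase_site is hypothetical.

```python
import xml.etree.ElementTree as ET

# Example values only; pick settings that suit your own cluster.
EXAMPLE_PROPS = {
    "phoenix.query.timeoutMs": 120000,    # client-side query timeout, in milliseconds
    "phoenix.query.threadPoolSize": 64,   # client threads used to run parallel scans
}


def render_hbase_site(props):
    """Return an hbase-site.xml style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Calling print(render_hbase_site(EXAMPLE_PROPS)) prints the property elements that would go into the client-side hbase-site.xml file.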

Cloudera Operational Database

Cloudera Operational Database (COD) is a real-time, auto-scaling operational database powered by Apache HBase and Apache Phoenix. COD is an experience that runs in CDP (Cloudera Data Platform). It allows self-service creation and management of an operational database. You can provision a new database with a single click, build applications against it, and deploy it on the public cloud without added complexity.

Apache Phoenix Use Cases

We can use Apache Phoenix to store data as a basis for measuring activities and generating reports. You should choose Phoenix because it provides the scalability of HBase and the expressiveness of SQL.
Phoenix can be used for on-demand data aggregations. If you have floating time range of