Seamless Big Data Integration
Today’s data managers are challenged by a growing ecosystem of data sources and warehouses, which makes big data integration more complex than ever. Your data may live in many data warehouses and data lakes, flowing in continuously through streams or resting in point-in-time files. Whatever the source, OmniSci can ingest millions of records per second into OmniSciDB, its open-source SQL engine.
Streaming Big Data Integration
Today’s big data integration and ingestion tools must connect to a wide variety of data sources and networks. Streaming data originates from sensors, network logs, social media, and web clickstreams around the globe, and large organizations can generate billions of records per week. Streaming platforms such as Apache Kafka organize and distribute this information before it is funneled into storage.
Although many platforms offer automated streaming data analytics tools, only OmniSci can ingest this volume of data and make it available for interactive exploration by business analysts. OmniSci provides an easy-to-use utility for Kafka data integration that connects to a Kafka topic, consumes messages in real time, and rapidly loads them into an OmniSci target table.
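The Kafka utility itself is part of OmniSci; the sketch below only illustrates the general consume-and-batch pattern that a streaming loader of this kind follows. It is a minimal sketch under stated assumptions: each message is a comma-delimited row, the topic is simulated with a plain Python list rather than a real Kafka client, and load_batch is a hypothetical callback standing in for a bulk insert into the target table.

```python
from typing import Callable, Iterable, List

# Sketch of the consume-and-batch pattern used by streaming loaders.
# Assumptions: messages are delimited text rows already read from a topic
# (simulated here with a list, no real Kafka client), and load_batch is a
# hypothetical callback standing in for a bulk insert into a target table.


def consume_in_batches(
    messages: Iterable[str],
    load_batch: Callable[[List[List[str]]], None],
    batch_size: int = 1000,
    delimiter: str = ",",
) -> int:
    """Parse delimited messages and pass them to load_batch in fixed-size batches.

    Returns the total number of rows handed to load_batch.
    """
    batch: List[List[str]] = []
    total = 0
    for msg in messages:
        # Split one message into its column values.
        batch.append(msg.rstrip("\n").split(delimiter))
        if len(batch) >= batch_size:
            load_batch(batch)
            total += len(batch)
            batch = []  # start a fresh list so the loaded batch is not mutated
    # Flush the final partial batch so trailing messages are not dropped.
    if batch:
        load_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    loaded: List[List[List[str]]] = []
    stream = [
        "1,sensor-a,20.5",
        "2,sensor-b,21.0",
        "3,sensor-a,20.7",
        "4,sensor-c,19.9",
        "5,sensor-b,21.3",
    ]
    n = consume_in_batches(stream, loaded.append, batch_size=2)
    print(n, [len(b) for b in loaded])  # prints: 5 [2, 2, 1]
```

Batching trades a small amount of latency for far fewer load calls, which is what makes high sustained ingest rates practical; the final flush keeps a partial batch from being lost when the stream pauses or ends.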
Data at Rest
Most of the world’s data is at rest, stored in data warehouses, enterprise databases, or Hadoop data lakes. The vast majority of this data has never been explored or analyzed, so it holds a large amount of untapped insight. OmniSci supports batch import of data at rest through the following methods:
For Delimited Files:
- Load delimited files such as CSV or TSV into OmniSciDB using the omnisql command-line client.
- OmniSciDB can import compressed files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ formats.
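As a rough illustration of a delimited-file load, OmniSciDB's SQL dialect includes a COPY ... FROM statement that can be run from omnisql. The helper below is a sketch that only assembles that statement as a string; the table name, file path, and option values are hypothetical, and the option names delimiter and header should be checked against the OmniSciDB documentation for your version.

```python
from typing import Dict, Optional


def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_from(table: str, path: str, options: Optional[Dict[str, str]] = None) -> str:
    """Build a COPY <table> FROM '<path>' [WITH (...)] statement.

    The table name is inserted unquoted, so it must come from trusted input.
    Option names (e.g. delimiter, header) are passed through as given; check
    them against the OmniSciDB docs for your version.
    """
    stmt = f"COPY {table} FROM {sql_quote(path)}"
    if options:
        opts = ", ".join(f"{key} = {sql_quote(val)}" for key, val in options.items())
        stmt += f" WITH ({opts})"
    return stmt + ";"


if __name__ == "__main__":
    # Hypothetical table and gzip-compressed CSV path, for illustration only.
    print(build_copy_from("flights", "/data/flights_2008.csv.gz",
                          {"delimiter": ",", "header": "true"}))
    # prints: COPY flights FROM '/data/flights_2008.csv.gz' WITH (delimiter = ',', header = 'true');
```

Quoting each value and doubling embedded single quotes keeps a path such as /tmp/o'brien.csv from breaking the statement. The generated text can then be pasted into omnisql or any SQL client connected to OmniSciDB.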
From Data Lakes or Data Warehouses:
- Pull data from the Hadoop Distributed File System (HDFS) or from structured data warehouses with Apache Sqoop.
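Sqoop's export tool covers the HDFS-to-OmniSciDB leg of this path: it reads files from an HDFS directory and writes them into a table over JDBC. For data in a relational warehouse, one common route is to land it in HDFS first with sqoop import. The sketch below only assembles the sqoop export command line. The JDBC URL format and driver class in the example are assumptions based on the OmniSci JDBC driver and should be checked against your driver's documentation; the table, user name, and HDFS path are hypothetical.

```python
import shlex
from typing import List


def sqoop_export_args(
    jdbc_url: str,
    driver: str,
    username: str,
    table: str,
    export_dir: str,
    mappers: int = 4,
) -> List[str]:
    """Return argv for a `sqoop export` that pushes an HDFS directory into a JDBC table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,          # JDBC URL of the target database
        "--driver", driver,             # JDBC driver class to load
        "--username", username,
        "-P",                           # prompt for the password instead of passing it on the command line
        "--table", table,               # target table, which must already exist
        "--export-dir", export_dir,     # HDFS directory holding the files to export
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


if __name__ == "__main__":
    args = sqoop_export_args(
        jdbc_url="jdbc:omnisci:localhost:6274:omnisci",  # assumed URL format
        driver="com.omnisci.jdbc.OmniSciDriver",         # assumed driver class
        username="admin",                                # hypothetical user
        table="flights",                                 # hypothetical table
        export_dir="/user/hive/warehouse/flights",       # hypothetical HDFS path
    )
    # Print a shell-safe command line; to run it instead, use
    # subprocess.run(args, check=True).
    print(shlex.join(args))
```

Building an argument list rather than a single shell string avoids quoting problems when paths or URLs contain special characters, and the -P flag keeps the password out of shell history and process listings.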