User Guide
  1. BryteFlow Ingest - Real-time data integration for S3, Redshift
    1. Supported Database Sources
    2. Supported Destinations
  2. Required AWS Services
  3. Prerequisites
  4. Launch BryteFlow Enterprise Edition from AWS Marketplace
  5. Launch BryteFlow Ingest from AWS Marketplace: Standard Edition
  6. Environment Preparation
    1. Network Setting
    2. Testing The Connections
    3. Preparing MS SQL Server
    4. Preparing On-premise Oracle
    5. Preparing Oracle on Amazon RDS
    6. Preparing On-premise MySQL
    7. Preparing MySQL on Amazon RDS
    8. Data Types in MS SQL Server
    9. Data Types in Oracle
  7. Security Permissions Required on Source
    1. Security for MS SQL Server
    2. Security for Oracle
    3. Security for MySQL
  8. Verification of Source
    1. Verification of MS SQL Server source
    2. Verification of Oracle source
  9. Creating Amazon Services
    1. Creating An EC2 System
    2. Creating S3 Bucket
    3. Configuring EMR Cluster
  10. Starting & Stopping BryteFlow Ingest
  11. Configuration of BryteFlow Ingest
  12. Dashboard
  13. Connections
    1. Source
      1. MS SQL Server and SAP (MS SQL Server)
      2. Oracle and SAP (Oracle)
    2. Destination Database
    3. Destination File System
    4. Email Notification
  14. Data
    1. Partitioning
  15. Schedule
    1. Rollback
  16. Configuration
    1. Source
    2. Destination Database
    3. S3
    4. License
    5. High Availability / Recovery
      1. Recovery Configuration
      2. Recovery Utilisation
    6. Remote Monitoring
  17. Log
  18. Appendix: Understanding Extraction Process
    1. Extraction Process
    2. First Extract
    3. Schedule Extract
    4. Add a new table to existing Extracts
    5. Resync data for Existing tables
  19. Appendix: Bryte Events for AWS CloudWatch Logs and SNS
  20. Appendix: Release Notes
    1. BryteFlow Ingest 3.7.3
    2. BryteFlow Ingest 3.7
  21. Optimize usage of AWS resources / Save Cost
  22. Upgrade BryteFlow versions from AWS Marketplace when using AMI

BryteFlow Ingest - Real-time data integration for S3, Redshift

BryteFlow Ingest is real-time data replication software for S3 and Redshift. It is a high-performance tool that provides real-time change data capture from sources with zero load on the source systems. BryteFlow Ingest captures the changes and transfers them to the target system. It automates the creation of either an exact copy or a time-series copy of the data source on the target. BryteFlow Ingest performs an initial full load from the source and then incrementally merges changes into the destination of choice, with the entire process fully automated.

It works with the companion software BryteFlow Blend for real-time data extraction and data preparation.

Supported Database Sources

BryteFlow Ingest supports the following database sources using the Amazon Machine Image (AMI) from AWS Marketplace: https://aws.amazon.com/marketplace/pp/B01MRLEJTK
  • MS SQL Server
  • Oracle
  • SAP (MS SQL Server)
  • SAP (Oracle)
BryteFlow Ingest also supports the following database sources; if you wish to source from these, please contact Bryte directly at info@bryteflow.com.
  • MySQL
  • Salesforce
  • Netsuite

Supported Destinations

The supported destinations are as follows:

  • S3
  • Redshift
  • Snowflake

Looking for a different destination?

BryteFlow builds custom sources/destinations on customer request; please contact us directly at info@bryteflow.com.

Required AWS Services

The following services need to be provisioned from AWS:

  • EC2
  • EMR
  • S3
  • Redshift (optional, as a destination)
  • CloudWatch
  • DynamoDB (optional, for high availability feature)

Prerequisites

Prerequisites of using Amazon Machine Image (AMI) from AWS Marketplace

Using the AMI sourced from the AWS Marketplace requires:

  • Selection of BryteFlow Ingest volume
  • Selection of EC2 instance type
  • Ensure connectivity between the server/EC2 hosting the BryteFlow Ingest software and
    • the source
    • Amazon S3
    • Amazon EMR
    • Amazon Redshift (if needed as a destination)
    • Snowflake (if needed as a destination)
    • DynamoDB (if high availability option is required)

The available AMI options are volume based; recommended EC2 and EMR options for each of these volumes are shown below.

 

Total Data Volume | EC2 Recommended | EMR Recommended
< 100 GB | t2.small | 1 x m4.xlarge master node, 2 x c5.xlarge task nodes
100 GB – 300 GB | t2.medium | 1 x m4.xlarge master node, 2 x c5.xlarge task nodes
300 GB – 1 TB | t2.large | 1 x m4.xlarge master node, 2 x c5.xlarge task nodes
> 1 TB | Seek expert advice from support@bryteflow.com | Seek expert advice from support@bryteflow.com

 

NOTE: Evaluate the EMR configuration depending on the latency required.

These should be considered a starting point; if you have any questions please seek expert advice from support@bryteflow.com

 

System Requirements when not using the Amazon Machine Image (AMI)

  • Port 8081 should be open on the server hosting the BryteFlow Ingest software
  • Google Chrome browser is required as the internet browser on the server hosting BryteFlow Ingest software
  • Java version 8 or higher is required
  • If using MS SQL Server as a source, please download and install the BCP utility
  • Ensure connectivity between the server hosting the BryteFlow Ingest software and the source, Amazon S3, Amazon EMR, Amazon Redshift and DynamoDB (if high availability option is required)

Launch BryteFlow Enterprise Edition from AWS Marketplace

Steps to launch BryteFlow from AWS Marketplace: Enterprise Edition

  • Go to the product URL https://aws.amazon.com/marketplace/pp/B079PWMJ4B
  • Click ‘Continue to Subscribe’
  • Click ‘Continue to Configuration’. This brings up the default ‘Fulfillment Option’ with the latest software version.
  • Choose the AWS Region you would like to use, or keep the default AWS Region already present in the drop-down.

 

  • Click ‘Continue to Launch’
  • Choose Action ‘Launch from Website’
  • Select your EC2 instance type based on your data volume; recommendations are available on the product detail page
  • Choose your VPC from the dropdown or go by the default
  • Choose the ‘Subnet Settings’
  • Update the ‘Security Group Settings’ or go by ‘default’
  • Provide the Key Pair Settings by choosing an EC2 key pair of your own or selecting ‘Create a key pair in EC2’
  • Click ‘Launch’ to launch the EC2 instance.
  • The endpoint will be an EC2 instance running BryteFlow Ingest (as a Windows service) on port 8081

Additional information regarding launching an EC2 instance can be found here

If you have any trouble launching or connecting to the EC2 instance, please refer to the troubleshooting guides below:

 

** Please note that BryteFlow Blend is a companion product to BryteFlow Ingest. In order to make the most of the enterprise capabilities, first set up BryteFlow Ingest completely. Thereafter, no configuration is required in BryteFlow Blend; it is all ready to go. Start with the transformations directly off AWS S3.

Once connected to the EC2 instance:

  • Launch BryteFlow Ingest from the Google Chrome browser using the bookmark ‘BryteFlow Ingest’
  • Or type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  • This will bring up a page requesting either a ‘New Instance’ or an ‘Existing Instance’
    • Click on the ‘New Instance’ button and do the setup for your environment (refer to the section regarding Configuration Of BryteFlow Ingest in this document for further details)
    • ‘Existing Instance’ should only be clicked when recovering an instance of BryteFlow Ingest (refer to the Recovery section of this document for further details)
    • Once Ingest is fully set up and replicating to the desired destination successfully:
    • Launch BryteFlow Blend from the Google Chrome browser using the bookmark ‘BryteFlow Blend’
    • Or type localhost:8082 into the Google Chrome browser to open the BryteFlow Blend web console
    • BryteFlow Blend is tied to BryteFlow Ingest and no AWS Location configuration is required.
    • This makes users ready to start their data transformations off S3.
    • For details on Blend setup and Usage refer to the BryteFlow Blend User Guide: http://docs.bryteflow.com/Bryteflow-Blend-User-Guide/

 

 

Launch BryteFlow Ingest from AWS Marketplace: Standard Edition

Steps to launch BryteFlow Ingest from AWS Marketplace: Standard Edition

  • Go to the product URL https://aws.amazon.com/marketplace/pp/B01MRLEJTK
  • Click ‘Continue to Subscribe’
  • Click ‘Continue to Configuration’. This brings up the default ‘Fulfillment Option’ with the latest software version.
  • Choose the AWS Region you would like to use, or keep the default AWS Region already present in the dropdown.

  • Click ‘Continue to Launch’
  • Choose Action ‘Launch from Website’
  • Select your EC2 instance type based on your data volume; recommendations are available on the product detail page
  • Choose your VPC from the dropdown or go by the default
  • Choose the ‘Subnet Settings’
  • Update the ‘Security Group Settings’ or go by ‘default’
  • Provide the Key Pair Settings by choosing an EC2 key pair of your own or selecting ‘Create a key pair in EC2’
  • Click ‘Launch’ to launch the EC2 instance.
  • The endpoint will be an EC2 instance running BryteFlow Ingest (as a Windows service) on port 8081

Additional information regarding launching an EC2 instance can be found here

If you have any trouble launching or connecting to the EC2 instance, please refer to the troubleshooting guides below:

Once connected to the EC2 instance:

  • Launch BryteFlow Ingest from the Google Chrome browser using the bookmark ‘BryteFlow Ingest’
  • Or type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  • This will bring up a page requesting either a ‘New Instance’ or an ‘Existing Instance’
    • Click on the ‘New Instance’ button (refer to the section regarding Configuration Of BryteFlow Ingest in this document for further details)
    • ‘Existing Instance’ should only be clicked when recovering an instance of BryteFlow Ingest (refer to the Recovery section of this document for further details)

Environment Preparation

Network Setting

Opening Ports in Amazon Console & Windows Server

Open ports on Windows Server

Please perform the steps mentioned in the following link to allow inbound traffic to your server:

Open ports on Amazon Console.

Please perform the steps mentioned in the following link to allow inbound traffic to your Amazon instance:

Testing The Connections

Verify if the connectivity to remote services is available.

To test the remote connections you need the Telnet utility. Telnet has to be enabled from the Control Panel under ‘Turn Windows features on or off’.

  1. Go to Start, then Run, type CMD, and click OK.
  2. Type the following at the command prompt:
telnet <IP address or Hostname> Port number

For example

telnet 192.168.1.1 8081

If the connection is unsuccessful, an error will be shown.
If the command prompt window is blank except for the cursor, the connection is successful and the service is available.

Preparing MS SQL Server

Enable Change Tracking for a database in MS SQL Server

This section applies to MS SQL Server versions higher than 2008.

  • To enable Change Tracking at the database level execute the following query:
    ALTER DATABASE databasename
    SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON)
  • To enable Change Tracking at the table level execute the following query:
    ALTER TABLE tablename
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON)
  • Enable Change Tracking at the database and table level for all the databases and tables to be replicated.
  • To enable view permission to event viewer logs execute the following query:
    GRANT VIEW SERVER STATE TO "AccountName"

*** Please note: if you are configuring BryteFlow Ingest for a completely new SQL Server database, please make sure to perform at least one transaction on the database to generate the log sequence number for BryteFlow to start with. A sketch of such a bootstrap transaction is shown below.
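
For example, one way to generate that first transaction is with a throwaway table (a minimal sketch; the table name is a placeholder, and any committed change on the database will do):

-- Hypothetical scratch table; each statement commits a transaction
CREATE TABLE dbo.bryte_bootstrap (id INT);
INSERT INTO dbo.bryte_bootstrap VALUES (1);
DELETE FROM dbo.bryte_bootstrap;
DROP TABLE dbo.bryte_bootstrap;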

Preparing On-premise Oracle

Enable Change Tracking for an On-Premise Oracle Server

Execute the following queries on Oracle Server to enable change tracking.

  • The Oracle database should be in ARCHIVELOG mode.
  • Supplemental logging has to be turned on at the database level. Supplemental logging is required so that additional details are logged in the archive logs.
    To turn on supplemental logging at the database level, execute the following statements:

    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  • Alternatively, to turn on minimal database supplemental logging, execute the following statements:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; 
    ALTER DATABASE FORCE LOGGING;
  • In Oracle, ensure that supplemental logging is turned on at the table level. To turn on supplemental logging at the table level, execute the following statement:
    ALTER TABLE <schema>.<tablename> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
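
Before proceeding, you can confirm both database-level settings with a single query (the same checks are repeated in the Verification of Oracle source section); log_mode should return ARCHIVELOG and supplemental_log_data_min should return YES:

-- Quick sanity check of the database-level settings
SELECT log_mode, supplemental_log_data_min
  FROM V$DATABASE;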

Preparing Oracle on Amazon RDS

Enable Change Tracking for a database on Amazon Oracle RDS

  • In Oracle on Amazon RDS, supplemental logging should be turned on at the database level.
  • Supplemental logging is required so that additional details are logged in the archive logs.
    To turn on supplemental logging at the database level, execute the following query:

    exec rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD','ALL');
  • To retain archived redo logs on your DB instance, execute the following command (example: 24 hours):
    exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);
  • To turn on supplemental logging at the table level, execute the following statement:
    ALTER TABLE <schema>.<tablename> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
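
To review the current RDS configuration, including the archive log retention hours, you can run the following (a quick optional check, assuming the standard rdsadmin package on Amazon RDS):

set serveroutput on
exec rdsadmin.rdsadmin_util.show_configuration;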

Preparing On-premise MySQL

To prepare MySQL for change tracking perform the following steps.

To enable binary logging, the following parameters need to be configured as below in the my.ini file on MySQL on Windows or in the my.cnf file on MySQL on UNIX (a consolidated example follows the table):

Parameter | Value
server_id | Any value from 1. E.g. server_id = 1
log_bin=<path> | Path to the binary log file. E.g. log_bin = D:\MySQLLogs\BinLog
binlog_format | binlog_format=row
expire_logs_days | To avoid disk space issues it is strongly recommended not to use the default value (0). E.g. expire_logs_days = 4
binlog_checksum | This parameter can be set to binlog_checksum=none. BryteFlow supports CRC32 as well.
binlog_row_image | binlog_row_image=full
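
Putting the parameters above together, a minimal binary logging section might look as follows (a sketch only; the server_id value and log path are the illustrative values from the table above):

[mysqld]
# Binary logging settings required by BryteFlow Ingest change tracking
server_id = 1
log_bin = D:\MySQLLogs\BinLog
binlog_format = row
expire_logs_days = 4
binlog_checksum = none
binlog_row_image = full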

Preparing MySQL on Amazon RDS

Enabling Change tracking on MySQL on Amazon RDS

To enable change tracking for MySQL on Amazon RDS perform the following steps.

  1. In the AWS Management Console, for MySQL on Amazon RDS, create a new DB parameter group and configure the following parameters in it:
binlog_format: binlog_format=row
binlog_checksum: binlog_checksum=none OR CRC32
  2. The MySQL RDS DB instance should use the newly created DB parameter group for binary logging to be enabled.
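
Once the DB instance is using the new parameter group (a reboot may be required for the change to take effect), the settings can be verified from any MySQL session with standard statements:

SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_checksum';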

Data Types in MS SQL Server

BryteFlow Ingest supports most MS SQL Server data types as a source; see the following list of supported types:

MS SQL Server Data Types

BIGINT, BIT, DECIMAL, INT, MONEY, NUMERIC (p,s), SMALLINT, SMALLMONEY, TINYINT, VARCHAR, REAL, FLOAT, DATETIME, DATETIME2, SMALLDATETIME, DATE, TIME, DATETIMEOFFSET, CHAR, HIERARCHYID, VARCHAR (max), NCHAR, NVARCHAR (length), NVARCHAR (max), BINARY, VARBINARY, VARBINARY (max), TIMESTAMP, UNIQUEIDENTIFIER, XML

Data Types in Oracle

BryteFlow Ingest supports most Oracle data types as a source; see the following list of supported types:

Oracle Data Types

BINARY_DOUBLE, BINARY_FLOAT, CHAR, DATE, INTERVAL DAY TO SECOND, LONG, LONG RAW, NCHAR, NUMBER, NVARCHAR, RAW, REF, TIMESTAMP, TIMESTAMP WITH LOCAL TIME ZONE, VARCHAR2

 

Security Permissions Required on Source

Security for MS SQL Server

The BryteFlow Ingest database replication login user should have the VIEW CHANGE TRACKING permission to view the Change Tracking information. A sketch of granting this permission follows the query below.

-- List all tables with change tracking enabled and column tracking on (is_track_columns_updated_on: 1 = enabled, 0 = disabled)
SELECT *
  FROM sys.all_objects
 WHERE object_id IN (SELECT object_id 
                       FROM sys.change_tracking_tables
                      WHERE is_track_columns_updated_on = 1);
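
The permission itself can be granted at the schema or table level; a minimal sketch (the user and schema names are placeholders) is:

GRANT VIEW CHANGE TRACKING ON SCHEMA::dbo TO bryteflow_user;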

Security for Oracle

The Oracle user running BryteFlow Ingest must have the following security privileges:

SELECT access on all tables to be replicated

The following statement should return records:

SELECT * FROM V$ARCHIVED_LOG;

If no records are returned, SELECT access on V_$ARCHIVED_LOG should be granted, or check whether the database is in ARCHIVELOG mode.

The following security permissions should be assigned to the user:

CREATE SESSION
SELECT access on V_$LOGMNR_CONTENTS
SELECT access on V_$LOGMNR_LOGS
SELECT access on ANY TRANSACTION
SELECT access on DBA_OBJECTS
EXECUTE access on DBMS_LOGMNR

Run the following grant statements for <user> to meet the above requirements:

GRANT SELECT ON V_$ARCHIVED_LOG TO <user>;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO <user>;
GRANT EXECUTE ON DBMS_LOGMNR TO <user>;
GRANT SELECT ON V_$LOGMNR_LOGS TO <user>;
GRANT SELECT ANY TRANSACTION TO <user>;
GRANT SELECT ON DBA_OBJECTS TO <user>;
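
Once granted, you can confirm the privileges from a session connected as the user (a quick check using standard Oracle dictionary views):

-- System privileges active in the current session (e.g. SELECT ANY TRANSACTION)
SELECT * FROM SESSION_PRIVS;
-- Object grants where the current user is the grantee (e.g. SELECT on V_$ARCHIVED_LOG)
SELECT * FROM USER_TAB_PRIVS;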

 

Security for MySQL

The Ingest user id must have the following privileges:

  1. REPLICATION CLIENT and REPLICATION SLAVE.
  2. Select privileges on the source tables designated for replication.
  3. Execute the following queries to grant permissions to a MySQL user.
CREATE USER 'bflow_ingest_user' IDENTIFIED BY '*****';
GRANT SELECT, REPLICATION CLIENT, SHOW DATABASES ON *.* TO bflow_ingest_user IDENTIFIED BY '******';
GRANT SELECT, REPLICATION SLAVE, SHOW DATABASES ON *.* TO bflow_ingest_user IDENTIFIED BY '*****';
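
To confirm the grants took effect, you can run the standard MySQL statement:

SHOW GRANTS FOR 'bflow_ingest_user';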

Verification of Source

Execute the following checks to confirm that change detection/tracking is set up correctly for MS SQL Server and Oracle sources.

Verification of MS SQL Server source

To verify whether change tracking is already enabled on the database, run the following SQL query. If a row is returned then Change Tracking has been enabled for the database.

SELECT *
  FROM sys.change_tracking_databases
 WHERE database_id = DB_ID('databasename');

The following SQL lists all the tables for which Change Tracking has been enabled in the selected database:

USE databasename;
SELECT sys.schemas.name as schema_name,
       sys.tables.name as table_name
  FROM sys.change_tracking_tables
  JOIN sys.tables ON sys.tables.object_id = sys.change_tracking_tables.object_id
  JOIN sys.schemas ON sys.schemas.schema_id = sys.tables.schema_id;

Verification of Oracle source

To verify that Oracle is set up correctly for change detection, execute the following queries.

Is ARCHIVELOG mode enabled?
SELECT log_mode 
  FROM V$DATABASE;
Expected result: ARCHIVELOG

Is supplemental logging turned on at the database level?
SELECT supplemental_log_data_min
  FROM V$DATABASE;
Expected result: YES

Is supplemental logging turned on at the table level?
SELECT log_group_name, 
       table_name, 
       always,
       log_group_type
  FROM dba_log_groups;
Expected result: <log group name>, <table name>, ALWAYS, ALL COLUMN LOGGING

Creating Amazon Services

Creating An EC2 System

Please refer to the AWS documentation on how to create an EC2 instance.

Creating S3 Bucket

Please refer to the AWS documentation on creating an S3 bucket.

Configuring EMR Cluster

NOTE: Not required when you source from the AWS Marketplace.

 

  1. Login to your AWS account and select the correct AWS region where your S3 bucket and EC2 container are located.
  2. Click on the services drop down in the header.
  3. Select EMR under Analytics or you can search for EMR.
  4. Click on the ‘Create cluster’ button
  5. In Create Cluster – Quick Options, type in the Cluster Name (the name you will identify the cluster with),
    keep the Logging check box selected (the S3 folder will be selected by default), and set Launch mode to Cluster.
  6. Under Software configuration select release emr-5.14.0 and in Applications select Core Hadoop: Hadoop 2.8.3 with Ganglia 3.7.2, Hive 2.3.2, Hue 4.1.0, Mahout 0.13.0, Pig 0.17.0, and Tez 0.8.4
  7. Hardware configuration – select the instance type and the number of instances you want to run.
  8. Security and access –
    Please select the EC2 key pair that you want to use with the EMR Cluster. This key will be used to SSH into the Cluster. Permission should be set to default.
  9. You can add tags to your EMR cluster and configure the tag in Ingest to avoid re-configuring the software if you plan to terminate the cluster and create a new one. This helps users keep control of their clusters and save cost on AWS resources.
  10. Click on the ‘Create cluster’ button (provisioning of a cluster can take up to 15-20 min).

Starting & Stopping BryteFlow Ingest

If you are using the AMI from AWS Marketplace, BryteFlow Ingest will be preinstalled as a service in Windows.

Alternatively, you can install the service by executing the following command from the Command Prompt (Admin):

  1. Navigate to the directory of the installation.
  2. service.exe --WinRun4J:RegisterService

To Start BryteFlow Ingest

  1. Start the BryteFlow Ingest service using Windows Services or Windows Task Manager
  2. Type the URL in the Chrome browser
localhost:8081

To Stop BryteFlow Ingest

  1. Stop the BryteFlow Ingest service
  2. Replication processes can also be aborted immediately by going to Task Manager
    -> Processes -> service.exe and selecting “End Task”

Configuration of BryteFlow Ingest

The configuration of BryteFlow Ingest is performed through the web console.

  1. Type the URL in the Chrome browser
localhost:8081

The screen will then present the following tabs (left side of the screen):

  • Dashboard
  • Connections
  • Data
  • Schedule
  • Configuration
  • Log

Dashboard

The dashboard provides a central screen where the overall status of this instance of BryteFlow Ingest can be monitored.

  • The Data Sources Transfer Summary shows the number of records transferred. When hourly is selected you can view the transfer statistics for 24 hours; if daily is selected the monthly statistics are displayed.
    • The pie chart displays the status of the process
      • Extraction, denoted by red
      • Loading, denoted by orange
      • Loaded, denoted by green
    • Hovering on the bar graph gives the exact number of records transferred.
  • Schedule Extract Status displays the schedule status.
  • The Configure icon will take you to the configuration of the source tables, specifically the table, type of transfer, table primary key(s) and the selection of masked columns.
  • The Dashboard provides quick access for configuration of BryteFlow Ingest (Source, Destination Database, Destination File System and Email Notification)

 

Connections

The connections tab provides access to the following sub-tabs:

  • Source
  • Destination Database
  • Destination File System
  • Email Notification

Source

Configuration of MS SQL Server, Oracle, SAP (MS SQL Server) or SAP (Oracle) as a source

MS SQL Server and SAP (MS SQL Server)

  1. In the Database Type select “Microsoft SQL Server” from the drop-down list. For an SAP source with MS SQL Server database use “SAP (SQL Server)” from the drop-down list
  2. In the Database Host field please enter the IP address or hostname of the database server
  3. In the Database Port field please enter the port number on which the database server is listening. The default port for MS SQL Server is 1433
  4. In the Database Name field please enter the name of your database e.g. BryteMSSQL
  5. Enter a valid MS SQL Server database user Id that will be used with BryteFlow Ingest. If a Windows user is required, please contact BryteFlow support at info@bryteflow.com to understand how to configure this
  6. Enter Password; then confirm it by re-entering in Confirm Password
    • Please note, passwords are encrypted within BryteFlow Ingest
  7. Click on the ‘Test Connection’ button to test connectivity
  8. Click on the ‘Apply’ button to confirm and save the details

Oracle and SAP (Oracle)

  1. In the Database Type select ‘Oracle Log Miner’ from the drop-down list. For an SAP source with an Oracle database use ‘SAP (Oracle)’ from the drop-down list
  2. In the Database Host field please enter the IP address or hostname of the database server.
  3. In the Database Port field please enter the port number on which the database server is listening. The default port for Oracle is 1521
  4. In the Database Name field please enter the Oracle SID
  5. Enter a valid Oracle database user id that will be used with BryteFlow Ingest.
  6. Enter Password; then confirm it by re-entering in Confirm Password
    • Please note, passwords are encrypted within BryteFlow Ingest
  7. Click on the ‘Test Connection’ button to test connectivity
  8. Click on the ‘Apply’ button to confirm and save the details

Please note: when using a SID to connect to a dedicated Oracle server instance, use ‘:SID’ in the Database Name of the source configuration.

Destination Database

Available Destinations

  • S3 files using EMR
  • S3 files using EMR + Load to Redshift
  • Load to Redshift direct
  • Load to Snowflake direct

S3 files using EMR

  1. Enter Database Type: To use Amazon S3 as the destination, please use “S3 Files using EMR” from the drop-down list
  2. Enter Database Host: Enter the id of the EMR cluster e.g. j-1ARB3SOSWXZUZ
  3. Click on the ‘Test Connection’ button to test the connection details
  4. Click on the ‘Apply’ button to confirm and save the details

 

S3 files using EMR + Load to Redshift

  1. Enter Database Type: To use Amazon S3 and Amazon Redshift as your destination, select “S3 files using EMR + Load to Redshift” from the drop-down list.
  2. Enter Database Host: Enter the endpoint for Amazon Redshift (excluding port)
    • eg. bryte-dc1.hdyesjdsdf.us-west-2.redshift.amazonaws.com
  3. Enter Database Port: the Redshift default port is 5439
  4. Enter Database Name
    • eg dev
  5. Enter User Id: This is the Redshift user id that will load the schemas, tables, and data automatically to Redshift:
    • eg redshift_user
  6. Enter Password; re-enter to confirm
    • Please note, passwords are encrypted within BryteFlow Ingest
  7. Click on the ‘Test Connection’ button to test the connection details
  8. Click on the ‘Apply’ button to confirm and save the details

 

Load to Snowflake direct

  1. Enter ‘Database Type’ as ‘Load to Snowflake Direct’
  2. Database host is the Snowflake account URL, e.g. abc123.ap-southeast-2.snowflakecomputing.com
  3. Database name should be in the format <account>:<warehouse>:<db>, e.g. abc123:COMPUTE_WH:DEMO_DB
  4. Enter the Snowflake UserID in the Userid field
  5. Configure the password under the Password and Confirm Password fields.

Destination File System

To configure S3 as the filesystem perform the following steps.

  • Select File System as “AWS S3 with EMR” from the drop-down.
  • In the bucket name field, enter the bucket name that you have created on Amazon S3.
  • In the Delta Directory and Data Directory fields, type in the names of the folders on Amazon S3
  • Enter the Amazon EMR instance ID eg. j-1ARB3SOSWXZUZ
  • The EMR instance can be specified by Instance ID (as above), by a tag value for the tag ‘BryteflowIngest’, or by a tag and value expressed as ‘tag=value’. If more than one instance fits the criteria, the first one in the list will be picked.
  • In EMR Region and S3 Region select the correct regions from the drop-down list.
  • Enter AWS access key id and AWS secret key for accessing the S3 service.
    • Please note, keys are encrypted within BryteFlow Ingest
  • If you are using KMS, enter the KMS key
    • Please note, keys are encrypted within BryteFlow Ingest
  • Click on the ‘Test Directory’ button to test connectivity
  • Click on the ‘Apply’ button to confirm and save the details

Email Notification

To configure email notifications to be sent, perform the following steps.

  • Choose Mail Type: SMTP using TLS from the drop-down
  • In the Email Host field, type in the address of your SMTP server.
  • In the Email Port field, type in the port number on which the SMTP server is listening.
  • In the User Id field, type the complete email address that will be used to authenticate with the SMTP server.
  • Enter the Password for the email account; confirm it.
    • Please note, passwords are encrypted within BryteFlow Ingest
  • In Send From, enter the email id the notifications will be sent from; it has to be a valid email address on the server.
  • In the Send To field, enter the email address the notifications are sent to.
  • Click on Test Connection and then Apply to test the connection and save the settings.

Data

NOTE: Please review this section in conjunction with Appendix: Understanding Extraction Process

To select the tables to be transferred to the destination database on Amazon Redshift and/or an Amazon S3 bucket, perform the following steps.

  1. Expand the Database.
  2. Browse to the table you want to be synced with Amazon Redshift or Amazon S3.
  3. Select the checkbox next to the table and then click on the table.
  4. On the right-hand side pane, select the type of transfer for the table, i.e. By Primary Key or By Primary Key with History. With the Primary Key option, the table is replicated like for like to the destination. With the Primary Key with History option, the table is replicated as time series data with every change recorded with Slowly Changing Dimension type 2 history (aka point in time)
  5. In the Primary Key column, select the Primary Key for the table by checking the checkbox next to the column name.
  6. You can also mask a column by checking its checkbox. A masked column will not be transferred to the destination.
  7. Click on the ‘Apply’ button to confirm and save the details
  8. Click on the ‘Full Extract’ button to request a full load of the table

 

This process of selecting tables, configuring primary keys and masking columns should be repeated for each of the tables. Once complete, the next steps are:

  1. Navigate to Schedule tab
  2. Click on the ‘Sync New Tables’ button to initiate the process

 

Partitioning

Amazon S3 And Amazon Redshift
Partitioning can dramatically improve efficiency and performance. It can be set up when replicating to S3 (data is partitioned into folders) and/or Redshift (data is partitioned into tables). The partitioning string is entered into the Partitioning folder field. The format for partitioning is as follows:

/@<column index>(<partition prefix><partition_format>)

 

Column Index

To build a partitioning folder structure, the column index (starting from 1) of the column(s) to be used in the partition needs to be known. In this simple table there are 3 columns:

  • customer.contact_id would be column index 1
  • customer.fullname would be column index 2
  • customer.email would be column index 3

 

Partition Prefix (optional)

Each partition can be prefixed with a fixed string. The last character of the Partition Prefix can be set to ‘=’; ending with ‘=’ is useful when creating partitions on S3 as this facilitates the automated build/recovery of partitions (see below).

  • The partition prefix string should be in lower case
  • The partition prefix string should not be the same as any of the existing column names

 

An example of partitioning on the first letter of column 2 (fullname in this case) is as follows:

/@2(fullname_start=%1s)

Refer to the MSCK REPAIR TABLE command in the AWS Athena documentation. A lower case partition prefix is recommended, as an upper/mixed case partition prefix can result in issues when using Athena.

--Builds/recovers partitions and data associated with partitions 
MSCK REPAIR TABLE <athena_table_name>;

 

Once MSCK REPAIR TABLE <athena_table_name>; has been executed, all data will be added to the relevant partitions, and any new data will be automatically added to the existing partitions. However, if new partitions are created by BryteFlow Ingest, the MSCK REPAIR TABLE <athena_table_name>; command will have to be re-executed to make the data available for querying in the Athena table.
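
As an illustration, a hypothetical Athena table over the replicated customer data might be declared as follows (the column names come from the example table above; the S3 location is a placeholder, and the yyyymmdd partition prefix matches Example 3 below):

CREATE EXTERNAL TABLE customer (
  contact_id INT,
  fullname   STRING,
  email      STRING
)
PARTITIONED BY (yyyymmdd STRING)
STORED AS PARQUET
LOCATION 's3://your_bucket_name/your_folder_name/customer/';

-- Pick up the partition folders written by BryteFlow Ingest
MSCK REPAIR TABLE customer;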

 

Format

The format is applied to the column index specified above, for example to partition the data by year (on a date column) you’d use the format %y, to partition by the 24 hour format of time you’d use the format %H.

Partition Examples

Example 1: Year
Assuming Column Index 7 was a date field…

/@7(%y)

This would create partition folders such as

  • 2016
  • 2017
  • 2018
  • 2019

 

Example 2: YearMonthDay
Assuming Column Index 7 was a date field…

/@7(%y%M%d)

This would create partition folders such as

  • 20190101
  • 20190102
  • 20190103
  • 20190104

 

Example 3: yyyymmdd=YearMonthDay
Assuming Column Index 7 was a date field…

/@7(yyyymmdd=%y%M%d)

This would create partition folders such as (useful format to automate recovery/initial population of data associated with partitions when using Athena)

  • yyyymmdd=20190101
  • yyyymmdd=20190102
  • yyyymmdd=20190103
  • yyyymmdd=20190104

 

Example 4: DOB column was used to create sub partitions of yr, mth and day
Assuming DOB Column Index 4 was a date

/@4(yr=%y)/@4(mth=%M)/@4(day=%d)

 

Example 5: model_nm=model_values and then sub partitions of yearmonth=YearMonth (multiple column partitioning)
Assuming Column Index 6 was a string (containing for example model_name_a, example model_name_b and example model_name_c) and Column Index 13 was a date field…

/@6(model_nm=%s)/@13(yearmonth=%y%M)

This would create partition folders such as
  • model_nm=model_name_a
    • yearmonth=201801
    • yearmonth=201802
    • yearmonth=201803
  • model_nm=model_name_b
    • yearmonth=201801
    • yearmonth=201802
    • yearmonth=201803
  • model_nm=model_name_c
    • yearmonth=201801
    • yearmonth=201802
    • yearmonth=201803

 

 

Available Partition Options

Format | Datatype | Description
%y | TIMESTAMP | Four digit year e.g. 2018
%M | TIMESTAMP | Two digit month with zero prefix e.g. March -> 03
%d | TIMESTAMP | Two digit date with zero prefix e.g. 01
%H | TIMESTAMP | Two digit 24 hour with zero prefix e.g. 00
%q | TIMESTAMP | Two digit month indicating the start month of the quarter e.g. March -> 01
%Q | TIMESTAMP | Two digit month indicating the end month of the quarter e.g. March -> 03
%r | TIMESTAMP | Two digit month indicating the start of the half year e.g. March -> 01
%R | TIMESTAMP | Two digit month indicating the end of the half year e.g. March -> 06
%i | INTEGER | Value of the integer e.g. 12345
%<n>i | INTEGER | Value of the integer prefixed by zeros to the specified width e.g. %8i for 12345 is 00012345
%<m>.<n>i | INTEGER | Value of the integer truncated to the number of zeros specified by <n> and prefixed by zeros to the width specified by <m> e.g. %8.2i for 12345 is 00012300
%.<n>i | INTEGER | Value of the integer truncated to the number of zeros specified by <n> e.g. %.2i for 12345 is 12300
%s | VARCHAR | Value of the string e.g. ABCD
%<n>s | VARCHAR | Value of the string truncated to the specified width e.g. %2s for ABCD is AB

 

Schedule

 To configure extracts to run at a specific time perform the following steps.

  1. In the case of Oracle, ‘Automatic’ is preselected and other options are disabled by default.
  2. For MS SQL Server you can choose the period in minutes.
  3. A daily extraction can be done at a specific time of the day by choosing hour and minutes in the drop-down.
  4. Extraction can also be scheduled on specific days of the week at a fixed time by checking the checkboxes next to the days and selecting hours and minutes in the drop-down.
  5. Click on the ‘Apply’ button to save the schedule.

 

Add a new table to existing Extracts

You can add additional table(s) while replication is up and running, if the need arises to add a new table to the extraction process:

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)
  • Navigate to the ‘Data’ tab
    • Select the new table(s) by navigating into database instance name, schema name and table name(s)
    • Configure the table, considering the following
      • Transfer type
      • Partitioning folder (refer to the Partitioning section of this document for details)
      • Primary key column(s)
      • Columns to be masked (optional, masked columns are excluded from replication, for example salary data)
      • Click on the ‘Apply’ button
      • Click on the ‘Full Extract’ button
      • Repeat process for each table that is required
  • Navigate to the ‘Schedule’ Tab
    • Click on the ‘Sync New Tables’ button

This will initiate a full extract for the new table(s); once completed, BryteFlow Ingest will automatically resume processing deltas for the new and all previously configured tables.

Resync data for Existing tables

If the table transfer type is Primary Key with History, to resync all the data from the source, perform the following steps.

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)

 

  • For Resync Data on ALL configured tables…
    • Navigate to Schedule tab
    • Click on the ‘Full Extract’ button

 

  • For Resync Data on selected tables…
    • Navigate to Data Sources tab
      • Select the table(s) by navigating into database instance name, schema name and table name(s)
      • Click on the ‘Full Extract’ button
      • Repeat process if more than one table is required
    • Navigate to the ‘Schedule’ Tab
      • Click on the ‘Sync New Tables’ button

Rollback

In the event of unexpected issues (such as intermittent source database outages or network connectivity issues) it is possible to wind back the status of BryteFlow Ingest in time and replay all of the changes. Suppose a problem occurred at, say, 16:04 hours; you can roll back BryteFlow Ingest to a point in time before these issues started occurring, say 15:00. To perform this operation:

  1. Navigate to the Schedule tab.
  2. Click on the ‘Rollback’ button.
  3. The rollback screen appears; it provides a list of all of the points in time you can roll back to
    • Dependent upon the source database log retention policy
  4. Select the required date (radio button) and click ‘Select’.
  5. Click on ‘Rollback’ to initiate the rollback.
  6. The rollback will now catch up from 15:00 to ‘now’, automatically replaying all of the log entries and applying them to the destination.

Configuration

The configuration tab provides access to the following sub-tabs:

  • Source
  • Destination Database
  • S3
  • License
  • Recovery
  • Remote Monitoring

Source

Web Port: The port on which the BryteFlow Ingest server will run.

Max Catchup Log: The number of Oracle archive logs to be processed in one catchup batch.

Minimum Interval between Catchups: The minimum number of minutes between catchup batches.

Default transfer type: The default transfer option applied when not defined at the table level.

Handle Oracle raw columns: Handle raw columns by converting them to hex strings instead of ignoring them as CHAR(1).

Destination Database

Max Updates: Combine updates that exceed this value.

Loading threads:  Number of Redshift loading threads.

Schema for all tables:  Ignore the source schema and put all tables in this schema on the destination.

Schema for staging tables:  Schema for staging tables.

Retaining staging tables:  Retain staging tables.

Source Start Date:  Column name for source date.

History End Date:  Column name for history end date

End Date Value:  End date used for history.

Ignore database name in schema:  Don’t use the DB name as part of the schema prefix for MS SQL Server.

No. of data slices:  Number of slices to split the data file into.

File compression:  Compression method, available options are as follows

  • None
  • BZIP2
  • GZIP
  • Parquet
  • ORC(snappy)
  • ORC(zlib)

 

S3

Keep S3 Files: Retain files in S3 after loading into AWS Redshift.

Use SSE:  Store in S3 using SSE (server-side encryption).

S3 Proxy Host: S3 proxy host name.

S3 Proxy Host Port:  S3 proxy port.

S3 Proxy user ID:  S3 proxy user id.

S3 Proxy Password:  S3 proxy password.

 

License

To get a valid license, go to the Configuration tab, then to the License tab, and email the “Product ID” to the Bryte support team – support@bryteflow.com

NOTE: Licensing is not applicable when sourced from the AWS Marketplace.

 

High Availability / Recovery

BryteFlow Ingest provides High Availability support: it automatically saves the current configuration and execution state to S3 and DynamoDB. As a result, an instance of BryteFlow Ingest (including its current state) can be recovered should it be catastrophically lost. Before use this must be configured; select the Configuration tab and then the Recovery sub-tab to enter the required configuration.

Recovery Configuration

  1. In the Instance Name field enter a business friendly name for the current instance of BryteFlow Ingest
  2. Check Enable Recovery
  3. Enter the destination of the recovery data in S3, for example s3://your_bucket_name/your_folder_name/Your_Ingest_name
  4. Click on the ‘Apply’ button to confirm and save the details

The recovery data is stored in DynamoDB (the AWS fully managed NoSQL database service). The recovery data for the named instance (in this example, Your_Ingest_Name) is stored in a DynamoDB table called BryteFlow.Ingest.
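
If you wish to confirm that state is being saved, the table can be inspected with the AWS CLI (an optional check; BryteFlow.Ingest is the table name quoted above):

aws dynamodb scan --table-name BryteFlow.Ingest --max-items 5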

Recovery Utilisation

To recover an instance of BryteFlow Ingest, you should source a new Instance of BryteFlow Ingest from the AWS Marketplace

  1. Use the AMI sourced from the AWS Marketplace
  2. Type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  3. Click on the ‘Existing Instance’ button

  1. Select the existing instance you wish to restore from the list displayed; in this example there is only one (‘Your_Ingest_Name’). Once the required instance has been selected, click on the ‘Select’ button

BryteFlow Ingest will collect the configuration and saved execution state of the instance selected (in this case ‘Your_Ingest_Name’) and restore accordingly.

 

NOTE: Recovery can also be a method of partial migration between environments (for example DEV to PROD stacks). As the restore will clone the exact source environment and source state, further configuration will be required (for example updating configuration options of the PROD stack EMR instance, S3 location etc). But this method can cut down on some of the workload in cases where there are hundreds of tables to be configured and you are moving to a new EC2 instance.

Remote Monitoring

BryteFlow Ingest comes pre-configured with remote monitoring capabilities. These capabilities leverage existing AWS technology such as CloudWatch Logs/Events. CloudWatch can be used (in conjunction with other assets in the AWS ecosystem) to monitor the execution of BryteFlow Ingest and in the event of errors/failures raise the appropriate alarms.

In addition to the integration with CloudWatch, BryteFlow Ingest also writes its internal logs directly to S3 (BryteFlow Ingest console execution and error logs).

 

To configure remote monitoring perform the following steps:

  1. Enter an Instance Name, this being a business friendly name for the current instance of BryteFlow Ingest.
  2. Check Enable S3 Logging if you want to record data to S3 (console/execution logs).
  3. Enter the destination of the logging data in S3, for example s3://your_bucket_name/your_folder_name
  4. Enter the name of the CloudWatch Log Group (this needs to be created first in the AWS console)
  5. Enter the name of the CloudWatch Log Stream under the aforementioned Log Group (again this needs to be created first in the AWS console)
  6. Check Enable CloudWatch Metrics if required
  7. Check Enable SNS Notifications
  8. Enter the Topic ARN in the SNS Topic input box
  9. Click apply to save the changes

The events that BryteFlow Ingest pushes to the AWS CloudWatch Logs console are as follows; please refer to Appendix: Bryte Events for AWS CloudWatch Logs and SNS for a more detailed breakdown.

Bryte Events | Description
LogfileProcessed | Archive log file processed (Oracle only)
TableExtracted | Source table extract complete, MS SQL Server and Oracle (initial extracts only)
ExtractCompleted | Source extraction batch is complete
TableLoaded | Destination table load is complete
LoadCompleted | All destination table loads in a batch are complete
HaltError | An unrecoverable error occurred and turned the Scheduler to OFF
RetryError | An error occurred but will be retried
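
Since each event carries a type attribute (see Appendix: Bryte Events for AWS CloudWatch Logs and SNS), standard CloudWatch tooling can alarm on specific events. As an illustration, assuming the events land in the configured log group as JSON documents, a metric filter counting HaltError events could be created with the AWS CLI (the log group, metric and namespace names are placeholders):

aws logs put-metric-filter \
  --log-group-name <your-ingest-log-group> \
  --filter-name BryteFlowHaltErrors \
  --filter-pattern '{ $.type = "HaltError" }' \
  --metric-transformations metricName=HaltErrors,metricNamespace=BryteFlowIngest,metricValue=1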

 

Log

You can monitor the progress of your extracts by navigating to the Log tab.

BryteFlow Ingest stores the log files under your install folder, specifically under the \log folder.
The path to the log files is as follows: <install folder of Ingest>\log\sirus*.log, for example

c:\Bryte\Bryte_Ingest_37\log\sirus-2019-01.log

The error files are also stored under the \log folder.
The path to the error files is as follows: <install folder of Ingest>\log\error*.log, for example

c:\Bryte\Bryte_Ingest_37\log\error-2019-01.log

These logs can also be reviewed/stored in S3, please refer to the following section on Remote Monitoring for details.

Appendix: Understanding Extraction Process

Extraction Process

Understanding Extraction.

Extraction has two parts to it.

  1. Initial Extract.
  2. Delta Extract.

Initial Extract.

An initial extract is done the first time a database is connected to the BryteFlow Ingest software. In this extract, the entire table is replicated from the source database to the destination (AWS S3 or AWS Redshift).

A typical extraction goes through the following stages. The example below shows an extraction with MS SQL Server as the source and an Amazon S3 bucket as the destination.

Extracting 1
Full Extract database_name:table_name
Info(ME188): Stage pre-bcp
Info(ME190): Stage post-bcp
Info(ME260): Stage post-process
Extracted 1
Full Extract database_name:table_name complete (4 records)
Load file 1
Loading table emr_database:dbo.names with 4 records(220 bytes)
Transferring null to S3
Transferred null 10,890 bytes in 8s to S3
Transferring database_name_table_name to S3

Delta Extract.

After the initial extract, when the database has been replicated to the destination, delta extracts take over. In a delta extract, only the changes on the source database are extracted and merged with the destination.

After the initial extraction is done, all further extracts are Delta Extracts (changes since the last extract).

A typical delta extract log is shown below.

Extracting 2
Delta Extract database_name:table_name
Info(ME188): Stage pre-bcp
Info(ME190): Stage post-bcp
Info(ME260): Stage post-process
Delta Extract database_name complete (10 records)
Extracted 2
Load file 2
Loaded file 2

First Extract

Extracting a database for the first time.

Keep all defaults. Click on Full Extract.

The first extract always has to be a Full Extract. This gets the entire table across; the deltas are then populated periodically at the desired frequency.

Schedule Extract

 

To configure extracts to run at a specific time perform the following steps.

  1. In the case of Oracle, ‘Automatic’ is preselected and other options are disabled by default.
  2. For MS SQL Server you can choose the period in minutes.
  3. A daily extraction can be done at a specific time of the day by choosing hour and minutes in the drop-down.
  4. Extraction can also be scheduled on specific days of the week at a fixed time by checking the checkboxes next to the days and selecting hours and minutes in the drop-down.
  5. Click Apply to save the schedule.

Add a new table to existing Extracts

Once databases have been selected for extraction and are replicating, a new table can be added to the extraction process with the following steps.

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)
  • Navigate to Data tab
    • Select the new table(s) by navigating into database instance name, schema name and table name(s)
    • Configure the table, considering the following
      • Select transfer type
      • Select partitioning folder (refer to Partitioning section for details)
      • Select primary key column(s) where applicable
      • Select columns to be masked (optional, these are excluded from extraction, for example salary data)
      • Click on the ‘Apply’ button
      • Click on the ‘Full Extract’ button
      • Repeat process if more than one table is required
  • Navigate to the Schedule Tab
    • Click on ‘Sync New Tables’ button

This will include the new table(s) in a full extract and also resume deltas for all the previously configured tables and the newly added table(s).

Resync data for Existing tables

If the table transfer type is Primary Key with History, to resync all the data from the source, perform the following steps.

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)
  • For Resync Data on ALL configured tables…
    • Navigate to Schedule tab
    • Click on the ‘Full Extract’ button
  • For Resync Data on selected tables..
    • Navigate to Data Sources tab
      • Select the table(s) by navigating into database instance name, schema name and table name(s)
      • Click on the ‘Full Extract’ button
      • Repeat process if more than one table is required
    • Navigate to the Schedule Tab
      • Click on ‘Sync New Tables’ button

 

Appendix: Bryte Events for AWS CloudWatch Logs and SNS

BryteFlow Ingest supports connection to AWS CloudWatch Logs, CloudWatch Metrics and SNS. This can be used to monitor the operation of BryteFlow Ingest and integrate with other assets leveraging the AWS infrastructure.

AWS CloudWatch Logs can be used to send logs of events like load completion or failure from BryteFlow Ingest. CloudWatch Logs can be used to monitor error conditions and raise alarms.

Below is the list of events that BryteFlow Ingest pushes to the AWS CloudWatch Logs console and to AWS SNS:

Bryte Events | Description
LogfileProcessed | Archive log file processed (Oracle only)
TableExtracted | Source table extract complete, MS SQL Server and Oracle (initial extracts only)
ExtractCompleted | Source extraction batch is complete
TableLoaded | Destination table load is complete
LoadCompleted | All destination table loads in a batch are complete
HaltError | An unrecoverable error occurred and turned the Scheduler to OFF
RetryError | An error occurred but will be retried

Below are the details for each of the Bryte Events:

Event: LogfileProcessed

Attribute | Is Metric (Y/N)? | Description
type | N | “LogfileProcessed”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
fileSeq | N | File sequence
file | N | File name
dictLoadMS | Y | Time taken to load dictionary in ms
CurrentDBDate | N | Current database date
CurrentServerDate | N | Current Bryte server date
parseMS | Y | Time taken to parse file in ms
parseComplete | N | Timestamp when parsing is complete
sourceDate | N | Source date

Event: TableExtracted

Attribute | Is Metric (Y/N)? | Description
type | N | “TableExtracted”
subType | N | Table name
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
tabName | N | Table name
success | N | true/false
message | N | Status message
sourceTS | N | Source date time
sourceInserts | Y | No. of Inserts in source
sourceUpdates | Y | No. of Updates in source
sourceDeletes | Y | No. of Deletes in source

Event: ExtractCompleted

Attribute | Is Metric (Y/N)? | Description
type | N | “ExtractCompleted”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
jobType | N | “EXTRACT”
jobSubType | N | Extract type
success | N | Y/N
message | N | Status message
runId | N | Run Id
sourceDate | N | Source date
dbDate | N | Current database date
fromSeq | N | Start file sequence
toSeq | N | End file sequence
extractId | N | Run id for extract
tableErrors | Y | Count of table errors
tableTotals | Y | Count of total tables

Event: TableLoaded

Attribute | Is Metric (Y/N)? | Description
type | N | “TableLoaded”
subType | N | Table name
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
tabName | N | Table name
success | N | true/false
message | N | Status message
sourceTS | N | Source date time
sourceInserts | Y | No. of Inserts in source
sourceUpdates | Y | No. of Updates in source
sourceDeletes | Y | No. of Deletes in source
destInserts | Y | No. of Inserts in destination
destUpdates | Y | No. of Updates in destination
destDeletes | Y | No. of Deletes in destination

Event: LoadCompleted

Attribute | Is Metric (Y/N)? | Description
type | N | “LoadCompleted”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
jobType | N | “LOAD”
jobSubType | N | Sub type of the “LOAD”
success | N | Y/N
message | N | Status message
runId | N | Run Id
sourceDate | N | Source date
dbDate | N | Current database date
fromSeq | N | Start file sequence
toSeq | N | End file sequence
extractId | N | Run id for extract
tableErrors | Y | Count of table errors
tableTotals | Y | Count of total tables

Event: HaltError

Attribute | Is Metric (Y/N)? | Description
type | N | “HaltError”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
message | N | Error message
errorId | N | Short identifier

Event: RetryError

Attribute | Is Metric (Y/N)? | Description
type | N | “RetryError”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
message | N | Error message
errorId | N | Short identifier

Appendix: Release Notes

Release details (by date descending, latest version first)

BryteFlow Ingest 3.7.3

Release Notes BryteFlow Ingest – v3.7.3

Released April 2019

  • New Features 
    • Notifications: A notification icon appears on the top bar. The number of issues appears as a bubble. The bubble is red if there is at least one error and orange for warnings. Hovering gives the count of errors and warnings in a tooltip. Clicking on the icon lists the issues in a dropdown list.
    • Help: Clicking on the help icon takes you to the online documentation.  
    • Drivers: The supported source and destination drivers have been streamlined.
    • EMR tags: EMR instance can be specified by Cluster ID or a tag value for the tag BryteFlowIngest or a tag and value expressed as “tag=value”. 
    • Port: Changing the web port is allowed for non-AMI installations.  

 

  • Bug Fixes
    • An issue where an error was raised about a missing initial extract even when ‘skip initial extract’ was selected for a table is now fixed.
    • Some attribute values in CloudWatch Logs which were previously blank have now been fixed.
    • An issue where all pending jobs were cancelled on a failure in the current job is now fixed.
    • A redundant field shown when “S3 Files using EMR” is selected as a destination has been removed.
    • The Apply button is disabled if no changes have been made in the Source/Destination/File screens. This gets around the problem of the connection being flagged with a warning on pressing Apply even if no fields have been changed.
    • The initial extract in Oracle now sets the effective date to the database date instead of the server date.

 

  • Known Issues
    •  Non-AMI EC2 may show some warning messages on startup.  

BryteFlow Ingest 3.7

Released:  January 2019

  • BryteFlow Ingest 3.7 available on AWS Marketplace as an AMI
    • Pay as you go on AWS Marketplace
    • Hourly/Annual billing options on AWS Marketplace
    • No licence keys on AWS Marketplace
    • 5 day trial on AMI (new customers only)
  • Volume based licensing
    • 100GB
    • 300GB
    • 1TB
    • > 1TB (contact Bryte Support)
  • High Availability
    • Automatic backup of current state
    • Automatic cut-over and recovery following EC2 failure
    • IAM support
  • Rollback to previous saved point in time
    • Dependent upon source db logs
  • Partitioning
    • Built in to Interface
    • Partition names configurable, wide range of formats
    • AWS Athena friendly partition names
  • New S3 compression options
    • Parquet
    • ORC(Snappy)
    •  ORC(Zlib)
  • Remote Monitoring (integrated with AWS Services)
    • Cloudwatch
    • Logging
    • Metrics
    • SNS

Optimize usage of AWS resources / Save Cost

EMR Tagging

BryteFlow Ingest supports an EMR tagging feature which can dramatically cut the cost of EMR clusters. It helps customers control EMR cost by terminating the cluster when not in use, without interrupting the Ingest configuration and schedule.

You can add the default tag ‘BryteflowIngest’ when creating a new Amazon EMR cluster for Ingest, or you can add, edit, or remove tags on a running Amazon EMR cluster. Then use the tag name and value in the EMR Configuration section of Ingest.

 

Upgrade software versions from AWS Marketplace

Users already running the BryteFlow AMI Standard Edition can easily upgrade to the latest version of the software directly from AWS Marketplace by following a few easy steps.

Steps to perform in your current install:

  • As you are planning to upgrade, make sure you have all of your setup backed up.
  • To save your current instance setup and stats go to ‘Configuration’ -> ‘Recovery’
  • In the Instance Name field enter a business friendly name for the current instance of BryteFlow Ingest
  • Check Enable Recovery
  • Enter the destination of the recovery data in S3, for example s3://your_bucket_name/your_folder_name/Your_Ingest_name
  • Click on the ‘Apply’ button to confirm and save the details
  • Once recovery is set up, turn the Schedule to ‘OFF’ in the current version and let it come to a complete pause.
  • Go to the product URL on AWS Marketplace: https://aws.amazon.com/marketplace/pp/B01MRLEJTK
  • In the product configuration settings, choose the latest available version from the ‘software version’ dropdown.
  • Then ‘Continue to Launch’ your new instance.
  • Choose Action ‘Launch from Website’
  • Select your EC2 instance type based on your data volume; recommendations are available on the product detail page
  • Choose your VPC from the dropdown or go by the default
  • Choose the ‘Subnet Settings’
  • Update the ‘Security Group Settings’ or go by ‘default’
  • Provide the Key Pair Settings by choosing an EC2 key pair of your own or selecting ‘Create a key pair in EC2’
  • Click ‘Launch’ to launch the EC2 instance.
  • The endpoint will be an EC2 instance running BryteFlow Ingest (as a Windows service) on port 8081

 

Steps to perform in your new install:

  • Connect to the new instance using ‘Remote Desktop Connection’ to the EC2 instance launched via the AMI.
  • Once connected to the EC2 instance, launch BryteFlow Ingest from the Google Chrome browser using the bookmark ‘BryteFlow Ingest’
  • Or type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  • This will bring up a page requesting either a ‘New Instance’ or an ‘Existing Instance’
    • Click on the ‘Existing Instance’ button, as we need to resume BryteFlow Ingest from the last saved state
    • Select the existing instance you wish to restore from the list displayed; in this example there is only one (‘Your_Ingest_Name’). Once the required instance has been selected, click on the ‘Select’ button

  • BryteFlow Ingest will collect the configuration and saved execution state of the instance selected (in this case ‘Your_Ingest_Name’) and restore accordingly.
  • Go to the ‘Connections’ tab, and test the ‘Source’, ‘Destination’ and ‘File System’ connections prior to turning the Schedule on.
  • In case of any connection issues, please check the firewall settings of the EC2 instance and the source systems.
  • Once all connections are ‘Tested OK’, go to the ‘Schedule’ tab and turn the schedule to ‘ON’.
  • This completes the upgrade and resumes ingestion as per the specified schedule.

 
