User Guide
  1. BryteFlow Ingest - Real-time data integration for S3, Redshift
    1. Supported Database Sources
    2. Supported Destinations
  2. BryteFlow Ingest Architecture
  3. Prerequisite
  4. Additional AWS services
  5. Launch BryteFlow Enterprise Edition from AWS Marketplace
  6. Launch BryteFlow Ingest from AWS Marketplace : Standard Edition
  7. Environment Preparation
    1. Recommended Network ACL Rules for EC2
    2. AWS Identity and Access Management (IAM) for BryteFlow
    3. Managing Access Keys
    4. Data Encryption
    5. Testing The Connections
    6. Preparing MS SQL Server
    7. Preparing On-premise Oracle
    8. Preparing Oracle on Amazon RDS
    9. Preparing On-premise MySQL
    10. Preparing MySQL on Amazon RDS
    11. Data Types in MS SQL Server
    12. Data Types in Oracle
  8. Security Permissions Required on Source
    1. Security for MS SQL Server
    2. Security for Oracle
    3. Security for MySQL
  9. Verification of Source
    1. Verification of MS SQL Server source
    2. Verification of Oracle source
  10. Creating Amazon Services
    1. Creating An EC2 System
    2. Creating S3 Bucket
    3. Configuring EMR Cluster
  11. Starting & Stopping BryteFlow Ingest
  12. Configuration of BryteFlow Ingest
  13. Dashboard
  14. Connections
    1. Source
      1. MS SQL Server and SAP (MS SQL Server)
      2. Oracle and SAP (Oracle)
    2. Destination Database
    3. Destination File System
    4. Email Notification
  15. Data
    1. Partitioning
  16. Schedule
    1. Rollback
  17. Configuration
    1. Source
    2. Destination Database
    3. S3
    4. License
    5. High Availability / Recovery
      1. Recovery Configuration
      2. Recovery Utilisation
      3. Recovery from Faults
      4. Time to Recover
      5. Recovery Testing
    6. Recommended Risk Audit mechanisms
    7. Remote Monitoring
  18. Log
  19. Optimize usage of AWS resources / Save Cost
    1. Tagging AWS Resources
  20. Upgrade BryteFlow versions from AWS Marketplace when using AMI
  21. BryteFlow: Licencing Model
  22. BryteFlow Ingest : Pricing
  23. BryteFlow Support Information
  24. Appendix: Understanding Extraction Process
    1. Extraction Process
    2. First Extract
    3. Schedule Extract
    4. Add a new table to existing Extracts
    5. Resync data for Existing tables
  25. Appendix: Bryte Events for AWS CloudWatch Logs and SNS
  26. Appendix: Release Notes
    1. BryteFlow Ingest 3.8
    2. BryteFlow Ingest 3.7.3
    3. BryteFlow Ingest 3.7

BryteFlow Ingest - Real-time data integration for S3, Redshift

BryteFlow Ingest is real-time data replication software for S3 and Redshift. It is high-performance software that facilitates real-time change data capture from sources with zero load on the source systems. BryteFlow Ingest captures the changes and transfers them to the target system. It automates the creation of either an exact copy or a time-series copy of the data source in the target. BryteFlow Ingest performs an initial full load from the source and then incrementally merges changes to the destination of choice; the entire process is fully automated.

It works with the companion software BryteFlow Blend for real-time data extraction and data preparation.

Supported Database Sources

BryteFlow Ingest supports the following database sources using the Amazon Machine Image (AMI) from AWS Marketplace https://aws.amazon.com/marketplace/pp/B01MRLEJTK
  • MS SQL Server
  • Oracle
  • SAP (MS SQL Server)
  • SAP (Oracle)
  • API integration with files
  • Any Files
BryteFlow Ingest also supports the following database sources; if you wish to source from these, please contact Bryte directly at info@bryteflow.com.
  • MySQL
  • Salesforce
  • Netsuite

Supported Destinations

The supported destinations are as follows:

  • S3
  • Redshift
  • Snowflake

Looking for a different destination?

BryteFlow builds custom sources/destinations on customer request; please contact us directly at info@bryteflow.com.

BryteFlow Ingest Architecture

BryteFlow Ingest can replicate data from any database, any API and any flat file to Amazon S3, Redshift or Snowflake, through a simple point-and-click interface. It is an entirely self-service and automated data replication tool.

BryteFlow offers the following deployment strategies to its customers:

  • Standard deployment on AWS Environment
  • High Availability deployment in an AWS Environment
  • Hybrid deployment – using on-premise and cloud infrastructure

BryteFlow Ingest uses log-based Change Data Capture for data replication. Below is the technical architecture diagram for a standard setup in an AWS environment.

The above architecture diagram describes a Standard deployment and shows the following:

  • AWS services running alongside BryteFlow Ingest
  • The BryteFlow architecture recommended for a VPC in AWS
  • Data flow between source, AWS and destination, with the security and monitoring features used
  • Security, including IAM, in a separate group interfaced with BryteFlow Ingest
  • All supported destinations and AWS services with which BryteFlow integrates

High Availability Architecture

The high availability architecture shows how BryteFlow is deployed in a multi-AZ setup. In case of instance or AZ failures, it can be auto-scaled into another AZ without incurring any data loss.

Hybrid Architecture

BryteFlow also offers a hybrid deployment model to its customers, which is a mix of services on-premise and in the AWS Cloud. BryteFlow Ingest can easily be set up on a Windows server in an on-premise environment, while all the destination endpoints reside in the AWS Cloud, making it a hybrid model. It is recommended to use secure connectivity between on-premise and AWS services, which can be achieved using a VPN connection or AWS Direct Connect; refer to the blog on choices for hybrid cloud connectivity.

Prerequisite

Prerequisites of using Amazon Machine Image (AMI) from AWS Marketplace

Using the AMI sourced from the AWS Marketplace requires:

  • Selection of BryteFlow Ingest volume
  • Selection of EC2 instance type
  • Ensure connectivity between the server/EC2 hosting the BryteFlow Ingest software and
    • the source
    • Amazon S3
    • Amazon EMR
    • Amazon Redshift (if needed as a destination)
    • Snowflake (if needed as a destination)
    • DynamoDB (if high availability option is required)

Follow the steps below prior to launching BryteFlow in AWS via the AMI or a custom install on an EC2 instance (a scripted sketch follows the list):

  1. Create a policy with a relevant name for EC2, e.g. “BryteFlowEc2Policy”. Refer to the AWS guide on creating policies.
  2. Use the policy JSON provided in the section “AWS Identity and Access Management (IAM) for BryteFlow” below.
  3. Create an IAM role “BryteFlowEc2Role”. Refer to the AWS guide for step-by-step instructions on creating roles.
  4. Attach the policy “BryteFlowEc2Policy” to the role.
  5. Similarly, create a Lambda policy, which is required for disk checks, and attach the Lambda policy JSON provided in the section “Recovery from Faults” below.
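For those who prefer to script steps 1–4, below is a minimal boto3 sketch. It assumes admin credentials are configured and that the policy JSON from the IAM section has been saved locally as bryteflow_ec2_policy.json (a hypothetical file name).

import json
import boto3

iam = boto3.client("iam")

# Create the EC2 policy from the JSON in the IAM section of this guide
with open("bryteflow_ec2_policy.json") as f:  # hypothetical local copy of the policy JSON
    policy = iam.create_policy(
        PolicyName="BryteFlowEc2Policy",
        PolicyDocument=f.read(),
    )

# Trust policy allowing EC2 to assume the role
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(
    RoleName="BryteFlowEc2Role",
    AssumeRolePolicyDocument=json.dumps(trust),
)

# Attach the policy to the role
iam.attach_role_policy(
    RoleName="BryteFlowEc2Role",
    PolicyArn=policy["Policy"]["Arn"],
)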

The options available with the AMI are volume based; recommended EC2 and EMR options for each volume are listed below.

 

Total Data Volume   EC2 Recommended   EMR Recommended
< 100 GB            t2.small          1 x m4.xlarge master node, 2 x c5.xlarge task nodes
100 GB – 300 GB     t2.medium         1 x m4.xlarge master node, 2 x c5.xlarge task nodes
300 GB – 1 TB       t2.large          1 x m4.xlarge master node, 2 x c5.xlarge task nodes
> 1 TB              Seek expert advice from support@bryteflow.com

 

NOTE: Evaluate the EMR configuration depending on the latency required.

These should be considered a starting point; if you have any questions, please seek expert advice from support@bryteflow.com

 

System Requirement when not using Amazon Machine Image (AMI)

  • Port 8081 should be open on the server hosting the BryteFlow Ingest software
  • Google Chrome browser is required as the internet browser on the server hosting BryteFlow Ingest software
  • Java version 8 or higher is required
  • If using MS SQL Server as a source, please download and install the BCP utility
  • Ensure connectivity between the server hosting the BryteFlow Ingest software and the source, Amazon S3, Amazon EMR, Amazon Redshift and DynamoDB (if high availability option is required)

Additional AWS services

BryteFlow uses several AWS resources to fulfil user requirements; the cost of these services is separate from the BryteFlow charges and is billed by AWS to your account. If you are using Snowflake as a destination, the cost of the Snowflake data warehouse is also separate from BryteFlow.

The list below shows the other billable services used with BryteFlow:

Service                                 Mandatory                                                  Billing Type
AWS EC2                                 Y                                                          Pay-as-you-go
Additional EBS storage attached to EC2  Y                                                          Based on size
AWS S3                                  Y                                                          Pay-as-you-go
AWS EMR                                 Y (except direct load options to Redshift and Snowflake)   Pay-as-you-go
AWS Redshift                            N                                                          Pay-as-you-go
AWS CloudWatch Logs and metrics         N                                                          Pay-as-you-go
AWS SNS                                 N                                                          Pay-as-you-go
AWS DynamoDB (5 WCUs / 5 RCUs)          N                                                          Pay-as-you-go
Snowflake DW                            N                                                          Pay-as-you-go
AWS Lambda                              Y                                                          Pay-as-you-go
AWS KMS                                 Y                                                          Pay-as-you-go

BryteFlow recommends the following instance types for EC2 with EBS volumes attached:

EC2 Instance Type  BryteFlow Standard Edition    BryteFlow Enterprise Edition  Recommended EBS Volume  EBS Volume Type
t2.small           Volume < 100 GB               NA                            50 GB                   General Purpose SSD (gp2)
t2.medium          Volume > 100 GB and < 300 GB  Volume < 100 GB               100 GB                  General Purpose SSD (gp2)
t2.large           Volume > 300 GB and < 1 TB    Volume > 100 GB and < 300 GB  500 GB                  General Purpose SSD (gp2)
m4.large           NA                            Volume > 300 GB and < 1 TB    500 GB                  General Purpose SSD (gp2)

Launch BryteFlow Enterprise Edition from AWS Marketplace

Steps to launch BryteFlow from AWS Marketplace: Enterprise Edition

  • Please refer to the ‘Environment Preparation’ section before proceeding to launch BryteFlow from an AMI.
  • Go to the product URL https://aws.amazon.com/marketplace/pp/B079PWMJ4B
  • Click ‘Continue to Subscribe’
  • Click ‘Continue to Configuration’. This brings up the default ‘Fulfillment Option’ with the latest software version.
  • Choose the AWS Region you would like, or keep the default AWS Region already selected in the drop-down.

 

  • Click ‘Continue to Launch’
  • Choose Action ‘Launch from Website’
  • Select your EC2 instance type based on your data volume, recommendations available in the product detail page
  • Choose your VPC from the dropdown or go by the default
  • Please select a ‘Private Subnet’ under ‘Subnet Settings’. If none exists, it is recommended to create one; please follow the detailed AWS User Guide for Creating a Subnet.
  • Update the ‘Security Group Settings’ or create one based on the BryteFlow recommended steps below:
    • Assign a name for the security group, e.g. BryteFlowIngest
    • Enter a description of your choice
    • Add inbound rule(s) allowing RDP to the EC2 instance from your custom IP address
    • Add outbound rule(s) to allow the EC2 instance to access the source database. DB ports vary by source database; please add rules allowing the instance access to the specific source database ports.
    • For more details, refer to the BryteFlow recommendation on Network ACLs for your VPC in the section ‘Recommended Network ACL Rules’ below
  • Provide the Key Pair Settings by choosing an EC2 key pair of your own or ‘Create a key pair in EC2’
  • Click ‘Launch’ to launch the EC2 instance.
  • The endpoint will be an EC2 instance running BryteFlow Ingest (as a Windows service) on port 8081

Additional information regarding launching an EC2 instance can be found here

If you have any trouble launching or connecting to the EC2 instance, please refer to the troubleshooting guides below:

 

** Please note that BryteFlow Blend is a companion product to BryteFlow Ingest. To make the most of the enterprise capabilities, first set up BryteFlow Ingest completely. Thereafter, no configuration is required in BryteFlow Blend; it is all ready to go. Start with the transformations directly off AWS S3.

Once connected to the EC2 instance:

  • Launch BryteFlow Ingest from the Google Chrome browser using the bookmark ‘BryteFlow Ingest’
  • Or type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  • This will bring up a page requesting either a ‘New Instance’ or an ‘Existing Instance’
    • Click on the ‘New Instance’ button and set up your environment (refer to the section regarding Configuration of BryteFlow Ingest in this document for further details)
    • ‘Existing Instance’ should only be clicked when recovering an instance of BryteFlow Ingest (refer to the Recovery section of this document for further details)
    • Once Ingest is fully set up and replicating to the desired destination successfully:
    • Launch BryteFlow Blend from the Google Chrome browser using the bookmark ‘BryteFlow Blend’
    • Or type localhost:8082 into the Google Chrome browser to open the BryteFlow Blend web console
    • BryteFlow Blend is tied to BryteFlow Ingest and no AWS Location configuration is required.
    • This makes users ready to start their data transformations off S3.
    • For details on Blend setup and usage refer to the BryteFlow Blend User Guide: http://docs.bryteflow.com/Bryteflow-Blend-User-Guide/

 

 

Launch BryteFlow Ingest from AWS Marketplace : Standard Edition

Steps to launch BryteFlow Ingest from AWS Marketplace: Standard Edition

  • Please refer to the ‘Environment Preparation’ section before proceeding to launch BryteFlow from an AMI.
  • Go to the product URL https://aws.amazon.com/marketplace/pp/B01MRLEJTK
  • Click ‘Continue to Subscribe’
  • Click ‘Continue to Configuration’. This brings up the default ‘Fulfillment Option’ with the latest software version.
  • Choose the AWS Region you would like, or keep the default AWS Region already selected in the drop-down.

  • Click ‘Continue to Launch’
  • Choose Action ‘Launch from Website’
  • Select your EC2 instance type based on your data volume, recommendations available in the product detail page
  • Choose your VPC from the dropdown
  • Please select a ‘Private Subnet’ under ‘Subnet Settings’. If none exists, it is recommended to create one; please follow the detailed AWS User Guide for Creating a Subnet.
  • Update the ‘Security Group Settings’ or create one based on the BryteFlow recommended steps below:
    • Assign a name for the security group, e.g. BryteFlowIngest
    • Enter a description of your choice
    • Add inbound rule(s) allowing RDP to the EC2 instance from your custom IP address
    • Add outbound rule(s) to allow the EC2 instance to access the source database. DB ports vary by source database; please add rules allowing the instance access to the specific source database ports.
    • For more details, refer to the BryteFlow recommendation on Network ACLs for your VPC in the section ‘Recommended Network ACL Rules’ below
  • Provide the Key Pair Settings by choosing an EC2 key pair of your own or ‘Create a key pair in EC2’
  • Click ‘Launch’ to launch the EC2 instance.
  • The endpoint will be an EC2 instance running BryteFlow Ingest (as a Windows service) on port 8081

Additional information regarding launching an EC2 instance can be found here

If you have any trouble launching or connecting to the EC2 instance, please refer to the troubleshooting guides below:

Once connected to the EC2 instance:

  • Launch BryteFlow Ingest from the Google Chrome browser using the bookmark ‘BryteFlow Ingest’
  • Or type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  • This will bring up a page requesting either a ‘New Instance’ or an ‘Existing Instance’
    • Click on the ‘New Instance’ button (refer to the section regarding Configuration Of BryteFlow Ingest in this document for further details)
    • ‘Existing Instance’ should only be clicked when recovering an instance of BryteFlow Ingest (refer to the Recovery section of this document for further details)

Environment Preparation

Below is a guide to preparing an environment for BryteFlow in AWS:

  1. Creating a Subnet in Your VPC: Since BryteFlow Ingest needs to be set up in the customer’s VPC, it is recommended to create a new subnet within the VPC for BryteFlow. Please follow the detailed AWS User Guide for Creating a Subnet.
  2. Creating a Security Group: A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. Security groups act at the instance level, not the subnet level. Therefore, each instance in a subnet in your VPC can be assigned to a different set of security groups. If you don’t specify a particular group at launch time, the instance is automatically assigned to the default security group for the VPC, which is highly not recommended. For each security group, you add rules that control the inbound traffic to instances, and a separate set of rules that control the outbound traffic. This section describes the basic things you need to know about security groups for your VPC and their rules. For more details, refer to the AWS guide for Security Groups.
  3. Security Group Rules: You can add or remove rules for a security group, authorizing or revoking inbound or outbound access. A rule applies either to inbound traffic (ingress) or outbound traffic (egress). You can grant access to a specific CIDR range, or to another security group in your VPC or in a peer VPC (requires a VPC peering connection).

  4. Creating an Access Key ID and Secret Access Key: BryteFlow uses an access key id and secret access key to connect to AWS services from an on-premise setup. It is recommended to have a separate set of access keys for the BryteFlow user account. Please follow the steps below from an admin user account to create access keys (a scripted sketch follows the list):
    1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
    2. In the navigation pane, choose Users.
    3. Choose the name of the ‘BryteFlow’ user whose access keys you want to manage, and then choose the Security credentials tab.
    4. In the Access keys section, choose Create access key. Then choose Download .csv file to save the access key ID and secret access key to a CSV file on your computer. Store the file in a secure location; you will not have access to the secret access key again after the dialog box closes. After you download the CSV file, choose Close. When you create an access key, the key pair is active by default and you can use it right away.
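The same step can be scripted. Below is a minimal boto3 sketch, assuming the IAM user is named ‘BryteFlow’; as in the console flow, the secret is returned only once and must be stored securely straight away.

import boto3

iam = boto3.client("iam")
resp = iam.create_access_key(UserName="BryteFlow")  # assumes the user "BryteFlow" exists
key = resp["AccessKey"]
print(key["AccessKeyId"])        # the access key id
secret = key["SecretAccessKey"]  # store securely; it cannot be retrieved again later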

For more information on secret keys refer to AWS documentation here.

For security reasons, when using access keys for an on-premise installation it is recommended to rotate the keys after a certain time, say every 90 days. More details are given in the section ‘Managing Access Keys’.

5. Creating an Auto Scaling Group: When BryteFlow Ingest needs to be deployed in an HA environment, it is recommended to place your EC2 instance in an Auto Scaling group (a sketch follows). Please follow the steps here to launch one via the AWS console.
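For reference, a minimal boto3 sketch of such an Auto Scaling group; the launch template id and subnet ids are placeholders, and a min/max of 1 keeps a single self-healing Ingest instance.

import boto3

asg = boto3.client("autoscaling")
asg.create_auto_scaling_group(
    AutoScalingGroupName="bryteflow-ingest-ha",
    LaunchTemplate={"LaunchTemplateId": "lt-xxxxxxxxxxxxxxxxx"},  # placeholder
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,  # one active instance, replaced automatically on failure
    VPCZoneIdentifier="subnet-aaaaaaaa,subnet-bbbbbbbb",  # subnets across two AZs
)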

AWS Identity and Access Management (IAM) for BryteFlow

You can use IAM roles to delegate access to your AWS resources. With IAM roles, you can establish trust relationships between your trusting account and other AWS trusted accounts. The trusting account owns the resource to be accessed and the trusted account contains the users who need access to the resource.

BryteFlow’s Recommendations: 

  • Create Individual User for BryteFlow – DO NOT use Root User account
  • Use Groups to Assign Permissions to BryteFlow IAM User

    Instead of defining permissions for individual Bryte IAM users, it’s usually more convenient to create groups that relate to job functions (administrators, developers, accounting, etc.). Next, define the relevant permissions for each group. Finally, assign IAM users to those groups. All the users in an IAM group inherit the permissions assigned to the group. That way, you can make changes for everyone in a group in just one place.

  • Grant Least Privilege – It is recommended to grant only the minimal required permissions to the IAM user/groups. The BryteFlow user requires basic permissions on S3, CloudWatch, DynamoDB and Redshift (if needed).
  • BryteFlow needs access to S3, EC2, EMR, SNS and Redshift (if needed as a destination) with the minimum privileges listed below.
    • Sample policy that is required by BryteFlow is shared below:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "1",
            "Action": [
              "s3:DeleteObject",
              "s3:GetObject",
              "s3:ListBucket",
              "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::<bucket_name>",
              "arn:aws:s3:::<bucket_name>/*"
            ]
          },
          {
            "Sid": "2",
            "Action": [
              "ec2:AcceptVpcEndpointConnections",
              "ec2:AcceptVpcPeeringConnection",
              "ec2:AssociateIamInstanceProfile",
              "ec2:CreateTags",
              "ec2:DescribeTags",
              "ec2:RebootInstances"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:<region>:<account_id>:instance/<ec2_instance_id>"
          },
          {
            "Sid": "3",
            "Action": [
              "elasticmapreduce:AddJobFlowSteps",
              "elasticmapreduce:DescribeStep",
              "elasticmapreduce:ListSteps",
              "elasticmapreduce:RunJobFlow"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:elasticmapreduce:<region>:<account_id>:<resourceType>/<resourceId>"
          },
          {
            "Sid": "4",
            "Action": [
              "sns:Publish"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sns:<region>:<account_id>:<sns_name>"
          },
          {
            "Sid": "5",
            "Action": [
              "redshift:ExecuteQuery",
              "redshift:FetchResults",
              "redshift:ListTables"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:redshift:<region>:<account_id>:<relative-id>"
          }
        ]
      }
  • For more details on setting up IAM roles and policies refer to AWS documentation : https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-set-up.html

Managing Access Keys

BryteFlow uses an access key and secret key to authenticate to AWS services such as S3 and Redshift. An on-premise installation requires an AWS access key id and AWS secret key for accessing the S3 service; an AMI-based installation uses IAM roles instead.

For security reasons, when using access keys for an on-premise installation it is recommended to rotate the keys after a certain time, say every 90 days. After new keys are generated they need to be updated in Ingest’s configuration (a scripted rotation sketch follows the steps). Please follow the steps below:

  1. Open the Ingest instance that needs to be updated with the new key in the web browser
  2. Go to ‘Schedule’ tab. Stop the replication schedule for BryteFlow Ingest by turning ‘OFF’ the Schedule button.
  3. Go to ‘Connections’-> ‘Destination File System’
  4. Enter the new ‘Access key’ and ‘Secret access key’ in the respective text box and hit ‘Apply’
  5. Once the keys are saved, resume replication by turning ‘ON’ the schedule.

Details of key rotation can be found in the AWS documentation: https://docs.aws.amazon.com/kms/latest/developerguide/rotate-keys.html
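A minimal boto3 sketch of the rotation flow; the old key id below is a placeholder.

import boto3

iam = boto3.client("iam")

# 1. Create the new key
new_key = iam.create_access_key(UserName="BryteFlow")["AccessKey"]

# 2. Update 'Connections' -> 'Destination File System' in Ingest with the
#    new key (steps above) and verify that replication resumes successfully.

# 3. Deactivate, then delete, the key being retired
iam.update_access_key(UserName="BryteFlow",
                      AccessKeyId="AKIAOLDKEYID",  # placeholder for the old key id
                      Status="Inactive")
iam.delete_access_key(UserName="BryteFlow", AccessKeyId="AKIAOLDKEYID")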

When deploying via the AMI on an EC2 instance, IAM roles are used by default. The IAM roles should have the recommended policies attached; please refer to the section ‘AWS Identity and Access Management (IAM) for BryteFlow’ for the list of policies and permissions.

 

Data Encryption

BryteFlow provides several mechanisms for data security by applying encryption (an API-level sketch follows the list):

  1. With KMS, BryteFlow Ingest uses a customer-specified KMS key to encrypt customer data on AWS S3. Configure the customer KMS id in BryteFlow, and it will be used to encrypt data on S3.
  2. With AES-256, BryteFlow Ingest supports server-side encryption. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data; this is supported by BryteFlow by default.
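For illustration only, this is what the two options look like at the S3 API level; Ingest applies them internally once configured, and the bucket, key and KMS key id below are placeholders.

import boto3

s3 = boto3.client("s3")

# SSE-KMS with a customer-specified key
s3.put_object(Bucket="your-bucket", Key="delta/example.csv", Body=b"...",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="arn:aws:kms:us-west-2:111122223333:key/your-key-id")

# SSE-S3 (AES-256 managed by S3)
s3.put_object(Bucket="your-bucket", Key="delta/example.csv", Body=b"...",
              ServerSideEncryption="AES256")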

Configure Data Encryption

BryteFlow adheres to the AWS recommendation of encrypting data at rest and in transit. This is achieved by creating the keys and certificates used for encryption.

For more information, refer to AWS documentation on Providing Keys for Encrypting Data at Rest with Amazon EMR and Providing Certificates for Encrypting Data in Transit with Amazon EMR Encryption.

Specifying Encryption Options Using the AWS Console

Choose options under Encryption according to the following guidelines :

  • Choose options under At rest encryption to encrypt data stored within the file system.

Under S3 data encryption, for Encryption mode, choose a value to determine how Amazon EMR encrypts Amazon S3 data with EMRFS. BryteFlow Ingest supports the following encryption mechanisms:

  • SSE-S3
  • SSE-KMS or CSE-KMS

Encryption in-transit

BryteFlow uses SSL to establish any connection (AWS services, databases, etc.) for data flow, ensuring secure communication in transit.

SSL involves the complexity of managing security certificates, and it is important to keep the certificates active at all times for uninterrupted service.

AWS Certificate Manager handles the complexity of creating and managing public SSL/TLS certificates. Customers can configure notifications before the expiry date approaches and renew in advance, so that the services run uninterrupted. Refer to the AWS guide to managing ACM here.

Testing The Connections

Verify that connectivity to the remote services is available.

To test the remote connections you need the telnet utility. Telnet has to be enabled from the Control Panel under ‘Turn Windows features on or off’.

  1. Go to Start, then Run, type CMD, and click OK.
  2. Type the following at the command prompt:
telnet <IP address or hostname> <port number>

For example

telnet 192.168.1.1 8081

If the connection is unsuccessful, an error will be shown.
If the command prompt window is blank with only the cursor, the connection is successful and the service is available.
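If telnet is unavailable, the same check can be done with a short Python sketch (the host and port below are examples):

import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("192.168.1.1", 8081))  # True means the service is reachable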

Preparing MS SQL Server

Enable Change Tracking for a database in MS SQL Server

This section applies to MS SQL Server versions higher than 2008.

  • To enable Change Tracking at the database level execute the following query:
    ALTER DATABASE databasename
    SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON)
  • To enable Change Tracking at the table level, execute the following query:
    ALTER TABLE tablename
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON)
  • Enable Change Tracking at the database and table level for all the databases and tables to be replicated.
  • To grant permission to view the event viewer logs, execute the following query:
    GRANT VIEW SERVER STATE TO "AccountName"

*** Please note: If you are configuring BryteFlow Ingest for a completely new SQL Server database, please make sure to perform at least one transaction on the database to generate the log sequence number for BryteFlow to start with.

Preparing On-premise Oracle

Enable Change Tracking for an On-Premise Oracle Server

Execute the following queries on Oracle Server to enable change tracking.

  • The Oracle database should be in ARCHIVELOG mode.
  • The supplemental logging has to be turned on at the database level. Supplemental logging is required so that additional details are logged in the archive logs.
    To turn on supplemental logging at the database level, execute the following statements:

    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  • Alternatively to turn on minimal database supplemental logging execute the following statements:
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; 
    ALTER DATABASE FORCE LOGGING;
  • In Oracle, ensure that supplemental logging is turned on at the table level. To turn on supplemental logging at the table level, execute the following statement:
    ALTER TABLE <schema>.<tablename> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Preparing Oracle on Amazon RDS

Enable Change Tracking for a database on Amazon Oracle RDS

  • In Oracle on Amazon RDS, the supplemental logging should be turned on at the database level.
  • Supplemental logging is required so that additional details are logged in the archive logs.
    To turn on supplemental logging at the database level, execute the following queries.

    exec
    rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD','ALL');
  • To retain archived redo logs on your DB instance, execute the following command (example 24 hours)
    exec
    rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);
  • To turn on supplemental logging at the table level, execute the following statement
    ALTER TABLE <schema>.<tablename> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Preparing On-premise MySQL

To prepare MySQL for change tracking, perform the following steps.

To enable binary logging, configure the following parameters as shown below in the my.ini file for MySQL on Windows, or in the my.cnf file for MySQL on UNIX:

Parameter         Value
server_id         Any value from 1, e.g. server_id = 1
log_bin           Path to the binary log file, e.g. log_bin = D:\MySQLLogs\BinLog
binlog_format     binlog_format = row
expire_logs_days  To avoid disk space issues it is strongly recommended not to use the default value (0), e.g. expire_logs_days = 4
binlog_checksum   Can be set to binlog_checksum = none; BryteFlow supports CRC32 as well
binlog_row_image  binlog_row_image = full
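For reference, a sample my.ini/my.cnf fragment combining the values above (the path and values are illustrative):

[mysqld]
server_id        = 1
log_bin          = D:\MySQLLogs\BinLog
binlog_format    = row
expire_logs_days = 4
binlog_checksum  = none
binlog_row_image = full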

Preparing MySQL on Amazon RDS

Enabling change tracking for MySQL on Amazon RDS

To enable change tracking for MySQL on Amazon RDS, perform the following steps (a scripted sketch follows).

  1. In the AWS Management Console, create a new DB parameter group for MySQL on Amazon RDS and configure the following parameters as shown.
  2. The MySQL RDS DB instance should use the newly created DB parameter group for binary logging to be enabled.
binlog_format: binlog_format=row
binlog_checksum: binlog_checksum=none OR CRC32
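The same parameter group can be created with boto3; a minimal sketch follows. The group name is arbitrary and the parameter group family must match your MySQL engine version.

import boto3

rds = boto3.client("rds")
rds.create_db_parameter_group(
    DBParameterGroupName="bryteflow-mysql-binlog",
    DBParameterGroupFamily="mysql5.7",  # match your engine version
    Description="Binary logging parameters for BryteFlow Ingest",
)
rds.modify_db_parameter_group(
    DBParameterGroupName="bryteflow-mysql-binlog",
    Parameters=[
        {"ParameterName": "binlog_format", "ParameterValue": "row",
         "ApplyMethod": "immediate"},
        {"ParameterName": "binlog_checksum", "ParameterValue": "none",
         "ApplyMethod": "immediate"},
    ],
)
# Then modify the DB instance to use this parameter group and reboot it
# for the settings to take effect.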

Data Types in MS SQL Server

BryteFlow Ingest supports most MS SQL Server data types as a source; see the following table for the supported list:

MS SQL Server Data Types

BIGINT REAL VARCHAR (max)
BIT FLOAT NCHAR
DECIMAL DATETIME NVARCHAR (length)
INT DATETIME2 NVARCHAR (max)
MONEY SMALLDATETIME BINARY
NUMERIC (p,s) DATE VARBINARY
SMALLINT TIME VARBINARY (max)
SMALLMONEY DATETIMEOFFSET TIMESTAMP
TINYINT CHAR UNIQUEIDENTIFIER
VARCHAR HIERARCHYID XML

Data Types in Oracle

BryteFlow Ingest supports most Oracle data types as a source; see the following table for the supported list:

Oracle Data Types

BINARY_DOUBLE BINARY_FLOAT CHAR
DATE INTERVAL DAY TO SECOND LONG
LONG RAW NCHAR NUMBER
NVARCHAR RAW REF
TIMESTAMP TIMESTAMP WITH LOCAL TIME ZONE VARCHAR2

 

Security Permissions Required on Source

Security for MS SQL Server

The BryteFlow Ingest database replication login user should have the VIEW CHANGE TRACKING permission to view the Change Tracking information.

--Review all change tracking tables that have column update tracking enabled (1 = enabled, 0 = disabled)
SELECT *
  FROM sys.all_objects
 WHERE object_id IN (SELECT object_id 
                       FROM sys.change_tracking_tables
                      WHERE is_track_columns_updated_on = 1);

Security for Oracle

The Oracle user running BryteFlow Ingest must have the following security privileges:

SELECT access on all tables to be replicated

The following statement should return records:

SELECT * FROM V$ARCHIVED_LOG;

If no records are returned, SELECT access on V_$ARCHIVED_LOG should be provided, or check whether the database is in ARCHIVELOG mode.

The following security permissions should be assigned to the user:

CREATE SESSION
SELECT access on V_$LOGMNR_CONTENTS
SELECT access on V_$LOGMNR_LOGS
SELECT ANY TRANSACTION
SELECT access on DBA_OBJECTS
EXECUTE access on DBMS_LOGMNR

Run the following grant statements for <user> to satisfy the above requirements:

GRANT SELECT ON V_$ARCHIVED_LOG TO <user>;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO <user>;
GRANT EXECUTE ON DBMS_LOGMNR TO <user>;
GRANT SELECT ON V_$LOGMNR_LOGS TO <user>;
GRANT SELECT ANY TRANSACTION TO <user>;
GRANT SELECT ON DBA_OBJECTS TO <user>;

 

Security for MySQL

The Ingest user id must have the following privileges:

  1. REPLICATION CLIENT and REPLICATION SLAVE.
  2. SELECT privileges on the source tables designated for replication.
  3. Execute the following queries to grant the permissions to a MySQL user:
CREATE USER 'bflow_ingest_user' IDENTIFIED BY '*****';
GRANT SELECT, REPLICATION CLIENT, SHOW DATABASES ON *.* TO 'bflow_ingest_user';
GRANT SELECT, REPLICATION SLAVE, SHOW DATABASES ON *.* TO 'bflow_ingest_user';

Verification of Source

Execute the following checks to confirm that change detection/tracking is set up correctly for MS SQL Server and Oracle sources.

Verification of MS SQL Server source

To verify whether change tracking is already enabled on the database, run the following SQL query. If a row is returned, Change Tracking has been enabled for the database:

SELECT *
  FROM sys.change_tracking_databases
 WHERE database_id = DB_ID('databasename');

The following SQL lists all the tables for which Change Tracking has been enabled in the selected database:

USE databasename;
SELECT sys.schemas.name as schema_name,
       sys.tables.name as table_name
  FROM sys.change_tracking_tables
  JOIN sys.tables ON sys.tables.object_id = sys.change_tracking_tables.object_id
  JOIN sys.schemas ON sys.schemas.schema_id = sys.tables.schema_id;

Verification of Oracle source

To verify that Oracle is set up correctly for change detection, execute the following queries.

Condition to be checked: Is ARCHIVELOG mode enabled?
SQL to be executed:
SELECT log_mode 
  FROM V$DATABASE;
Result expected: ARCHIVELOG

Condition to be checked: Is supplemental logging turned on at the database level?
SQL to be executed:
SELECT supplemental_log_data_min
  FROM V$DATABASE;
Result expected: YES

Condition to be checked: Is supplemental logging turned on at the table level?
SQL to be executed:
SELECT log_group_name, 
       table_name, 
       always,
       log_group_type
  FROM dba_log_groups;
Result expected: <log group name>, <table name>, ALWAYS, ALL COLUMN LOGGING

Creating Amazon Services

Creating An EC2 System

Please refer to the AWS documentation on how to create an EC2 instance.

Creating S3 Bucket

Please refer to the AWS documentation on creating an S3 bucket.

Configuring EMR Cluster

Prior to launching an EMR cluster it is recommended to verify the service limits for EMR within your AWS region.

When using BryteFlow in:

  • a Standard or Hybrid environment, it is recommended to have 3 instances for the EMR cluster (1 master and 2 core nodes)
  • High Availability mode, it is recommended to have 6 instances; the 3 additional instances are for DR mode whenever a failure occurs.

To learn more about AWS service limits and how to manage service limits, click on the respective links.

Launch EMR Cluster from the AWS console:

Log in to your AWS account and select the correct AWS region where your S3 bucket and EC2 container are located. A scripted launch sketch follows the list.

  1. Click on the services drop-down in the header.
  2. Select EMR under Analytics, or search for EMR.
  3. Click on the ‘Create cluster’ button.
  4. In Create Cluster – Quick Options, type in the Cluster Name (the name you will identify the cluster with).
    Keep the Logging check box selected; the S3 folder will be selected by default. Launch mode should be Cluster.
  5. Under Software configuration select release emr-5.14.0, and in Applications select Core Hadoop: Hadoop 2.8.3 with Ganglia 3.7.2, Hive 2.3.2, Hue 4.1.0, Mahout 0.13.0, Pig 0.17.0, and Tez 0.8.4.
  6. Hardware configuration – please select the instance type and the number of instances you want to run.
  7. Security and access –
    Please select the EC2 key pair that you want to use with the EMR cluster. This key will be used to SSH into the cluster. Permissions should be set to default.
  8. You can add tags to your EMR cluster and configure the tag in Ingest to avoid re-configuring the software if you plan to terminate the cluster and create a new one. This helps users keep control of their clusters and save cost on AWS resources.
  9. Click on the ‘Create cluster’ button (provisioning of a cluster can take 15–20 minutes).
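For reference, an equivalent cluster can be launched programmatically. Below is a minimal boto3 sketch; the key pair, subnet and log bucket are placeholders, with sizing taken from the volume table earlier in this guide.

import boto3

emr = boto3.client("emr")
emr.run_job_flow(
    Name="bryteflow-emr",
    ReleaseLabel="emr-5.14.0",
    LogUri="s3://your-bucket/emr-logs/",  # placeholder
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"},
                  {"Name": "Hue"}, {"Name": "Ganglia"}],
    Instances={
        "MasterInstanceType": "m4.xlarge",
        "SlaveInstanceType": "c5.xlarge",
        "InstanceCount": 3,                   # 1 master + 2 core nodes
        "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster running for Ingest
        "Ec2KeyName": "your-key-pair",        # placeholder
        "Ec2SubnetId": "subnet-xxxxxxxx",     # placeholder
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Tags=[{"Key": "BryteflowIngest", "Value": "prod"}],  # lets Ingest find the cluster by tag
)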

Starting & Stopping BryteFlow Ingest

If you are using the AMI from AWS Marketplace, BryteFlow Ingest will be preinstalled as a service in Windows.

Alternatively, you can install the service by executing the following command using Command Prompt (Admin).

  1. Navigate to the directory of the installation.
  2. service.exe --WinRun4J:RegisterService

To Start BryteFlow Ingest

  1. Start the BryteFlow Ingest service using Windows Services or  Windows Task Manager
  2. Type the URL in the Chrome browser
localhost:8081

To Stop Bryteflow Ingest

  1. Stop the BryteFlow Ingest service
  2. Replication processes can also be aborted immediately by going to Task Manager
    -> Processes -> service.exe – and selecting “End Task”

Configuration of BryteFlow Ingest

The configuration of BryteFlow Ingest is performed through the web console

  1. Type the URL in the Chrome browser
localhost:8081

The screen will then present the following tabs (left side of the screen)

  • Dashboard
  • Connections
  • Data
  • Schedule
  • Configuration
  • Log

Dashboard

The dashboard provides a central screen where the overall status of this instance of BryteFlow Ingest can be monitored

  • The Data Sources Transfer Summary shows the number of records transferred. When Hourly is selected you can view the transfer statistics for 24 hours; if Daily is selected the monthly statistics are displayed.
    • The pie chart displays the status of the process
      • Extraction, denoted by red
      • Loading, denoted by orange
      • Loaded, denoted by green
    • Hovering over the bar graph gives the exact number of records transferred.
  • Schedule Extract Status displays the schedule status.
  • The Configure icon takes you to the configuration of the source tables, specifically the table, type of transfer, table primary key(s) and the selection of masked columns.
  • The Dashboard provides quick access to the configuration of BryteFlow Ingest (Source, Destination Database, Destination File System and Email Notification)

 

Connections

The connections tab provides access to the following sub-tabs

  • Source
  • Destination Database
  • Destination File System
  • Email Notification

Source

Configuration of MS SQL Server, Oracle, SAP (MS SQL Server) or SAP (Oracle) as a source

MS SQL Server and SAP (MS SQL Server)

  1. In the Database Type field select “Microsoft SQL Server” from the drop-down list. For an SAP source with an MS SQL Server database, use “SAP (SQL Server)” from the drop-down list
  2. In the Database Host field please enter the IP address or hostname of the database server
  3. In the Database Port field please enter the port number the database server is listening on. The default port for MS SQL Server is 1433
  4. In the Database Name field please enter the name of your database, e.g. BryteMSSQL
  5. Enter a valid MS SQL Server database user id to be used with BryteFlow Ingest. If a Windows user is required, please contact BryteFlow support at info@bryteflow.com to understand how to configure this
  6. Enter Password; then confirm it by re-entering in Confirm Password
    • Please note, passwords are encrypted within BryteFlow Ingest
  7. Click on the ‘Test Connection’ button to test connectivity
  8. Click on the ‘Apply’ button to confirm and save the details

Oracle and SAP (Oracle)

  1. In the database type select ‘Oracle Log Miner’ from the drop-down list. For an SAP source with an Oracle database, use ‘SAP (Oracle)’ from the drop-down list
  2. In the database host field please enter the IP address or hostname of the database server.
  3. In the Database Port field please enter the port number the database server is listening on. The default port for Oracle is 1521
  4. In the database name field please enter the Oracle SID
  5. Enter a valid Oracle database user id to be used with BryteFlow Ingest.
  6. Enter Password; then confirm it by re-entering in Confirm Password
    • Please note, passwords are encrypted within BryteFlow Ingest
  7. Click on the ‘Test Connection’ button to test connectivity
  8. Click on the ‘Apply’ button to confirm and save the details

Please note: When using a SID to connect to a dedicated Oracle server instance, use ‘:SID’ in the Database Name of the source configuration.

Destination Database

Available Destinations

  • S3 files using EMR
  • S3 files using EMR + Load to Redshift
  • Load to Redshift direct
  • Load to Snowflake direct

S3 files using EMR

  1. Enter Database Type: To use Amazon S3 as the destination, please use “S3 Files using EMR” from the drop-down list
  2. Click on the ‘Test Connection’ button to test the connection details
  3. Click on the ‘Apply’ button to confirm and save the details

S3 files using EMR + Load to Redshift

  1. Enter Database Type: To use Amazon S3 and Amazon Redshift as your destination, select “S3 files using EMR + Load to Redshift” from the drop-down list.
  2. Enter Database Host: Enter the endpoint for Amazon Redshift (excluding the port)
    • e.g. bryte-dc1.hdyesjdsdf.us-west-2.redshift.amazonaws.com
  3. Enter Database Port: the Redshift default port is 5439
  4. Enter Database Name
    • e.g. dev
  5. Enter User Id: This is the Redshift user id that will load the schemas, tables, and data automatically to Redshift:
    • e.g. redshift_user
  6. Enter Password; re-enter to confirm
    • Please note, passwords are encrypted within BryteFlow Ingest
  7. Click on the ‘Test Connection’ button to test the connection details
  8. Click on the ‘Apply’ button to confirm and save the details

 

Load to Snowflake direct

  1. Enter ‘Database Type’ as ‘Load to Snowflake Direct’
  2. The Database Host is the Snowflake account URL, e.g. abc123.ap-southeast-2.snowflakecomputing.com
  3. The Database Name should be in the format <account>:<warehouse>:<db>, e.g. abc123:COMPUTE_WH:DEMO_DB
  4. Enter the Snowflake user id in the Userid field
  5. The password is configured under the Password and Confirm Password section.

Destination File System

To configure S3 as the file system, perform the following steps.

  • Select File System as “AWS S3 with EMR” from the drop-down.
  • In the bucket name field, enter the bucket name that you have created on Amazon S3.
  • In the Delta Directory and Data Directory fields, type the names of the folders on Amazon S3
  • Enter the Amazon EMR instance ID, e.g. j-1ARB3SOSWXZUZ
  • The EMR instance can be specified by Instance ID (as above), by a tag value for the tag ‘BryteflowIngest’, or by a tag and value expressed as ‘tag=value’. If more than one instance fits the criteria, the first one in the list will be picked.
  • In EMR Region and S3 Region, select the correct regions from the drop-down list.
  • Enter the AWS access key id and AWS secret key for accessing the S3 service if the installation is on-premise; otherwise IAM roles will be used.
    • Please note, keys are encrypted within BryteFlow Ingest
  • If you are using KMS, enter the KMS key
    • Please note, keys are encrypted within BryteFlow Ingest
  • Click on the ‘Test Directory’ button to test connectivity
  • Click on the ‘Apply’ button to confirm and save the details

Email Notification

To configure email notifications to be sent, perform the following steps

  • Choose Mail Type: SMTP using TLS from the drop-down
  • In the Email Host field, type the address of your SMTP server.
  • In the Email Port field, type the port number on which the SMTP server is listening.
  • In the user id field, type the complete email address that will authenticate with the SMTP server.
  • Enter the Password for the email; confirm it.
    • Please note, passwords are encrypted within BryteFlow Ingest
  • In Send From, enter the email id the email will be sent from; it has to be a valid email address on the server.
  • In the Send To field, enter the email address the notifications are sent to.
  • Click on Test Connection and then Apply to test the connection and save the settings.
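If the SMTP details need verifying outside Ingest, a minimal Python sketch can confirm the host, port and credentials (all values below are placeholders):

import smtplib

with smtplib.SMTP("smtp.example.com", 587, timeout=10) as server:
    server.starttls()  # upgrade the connection to TLS
    server.login("user@example.com", "password")
    print("SMTP authentication OK")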

Data

NOTE: Please review this section in conjunction with Appendix: Understanding Extraction Process

To select the tables to be transferred to the destination database on Amazon Redshift and/or an Amazon S3 bucket, perform the following steps.

  1. Expand the Database.
  2. Browse to the table you want to be synced with Amazon Redshift or Amazon S3.
  3. Select the checkbox next to the table and then click on the table.
  4. On the right-hand side pane, select the type of transfer for the table, i.e. By Primary Key or By Primary Key with History. With the Primary Key option, the table is replicated like-for-like to the destination. With the Primary Key with History option, the table is replicated as time-series data with every change recorded with Slowly Changing Dimension type 2 history (aka point in time)
  5. In the Primary Key column, select the Primary Key for the table by checking the checkbox next to the column name.
  6. You can also mask a column by checking the checkbox. By masking a column, the selected column will not be transferred to the destination.
  7. Click on the ‘Apply’ button to confirm and save the details
  8. Click on the ‘Full Extract’ button to request a full load of the table

 

This process of selecting tables, configuring primary keys and masking columns should be repeated for each of the tables. Once complete, the next step is to…

  1. Navigate to Schedule tab
  2. Click on the ‘Sync New Tables’ button to initiate the process

 

Partitioning

Amazon S3 And Amazon Redshift
Partitioning can dramatically improve efficiency and performance. It can be set up when replicating to S3 (data is partitioned in folders) and/or Redshift (data is partitioned into tables). The partitioning string is entered into the Partitioning folder field. The format for partitioning is as follows

/@<column index>(<partition prefix><partition_format>)

 

Column Index

To build a partitioning folder structure, the column index (starting from 1) of the column(s) to be used in the partition needs to be known. In this simple table there are 3 columns…

  • customer.contact_id would be column index 1
  • customer.fullname would be column index 2
  • customer.email would be column index 3

 

Partition Prefix (optional)

Each partition can be prefixed with a named fixed string. The last character of the Partition Prefix can be set to ‘=’; ending with ‘=’ is useful when creating partitions on S3 as this facilitates the automated build/recovery of partitions (see below).

  • The partition prefix string should be in lower case
  • The partition prefix string should not be the same as any of the existing column names

 

An example for partitioning on the first letter of column 2 (fullname in this case) is as follows:

/@2(fullname_start=%1s)

Refer to the MSCK REPAIR TABLE command in the AWS Athena documentation. A lower-case partition prefix is recommended, as an upper/mixed-case partition prefix can result in issues when using Athena.

--Builds/recovers partitions and data associated with partitions 
MSCK REPAIR TABLE <athena_table_name>;

 

Once the MSCK REPAIR TABLE <athena_table_name>; command has been executed, all data will be added to the relevant partitions, and any new data will be automatically added to the existing partitions. However, if new partitions are created by BryteFlow Ingest, the MSCK REPAIR TABLE <athena_table_name>; command will have to be re-executed to make the data available for query purposes in the Athena table.

 

Format

The format is applied to the column index specified above. For example, to partition the data by year (on a date column) you would use the format %y; to partition by the 24-hour format of time you would use the format %H.

Partition Examples

Example 1: Year
Assuming Column Index 7 was a date field…

/@7(%y)

This would create partition folders such as

  • 2016
  • 2017
  • 2018
  • 2019

 

Example 2: YearMonthDay
Assuming Column Index 7 was a date field…

/@7(%y%M%d)

This would create partition folders such as

  • 20190101
  • 20190102
  • 20190103
  • 20190104

 

Example 3: yyyymmdd=YearMonthDay
Assuming Column Index 7 was a date field…

/@7(yyyymmdd=%y%M%d)

This would create partition folders such as (useful format to automate recovery/initial population of data associated with partitions when using Athena)

  • yyyymmdd=20190101
  • yyyymmdd=20190102
  • yyyymmdd=20190103
  • yyyymmdd=20190104

 

Example 4: A DOB column used to create sub-partitions of yr, mth and day
Assuming the DOB column (index 4) is a date…

/@4(yr=%y)/@4(mth=%M)/@4(day=%d)

 

Example 5: model_nm=<model value> with sub-partitions of yearmonth=YearMonth (multiple-column partitioning)
Assuming Column Index 6 is a string (containing, for example, model_name_a, model_name_b and model_name_c) and Column Index 13 is a date field…

/@6(model_nm=%s)/@13(yearmonth=%y%M)
  • model_nm=model_name_a
    • yearmonth=201801
    • yearmonth=201802
    • yearmonth=201803
  • model_nm=model_name_b
    • yearmonth=201801
    • yearmonth=201802
    • yearmonth=201803
  • model_nm=model_name_c
    • yearmonth=201801
    • yearmonth=201802
    • yearmonth=201803

 

 

Available Partition Options

Format     Datatype   Description
%y         TIMESTAMP  Four-digit year, e.g. 2018
%M         TIMESTAMP  Two-digit month with zero prefix, e.g. March -> 03
%d         TIMESTAMP  Two-digit day with zero prefix, e.g. 01
%H         TIMESTAMP  Two-digit 24-hour with zero prefix, e.g. 00
%q         TIMESTAMP  Two-digit month indicating the start month of the quarter, e.g. March -> 01
%Q         TIMESTAMP  Two-digit month indicating the end month of the quarter, e.g. March -> 03
%r         TIMESTAMP  Two-digit month indicating the start of the half year, e.g. March -> 01
%R         TIMESTAMP  Two-digit month indicating the end of the half year, e.g. March -> 06
%i         INTEGER    Value of the integer, e.g. 12345
%<n>i      INTEGER    Value of the integer prefixed with zeros to the specified width, e.g. %8i for 12345 is 00012345
%<m>.<n>i  INTEGER    Value of the integer truncated to the number of zeros specified by <n> and prefixed with zeros to the width specified by <m>, e.g. %8.2i for 12345 is 00012300
%.<n>i     INTEGER    Value of the integer truncated to the number of zeros specified by <n>, e.g. %.2i for 12345 is 12300
%s         VARCHAR    Value of the string, e.g. ABCD
%<n>s      VARCHAR    Value of the string truncated to the specified width, e.g. %2s for ABCD is AB

 

Schedule

To configure extracts to run at a specific time, perform the following steps.

  1. For Oracle, Automatic is preselected and the other options are disabled by default.
  2. For MS SQL Server you can choose the period in minutes.
  3. A daily extraction can be run at a specific time of day by choosing the hour and minutes in the drop-down.
  4. Extraction can also be scheduled on specific days of the week at a fixed time by checking the checkboxes next to the days and selecting the hours and minutes in the drop-down.
  5. Click on the ‘Apply’ button to save the schedule.

 

Add a new table to existing Extracts

You can add additional table(s) while replication is up and running, if the need arises to add a new table to the extraction process…

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)
  • Navigate to the ‘Data’ tab
    • Select the new table(s) by navigating into database instance name, schema name and table name(s)
    • Configure the table, considering the following
      • Transfer type
      • Partitioning folder (refer to the Partitioning section of this document for details)
      • Primary key column(s)
      • Columns to be masked (optional, masked columns are excluded from replication, for example salary data)
      • Click on the ‘Apply’ button
      • Click on the ‘Full Extract’ button
      • Repeat process for each table that is required
  • Navigate to the ‘Schedule’ Tab
    • Click on the ‘Sync New Tables’ button

This will initiate a full extract of the new table(s); once completed, BryteFlow Ingest will automatically resume processing deltas for the new and all previously configured tables.

Resync data for Existing tables

If the table transfer type is Primary Key with History, to resync all the data from the source, perform the following steps

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)

 

  • For Resync Data on ALL configured tables…
    • Navigate to Schedule tab
    • Click on the ‘Full Extract’ button

 

  • For Resync Data on selected tables…
    • Navigate to Data Sources tab
      • Select the table(s) by navigating into database instance name, schema name and table name(s)
      • Click on the ‘Full Extract’ button
      • Repeat process if more than one table is required
    • Navigate to the ‘Schedule’ Tab
      • Click on the ‘Sync New Tables’ button

Rollback

In the event of unexpected issues (such as intermittent source database outages or network connectivity issues) it is possible to wind back the status of BryteFlow Ingest in time and replay all of the changes. Suppose a problem occurred at, say, 16:04; you can roll back BryteFlow Ingest to a point in time before the issues started occurring, say 15:00. To perform this operation…

  1. Navigate to the Schedule tab.
  2. Click on the ‘Rollback’ button
  3. The rollback screen appears; it provides a list of all of the points in time you can roll back to
    • dependent upon the source database log retention policy
  4. Select the required date (radio button) and click ‘Select’
  5. Click on ‘Rollback’ to initiate the rollback
  6. The rollback will now catch up from 15:00 to ‘now’, automatically replaying all of the log entries and applying them to the destination

Configuration

The configuration tab provides access to the following sub-tabs

  • Source
  • Destination Database
  • S3
  • License
  • Recovery
  • Remote Monitoring

Source

Web Port: The port on which the BryteFlow Ingest server will run.

Max Catchup Log: The number of Oracle archive logs processed in one batch.

Minimum Interval between Catchups: The minimum number of minutes between catchup batches.

Default transfer type: The default transfer option applied when not defined at the table level.

Handle Oracle raw columns: Handle raw columns by converting them to hex strings instead of ignoring them as CHAR(1).

Destination Database

Max Updates: Combine updates that exceed this value.

Loading threads:  Number of Redshift loading threads.

Schema for all tables:  Ignore the source schema and put all tables in this schema on the destination.

Schema for staging tables:  Schema for staging tables.

Retaining staging tables:  Retain staging tables.

Source Start Date:  Column name for source date.

History End Date:  Column name for history end date

End Date Value:  End date used for history.

Ignore database name in schema:  Don’t use the database name as part of the schema prefix for MS SQL Server.

No. of data slices:  Number of slices to split the data file into.

File compression:  Compression method; the available options are as follows

  • None
  • BZIP2
  • GZIP
  • Parquet
  • ORC(snappy)
  • ORC(zlib)

 

S3

Keep S3 Files: Retain files in S3 after loading into AWS Redshift.

Use SSE:  Store in S3 using SSE (server-side encryption).

S3 Proxy Host: S3 proxy host name.

S3 Proxy Host Port:  S3 proxy port.

S3 Proxy user ID:  S3 proxy user id.

S3 Proxy Password:  S3 proxy password.

 

License

To get a valid license, go to the Configuration tab, then to the License tab, and email the “Product ID” to the Bryte support team – support@bryteflow.com

NOTE: Licensing is not applicable when sourced from the AWS Marketplace.

 

High Availability / Recovery

BryteFlow Ingest provides High Availability support: it automatically saves the current configuration and execution state to S3 and DynamoDB. As a result, an instance of BryteFlow Ingest (including its current state) can be recovered should it be catastrophically lost. Before use this must be configured; select the Configuration tab and then the Recovery sub-tab to enter the required configuration.

Recovery Configuration

  1. In the Instance Name field enter a business-friendly name for the current instance of BryteFlow Ingest
  2. Check Enable Recovery
  3. Enter the destination of the recovery data in S3, for example s3://your_bucket_name/your_folder_name/Your_Ingest_name
  4. Click on the ‘Apply’ button to confirm and save the details

The recovery data is stored in DynamoDB (AWS’s fully managed NoSQL database service). The recovery data for the named instance (in this example, Your_Ingest_Name) is stored in a DynamoDB table called BryteFlow.Ingest.
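
If you want to confirm that the save-point data is being written, you can inspect this table directly. Below is a minimal boto3 sketch; the table name BryteFlow.Ingest comes from above, while the region and the exact item attributes are assumptions that may differ in your account.

import boto3

# Minimal sketch: list a few saved recovery items from the BryteFlow.Ingest table.
# The region and item shape are assumptions; adjust to your environment.
dynamodb = boto3.client('dynamodb', region_name='ap-southeast-2')

response = dynamodb.scan(TableName='BryteFlow.Ingest', Limit=10)
for item in response['Items']:
    print(item)  # each item holds saved configuration/execution state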

Recovery Utilisation

To recover an instance of BryteFlow Ingest, source a new instance of BryteFlow Ingest from the AWS Marketplace:

  1. Use the AMI sourced from the AWS Marketplace
  2. Type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  3. Click on the ‘Existing Instance’ button
  4. Select the existing instance you wish to restore from the list displayed; in this example there is only one (‘Your_Ingest_Name’). Once the required instance has been selected, click on the ‘Select’ button

BryteFlow Ingest will collect the configuration and saved execution state of the instance selected (in this case ‘Your_Ingest_Name’) and restore accordingly.


NOTE: Recovery can also be used as a method of partial migration between environments (for example, DEV to PROD stacks). Because the restore clones the exact source environment and state, further configuration will be required (for example, updating the PROD stack’s EMR instance, S3 location, etc.). This method can, however, cut down on the workload in cases where there are hundreds of tables to be configured and you are moving to a new EC2 instance.

Recovery from Faults

BryteFlow supports high availability and auto recovery mechanisms in case of faults and failures.

  • In case of AZ faults or instance failures:
    • If the EC2 instance is terminated or affected, BryteFlow saves the last successful load as a savepoint. It resumes from the savepoint when restarted in another AZ or on another EC2 instance.
    • On EMR cluster failures, BryteFlow retries continuously until successful.
    • On EMR step failures, BryteFlow retries continuously with exponential backoff, which prevents throttling exceptions (see the sketch after this list).
    • On Redshift/Snowflake connection issues, BryteFlow retries continuously until successful.
    • On source DB connection issues, BryteFlow retries continuously until successful.
    • On AWS S3 connection issues, BryteFlow retries continuously until successful.
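
For illustration, the retry-with-exponential-backoff pattern described above looks roughly like the sketch below. This is not BryteFlow’s internal code; the attempt count, delays and jitter are assumed values chosen to demonstrate the technique.

import random
import time

def call_with_backoff(operation, max_attempts=8, base_delay=1.0, max_delay=60.0):
    # Retry 'operation', doubling the wait between attempts so that a
    # throttled or unavailable service is not hammered with requests.
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter spreads retries out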

Customers looking for high availability support are recommended to configure their BryteFlow Ingest instance for High Availability and Recovery. Details on setting up this feature are given in the High Availability / Recovery section of this user guide.

  • In case of application faults and failures:
    • Get notified by enabling CloudWatch Logs and metrics in BryteFlow Ingest. AWS CloudWatch events can be used for alerting by writing Lambda functions for customer-specific requirements.
    • Or enable SNS in BryteFlow Ingest and subscribe to the SNS topic from the AWS console.

Details on setting up these features are given in the Remote Monitoring section of this user guide.

  • For disk space, BryteFlow Ingest sends status events to CloudWatch Logs that include the free disk space in GB. Users can write Lambda functions around this to raise alarms.

Below is a small guide to setting up a Lambda function for disk checks:

Prerequisites:

  • Create an IAM role
  • Attach a policy that allows the Lambda function to call AWS services, as below:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1",
          "Action": [
            "logs:FilterLogEvents",
            "logs:GetLogEvents",
            "logs:PutLogEvents"
          ],
          "Effect": "Allow",
          "Resource": "arn:aws:logs:"
        },
        {
          "Sid": "Stmt2",
          "Action": [
            "sns:Publish",
            "sns:TagResource"
          ],
          "Effect": "Allow",
          "Resource": "arn:aws:sns:<region>:<account_ID>:<topic_name>"
        }
      ]
    }
  • Create an SNS topic (refer to the AWS documentation) and use its ARN in your AWS Lambda code; a minimal sketch for creating and subscribing to a topic follows.
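
The following boto3 sketch creates such a topic and subscribes an email endpoint to it; the same can be done from the AWS console. The topic name and email address are placeholders, not values required by BryteFlow.

import boto3

# Minimal sketch: create an SNS topic and subscribe an email endpoint.
# The topic name and endpoint are placeholders; adjust to your environment.
sns = boto3.client('sns')

topic = sns.create_topic(Name='BryteFlowIngestAlerts')
sns.subscribe(
    TopicArn=topic['TopicArn'],
    Protocol='email',
    Endpoint='ops@example.com'  # the recipient must confirm the subscription
)
print('Topic ARN:', topic['TopicArn'])  # use this ARN in the Lambda code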

Steps to create a Lambda function:

  1. Log in to the AWS console and go to the AWS Lambda dashboard.
  2. Click on Create function.
  3. Enter a function name.
  4. Choose the runtime language Python 3.8 from the dropdown.
  5. Click on Create function.
  6. On the next screen, click on Add trigger.
  7. Select CloudWatch Logs from the dropdown.
  8. Choose your log group.
  9. Enter a filter name.
  10. Click on Add (you can add multiple log groups one by one).
  11. Add your Lambda code in the function code window (scroll down the screen).
  12. Choose the ‘Lambda Execution role’.
  13. Click on the Save button.
Sample Lambda code for the disk check is provided below:

import json
import boto3

def lambda_handler(event, context):
    # Alert when free disk space drops below this threshold (in GB).
    freeGb = 100
    cloudwatch = boto3.client('logs')
    # Read the most recent BryteFlow Ingest events from CloudWatch Logs.
    response = cloudwatch.get_log_events(
        logGroupName='Oracle_LogGroup',
        logStreamName='Oracle_LogStream',
        startFromHead=False,
        limit=100
    )
    for i in response["events"]:
        msg = json.loads(i["message"])
        # Status events carry the free disk space in the diskFreeGB field.
        if msg["type"] == "Status" and msg["diskFreeGB"] < freeGb:
            sns = boto3.client('sns')
            sns.publish(
                TopicArn='arn:aws:sns:us-west-2:689564010160:LambdaTrigger',
                Message='Your free disk size is getting low, please contact the concerned team!'
            )
Note: The email body in the above code can be customized according to the customer’s specifications.

Time to Recover

Recovery Point Objective (RPO)

BryteFlow auto-recovers the instance, and because it uses highly durable services such as S3 and DynamoDB to store its data, that data has unlimited retention.

For customer data, the RPO depends entirely on the customer’s source database settings for data retention. If the source data is available, BryteFlow Ingest can recover and replicate from that point onwards.

BryteFlow aims to meet customer expectations of near real-time latency and hence tries to recover automatically in most failure scenarios.

Recovery Time Objective (RTO)

For EC2 failures, the RTO for BryteFlow applications is very short (minutes), as the application maintains its save-point in near real-time in DynamoDB, a highly durable AWS service. When the Ingest instance is back online after a restart, or after being terminated abruptly (mostly in the case of EC2 failures), it resumes from the last successful save-point and continues replication onwards without needing a full reload.

For EMR failures, the RTO depends on the time taken to launch a new cluster; when using a single EMR cluster this varies from 10-30 minutes. Until the EMR cluster is up, BryteFlow retries with an exponential back-off mechanism until a successful connection is established, and replication continues from the same point without any data loss.

Please note: No full reload is needed, unlike other available solutions.

Recovery Testing

Once BryteFlow Ingest has recovered from a failure, follow the steps below to perform basic checks before restarting replication, in order to avoid further errors or issues:

  1. Start the BryteFlow Ingest service.
  2. Open the Ingest web console in the Chrome browser.
  3. Go to the ‘Connections’ tab in the left menu.
  4. Under ‘Source’, check your source database configuration and do a ‘Test Connection’ to check the connectivity between BryteFlow and the source database.
  5. If all is good, do the same for the ‘Destination’ connections.
  6. If any issues are encountered in source or destination connectivity, troubleshoot until a successful connection is established.
  7. Turn the ‘Schedule’ to ‘ON’ and resume ongoing replication.

Remote Monitoring

BryteFlow Ingest comes pre-configured with remote monitoring capabilities. These capabilities leverage existing AWS technology such as CloudWatch Logs/Events. CloudWatch can be used (in conjunction with other assets in the AWS ecosystem) to monitor the execution of BryteFlow Ingest and in the event of errors/failures raise the appropriate alarms.

In addition to the integration with CloudWatch, BryteFlow Ingest also writes its internal logs directly to S3 (BryteFlow Ingest console execution and error logs).


To configure remote monitoring, perform the following steps:

  1. Enter an Instance Name, this being a business-friendly name for the current instance of BryteFlow Ingest.
  2. Check Enable S3 Logging if you want to record data to S3 (console/execution logs).
  3. Enter the destination of the logging data in S3, for example s3://your_bucket_name/your_folder_name
  4. Enter the name of the CloudWatch Log Group (this needs to be created first in the AWS console)
  5. Enter the name of the CloudWatch Log Stream under the aforementioned Log Group (again, this needs to be created first in the AWS console)
  6. Check Enable CloudWatch Metrics if required
  7. Check Enable SNS Notifications
  8. Enter the Topic ARN in the SNS Topic input box
  9. Click Apply to save the changes

The events that BryteFlow Ingest pushes to the AWS CloudWatch Logs console are as follows; please refer to Appendix: Bryte Events for AWS CloudWatch Logs and SNS for a more detailed breakdown.

LogfileProcessed: Archive log file processed (Oracle only)
TableExtracted: Source table extract complete, MS SQL Server and Oracle (initial extracts only)
ExtractCompleted: Source extraction batch is complete
TableLoaded: Destination table load is complete
LoadCompleted: All destination table loads in a batch are complete
HaltError: Unrecoverable error occurred and turned the Scheduler to OFF
RetryError: Error occurred but will retry
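
As a quick check that these events are arriving, you can filter the configured log group for a particular event type. Below is a minimal boto3 sketch; ‘Your_LogGroup’ is a placeholder for the CloudWatch Log Group configured above.

import json
import boto3

# Minimal sketch: look for HaltError events in the configured log group.
# 'Your_LogGroup' is a placeholder; use the Log Group entered above.
logs = boto3.client('logs')

response = logs.filter_log_events(
    logGroupName='Your_LogGroup',
    filterPattern='{ $.type = "HaltError" }'  # JSON filter on the event type
)
for event in response['events']:
    msg = json.loads(event['message'])
    print(msg['generated'], msg['message'])  # timestamp and error message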


Log

You can monitor the progress of your extracts by navigating to the Log tab.

BryteFlow Ingest stores its log files under your install folder, specifically under the \log folder.
The path to the log files is as follows: <install folder of Ingest>\log\sirus*.log, for example

c:\Bryte\Bryte_Ingest_37\log\sirus-2019-01.log

The error files are also stored under the \log folder.
The path to the error files is as follows: <install folder of Ingest>\log\error*.log, for example

c:\Bryte\Bryte_Ingest_37\log\error-2019-01.log

These logs can also be reviewed/stored in S3; please refer to the Remote Monitoring section above for details.
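
If you prefer to check these logs from a script rather than the console, a small sketch such as the one below finds and tails the newest log file. The folder is the example install path from above; adjust it to your own install.

import glob
import os

# Minimal sketch: print the last 20 lines of the newest sirus*.log file.
# The folder is the example install path from above; adjust as required.
log_dir = r'c:\Bryte\Bryte_Ingest_37\log'

latest = max(glob.glob(os.path.join(log_dir, 'sirus*.log')), key=os.path.getmtime)
with open(latest, encoding='utf-8', errors='replace') as f:
    for line in f.readlines()[-20:]:
        print(line.rstrip())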

Optimize usage of AWS resources / Save Cost

EMR Tagging

BryteFlow Ingest supports an EMR tagging feature which can dramatically reduce EMR cluster costs. It helps customers control EMR cost by terminating the cluster when not in use, without interrupting the Ingest configuration and schedule.

You can add the default tag ‘BryteflowIngest’ when creating a new Amazon EMR cluster for Ingest, or you can add, edit, or remove tags on a running Amazon EMR cluster. Then use the tag name and value in the EMR Configuration section of Ingest.
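
Tagging a running cluster can also be done programmatically. The sketch below uses the default ‘BryteflowIngest’ tag name mentioned above; the cluster ID is a placeholder and the tag value is an arbitrary example.

import boto3

# Minimal sketch: add the BryteflowIngest tag to a running EMR cluster.
# 'j-XXXXXXXXXXXXX' is a placeholder cluster ID; the value is an example.
emr = boto3.client('emr')

emr.add_tags(
    ResourceId='j-XXXXXXXXXXXXX',
    Tags=[{'Key': 'BryteflowIngest', 'Value': 'ingest-prod'}]
)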


Tagging AWS Resources

AWS allows customers to assign metadata to their AWS resources in the form of tags. It is recommended that you tag all the AWS resources created for and by BryteFlow, to help with managing and organizing resources, access control, cost tracking, and automation.

It is recommended to use tag names which are specific to the instance being created. For example, for a BryteFlow instance replicating a production database server for Billing and Finances, tag names should reflect the database it is dedicated to, such as ‘BryteFlowIngest_BFS_EC2_Prod’; similarly, for a UAT environment it could be ‘BryteFlowIngest_BFS_EC2_UAT’. This way, customers can easily differentiate between the various AWS resources being used within their environment. Use similar tag names for each service.

BryteFlow recommends tagging the AWS services listed below, used by BryteFlow, with unique, identifiable tag names.

  • AWS EC2
  • AWS EMR
  • AWS Redshift instances

For detailed guides on tagging resources in AWS, refer to the AWS documentation links provided:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-tags.html

https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-tagging.html
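
As an illustration of the naming convention above, the sketch below applies such a tag to an EC2 instance using boto3. The instance ID is a placeholder, and the choice of the ‘Name’ tag key is an assumption; any standardized tag key works the same way.

import boto3

# Minimal sketch: tag an EC2 instance following the naming convention above.
# The instance ID is a placeholder; 'Name' is one common choice of tag key.
ec2 = boto3.client('ec2')

ec2.create_tags(
    Resources=['i-0123456789abcdef0'],
    Tags=[{'Key': 'Name', 'Value': 'BryteFlowIngest_BFS_EC2_Prod'}]
)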

Upgrade BryteFlow versions from AWS Marketplace when using AMI

Users already using the BryteFlow AMI Standard Edition can easily upgrade to the latest version of the software directly from the AWS Marketplace by following a few easy steps.

Steps to perform in your current install:

  • Before upgrading, make sure your current setup is backed up.
  • To save your current instance setup and stats, go to ‘Configuration’ -> ‘Recovery’
  • In the Instance Name field enter a business-friendly name for the current instance of BryteFlow Ingest
  • Check Enable Recovery
  • Enter the destination of the recovery data in S3, for example s3://your_bucket_name/your_folder_name/Your_Ingest_name
  • Click on the ‘Apply’ button to confirm and save the details
  • Once recovery is set up, turn the ‘Schedule’ to ‘OFF’ on the current version and let it come to a complete pause.
  • Go to the product URL on the AWS Marketplace: https://aws.amazon.com/marketplace/pp/B01MRLEJTK
  • In the product configuration settings, choose the latest available version from the ‘software version’ dropdown.
  • Then ‘Continue to Launch’ your new instance.
  • Choose the action ‘Launch from Website’
  • Select your EC2 instance type based on your data volume; recommendations are available on the product detail page
  • Choose your VPC from the dropdown, or go with the default
  • Select the ‘Private Subnet’ under ‘Subnet Settings’. If none exists, it is recommended to create one; please follow the AWS User Guide for creating a subnet.
  • Update the ‘Security Group Settings’ or create one based on the BryteFlow recommended steps below:
    • Assign a name for the security group, e.g. BryteFlowIngest
    • Enter a description of your choice
    • Add inbound rule(s) to RDP to the EC2 instance from your custom IP address
    • Add outbound rule(s) to allow the EC2 instance to access the source database. Database ports vary by source database, so add rules allowing the instance access to the specific source database ports.
    • For more details, refer to the BryteFlow recommendations on Network ACLs for your VPC in the section ‘Recommended Network ACL Rules for EC2’
  • Provide the Key Pair Settings by choosing an EC2 key pair of your own, or create a key pair in EC2
  • Click ‘Launch’ to launch the EC2 instance.
  • The endpoint will be an EC2 instance running BryteFlow Ingest (as a Windows service) on port 8081


Steps to perform in your new install:

  • Connect to the new instance using ‘Remote Desktop Connection’ to the EC2 instance launched via the AMI.
  • Once connected to the EC2 instance, launch BryteFlow Ingest from the Google Chrome browser using the bookmark ‘BryteFlow Ingest’
  • Or type localhost:8081 into the Chrome browser to open the BryteFlow Ingest web console
  • This will bring up a page requesting either a ‘New Instance’ or an ‘Existing Instance’
    • Click on the ‘Existing Instance’ button, as we need to resume BryteFlow Ingest from the last saved state
    • Select the existing instance you wish to restore from the list displayed; in this example there is only one (‘Your_Ingest_Name’). Once the required instance has been selected, click on the ‘Select’ button
  • BryteFlow Ingest will collect the configuration and saved execution state of the instance selected (in this case ‘Your_Ingest_Name’) and restore accordingly.
  • Go to the ‘Connections’ tab and test the ‘Source’, ‘Destination’ and ‘File System’ connections prior to turning the Schedule on.
  • In case of any connection issues, check the firewall settings of the EC2 instance and the source systems.
  • Once all connections are ‘Tested OK’, go to the ‘Schedule’ tab and turn the schedule to ‘ON’.
  • This completes the upgrade and resumes ingestion as per the specified schedule.


BryteFlow: Licencing Model

BryteFlow’s licensing model is based on the volume of data at the source being replicated across to the destination.

Volume-based licensing is classified into the following groups:

  • 100GB
  • 300GB
  • 1TB
  • > 1TB (contact Bryte Support)

BryteFlow products are available from the AWS Marketplace for data volumes up to 1TB. For source data volumes > 1TB, it is recommended to contact BryteFlow support (email: support@bryteflow.com) for detailed information.

BryteFlow Ingest : Pricing

BryteFlow products are available for use via the AWS Marketplace. They come in two different flavors:

  1. BryteFlow Standard Edition-Data Integration for S3, Redshift, Snowflake
  2. BryteFlow Enterprise Edition-Data Integration S3, Redshift, Snowflake


BryteFlow Support Information

Each of our products is backed by our responsive support team. Please allow 24 hours for us to get back to you. To get in touch with our support team, send an email to support@bryteflow.com

Standard Support SLA: 24 hrs
Support timings: Business hours (9:00 am AEST – 5:00 pm AEST)
Support Language: English (US & UK)
Support Plan/Charges: General support is included in your contract; no additional charges are incurred for standard support. For an L2/L3 support contract, please reach out to BryteFlow support to discuss further in detail.

Appendix: Understanding Extraction Process

Extraction Process

Understanding Extraction.

Extraction has two parts to it.

  1. Initial Extract.
  2. Delta Extract.

Initial Extract.

An initial extract is done the first time a database is connected to the BryteFlow Ingest software. In this extract, the entire table is replicated from the source database to the destination (AWS S3 or AWS Redshift).

A typical extraction goes through the following stages. The example below shows an extraction with MS SQL Server as the source and an Amazon S3 bucket as the destination.

Extracting 1
Full Extract database_name:table_name
Info(ME188): Stage pre-bcp
Info(ME190): Stage post-bcp
Info(ME260): Stage post-process
Extracted 1
Full Extract database_name:table_name complete (4 records)
Load file 1
Loading table emr_database:dbo.names with 4 records(220 bytes)
Transferring null to S3
Transferred null 10,890 bytes in 8s to S3
Transferring database_name_table_name to S3

Delta Extract.

After the initial extract, once the database has been replicated to the destination, delta extracts take over. In a delta extract, only the changes on the source database are extracted and merged into the destination.

After the initial extraction is done, all further extracts are delta extracts (changes since the last extract).

A typical delta extract log is shown below.

Extracting 2
Delta Extract database_name:table_name
Info(ME188): Stage pre-bcp
Info(ME190): Stage post-bcp
Info(ME260): Stage post-process
Delta Extract database_name complete (10 records)
Extracted 2
Load file 2
Loaded file 2

First Extract

Extracting Database for the first time.

Keep all defaults. Click on Full Extract.

The first extract always has to be a Full Extract. This gets the entire table across; the deltas are then populated periodically at the desired frequency.

Schedule Extract


To configure extracts to run at a specific time, perform the following steps.

  1. For Oracle, ‘Automatic’ is preselected and the other options are disabled by default.
  2. For MS SQL Server, you can choose the period in minutes.
  3. A daily extraction can be run at a specific time of day by choosing the hour and minutes in the drop-down.
  4. Extraction can also be scheduled on specific days of the week at a fixed time by checking the checkboxes next to the days and selecting hours and minutes in the drop-down.
  5. Click Apply to save the schedule.

Add a new table to existing Extracts

After databases have been selected for extraction and are replicating, a new table can be added to the extraction process by following these steps.

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)
  • Navigate to Data tab
    • Select the new table(s) by navigating into database instance name, schema name and table name(s)
    • Configure the table, considering the following
      • Select transfer type
      • Select partitioning folder (refer to Partitioning section for details)
      • Select primary key column(s) where applicable
      • Select columns to be masked (optional, these are excluded from extraction, for example salary data)
      • Click on the ‘Apply’ button
      • Click on the ‘Full Extract’ button
      • Repeat the process if more than one table is required
  • Navigate to the Schedule Tab
    • Click on ‘Sync New Tables’ button

This will include the new table(s) for a full extract and also resume with deltas for all the previously configured tables and the newly added table(s).

Resync data for Existing tables

If the table transfer type is Primary Key with History, to resync all the data from the source, perform the following steps:

  • Click the Schedule ‘off’ (top right of screen under Schedule tab)
  • For Resync Data on ALL configured tables…
    • Navigate to Schedule tab
    • Click on the ‘Full Extract’ button
  • For Resync Data on selected tables…
    • Navigate to Data Sources tab
      • Select the table(s) by navigating into database instance name, schema name and table name(s)
      • Click on the ‘Full Extract’ button
      • Repeat the process if more than one table is required
    • Navigate to the Schedule Tab
      • Click on ‘Sync New Tables’ button


Appendix: Bryte Events for AWS CloudWatch Logs and SNS

BryteFlow Ingest supports connections to AWS CloudWatch Logs, CloudWatch Metrics and SNS. These can be used to monitor the operation of BryteFlow Ingest and to integrate with other assets leveraging the AWS infrastructure.

AWS CloudWatch Logs can be used to receive logs of events, such as load completions or failures, from BryteFlow Ingest. CloudWatch Logs can be used to monitor error conditions and raise alarms.

Below is the list of events that BryteFlow Ingest pushes to the AWS CloudWatch Logs console and to AWS SNS:

LogfileProcessed: Archive log file processed (Oracle only)
TableExtracted: Source table extract complete, MS SQL Server and Oracle (initial extracts only)
ExtractCompleted: Source extraction batch is complete
TableLoaded: Destination table load is complete
LoadCompleted: All destination table loads in a batch are complete
HaltError: Unrecoverable error occurred and turned the Scheduler to OFF
RetryError: Error occurred but will retry

Below are the details for each of the Bryte Events:

Event: LogfileProcessed

Attribute | Is Metric (Y/N)? | Description
type | N | “LogfileProcessed”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
fileSeq | N | File sequence
file | N | File name
dictLoadMS | Y | Time taken to load dictionary in ms
CurrentDBDate | N | Current database date
CurrentServerDate | N | Current Bryte server date
parseMS | Y | Time taken to parse file in ms
parseComplete | N | Timestamp when parsing is complete
sourceDate | N | Source date
Event: TableExtracted

Attribute | Is Metric (Y/N)? | Description
type | N | “TableExtracted”
subType | N | Table name
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
tabName | N | Table name
success | N | true/false
message | N | Status message
sourceTS | N | Source date time
sourceInserts | Y | No. of Inserts in source
sourceUpdates | Y | No. of Updates in source
sourceDeletes | Y | No. of Deletes in source
Event: ExtractCompleted

Attribute | Is Metric (Y/N)? | Description
type | N | “ExtractCompleted”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
jobType | N | “EXTRACT”
jobSubType | N | Extract type
success | N | Y/N
message | N | Status message
runId | N | Run Id
sourceDate | N | Source date
dbDate | N | Current database date
fromSeq | N | Start file sequence
toSeq | N | End file sequence
extractId | N | Run id for extract
tableErrors | Y | Count of table errors
tableTotals | Y | Count of total tables
Event: TableLoaded

Attribute | Is Metric (Y/N)? | Description
type | N | “TableLoaded”
subType | N | Table name
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
tabName | N | Table name
success | N | true/false
message | N | Status message
sourceTS | N | Source date time
sourceInserts | Y | No. of Inserts in source
sourceUpdates | Y | No. of Updates in source
sourceDeletes | Y | No. of Deletes in source
destInserts | Y | No. of Inserts in destination
destUpdates | Y | No. of Updates in destination
destDeletes | Y | No. of Deletes in destination
Event: LoadCompleted

Attribute | Is Metric (Y/N)? | Description
type | N | “LoadCompleted”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
jobType | N | “LOAD”
jobSubType | N | Sub type of the “LOAD”
success | N | Y/N
message | N | Status message
runId | N | Run Id
sourceDate | N | Source date
dbDate | N | Current database date
fromSeq | N | Start file sequence
toSeq | N | End file sequence
extractId | N | Run id for extract
tableErrors | Y | Count of table errors
tableTotals | Y | Count of total tables
Event: HaltError

Attribute | Is Metric (Y/N)? | Description
type | N | “HaltError”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
message | N | Error message
errorId | N | Short identifier
Event: RetryError

Attribute | Is Metric (Y/N)? | Description
type | N | “RetryError”
generated | N | Timestamp of message
source | N | Instance name
sourceType | N | “CDC”
message | N | Error message
errorId | N | Short identifier
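
Because the attributes marked Y above are published as CloudWatch metrics, alarms can be set on them. The sketch below raises an alarm whenever tableErrors is non-zero. The namespace is whatever you configured in Ingest (an assumption here), and the SNS topic ARN uses the placeholder form shown earlier.

import boto3

# Minimal sketch: alarm when the tableErrors metric is greater than zero.
# The namespace is assumed; use the one configured in BryteFlow Ingest.
cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='BryteFlowIngest-TableErrors',
    Namespace='BryteFlowIngest',  # assumption: your configured namespace
    MetricName='tableErrors',
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:<region>:<account_ID>:<topic_name>']  # placeholder ARN
)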

Appendix: Release Notes

Release details (by date descending, latest version first)

BryteFlow Ingest 3.8

Release Notes BryteFlow Ingest – v3.8 Build 1189

Released November 2019

New Features

  • Optimization for access to on-premise Oracle from the BryteFlow server on AWS
  • Support for Parquet and ORC formats for S3/EMR + Redshift
  • Process for loading large tables using Ingest and Ingest XL
  • User defined namespace for Cloudwatch Metrics
  • Exponential backoff on throttling of EMR access
  • Support for Oracle log mining on a separate Oracle instance
  • Cutoff date now has a cutoff offset as well
  • Support for processing partitions in a specified order
  • Support for Snowflake destination

Bug Fixes

  • Support for UTF-8 characters in data
  • Strings now appear correctly in Athena for files in Parquet format

Known Issues

  • Some Sync operations may show structure differences in Snowflake where none exist


BryteFlow Ingest 3.7.3

Release Notes BryteFlow Ingest – v3.7.3

Released April 2019

  • New Features 
    • Notifications: A notification icon appears on the top bar, with the number of issues shown as a bubble. The bubble is red if there is at least one error, and orange for warnings. Hovering gives the count of errors and warnings in a tooltip; clicking on the icon lists the issues in a dropdown list.
    • Help: Clicking on the help icon takes you to the online documentation.
    • Drivers: The supported source and destination drivers have been streamlined.
    • EMR tags: The EMR instance can be specified by Cluster ID, by a tag value for the tag BryteFlowIngest, or by a tag and value expressed as “tag=value”.
    • Port: Changing the web port is allowed for non-AMI installations.


  • Bug Fixes
    • An issue where an error was raised about a missing initial extract, even when the initial extract had been deliberately skipped for a table, is now fixed.
    • Some attribute values in CloudWatch Logs which were previously blank have now been fixed.
    • An issue where all pending jobs were cancelled on a failure in the current job is now fixed.
    • A redundant field shown when “S3 Files using EMR” is selected as a destination has been removed.
    • The Apply button is disabled if no changes have been made in the Source/Destination/File screens. This gets around the problem of the connection being flagged with a warning on pressing Apply even when no fields have been changed.
    • The initial extract in Oracle now sets the effective date to the database date instead of the server date.


  • Known Issues
    • Non-AMI EC2 installations may show some warning messages on startup.

BryteFlow Ingest 3.7

Released:  January 2019

  • BryteFlow Ingest 3.7 available on AWS Marketplace as an AMI
    • Pay as you go on AWS Marketplace
    • Hourly/Annual billing options on AWS Marketplace
    • No licence keys on AWS Marketplace
    • 5 day trial on AMI (new customers only)
  • Volume based licensing
    • 100GB
    • 300GB
    • 1TB
    • > 1TB (contact Bryte Support)
  • High Availability
    • Automatic backup of current state
    • Automatic cut-over and recovery following EC2 failure
    • IAM support
  • Rollback to previous saved point in time
    • Dependent upon source db logs
  • Partitioning
    • Built in to Interface
    • Partition names configurable, wide range of formats
    • AWS Athena friendly partition names
  • New S3 compression options
    • Parquet
    • ORC(Snappy)
    • ORC(Zlib)
  • Remote Monitoring (integrated with AWS Services)
    • Cloudwatch
    • Logging
    • Metrics
    • SNS