Steps to restart Ingest.
Steps to perform a Rollback in Ingest.
Steps to perform a Full Extract when an Initial Sync needs to be performed for all tables configured in a pipeline.
Steps to redo the Initial Extract for existing tables.
Steps to perform deltas without an Initial Sync of the data, or for XL-Ingest.
Steps to add only new tables into existing pipelines.
Steps to perform a Sync Struct for existing tables when the source structure changes.
Steps to stop Ingest.
The BryteFlow setup guide details how to enable supplemental logging on the source table(s) for Delta replication. Please refer to the setup guide:
https://docs.bryteflow.com/bryteflow-setup-guide#prerequisite-for-oracle-source
Important note:
Both database-level and table-level supplemental logging are required.
Steps to create an additional Ingest pipeline.
Latency is calculated as the time from when a record is committed on the source until the record is available on the destination.
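As an illustration, the latency of a single record is simply the difference between its destination-availability timestamp and its source commit timestamp. The function name and the timestamps below are hypothetical, chosen only to show the calculation:

```python
from datetime import datetime, timezone

def record_latency(commit_time: datetime, available_time: datetime) -> float:
    """Return replication latency in seconds for one record."""
    return (available_time - commit_time).total_seconds()

# Hypothetical example: record committed on the source at 13:04:23 UTC,
# visible on the destination at 13:04:53 UTC -> 30 seconds of latency.
committed = datetime(2023, 6, 15, 13, 4, 23, tzinfo=timezone.utc)
available = datetime(2023, 6, 15, 13, 4, 53, tzinfo=timezone.utc)
print(record_latency(committed, available))  # 30.0
```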
S3 Delta files contain the incremental data for a particular run from the source database.
The file has 2 types of fields:
1. seq_no – This field determines the order of each record. The value is a sequence number, incremented by 1 for each record within a file. For incremental files, the value determines the order of DML operations performed on the source.
2. eff_dt – This field is present in the files for tables whose Delta replication is set to 'Primary Key with History'. It determines the commit time of the record on the source. Each record carries its commit time as 'EFF_DT' in the delta files.
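A minimal sketch of how a downstream consumer might use these fields, replaying delta records in source commit order. The in-memory records and the COLUMN1 payload field below are fabricated for illustration; real delta files are Parquet and carry the table's actual columns alongside seq_no and eff_dt:

```python
# Hypothetical delta records; seq_no gives the DML order within the file,
# eff_dt the commit time on the source.
delta_records = [
    {"seq_no": 2, "eff_dt": "2023-06-15 13:04:25", "COLUMN1": "b"},
    {"seq_no": 1, "eff_dt": "2023-06-15 13:04:23", "COLUMN1": "a"},
    {"seq_no": 3, "eff_dt": "2023-06-15 13:04:30", "COLUMN1": "c"},
]

# Replay the DML operations in source order by sorting on seq_no.
ordered = sorted(delta_records, key=lambda r: r["seq_no"])
for rec in ordered:
    print(rec["seq_no"], rec["eff_dt"], rec["COLUMN1"])
```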
BryteFlow Ingest delivers data to the S3 data lake from relational databases such as SAP, Oracle, SQL Server, Postgres, and MySQL, either in real time or as changed data in batches (as per configuration), using log-based CDC.
The upsert on the S3 data lake is automated and requires no coding or integration with any third-party application.
It prepares a manifest file on Amazon S3 with the details of the files loaded onto the data bucket.
Location: [data directory path]/manifest/[table_name_{partitionname}].manifest
File Type: manifest
File format: JSON
File Details: The file contains the list of S3 URLs for all the data files created for each table.
Sample entry for the manifest file:
{
  "entries": [
    {"url": "s3://samplebucket/data/SAMPLETABLE_A/20230615130423/part-r-00000.snappy.parquet", "meta": {"content_length": 26267}, "mandatory": true}
  ]
}
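Because the manifest is plain JSON, listing the data-file URLs for a load is straightforward. A sketch using the sample entry above (the parsing code is illustrative, not part of BryteFlow):

```python
import json

# The sample manifest entry from the documentation above.
manifest_text = """
{
  "entries": [
    {"url": "s3://samplebucket/data/SAMPLETABLE_A/20230615130423/part-r-00000.snappy.parquet",
     "meta": {"content_length": 26267},
     "mandatory": true}
  ]
}
"""

manifest = json.loads(manifest_text)
# Collect every data-file URL listed in the manifest.
urls = [entry["url"] for entry in manifest["entries"]]
for entry in manifest["entries"]:
    print(entry["url"], entry["meta"]["content_length"])
```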
BryteFlow Ingest also prepares a .struct file on Amazon S3 with the details of the table structure of the loaded tables.
Location: [data directory path]/manifest/[table_name_{partitionname}].struct
File Type: .struct
File format: JSON
File Details: The file contains the structure of the table, with the column list and data types.
Sample entry for the STRUCT file:
{
  "manifest": "s3://samplebucket/data/manifest/SAMPLETABLE_A.manifest",
  "srcTable": "default:SAMPLETABLE_A",
  "dstFile": "SAMPLETABLE_A",
  "dstCleanFile": "SAMPLETABLE_A",
  "outputFormat": "parquet.snappy",
  "structure": [
    {"name": "COLUMN1", "type": "VARCHAR"},
    {"name": "COLUMN2", "type": "INTEGER"},
    {"name": "COLUMN3", "type": "VARCHAR"},
    {"name": "COLUMN4", "type": "INTEGER"}
  ]
}
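Since the .struct file is also JSON, a downstream job can derive a column/type mapping from it. A sketch against a trimmed version of the sample above; the CREATE TABLE string is purely illustrative (real target DDL depends on the destination engine's type names):

```python
import json

# Trimmed version of the sample .struct entry from the documentation above.
struct_text = """
{
  "srcTable": "default:SAMPLETABLE_A",
  "dstFile": "SAMPLETABLE_A",
  "outputFormat": "parquet.snappy",
  "structure": [
    {"name": "COLUMN1", "type": "VARCHAR"},
    {"name": "COLUMN2", "type": "INTEGER"}
  ]
}
"""

struct = json.loads(struct_text)
# Map each column name to its declared data type.
columns = {col["name"]: col["type"] for col in struct["structure"]}

# Build an illustrative CREATE TABLE statement from the column list.
ddl = "CREATE TABLE {} ({})".format(
    struct["dstFile"],
    ", ".join(f"{name} {dtype}" for name, dtype in columns.items()),
)
print(ddl)
```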
Steps for enabling Windows authentication in BryteFlow Ingest:
1. Stop the BryteFlow Ingest service.
2. Download sqljdbc_auth.dll and copy it to the bin directory of your Java installation. You can download the DLL from https://bryteflow.com/
3. Start the Ingest service and add the following to the JDBC options in the Source database connection settings:
integratedSecurity=true
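For example, a SQL Server JDBC URL with this option appended might look like the following; the host, port, and database name are placeholders, not values from this guide:

```
jdbc:sqlserver://myhost:1433;databaseName=mydb;integratedSecurity=true
```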
BryteFlow Ingest can be set up to Rollover in a parallel pipeline.
Below are the steps to roll over. Please note this can also be implemented using BryteFlow Ingest REST API calls, with details below: