Standardising Data Ingestion for a Leading India-Based VC Firm | Case Study

Outcome	Detail
Automated ingestion	Replaced manually run data loading pipelines
Standard framework	Created a reusable approach for multiple data sources and formats
Core stack	`BigQuery`, `Dagster`, `DBT`, `DLT`, `Sling`, `Dataflow`

A leading venture capital fund relies on startup and company data from many online and offline sources to support investment decisions. Analysts need broad, current, and reliable information about companies, and the data engineering team works closely with them to make that analysis possible.

The firm already consumed data from several public and paid providers, including well-known third-party data platforms. But ingestion into the warehouse had grown into a set of manually run pipelines with inconsistent implementation patterns.

The data varied significantly by source:

Size: from a few MB to terabytes
Format: Parquet, JSON, JSONL, and CSV
Source type: GCP, Snowflake, SFTP, and REST APIs
Frequency: daily, weekly, and monthly dumps
Schema: different structures across providers
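One way to make these dimensions explicit is to describe each feed declaratively, so the differences live in configuration rather than in pipeline code. The Python sketch below (3.9+) is purely illustrative. The `SourceSpec` class, its field names, the example feed, and the 02:00 cron run times are hypothetical assumptions, not taken from the firm's framework.

```python
from dataclasses import dataclass
from enum import Enum


class FileFormat(Enum):
    PARQUET = "parquet"
    JSON = "json"
    JSONL = "jsonl"
    CSV = "csv"


class SourceType(Enum):
    GCP = "gcp"
    SNOWFLAKE = "snowflake"
    SFTP = "sftp"
    REST = "rest"


class Frequency(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


# Hypothetical run times: 02:00 every day, every Monday, or on the 1st of the month.
CRON_BY_FREQUENCY = {
    Frequency.DAILY: "0 2 * * *",
    Frequency.WEEKLY: "0 2 * * 1",
    Frequency.MONTHLY: "0 2 1 * *",
}


@dataclass(frozen=True)
class SourceSpec:
    """Declarative description of one provider feed (hypothetical schema)."""

    name: str
    source_type: SourceType
    file_format: FileFormat
    frequency: Frequency
    approx_size_gb: float  # ranges from a few MB to terabytes across feeds
    columns: tuple[str, ...]  # expected top-level fields; differs per provider

    def cron(self) -> str:
        """Cron expression a scheduler could use for this feed."""
        return CRON_BY_FREQUENCY[self.frequency]


example = SourceSpec(
    name="provider_a_companies",  # hypothetical feed name
    source_type=SourceType.SFTP,
    file_format=FileFormat.CSV,
    frequency=Frequency.WEEKLY,
    approx_size_gb=0.05,
    columns=("company_id", "name", "founded_year"),
)
print(example.cron())  # 0 2 * * 1
```

Keeping the spec immutable (`frozen=True`) means one feed's description cannot be changed accidentally by pipeline code at run time.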

The existing ingestion setup created engineering friction:

Pipelines used different languages and tools.
Some jobs followed non-standard ETL patterns where ELT was more appropriate.
Teams used different ways to run jobs and interact with GCP services around BigQuery.
Manual steps were required across the ingestion process.
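The ELT pattern mentioned above lands source data in the warehouse largely as delivered, then cleans and types it with SQL inside the warehouse. The sketch below uses Python's built-in `sqlite3` as a local stand-in for BigQuery so it runs anywhere. The table names, columns, sample rows, and cleanup rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV delivery from a provider: values arrive as untyped text.
RAW_CSV = """company_id,name,founded_year
1, Acme Robotics ,2019
2,Beta Labs,
3,  Gamma AI,2021
"""

conn = sqlite3.connect(":memory:")

# Extract + Load: land every field as TEXT, with no reshaping in Python.
conn.execute("CREATE TABLE raw_companies (company_id TEXT, name TEXT, founded_year TEXT)")
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany(
    "INSERT INTO raw_companies VALUES (:company_id, :name, :founded_year)", rows
)

# Transform: trimming, typing and filtering happen in SQL inside the warehouse,
# the step a SQL transformation tool would own in an ELT setup.
conn.execute(
    """
    CREATE TABLE stg_companies AS
    SELECT
        CAST(company_id AS INTEGER) AS company_id,
        TRIM(name) AS name,
        CAST(founded_year AS INTEGER) AS founded_year
    FROM raw_companies
    WHERE founded_year <> ''
    """
)

for row in conn.execute("SELECT * FROM stg_companies ORDER BY company_id"):
    print(row)
# (1, 'Acme Robotics', 2019)
# (3, 'Gamma AI', 2021)
```

Because the raw table keeps the delivery untouched, a cleanup rule can be changed and re-run in SQL without fetching the data from the provider again.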

For a firm where data quality and coverage directly influence investment analysis, this created unnecessary operational load and made the ingestion layer harder to extend.

Technogise helped the firm move from manually run, inconsistent pipelines to a standard, reusable ingestion framework.

BigQuery remained the primary data warehouse. Around it, Technogise introduced a clearer orchestration and transformation model:

Dagster became the primary DAG orchestrator for building, running, and observing data and ML pipelines.
DBT handled SQL-based transformations, first-pass cleanup, freshness checks, and filtering.
DLT, Sling, and Dataflow supported intermediate transformation, format conversion, and loading from sources such as SFTP and GCS.
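An orchestrator such as Dagster treats a pipeline as a graph of dependent steps and runs each step only after its upstream steps have finished. The standard-library sketch below (Python 3.9+) illustrates only that dependency ordering with `graphlib.TopologicalSorter`. It is not Dagster's API, and the step names and the tool assigned to each step in the comments are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the steps it depends on.
# Comments suggest which tool might own each step in a stack like the one above.
PIPELINE = {
    "load_provider_a_sftp": set(),  # e.g. Sling: SFTP files -> BigQuery raw table
    "load_provider_b_gcs": set(),  # e.g. DLT: GCS JSONL -> BigQuery raw table
    "stg_companies": {"load_provider_a_sftp", "load_provider_b_gcs"},  # DBT model
    "companies_mart": {"stg_companies"},  # DBT model queried by analysts
}


def run_in_order(graph: dict[str, set[str]]) -> list[str]:
    """Return steps in an order where every step follows all of its dependencies."""
    order = list(TopologicalSorter(graph).static_order())
    for step in order:
        print(f"running {step}")  # a real orchestrator would execute the step here
    return order


order = run_in_order(PIPELINE)
```

The two load steps do not depend on each other, so an orchestrator is free to run them concurrently. `TopologicalSorter` exposes the same idea through its `prepare()`, `get_ready()` and `done()` methods.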

The framework was designed to handle the range of source characteristics the firm deals with: different file sizes, formats, schemas, delivery frequencies, and source systems.
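Handling several file formats within one framework often comes down to normalising each format into a common record shape before loading. The sketch below shows that dispatch for the text formats using only the Python standard library (3.9+). The function names are hypothetical, and Parquet is deliberately left out because reading it needs a third-party library such as `pyarrow`.

```python
import csv
import io
import json
from typing import Callable, Iterator

Record = dict[str, object]


def read_csv(text: str) -> Iterator[Record]:
    yield from csv.DictReader(io.StringIO(text))


def read_json(text: str) -> Iterator[Record]:
    data = json.loads(text)
    # Accept either a single object or an array of objects.
    yield from (data if isinstance(data, list) else [data])


def read_jsonl(text: str) -> Iterator[Record]:
    for line in text.splitlines():
        if line.strip():  # skip blank lines between records
            yield json.loads(line)


READERS: dict[str, Callable[[str], Iterator[Record]]] = {
    "csv": read_csv,
    "json": read_json,
    "jsonl": read_jsonl,
}


def read_records(text: str, file_format: str) -> list[Record]:
    """Normalise one delivery into a list of dict records, whatever its format."""
    try:
        reader = READERS[file_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {file_format}") from None
    return list(reader(text))


# The same company delivered in three formats yields identical records.
same_rows = [
    read_records("id,name\n1,Acme\n", "csv"),
    read_records('[{"id": "1", "name": "Acme"}]', "json"),
    read_records('{"id": "1", "name": "Acme"}\n', "jsonl"),
]
print(same_rows[0] == same_rows[1] == same_rows[2])  # True
```

Adding a new format then means registering one more reader rather than writing a new pipeline.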

Technogise worked closely with the data engineering team and the firm's analysts. This collaboration kept the ingestion platform aligned with the investment workflow, where analysts depend on the warehouse to evaluate startups and companies.

The team also used AI coding agents to speed up development of the platform while keeping the engineering approach consistent across pipelines.

The firm now has an automated ingestion framework instead of a collection of manually run, inconsistent pipelines.

The new setup gives the data engineering team a standard way to ingest data from multiple sources into BigQuery, orchestrate jobs through Dagster, and apply SQL-based transformation and checks through DBT.
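DBT's source freshness checks compare the most recent load timestamp of a source table (its `loaded_at_field`) against configured `warn_after` and `error_after` thresholds. The Python sketch below reproduces only that comparison so the idea is concrete. The thresholds and timestamps are hypothetical, and in a setup like this one the check would be configured in DBT rather than written by hand.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(
    last_loaded_at: datetime,
    now: datetime,
    warn_after: timedelta,
    error_after: timedelta,
) -> str:
    """Classify a source as 'pass', 'warn' or 'error' by the age of its latest load."""
    age = now - last_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Hypothetical weekly feed: warn after 8 days without new data, fail after 14.
now = datetime(2024, 6, 15, tzinfo=timezone.utc)
last_load = datetime(2024, 6, 5, tzinfo=timezone.utc)
print(freshness_status(last_load, now, timedelta(days=8), timedelta(days=14)))  # warn
```

Setting the thresholds per feed lets a weekly or monthly dump be judged against its own delivery cadence rather than a single global rule.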

The direction of change is clear:

Manual pipeline runs have been replaced by automated ingestion.
Inconsistent implementation patterns have been replaced by a shared framework.
Source-specific handling is now supported within a common engineering model.
The data engineering team has a more maintainable foundation for supporting analysts and the investment process.

For a VC firm whose investment decisions depend on broad and timely company data, the ingestion layer is now better aligned with how the business evaluates opportunities.

Leading VC Firm

How we automated and standardised data pipelines that feed a critical investment analytics warehouse.