Leading VC Firm

How we automated and standardised data pipelines that feed a critical investment analytics warehouse.

Overview

  • Automated ingestion: Replaced manually run data loading pipelines
  • Standard framework: Created a reusable approach for multiple data sources and formats
  • Core stack: BigQuery, Dagster, DBT, DLT, Sling, Dataflow

Problem

A leading venture capital fund relies on startup and company data from many online and offline sources to support investment decisions. Analysts need broad, current, and reliable information about companies, and the data engineering team works closely with them to make that analysis possible.

The firm already consumed data from several public and paid providers, including well-known third-party data platforms. But ingestion into the warehouse had grown into a set of manually run pipelines with inconsistent implementation patterns.

The data varied significantly by source:

  • Size: from a few MB to terabytes
  • Format: Parquet, JSON, JSONL, and CSV
  • Source type: GCP, Snowflake, SFTP, and REST APIs
  • Frequency: daily, weekly, and monthly dumps
  • Schema: different structures across providers

The existing ingestion setup created engineering friction:

  • Pipelines used different languages and tools.
  • Some jobs followed non-standard ETL patterns where ELT was more appropriate.
  • Teams ran jobs and interacted with the GCP services around BigQuery in different ways.
  • Manual steps were required across the ingestion process.

For a firm where data quality and coverage directly influence investment analysis, this created unnecessary operational load and made the ingestion layer harder to extend.

Solution

Technogise helped the firm move from manually run, inconsistent pipelines to a standard, reusable ingestion framework.

BigQuery remained the primary data warehouse. Around it, Technogise introduced a clearer orchestration and transformation model:

  • Dagster became the primary DAG orchestrator for building, running, and observing data and ML pipelines.
  • DBT handled SQL-based transformations, primary cleanup, freshness checks, and filtering.
  • DLT, Sling, and Dataflow supported intermediate transformation, format conversion, and loading from sources such as SFTP and GCS.
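To make the orchestration idea concrete, here is a minimal sketch of how a DAG fixes the run order of ingestion steps. It uses Python's standard-library graphlib rather than Dagster's own API. The step names are hypothetical and are not taken from the firm's pipelines.

```python
from graphlib import TopologicalSorter

# Hypothetical ingestion steps. Each key maps to the set of steps it depends on.
# This mirrors the load -> transform -> check flow described above, not the
# firm's actual pipeline definitions.
PIPELINE = {
    "load_raw_files": set(),
    "convert_format": {"load_raw_files"},
    "load_into_bigquery": {"convert_format"},
    "dbt_cleanup": {"load_into_bigquery"},
    "dbt_freshness_check": {"load_into_bigquery"},
    "publish_to_analysts": {"dbt_cleanup", "dbt_freshness_check"},
}


def execution_order(graph):
    """Return one valid run order in which every step follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
for step in order:
    print(step)
```

An orchestrator such as Dagster adds scheduling, retries, and observability on top of this kind of dependency graph. The sketch only shows the ordering guarantee.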

The framework was designed to handle the range of source characteristics the firm deals with: different file sizes, formats, schemas, delivery frequencies, and source systems.
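One way a framework like this could capture those source characteristics is a small declarative descriptor per provider. The sketch below is illustrative only: SourceSpec, MAX_AGE, is_stale, and the grace margins are assumptions made for the example, not the firm's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Source types, formats, and frequencies listed in the Problem section.
SUPPORTED_SOURCES = {"gcp", "snowflake", "sftp", "rest"}
SUPPORTED_FORMATS = {"parquet", "json", "jsonl", "csv"}

# Maximum acceptable age per delivery frequency.
# The grace margins are illustrative assumptions.
MAX_AGE = {
    "daily": timedelta(days=1, hours=6),
    "weekly": timedelta(days=8),
    "monthly": timedelta(days=32),
}


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical declarative description of one data provider."""
    name: str
    source_type: str
    file_format: str
    frequency: str

    def __post_init__(self):
        # Reject unknown values early so every pipeline shares one vocabulary.
        if self.source_type not in SUPPORTED_SOURCES:
            raise ValueError(f"unsupported source type: {self.source_type}")
        if self.file_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.file_format}")
        if self.frequency not in MAX_AGE:
            raise ValueError(f"unsupported frequency: {self.frequency}")


def is_stale(spec: SourceSpec, last_loaded: datetime, now: datetime) -> bool:
    """Return True when the last load is older than the frequency allows."""
    return now - last_loaded > MAX_AGE[spec.frequency]


# Example: a hypothetical weekly CSV drop delivered over SFTP.
spec = SourceSpec("provider_a", "sftp", "csv", "weekly")
print(is_stale(spec, datetime(2024, 1, 1), datetime(2024, 1, 15)))
```

Keeping source differences in data rather than in per-pipeline code is what lets one loading path serve many providers.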

Technogise worked closely with the data engineering team and the firm's analysts. This collaboration kept the ingestion platform aligned with the investment workflow, where analysts depend on the warehouse to evaluate startups and companies.

The team also used AI coding agents to speed up development of the platform while keeping the engineering approach consistent across pipelines.

Result

The firm now has an automated ingestion framework instead of a collection of manually run, inconsistent pipelines.

The new setup gives the data engineering team a standard way to ingest data from multiple sources into BigQuery, orchestrate jobs through Dagster, and apply SQL-based transformation and checks through DBT.

The main changes are:

  • Manual pipeline runs have been replaced by automated ingestion.
  • Inconsistent implementation patterns have been replaced by a shared framework.
  • Source-specific handling is now supported within a common engineering model.
  • The data engineering team has a more maintainable foundation for supporting analysts and the investment process.

For a VC firm whose investment decisions depend on broad and timely company data, the ingestion layer is now better aligned with how the business evaluates opportunities.

Data Pipelines
Data Analytics
Workflow Automation
