A leading venture capital fund relies on startup and company data from many online and offline sources to support investment decisions. Analysts need broad, current, and reliable information about companies, and the data engineering team works closely with them to make that analysis possible.
The firm already consumed data from several public and paid providers, including well-known third-party data platforms. But ingestion into the warehouse had grown into a set of manually run pipelines with inconsistent implementation patterns.
The data varied significantly by source:
- Formats: Parquet, JSON, JSONL, and CSV
- Sources: GCP, Snowflake, SFTP, and REST
The existing ingestion setup created engineering friction:
- ETL patterns where ELT was more appropriate.
- GCP services around BigQuery.
For a firm where data quality and coverage directly influence investment analysis, this created unnecessary operational load and made the ingestion layer harder to extend.
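To make the ETL versus ELT distinction concrete, here is a minimal sketch of the ELT pattern: rows are landed untouched in a raw table, and cleanup happens afterwards in SQL inside the database. It uses Python's built-in `sqlite3` purely as a stand-in for a warehouse. The table names, column names, and sample rows are hypothetical and are not taken from the firm's pipelines.

```python
import sqlite3

# Illustrative rows as a provider might deliver them: every value is text.
raw_rows = [("Acme", "120"), ("Globex", ""), ("Initech", "45")]


def load_then_transform(rows):
    """Land rows as-is, then expose a cleaned view defined in SQL."""
    conn = sqlite3.connect(":memory:")

    # Load: no cleanup on the way in, so the raw data stays available.
    conn.execute("CREATE TABLE raw_companies (name TEXT, employees TEXT)")
    conn.executemany("INSERT INTO raw_companies VALUES (?, ?)", rows)

    # Transform: typing and filtering live in SQL, next to the data.
    conn.execute(
        """
        CREATE VIEW companies AS
        SELECT name, CAST(employees AS INTEGER) AS employees
        FROM raw_companies
        WHERE employees <> ''
        """
    )
    return conn


conn = load_then_transform(raw_rows)
cleaned = conn.execute(
    "SELECT name, employees FROM companies ORDER BY name"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_companies").fetchone()[0]
```

Because the raw table keeps all three rows, a change to the cleanup rule only means redefining the view. The data does not have to be fetched from the provider again.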
Technogise helped the firm move from manually run, inconsistent pipelines to a standard, reusable ingestion framework.
BigQuery remained the primary data warehouse. Around it, Technogise introduced a clearer orchestration and transformation model:
- Dagster became the primary DAG orchestrator for building, running, and observing data and ML pipelines.
- DBT handled SQL-based transformations, primary cleanup, freshness checks, and filtering.
- DLT, Sling, and Dataflow supported intermediate transformation, format conversion, and loading from sources such as SFTP and GCS.
The framework was designed to handle the range of source characteristics the firm deals with: different file sizes, formats, schemas, delivery frequencies, and source systems.
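One way a reusable framework can absorb this variety is to describe each feed declaratively and dispatch on its format. The sketch below is an assumption about how that could look, not the firm's actual code. `SourceSpec`, `READERS`, and `extract` are hypothetical names, and it uses only the Python standard library. Parquet is left out because reading it needs a third-party library.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical description of one feed; field names are illustrative."""
    name: str
    file_format: str  # "json", "jsonl", or "csv"
    schedule: str     # e.g. "daily"; informational only in this sketch


def read_json(text: str) -> Iterator[dict]:
    data = json.loads(text)
    # Accept either a single object or a list of objects.
    yield from (data if isinstance(data, list) else [data])


def read_jsonl(text: str) -> Iterator[dict]:
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            yield json.loads(line)


def read_csv(text: str) -> Iterator[dict]:
    yield from csv.DictReader(io.StringIO(text))


# Adding a new format means registering one more reader here.
READERS: Dict[str, Callable[[str], Iterator[dict]]] = {
    "json": read_json,
    "jsonl": read_jsonl,
    "csv": read_csv,
}


def extract(spec: SourceSpec, payload: str) -> List[dict]:
    """Parse a payload into rows and tag each row with its source name."""
    try:
        reader = READERS[spec.file_format]
    except KeyError:
        raise ValueError(f"unsupported format: {spec.file_format}") from None
    return [{**row, "_source": spec.name} for row in reader(payload)]
```

With this shape, onboarding a new provider is mostly a matter of adding a `SourceSpec`, for example `extract(SourceSpec("provider_a", "jsonl", "daily"), payload)`, rather than writing a new pipeline from scratch.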
Technogise worked closely with the data engineering team and the firm's analysts. This collaboration kept the ingestion platform aligned with the investment workflow, where analysts depend on the warehouse to evaluate startups and companies.
The team also used AI coding agents to speed up development of the platform while keeping the engineering approach consistent across pipelines.
The firm now has an automated ingestion framework instead of a collection of manually run, inconsistent pipelines.
The new setup gives the data engineering team a standard way to ingest data from multiple sources into BigQuery, orchestrate jobs through Dagster, and apply SQL-based transformation and checks through DBT.
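As an illustration of what a freshness check decides, the sketch below compares the age of the newest loaded row against a warning threshold and an error threshold. The two thresholds mirror the `warn_after` and `error_after` settings of dbt's source freshness feature, but this is a simplified stand-in written for this article, not dbt's implementation. The function name and the example thresholds are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_status(
    last_loaded_at: datetime,
    warn_after: timedelta,
    error_after: timedelta,
    now: Optional[datetime] = None,
) -> str:
    """Return "pass", "warn", or "error" based on the age of the newest row.

    Both datetimes are expected to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Example: a feed expected daily, with hypothetical 12h and 24h thresholds.
check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
status = freshness_status(
    last_loaded_at=check_time - timedelta(hours=13),
    warn_after=timedelta(hours=12),
    error_after=timedelta(hours=24),
    now=check_time,
)
```

In this example the newest row is 13 hours old, which is past the warning threshold but inside the error threshold, so `status` is `"warn"`.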
The direction of change is clear:
For a VC firm whose investment decisions depend on broad and timely company data, the ingestion layer is now better aligned with how the business evaluates opportunities.