We Want To Change The World. We Believe It Starts With Education.
In India, 100 million children from low-income communities grow up without access to the 21st-century skills they need to thrive, from emotional intelligence and digital literacy to everyday money management. This gap leads to a lifetime of limited opportunity, intergenerational poverty, and rising mental health concerns.

What Are We Doing?
India’s classrooms are stuck in the past. We’re building what kids need for the future. 
At The Apprentice Project (TAP), we’re rewriting the rules of learning — using AI to unlock deeply personalized, adaptive education for every child, no matter their background. Our flagship product, TAP Buddy, uses an AI-first engine (PAL) powered by NLP, RAG systems, graph databases, and continuous student insight loops.
But for PAL to work, we need data that is clean, connected, real-time, and reliable.
That’s where you come in.
As our Data Engineer, you will build the data backbone that powers TAP Buddy and all our learning insights — from pipelines to warehouses to governance systems. You will shape how millions of learning interactions flow into intelligence that transforms student growth.
This isn’t just backend work. It’s infrastructure that fuels impact.

Who We’re Looking For
A builder who understands the full data lifecycle — ingestion, transformation, orchestration, warehousing, governance, and scaling. You thrive in fast-paced environments, care deeply about data quality, and know how to move between open-source tools, modern data platforms, and applied AI workflows.
If you’re someone who can architect data systems that are robust, real-time, and ready for AI, we want to meet you.

About The Role
Location: Remote / Hybrid (India Preferred) 
Experience: 3–5 years of relevant experience 
Compensation: ₹8–12 LPA
Reports to: Associate Senior Manager - Data & MEAL
Type: Full-Time
Tech Stack: Python, Frappe, LangChain, Neo4j, FAISS, React.js, AWS/GCP/Azure

What You’ll Do
Data Pipeline Design & Development
Design, develop, and maintain scalable data pipelines (batch and streaming) using Spark, Flink, Beam, or similar frameworks.
Ensure seamless data ingestion from multiple learning platforms and content repositories.
Automate ETL/ELT processes for real-time and historical data workflows.
Data Warehousing & Storage
Design and maintain data warehouse structures to support analytics, reporting, and AI model consumption.
Build efficient storage models for structured, semi-structured, and unstructured data.
Optimize performance for querying, indexing, and pipeline execution.
Data Quality, Governance & Metadata
Implement strong frameworks for data quality, validation, lineage, and consistency across systems.
Set up metadata management and master data management (MDM) processes.
Collaborate with cross-functional teams to ensure robust data governance practices.
Data in Motion & Data at Rest
Architect systems that support both real-time streaming data and large-scale data-at-rest frameworks.
Implement monitoring systems that track data freshness and reliability and detect anomalies.
Automation & GenAI-Driven Efficiency
Use SQL and modern GenAI tools to simplify repetitive development workflows.
Leverage open-source tools to accelerate end-to-end data engineering tasks.
Contribute to internal automations that improve speed, accuracy, and developer productivity.
Open-Source Collaboration
Contribute to TAP’s open-source initiatives around data engineering and student learning insights.
Build maintainable, documented code that reflects open-source best practices.

Who You Are
You have 3–5 years of experience in data engineering, data management, or infrastructure development.
You understand metadata management, MDM processes, data quality, and data governance deeply.
You have solid experience with Spark, Flink, Beam, or similar distributed data tools.
You write strong SQL and can design, optimize, and manage complex queries.
You use GenAI tools to accelerate development, debugging, or documentation.
You thrive in open-source environments and have 1–2 years of contributions to community or internal open-source projects.
You’re excited about applying data engineering to education, equity, and social impact.

Why Join Us? 
Because you’re not here to build yet another dashboard. You’re here to build the future.
At TAP, we’re building a system that gives every child in India the opportunity to learn with confidence, curiosity, and choice. As our Data Engineer, you will create the data foundation that makes personalized learning at scale possible.
You’ll work at the intersection of AI, education, and impact — shaping the future of how India learns.

What You Can Expect
Real ownership – You’ll lead end-to-end data engineering initiatives.
Bold challenges – Build data systems powering millions of student interactions.
Cross-functional collaboration – Work with AI, product, pedagogy, and engineering teams.
Remote flexibility – Work from anywhere, create impact everywhere.
Growth path – Opportunities to evolve into Senior Data Engineer or Data Lead roles.
Clarity & transparency – Competitive compensation and a values-driven culture.

You can always reach out to us with any questions at hr@theapprenticeproject.org