# Fetching Data

> Learn how to get the newest data back from Kombo.

To minimize latency between data changes in the connected tool and your system,
and to prevent data drift, we recommend combining data-changed notifications
with periodic full fetches.

Best-practice implementation:

* Listen to our **data-changed webhook** to receive notifications of data
  changes.
* When notified, fetch only the data that changed since your last fetch.
* Periodically **fetch all data** to fully align your dataset and correct any
  potential data drift.
* Combine both strategies for a setup that is robust and efficient.

## Overview

```mermaid theme={null}
sequenceDiagram
    participant K as Kombo
    participant YS as Your System
    participant T as Timer

    Note over K,YS: Data-changed notifications
    K->>YS: Webhook: Data Changed
    YS->>K: Fetch Updated Data
    K-->>YS: Updated Data Response

    Note over T,YS: Periodic full fetch
    T->>YS: Periodic Trigger
    YS->>K: Fetch All Data
    K-->>YS: Full Data Response
```

## Listening to data-changed webhook

We provide a webhook called `data-changed`. This is sent to your system whenever
data has changed inside Kombo. For example, after:

* We finish syncing (full or delta sync)
* We receive an update through [upstream
  webhooks](/hris/guides/upstream-webhooks)

Listening to this webhook lets you pick up changes from Kombo soon after they
happen, without having to poll our API, which gives your users the freshest data.

<Note>
  For near real-time updates, the connected tool must have upstream webhooks
  enabled for the integration. Some tools allow Kombo to enable these
  automatically, while others require a one-time manual step in the
  integration's Setup Flow by the end-customer. If upstream webhooks are not
  enabled or not supported, `data-changed` will reflect updates after the next
  scheduled sync.
</Note>

The webhook will look similar to this, with an array of models that have changed
in our database since we last sent you that webhook:

```json theme={null}
{
  "id": "FhghqjnCi9WuAoLT8Z75CFcs",
  "type": "data-changed",
  "data": {
    "integration_id": "bombohr:hris-dev",
    "integration_tool": "bombohr",
    "integration_category": "HRIS",
    "changed_models": [
      {
        "name": "hris_employees"
      }
    ]
  }
}
```
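If you consume this payload in TypeScript, a minimal sketch of typing and narrowing it could look like the following. The type is derived from the example payload above; the names `DataChangedWebhook`, `isDataChangedWebhook`, and `changedModelNames` are hypothetical and not part of a Kombo SDK. Verifying the webhook sender is a separate step and is not shown.

```typescript
// Shape of the data-changed webhook, derived from the example payload.
// These names are illustrative, not part of a Kombo SDK.
type ChangedModel = { name: string }

type DataChangedWebhook = {
  id: string
  type: 'data-changed'
  data: {
    integration_id: string
    integration_tool: string
    integration_category: string
    changed_models: ChangedModel[]
  }
}

// Narrow an untrusted JSON body to DataChangedWebhook.
// Only the fields this guide relies on are checked.
function isDataChangedWebhook(body: unknown): body is DataChangedWebhook {
  if (typeof body !== 'object' || body === null) return false
  const b = body as { type?: unknown; data?: unknown }
  if (b.type !== 'data-changed') return false
  if (typeof b.data !== 'object' || b.data === null) return false
  const d = b.data as { integration_id?: unknown; changed_models?: unknown }
  return (
    typeof d.integration_id === 'string' &&
    Array.isArray(d.changed_models) &&
    d.changed_models.every(
      (m: unknown) =>
        typeof m === 'object' &&
        m !== null &&
        typeof (m as { name?: unknown }).name === 'string',
    )
  )
}

// Collect the changed model names into a Set for cheap lookups.
function changedModelNames(webhook: DataChangedWebhook): Set<string> {
  return new Set(webhook.data.changed_models.map(m => m.name))
}
```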

**Simplified approach**

The simplest option is to pull all the models you're interested in whenever
you receive a data-changed webhook, regardless of which models the webhook
reports as changed.
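A sketch of the simplified approach, assuming you have one fetch function per endpoint you use. The `Fetcher` type and `handleDataChangedSimplified` are hypothetical names for illustration; `changed_models` is intentionally ignored.

```typescript
// Hypothetical fetcher signature: pulls one endpoint for one integration,
// optionally restricted to records updated after an ISO timestamp.
type Fetcher = (integrationId: string, updatedAfter?: string) => Promise<void>

// Simplified approach: ignore which models changed and re-pull every
// endpoint you care about, one after another.
async function handleDataChangedSimplified(
  integrationId: string,
  updatedAfter: string | undefined,
  fetchers: Fetcher[],
): Promise<void> {
  for (const fetchEndpoint of fetchers) {
    await fetchEndpoint(integrationId, updatedAfter)
  }
}
```

The fetchers run sequentially to keep the load on your system predictable; running them in parallel with `Promise.all` is an equally valid choice.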

**Recommended approach**

Alternatively, you can inspect the `changed_models` array and pull data only for
the models it lists. See the [list of models](#all-models) that can appear.

Based on this list of changed models, figure out which requests you would like
to send. Keep in mind that models do not map 1:1 to our endpoints. For example,
you might be using the `/employees` endpoint to fetch both employees and their
employments and team memberships.

After every successful fetch, store the timestamp of when that fetch *started*.
On your next fetch, pass this timestamp to the Kombo API as the `updated_after`
query parameter, and Kombo will return only the entries that changed since then.
Using the start time rather than the end time ensures that changes made while a
fetch was still running are picked up by the next one. Read more in
[Fetching only updated data](/hris/implementation-guide/reading-employees#fetching-only-updated-data).
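A minimal sketch of this bookkeeping, assuming you persist one timestamp per integration. `buildEmployeesUrl` and `runIncrementalFetch` are hypothetical helpers; the endpoint URL and the `updated_after` and `cursor` query parameters are the ones used elsewhere in this guide.

```typescript
const EMPLOYEES_URL = 'https://api.kombo.dev/v1/hris/employees'

// Build the request URL. Omitting updatedAfter produces a full fetch.
function buildEmployeesUrl(updatedAfter?: Date, cursor?: string): string {
  const url = new URL(EMPLOYEES_URL)
  if (updatedAfter) url.searchParams.set('updated_after', updatedAfter.toISOString())
  if (cursor) url.searchParams.set('cursor', cursor)
  return url.toString()
}

// Capture the start time BEFORE fetching and return it only if the fetch
// succeeds. The caller persists the result as the next `updated_after`.
async function runIncrementalFetch(
  lastFetchStartedAt: Date | undefined,
  fetchAll: (updatedAfter?: Date) => Promise<void>,
  now: () => Date = () => new Date(),
): Promise<Date> {
  const startedAt = now()
  // If fetchAll throws, nothing is returned and the old timestamp stays stored
  await fetchAll(lastFetchStartedAt)
  return startedAt
}
```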

**Good to know:** The `updated_after` filter also considers changes in models
that are returned by the endpoint as nested values. For example, the
`/employees` endpoint will include every employee that has had changes to
related employments or team memberships, even if the employee profile itself did
not change. Read more below.

## All models

Call the endpoints you care about based on the models we report as changed. The
following table lists each endpoint together with the models in the
data-changed webhook that affect its response.

We recommend the following approach:

1. **Decide which endpoints are relevant for your use case** (e.g., if your
   system is centered around employees, you primarily use Get employees).
2. **Map changed models to those endpoints** using the table.<br />For instance,
   if both `hris_employees` and `hris_employments` are listed in the webhook, but
   you primarily care about employees, you only need to call Get employees,
   since employment and team membership data is included there too.

This helps avoid unnecessary API calls while still ensuring your data is up to
date.

| **Endpoint**                                                | **Models**                                                                                                                                                               |
| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| [**Get absences**](/hris/v1/get-absences)                   | hris\_absences<br />hris\_absence\_types                                                                                                                                 |
| [**Get employees**](/hris/v1/get-employees)                 | hris\_employees<br />hris\_employments<br />hris\_join\_employees\_teams<br />hris\_legal\_entities<br />hris\_locations<br />hris\_teams<br />hris\_time\_off\_balances |
| [**Get employments**](/hris/v1/get-employments)             | hris\_employments                                                                                                                                                        |
| [**Get groups**](/hris/v1/get-groups)                       | hris\_teams                                                                                                                                                              |
| [**Get time off balances**](/hris/v1/get-time-off-balances) | hris\_time\_off\_balances<br />hris\_absence\_types                                                                                                                      |
| [**Get timesheets**](/hris/v1/get-timesheets)               | hris\_timesheets                                                                                                                                                         |
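The table can also be expressed as data, so deciding which endpoints to call becomes a lookup. In this sketch, the endpoint keys (`employees`, `absences`, ...) are hypothetical labels for the endpoints in the table; the model names are copied from it.

```typescript
// The table above as data: endpoint label -> models that affect its response.
const MODELS_BY_ENDPOINT: Record<string, readonly string[]> = {
  absences: ['hris_absences', 'hris_absence_types'],
  employees: [
    'hris_employees',
    'hris_employments',
    'hris_join_employees_teams',
    'hris_legal_entities',
    'hris_locations',
    'hris_teams',
    'hris_time_off_balances',
  ],
  employments: ['hris_employments'],
  groups: ['hris_teams'],
  time_off_balances: ['hris_time_off_balances', 'hris_absence_types'],
  timesheets: ['hris_timesheets'],
}

// Return the endpoints (out of those you use) that need re-fetching,
// given the model names from a data-changed webhook.
function endpointsToCall(
  changedModels: Iterable<string>,
  relevantEndpoints: readonly string[],
): string[] {
  const changed = new Set(changedModels)
  return relevantEndpoints.filter(endpoint =>
    (MODELS_BY_ENDPOINT[endpoint] ?? []).some(model => changed.has(model)),
  )
}
```

With `relevantEndpoints = ['employees']`, a webhook listing only `hris_employments` still triggers exactly one call to Get employees, and a webhook listing only `hris_timesheets` triggers none.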

## Preventing Data Drift

In addition to fetching data from Kombo's endpoints in response to receiving the
data-changed webhook, we recommend running a periodic full data fetch for the
data models you care about. In most cases, a 7-day schedule is ideal.

This helps to remedy any drift that may occur in your data, e.g. from accidental
manual changes or lost webhooks.
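If your scheduler runs more often than weekly (for example, a daily job over all integrations), a check like the following decides when a full fetch is due. It assumes you store a separate, hypothetical `lastFullFetchAt` timestamp per integration, distinct from the one used for `updated_after`.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000

// True if no full fetch has run yet, or the last one started at least
// `intervalDays` ago. Defaults to the 7-day schedule suggested above.
function isFullFetchDue(
  lastFullFetchAt: Date | null,
  now: Date,
  intervalDays = 7,
): boolean {
  if (lastFullFetchAt === null) return true
  return now.getTime() - lastFullFetchAt.getTime() >= intervalDays * DAY_MS
}
```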

To implement this, perform GET requests for the Kombo data models you care about
without passing the `updated_after` query parameter. Then upsert the data
returned by Kombo and merge it with your existing data copy.
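A minimal in-memory sketch of the upsert step. The `Map` stands in for your database, and `KomboRecord` is a hypothetical minimal shape; in a real system this would typically be an upsert keyed on the Kombo `id`.

```typescript
// Hypothetical minimal record shape; real Kombo records have more fields.
type KomboRecord = { id: string; [key: string]: unknown }

// Upsert a page of records returned by Kombo into the local copy:
// known IDs are overwritten with Kombo's version, new IDs are inserted,
// and records Kombo did not return are left untouched.
function upsertRecords(
  store: Map<string, KomboRecord>,
  records: readonly KomboRecord[],
): { inserted: number; updated: number } {
  let inserted = 0
  let updated = 0
  for (const record of records) {
    if (store.has(record.id)) updated++
    else inserted++
    store.set(record.id, record)
  }
  return { inserted, updated }
}
```

Because records that Kombo did not return are left in place, this sketch never removes data on its own; it does not handle deletions.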

## Full Example Code

```ts theme={null}
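import axios from 'axios'
import { PrismaClient, Integration } from '@prisma/client'

const prisma = new PrismaClient()
const KOMBO_API_KEY = process.env.KOMBO_API_KEY

// Models that can change the response of GET /hris/employees
// (see the "All models" table above)
const EMPLOYEE_MODELS = new Set([
  'hris_employees',
  'hris_employments',
  'hris_join_employees_teams',
  'hris_legal_entities',
  'hris_locations',
  'hris_teams',
  'hris_time_off_balances',
])

type DataChangedWebhook = {
  type: 'data-changed'
  data: {
    integration_id: string
    changed_models: { name: string }[]
  }
}

async function handleDataChangedWebhook(body: DataChangedWebhook) {
  // Verify webhook sender here

  // Assuming we are using Prisma as our ORM
  const integration = await prisma.integration.findUniqueOrThrow({
    where: {
      kombo_integration_id: body.data.integration_id,
    },
    select: {
      id: true, // Primary key of the integration row, needed for the update
      last_fetched_from_kombo_at: true,
      kombo_integration_id: true,
    },
  })

  // Track when WE start fetching (not when Kombo synced from the HRIS).
  // A Date is an absolute instant; toISOString() serializes it in UTC.
  const fetchStartDate = new Date()
  const lastFetchStartDate =
    integration.last_fetched_from_kombo_at?.toISOString()

  // Only fetch if a model that feeds the /employees response has changed
  const employeesChanged = body.data.changed_models.some(entry =>
    EMPLOYEE_MODELS.has(entry.name),
  )
  if (!employeesChanged) {
    // Keep the old timestamp: advancing it without fetching could skip
    // changes that happened between the last fetch and now
    return
  }

  await fetchEmployees(integration.kombo_integration_id, lastFetchStartDate)

  // Only reached if the fetch succeeded; on error the old timestamp is kept
  await prisma.integration.update({
    where: {
      id: integration.id,
    },
    data: {
      last_fetched_from_kombo_at: fetchStartDate,
    },
  })
}

async function handleCronJob(
  integration: Pick<Integration, 'id' | 'kombo_integration_id'>,
) {
  const fetchStartDate = new Date()

  // Do not pass a starting date here; fetch everything
  // Fetch the models you care about (repeat for employees, employments, groups, ...)
  await fetchEmployees(integration.kombo_integration_id)

  await prisma.integration.update({
    where: {
      id: integration.id,
    },
    data: {
      last_fetched_from_kombo_at: fetchStartDate,
    },
  })
}

async function fetchEmployees(integrationId: string, updatedAfter?: string) {
  let cursor: string | undefined
  do {
    const resp = await axios.get('https://api.kombo.dev/v1/hris/employees', {
      headers: {
        Authorization: `Bearer ${KOMBO_API_KEY}`,
        'X-Integration-Id': integrationId, // Use the stored integration ID
      },
      params: {
        cursor, // axios omits undefined params, so the first request has no cursor
        updated_after: updatedAfter, // omitted on full fetches
      },
    })

    // No `next` cursor means this was the last page
    cursor = resp.data.data.next ?? undefined

    await handleEmployeeData(integrationId, resp.data.data.results)
  } while (cursor)
}

// Implement your handling logic here: usually you will upsert the data into
// your database and run your domain-specific logic.
async function handleEmployeeData(integrationId: string, employees: unknown[]) {
  // ...
}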
```

## Understanding `changed_at` vs `updated_after` Behavior

A common source of confusion is understanding when records are returned by the
`updated_after` filter and how this relates to each record's `changed_at`
timestamp. Here's the key distinction:

### Record-level `changed_at` Field

Each record has a `changed_at` timestamp that **only updates when properties
directly on that record change**. For example:

* If an employee's `first_name` changes, the employee's `changed_at` updates
* If an employment's `job_title` changes, the employment's `changed_at` updates
* **However**: If an employment's job title changes, the related employee's
  `changed_at` field does **NOT** update

### Endpoint Filtering with `updated_after`

The `updated_after` parameter works differently. It returns a record when
**either the record itself OR its nested data has been updated**:

#### Example: Employees Endpoint

When you call `GET /employees` with `updated_after`, you'll receive employees
if:

1. **Direct employee changes**: The employee itself was modified (name, email,
   etc.)
2. **Nested employment changes**: Employment data was updated (job title,
   salary, etc.)
3. **Nested team changes**: Team membership or team details were updated
4. **Nested location changes**: Work location details were updated

This means an employee can be returned even if their own `changed_at` timestamp
hasn't changed.

#### Concrete Scenario

Let's say you call `GET /employees` at 9:00 AM and get:

```json theme={null}
{
  "id": "emp123",
  "first_name": "Sarah",
  "changed_at": "2023-10-01T08:00:00Z",
  "employments": [
    {
      "id": "employment456",
      "job_title": "Software Engineer",
      "changed_at": "2023-10-01T08:00:00Z"
    }
  ]
}
```

At 10:00 AM, the employee's job title is changed to "Senior Software Engineer"
in the HRIS. If you then call
`GET /employees?updated_after=2023-10-01T09:00:00Z`, you'll receive:

```json theme={null}
{
  "id": "emp123",
  "first_name": "Sarah",
  "changed_at": "2023-10-01T08:00:00Z", // ← Same timestamp!
  "employments": [
    {
      "id": "employment456",
      "job_title": "Senior Software Engineer", // ← Updated data
      "changed_at": "2023-10-01T10:00:00Z" // ← New timestamp
    }
  ]
}
```

**Key Point**: The employee's `changed_at` remains unchanged, but the employee
is still returned because it contains updated nested employment data.

### Best Practice

When using `updated_after` filtering:

1. **Don't assume** a record was directly modified just because it's returned
2. **Compare nested data** to determine what actually changed
3. **Use the nested objects' `changed_at` fields** to identify which parts were
   updated
4. **Design your sync logic** to handle both direct and indirect changes
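Applied to the scenario above, these practices can be sketched as follows. `Employee` and `Employment` are hypothetical minimal types containing only the fields used here; other nested objects such as team memberships would follow the same pattern. In this sketch, a record counts as changed if its `changed_at` is strictly later than the `updated_after` value you sent.

```typescript
// Hypothetical minimal shapes with only the fields this check needs.
type Employment = { id: string; changed_at: string }
type Employee = { id: string; changed_at: string; employments: Employment[] }

// For an employee returned by `updated_after=<since>`, report whether the
// employee record itself changed and which nested employments changed.
function classifyEmployeeChange(employee: Employee, since: string) {
  const sinceMs = Date.parse(since)
  const changedAfter = (ts: string) => Date.parse(ts) > sinceMs
  return {
    employeeChanged: changedAfter(employee.changed_at),
    changedEmploymentIds: employee.employments
      .filter(e => changedAfter(e.changed_at))
      .map(e => e.id),
  }
}
```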

## FAQ

### Why do you not provide the changed data inside of data-changed?

We want to make integrating with Kombo as simple as possible for you. Since we
recommend running occasional full fetches anyway, you can reuse most of your
logic for both fetch types. It also lets you iterate over and upsert data at
your own pace, instead of having to process a *really* large payload after an
initial sync. And because you only listen to this single unified webhook, you
automatically benefit from any improvements we make to help you get the
freshest data.

### How often do you send the data-changed event? Do you debounce the webhook?

Yes. By default, we debounce `data-changed` with a 30-second window. No matter
how many events we receive, you'll get at most one webhook every 30 seconds.

This works as follows: the first time the event fires, we pass it through to you
right away. We then wait 30 seconds, and all events we receive during those 30
seconds are sent to you as a single update with the next webhook.
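To make the timing concrete, the following sketch models the behavior described above as a leading-edge throttle with a trailing flush. It is our reading of the description, not Kombo's actual implementation; in particular, it assumes the batched webhook is sent when the window closes and that this opens a new 30-second window. `simulateDebounce` is a hypothetical name.

```typescript
const WINDOW_MS = 30 * 1000

// Given the times (ms) at which changes occur, return the times at which a
// data-changed webhook would be sent under the assumptions stated above.
function simulateDebounce(eventTimesMs: readonly number[]): number[] {
  const sendTimes: number[] = []
  let windowEnd = -Infinity // no window open yet
  let pending = false // events waiting for the current window to close
  for (const t of [...eventTimesMs].sort((a, b) => a - b)) {
    if (pending && t >= windowEnd) {
      // The batch went out when the window closed, which opened a new window
      sendTimes.push(windowEnd)
      pending = false
      windowEnd += WINDOW_MS
    }
    if (t >= windowEnd) {
      // No window open: deliver immediately and open a window
      sendTimes.push(t)
      windowEnd = t + WINDOW_MS
    } else {
      // Inside a window: batch into the next webhook
      pending = true
    }
  }
  if (pending) sendTimes.push(windowEnd)
  return sendTimes
}
```

Under this model, a burst of changes at 0 s, 5 s, and 10 s produces two webhooks: one immediately and one at 30 s carrying the batched changes.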

For more details, please refer to the
[webhooks](/hris/guides/webhooks#data-changed) page.
