GTFS Ingest
This page documents ingest functions in the GTFS module.
ingestor.chalicelib.gtfs.ingest
load_session_models(session)
Query all GTFS models from a SQLAlchemy session and index them into dicts keyed by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `Session` | A SQLAlchemy session connected to a GTFS SQLite database. | required |
Returns:
| Type | Description |
|---|---|
| `SessionModels` | A SessionModels instance with all models indexed by their respective keys. |
Source code in ingestor/chalicelib/gtfs/ingest.py
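The idea of indexing queried records into dicts keyed by ID can be pictured with a minimal sketch. The `Route` dataclass, its fields, and the `index_by` helper are hypothetical stand-ins for illustration; they are not the module's actual models or the real layout of `SessionModels`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Route:
    # Hypothetical stand-in for a GTFS model row returned by a query.
    route_id: str
    long_name: str


def index_by(items: Iterable[T], key: Callable[[T], Hashable]) -> Dict[Hashable, T]:
    """Build a dict mapping key(item) -> item; later duplicates overwrite earlier ones."""
    return {key(item): item for item in items}


routes = [Route("Red", "Red Line"), Route("Blue", "Blue Line")]
routes_by_id = index_by(routes, lambda r: r.route_id)
```

Keying by ID turns repeated lookups during later aggregation into constant-time dict accesses instead of repeated queries.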
create_gl_route_date_totals(totals)
Aggregate Green Line branch totals into a single combined Green Line total.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `totals` | `List[RouteDateTotals]` | List of RouteDateTotals for all routes on a given date. | required |
Returns:
| Type | Description |
|---|---|
| `RouteDateTotals` | A single RouteDateTotals representing the combined Green Line service. |
Source code in ingestor/chalicelib/gtfs/ingest.py
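A rough sketch of this kind of aggregation follows. It assumes the combined entry is a field-by-field sum over the branch entries; the `SimpleTotals` dataclass, its fields, the `"Green"` ID, and the branch route IDs are all assumptions made for illustration, not the real `RouteDateTotals` definition.

```python
from dataclasses import dataclass
from typing import List

# Assumed branch route IDs; the real module may identify branches differently.
GREEN_LINE_BRANCHES = ("Green-B", "Green-C", "Green-D", "Green-E")


@dataclass
class SimpleTotals:
    # Hypothetical, reduced stand-in for RouteDateTotals.
    route_id: str
    trip_count: int
    service_minutes: int


def combine_green_line(totals: List[SimpleTotals]) -> SimpleTotals:
    """Sum the numeric fields of every Green Line branch entry into one record."""
    branches = [t for t in totals if t.route_id in GREEN_LINE_BRANCHES]
    return SimpleTotals(
        route_id="Green",
        trip_count=sum(t.trip_count for t in branches),
        service_minutes=sum(t.service_minutes for t in branches),
    )


day_totals = [
    SimpleTotals("Green-B", 100, 4000),
    SimpleTotals("Green-C", 90, 3600),
    SimpleTotals("Red", 200, 9000),
]
combined = combine_green_line(day_totals)
```

Non-branch routes such as `"Red"` in the sample data are ignored by the filter, so the input can be the full per-date list.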
create_route_date_totals(today, models)
Create scheduled service totals for all valid routes on a given date.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `today` | `date` | The date to compute totals for. | required |
| `models` | `SessionModels` | The SessionModels containing all GTFS data. | required |
Returns:
| Type | Description |
|---|---|
| `List[RouteDateTotals]` | A list of RouteDateTotals for each valid route, including a combined Green Line entry. |
Source code in ingestor/chalicelib/gtfs/ingest.py
ingest_feed_to_dynamo(dynamodb, session, start_date, end_date)
Compute and write scheduled service totals to DynamoDB for a date range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dynamodb` | | A boto3 DynamoDB resource. | required |
| `session` | `Session` | A SQLAlchemy session connected to a GTFS SQLite database. | required |
| `start_date` | `date` | The first date to ingest (inclusive). | required |
| `end_date` | `date` | The last date to ingest (inclusive). | required |
Source code in ingestor/chalicelib/gtfs/ingest.py
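Because both `start_date` and `end_date` are inclusive, a loop over the range has to visit the end date as well. The sketch below shows one way to enumerate such a range; `date_range_inclusive` is a hypothetical helper and not necessarily how the module iterates.

```python
from datetime import date, timedelta
from typing import Iterator


def date_range_inclusive(start_date: date, end_date: date) -> Iterator[date]:
    """Yield every date from start_date through end_date, both ends included."""
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)


# 2024 is a leap year, so this range includes February 29.
days = list(date_range_inclusive(date(2024, 2, 27), date(2024, 3, 1)))
```

If `start_date` is later than `end_date`, the generator yields nothing rather than raising.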
ingest_feeds(dynamodb, feeds, start_date, end_date, force_rebuild_feeds=False)
Process a list of GTFS feeds by building or downloading each one and ingesting it into DynamoDB.
Each feed is either built locally, downloaded from S3, or reused if already present. The resulting SQLite database is then ingested into DynamoDB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dynamodb` | | A boto3 DynamoDB resource. | required |
| `feeds` | `List[GtfsFeed]` | List of GtfsFeed objects to process. | required |
| `start_date` | `date` | The first date to ingest (inclusive). | required |
| `end_date` | `date` | The last date to ingest (inclusive). | required |
| `force_rebuild_feeds` | `bool` | If True, forces all feeds to be rebuilt locally and re-uploaded to S3. Defaults to False. | `False` |
Source code in ingestor/chalicelib/gtfs/ingest.py
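The build, download, or reuse choice described above can be sketched as a small decision function. The precedence shown here (forced rebuild first, then a local copy, then S3, then a fresh build) is an assumption, and `FeedAction` and `choose_feed_action` are hypothetical names rather than part of the module.

```python
from enum import Enum


class FeedAction(Enum):
    BUILD = "build"
    DOWNLOAD = "download"
    REUSE = "reuse"


def choose_feed_action(exists_locally: bool, exists_in_s3: bool, force_rebuild: bool) -> FeedAction:
    """Pick how to obtain a feed's SQLite database (assumed precedence)."""
    if force_rebuild:
        # A forced rebuild ignores any existing local or remote copy.
        return FeedAction.BUILD
    if exists_locally:
        return FeedAction.REUSE
    if exists_in_s3:
        return FeedAction.DOWNLOAD
    # Nothing available anywhere: build from scratch.
    return FeedAction.BUILD
```

Keeping the decision in a pure function like this makes the precedence easy to test without touching the filesystem or S3.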
ingest_gtfs_feeds_to_dynamo_and_s3(date_range=None, feed_key=None, local_archive_path=None, boto3_session=None, force_rebuild_feeds=False)
Orchestrate the full GTFS ingestion pipeline from archive to DynamoDB and S3.
Either a date_range or a feed_key must be provided to identify which feeds to process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `date_range` | `Union[None, Tuple[date, date]]` | A tuple of (start_date, end_date) to select feeds covering that range. Defaults to None. | `None` |
| `feed_key` | `Union[None, str]` | A specific feed key to ingest. Defaults to None. | `None` |
| `local_archive_path` | `str \| None` | Path to store local feed files. If None, a temporary directory is used. Defaults to None. | `None` |
| `boto3_session` | `Session \| None` | An optional boto3 Session to use for AWS operations. If None, a new session is created. Defaults to None. | `None` |
| `force_rebuild_feeds` | `bool` | If True, forces all feeds to be rebuilt locally. Defaults to False. | `False` |
Raises:
| Type | Description |
|---|---|
| `Exception` | If neither date_range nor feed_key is provided. |
Source code in ingestor/chalicelib/gtfs/ingest.py
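The requirement that at least one of `date_range` or `feed_key` is given can be illustrated with a small validation sketch. `describe_selection` is a hypothetical helper, the feed key used in the usage note is made up, and the choice to prefer `date_range` when both are supplied is an assumption; only the raised `Exception` for the neither-provided case comes from the documentation above.

```python
from datetime import date
from typing import Optional, Tuple


def describe_selection(
    date_range: Optional[Tuple[date, date]] = None,
    feed_key: Optional[str] = None,
) -> str:
    """Report which selector would be used, or raise if neither is given."""
    if date_range is not None:
        # Assumed precedence: a date range wins if both arguments are passed.
        start_date, end_date = date_range
        return f"feeds covering {start_date.isoformat()} to {end_date.isoformat()}"
    if feed_key is not None:
        return f"feed {feed_key}"
    raise Exception("Either date_range or feed_key must be provided")
```

Calling `describe_selection(feed_key="example-feed-key")` returns a description of the single feed, while calling it with no arguments raises.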
get_feed_keys_for_date_range(start_date, end_date)
Retrieve the feed keys for all GTFS feeds that cover a given date range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_date` | `date` | The start of the date range (inclusive). | required |
| `end_date` | `date` | The end of the date range (inclusive). | required |
Returns:
| Type | Description |
|---|---|
| `List[str]` | A list of feed key strings for feeds overlapping the date range. |
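Selecting feeds that overlap an inclusive date range comes down to an interval intersection test. The sketch below assumes each feed carries a validity window; `FeedWindow`, `keys_overlapping`, and the sample keys are hypothetical and do not reflect how the module actually looks up feeds.

```python
from datetime import date
from typing import List, NamedTuple


class FeedWindow(NamedTuple):
    # Hypothetical record: a feed key plus the dates the feed is valid for.
    key: str
    start_date: date
    end_date: date


def keys_overlapping(feeds: List[FeedWindow], start_date: date, end_date: date) -> List[str]:
    """Return keys of feeds whose validity window intersects [start_date, end_date]."""
    return [
        f.key
        for f in feeds
        # Two inclusive intervals overlap when each starts on or before the other ends.
        if f.start_date <= end_date and f.end_date >= start_date
    ]


feeds = [
    FeedWindow("feed-a", date(2024, 1, 1), date(2024, 1, 31)),
    FeedWindow("feed-b", date(2024, 2, 1), date(2024, 2, 29)),
    FeedWindow("feed-c", date(2024, 3, 1), date(2024, 3, 31)),
]
selected = keys_overlapping(feeds, date(2024, 1, 20), date(2024, 2, 10))
```

A range that spans a feed boundary, as in the sample call, selects both adjacent feeds.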