Commit c4d4e95

Adds support for AWS OpenSearch Serverless (AOSS)
Why are these changes being introduced:

* We are planning a migration to AOSS
* We need to maintain our existing AWS OpenSearch Service (ES) integration while migrating to AOSS

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/USE-423

How does this address that need:

* Added support for AWS OpenSearch Serverless (AOSS) using either expiring credentials by passing a session token or by assuming a role.
* Configured the application to support AWS OpenSearch Serverless (AOSS) in addition to the existing AWS OpenSearch Service (ES).
* Added logic to choose the appropriate client based on environment variables.
* Implemented AWS SigV4 signing for AOSS authentication.

Document any side effects to this change:

* Updated lambda configuration to support session tokens. It does not have assume role configuration at this time, but we needed to support temporary credentials in the lambda to support them locally in OpenSearch Serverless (AOSS) so I included that in this change.
* Reorganized documentation on environment variables
* Allow changing log level in development
* Changed a log level for a metric we aren't using yet
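
The client-selection logic named in the commit message is not among the files shown on this page. A minimal sketch of how such a decision could look, assuming the environment variables documented in this commit's README changes (`AWS_AOSS`, `AWS_SESSION_TOKEN`, `AWS_AOSS_ROLE_ARN`, `AWS_OPENSEARCH`). The method names `truthy?` and `opensearch_client_mode`, the returned symbols, and the case-insensitive `true` check are all hypothetical, not the author's implementation:

```ruby
# Hypothetical helper: treat only the string "true" (any case) as enabled.
def truthy?(value)
  value.to_s.strip.casecmp?('true')
end

# Hypothetical helper: decide which OpenSearch client setup applies.
# Accepts any Hash-like object (ENV works), so it can be exercised
# without touching real process state.
def opensearch_client_mode(env)
  if truthy?(env['AWS_AOSS'])
    # Temporary credentials win: with a session token, no role is needed.
    return :aoss_session_token unless env['AWS_SESSION_TOKEN'].to_s.strip.empty?
    return :aoss_assume_role unless env['AWS_AOSS_ROLE_ARN'].to_s.strip.empty?

    raise ArgumentError, 'AWS_AOSS=true requires AWS_SESSION_TOKEN or AWS_AOSS_ROLE_ARN'
  end

  # Legacy AWS OpenSearch Service, otherwise an unsigned (e.g. local) client.
  truthy?(env['AWS_OPENSEARCH']) ? :aws_opensearch_service : :unsigned
end
```

Raising when neither credential source is configured surfaces a misconfiguration at boot rather than on the first query; whether the application does this is an assumption.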
1 parent 7c02df2 commit c4d4e95

9 files changed

Lines changed: 488 additions & 64 deletions

Gemfile

Lines changed: 4 additions & 1 deletion
@@ -4,6 +4,8 @@ git_source(:github) { |repo| "https://github.com/#{repo}.git" }
 ruby '3.4.8'
 
 gem 'aws-sdk-lambda'
+gem 'aws-sdk-sts'
+gem 'aws-sigv4'
 gem 'bootsnap', require: false
 gem 'devise'
 gem 'faraday_middleware-aws-sigv4'
@@ -12,6 +14,7 @@ gem 'graphql'
 gem 'jwt'
 gem 'lograge'
 gem 'mitlibraries-theme', git: 'https://github.com/mitlibraries/mitlibraries-theme', tag: 'v1.4'
+gem 'opensearch-aws-sigv4'
 gem 'opensearch-ruby'
 gem 'puma'
 gem 'rack-attack'
@@ -24,7 +27,7 @@ gem 'sentry-ruby'
 gem 'uglifier'
 
 group :production do
-  gem 'connection_pool', '< 3' # 3.x requires keyword args; pin to 2.x for Rails 7.2.3
+  gem 'connection_pool', '< 3' # 3.x requires keyword args; pin to 2.x for Rails 7.2.3
   gem 'pg'
 end
 
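
The gems added here (`aws-sdk-sts`, `aws-sigv4`, `opensearch-aws-sigv4`) are the pieces typically combined to sign OpenSearch requests with SigV4. A minimal sketch of the session-token path, assuming the documented constructors of `Aws::Sigv4::Signer` and `OpenSearch::Aws::Sigv4Client`. The method names `aoss_signer_options` and `build_aoss_client` are hypothetical, and the application's OpenSearch initializer is not shown on this page, so this may differ from what the commit actually does:

```ruby
# Hypothetical helper: keyword arguments for Aws::Sigv4::Signer.
# 'aoss' is the SigV4 service name for OpenSearch Serverless.
# The 'us-east-1' fallback mirrors config/initializers/lambda.rb.
def aoss_signer_options(env)
  {
    service: 'aoss',
    region: env.fetch('AWS_REGION', 'us-east-1'),
    access_key_id: env.fetch('AWS_ACCESS_KEY_ID'),
    secret_access_key: env.fetch('AWS_SECRET_ACCESS_KEY'),
    session_token: env.fetch('AWS_SESSION_TOKEN')
  }
end

# Hypothetical helper: build a signed client. The gems are required lazily
# so the options helper above can be used without them installed.
def build_aoss_client(env)
  require 'aws-sigv4'
  require 'opensearch-aws-sigv4'

  signer = Aws::Sigv4::Signer.new(**aoss_signer_options(env))
  OpenSearch::Aws::Sigv4Client.new({ host: env.fetch('OPENSEARCH_URL') }, signer)
end
```

For the role-assumption path, `Aws::Sigv4::Signer` also accepts a `credentials_provider:` option, and `Aws::AssumeRoleCredentials` from `aws-sdk-sts` is one provider that refreshes credentials automatically; how the application wires these together is not visible here.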

Gemfile.lock

Lines changed: 9 additions & 0 deletions
@@ -102,6 +102,9 @@ GEM
     aws-sdk-lambda (1.176.0)
       aws-sdk-core (~> 3, >= 3.244.0)
       aws-sigv4 (~> 1.5)
+    aws-sdk-sts (1.12.0)
+      aws-sdk-core (~> 3, >= 3.110.0)
+      aws-sigv4 (~> 1.1)
     aws-sigv4 (1.12.1)
       aws-eventstream (~> 1, >= 1.0.2)
     base64 (0.3.0)
@@ -277,6 +280,9 @@ GEM
     nokogiri (1.19.1)
       mini_portile2 (~> 2.8.2)
       racc (~> 1.4)
+    opensearch-aws-sigv4 (1.3.0)
+      aws-sigv4 (>= 1)
+      opensearch-ruby (>= 1.0.1, < 4.0)
     opensearch-ruby (3.4.0)
       faraday (>= 1.0, < 3)
       multi_json (>= 1.0)
@@ -478,6 +484,8 @@ PLATFORMS
 DEPENDENCIES
   annotate
   aws-sdk-lambda
+  aws-sdk-sts
+  aws-sigv4
   bootsnap
   byebug
   capybara
@@ -497,6 +505,7 @@ DEPENDENCIES
   minitest (< 6)
   mitlibraries-theme!
   mocha
+  opensearch-aws-sigv4
   opensearch-ruby
   pg
   puma

README.md

Lines changed: 83 additions & 44 deletions
@@ -4,6 +4,26 @@
 
 This application interfaces with an OpenSearch backend and exposes a GraphQL endpoint to allow anonymous users to query our data.
 
+- [Architecture Decision Records](#architecture-decision-records)
+- [Developing this application](#developing-this-application)
+- [Generating cassettes for tests](#generating-cassettes-for-tests)
+- [Confirming functionality after updating dependencies](#confirming-functionality-after-updating-dependencies)
+- [Publishing User Facing Documentation](#publishing-user-facing-documentation)
+- [Running jekyll documentation locally](#running-jekyll-documentation-locally)
+- [Automatic generation of technical specifications from GraphQL](#automatic-generation-of-technical-specifications-from-graphql)
+- [General Configuration](#general-configuration)
+- [Name and Domain](#name-and-domain)
+- [Authentication](#authentication)
+- [Email Configuration](#email-configuration)
+- [Observability (Optional)](#observability-optional)
+- [Rate Limiting (Optional)](#rate-limiting-optional)
+- [AWS Configuration](#aws-configuration)
+- [OpenSearch Configuration](#opensearch-configuration)
+- [AWS Credentials (Used for AWS-based OpenSearch and timdex-semantic-builder)](#aws-credentials-used-for-aws-based-opensearch-and-timdex-semantic-builder)
+- [AWS OpenSearch Service (Legacy)](#aws-opensearch-service-legacy)
+- [AWS OpenSearch Serverless (AOSS)](#aws-opensearch-serverless-aoss)
+- [TIMDEX Semantic Builder Lambda](#timdex-semantic-builder-lambda)
+
 ## Architecture Decision Records
 
 This repository contains Architecture Decision Records in the
@@ -127,7 +147,7 @@ to ensure everything looks as expected.
 bundle exec jekyll serve --incremental --source ./docs
 ```
 
-Once the jekyll server is running, you can access the local docs at http://localhost:4000/timdex/
+Once the jekyll server is running, you can access the local docs at <http://localhost:4000/timdex/>
 
 Note: it is important to load the documentation from the `/timdex/` path locally as that is how it works when built and deployed to GitHub Pages so testing locally the same way will ensure our asset paths will work when deployed.
 
@@ -146,51 +166,70 @@ The config file `./docs/reference/_spectaql_config.yml` controls the build proce
 and making changes to this file (which is included in version control) would be the main reason to run the process
 locally.
 
-## Required Environment Variables (all ENVs)
-
-- `EMAIL_FROM`: email address to send message from, including the registration
-  and forgot password messages.
-- `EMAIL_URL_HOST` - base url to use when sending emails that link back to the
-  application. In development, often `localhost:3000`. On heroku, often
-  `yourapp.herokuapp.com`. However, if you use a custom domain in production,
-  that should be the value you use in production.
-- `JWT_SECRET_KEY`: generate with `rails secret`
+## General Configuration
 
-## Production required Environment Variables
+### Name and Domain
 
-- `AWS_ACCESS_KEY_ID`: AWS credentials for OpenSearch and Lambda
-- `AWS_SECRET_ACCESS_KEY`: AWS credentials for OpenSearch and Lambda
-- `AWS_REGION`: AWS region for OpenSearch and Lambda services
-- `AWS_OPENSEARCH`: boolean. Set to true to enable AWSv4 Signing for OpenSearch
-- `OPENSEARCH_INDEX`: Opensearch index or alias to query, default will be to search all indexes which is generally not
-  expected. `timdex` or `all-current` are aliases used consistently in our data pipelines, with
-  `timdex` being most likely what most use cases will want.
-- `OPENSEARCH_URL`: Opensearch URL, defaults to `http://localhost:9200`
-- `TIMDEX_SEMANTIC_BUILDER_FUNCTION_NAME`: AWS Lambda function name with alias for semantic query building.
-  Configurable to use alternative deployment tiers (e.g., dev1, stage, prod).
-- `SMTP_ADDRESS`
-- `SMTP_PASSWORD`
-- `SMTP_PORT`
-- `SMTP_USER`
-
-## Optional Environment Variables (all ENVs)
-
-- `AWS_SESSION_TOKEN`: AWS session token for temporary credentials when using expiring AWS credentials
-- `OPENSEARCH_LOG` if `true`, verbosely logs OpenSearch queries.
-
-```text
-NOTE: do not set this ENV at all if you want ES logging fully disabled.
-Setting it to `false` is still setting it and you will be annoyed and
-confused.
-```
-- `OPENSEARCH_SOURCE_EXCLUDES` comma separated list of fields to exclude from the OpenSearch `_source` field. Leave unset to return all fields.
-  - recommended value: `embedding_full_record,fulltext`
 - `PLATFORM_NAME`: The value set is added to the header after the MIT Libraries logo. The logic and CSS for this comes from our theme gem.
-- `PREFERRED_DOMAIN` - set this to the domain you would like to to use. Any
-  other requests that come to the app will redirect to the root of this domain.
-  This is useful to prevent access to herokuapp.com domains.
-- `REQUESTS_PER_PERIOD` - requests allowed before throttling. Default is 100.
-- `REQUEST_PERIOD` - number of minutes for the period in `REQUESTS_PER_PERIOD`.
-  Default is 1.
+- `PREFERRED_DOMAIN`: set this to the domain you would like to use. Any other requests that come to the app will redirect to the root of this domain. This is useful to prevent access to herokuapp.com domains.
+
+### Authentication
+
+- `JWT_SECRET_KEY`: generate with `rails secret` **required**
+
+### Email Configuration
+
+- `EMAIL_FROM`: email address to send message from, including the registration and forgot password messages. **required**
+- `EMAIL_URL_HOST`: base url to use when sending emails that link back to the application. In development, often `localhost:3000`. On heroku, often `yourapp.herokuapp.com`. However, if you use a custom domain in production, that should be the value you use in production. **required**
+- `SMTP_ADDRESS`: SMTP server address (Required for production)
+- `SMTP_PORT`: SMTP server port (Required for production)
+- `SMTP_USER`: SMTP authentication user (Required for production)
+- `SMTP_PASSWORD`: SMTP authentication password (Required for production)
+
+### Observability (Optional)
+
+- `RAILS_LOG_LEVEL`: defaults to debug in development and info in production
 - `SENTRY_DSN`: client key for Sentry exception logging
 - `SENTRY_ENV`: Sentry environment for the application. Defaults to 'unknown' if unset.
+
+### Rate Limiting (Optional)
+
+- `REQUESTS_PER_PERIOD`: requests allowed before throttling. Default is 100.
+- `REQUEST_PERIOD`: number of minutes for the period in `REQUESTS_PER_PERIOD`. Default is 1.
+
+## AWS Configuration
+
+### OpenSearch Configuration
+
+- `OPENSEARCH_URL`: OpenSearch endpoint URL, defaults to `http://localhost:9200`
+- `OPENSEARCH_INDEX`: OpenSearch index or alias to query. Defaults to searching all indexes (generally not recommended). `timdex` or `all-current` are aliases used consistently in our data pipelines, with `timdex` being most likely what most use cases will want. **required**
+- `OPENSEARCH_LOG`: if set to `true` (case-insensitive), verbosely logs OpenSearch queries. Leave unset, or set to any other value such as `false`, to keep OpenSearch logging disabled.
+- `OPENSEARCH_SOURCE_EXCLUDES`: comma-separated list of fields to exclude from the OpenSearch `_source` field. Leave unset to return all fields. Recommended value: `embedding_full_record,fulltext`
+
+### AWS Credentials (Used for AWS-based OpenSearch and timdex-semantic-builder)
+
+- `AWS_ACCESS_KEY_ID`: AWS access key for OpenSearch and Lambda
+- `AWS_SECRET_ACCESS_KEY`: AWS secret key for OpenSearch and Lambda
+- `AWS_REGION`: AWS region for OpenSearch and Lambda services
+- `AWS_SESSION_TOKEN`: (Optional) AWS session token for temporary credentials when using expiring AWS credentials.
+  Use this with temporary AWS credentials for AWS-based OpenSearch access and Lambda.
+  For AOSS, when this is set, temporary credentials are used directly and `AWS_AOSS_ROLE_ARN` is not needed.
+
+### AWS OpenSearch Service (Legacy)
+
+This is our legacy AWS OpenSearch Service Cluster. All production instances should use this until our migration to Serverless (AOSS) is complete.
+
+- `AWS_OPENSEARCH`: boolean. Set to `true` to enable AWS SigV4 signing for AWS OpenSearch Service. This is the legacy approach and will be replaced with `AWS_AOSS` when we complete our migration to Serverless.
+
+### AWS OpenSearch Serverless (AOSS)
+
+This is our upcoming configuration once migration is complete. This uses a different [authentication mechanism](https://github.com/awsdocs/amazon-opensearch-service-developer-guide/blob/master/doc_source/serverless-clients.md#ruby) than our legacy AWS OpenSearch Service.
+
+- `AWS_AOSS`: boolean. Set to `true` to enable AWS OpenSearch Serverless (AOSS).
+- `AWS_AOSS_ROLE_ARN`: AWS IAM role ARN to assume for AOSS authentication. **Required when** `AWS_AOSS=true` **and** `AWS_SESSION_TOKEN` is not set. This enables automatic credential refresh via role assumption.
+  When `AWS_SESSION_TOKEN` is present, temporary credentials are used directly and `AWS_AOSS_ROLE_ARN` is not needed. This is only used in local development. `AWS_AOSS_ROLE_ARN` is used in production.
+
+### TIMDEX Semantic Builder Lambda
+
+- `TIMDEX_SEMANTIC_BUILDER_FUNCTION_NAME`: AWS Lambda function name with alias for semantic query building.
+  Configurable to use alternative deployment tiers (e.g., dev1, stage, prod). Generally takes the format `function_name:live` where `live` is the alias. Failure to include the alias will result in extremely slow performance at best. Use the alias. Note: the lambda must be in the same AWS account as OpenSearch. If you want to test dev1 OpenSearch, you must also switch the lambda name to a dev1 variant.
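
The README text for `OPENSEARCH_LOG` and `OPENSEARCH_SOURCE_EXCLUDES` describes two small parsing rules: a case-insensitive `true` check and a comma-separated field list. A minimal sketch of both, assuming a Hash-like environment. The method names are hypothetical, and stripping whitespace around values is an assumption the README does not state:

```ruby
# Hypothetical helper: logging is on only for the literal value "true"
# (any case). Unset, "false", or anything else leaves it disabled.
def opensearch_log_enabled?(env)
  env['OPENSEARCH_LOG'].to_s.strip.casecmp?('true')
end

# Hypothetical helper: turn the comma-separated variable into an Array of
# field names. Unset or empty yields [] (meaning: return all fields).
def opensearch_source_excludes(env)
  env['OPENSEARCH_SOURCE_EXCLUDES'].to_s.split(',').map(&:strip).reject(&:empty?)
end
```

With the recommended value, `opensearch_source_excludes('OPENSEARCH_SOURCE_EXCLUDES' => 'embedding_full_record,fulltext')` gives a two-element array that could be passed as `_source` excludes in a query body.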

app/graphql/timdex_field_usage_analyzer.rb

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 class TimdexFieldUsageAnalyzer < GraphQL::Analysis::AST::FieldUsage
   # This overrides a GraphQL::Analysis::AST::FieldUsage method
   def result
-    Rails.logger.info("GraphQL used fields: #{@used_fields.to_a}")
+    Rails.logger.debug("GraphQL used fields: #{@used_fields.to_a}")
     Rails.logger.info("GraphQL used deprecated fields: #{@used_deprecated_fields.to_a}")
     Rails.logger.info("GraphQL used deprecated arguments: #{@used_deprecated_arguments.to_a}")
     {

config/environments/development.rb

Lines changed: 3 additions & 0 deletions
@@ -65,6 +65,9 @@
   # Suppress logger output for asset requests.
   config.assets.quiet = true
 
+  # Allow changing log level in development
+  config.log_level = ENV['RAILS_LOG_LEVEL'] || :debug
+
   # Raises error for missing translations.
   # config.i18n.raise_on_missing_translations = true
 
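
One detail of `ENV['RAILS_LOG_LEVEL'] || :debug`: the fallback applies only when the variable is unset, because an empty string is truthy in Ruby and would be passed to `config.log_level` as is. A minimal sketch of a stricter variant, assuming a hypothetical helper named `resolve_log_level` that is not part of this commit:

```ruby
# Hypothetical helper: resolve the log level from a Hash-like environment,
# treating unset, empty, and whitespace-only values as "use the default".
def resolve_log_level(env, default: :debug)
  value = env['RAILS_LOG_LEVEL'].to_s.strip.downcase
  value.empty? ? default : value.to_sym
end
```

In the environment file this would read `config.log_level = resolve_log_level(ENV)`.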

config/initializers/lambda.rb

Lines changed: 14 additions & 5 deletions
@@ -1,11 +1,20 @@
 require 'aws-sdk-lambda'
 
 def configure_lambda_client
-  Aws::Lambda::Client.new(
-    region: ENV.fetch('AWS_REGION', 'us-east-1'),
-    access_key_id: ENV.fetch('AWS_ACCESS_KEY_ID'),
-    secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY')
-  )
+  if ENV['AWS_SESSION_TOKEN'].present?
+    Aws::Lambda::Client.new(
+      region: ENV.fetch('AWS_REGION', 'us-east-1'),
+      access_key_id: ENV.fetch('AWS_ACCESS_KEY_ID'),
+      secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY'),
+      session_token: ENV.fetch('AWS_SESSION_TOKEN')
+    )
+  else
+    Aws::Lambda::Client.new(
+      region: ENV.fetch('AWS_REGION', 'us-east-1'),
+      access_key_id: ENV.fetch('AWS_ACCESS_KEY_ID'),
+      secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY')
+    )
+  end
 end
 
 Timdex::LambdaClient = configure_lambda_client
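
The two branches above differ only in the `session_token:` argument. A minimal sketch of an equivalent shape that builds the options once and adds the token conditionally; the helper name `lambda_client_options` is hypothetical and this is an alternative arrangement, not what the commit contains:

```ruby
# Hypothetical helper: keyword arguments for Aws::Lambda::Client.new,
# built from a Hash-like environment (ENV works).
def lambda_client_options(env)
  options = {
    region: env.fetch('AWS_REGION', 'us-east-1'),
    access_key_id: env.fetch('AWS_ACCESS_KEY_ID'),
    secret_access_key: env.fetch('AWS_SECRET_ACCESS_KEY')
  }

  # Add the session token only when it is set and non-blank, which matches
  # the `.present?` check in the initializer without needing ActiveSupport.
  token = env['AWS_SESSION_TOKEN']
  options[:session_token] = token unless token.to_s.strip.empty?
  options
end
```

The initializer would then call `Aws::Lambda::Client.new(**lambda_client_options(ENV))`.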
