2 changes: 2 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -53,6 +53,8 @@ yarn test

Before finalizing changes, run `yarn ci` from the root for complete build/test/lint verification.

Also run `yarn format` to correct any formatting issues.

## Architecture

### Key packages
Expand Down
1 change: 0 additions & 1 deletion beachball.config.js
@@ -38,6 +38,5 @@ const config = {
},
},
ignorePatterns: [".*ignore", "jest.config.js", "**/__*/**/*"],
disallowedChangeTypes: ["major"],
};
module.exports = config;
53 changes: 53 additions & 0 deletions change/change-922e3e3d-8503-4578-9876-071040126335.json
@@ -0,0 +1,53 @@
{
"changes": [
{
"type": "minor",
"comment": "New plugin package for Azure Blob Storage cache",
"packageName": "@lage-run/azure-blob-cache-storage",
"email": "email not defined",
"dependentChangeType": "patch"
},
{
"type": "major",
"comment": "Move Azure Blob cache to plugin; remove built-in azure-blob provider",
"packageName": "backfill-cache",
"email": "email not defined",
"dependentChangeType": "patch"
},
{
"type": "major",
"comment": "Remove AzureBlobCacheStorageConfig and CustomStorageConfig from CacheStorageConfig union; add CustomCacheStoragePlugin and CustomCacheStorageConfig types",
"packageName": "backfill-config",
"email": "email not defined",
"dependentChangeType": "patch"
},
{
"type": "minor",
"comment": "Support custom plugin providers via isCustomPluginProvider",
"packageName": "backfill",
"email": "email not defined",
"dependentChangeType": "patch"
},
{
"type": "major",
"comment": "Remove Azure credential handling; credentials are now managed by the @lage-run/azure-blob-cache-storage plugin",
"packageName": "@lage-run/cache",
"email": "email not defined",
"dependentChangeType": "patch"
},
{
"type": "major",
"comment": "Remove AzureCredentialName export and Azure-specific type augmentation from CacheOptions",
"packageName": "@lage-run/config",
"email": "email not defined",
"dependentChangeType": "patch"
},
{
"type": "major",
"comment": "Migrate from CustomStorageConfig to CustomCacheStoragePlugin pattern",
"packageName": "@lage-run/cache-github-actions",
"email": "email not defined",
"dependentChangeType": "patch"
}
]
}
37 changes: 30 additions & 7 deletions docs/docs/guides/remote-cache.md
@@ -12,17 +12,23 @@ The theory is that when the CI job runs, it'll produce a "last known good" cache

## Setting up remote cache - Azure Blob Storage

Follow these steps to set up a remote cache.
Azure Blob Storage cache is available as a plugin: `@lage-run/azure-blob-cache-storage`. This plugin must be installed separately.

### 1. Upgrade to latest `lage`
### 1. Install the plugin

```
yarn add @lage-run/azure-blob-cache-storage
```

### 2. Upgrade to latest `lage`

See the [migration guide](../cookbook/migration.mdx) for more details.

```
yarn upgrade lage
```

### 2. Create `.env` and add to `.gitignore`
### 3. Create `.env` and add to `.gitignore`

Create the file:

@@ -39,7 +45,7 @@ lib
dist
```

### 3. Generate auth tokens from Azure storage account
### 4. Generate auth tokens from Azure storage account

Prerequisite is to have a working Storage Account with Blob Storage Container created. Note the container name; it'll be needed for Step 5.

@@ -51,7 +57,7 @@ Prerequisite is to have a working Storage Account with Blob Storage Container cr
6. Click "show keys"
7. Save the "connection string" - this is your **read-write** connection string (alternatively, you can create a read-write SAS connection string)

### 4. Modify the `.env` file with the remote cache connection information
### 5. Modify the `.env` file with the remote cache connection information

```txt title=".env"
## This is required as of right now
@@ -61,7 +67,24 @@ BACKFILL_CACHE_PROVIDER="azure-blob"
BACKFILL_CACHE_PROVIDER_OPTIONS={"connectionString":"the **read-only** connection string","container":"CONTAINER NAME"}
```

### 5. Create a "secret" in the CI system for a Read/Write token
Alternatively, you can configure it directly in `lage.config.js`:

```js title="lage.config.js"
module.exports = {
cacheOptions: {
cacheStorageConfig: {
provider: "custom",
plugin: "@lage-run/azure-blob-cache-storage",
options: {
connectionString: "...",
container: "..."
}
}
}
};
```
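
To keep the connection string out of source control, the same config could be assembled from environment values. The following is a minimal TypeScript sketch, not part of this PR: the helper `buildCacheStorageConfig` and the variable names `LAGE_CACHE_CONNECTION_STRING` and `LAGE_CACHE_CONTAINER` are hypothetical; only the `provider`, `plugin`, and `options` keys come from the config shown above.

```ts
// Shape of the `cacheStorageConfig` object shown above (declared locally for illustration).
interface AzureBlobPluginConfig {
  provider: "custom";
  plugin: string;
  options: { connectionString: string; container: string };
}

// Hypothetical helper: builds the config from an env-like map so that
// secrets stay out of lage.config.js. The variable names are assumptions.
function buildCacheStorageConfig(env: Record<string, string | undefined>): AzureBlobPluginConfig {
  const connectionString = env["LAGE_CACHE_CONNECTION_STRING"];
  const container = env["LAGE_CACHE_CONTAINER"];
  if (!connectionString || !container) {
    throw new Error("LAGE_CACHE_CONNECTION_STRING and LAGE_CACHE_CONTAINER must both be set");
  }
  return {
    provider: "custom",
    plugin: "@lage-run/azure-blob-cache-storage",
    options: { connectionString, container },
  };
}
```

In a config file this might be called as `buildCacheStorageConfig(process.env)`, failing fast when either value is missing rather than at the first cache access.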

### 6. Create a "secret" in the CI system for a Read/Write token

Here's an example snippet of a GitHub Action with the correct environment variable set:

@@ -81,7 +104,7 @@ Create a secret named "BACKFILL_CACHE_PROVIDER_OPTIONS":

`process.env.BACKFILL_CACHE_PROVIDER_OPTIONS` is evaluated via backfill (see [`getEnvConfig()`](https://github.com/microsoft/lage/blob/master/packages/backfill-config/src/envConfig.ts#L82) in `backfill-config`).

For "azure-blob" cache provider with a non-sas/key-based `connectionString`(storage account endpoint) requiring azure identity authentication do not use `BACKFILL_CACHE_PROVIDER_OPTIONS`, instead populate the required env variables according to the desired identity/environment. (See [Azure Idenity SDK](https://learn.microsoft.com/en-us/javascript/api/overview/azure/identity-readme)) and set `credentialName` property in the `lage.config.js` under `cacheOptions.cacheStorageConfig.options.credentialName` or via env var `AZURE_IDENTITY_CREDENTIAL_NAME` Supported options are:
For the Azure Blob cache provider with a `connectionString` that is not SAS- or key-based (that is, a storage account endpoint) and therefore requires [Azure Identity](https://learn.microsoft.com/en-us/javascript/api/overview/azure/identity-readme) authentication, you can pass a `credentialName` option in the plugin config or set the `AZURE_IDENTITY_CREDENTIAL_NAME` environment variable. (Do not use `BACKFILL_CACHE_PROVIDER_OPTIONS` in this case.) Supported options are:
Review comment (Member):

> Do not use `BACKFILL_CACHE_PROVIDER_OPTIONS` in this case.

Figure out why this was needed and try to get rid of it maybe?


- `"azure-cli"`
- `"managed-identity"`
Expand Down
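
The credential selection described above can be sketched as a small standalone function. This is an illustrative TypeScript approximation of the logic in this PR's `createPlugin.ts`, not the plugin's API: `resolveCredentialName` is a hypothetical helper, and the list of names is copied from the `AzureCredentialName` type added in this PR.

```ts
// Credential names listed in this PR's AzureCredentialName type.
const credentialNames = ["environment", "workload-identity", "managed-identity", "visual-studio-code", "azure-cli"] as const;
type AzureCredentialName = (typeof credentialNames)[number];

// Same check as the plugin's isTokenConnectionString: SAS and account-key
// connection strings carry their own secret and need no identity credential.
function isTokenConnectionString(connectionString: string): boolean {
  return connectionString.includes("SharedAccessSignature") || connectionString.includes("AccountKey");
}

// Hypothetical helper: returns the credential name to use, or undefined when
// no named credential applies (token-based string, or nothing configured).
function resolveCredentialName(
  connectionString: string,
  optionName: string | undefined,
  envName: string | undefined
): AzureCredentialName | undefined {
  if (isTokenConnectionString(connectionString)) {
    return undefined;
  }
  // The explicit option wins; an empty env var counts as unset.
  const name = optionName ?? (envName || undefined);
  if (name === undefined) {
    return undefined;
  }
  if (!(credentialNames as readonly string[]).includes(name)) {
    throw new Error(`Invalid credentialName: "${name}". Allowed values: ${credentialNames.join(", ")}`);
  }
  return name as AzureCredentialName;
}
```

When this sketch returns `undefined` for an endpoint-style connection string, the plugin in this PR falls back to `CredentialCache.getInstance()` with no argument.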
7 changes: 4 additions & 3 deletions docs/docs/reference/config.md
@@ -69,9 +69,10 @@ const config = {
cacheOptions: {
/** @see https://www.npmjs.com/package/backfill#configuration */
cacheStorageConfig: {
// use this to specify a remote cache provider such as "azure-blob",
provider: "azure-blob",
// there are specific options here for each cache provider
// use this to specify a remote cache plugin such as "@lage-run/azure-blob-cache-storage",
provider: "custom",
plugin: "@lage-run/azure-blob-cache-storage",
// there are specific options here for each cache plugin
options: {}
},

Expand Down
41 changes: 41 additions & 0 deletions packages/azure-blob-cache-storage/package.json
@@ -0,0 +1,41 @@
{
"name": "@lage-run/azure-blob-cache-storage",
"version": "0.1.0",
"description": "Azure Blob Storage cache plugin for backfill/lage",
"license": "MIT",
"repository": {
"type": "git",
"url": "https://github.com/microsoft/lage"
},
"homepage": "https://microsoft.github.io/lage/",
"main": "lib/index.js",
"types": "lib/index.d.ts",
"scripts": {
"build": "yarn types && yarn transpile",
"transpile": "monorepo-scripts transpile",
"types": "yarn run -T tsc",
"lint": "monorepo-scripts lint"
},
"dependencies": {
"@azure/core-auth": "1.9.0",
"@azure/identity": "4.9.1",
"@azure/storage-blob": "12.27.0",
"backfill-cache": "workspace:^",
"backfill-config": "workspace:^",
"backfill-logger": "workspace:^",
"fs-extra": "8.1.0",
"tar-fs": "2.1.4"
},
"devDependencies": {
"@lage-run/monorepo-scripts": "workspace:^",
"@types/fs-extra": "^8.0.0",
"@types/tar-fs": "^2.0.1"
},
"engines": {
"node": ">=14"
},
"files": [
"lib/!(__*)",
"lib/!(__*)/**"
]
}
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ import type { AzureBlobCacheStorageOptions } from "backfill-config";

import { stat } from "fs-extra";
import type { ContainerClient } from "@azure/storage-blob";
import { CacheStorage } from "./CacheStorage.js";
import { CacheStorage } from "backfill-cache";

const ONE_MEGABYTE = 1024 * 1024;
const FOUR_MEGABYTES = 4 * ONE_MEGABYTE;
@@ -25,11 +25,7 @@ class TimeoutStream extends Transform {
this.destroy(new Error(message));
}, timeout);
}
public _transform(
chunk: any,
_encoding: BufferEncoding,
callback: TransformCallback
): void {
public _transform(chunk: any, _encoding: BufferEncoding, callback: TransformCallback): void {
clearTimeout(this.timeout);
this.push(chunk);
callback();
@@ -49,11 +45,7 @@ class SpongeStream extends Transform {
readableHighWaterMark: 1024 * 1024 * 1024 * 1024,
});
}
public _transform(
chunk: any,
_encoding: BufferEncoding,
callback: TransformCallback
): void {
public _transform(chunk: any, _encoding: BufferEncoding, callback: TransformCallback): void {
this.pause();
this.push(chunk);
callback();
@@ -81,8 +73,7 @@ export class AzureBlobCacheStorage extends CacheStorage {
super(logger, cwd, incrementalCaching);

if ("containerClient" in options) {
this.getContainerClient = () =>
Promise.resolve(options.containerClient as ContainerClient);
this.getContainerClient = () => Promise.resolve(options.containerClient as ContainerClient);
} else {
const { connectionString, container, credential } = options;
// This is delay loaded because it's very slow to parse
@@ -92,8 +83,7 @@ export class AzureBlobCacheStorage extends CacheStorage {
? new BlobServiceClient(connectionString, credential)
: BlobServiceClient.fromConnectionString(connectionString);

const containerClient =
blobServiceClient.getContainerClient(container);
const containerClient = blobServiceClient.getContainerClient(container);
return containerClient;
});
}
@@ -107,13 +97,8 @@ export class AzureBlobCacheStorage extends CacheStorage {
if (this.options.maxSize) {
const sizeResponse = await blobClient.getProperties();

if (
sizeResponse.contentLength &&
sizeResponse.contentLength > this.options.maxSize
) {
this.logger.verbose(
`A blob is too large to be downloaded: ${hash}, size: ${sizeResponse.contentLength} bytes`
);
if (sizeResponse.contentLength && sizeResponse.contentLength > this.options.maxSize) {
this.logger.verbose(`A blob is too large to be downloaded: ${hash}, size: ${sizeResponse.contentLength} bytes`);
return false;
}
}
@@ -129,25 +114,16 @@ export class AzureBlobCacheStorage extends CacheStorage {

const spongeStream = new SpongeStream();

const timeoutStream = new TimeoutStream(
10 * 60 * 1000,
`The fetch request to ${hash} seems to be hanging`
);
const timeoutStream = new TimeoutStream(10 * 60 * 1000, `The fetch request to ${hash} seems to be hanging`);

const extractionPipeline = new Promise<void>((resolve, reject) =>
pipeline(
blobReadableStream,
spongeStream,
timeoutStream,
tarWritableStream,
(err) => {
if (err) {
reject(err);
} else {
resolve();
}
pipeline(blobReadableStream, spongeStream, timeoutStream, tarWritableStream, (err) => {
if (err) {
reject(err);
} else {
resolve();
}
)
})
);

await extractionPipeline;
@@ -177,17 +153,11 @@ export class AzureBlobCacheStorage extends CacheStorage {
}

if (total > this.options.maxSize) {
this.logger.verbose(
`The output is too large to be uploaded: ${hash}, size: ${total} bytes`
);
this.logger.verbose(`The output is too large to be uploaded: ${hash}, size: ${total} bytes`);
return;
}
}

await blockBlobClient.uploadStream(
tarStream,
uploadOptions.bufferSize,
uploadOptions.maxBuffers
);
await blockBlobClient.uploadStream(tarStream, uploadOptions.bufferSize, uploadOptions.maxBuffers);
}
}
Original file line number Diff line number Diff line change
@@ -6,7 +6,13 @@ import {
EnvironmentCredential,
WorkloadIdentityCredential,
} from "@azure/identity";
import type { AzureCredentialName } from "@lage-run/config";

/**
* Allowed credential names: kebab-case forms of the @azure/identity credential class names, without the `Credential` suffix
* @see https://learn.microsoft.com/en-us/azure/developer/javascript/sdk/authentication/credential-chains
*/
export type AzureCredentialName = "environment" | "workload-identity" | "managed-identity" | "visual-studio-code" | "azure-cli";
Comment thread
ecraig12345 marked this conversation as resolved.

/**
* Exhaustive credential factory map keyed by AzureCredentialName.
* This enforces compile-time alignment with the AzureCredentialName union and provides a single source of truth.
Expand Down
45 changes: 45 additions & 0 deletions packages/azure-blob-cache-storage/src/createPlugin.ts
@@ -0,0 +1,45 @@
import type { Logger } from "backfill-logger";
import type {
ICacheStorage,
CustomCacheStoragePlugin,
AzureBlobCacheStorageOptions,
AzureBlobCacheStorageConnectionStringOptions,
} from "backfill-config";

import { AzureBlobCacheStorage } from "./AzureBlobCacheStorage.js";
import { CredentialCache, type AzureCredentialName } from "./CredentialCache.js";

export type AzureBlobPluginOptions = AzureBlobCacheStorageOptions & {
/** Optional credential name for Azure Identity authentication. */
credentialName?: AzureCredentialName;
};

function isTokenConnectionString(connectionString: string) {
return connectionString.includes("SharedAccessSignature") || connectionString.includes("AccountKey");
}

const plugin: CustomCacheStoragePlugin<AzureBlobPluginOptions> = {
name: "azure-blob",
getProvider(logger: Logger, cwd: string, options: AzureBlobPluginOptions): ICacheStorage {
// Handle credential injection for connection-string-based options
if ("connectionString" in options && !isTokenConnectionString(options.connectionString)) {
const connStringOptions = options as AzureBlobCacheStorageConnectionStringOptions & { credentialName?: AzureCredentialName };
if (!connStringOptions.credential) {
const credName = connStringOptions.credentialName ?? (process.env.AZURE_IDENTITY_CREDENTIAL_NAME || undefined);

if (credName != null) {
if (!CredentialCache.credentialNames.includes(credName as AzureCredentialName)) {
throw new Error(`Invalid credentialName: "${credName}". Allowed values: ${CredentialCache.credentialNames.join(", ")}`);
}
connStringOptions.credential = CredentialCache.getInstance(credName as AzureCredentialName);
} else {
connStringOptions.credential = CredentialCache.getInstance();
}
}
}

return new AzureBlobCacheStorage(options, logger, cwd);
},
};

export default plugin;
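
The plugin above follows the `CustomCacheStoragePlugin` pattern: a `name` plus a `getProvider(logger, cwd, options)` factory that returns a cache storage object. As a rough sketch of that pattern, the following defines a toy in-memory plugin. The interfaces are declared locally and simplified, so the real `Logger`, `ICacheStorage`, and `CustomCacheStoragePlugin` types in `backfill-logger` and `backfill-config` may differ; `MemoryOptions` and `memoryPlugin` are hypothetical names.

```ts
// Simplified local stand-ins for the backfill types (assumptions, not the real definitions).
interface Logger {
  verbose(message: string): void;
}
interface ICacheStorage {
  fetch(hash: string): Promise<boolean>;
  put(hash: string, filesToCache: string[]): Promise<void>;
}
interface CustomCacheStoragePlugin<TOptions> {
  name: string;
  getProvider(logger: Logger, cwd: string, options: TOptions): ICacheStorage;
}

// Hypothetical options for the toy plugin.
interface MemoryOptions {
  maxEntries: number;
}

const memoryPlugin: CustomCacheStoragePlugin<MemoryOptions> = {
  name: "memory",
  getProvider(logger: Logger, cwd: string, options: MemoryOptions): ICacheStorage {
    // Each provider instance keeps its own map of hash -> cached file list.
    const entries = new Map<string, string[]>();
    return {
      async fetch(hash: string): Promise<boolean> {
        const hit = entries.has(hash);
        logger.verbose(`${hit ? "hit" : "miss"} for ${hash} in ${cwd}`);
        return hit;
      },
      async put(hash: string, filesToCache: string[]): Promise<void> {
        if (entries.size >= options.maxEntries) {
          logger.verbose(`cache full, skipping ${hash}`);
          return;
        }
        entries.set(hash, filesToCache);
      },
    };
  },
};
```

A plugin module would typically export such an object as its default export, as `createPlugin.ts` does, so that a config can refer to it by package name through the `plugin` field.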
4 changes: 4 additions & 0 deletions packages/azure-blob-cache-storage/src/index.ts
@@ -0,0 +1,4 @@
export type { AzureCredentialName } from "./CredentialCache.js";
export { CredentialCache } from "./CredentialCache.js";
export type { AzureBlobPluginOptions } from "./createPlugin.js";
export { default } from "./createPlugin.js";