Conversation
cf68d97 to
cdf6b48
Compare
| // The src from JS is a fully-resolved readium:// URL. | ||
| // Relativize it against the publication base URL to get a | ||
| // publication-relative href usable with Publication.get(). |
There was a problem hiding this comment.
The comment is a bit misleading because the HTML document might contain URLs to external resources with http://, or it might already be a relative URL. In any case the code looks correct as it falls back on src when relativize() fails.
| /// A navigation controller wrapper for image preview that owns the custom | ||
| /// transition and forwards `ImagePreviewTransitioning` to its top view | ||
| /// controller. | ||
| final class ImagePreviewNavigationController: UINavigationController { |
There was a problem hiding this comment.
I'm a bit on the fence regarding the Test App implementation.
As a heads-up, we're phasing out the Test App in favor of a new Swift Playground app. The Test App is not really maintained anymore.
Your implementation looks super nice but we want to keep the Test App / Playground really simple and the code straightforward to test and demonstrate the Readium APIs. Having nice animations and UX is in the scope of the application and not the toolkit. From experience, if we offer more advanced components in the Test App, integrators will copy-paste it and expect support when it breaks or doesn't fit their app, which is why we want to move away from a full-fledged reading test app to a more technical and simple playground.
I would be fine merging your implementation as-is, but it will be removed in the near future.
There was a problem hiding this comment.
For a concrete example of a more technical interface, we could display a sheet on top of the navigator (without fancy animations) which contains very raw data about the target element. For example displaying the image but without zooming capabilities. The idea is to keep the code simple so that the actual Readium APIs usage is easy to understand.
| /// Metadata about the element under the pointer, if available. | ||
| /// | ||
| /// This is typically provided by the EPUB navigator's JavaScript bridge | ||
| /// when the pointer is over a media element (img, svg, video, etc.). | ||
| public var targetElement: TargetElement? | ||
|
|
||
| /// Metadata about the DOM element under a pointer event, extracted from | ||
| /// the JavaScript layer. | ||
| public struct TargetElement: Equatable { | ||
| /// Tag name of the element (e.g. "img", "svg"). | ||
| public var tag: String | ||
|
|
||
| /// Source URL of the media element, if available. | ||
| public var src: String? | ||
|
|
||
| /// Alt text of the element, if available. | ||
| public var alt: String? | ||
|
|
||
| /// Frame of the element relative to the navigator's view. | ||
| public var frame: CGRect | ||
|
|
||
| public init(tag: String, src: String?, alt: String? = nil, frame: CGRect) { | ||
| self.tag = tag | ||
| self.src = src | ||
| self.alt = alt | ||
| self.frame = frame | ||
| } | ||
| } |
There was a problem hiding this comment.
There's a lot of potential for this API beyond image zooms. For example you could use it to select a sentence to trigger TTS, or a word to get the definition for.
This API must be usable with other navigators too (e.g. PDF) so it should not reference anything specific to HTML (tag or alt).
As any navigator might implement it, I think a protocol would be a better fit to allow for extensions. In particular, when we don't have match with a format-agnostic element like "ImageElement", the HTML navigator could return a specific HTMLElement that contains raw HTML tags.
We can get inspired by the existing ContentElement.
For this particular PR, it's fine if we only implement the ImageElement, as long as we're open for extensions.
Here's a draft as a suggestion:
public struct PointerEvent: Equatable {
...
public var target: (any Element)?
/// Manually implement Equatable because of `target`.
public static func == (lhs: PointerEvent, rhs: PointerEvent) -> Bool {
guard
lhs.pointer == rhs.pointer,
lhs.phase == rhs.phase,
lhs.location == rhs.location,
...
else {
return false
}
switch (lhs.target, rhs.target) {
case (nil, nil): return true
case let (l?, r?): return l.isEqualTo(r)
default: return false
}
}
public var target: (any Element)?
public var targetFrame: CGRect?
public protocol Element {
/// Frame of the element relative to the navigator's view.
var frame: CGRect
/// A `Locator` to the target element, so that the app can use navigator.go(to:) on this element.
/// In the EPUB navigator, this can be computed by generating the CSS selector for the element.
var locator: Locator
/// Returns whether the receiver is equivalent to `other`.
func isEqualTo(_ other: Element) -> Bool
}
public extension Element where Self: Equatable {
func isEqualTo(_ other: Element) -> Bool {
guard let other = other as? Self else { return false }
return self == other
}
}
/// An element referencing an embedded resource.
/// This is useful if the app has a generic handler for any type of embedded resource (image, audio, video, etc.).
public protocol EmbeddedResourceElement: ContentElement {
/// Referenced resource in the publication.
var embeddedResource: Link { get }
}
public struct ImageElement: EmbeddedResourceElement, Equatable {
public var frame: CGRect
public var locator: Locator
/// Link matching the `img.src` in the HTML, taken from the publication.readingOrder.
/// Useful as a `Link` instead of raw `AnyHREF` or `String` because the client app can
/// check its `mediaType` to see if it is supported.
public var embeddedResource: Link
/// Short piece of text associated with the image (e.g. `alt`).
public var caption: String?
}
}Let me know what you think!
There was a problem hiding this comment.
Thanks @mickael-menu for your suggestions. They definitely make sense to me.
I decided to explore whether we can reuse the existing ContentElement API.
After thoroughly reviewing the ContentElement API, I found that the only limitation I encountered was the frame attribute, which a visual element could have. If we add the target frame to the PointerEvent, it might be feasible to use the existing ContentElement API. In that case, we can reuse the existing ImageContentElement, AudioContentElement, VideoContentElement, and even TextualContentElement if necessary.
Here’s the updated PointerEvent structure:
public struct PointerEvent {
…
public var target: Target?
public struct Target {
public var frame: CGRect
public var element: any ContentElement
}
}Or just
public struct PointerEvent {
…
public var targetFrame: CGRect?
public var targetElement: (any ContentElement)?
}What are your thoughts on this?
5fa6529 to
c5008c1
Compare
c5008c1 to
7c82eb4
Compare
Summary
This pull request partially addresses the feature request submitted on readium/mobile#10.
This PR adds
TargetElementmetadata toPointerEvent, enabling apps to detect when the user interacts with an image element(<img>,<svg>) and access its frame, source URL, and alt text. The TestApp demonstrates this with a fullscreen image preview viewer.Navigator changes
extractTargetElement()ingestures.jsthat extracts metadata (boundingrect,tag,src,alt) from the nearest image element under the pointer. AddedfindNearestImageElement()to walk up the DOM tree.PointerEvent.TargetElement: New struct withtag,src,alt, andframeproperties, available on every -PointerEvent. The src URL is relativized against the publication base URL so it can be used directly withPublication.get().TestApp demo
ImagePreviewViewController— fullscreen image viewer with UIScrollView-based pinch-to-zoom (1x–4x).ImagePreviewTransition— custom spring-based animated transition from the image's in-page position to centered aspect-fit, and back on dismiss.ImagePreviewNavigationController— wraps the preview and owns the custom transitioning delegate.Wired via a
.tapobserver inVisualReaderViewController.Design decisions
<img>and<svg>) — Video and audio elements don't benefit from a zoom preview, so detection is limited to image tags to keep the implementation focused.InputObservingAPI, which can be used to detect double-tap, pinch, and other gestures. Rather than adding dedicated gesture events, this PR exposesTargetElementmetadata onPointerEvent, giving apps the building blocks to implement custom gesture handling on top of the existing API.Previews
iphone.mov
ipad.mov