Implement WIPI String API #155
…ase ops

Add missing java.lang.String API as documented in the WIPI 1.2.1 spec (see https://mirusu400.github.io/wipi-wiki/java-api/java/lang/String.md). Korean J2ME apps that target this profile frequently fail with NoSuchMethodError on these signatures, especially the charset-aware byte[] constructors used to decode EUC-KR network payloads.

Constructors added:
- String()
- String(byte[], String charsetName)
- String(byte[], int, int, String charsetName)

Instance methods added:
- endsWith(String)
- equalsIgnoreCase(String)
- getBytes(String charsetName)
- lastIndexOf(int, int)
- regionMatches(boolean, int, String, int, int)
- replace(char, char)
- toLowerCase()

Static methods added:
- valueOf(boolean), valueOf(long), valueOf(float), valueOf(double)
- valueOf(char[]), valueOf(char[], int, int)
Pull request overview
This PR adds the constructors and method overloads missing from RustJava's java.lang.String to match the WIPI 1.2.1 spec, and extends the unit tests to cover them.
Changes:
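- Adds `String()` and the charset-taking `String(byte[], ...)` constructor overloads
- Adds APIs including `endsWith`, `equalsIgnoreCase`, `getBytes(String)`, `regionMatches`, `replace`, `toLowerCase`, `lastIndexOf(int,int)`, and the `valueOf(...)` overloads
- Adds charset-name normalization (letter case, `_`/`-`) and handling for some aliases, plus some related tests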
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| java_runtime/src/classes/java/lang/string.rs | Implements the String APIs missing from the WIPI spec (constructors, instance methods, static methods) and extends charset handling |
| java_runtime/tests/classes/java/lang/test_string.rs | Adds unit tests for the new and extended String APIs |
Comments suppressed due to low confidence (10)
java_runtime/src/classes/java/lang/string.rs:756
`decode_str` uses `str::from_utf8(bytes).unwrap()`, which will panic on invalid UTF-8 input. Since this constructor is reachable from user-provided byte arrays/charsets, it should return a Java-level decoding error (or perform replacement decoding) instead of crashing the VM.
fn decode_str(charset: &str, bytes: &[u8]) -> RustString {
match charset.to_ascii_uppercase().replace('_', "-").as_str() {
"UTF-8" | "UTF8" => str::from_utf8(bytes).unwrap().to_string(),
"EUC-KR" | "EUCKR" | "KS-C-5601-1987" | "MS949" | "CP949" => encoding_rs::EUC_KR.decode(bytes).0.to_string(),
java_runtime/src/classes/java/lang/string.rs:758
`decode_str` treats `US-ASCII`/`ASCII` the same as Latin-1 by directly mapping every byte to the same code point. In Java, bytes >= 0x80 are unmappable in US-ASCII and should be replaced (or trigger an error depending on policy), so this currently produces incorrect strings for non-ASCII bytes.
match charset.to_ascii_uppercase().replace('_', "-").as_str() {
"UTF-8" | "UTF8" => str::from_utf8(bytes).unwrap().to_string(),
"EUC-KR" | "EUCKR" | "KS-C-5601-1987" | "MS949" | "CP949" => encoding_rs::EUC_KR.decode(bytes).0.to_string(),
"ISO-8859-1" | "LATIN1" | "US-ASCII" | "ASCII" => bytes.iter().map(|&b| b as char).collect(),
_ => unimplemented!("unsupported charset: {}", charset),
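The two `decode_str` comments above could be addressed with replacement decoding. The following is a minimal sketch, not the PR's code: the helper names `decode_utf8_lossy`, `decode_us_ascii`, and `decode_latin1` are hypothetical, and substituting U+FFFD is one possible policy (raising a Java-level error is the other option the review mentions).

```rust
/// Decodes UTF-8 without panicking: each invalid sequence becomes U+FFFD.
fn decode_utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Decodes US-ASCII: bytes 0x00..=0x7F map to themselves,
/// every other byte is unmappable and becomes U+FFFD.
fn decode_us_ascii(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| if b.is_ascii() { b as char } else { char::REPLACEMENT_CHARACTER })
        .collect()
}

/// Decodes ISO-8859-1: every byte maps to the code point with the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

With these helpers the byte 0xE9 decodes to `é` under Latin-1 but to U+FFFD under US-ASCII, which is the distinction the second comment asks for.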
java_runtime/src/classes/java/lang/string.rs:767
`encode_str` for `ISO-8859-1`/`US-ASCII` uses `c as u8`, which truncates code points > 0xFF (and > 0x7F for ASCII) instead of replacing/handling unmappable characters. This can silently corrupt output; the encoder should implement proper mapping (e.g., replace with '?' or error) per charset.
fn encode_str(charset: &str, string: &str) -> Vec<u8> {
match charset.to_ascii_uppercase().replace('_', "-").as_str() {
"UTF-8" | "UTF8" => string.as_bytes().to_vec(),
"EUC-KR" | "EUCKR" | "KS-C-5601-1987" | "MS949" | "CP949" => encoding_rs::EUC_KR.encode(string).0.to_vec(),
"ISO-8859-1" | "LATIN1" | "US-ASCII" | "ASCII" => string.chars().map(|c| c as u8).collect(),
_ => unimplemented!("unsupported charset: {}", charset),
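A minimal sketch of the replacement policy suggested above, assuming `?` is the substitute byte. The helper names (`encode_single_byte`, `encode_latin1`, `encode_us_ascii`) are hypothetical and do not appear in the PR.

```rust
/// Encodes into a single-byte charset whose repertoire is 0..=max_code_point.
/// Characters outside that range are replaced with b'?' instead of being truncated.
fn encode_single_byte(string: &str, max_code_point: u32) -> Vec<u8> {
    string
        .chars()
        .map(|c| {
            let code_point = c as u32;
            if code_point <= max_code_point { code_point as u8 } else { b'?' }
        })
        .collect()
}

/// ISO-8859-1 covers U+0000..=U+00FF.
fn encode_latin1(string: &str) -> Vec<u8> {
    encode_single_byte(string, 0xFF)
}

/// US-ASCII covers U+0000..=U+007F.
fn encode_us_ascii(string: &str) -> Vec<u8> {
    encode_single_byte(string, 0x7F)
}
```

Under this sketch a Hangul syllable encodes to a single `?` in either charset, rather than to whatever the low byte of its code point happens to be.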
java_runtime/src/classes/java/lang/string.rs:768
Both `decode_str` and `encode_str` use `unimplemented!("unsupported charset")`, which will panic on unknown/typoed charset names now that charset is passed via public APIs. Please convert this into a Java exception (e.g., `java/io/UnsupportedEncodingException`) or a `Result` error instead of aborting execution.
fn decode_str(charset: &str, bytes: &[u8]) -> RustString {
match charset.to_ascii_uppercase().replace('_', "-").as_str() {
"UTF-8" | "UTF8" => str::from_utf8(bytes).unwrap().to_string(),
"EUC-KR" | "EUCKR" | "KS-C-5601-1987" | "MS949" | "CP949" => encoding_rs::EUC_KR.decode(bytes).0.to_string(),
"ISO-8859-1" | "LATIN1" | "US-ASCII" | "ASCII" => bytes.iter().map(|&b| b as char).collect(),
_ => unimplemented!("unsupported charset: {}", charset),
}
}
fn encode_str(charset: &str, string: &str) -> Vec<u8> {
match charset.to_ascii_uppercase().replace('_', "-").as_str() {
"UTF-8" | "UTF8" => string.as_bytes().to_vec(),
"EUC-KR" | "EUCKR" | "KS-C-5601-1987" | "MS949" | "CP949" => encoding_rs::EUC_KR.encode(string).0.to_vec(),
"ISO-8859-1" | "LATIN1" | "US-ASCII" | "ASCII" => string.chars().map(|c| c as u8).collect(),
_ => unimplemented!("unsupported charset: {}", charset),
}
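One way to follow the `Result` suggestion is to resolve the charset name once, before any decoding or encoding. This is a sketch under assumptions: the `Charset` enum, the `UnsupportedCharset` error, and `lookup_charset` are hypothetical names, and how the error is turned into a `java/io/UnsupportedEncodingException` inside RustJava is not shown in this excerpt. The alias list mirrors the `match` arms in the diff.

```rust
/// Charsets handled by the diff, as a closed set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Charset {
    Utf8,
    EucKr,
    Latin1,
    UsAscii,
}

/// Error carrying the name exactly as the caller passed it.
#[derive(Debug, PartialEq, Eq)]
struct UnsupportedCharset(String);

/// Resolves a charset name, ignoring case and treating '_' as '-'.
/// Unknown names produce an error value instead of a panic.
fn lookup_charset(name: &str) -> Result<Charset, UnsupportedCharset> {
    match name.to_ascii_uppercase().replace('_', "-").as_str() {
        "UTF-8" | "UTF8" => Ok(Charset::Utf8),
        "EUC-KR" | "EUCKR" | "KS-C-5601-1987" | "MS949" | "CP949" => Ok(Charset::EucKr),
        "ISO-8859-1" | "LATIN1" => Ok(Charset::Latin1),
        "US-ASCII" | "ASCII" => Ok(Charset::UsAscii),
        _ => Err(UnsupportedCharset(name.to_string())),
    }
}
```

Callers can then propagate the error with `?` and map it to the Java exception at the boundary, so a typo in a charset name no longer aborts the VM.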
java_runtime/src/classes/java/lang/string.rs:589
`get_bytes_charset` dereferences `charset_name` without checking for null; passing null will currently panic via `JavaLangString::to_rust_string`. Java expects a `NullPointerException` here, so this should be handled explicitly.
async fn get_bytes_charset(
jvm: &Jvm,
_: &mut RuntimeContext,
this: ClassInstanceRef<Self>,
charset_name: ClassInstanceRef<Self>,
) -> Result<ClassInstanceRef<Array<i8>>> {
tracing::debug!("java.lang.String::getBytes({:?}, {:?})", &this, &charset_name);
let string = JavaLangString::to_rust_string(jvm, &this).await?;
let charset = JavaLangString::to_rust_string(jvm, &charset_name).await?;
let bytes = cast_vec(Self::encode_str(&charset, &string));
java_runtime/src/classes/java/lang/string.rs:656
`region_matches` dereferences `other` without checking for null (`to_rust_string` unwraps). In Java, `regionMatches(..., null, ...)` throws `NullPointerException`; the current implementation will panic instead.
let this_string = JavaLangString::to_rust_string(jvm, &this).await?;
let other_string = JavaLangString::to_rust_string(jvm, &other).await?;
java_runtime/src/classes/java/lang/string.rs:703
`ends_with` does not handle `suffix == null`. `JavaLangString::to_rust_string` will unwrap and panic, but Java's `String.endsWith(null)` should throw `NullPointerException`. Add an explicit null check and raise the correct Java exception.
async fn ends_with(jvm: &Jvm, _: &mut RuntimeContext, this: ClassInstanceRef<Self>, suffix: ClassInstanceRef<Self>) -> Result<bool> {
tracing::debug!("java.lang.String::endsWith({:?}, {:?})", &this, &suffix);
let this_string = JavaLangString::to_rust_string(jvm, &this).await?;
let suffix_string = JavaLangString::to_rust_string(jvm, &suffix).await?;
Ok(this_string.ends_with(&suffix_string))
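The null-handling comments on `get_bytes_charset`, `region_matches`, and `ends_with` share one pattern: check the reference before converting it. The sketch below models a nullable Java reference as `Option<&str>` and uses a hypothetical `JavaException` enum; RustJava's actual null test and exception-raising calls are not shown in this excerpt, so both are assumptions for illustration only.

```rust
/// Hypothetical stand-in for a Java exception raised by the runtime.
#[derive(Debug, PartialEq, Eq)]
enum JavaException {
    NullPointerException,
}

/// Models String.endsWith: a null suffix yields NullPointerException
/// instead of a panic inside the string conversion.
fn ends_with(this: &str, suffix: Option<&str>) -> Result<bool, JavaException> {
    // Check for null first; only a non-null reference is converted and compared.
    let suffix = suffix.ok_or(JavaException::NullPointerException)?;
    Ok(this.ends_with(suffix))
}
```

The same early check would apply to `charset_name` in `getBytes(String)` and to `other` in `regionMatches`.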
java_runtime/src/classes/java/lang/string.rs:694
`last_index_of_from` computes indices over Rust `chars()` (Unicode scalar values), but Java `String.lastIndexOf(int, int)` is specified in terms of UTF-16 code units (and supports searching for code points > 0xFFFF via surrogate pairs). This will return different indices for strings containing supplementary characters.
let this_string = JavaLangString::to_rust_string(jvm, &this).await?;
let chars: Vec<char> = this_string.chars().collect();
let end = (from_index as usize + 1).min(chars.len());
let index = chars[..end].iter().rposition(|&c| c as u32 == ch as u32).map(|x| x as i32);
Ok(index.unwrap_or(-1))
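A sketch of the UTF-16-based search the comment describes, written as a free function over `&str` so it stands alone. The signature is an assumption and differs from the async method in the diff. Supplementary code points are matched as a surrogate pair, and a lone surrogate value is matched as a single code unit.

```rust
/// Returns the index, in UTF-16 code units, of the last occurrence of `ch`
/// that starts at or before `from_index`, or -1 if there is none.
fn last_index_of_from(string: &str, ch: i32, from_index: i32) -> i32 {
    if from_index < 0 {
        return -1;
    }
    let units: Vec<u16> = string.encode_utf16().collect();

    // Build the code-unit sequence to search for.
    let needle: Vec<u16> = match u32::try_from(ch).ok().and_then(char::from_u32) {
        // A valid scalar value encodes to one unit (BMP) or two (surrogate pair).
        Some(c) => {
            let mut buffer = [0u16; 2];
            c.encode_utf16(&mut buffer).to_vec()
        }
        // Not a scalar value: a lone surrogate is searched as a single unit,
        // anything else (negative or above 0x10FFFF) can never match.
        None => match u16::try_from(ch) {
            Ok(unit) => vec![unit],
            Err(_) => return -1,
        },
    };

    if needle.len() > units.len() {
        return -1;
    }
    // The match must start no later than from_index and must fit in the string.
    let last_start = (from_index as usize).min(units.len() - needle.len());
    (0..=last_start)
        .rev()
        .find(|&i| units[i..i + needle.len()] == needle[..])
        .map_or(-1, |i| i as i32)
}
```

For the string `a`, U+1F600, `b`, U+1F600, the character `b` sits at index 3 in UTF-16 code units, whereas counting Rust `char`s would report 2.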
java_runtime/tests/classes/java/lang/test_string.rs:327
Missing test coverage for the new charset-related APIs: constructors `String(byte[], int, int, String)` and `String(byte[], String)` are only partially covered, and there is no test for `String.getBytes(String charsetName)` (including charset alias normalization like `EUC_KR`). Add tests to validate behavior and prevent regressions.
#[tokio::test]
async fn test_init_byte_array_charset() -> Result<()> {
let jvm = test_jvm().await?;
let bytes = vec![b'H' as i8, b'i' as i8, b'!' as i8];
let mut array = jvm.instantiate_array("B", 3).await?;
jvm.store_array(&mut array, 0, bytes).await?;
let charset = JavaLangString::from_rust_string(&jvm, "UTF-8").await?;
let string = jvm.new_class("java/lang/String", "([BLjava/lang/String;)V", (array, charset)).await?;
assert_eq!(JavaLangString::to_rust_string(&jvm, &string).await?, "Hi!");
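The alias-normalization part of this comment can be covered by a table-driven test. The sketch below exercises only the normalization step, extracted into a hypothetical `normalize_charset_name` helper that mirrors the `to_ascii_uppercase().replace('_', "-")` expression in the diff. It does not touch the JVM test harness, and the table entries are illustrative.

```rust
/// Spellings that should collapse to the same lookup key (illustrative table).
const ALIAS_CASES: &[(&str, &str)] = &[
    ("EUC_KR", "EUC-KR"),
    ("euc-kr", "EUC-KR"),
    ("Euc_Kr", "EUC-KR"),
    ("utf_8", "UTF-8"),
];

/// Mirrors the normalization expression used by decode_str/encode_str in the diff.
fn normalize_charset_name(name: &str) -> String {
    name.to_ascii_uppercase().replace('_', "-")
}

#[test]
fn charset_spellings_normalize_to_one_key() {
    for (input, expected) in ALIAS_CASES {
        assert_eq!(normalize_charset_name(input), *expected, "input: {input}");
    }
}
```

Spellings without a separator, such as `EUCKR`, are unchanged by this step and rely on the explicit alias arms in the `match`, so they would need their own end-to-end cases through `getBytes(String)` and the charset constructors.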
java_runtime/src/classes/java/lang/string.rs:729
`valueOf(float)`/`valueOf(double)` currently delegate to Rust's `to_string()`, whose formatting does not match Java's `Float.toString`/`Double.toString` (e.g., Java prints `1.0` while Rust prints `1`). This will break string representations expected by Java/WIPI code; please implement Java-compatible float formatting.
async fn value_of_float(jvm: &Jvm, _: &mut RuntimeContext, value: f32) -> Result<ClassInstanceRef<Self>> {
tracing::debug!("java.lang.String::valueOf({})", value);
Ok(JavaLangString::from_rust_string(jvm, &value.to_string()).await?.into())
}
async fn value_of_double(jvm: &Jvm, _: &mut RuntimeContext, value: f64) -> Result<ClassInstanceRef<Self>> {
tracing::debug!("java.lang.String::valueOf({})", value);
Ok(JavaLangString::from_rust_string(jvm, &value.to_string()).await?.into())
}
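A rough sketch of Java-style formatting for `double`, under stated assumptions: `java_double_to_string` is a hypothetical helper, and it follows the layout rules of `Double.toString` (plain decimal for magnitudes in [10^-3, 10^7), otherwise scientific notation with an `E`, and always at least one fractional digit). The digits themselves come from Rust's shortest round-trip formatting, which may differ from what a particular Java runtime prints for some values. A `float` version would need the same logic applied to `f32` directly, since widening to `f64` first changes the printed digits.

```rust
/// Formats an f64 using the layout rules of Java's Double.toString.
fn java_double_to_string(value: f64) -> String {
    if value.is_nan() {
        return "NaN".to_string();
    }
    if value.is_infinite() {
        return if value > 0.0 { "Infinity" } else { "-Infinity" }.to_string();
    }

    let magnitude = value.abs();
    if magnitude == 0.0 || (1e-3..1e7).contains(&magnitude) {
        // Debug formatting keeps at least one fractional digit: 1.0 -> "1.0".
        return format!("{:?}", value);
    }

    // Scientific notation: Rust prints "1e7" or "1.5e-5"; Java expects "1.0E7" or "1.5E-5".
    let formatted = format!("{:e}", value);
    let (mantissa, exponent) = formatted
        .split_once('e')
        .expect("LowerExp output always contains 'e'");
    if mantissa.contains('.') {
        format!("{mantissa}E{exponent}")
    } else {
        format!("{mantissa}.0E{exponent}")
    }
}
```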
Head branch was pushed to by a user without write access
Summary
- Added the `java.lang.String` API methods that were missing from RustJava. (See the reference.)
- Wrote unit tests in `java_runtime/tests/classes/java/lang/test_string.rs`.

Constructors
- `String()`
- `String(byte[] bytes, String charsetName)`
- `String(byte[] bytes, int offset, int length, String charsetName)`

Instance methods
- `endsWith(String)`
- `equalsIgnoreCase(String)`
- `getBytes(String charsetName)`
- `lastIndexOf(int ch, int fromIndex)`
- `regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)`
- `replace(char oldChar, char newChar)`
- `toLowerCase()`

Static methods
- `valueOf(boolean)`, `valueOf(long)`, `valueOf(float)`, `valueOf(double)`
- `valueOf(char[])`, `valueOf(char[], int offset, int count)`

Notes
- Apps pass charset names in several spellings (`EUC-KR`, `EUC_KR`, `EUCKR`, and so on), so functions such as `decode_str` and `encode_str` were changed to accept them for maximum compatibility. (Example game: KTF 전설의 마법학교2, which uses `EUC_KR`.)