Learning Rust: String

Introduction

A UTF-8–encoded, growable string.
A String owns its contents.

A String is made up of three components: a pointer to some bytes, a length, and a capacity.
The length is the number of bytes currently stored in the buffer
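
The three components can be observed through as_ptr, len, and capacity. A minimal sketch of that idea; the helper name `parts` is made up for this illustration and is not part of the standard library:

```rust
/// Returns the (length, capacity) pair of a String, both measured in bytes.
/// `parts` is a hypothetical helper used only for this illustration.
fn parts(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}

fn main() {
    // Reserve room for at least 16 bytes; the buffer starts out empty.
    let mut s = String::with_capacity(16);
    s.push_str("hello");

    let (len, cap) = parts(&s);
    assert_eq!(len, 5); // 5 bytes are currently stored
    assert!(cap >= 16); // at least 16 bytes are allocated

    // The pointer refers to the start of the heap buffer.
    assert!(!s.as_ptr().is_null());

    println!("len = {len}, capacity = {cap}");
}
```

The capacity is only asserted as a lower bound because the allocator is allowed to hand back more than was requested.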

pub fn as_bytes(&self) -> &[u8]

Returns the contents of the string as a byte slice.

fn main() {
	let s = String::from("我的世界123abc");
	// Print the raw UTF-8 bytes; each Chinese character here takes 3 bytes
	println!("{:?}", s.as_bytes());
}

pub fn as_str(&self) -> &str

Borrows the String as an immutable string slice, &str.

fn main() {
    let s = String::from("hello");

    let s_ref: &str = s.as_str();

    println!("{}", s_ref);
}

The example above defines a String s, then calls as_str() to obtain an immutable &str reference to it.

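Note
as_str() returns a reference to the data inside the original String; it does not transfer ownership.
The original String therefore remains usable after the call.
The returned &str is only valid while the original String is alive. The borrow checker enforces this, so a reference that outlives the String is rejected at compile time.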

(The name is well chosen: "as", in the sense of "serving as", the way one person can be both a teacher and a dancer.)
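
To make the borrowing behaviour concrete, here is a small sketch in which a String is lent to a function that only accepts &str. The function `count_vowels` is a hypothetical helper invented for this example:

```rust
/// Counts ASCII vowels in a borrowed string slice.
/// `count_vowels` is a hypothetical helper for this illustration.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiou".contains(*c)).count()
}

fn main() {
    let s = String::from("hello");

    // Borrow the contents; ownership stays with `s`.
    let n = count_vowels(s.as_str());
    assert_eq!(n, 2);

    // `s` is still usable because it was only borrowed, not moved.
    println!("{s} has {n} vowels");
}
```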

pub fn into_boxed_str(self) -> Box<str, Global>

Converts the String into a Box<str, Global>, a smart pointer that owns a heap-allocated str.

fn main() {
    let s = String::from("hello");

    let boxed_str: Box<str> = s.into_boxed_str();

    println!("{}", boxed_str);
}

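Converting the String moves ownership of the text into a Box<str>, whose contents can no longer grow.
Before converting, the method discards any excess capacity, so the call may reallocate and copy the bytes. No manual cleanup is required: the memory is freed when the Box is dropped.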
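
The following sketch round-trips a String through Box<str> and compares the sizes of the two handle types. The helper `round_trip` is an invented name, and the size comparison reflects how current compilers lay out these types rather than a documented guarantee:

```rust
use std::mem::size_of;

/// Converts a String to Box<str> and back again.
/// `round_trip` is a hypothetical helper for this illustration.
fn round_trip(s: String) -> String {
    // Excess capacity is discarded during this conversion.
    let boxed: Box<str> = s.into_boxed_str();
    boxed.into_string()
}

fn main() {
    let mut s = String::with_capacity(100);
    s.push_str("hello");

    let back = round_trip(s);
    assert_eq!(back, "hello");
    assert_eq!(back.len(), 5);

    // Box<str> stores a pointer and a length; String also stores a capacity.
    assert!(size_of::<Box<str>>() < size_of::<String>());
}
```

Box<str> is a reasonable choice for text that will never be modified after construction, since it drops the capacity field.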

pub fn into_bytes(self) -> Vec<u8, Global>

Converts a String into a byte vector.

This consumes the String, so the contents do not need to be copied.

fn main() {
	// `s` has type `String`, which does not implement the `Copy` trait
	let s = String::from("hello");
	// `s` is moved by this method call
	let bytes = s.into_bytes();

	assert_eq!(&[104, 101, 108, 108, 111][..], &bytes[..]);
	// Uncommenting the next line fails to compile: value borrowed here after move
	// print!("{}", s);
}

pub fn into_raw_parts(self) -> (*mut u8, usize, usize)

A nightly-only experimental API.
Decomposes a String into its raw components.

Returns the raw pointer to the underlying data, the length of the string (in bytes), and the allocated capacity of the data (in bytes).
These are the same arguments, in the same order, as the arguments to from_raw_parts.

After calling this function, the caller is responsible for the memory previously managed by the String.
The only way to do this is to convert the raw pointer, length, and capacity back into a String with from_raw_parts, allowing the destructor to perform the cleanup.

#![feature(vec_into_raw_parts)]
let s = String::from("hello");

let (ptr, len, cap) = s.into_raw_parts();

let rebuilt = unsafe { String::from_raw_parts(ptr, len, cap) };
assert_eq!(rebuilt, "hello");

pub fn len(&self) -> usize

Returns the length of this String in bytes, not chars or graphemes.
In other words, it might not be what a human considers the length of the string.

let a = String::from("foo");
assert_eq!(a.len(), 3);

let fancy_f = String::from("ƒoo");
assert_eq!(fancy_f.len(), 4);
assert_eq!(fancy_f.chars().count(), 3);

pub fn from_utf8(vec: Vec<u8, Global>) -> Result<String, FromUtf8Error>

Converts a vector of bytes to a String.
A string (String) is made of bytes (u8), and a vector of bytes (Vec<u8>) is made of bytes, so this function converts between the two.

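If you are sure that the bytes are valid UTF-8 and you do not want to pay for the validity check,
there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check.

The inverse of this method is into_bytes.
For efficiency, from_utf8 does not copy the vector.
If you need a &str instead of a String, consider str::from_utf8.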

Errors
Returns Err if the bytes are not valid UTF-8, with a description of why the provided bytes are not UTF-8.

Note:
String requires that its contents are valid UTF-8. from_utf8() checks that the bytes are valid UTF-8 and then performs the conversion.
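
A short sketch of both outcomes of from_utf8. The helper `parse_or_return` is a made-up name; it relies on FromUtf8Error::into_bytes to hand the original vector back when validation fails:

```rust
/// Tries to build a String from raw bytes; on failure, returns the bytes.
/// `parse_or_return` is a hypothetical helper for this illustration.
fn parse_or_return(bytes: Vec<u8>) -> Result<String, Vec<u8>> {
    String::from_utf8(bytes).map_err(|e| e.into_bytes())
}

fn main() {
    // Valid UTF-8: the vector's buffer is reused, not copied.
    let ok = parse_or_return(vec![104, 101, 108, 108, 111]);
    assert_eq!(ok, Ok(String::from("hello")));

    // 0xFF never appears in valid UTF-8, so this fails and the bytes come back.
    let err = parse_or_return(vec![0x68, 0xFF]);
    assert_eq!(err, Err(vec![0x68, 0xFF]));

    // into_bytes is the inverse of from_utf8.
    let bytes = String::from("hello").into_bytes();
    assert_eq!(String::from_utf8(bytes).unwrap(), "hello");
}
```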

pub fn from_utf8_lossy(v: &[u8]) -> Cow<'_, str>

Takes a byte slice v of type &[u8] and decodes it as a str.
It returns a Cow (clone-on-write) so that valid and invalid input can be handled differently.

fn main() {
    // The last four bytes are the UTF-8 encoding of U+1F30E
    let bytes = b"Hello, \xF0\x9F\x8C\x8E";
    let s = String::from_utf8_lossy(bytes);
    println!("{}", s); // Hello, 🌎
}

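Here bytes is a byte array containing UTF-8 encoded text, including one non-ASCII character, and it is passed to from_utf8_lossy.

When the input is entirely valid UTF-8, from_utf8_lossy returns Cow::Borrowed, which points at the original bytes without allocating. When a replacement is needed, it returns Cow::Owned, which holds a newly allocated String. These are the two variants of Cow, and the function picks whichever fits the input.

For input that is not valid UTF-8, from_utf8_lossy replaces each invalid sequence with U+FFFD REPLACEMENT CHARACTER and returns a string containing the replacement characters.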

from_utf8_lossy is not a strict UTF-8 validator: it repairs malformed input instead of rejecting it.
If stricter handling is needed, use std::str::from_utf8, which returns an error instead.
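
The difference between the two Cow variants, and between lossy and strict decoding, can be checked with a small sketch. The helper `needed_replacement` is an invented name for this example:

```rust
use std::borrow::Cow;

/// Reports whether lossy decoding had to build a new, repaired string.
/// `needed_replacement` is a hypothetical helper for this illustration.
fn needed_replacement(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}

fn main() {
    // Valid input is borrowed as-is.
    assert!(!needed_replacement(b"Hello"));

    // 0xFF is invalid, so it is replaced with U+FFFD and the result is owned.
    let bad = b"Hello\xFF";
    assert!(needed_replacement(bad));
    assert_eq!(String::from_utf8_lossy(bad), "Hello\u{FFFD}");

    // The strict parser rejects the same input instead of repairing it.
    assert!(std::str::from_utf8(bad).is_err());
}
```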

Size and indexing issues

fn main() {
   
	use std::mem;

	// `s` is ASCII which represents each `char` as one byte
	let s = "hello";
	assert_eq!(s.len(),