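# Officially Taking the Plunge into Rust

2021-02-15 19:41:51

I have been following Rust's development since 2014, but I spent those years working in big data, where my day-to-day languages were mostly Scala, Python, Go, and JS, so I never used Rust in a real project.

Recently I started working on lower-level systems software, and Rust's safety and performance fit our use case very well, so I went through the language systematically once more. Rust has quite a few unusual features. Drawing on my past experience, I organized some of the points I picked up along the way into these notes, mainly to reinforce my own learning.

My overall impression: Rust's modern design (its syntax, its toolchain, comments that double as both documentation and tests), its rich documentation, its detailed compiler error messages, and its mature community ecosystem all got me excited.

I care a lot about both development efficiency and squeezing the most performance out of software, and Rust fits those needs very well.

## Toolchain

- rustup
- [cargo](https://github.com/rust-lang/cargo/wiki/Third-party-cargo-subcommands), cargo-watch, cargo-tarpaulin, cargo-expand, cargo-edit, cargo-udeps
- rustc
- clippy
- rustfmt

## Third-Party Libraries

- neon
- pest
- config
- tokio
- clap, tui-rs, colored
- chrono, itoa, dtoa
- Rayon: a data-parallelism library for Rust
- Tantivy, Toshi, sonic, MeiliSearch
- byteorder, bincode
- thiserror, anyhow
- criterion
- tauri, yew

## Crates

A good alternative to the official crates.io site: <https://lib.rs/>

Speeding up downloads from within China:

```
tee $HOME/.cargo/config <<-'EOF'
[source.crates-io]
replace-with = 'ustc'

[source.ustc]
registry = "git://mirrors.ustc.edu.cn/crates.io-index"
EOF
```

## Learning Resources

Material for learning Rust and going deeper.

- `rustup doc`
- [Safe Systems Programming in Rust](https://people.mpi-sws.org/~dreyer/papers/safe-sysprog-rust/paper.pdf)
- [Learn Rust by writing Entirely Too Many Linked Lists](https://rust-unofficial.github.io/too-many-lists/)
- [The Rust Performance Book](https://github.com/nnethercote/perf-book)
- [Rust Design Patterns](https://rust-unofficial.github.io/patterns/intro.html)
- [Frequently Asked Questions](https://github.com/dtolnay/rust-faq)
- [Rust by example](https://doc.rust-lang.org/stable/rust-by-example/)
- [rustlings](https://github.com/rust-lang/rustlings): Small exercises to get you used to reading and writing Rust code!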
- <https://rust-lang-nursery.github.io/rust-cookbook/>
- <https://doc.rust-lang.org/book/>
- <https://github.com/ctjhoa/rust-learning>
- <https://cheats.rs/>
- [Rust语言圣经 (Rust Course)](https://course.rs/into-rust.html)
- <https://docs.rs/>
- [Rust Programming Language: The Ultimate Guide](https://masteringbackend.com/posts/rust-programming-the-ultimate-guide)
- <https://cfsamson.github.io/books-futures-explained/>
- <https://fasterthanli.me/articles/a-half-hour-to-learn-rust>
- <https://dhghomon.github.io/easy_rust/>
- [Rust Programming Language Tutorial – How to Build a To-Do List App](https://www.freecodecamp.org/news/how-to-build-a-to-do-app-with-rust/)
- [Command line apps in Rust](https://rust-cli.github.io/book/)
- [Asynchronous Programming in Rust](https://rust-lang.github.io/async-book/)
- [Easy Rust](https://github.com/Dhghomon/easy_rust)
- [Tour of Rust's Standard Library Traits](https://github.com/pretzelhammer/rust-blog/blob/master/posts/tour-of-rusts-standard-library-traits.md)

## Community

- <https://rustcc.cn/>
- <https://this-week-in-rust.org/>
- <https://www.ralfj.de/projects/rust-101/main.html>
- <https://dev.to/t/rust>

## Database-Related Projects

- [TiKV](https://github.com/tikv/tikv): a distributed key-value database with transaction support; it is the storage layer underneath TiDB
- [IndraDB](https://github.com/indradb/indradb): a single-node graph database
- [Oxigraph](https://github.com/oxigraph/oxigraph): SPARQL graph database
- [toydb](https://github.com/erikgrinaker/toydb): a distributed SQL database
- [Materialize](https://github.com/MaterializeInc/materialize): a streaming SQL platform built on Timely Dataflow
- [DataFusion](https://github.com/apache/arrow/tree/master/rust/datafusion): an extensible query execution framework that supports both an SQL and a DataFrame API

## Syntax

### Basic Concepts

Variables are immutable by default, so how do they differ from constants? Constants cannot use the `mut` keyword; they are declared with `const` rather than `let`; they can be declared in any scope, including the global scope; and a constant can only be set to a constant expression, not to a value computed at runtime such as the result of an ordinary (non-`const`) function call.

Variable shadowing: `let spaces = " "; let spaces = spaces.len();`

Choosing a numeric type: So how do you know which type of integer to use?
If you're unsure, Rust's defaults are generally good choices, and integer types default to i32: this type is generally the fastest, even on 64-bit systems. The primary situation in which you'd use isize or usize is when indexing some sort of collection.

Rust's char type is four bytes in size and represents a Unicode Scalar Value.

Arrays in Rust are different from arrays in some other languages because arrays in Rust have a fixed length, like tuples. Array definitions:

```
let a = [1, 2, 3, 4, 5];
let a: [i32; 5] = [1, 2, 3, 4, 5];
let a = [3; 5]; // same as: let a = [3, 3, 3, 3, 3];
```

Statements versus expressions: Statements are instructions that perform some action and do not return a value. Expressions evaluate to a resulting value. An expression: `x + 1`. A statement: `x + 1;`.

A function does not need the `return` keyword: Rust returns the value of the final expression by default, but that final item must be an expression, not a statement (no trailing semicolon). For example:

```
fn plus_one(x: i32) -> i32 {
    x + 1
}
```

`if` is an expression: `let number = if condition { 5 } else { 6 };`. The condition must be of type `bool`.

### Type Inference

The type inference is based on the standard Hindley-Milner (HM) type inference algorithm, but extended in various ways to accommodate subtyping, region inference, and higher-ranked types.

A simple case:

```
fn main() {
    let mut things = vec![];
    things.push("thing");
}
```

A more complex one:

```
// addr: SocketAddr
let addr = "[::1]:21021".parse()?; // "[::1]:21021".parse::<SocketAddr>()?
Server::builder()
    .serve(addr) // here the type is inferred from the function's type parameter
    .await?;
```

Reference: <https://rustc-dev-guide.rust-lang.org/type-inference.html>

### Ownership

Most languages manage memory either with a garbage collector or by hand. Rust instead uses an ownership system: a set of rules that the compiler checks at compile time.

All data stored on the stack must have a known, fixed size. Data with an unknown size at compile time or a size that might change must be stored on the heap instead.

Pushing to the stack is faster than allocating on the heap because the allocator never has to search for a place to store new data; that location is always at the top of the stack.

Accessing data in the heap is slower than accessing data on the stack because you have to follow a pointer to get there.
Keeping track of what parts of code are using what data on the heap, minimizing the amount of duplicate data on the heap, and cleaning up unused data on the heap so you don't run out of space are all problems that ownership addresses. Once you understand ownership, you won't need to think about the stack and the heap very often, but knowing that managing heap data is why ownership exists can help explain why it works the way it does.

```
let some_u8_value = Some(0u8);
match some_u8_value {
    Some(3) => println!("three"),
    _ => (),
}

if let Some(3) = some_u8_value {
    println!("three");
}
```

![](/api/file/getImage?fileId=6012678366f3b3215d0007fb)

### References and borrowing

But mutable references have one big restriction: you can have only one mutable reference to a particular piece of data in a particular scope. A similar rule exists for combining mutable and immutable references.

- At any given time, you can have either one mutable reference or any number of immutable references.
- References must always be valid.

The `ref` keyword:

```
let maybe_name = Some(String::from("Alice"));
// Using `ref`, the value is borrowed, not moved ...
match maybe_name {
    Some(ref n) => println!("Hello, {}", n),
    _ => println!("Hello, world"),
}
// ... so it's available here!
println!("Hello again, {}", maybe_name.unwrap_or("world".into()));
```

`ref` vs. `&`

- `&` denotes that your pattern expects a reference to an object. Hence `&` is a part of said pattern: `&Foo` matches different objects than `Foo` does.
- `ref` indicates that you want a reference to an unpacked value. It is not matched against: `Foo(ref foo)` matches the same objects as `Foo(foo)`.

### Ranges

The `..` syntax is just range literals. Ranges are just a few structs defined in the standard library.
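To make the "ranges are structs" point concrete, here is a small sketch that builds the same values by hand from `std::ops::Range` and `std::ops::RangeInclusive`. The helper `sum_range` is a name made up for this illustration, not something from the standard library.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical helper: sums the half-open range `start..end`
/// by constructing the `Range` struct directly instead of using `..`.
fn sum_range(start: u32, end: u32) -> u32 {
    let r: Range<u32> = Range { start, end };
    // `Range<u32>` is an iterator, so it can be summed like any other.
    r.sum()
}

fn main() {
    // The literal and the hand-built struct compare equal.
    assert_eq!(3..6, Range { start: 3, end: 6 });
    // `..=` produces a `RangeInclusive`, built here with its constructor.
    assert_eq!(3..=5, RangeInclusive::new(3, 5));

    println!("{}", sum_range(3, 6)); // 3 + 4 + 5 = 12
}
```

The `contains` checks that follow operate on these same struct types.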
```
fn main() {
    // 0 or greater
    println!("{:?}", (0..).contains(&100)); // true
    // strictly less than 20
    println!("{:?}", (..20).contains(&20)); // false
    // 20 or less than 20
    println!("{:?}", (..=20).contains(&20)); // true
    // only 3, 4, 5
    println!("{:?}", (3..6).contains(&4)); // true
}
```

### Enum

```
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}
```

Rust doesn't have the null feature that many other languages have. Null is a value that means there is no value there.

Methods can be implemented on enums just as on structs.

Enums show up most often in pattern matching.

### Package

A package can contain at most one library crate. It can contain as many binary crates as you'd like, but it must contain at least one crate (either library or binary).

```
// when bringing in structs, enums, and other items with use, it's idiomatic to specify the full path.
use std::collections::HashMap;

// use nested paths to bring the same items into scope in one line
use std::io::{self, Write};

// Re-exporting Names with pub use
pub use crate::front_of_house::hosting;

// The glob operator is often used when testing to bring everything under test into the tests module;
use std::collections::*;
```

### HashMap

For safety reasons (resistance to hash-collision DoS attacks), the standard library's `HashMap` uses a hasher that is not the fastest available. When that matters, consider a third-party crate that provides its own `BuildHasher` implementation.

```
use std::collections::HashMap;

fn main() {
    let text = "hello world wonderful world";
    let mut map = HashMap::new();

    // Count how many times each word occurs
    for word in text.split_whitespace() {
        let count = map.entry(word).or_insert(0);
        *count += 1;
    }

    println!("{:?}", map);
}
```

### Error Handling

A function that can fail usually returns a `Result`.

```
fn main() {
    let s = std::str::from_utf8(&[240, 159, 141, 137]);
    println!("{:?}", s); // prints: Ok("🍉")

    let s = std::str::from_utf8(&[195, 40]);
    println!("{:?}", s); // prints: Err(Utf8Error { valid_up_to: 0, error_len: Some(1) })
}
```

Calling `.unwrap()` panics if the result is an `Err`:

```
fn main() {
    let s = std::str::from_utf8(&[240, 159, 141, 137]).unwrap();
    println!("{:?}", s); // prints: "🍉"

    let s = std::str::from_utf8(&[195, 40]).unwrap(); // Or .expect(), for a custom
    // message
    // prints: thread 'main' panicked at 'called `Result::unwrap()`
    // on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }',
    // src/libcore/result.rs:1165:5
}
```

Handling the error with `match`:

```
fn main() {
    match std::str::from_utf8(&[240, 159, 141, 137]) {
        Ok(s) => println!("{}", s),
        // panic! takes a format string, so the error is passed as an argument
        Err(e) => panic!("{}", e),
    }
    // prints 🍉
}
```

Using `if let`:

```
fn main() {
    if let Ok(s) = std::str::from_utf8(&[240, 159, 141, 137]) {
        println!("{}", s);
    }
    // prints 🍉
}
```

Using `?` to simplify error handling:

```
use std::fs::File;
use std::io;
use std::io::Read;

fn read_username_from_file() -> Result<String, io::Error> {
    let mut s = String::new();
    File::open("hello.txt")?.read_to_string(&mut s)?;
    Ok(s)
}
```

Error values that have the `?` operator called on them go through the `from` function, defined in the `From` trait in the standard library, which is used to convert errors from one type into another.

### Generic types

```
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

// Methods implemented only for Point<f32>; no other Point<T> can call them
impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

// A separate example: a point with two independent type parameters
struct Point<T, U> {
    x: T,
    y: U,
}

impl<T, U> Point<T, U> {
    fn mixup<V, W>(self, other: Point<V, W>) -> Point<T, W> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}
```

Generics carry no runtime cost: Rust accomplishes this by performing monomorphization of the code that is using generics at compile time. The flip side is that monomorphization slows compilation down.

### Trait

Traits are similar to interfaces in other languages and can provide default method implementations. A trait can be implemented for a type only if the trait or the type is local to your crate (for example, you cannot implement the external `Display` trait for the external `Vec<T>` type). This rule preserves coherence and prevents implementations in different crates from conflicting.

```
use std::fmt::{Debug, Display};

pub struct Tweet {
    pub username: String,
}

pub trait Summary {
    fn summarize_author(&self) -> String;

    fn summarize(&self) -> String {
        format!("(Read more from {}...)", self.summarize_author())
    }
}

impl Summary for Tweet {
    fn summarize_author(&self) -> String {
        format!("@{}", self.username)
    }
}

// use traits to define functions that accept many different types
pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize());
}

// The signatures below are shown without bodies.

// Requiring several traits at once
pub fn notify(item: &(impl Summary + Display)) {

// Equivalent to the above (impl Trait can be seen as syntactic sugar for a generic)
pub fn notify<T: Summary + Display>(item: &T) {

// This works, but readability suffers
fn some_function<T: Display + Clone, U: Clone + Debug>(t: &T, u: &U) -> i32 {

// A where clause keeps signatures with many combined trait bounds readable
fn some_function<T, U>(t: &T, u: &U) -> i32
    where T: Display + Clone,
          U: Clone + Debug
{

// A trait can be implemented for every type that satisfies a bound; this is a definition from the standard library
impl<T: Display> ToString for T {
    // --snip--
}

// ?Sized means T may or may not implement Sized
unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}
// 1. All generic types get an implicit Sized bound.
// 2. There's an implicit ?Sized bound on all traits
```

Using Trait Bounds to Conditionally Implement Methods

```
use std::fmt::Display;

struct Pair<T> {
    x: T,
    y: T,
}

impl<T> Pair<T> {
    fn new(x: T, y: T) -> Self {
        Self { x, y }
    }
}

// Methods on a generic type can be restricted by trait bounds.
// This impl provides cmp_display only for types T that implement both Display and PartialOrd.
impl<T: Display + PartialOrd> Pair<T> {
    fn cmp_display(&self) {
        if self.x >= self.y {
            println!("The largest member is x = {}", self.x);
        } else {
            println!("The largest member is y = {}", self.y);
        }
    }
}
```

What a trait method call really is: When invoking trait methods, the receiver is borrowed implicitly.

```
struct Number {
    odd: bool,
    value: i32,
}

impl std::clone::Clone for Number {
    fn clone(&self) -> Self {
        Self { ..*self }
    }
}

fn main() {
    let n = Number { odd: true, value: 51 };
    let m = n.clone();
    // equivalent to the line above
    let m = std::clone::Clone::clone(&n);
}
```

Marker traits such as `Copy` are traits with no methods at all. Marker traits are traits that have no trait items. Their job is to "mark" the implementing type as having some property which is otherwise not possible to represent using the type system.

```
impl std::marker::Copy for Number {}

// f64 impls PartialEq but not Eq because NaN != NaN
// i32 impls PartialEq & Eq because there's no NaNs :)
```

Trait items cannot be used unless the trait is in scope.
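The scoping rule is easy to trip over, so here is a minimal sketch of it. The module `shapes`, the trait `Area`, and the type `Square` are all hypothetical names invented for this example.

```rust
// Hypothetical module: a trait and a type that implements it.
mod shapes {
    pub trait Area {
        fn area(&self) -> f64;
    }

    pub struct Square(pub f64);

    impl Area for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }
}

// Bring the trait into scope. Without this line, the method-call
// syntax `sq.area()` below is rejected by the compiler, even though
// `Square` does implement `Area`.
use shapes::Area;

fn square_area(side: f64) -> f64 {
    let sq = shapes::Square(side);
    sq.area()
}

fn main() {
    println!("{}", square_area(3.0));
    // Naming the trait by its full path also works, with or without the `use`.
    println!("{}", shapes::Area::area(&shapes::Square(2.0)));
}
```

This is why code often needs imports such as `use std::io::Read;` before calling `read_to_string` on a `File`.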
### Lifetime

The main aim of lifetimes is to prevent dangling references, which cause a program to reference data other than the data it's intended to reference.

Lifetimes are a concept that applies to references. The compiler cannot understand all of a program's logic, so it cannot always work out on its own how long a reference stays valid. Whenever it cannot infer a lifetime, we have to annotate it ourselves (before Rust 1.0, every reference parameter of a function or method needed an explicit lifetime).

Syntactically, lifetimes are written like generics, inside angle brackets. A typical example:

```
use std::fmt::Display;

// There are two reference parameters and the return value is also a reference,
// so the compiler cannot tell how the returned reference relates to x and y.
// When we define the function we do not know the concrete arguments, so we cannot know
// which if/else branch will run, nor the lifetimes of the references passed in.
// The borrow checker knows even less, so to rule out dangling references we must
// annotate lifetimes to help the compiler.
// Because Rust checks that every variable is initialized before use, there is no null
// value and therefore no wild pointer either.
fn longest_with_an_announcement<'a, T>(
    x: &'a str,
    y: &'a str,
    ann: T,
) -> &'a str
where
    T: Display,
{
    println!("Announcement! {}", ann);
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
```

We've told Rust that the lifetime of the reference returned by the longest function is the same as the smaller of the lifetimes of the references passed in.

Ultimately, lifetime syntax is about connecting the lifetimes of various parameters and return values of functions.

The compiler currently applies three rules to work out the lifetimes of references that are not annotated explicitly:

1. The first rule is that each parameter that is a reference gets its own lifetime parameter. There are as many lifetime parameters as there are reference parameters, e.g. `fn foo<'a, 'b>(x: &'a i32, y: &'b i32)`.
2. The second rule is if there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters: `fn foo<'a>(x: &'a i32) -> &'a i32`.
3. The third rule is if there are multiple input lifetime parameters, but one of them is `&self` or `&mut self` because this is a method, the lifetime of `self` is assigned to all output lifetime parameters.
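The three rules above can be sketched side by side. All function and type names here (`first_word`, `Parser`, `pick`) are hypothetical and exist only for this illustration.

```rust
// Rules 1 and 2: a single reference parameter, so the elided output
// lifetime is taken from it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same function with the lifetimes the compiler fills in written out.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

struct Parser<'a> {
    input: &'a str,
}

impl<'a> Parser<'a> {
    // Rule 3: there are two reference inputs (`&self` and `_skip`), but
    // because one of them is `&self`, the output gets the lifetime of `&self`.
    fn rest(&self, _skip: &str) -> &str {
        self.input
    }
}

// Two reference inputs and no `self`: none of the rules can decide the
// output lifetime, so it has to be annotated by hand.
fn pick<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

fn main() {
    let p = Parser { input: "xyz" };
    println!(
        "{} {} {} {}",
        first_word("hello world"),
        first_word_explicit("a b"),
        p.rest("x"),
        pick("left", "right"),
    );
}
```

Removing the `'a` annotations from `pick` makes the compiler ask for an explicit lifetime, which is the case the `longest_with_an_announcement` example above falls into.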
Structs can also be generic over lifetimes, which allows them to hold references:

```
struct NumRef<'a> {
    x: &'a i32,
}

// The lifetime annotation on the impl can be elided: impl<'a> NumRef<'a> shortens to
impl NumRef<'_> {
    fn as_i32_ref(&self) -> &i32 {
        self.x
    }
}

fn as_num_ref(x: &i32) -> NumRef<'_> {
    // x is already a reference, so it can be stored directly
    NumRef { x }
}

fn main() {
    let x: i32 = 99;
    let x_ref = as_num_ref(&x);
    // `x_ref` cannot outlive `x`, etc.
}
```

About `'static`:

```
fn tail<'a>(s: &'a [u8]) -> &'a [u8] {
    &s[1..]
}

fn main() {
    let y = {
        // x is a 'static array
        let x = &[1, 2, 3, 4, 5];
        tail(x)
    };
    println!("y = {:?}", y); // x is a 'static array, so y is still usable here without any problem
}

// Another case
fn main() {
    let y = {
        // a vector is heap-allocated, and it has a non-'static lifetime
        let v = vec![1, 2, 3, 4, 5];
        tail(&v) // error: `v` does not live long enough
    };
    println!("y = {:?}", y);
}
```

### Closure

Closures are just functions of type `Fn`, `FnMut` or `FnOnce` with some captured context.

`FnMut` exists because some closures *mutably borrow* local variables:

```
fn foobar<F>(mut f: F)
    where F: FnMut(i32) -> i32
{
    let tmp = f(2);
    println!("{}", f(tmp));
}

fn main() {
    let mut acc = 2;
    foobar(|x| {
        acc += 1;
        x * acc
    });
}

// output: 24
```

### Tests

```
#[test]
#[ignore]
#[should_panic(expected = "Guess value must be less than or equal to 100")]

panic!
assert!
assert_eq!
assert_ne!

cargo test -- --test-threads=1
cargo test -- --show-output
// run only the tests whose names contain "add"
cargo test add
```

Rust tests fall into two broad categories: unit tests and integration tests.

`#[cfg(test)]` tells the compiler to compile the test module only when running `cargo test`, and to skip it for `cargo build`. Integration tests (the test cases in the `tests` directory next to `src`) do not need `#[cfg(test)]`, because Cargo treats the `tests` directory specially.

Integration tests work only for library crates, not binary crates, because a binary crate exposes nothing that the files in `tests` can import. The usual approach is to keep `src/main.rs` thin:

> If the important functionality works, the small amount of code in the src/main.rs file will work as well, and that small amount of code doesn't need to be tested.

### Smart Pointers

How they differ from references:

- They are data structures: smart pointers defined in the standard library provide functionality beyond that provided by references.
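- references are pointers that only borrow data
- in many cases, smart pointers own the data they point to; `String` and `Vec<T>` are smart pointers too

Implicit Deref Coercions with Functions and Methods. Deref coercion is a convenience that Rust performs on arguments to functions and methods. Deref coercion works only on types that implement the Deref trait. For example, deref coercion can convert `&String` to `&str` because `String` implements the `Deref` trait such that it returns `str`.

Implicit deref coercion can be applied several times in a row. In the example below, `&MyBox<String>` is first turned into `&String` and then into `&str`:

```
use std::ops::Deref;

// A minimal Box-like wrapper around a single value
struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

fn hello(name: &str) {
    println!("Hello, {}!", name);
}

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m);
}
```

The compiler resolves deref coercions at compile time and emits the actual `deref` calls, so using them costs nothing extra at runtime.

Unlike languages where you must call a free function once you are done with a pointer, Rust has the compiler insert the cleanup code (the `Drop` implementation) automatically when the value goes out of scope.

By convention, Rust code increments a reference count with `Rc::clone(&a)` rather than `a.clone()`. Neither form makes a deep copy, but the explicit form is visually distinct from the `clone` calls that do, which makes performance problems easier to track down.

### Concurrency

```
use std::thread;

fn main() {
    let v = vec![1, 2, 3];

    let handle = thread::spawn(move || {
        println!("Here's a vector: {:?}", v);
    });

    handle.join().unwrap();
}
```

The `move` keyword is required here. Without it, Rust infers how to capture `v`, and because `println!` only needs a reference to `v`, the closure tries to borrow `v`. However, there's a problem: Rust can't tell how long the spawned thread will run, so it doesn't know if the reference to `v` will always be valid.

### Macro

What macros are for: metaprogramming. Macros are a way of writing code that writes other code; in other words, you write Rust code that generates Rust code.

Rust has no reflection and cannot obtain type information at runtime, so it relies on macros to generate code at compile time (which is also one of the reasons compilation is slow).

Rust macros come in two broad kinds: *declarative* macros, defined with `macro_rules!`, and *procedural* macros.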
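As a small sketch of the declarative kind, the macro below expands at compile time into nested comparisons. The macro name `max_of!` and the helper `largest` are made up for this example.

```rust
// Hypothetical declarative macro: returns the largest of one or more
// comma-separated expressions.
macro_rules! max_of {
    // Base case: a single expression is its own maximum.
    ($x:expr) => {
        $x
    };
    // Recursive case: compare the first expression with the maximum
    // of the remaining ones.
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

fn largest() -> i32 {
    // Expands into nested `if a > b` blocks before type checking runs.
    max_of!(3, 9, 4)
}

fn main() {
    println!("{}", max_of!(1));
    println!("{}", largest());
}
```

A tool such as cargo-expand, listed in the toolchain section, can print what an invocation like this expands to.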
Procedural macros in turn come in three kinds:

- Custom `#[derive]` macros that specify code added with the `derive` attribute used on structs and enums
- Attribute-like macros that define custom attributes usable on any item
- Function-like macros that look like function calls but operate on the tokens specified as their argument

Custom `derive` Macro

```
#[proc_macro_derive(HelloMacro)]
pub fn hello_macro_derive(input: TokenStream) -> TokenStream {
```

Attribute-like macros

```
#[proc_macro_attribute]
pub fn route(attr: TokenStream, item: TokenStream) -> TokenStream {
```

Function-like macros

```
#[proc_macro]
pub fn sql(input: TokenStream) -> TokenStream {
```

## Miscellaneous

### Cargo

- `cargo fix`: Automatically fix lint warnings reported by rustc
- `cargo fmt`: formats the code using rustfmt
- `cargo bench`: runs benchmarks; these can come from a third-party benchmark framework, since Rust supports hooking in a custom bench harness
- `cargo +nightly build -Z timings`: prints a report of compile times (on recent stable toolchains the equivalent is `cargo build --timings`)

### Cross-Compilation

- <https://colobu.com/2019/12/18/How-to-Cross-Compile-from-Mac-to-Linux-on-Rust/>

### Compile Times

The more Rust developers get used to working on their projects across multiple branches and switching context between builds, the less they need to think about compile times.

On why Rust compiles slowly: <https://www.imooc.com/article/301395>