2017 年,PWA 已经成为 web 应用新的风潮。我们决定试试,以我们现有的“Vue + 多页”的架构,能在升级 PWA 的道路上走多远,达到怎样的效果。
实现 “PRPL” 模式
“PRPL”(读作 “purple”)是 Google 的工程师提出的一种 web 应用架构模式,它旨在利用现代 web 平台的新技术以大幅优化移动 web 的性能与体验,对如何组织与设计高性能的 PWA 系统提供了一种高层次的抽象。我们并不准备从头重构我们的 web 应用,不过我们可以把实现 “PRPL” 模式作为我们的迁移目标。“PRPL”实际上是 Push/Preload、Render、Precache、Lazy-Load 的缩写,我们会在下文中展开它们的具体含义。
根据 Google 推出的 Web 性能分析工具 Lighthouse(v1.6),在模拟的 3G 网络下,用户的初次访问(无任何缓存)大约在 2 秒左右达到“可交互”,可以说非常不错。而对于再次访问,由于所有资源都直接来自于 Service Worker 缓存,页面可以在 1 秒左右就达到可交互的状态了。
但是,故事并不是这么简单得就结束了。在实际的体验中我们发现,应用在页与页的切换时,仍然存在着非常明显的白屏空隙,由于 PWA 是全屏运行的,白屏对用户体验所带来的负面影响甚至比以往在浏览器内更大。我们不是已经用 Service Worker 缓存了所有资源了吗,怎么还会这样呢?
从首页点击到发现页,跳转过程中的白屏
多页应用的陷阱:重启开销
与 SPA 不同,在多页应用中,路由的切换是原生的浏览器文档跳转(Navigating across documents),这意味着之前的页面会被完全丢弃而浏览器需要为下一个路由的页面重新执行所有的启动步骤:重新下载资源、重新解析 HTML、重新运行 JavaScript、重新解码图片、重新布局页面、重新绘制……即使其中的很多步骤本是可以在多个路由之间复用的。这些工作无疑将产生巨大的计算开销,也因此需要付出相当的时间成本。
图中为我们的入口页(同时也是最重的页面)在 2 倍 CPU 节流模拟下的 profile 数据。即使我们可以将“可交互时间”控制在 1 秒左右,我们的用户仍然会觉得这对于“仅仅切换个标签”来说实在是太慢了。
HTML 文件中有标签并不意味着这些标签就能立刻被绘制到屏幕上,你必须保证页面的关键渲染路径是为此优化的。很多开发者相信将 script 标签放在 body 的底部就足以保证内容能在脚本执行之前被绘制,这对于能渲染不完整 DOM 树的浏览器(比如桌面浏览器常见的流式渲染)来说可能是成立的。但移动端的浏览器很可能因为考虑到较慢的硬件、电量消耗等因素并不这么做。不仅如此,即使你曾被告知设为 async 或 defer 的脚本就不会阻塞 HTML 解析了,但这可不意味着浏览器就一定会在执行它们之前进行渲染。
而更重要的是,一个不阻塞 HTML 解析的脚本仍然可能阻塞到绘制。我做了一个简化的“最小多页 PWA”(Minimal Multi-page PWA,或 MMPWA)来测试这个问题,:我们在一个 async(且确实不阻塞 HTML 解析)脚本中,生成并渲染 1000 个列表项,然后测试骨架屏能否在脚本执行之前渲染出来。下面是通过 USB Debugging 在我的 Nexus 5 真机上录制的 profile:
是的,出乎意料吗?首次渲染确实被阻塞到脚本执行结束后才发生。究其原因,如果我们在浏览器还未完成上一次绘制工作之前就过快得进行了 DOM 操作,我们亲爱的浏览器就只好抛弃所有它已经完成的像素,且一直要等待到 DOM 操作引起的所有工作结束之后才能重新进行下一次渲染。而这种情况更容易在拥有较慢 CPU/GPU 的移动设备上出现。
Web 是一个极其多样化的平台。从静态的博客,到电商网站,再到桌面级的生产力软件,它们全都是 Web 这个大家庭的第一公民。而我们组织 web 应用的方式,也同样只会更多而不会更少:多页、单页、Universal JavaScript 应用、WebGL、以及可以预见的 Web Assembly。不同的技术之间没有贵贱,但是适用场景的差距确是客观存在的。
即使我们的多页应用在升级 PWA 的路上不如单页的那些来得那么闪亮,但是 PWA 背后的想法与技术却实实在在的帮助我们在 web 平台上提供了更好的用户体验。
PWA 作为下一代 Web 应用模型,其尝试解决的是 web 平台本身的根本性问题:对网络与浏览器 UI 的硬依赖。因此,任何 web 应用都可以从中获益,这与你是多页还是单页、面向桌面还是移动端、是用 React 还是 Vue 无关。或许,它还终将改变用户对移动 web 的期待。现如今,谁还觉得桌面端的 web 只是个看文档的地方呢?
Since the very first experiments that @Vue.js tweeted, we at Ele.me (the biggest food ordering and delivering company in China) have been working on upgrading our mobile website to a Progressive Web App. We’re proud to ship the world-first PWA exclusively for the Chinese market, but even prouder to collaborate with Google, UC and Tencent to push the boundary of web experience and browser supports in China.
Multi-page, Vue, PWA?
There is a prevailing opinion that only structuring a web app as a Single Page App can we build PWAs that deliver app-like user experience. Popular reference examples including Twitter Lite, Flipkart Lite, Housing Go and Polymer Shop are all using the SPA model.
However at Ele.me, we’ve come to appreciate many advantages of a Multi-Page App model, and decided to refactor the mobile site from an Angular 1 SPA to a Multi-Paged app more than a year ago. The most important advantage we see is the isolation and decoupling between pages, which allows us to built different parts of the mobile site as “micro-services”. These services can then be independently iterated, embedded into 3rd-party apps, and even maintained by different teams.
Meanwhile, we still leverage Vue.js to boost our productivity. You may have heard of Vue as a rival of React or Angular, but Vue’s lightweight and performance make it also a perfect replacement of traditional “jQuery/Zepto + template engine” stack when engineering a Multi-page app. We built every component as Single File Components so they can be easily shareable between pages. The declarative-ness plus reactivity Vue offered help us manage both code and data flow. Oh, did I mention that Vue is progressive? So things like Vuex or Vue-Router can be incrementally adopted if our site’s complexity scales up, like…migrating to SPA again? (Who knows…)
In 2017, PWA seems to be all the rage, so we embark on exploring how far can our Vue-based Multi-page PWAs actually go.
Implementing “PRPL” with MPA
I love PRPL pattern because it gives you a high-level abstraction of how to structure and design your own PWA systems. Since we are not rebuild everything from scratch, we decided taking implementing PRPL as our migration goal:
1. PUSH critical resources for initial URL route.
The key of pushing/preloading is to prioritize resources hidden in deep dependency graph and make browser’s network stack busy ASAP. Let’s say you have a SPA with code splitting by route, you can push/preload chunks for the current route before the “entry chunks” (e.g. webpack manifest, router) finish downloading and evaluating. So when the actual fetches happen, they might already be in caches.
Routes in MPAs naturally fetch code for that route only, and tend to have a flattening dependency graph. Most scripts depended by Ele.me are just <script> elements, so they can be found and fetched by good old browser preloader in early parsing phase without explicit <link rel="preload">.
To take benefits from HTTP2 Multiplexing, we currently serve all critical resources under a single domain (no more domain sharding), and we are also experimenting on Server Push.
2. RENDER initial route & get it interactive ASAP
This one is essentially free (ridiculously obvious) in MPA since there’s only one route at one time.
A straightforward rendering is critical for metrics such as First-Meaningful-Paint and Time-To-Interactive. MPAs gain it for free due to the simplicity of traditional HTML navigation they used.
3. PRE-CACHE remaining routes using Service Worker
This’s the part Service Worker come to join the show. Service Worker is known as a client-side proxy enabling developers to intercept requests and serve responses from cache, but it can also perform initiative fetch to prefetch then precache future resources.
We already used Webpack in the build process to do .vue compilation and asset versioning, so we create a webpack plugin to help us collecting dependencies into a “precache manifest” and generating a new Service Worker file after each build. This is pretty much like how SW-Precache works.
In fact, we only collect dependencies of routes we flagged as “Critical Route”. You can think of them as “App Shell” or the “Installation Package” of our app. Once they are cached/installed successfully, our web app can boot up directly from cache and available offline. Routes that “not critical” would be incrementally cached at runtime during the first visit. Thanks to the LRU cache policies and TTL invalidation mechanisms provided by SW-Toolbox, we have no worries of hitting the quota in a long run.
4. LAZY-load & instantiate remaining routes on demand
Lazy-loading and lazily instantiating remaining parts of the app is relatively challenging for SPA to achieve. It requires both code splitting and async importing. Fortunately, this is also a built-in feature of MPA model, in which routes are naturally separated.
Noticed that the lazy-loading can be done instantly if the requested route is already pre-cached in Service Worker cache, no matter whether SPA or MPA is used. #ServiceWorkerAwesomeness
Surprisingly, we found Multi-page PWA is kinda naturally “PRPL”! MPA has already provided built-in support for “PRL”, and the second “P” involving Service Worker can be easily fulfilled in any PWA.
So what about the end result?
In Lighthouse simulation (3G & 5x Slower CPU), we made Time-To-Interactive around 2 seconds, and this was benchmarked on our HTTP1 test server.
The first visit is fast. The repeat visit with Service Worker is even faster. You can check out this video to see the huge difference between with or without Service Worker:
Did you see that? No, I mean the annoying blank screen. Even in the Service Worker one, the blank screen is still conspicuous during navigating. How can that be?
Multi-page Pitfall: Redo Everything!
Unlike SPA, changing routes in MPA means actual browser navigation happens: The previous page is discarded completely and the browser need to redo everything for next route: re-download resources, re-parse HTML, re-evaluate JavaScript, re-decode image data, re-layout the page and re-paint the screen, even many of them could be shared across routes. All of these works combined requires significant computing and time.
So here is the profile (2x slower CPU simulated) of our entry page (most heavy one). Even we can make Time-To-Interactive around 1s in repeat visit, our users can still feel too slow for just “switching a tab”.
Huge JavaScript Re-Startup Cost
According to the profile, most of the time (900ms) before hitting the first paint is spent on evaluating JavaScript. Half is on dependencies including Vue Runtime, components, libraries etc., another half is on actual Vue starting-up and mounting. Because all UI rendering is depended on JavaScript/Vue, all of the critical scripts remain guiltily parser-blocking. I’m by no means blaming JavaScript or Vue’s overheads here, It’s just a tradeoff when we need this layer of abstraction in engineering.
In SPA, JavaScript Start-up Cost is amortized during the whole lifecycle. Parsing/Compiling for each script is only once, many heavy executing can be only once. The big JavaScript objects like Vue’s ViewModels and Virtual DOM can be kept in memory and reused as much as you want. This is not the case in MPA however.
Could Browser Caches Help?
Yes or no.
V8 introduced code caching, a way to store a local copy of compiled code so fetching, parsing and compilation could all be skipped next time. As @addyosmani mentioned in JavaScript Start-up Performance, scripts stored in Cache Storage via Service Worker could trigger code caching in just the first execution.
Another browser cache you might hear of is “Back-Forward Cache”, or bfcache. The name varies, like Opera’s “Fast History Navigation” or WebKit’s “Page Cache”. The idea is that browsers can keep the previous page live in memory, i.e. DOM/JS states, instead of destroying everything. In fact, this idea works very well for MPA. You can try every traditional Multi-page websites in iOS Safari and observe an instantaneously loading when back/forward. (With browser UI/Gesture or with hyperlink can have a slight difference though.)
Unfortunately, Chrome has no this kind of in-memory bfcache currently concerning to memory consumption and its multi-process architecture. It just leverages HTTP disk cache to simplify the loading pipeline, almost everything still needs to be redone. More details and discussions can be seen here.
Striving for Perceived Performance
Although the reality is dark, we don’t want to give up so easily. One optimization we try to do is to render DOM nodes/create Virtual DOM nodes as less as possible to improve the Time-To-Interactive. While another opportunity we see is to play tricks on perceived performance.
@owencm have written a great post “Reactive Web Design: The secret to building web apps that feel amazing” covering both “Instant loads with skeleton screens” and “Stable loads via predefined sizes on elements” to improve perceived performance and user experience. Yes, we actually used both.
What about we showing the end result after these optimizations first before entering technical nitty gritty? There you go!
Too fast and can not see the pulsing Skeleton Screen clearly? Here is a version showing how it looks like under 10 times slower CPU.
This is a much better UX, right? Even we have slow navigation in slow devices, at least the UI is stable, consistent and always responding. So how we get there?
Rendering Skeleton Screen with Vue at Build-Time
As you might have guessed, the Skeleton Screen that consists of markups, styles, and images is inlined into *.html of each route. So they can be cached by Service Worker, be loaded instantly, and be rendered independently with any JavaScript.
We don’t want to manually craft each Skeleton Screen for each routes. It’s a tedious job and we have to manually sync every change between Skeleton Screens and the actual UI components (Yes we treat every route as just a Vue component). But think about it, Skeleton Screen is just a blank version of a page into which information is gradually loaded. What if we bake the Skeleton Screen into the actual UI component as just a loading state so we can render Skeleton Screen out directly from it without the issue of syncing?
Thanks to the versatility of Vue, we can actually realize it with Vue.js Server-Side Rendering. Instead of using it on a real server, we use it at build time to render Vue components to strings and injected them into HTML templates.
Fast Skeleton Painting…
Having markups in *.html doesn’t mean that they will be painted fast, you have to make sure the Critical Rendering Path is optimized for that. Many developers believed that putting script tags in the end of the body is sufficient for getting content painted before executing scripts. This might be true for browsers supporting rendering an incomplete DOM tree (e.g. streaming render), But browsers might not do that in mobile concerning slower hardwares, battery, and heats. And even we are told that script tags with async or defer is not parser-blocking, it also doesn’t mean we can get content painted before executing scripts in reality.
First I want to clarify it a little bit. According to the Scripting section of HTML (WHATWG living standard, the W3C’s same here), async scripts would be evaluated as soon as it is available thus could potentially blocking parsing. Only defer (and not inlined) is specified to be never block parsing. That’s why Steve Souders ever posted “Prefer DEFER Over ASYNC”. (defer has its own issue and we will cover it later.)
Then I want to say: A script not blocking parser could still block painting nonetheless. So here is a reduced test I wrote named “Minimal Multi-page PWA”, or MMPWA, which basically render 1000 list items within an async (and truly not parser-blocking) script to see if we can get Skeleton Screen painted before scripts get executed. The profile below (over USB debugging on my real Nexus 5) shows my ignorance:
Yes, keep your mouth open. The first paint is blocked. I was also surprised here. The reason I guess is that if we touch DOM so quickly that the browser has still NOT finished previous painting job, our dear browser has to abort every pixel it has drawn, and has to wait until current DOM manipulation task finished and redo the rendering pipeline again. And this more often happens with a mobile device with a slower CPU/GPU.
Fast Skeleton Painting with setTimeout Hack
We indeed encountered this problem when testing our new beautiful Skeleton Screen. Perhaps Vue finishes its job and start to mount nodes too fast ;). But anyway we have to make it slower, or rather lazier. So we try to put DOM manipulation things inside setTimeout(callback, 0), and it works like a charm!
I think you may curious about how this change performs in the wild, so I have refined MMPWA by rendering 5000 list items rather 1000 to make the differences more obvious, and by designing it in an A/B testing manner. The code is on Github and the demo is live on huangxuan.me/mmpwa/. Here is also a video for loungers.
This famous setTimeout hack (a.k.a. Zero Delays) looks quite magic, but it is science™. If you are familiar with event loop, it just prevents these code from executing in the current loop by putting everything to the task queues with the Timer Callback, so the browser could breath (update the rendering) in the main thread.
So we applied what we learned from MMPWA by putting new Vue() inside setTimeout and BOOM! We have Skeleton Screen painted consistently after every navigating! Here is the profile after all these optimizations.
Huge improvements right? This time we hit First Paint (Skeleton Screen Paint) at 400ms and TTI at 600ms. You should really go back to have a before-after comparison in details.
One more thing that I deferred
But wait, why is there still a bunch of guiltily parser-blocking scripts? Are them all async? OK, ok. For historical reasons, we do keep some parser-blocking scripts, like lib-flexible, we couldn’t get rid of it without a huge refactoring. But most of these blocking scripts are in fact defer. We expected that they can be executed after parsing and in order, however the profile kinda slap on my face. :(
Remember I said I would talk about one issue of defer previously? Yes, that’s it. I have had a conversation with Jake Archibald and it turns out it might be a bug of Chrome when the deferred scripts are fully cached. Vote it at crbug!
Similar improvements can be seen from Lighthouse (Under same server and network environment). A Pro Tip is you should always use lighthouse in a variable controlling approach.
Chinese users tend to have a pretty powerful phone. MI4 is shipped with snapdragon 801 (slightly out-performs Nexus 5) but only costs 100$. It’s affordable by at least 80% of our users so we take it as a baseline.
Here is a video screen-recorded on my Nexus 5 showing switching between 4 tabs. The performance varies between tabs due to their variant scale. The heaviest one, entry page, take around 1s to hit real Time-To-Interactive on my Nexus 5.
FYI. This is surprisingly comparable to what I get from Chrome Simulation with 2x CPU throttling. With 5x throttling, this can spend 2–3s to get TTI, horribly. (To be honest, I found even under same throttling, the results can vary drastically depended on my Macbook’s “mood”.)
Final Thoughts
This article is much longer than I could imagine. I am really appreciated if you could get here. So what can we learn from it?
MPA still has some way to go
Jake Archibald ever said that “PWA !== SPA” at Chrome Dev Summit 2016. But the sad truth is that even we have taken advantages of bleeding edge technologies such as “PRPL” pattern, Service Worker, App-Shell, Skeleton Screen, there is still a distance between us and many Single Page PWA just because we are Multi-page structured.
The web is extremely versatile. Static blogs, e-business sites, desktop-level software, all of them should be the first-class citizens of the web family. MPA might have things like “bfcache API”, navigation transitions to catch up the SPA in the future, but it is not today certainly.
PWA is Awesome No Matter What
Hey, I am not overblowing it. Even we as a Multi-page PWA couldn’t be as stunning and app-like as many Single Page PWAs are. The idea and technologies behind PWA still help us deliver a much better experience to our users on the web that hasn’t been possible before.
What PWA is trying to solve are some fundamental problems of current web application model such as its hard dependencies to network and browser UIs. That’ why PWA can be always beneficial no matter what architecture or what framework you actually used. Addy Osmani would give a talk Production Progressive Web Apps With JavaScript Frameworks at this year’s I/O (and I/O 16). You won’t want to miss it!