omniverse theirix's Thoughts About Research and Development

Learning by outdated books

bookshelf

(not my actual bookshelf, but close enough, CC0)

Learning new things is hard, and finding the right way to do it is even harder. Since my time in academia, I have preferred books for a systematic introduction to a subject. The first chapters of a book, as well as the per-chapter and closing conclusions, are the most valuable parts to skim through. This usually works well with more theoretical material. For example, I dug into books about distributed systems and storage architectures. Such books are usually peer-reviewed, thick and have gone through a couple of editions. The theoretical foundations for those systems rarely change.

Practical books are a different story. If you are going to read a book about a specific technology, there is a good chance you will stumble upon a book written a few years ago. The pace of technology is very high nowadays, so such a book can become outdated soon after being published. I assume an attentive reader follows the book and tries out the described topics on their own, even if the exercises and follow-ups are made up. The first impulse is to drop the book and start learning from more up-to-date documentation. But wait! Learning from outdated books is a perfect chance to master a technology. You stop taking what is written for granted and begin to find out why subsystems or approaches were deprecated or replaced with others. It helps to understand the rationale behind the outdated system design, its advantages and defects, and why the new system is superior to the former. You will dig into documentation more often rather than copy-pasting code from the book.

And to be honest, everyone has a bookshelf with books that were bought a long time ago just to read them someday. So outdated books are helpful. Use them to your advantage.

Crossposted from my LinkedIn

Sad story about modems and routers

Need for broadband

We are so used to having broadband and mobile internet that we take it for granted. This year we escaped from the city, the heat and the pandemic to our summer house. One of the most important things for us is a good and fast connection.

We have several internet usage scenarios:

  1. casual browsing
  2. watching streaming services
  3. listening to the home music library

I was surprised how easily the first two scenarios can be accomplished even with a crappy internet connection. Browser caching and progressive loading work very nicely. YouTube and streaming services can degrade video quality smoothly and buffer minutes of video without hiccups. But when you require a consistent and steady connection, problems arise.

Our house is equipped with a Wi-Fi router connected via a LoRa radio link to a base station, which is connected to the ISP by an optical or copper channel. The problem here is the very limited bandwidth of the ISP: the download rate rarely exceeds 3 Mbps.

Okay, there are a lot of 3G/LTE USB dongles that can be plugged into a router and provide a steady LTE connection. Without hesitation, I got one from my mobile ISP with a prepaid plan. While it worked flawlessly with a Windows laptop right after plugging it into a USB port, making it work with a Wi-Fi router took a while.

ZTE modem

The dongle turned out to be a branded ZTE 8810FT device sold by the MTS ISP (VID 0x19d2, PID 0x1225). It can work as a USB-to-Ethernet device or as a modem device. Internally this dongle runs Android and has special drivers for radio and USB networking. It also has a simple web interface for configuring basic settings and working with SMS and USSD calls.

What is the difference between those modes? In Ethernet mode, the dongle should work without extra drivers on any system supporting a generic RNDIS or CDC device. On Linux, you will have a CDC Ethernet device (cdc_ether), on Windows most likely an RNDIS device. In modem mode, the dongle exposes a modem device via a COM port that accepts AT commands for dialling into the mobile network. The modem connection is configured via the NCM, QMI or MBIM protocols. On successful configuration you will have a /dev/ttyUSB0 or /dev/cdc-wdm0 device.
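To see which of the two modes a dongle actually ended up in on a Linux host, one option is to look at the kernel driver bound to its network interface. The following is only a sketch: the grouping of driver names into "ethernet" and "modem" follows the description above and is not exhaustive, and the function names are made up for illustration.

```python
import os

# Kernel drivers grouped by the two modes described above (illustrative, not exhaustive).
ETHERNET_DRIVERS = {"cdc_ether", "rndis_host"}
MODEM_DRIVERS = {"option", "qmi_wwan", "cdc_mbim", "cdc_ncm", "huawei_cdc_ncm"}


def classify_driver(driver: str) -> str:
    """Map a kernel driver name to 'ethernet', 'modem' or 'unknown'."""
    if driver in ETHERNET_DRIVERS:
        return "ethernet"
    if driver in MODEM_DRIVERS:
        return "modem"
    return "unknown"


def driver_of_interface(iface: str) -> str:
    """Return the driver bound to a network interface, or '' if there is none."""
    link = f"/sys/class/net/{iface}/device/driver"
    if not os.path.exists(link):
        return ""  # virtual interfaces such as 'lo' have no device driver
    return os.path.basename(os.path.realpath(link))


def list_interfaces(sysfs_root: str = "/sys/class/net"):
    """Return (interface, driver, mode) tuples; empty on systems without sysfs."""
    if not os.path.isdir(sysfs_root):
        return []
    result = []
    for iface in sorted(os.listdir(sysfs_root)):
        driver = driver_of_interface(iface)
        result.append((iface, driver, classify_driver(driver)))
    return result
```

Calling list_interfaces() before and after a mode switch shows whether the interface moved from one driver family to the other.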

Outside of Windows, you are out of luck. On Mac, you need a special driver even for Ethernet mode. Because shipping a driver for a USB device requires either another USB device (which increases the package cost) or internet access (to download drivers for a modem), the dongle uses a trick. It announces itself as a composite USB device that includes mass storage. The first time it is plugged in, the host system does not have network drivers for the dongle, so only a USB disk device appears. The user runs an installer from this disk and installs a driver for USB-to-Ethernet. Next time, the plugged-in device is detected as an Ethernet device and you get a DHCP-provided IP address. Unfortunately, this trick works extremely badly with headless hosts like a router or a server. It is also extremely unreliable on Mac or Windows once you touch any part of this fragile system (like removing an Ethernet adapter from the system).

My experiments with the ZTE dongle show that it just does not work on Linux. I tried a laptop with Debian Buster as well as a TP-LINK TL-WR842N v5 router with OpenWRT 17 and 21. The dongle can enter Ethernet mode (an Ethernet interface is created) but it does not provide a connection to the host system. Entering modem mode is much more complicated. The dongle creates COM ports only after switching to serial mode (i.e. debug mode), which does not persist across device reboots. But even before a reboot the COM ports are non-functional and do not accept any AT commands on any available OS. A special factory mode that exists for other dongles is not available for this device. So neither mode works without a special driver, which does not exist for Linux.

Amusingly, switching between modes can be performed by nvram tweaks as well as by requesting a special URL from the dongle's web server. Hello, CSRF!

Unfortunately, my experiments with the ZTE dongle ended early because the device simply bricked itself without any special effort from me. Host systems can no longer see the USB device. It looks like the read/write filesystem on the device can be damaged by mode switching, or it is just a faulty device. An ISP employee told me I was the third person that day returning this modem. I doubt all customers tinker with firmware that much. So if you can, avoid the ZTE 8810FT, especially if you need non-Windows support.

Huawei modem

I purchased another well-known modem, a Huawei E3372h-320 (VID 0x12d1, PID 0x1f01). It has a USB-to-Ethernet mode known as HiLink mode, after the firmware name. There was a more versatile 153 model with two modes (modem and USB-to-Ethernet) but it is out of production.

This new modem was recognized by macOS and Windows after installing vendor-provided drivers. The stock router firmware did not recognize the modem. OpenWRT 21.02 worked fine after installing drivers:

   opkg install kmod-usb-net-cdc-ether kmod-usb-net-huawei-cdc-ncm

Then I just configured a bunch of OpenWRT settings and got a new ethernet device with a DHCP-provided IP address.

It would have been a happy ending to the story, but connections were extremely unstable. The connection to the router via Wi-Fi and Ethernet is stable, so the LAN is OK. Let’s check the WAN. Sadly, I can only use the router's web interface, which displays just basic instantaneous metrics. It is awkward. So I decided to monitor metrics from the modem continuously.

Building a monitoring system on InfluxDB

To get metrics programmatically I found a few HTTP endpoints on the modem’s web server that provide data for the HiLink web interface. It was trivial to get a session cookie because HiLink does not require any authorization. Then I can fetch all the data, normalize it and store it in a time-series database. Because I gather the data periodically myself, I need to push metrics into a database.
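A minimal sketch of the fetch-and-normalize step in Python. Everything specific here is an assumption rather than something confirmed by the modem's documentation: the default address 192.168.8.1, the endpoint paths /api/webserver/SesTokInfo and /api/device/signal, the header names, and the XML shape in SAMPLE. Your firmware may differ, so check what the web interface actually requests.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

# Assumed response shape of the signal endpoint; values carry their units as text.
SAMPLE = """<response>
<rssi>-67dBm</rssi>
<rsrp>-95dBm</rsrp>
<sinr>8dB</sinr>
</response>"""

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def normalize(text):
    """Extract the first number from values such as '-67dBm' or '>=-51dBm'."""
    match = _NUMBER.search(text or "")
    return float(match.group()) if match else None


def parse_signal(xml_text, fields=("rssi", "rsrp", "sinr")):
    """Turn the XML answer into a dict of floats, skipping missing fields."""
    root = ET.fromstring(xml_text)
    result = {}
    for name in fields:
        value = normalize(root.findtext(name))
        if value is not None:
            result[name] = value
    return result


def fetch_signal(base_url="http://192.168.8.1"):
    """Get a session cookie, then the signal page (paths and headers are assumptions)."""
    with urllib.request.urlopen(f"{base_url}/api/webserver/SesTokInfo", timeout=5) as resp:
        tokens = ET.fromstring(resp.read())
    request = urllib.request.Request(
        f"{base_url}/api/device/signal",
        headers={
            "Cookie": tokens.findtext("SesInfo", default=""),
            "__RequestVerificationToken": tokens.findtext("TokInfo", default=""),
        },
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return parse_signal(resp.read())
```

Keeping parse_signal separate from fetch_signal makes the normalization easy to check against a saved response without a modem attached.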

The push-based approach ruled out Prometheus and Grafana, the well-known tools for collecting and visualizing metrics, since Prometheus prefers to pull metrics itself. So I tried InfluxDB. Version 2.0 provides a nice dashboard (ex-Chronograf, I think), which is more than enough to draw plots. Versatile queries allow me to calculate statistics over sliding windows.
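Pushing a sample into InfluxDB 2.0 boils down to a POST of line-protocol text to the /api/v2/write endpoint. Here is a hedged sketch using only the standard library. The measurement name, tag, bucket name and token are hypothetical, and the helper skips the escaping that line protocol requires for spaces and commas in names.

```python
import urllib.parse
import urllib.request


def to_line(measurement, tags, fields, timestamp_s):
    """Format one point as InfluxDB line protocol (no escaping of special characters)."""
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(value)}" for key, value in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {int(timestamp_s)}"


def build_write_request(base_url, org, bucket, token, lines):
    """Build (but do not send) a write request with second-precision timestamps."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return urllib.request.Request(
        f"{base_url}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={"Authorization": f"Token {token}"},
        method="POST",
    )


# Usage (hypothetical bucket and token):
# line = to_line("signal", {"modem": "e3372"}, {"rssi": -67, "sinr": 8}, 1600000000)
# req = build_write_request("http://localhost:8086", "hilinkmon-org", "hilinkmon", "secret", [line])
# urllib.request.urlopen(req, timeout=5)
```

Several lines can be sent in one request, so a collector can batch samples if the database is briefly unreachable.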

I prefer SQL-like query languages such as InfluxQL from older versions of InfluxDB. The second version introduces Flux, a new functional language with JavaScript-like syntax. It is easy to start with and fun to use, but sometimes plain old SQL would be enough.

What I dislike in InfluxDB 2.0 is the very glitchy web interface for editing queries, which sometimes loses updates from the query editor. I gave up, fired up Vim and started sending queries via the API:

curl -X POST 'http://localhost:8086/api/v2/query?org=hilinkmon-org' \
  --header "Authorization: Token $(sed '/INFLUXDB/!d;s/.*TOKEN=//' .env)" \
  --header 'Accept: application/csv' \
  --header 'Content-type: application/vnd.flux' \
  --data "$(cat test.flux)"
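The query endpoint answers with CSV, which is easy to post-process outside the UI. The sketch below assumes the default CSV dialect (a header row per table, tables separated by blank lines, optional annotation lines starting with '#'). The sample data is made up for illustration.

```python
import csv
import io

# Made-up sample of a two-table response.
SAMPLE_CSV = """,result,table,_time,_value,_field
,_result,0,2021-07-01T10:01:00Z,7.5,sinr
,_result,0,2021-07-01T10:02:00Z,8.25,sinr

,result,table,_time,_value,_field
,_result,1,2021-07-01T10:01:00Z,-67,rssi
"""


def parse_flux_csv(text):
    """Return (time, field, value) tuples from a Flux CSV response."""
    # Drop blank separators and annotation lines before handing over to csv.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    rows = []
    for row in reader:
        if row["_value"] == "_value":
            continue  # repeated header row of the next table
        rows.append((row["_time"], row["_field"], float(row["_value"])))
    return rows
```

This assumes every table in the response has the same columns; a query that yields differently shaped tables would need per-table parsing.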

One more annoying issue is that table and scalar values are not treated equally in the UI. One cannot simply display a scalar value in the UI without wrapping it into a table. I hope it will be addressed in future versions of the UI, but for now you are stuck with API calls.

The monitoring application hilinkmon I wrote is available on GitHub. It is easy to set up via the provided Docker Compose file. I hope it helps somebody with debugging LTE issues.

The repository includes a predefined dashboard for monitoring with key metrics for LTE connections: RSSI, RSRP, SINR and CQI.

hilinkmon dashboard

Analyzing metrics

Okay, I can gather a lot of metrics now. Let’s describe what those acronyms stand for:

  • RSSI (Received Signal Strength Indicator) is the signal strength in dBm. It is used in a bar indicator.
  • RSRP (Reference Signal Received Power) is close in meaning to RSSI but covers only the power of the LTE reference signals.
  • SINR (Signal to Interference plus Noise Ratio) is a signal-to-noise ratio. Simple and easy to understand.
  • CQI (Channel Quality Indicator) is a discrete quality class that specifies a modulation scheme and code rate.

Different network generations like 3G and LTE use those metrics, but the physical interpretation differs. For all metrics listed above, higher values are better. Look up the proper ranges of each metric online. What is more important is to watch how the metrics change as you tune your system.

Why do you need to look at all those metrics? Maximal RSSI does not always mean the best connection. You will get a bad connection if the modem is too optimistic and prefers to pick a higher channel (the user equipment chooses a larger CQI based on SINR) while the signal (SINR and RSSI) is unstable. It looks like Huawei modems do exactly this. When moving the modem around, I try to maximize RSSI while keeping SINR stable to avoid rapid channel jumps.
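"Stable" can be made concrete by comparing a sliding-window mean and standard deviation for each modem position. A small sketch in plain Python follows; the window size is arbitrary and the numbers in the usage comment are invented.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(values, window):
    """Return (mean, standard deviation) for every full sliding window."""
    buffer = deque(maxlen=window)  # old samples fall out automatically
    stats = []
    for value in values:
        buffer.append(value)
        if len(buffer) == window:
            stats.append((mean(buffer), pstdev(buffer)))
    return stats


# Usage with invented SINR samples from two modem positions:
# position_a = [8, 9, 8, 9, 8, 9]      # lower mean, small deviation
# position_b = [14, 2, 13, 1, 15, 3]   # higher peaks, large deviation
# rolling_stats(position_a, 3), rolling_stats(position_b, 3)
```

A position with a slightly lower mean but a much smaller deviation is the one to prefer under the reasoning above.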

With the help of hilinkmon, I figured out that my router produces a lot of radio noise for the plugged-in modem. So I wrapped the Wi-Fi router in foil to make a shield between the LTE modem and the router. Only the Wi-Fi antennas and vent holes stayed open. Things got much better. And hot, because it was July. So I switched from a foil shield to an air shield.

A simple USB cable did not work because my modem has a weak PSU and the cable is long, so the signal fades. So I built a weird construction with an additional power brick and a USB cable splitter. The connection was best when the dongle was at a precise position, held by rubber bands. No photo of the foil shield and the optimized position, sorry. Check a photo of a generic setup.

lte cables

When I noticed frequent switching between 3G and LTE on a graph, I quickly disabled 3G in the modem settings. Then the metrics stabilized.
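Such switching is also easy to spot in the raw samples rather than by eye. A sketch, assuming each sample is a (timestamp, mode) pair with the mode recorded as a plain label (the labels here are hypothetical):

```python
def find_switches(samples):
    """Return (timestamp, old_mode, new_mode) for every change of network mode."""
    switches = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if previous != current:
            switches.append((timestamp, previous, current))
    return switches


# Usage:
# find_switches([(0, "LTE"), (1, "LTE"), (2, "3G"), (3, "LTE")])
```

Counting the returned entries per hour gives a single number to compare before and after a settings change.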

Problems with a router

After all those tweaks I noticed frequent freezes in the Internet connection. The connection from the router to the Internet was OK and the connection to the router was OK, but the Internet connection from a Wi-Fi client was interrupted for up to a minute. Nothing in logs, temperature sensors or metrics.

It can be reproduced on OpenWRT 21.02 (the latest release candidate) with a sustained load like copying files over a VPN connection. It does not take multiple connections or additional Wi-Fi clients: even a single client can trigger a freeze.
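To put numbers on such freezes from the client side, one can probe a remote host at a fixed interval and extract the failed stretches afterwards. A sketch with the standard library; the probe target, the interval and the function names are up to you and not taken from any existing tool.

```python
import socket


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outages(samples, min_duration=0.0):
    """Find failed stretches in time-sorted (timestamp, ok) samples.

    Returns (start, end) pairs, where start is the first failed probe and end
    is the first successful probe after it. A failure still ongoing at the end
    of the samples is not reported.
    """
    result = []
    start = None
    for timestamp, ok in samples:
        if not ok and start is None:
            start = timestamp
        elif ok and start is not None:
            if timestamp - start >= min_duration:
                result.append((start, timestamp))
            start = None
    return result
```

Running the same probe loop on a wired client and a Wi-Fi client at the same time helps to tell a WAN problem from a router problem.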

When I had almost lost hope, I flashed a beta firmware for the TL-WR842N which added support for the Huawei E3372h modem. And it helped!

Of course, I lost a lot of cool OpenWRT features, but the freezes disappeared completely. Plenty of threads about such freezes can be found on the OpenWRT forums without proper solutions. It looks like there is a problem in the firmware itself. Maybe this problem is somehow caused by a weak CPU. I did not expect that a super-stable OpenWRT could freeze in such a strange, unpredictable way. Based on my experience, it is much more likely to see it in DD-WRT firmware betas.

Regarding the stock firmware, I don’t know why adding support for a simple Ethernet-based modem from 2014 took TP-Link so long (remember, it is still in beta). As I understand from OpenWRT, there is no need to use modem mode. Just add a few lines with USB VIDs/PIDs so that the USB device is properly detected.

Outcomes

It was a long journey to a stable and fast connection, and I achieved it only at the end of my vacation. It is a sad story but what did I learn from it?

  1. Never buy hardware that is not confirmed to work on your operating system. Confirmed means a reported success story in a forum or an explicit statement from a vendor.

  2. Sometimes even a super-stable open-source firmware like OpenWRT can experience a strange bug.

  3. Observability is a key engineering principle. Always measure system output before you change something.

  4. Turn problems into a chance to learn something new, like digging into USB mode-switching and drivers, reverse-engineering a strange Android box, learning about cellular networks and trying a new time-series database. At least make it fun.