Projects

Digging Seattle Sewers with Geospatial Tools

This past summer I took a data analysis in python course from Dataquest. At the end of it I felt pretty uncertain about my grasp of some of the statistical techniques it introduced, but I loved writing python and SQL code to massage and analyze data sets and I wanted to do more of it. But what to analyze?

Last year we completely replaced our connection to the city sewer. It was expensive, but we’d already spent a good chunk of change on two previous spot-repairs and saw that it was better to bite the bullet. Our sewer was about 90y old. Houses like ours had sewers made with 1st generation concrete pipe and we’d reached the end of the serviceable life, along with many others — in the time before and since we had our line replaced, we’ve noticed dozens of other houses in the neighborhoods with major sewer work.

I thought it would be interesting to see what data the data showed about sewer repairs, and what predictions one could make.

I didn’t know what data was available, but it seemed like I should be able to find the construction dates of all houses in Seattle. Permitting data should also be available for at least the past 5 years. Combining the two should cast light on how common sewer repairs are for various cohorts of housing.

Initial scrutiny of the permit data suggested I should be able to identify sewer work without much trouble, but determining the extent of that work could be more difficult. Then I found more detailed information, a dataset that cataloged all the “side sewers” in the city, and down to the level of individual stretches of pipe. There were fields for installation and inspection dates. The majority of records left these blank, but it seemed that recent repairs and replacements might have dates populated. My hypothesis is that I can

Now that I’ve identified the data I can use, I have to figure out how to combine it. In some cases, there may be primary and foreign key relationships I can use. In others, though, it looks like I’m going to have to use geographic coincidence, something I’ve never done. It looks like I can use GeoPandas to do the necessary operations, unfortunately there are some issues with the data that may complicate things.

More to come…

Something on the Internet

NULL IS NULL, except when it isn’t

I found what I think is a bug in SQLite 3.33.0: Under certain conditions, NULL isn’t evaluated as NULL. The behavior isn’t present in 3.21.0.

I was working on a project for a data analysis course when I encountered the problem. It took me a while to conclude the problem was with SQLite, rather than my code.

I’ve created an Jupyter notebook that illustrates the problem and shared a static version.

I’ve also shared a zip file with the notebook and the database I was working with.

All this was done in preparation for bringing it to the attention of SQLite developers.

Update: They fixed it in no-time-flat!

Something on the Internet

Experiencing SixColors via their RSS feed is a weird and annoying experience

I’ve been stepping away from Twitter and Facebook, and, at the same time, trying to step back into the world of RSS feeds. It’s been a bit weird. RSS isn’t dead, as some have claimed, but it seems to be treated as more of an afterthought than it once was.

For example, take the RSS feed from Six Colors, an Apple focused site formed by former employees and current contributors to various Apple publications like MacWorld. A significant portion of the feed consists of apparently complete, albeit short (a few short paragraphs), comments on something Apple-is. There are no links in the body of most of these feed items, just a URL to the HTML page on the Six Colors site, which my feedreader (NetNewsWire), renders as a link from the title text. Nor is there any explicit hint that the feed text is just an excerpt. And, because their posts are well written and carefully excerpted, it’s not always obvious that there is something missing, waiting to be discovered.

If you click through to the Six Colors site from one of these feed items you’ll see that something else is going on. Many of the posts on the Six Colors site, are themselves just excerpts from articles by Six Colors contributors and published on other websites, like MacWorld. The excerpt on the Six Colors web page is the same as the text shown in the feed, but it is followed by a link to the full text on the partner (ie MacWorld) website.

As a result, my experience reading Six Colors in NetNewsWire goes something like this.

  1. Click the title of a feed item in a list of recent updates from all my feeds.
  2. Skim/read the feed item.
  3. Take note of the icon representing the feed the item originates from.
  4. Notice that it is Six Colors
  5. Remember (or not) that the Six Colors feed mixes full-text items with excerpts.
  6. Try and guess whether the item is an excerpt and full-text.
  7. Click through to the Six Colors site, maybe.
  8. Click through the link on the Six Colors site through to Mac to get to the full article on MacWorld (etc).

It’s generous to call this a weird experience. It’s a bad experience. Most of the steps shouldn’t exist. All of the steps should present clear and obvious decisions, rather than relying on knowledge of the quirks of the Six Colors feed. It should be three steps, at most.

  1. Click title of feed item in list of recent updates…
  2. Skim/read the feed item.
  3. Optionally, click through if I want to see the full text and presentation.

At least that’s my view as a reader. I acknowledge though, this isn’t all about me. The writers and site owners have their own needs, like knowing that there are actually people reading their work, and/or getting paid for their investment of time, money and expertise. So, presumably, these extra steps help fulfill the needs of the writers and site owners. Or do they?

Does the confusing RSS experience on Six Colors help them get paid, or measure users? I don’t see third-party ads, or trackers. The only advertisement I see on Six Colors is for a subscription to the site, and the only tracker I see is local and self-hosted.

I have a much longer and (permenantly) unfinished post speculating on what may be going on, but the bottom line is that as a user, experiencing Six Colors through an RSS reader is weird, confusing and annoying.

Update: Immediately after publishing this, I noticed that Six Colors actually has three different feeds available. Two are XML/RSS feed the other is JSON. All have slightly different content.

I was using the JSON feed, probably because that was included in NetNewsWire’s default feeds. When a feed item is an excerpt, both XML/RSS feeds include a link to the full-text from he bottom of the Item. The JSON feed omits this; if it included it, I never would have written this post.

The XML/RSS feeds are supposed to differ in that one links directly to the full-text when it is hosted on a partner site (like MacWorld), while the other links to the excerpt on Six Colors. They also differ in another way, though: One links by way of a tracker, but it’s the one that links to Six Colors for everything, and therefore presents other tracking opportunities, not the one that includes links directly to third-party sites.

This new information underscores my original thesis. Feeds are an afterthought, as evidenced by the fact that the Six Colors JSON feed emits some key information for readers, while the link tracking on the XML feeds doesn’t make any sense for the interests of the site owner.

Update 2: It looks like they fixed the json feed after I made an informal bug report. As I was saying, feeds aren’t dead, but they aren’t exactly healthy.

Ineffable

First Post

New post, new domain, new blog, same old platform (wordpress).

I’ve been in blogging limbo for the better part of a decade. I have multiple blogs, but none that feel appropriate for publishing my various thoughts on various subjects.

Geekfun.com used to be my general-purpose blog. I’ve used the domain for email since forever, and for blogging since early in the millennium, but for the past several years it hasn’t feel like the right fit anymore. These days the “geek” mainstream seems to be focused on popular media culture that was formerly fringe media culture (comic books, anime, video games), all things that don’t matter much to me now, if they ever mattered to me before. I haven’t changed much, but the meaning and significance of the word has, and I’d rather not carry around the baggage other people have saddled the word with.

I made a few false starts at choosing an email identity to replace geekfun, and I’ve finally made a commitment to one. I’ll probably host a blog on that domain, eventually, but it won’t be a place for first-drafts and random thinking, that’s what this one is for.