The biggest thing I learned from writing my first manuscript

Research is broken.

I shouldn’t complain too much, as it’s infinitely easier to find facts, data, studies, quotes, and other resources than pretty much ever before in history. When I read of how famed biographer Robert Caro researched his seminal works on Robert Moses and Lyndon B. Johnson, up to his elbows in archival paperwork, piecing together a narrative from memos, receipts, and other ephemera – something he and his wife are still doing well into their 80s – I count my digital blessings.

But despite this embarrassment of riches, it’s still far more challenging to find the truth than it should be, or even than it was just a few years ago.

There’s been lots of talk about how the quality of Google search has been degrading over the past few years. After a long and hard-fought battle, it seems that the SEO artists have won. Search for a product, like a camera or a mattress, and the results that aren’t ads or shops themselves are often affiliate blogs, fake reviews, and industry plants.

Google search is so noisy that many searchers have taken to habitually adding “Reddit” to the end of their search query in pursuit of trustworthy information from a real human. In a poll run by tech blog Android Authority, nearly 70% of respondents claimed to add the modifier sometimes or always. Others have turned to TikTok or even ChatGPT as potential replacements for a clunky Google. (Microsoft sees the opportunity, and recently announced a partnership with OpenAI, the company behind ChatGPT, to bring that tech to its Bing search engine.)

These are all imperfect. Reddit and TikTok can be great tools for getting product reviews or how-to tips, but are not built for substantive research. ChatGPT and similar AI tools are promising, but have a nasty habit of making things up out of whole cloth. The human race produces 2.5 quintillion bytes of data each day, and we’re drowning in new information, true or false, while we try desperately to make sense of it all.

Except that last fact there: is it true? This is the problem I ran into over and over again while doing the research for Simply Put – that stat you see when you look something up, is it true?

That number for our daily data, 2.5 quintillion bytes, seems like it might be true. It’s big, right?

I found it on a blog I had never heard of before called SeedScientific. It was number two on a listicle of 27 “staggering facts,” but there wasn’t a source for it in the article (however, there was a mass of miscellaneous sources jumbled up at the bottom). It’s also the number in a big, bold “featured snippet” at the top of Google’s results page, this one linking to another unfamiliar source, “TechJury.” That page, in turn, links to a site called The Next Tech, in particular a blog post from three years prior. That article doesn’t have any source for the claim, so back to Google I go.

The next link down is for a site, also unfamiliar, called Finances Online. The same number appears there, but just in a list with a non-linked attribution reading “SG Analytics, 2020.” Down at the bottom is another unorganized list of citations, and I find one from that firm. Let’s click it.

Ok, so that lands us on the blog of a “global insights and analytics company” named SG Analytics. The title of the article is promising, referencing our 2.5 quintillion number. But in the body of the post, it just cites the source as “Social Media Today.” No link again. Back to Google.

Next is a Forbes article! They’re trustworthy, right? Except that one is just citing an infographic from 2017 on data management app Domo’s blog, which doesn’t directly cite the source, let alone link anywhere. Several other top results on search also point back to this same infographic. Others point back to the same Forbes article.

This might be a circular maze of infinite torture. Let’s change things up and start with a trusted source, the data aggregator Statista. They have a resource titled “Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025.” The sources here check out, but I can only be sure of that because I logged in with my CUNY faculty account to see the details. While it’s not quite the same thing, we could finally end this hunt and switch the claim I made in the sentence that started this whole wild goose chase to read “The human race produced and consumed an estimated 97 zettabytes of data last year…”

All of that legwork, sifting through dozens of sites, familiar and not, to find a simple statistic that was essentially just a supporting line to a small piece of a single blog post. And even then, I was only able to find a source that I trusted by using my institutional credentials.

This was the biggest thing I learned while writing the first draft of my manuscript. It happened when I was looking up American literacy rates, social media usage patterns, and the frequency of certain words as used in modern English. In every one of these and other inquiries, the entire content-industrial complex that is the modern internet pointed me to an unverified, inaccurate answer – with sites citing each other back and forth as a source.

Simply Put is a short business book, not a gargantuan five-thousand-page, multi-volume biography. But I still want it to be thorough, accurate, and useful, and I wasn’t willing to accept hearsay statistics and quote attributions as truth. There’s still a long editing and revision journey ahead, but this is the piece of the process that has stuck with me the most.

We’re perilously sliding into a world where more of the truth is buried, jumbled, or simply locked away. The signal is increasingly lost to the noise.