Grumpy Developer. BitBucket, diffs and encodings


Note: I was writing a nice descriptive post on red-black trees for today, but ran out of time and decided to do this short ranty thing instead.

The Reason to Be Grumpy

As part of my day-to-day job I work with BitBucket. BitBucket in general is a pretty good service, although for my personal projects I use GitHub and GitLab. GitHub is for times when I want my projects to be publicly available (just like this website); GitLab is for when I’m messing around with code that I don’t want anyone to see for fear of public embarrassment (like ‘We Are Alive’, oh yeah, what happened to it? give me a month or two, I’ll explain). I actually used to use BitBucket for that, but switched to GitLab because of the promise of easy build automation. Totally justified.

But anyway. BitBucket, as you might expect, has a tool that allows you to see differences between an old and a new version of a file. A ‘diff tool’, if you will…

And the diff tool has this thing, where if you’re looking at a comparison of two versions of a file containing special characters (like a pound sign), they get displayed as �. See, I didn’t think it was a problem with the diff tool, so I spent a while trying to figure out whether something had gone screwy with my Eclipse that could’ve made it replace valid special characters with �s.

Anyway, to cut a long story short, BitBucket has had a ticket for this problem kicking around since 2014. Their explanation: since code files don’t hold any metadata about their encodings, BitBucket’s diff just defaults to UTF-8. Which means that when your file is actually saved in, say, one of the ISO-8859 encodings, any byte that isn’t valid UTF-8 makes BitBucket throw a tantrum and display �. Or, apparently, sometimes the bytes happen to be valid UTF-8 for something else entirely, and it displays some special characters as other special characters. Great.
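If you want to see the effect for yourself, here’s a minimal Python sketch. To be clear, this is not BitBucket’s code; `decode_like_diff_view` is a made-up name for "a viewer that assumes everything is UTF-8", and I’m assuming the file was saved as ISO-8859-1 (the ticket doesn’t care which non-UTF-8 encoding you picked).

```python
def decode_like_diff_view(raw: bytes) -> str:
    """Hypothetical stand-in for a viewer that assumes every file is UTF-8."""
    # errors="replace" swaps every undecodable byte for U+FFFD,
    # the replacement character that renders as a question-mark diamond.
    return raw.decode("utf-8", errors="replace")


# Assumption: the file on disk is ISO-8859-1, where the pound sign
# (U+00A3) is stored as the single byte 0xA3.
pound_bytes = "\u00a3".encode("iso-8859-1")
print(pound_bytes)  # b'\xa3'

# 0xA3 on its own is not a valid UTF-8 sequence, so it gets replaced.
print(ascii(decode_like_diff_view(pound_bytes)))  # '\ufffd'

# Sometimes the ISO-8859-1 bytes happen to form valid UTF-8.
# The two characters U+00C3 U+00A9 are stored as 0xC3 0xA9, which
# UTF-8 reads as the single character U+00E9 (e with an acute accent).
lookalike_bytes = "\u00c3\u00a9".encode("iso-8859-1")
print(ascii(decode_like_diff_view(lookalike_bytes)))  # '\xe9'
```

The first case is the � I was chasing around Eclipse; the second is the "special characters turning into other special characters" flavour of the same problem.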

However, if you look at the raw file - it gets displayed as is, because you’re literally looking at the raw file; they fetch it byte by byte and echo it to the browser, no fancy encoding stuff happening there.

Apparently, there is a plan to implement a feature where they will read the encoding from the .gitattributes file, but it’s still in the works. In the meantime the 2014 ticket is still open and grumpy comments under it keep accumulating.

So, to sum up: BitBucket diff mode - confusing; displaying a raw file - pretty accurate; BitBucket service desk - ugh.