IP Geolocation and CIDR Range Parsing in C#

By ultracpy, January 26, 2018

Introduction

.NET provides the IPAddress class for working with IP addresses. However, as of .NET 4.6, it doesn’t provide any built-in facilities for doing IP geolocation, i.e., determining the geographic location of an IP address (e.g., continent, country, city, or latitude/longitude). IP geolocation requires a database that is updated periodically since IP address assignments change over time. There are several regularly maintained commercial and free IP geolocation databases (e.g., GeoLite2, DB-IP), and they’re relatively easy to use from .NET.

GeoLite2 Country Data

For this article, we’ll look at a way to quickly load the free GeoLite2 Country geolocation data. GeoLite2 has its own API for reading the data, but its CSV files are simple to parse without requiring another assembly. The IPv4 blocks are in GeoLite2-Country-Blocks-IPv4.csv, and the IPv6 blocks are in GeoLite2-Country-Blocks-IPv6.csv. Both files have a structure like:

network,geoname_id,registered_country_geoname_id,represented_country_geoname_id,...
1.0.0.0/24,2077456,2077456,,0,0
1.0.1.0/24,1814991,1814991,,0,0
1.0.2.0/23,1814991,1814991,,0,0
1.0.4.0/22,2077456,2077456,,0,0

76:96:42:219::/64,6252001,,,0,0
600:8801:9400:580::/128,6252001,,,0,0
2001:200::/49,1861060,,,0,0
2001:200:120::/49,1861060,,,0,0

The network field specifies an IP address block in CIDR notation (Classless Inter-Domain Routing notation), which gives an IP address and a number of significant bits for the subnet mask. That is used to calculate start and end addresses for the network or subnet. .NET doesn’t include any functionality to parse CIDR notation, so we’ll have to handle that. The geoname_id values are integers that reference country information in GeoLite2-Country-Locations-en.csv, which has a structure like:

geoname_id,locale_code,continent_code,continent_name,country_iso_code,country_name
49518,en,AF,Africa,RW,Rwanda
51537,en,AF,Africa,SO,Somalia
69543,en,AS,Asia,YE,Yemen
99237,en,AS,Asia,IQ,Iraq

The two GeoLite2-Country-Blocks-IPv?.csv files don’t contain any fields that need to be quoted, so they can be parsed using StreamReader.ReadLine and calling string.Split on every line. However, the GeoLite2-Country-Locations-en.csv file does contain field values with embedded commas (e.g., “Bonaire, Sint Eustatius, and Saba”), so it has to be parsed using something that understands the CSV format. The easiest thing to use is .NET’s TextFieldParser. While that is declared in Microsoft.VisualBasic.dll, it can be used from C# just fine if you add a reference to that assembly.

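using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser, FieldType

// Loads the locations file into a dictionary keyed by geoname_id.
// Each value holds all of the fields from that row.
private static Dictionary<int, string[]> LoadLocations(string fileName)
{
	Dictionary<int, string[]> result = new Dictionary<int, string[]>();

	using (var reader = new TextFieldParser(Path.GetFullPath(fileName)))
	{
		// Comma-delimited fields; quoted fields may contain embedded commas.
		reader.TextFieldType = FieldType.Delimited;
		reader.Delimiters = new[] { "," };

		while (!reader.EndOfData)
		{
			string[] fields = reader.ReadFields();
			int geoNameId;
			// The header row fails TryParse, so it is skipped.
			if (int.TryParse(fields[0], out geoNameId))
			{
				result[geoNameId] = fields;
			}
		}
	}

	return result;
}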
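
The blocks files can be read with the simpler ReadLine and Split approach described above. The following is only a minimal sketch, not the article’s sample code: the BlockFileSketch class and ReadBlocks method are hypothetical names, and it makes the simplifying assumption that rows with an empty geoname_id can be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BlockFileSketch
{
    // Hypothetical helper: yields (network, geoname_id) pairs from a blocks file.
    // It takes a TextReader so it works with a StreamReader or a StringReader.
    public static IEnumerable<KeyValuePair<string, int>> ReadBlocks(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // No field in the blocks files is quoted, so a plain Split is enough.
            string[] fields = line.Split(',');
            int geoNameId;

            // The header row and rows with an empty geoname_id fail TryParse
            // and are skipped (a simplification made for this sketch).
            if (fields.Length >= 2 && int.TryParse(fields[1], out geoNameId))
            {
                yield return new KeyValuePair<string, int>(fields[0], geoNameId);
            }
        }
    }
}
```

Each returned key is still a CIDR string such as 1.0.0.0/24, so it has to be parsed into a start and end address before it can be searched.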

Parsing CIDR Blocks

After loading the blocks files, we need to parse the CIDR notation and calculate the start and end addresses. .NET’s IPAddress class can parse the address, but it won’t parse the significant bits or calculate the start and end addresses. Parsing the CIDR notation is relatively simple because we just need to split at the ‘/’ character. Calculating the start and end addresses is more involved. For a 32-bit IPv4 address converted to a uint named address (built from the big-endian bytes that IPAddress.GetAddressBytes returns), the logic looks like this:

// This needs to work with routingBitCount between 0 and 32
// where 0's mask is 0, and 32's mask is 0xFFFFFFFF.
const byte BitSize = 32;
uint mask = routingBitCount == 0 ? 0 :
    unchecked(~(((uint)1 << (BitSize - routingBitCount)) - 1));
uint start = address & mask;
uint end = start | ~mask;
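
To show the mask logic end to end, here is a minimal sketch that splits a CIDR string, converts the address bytes to a uint, and applies the calculation above. The Cidr4Sketch class and its method names are hypothetical; the article’s CidrBlock class may be structured differently.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class Cidr4Sketch
{
    // Converts an IPv4 address to a uint whose most significant byte is the first octet.
    public static uint ToUInt32(IPAddress ip)
    {
        if (ip.AddressFamily != AddressFamily.InterNetwork)
        {
            throw new ArgumentException("Expected an IPv4 address.", "ip");
        }

        byte[] b = ip.GetAddressBytes(); // 4 bytes in network (big-endian) order
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | (uint)b[3];
    }

    // Parses "a.b.c.d/n" and returns the first and last address in the block.
    public static bool TryGetRange(string cidr, out uint start, out uint end)
    {
        start = 0;
        end = 0;

        string[] parts = cidr.Split('/');
        IPAddress ip;
        byte routingBitCount;
        if (parts.Length != 2
            || !IPAddress.TryParse(parts[0], out ip)
            || ip.AddressFamily != AddressFamily.InterNetwork
            || !byte.TryParse(parts[1], out routingBitCount)
            || routingBitCount > 32)
        {
            return false;
        }

        uint address = ToUInt32(ip);
        // A shift count of 32 would wrap to 0 for a uint, so 0 bits is special-cased.
        uint mask = routingBitCount == 0 ? 0u : ~((1u << (32 - routingBitCount)) - 1);
        start = address & mask;
        end = start | ~mask;
        return true;
    }
}
```

For 1.0.2.0/23, the mask is 0xFFFFFE00, so the range runs from 0x01000200 (1.0.2.0) to 0x010003FF (1.0.3.255).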

Similar logic would work for a 128-bit IPv6 address. However, .NET doesn’t include a simple 128-bit unsigned integer type. It includes BigInteger for arbitrary precision integers, but BigInteger is relatively slow compared to fixed-size integers. We can do better than BigInteger by splitting the logic to work with two 64-bit unsigned integers. That logic looks like:

// This needs to work with routingBitCount between 0 and 128
// where 0's mask is 0, and 128's mask is 0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF.
ulong upperMask, lowerMask;
const byte BitSize = 128;
const byte HalfBitSize = BitSize / 2;
if (routingBitCount == 0)
{
	upperMask = 0ul;
	lowerMask = 0ul;
}
else if (routingBitCount <= HalfBitSize)
{
	upperMask = unchecked(~(((ulong)1 << (HalfBitSize - routingBitCount)) - 1));
	lowerMask = ulong.MinValue;
}
else
{
	upperMask = ulong.MaxValue;
	lowerMask = unchecked(~(((ulong)1 << (BitSize - routingBitCount)) - 1));
}

ulong startUpper = upper & upperMask;
ulong startLower = lower & lowerMask;
ulong endUpper = startUpper | ~upperMask;
ulong endLower = startLower | ~lowerMask;
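
The upper and lower values used above have to come from the 16 address bytes. The following sketch shows one way to split them, assuming the bytes are read in network (big-endian) order; the Cidr6Sketch class and its method names are hypothetical and are not taken from the article’s sample code.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class Cidr6Sketch
{
    // Reads eight big-endian bytes starting at offset into a ulong.
    private static ulong ReadUInt64(byte[] bytes, int offset)
    {
        ulong value = 0;
        for (int i = 0; i < 8; i++)
        {
            value = (value << 8) | (ulong)bytes[offset + i];
        }

        return value;
    }

    // Splits an IPv6 address into its most and least significant 64 bits.
    public static void Split(IPAddress ip, out ulong upper, out ulong lower)
    {
        if (ip.AddressFamily != AddressFamily.InterNetworkV6)
        {
            throw new ArgumentException("Expected an IPv6 address.", "ip");
        }

        byte[] bytes = ip.GetAddressBytes(); // 16 bytes in network order
        upper = ReadUInt64(bytes, 0);
        lower = ReadUInt64(bytes, 8);
    }
}
```

As a worked example, 2001:200::/49 splits into upper = 0x2001020000000000 and lower = 0. With 49 routing bits, upperMask is 0xFFFFFFFFFFFF8000 and lowerMask is 0, so the block ends at endUpper = 0x2001020000007FFF and endLower = 0xFFFFFFFFFFFFFFFF.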

We can encapsulate this logic for splitting CIDR notation and calculating address ranges into a CidrBlock class with an API like this:

public Address NetworkAddress { get; }
public byte RoutingBitCount { get; }
public Address StartAddress { get; }
public Address EndAddress { get; }
public static bool TryParse(string ipAddressAndBits, out CidrBlock value)

And we can create a comparable, equatable hierarchy of Address types like this:

abstract class Address : IEquatable<Address>, IComparable<Address>
sealed class V4Address : Address, IEquatable<V4Address>, IComparable<V4Address>
sealed class V6Address : Address, IEquatable<V6Address>, IComparable<V6Address>

Because they implement IComparable<T> (unlike .NET’s IPAddress class), we can use Address instances with ordered collections (e.g., a sorted List<T>’s BinarySearch). The sample AddressRangeMap<TAddress, TValue> class does this to allow searching for the first range that includes a specified Address. AddressRangeMap’s API is:

public AddressRangeMap()
public AddressRangeMap(int capacity)
public void Add(TAddress start, TAddress end, TValue value)
public void SetReadOnly()
public bool TryGetValue(TAddress address, out TValue value)

To use AddressRangeMap, you add address ranges and their associated values (e.g., geolocation information), and then call SetReadOnly so it can sort the data. Then you can call TryGetValue to look up the value associated with the address range that a specified Address falls into. Because AddressRangeMap works with any Address-derived type, you can make maps for just IPv4 addresses, just IPv6 addresses, or a mix of both depending upon your caching needs. The included sample code shows all three variations. AddressRangeMap instances are also thread-safe after SetReadOnly is called, so multiple threads can use shared instances of cached geolocation data.
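
The add, sort, then search pattern described above can be sketched as follows. This is not the article’s AddressRangeMap; RangeMapSketch is a hypothetical, simplified stand-in that assumes the ranges do not overlap and that works with any IComparable<T> key, such as a uint IPv4 address.

```csharp
using System;
using System.Collections.Generic;

sealed class RangeMapSketch<TAddress, TValue> where TAddress : IComparable<TAddress>
{
    private struct Entry
    {
        public TAddress Start;
        public TAddress End;
        public TValue Value;
    }

    private readonly List<Entry> entries = new List<Entry>();
    private bool isReadOnly;

    public void Add(TAddress start, TAddress end, TValue value)
    {
        if (isReadOnly) throw new InvalidOperationException("The map is read-only.");
        entries.Add(new Entry { Start = start, End = end, Value = value });
    }

    // Sorts once by start address; after this, lookups only read the list.
    public void SetReadOnly()
    {
        entries.Sort((x, y) => x.Start.CompareTo(y.Start));
        isReadOnly = true;
    }

    public bool TryGetValue(TAddress address, out TValue value)
    {
        if (!isReadOnly) throw new InvalidOperationException("Call SetReadOnly first.");

        // Binary search for the last entry whose Start is <= address.
        int low = 0, high = entries.Count - 1, found = -1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;
            if (entries[mid].Start.CompareTo(address) <= 0)
            {
                found = mid;
                low = mid + 1;
            }
            else
            {
                high = mid - 1;
            }
        }

        // With non-overlapping ranges, only that entry can contain the address.
        if (found >= 0 && address.CompareTo(entries[found].End) <= 0)
        {
            value = entries[found].Value;
            return true;
        }

        value = default(TValue);
        return false;
    }
}
```

Sorting once in SetReadOnly keeps Add cheap while loading, and each lookup then costs O(log n) comparisons.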

Points of Interest

Address (and its descendants), CidrBlock, and AddressRangeMap are reusable and not tied to the GeoLite2 data in any way. The GeoLite2 data was just used as an example for this article and is only referenced by the Program sample class. For legal and size reasons, the article’s sample code does not include the full GeoLite2 country database. It only contains a few sample records from each file for illustration purposes. To run all of the included sample code (e.g., to test address lookups in the cached block maps), you should download the GeoLite2 country database and replace the sample .csv files.

You can also use the included classes to parse the DB-IP geolocation data if desired. For example, its dbip-country.csv file has the structure below, so you could easily cache that geolocation information using Address.Parse and AddressRangeMap.

"0.0.0.0","0.255.255.255","US"
"1.0.0.0","1.0.0.255","AU"
"1.0.1.0","1.0.3.255","CN"
"1.0.4.0","1.0.7.255","AU"
"1.0.8.0","1.0.15.255","CN"

The sample code also includes the third-party System.Net.IPNetwork.dll assembly, which contains the IPNetwork class. It is only used for validation purposes and only if you change the sample Program class’s Validate bool member to true. The IPNetwork class uses BigInteger internally for its 128-bit number calculations. By toggling Validate on and off, you can see the speed improvement we gained by using two 64-bit unsigned integers instead.

Source: https://www.codeproject.com/Articles/1082101/IP-Geolocation-and-CIDR-Range-Parsing-in-Csharp
