### The geometry of a colorful Venn Plot

The following steps are needed to build a colorful Venn Plot. I will first identify the overlap of two circles followed by the steps needed to construct the 4-item plot.

#### Identifying the overlap between two circles of different sizes

Coloring the intersection of two circles requires identifying the number of degrees of arc that the intersection subtends on each circle. We know the distance between the two centers as well as the radius of each circle. In the following graph the radii are identified by $r_1$ and $r_2$, and the distance $d$ between the centers is the sum of $d_1$ and $d_2$. The goal is to solve for the angles $\theta_1$ and $\theta_2$.

[Figure: Calculating the angles of arc for intersecting circles of different sizes]

Since the chord is perpendicular to the line connecting the centers, we have two right triangles sharing a common vertical axis of length A. From this we know

$$r_1^2 - d_1^2 = A^2 = r_2^2 - d_2^2$$

$$r_1^2 - r_2^2 = d_1^2 - d_2^2$$

We know that $d_2 = d - d_1$, so we substitute for $d_2$ to get

$$r_1^2 - r_2^2 = d_1^2 - (d - d_1)^2 = 2 d \, d_1 - d^2$$

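which becomes

$$\frac{r_1^2 - r_2^2 + d^2}{2 d} = d_1$$

With $d_1$ known, $d_2 = d - d_1$, and the angles follow from the two right triangles:

$$\theta_1 = \operatorname{acos}(d_1 / r_1) \quad \text{and} \quad \theta_2 = \operatorname{acos}(d_2 / r_2)$$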

The intersection subtends the arcs from $-\theta_1$ to $\theta_1$ for the red circle and from $-\theta_2$ to $\theta_2$ for the yellow circle.

When the two radii are equal, $d_1 = d_2 = d/2$ and we get the special case $\theta_1 = \theta_2 = \operatorname{acos}(d / (2 r_1))$.
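The derivation above can be checked numerically. The following is a minimal sketch in Python (not the R code from the package); the function name `overlap_half_angles` and the error check for circles that do not cross are my own assumptions for illustration.

```python
import math

def overlap_half_angles(r1, r2, d):
    """Half-angles (theta1, theta2), in radians, of the arcs cut off by the
    common chord of two circles with radii r1, r2 and center distance d."""
    # The circles cross at two points only if |r1 - r2| < d < r1 + r2.
    if not (abs(r1 - r2) < d < r1 + r2):
        raise ValueError("circles do not cross at two points")
    # From r1^2 - r2^2 = 2*d*d1 - d^2, solved for d1.
    d1 = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    d2 = d - d1
    return math.acos(d1 / r1), math.acos(d2 / r2)

# Example built from two right triangles (5, 12, 13) and (9, 12, 15)
# that share the leg A = 12, so d = 5 + 9 = 14.
theta1, theta2 = overlap_half_angles(13, 15, 14)
print(math.degrees(theta1), math.degrees(theta2))
```

The intersection is then drawn from `-theta1` to `theta1` on the first circle and from `-theta2` to `theta2` on the second, each measured from the line joining the centers.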

#### Determining the overlap for 4 items

Displaying the overlap of 4 items requires geometric forms with more degrees of freedom than circles. I chose to use four identical ellipses with different orientations and foci.

[Figure: Example of a colorful Venn plot for 4 items. Colors are individually specified for each region.]

As in the 2-item and 3-item Venn plots, determining the ellipse intersection points is the first task. However, determining the regions to be colored involves several additional steps. The steps are as follows:

1. find the intersection points for all pairs of ellipses
2. determine the intersections defining the perimeter for each region
3. order these corner points for each region
4. determine the arcs making up the perimeter of each region

1. The general solution for the intersection points of two ellipses requires solving a fourth-degree (quartic) equation. While analytic solutions are available, they are cumbersome (see, however, my post on a closed-form solution for the intersections of two ellipses). Some intersections are analytically tractable using symmetry arguments. This is true for ellipses A and B above, as they are offset by a known amount, and the intersections for A and D will be on a 45-degree line between the ellipses. I chose to use a mix of analytic solutions and numerical optimization.
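As an illustration of the numerical route, here is a minimal Python sketch that walks around one ellipse, evaluates the implicit equation of the other, and bisects wherever the sign changes. This is not the package's R code. The tuple layout `(cx, cy, a, b, phi)` and the function names are assumptions made for this example, and simple bisection stands in for whatever optimizer is actually used. Tangent contacts, where the sign does not change, are not detected.

```python
import math

def implicit(e, x, y):
    """Negative inside ellipse e, zero on its border, positive outside.
    e = (cx, cy, a, b, phi): center, semi-axes, rotation in radians."""
    cx, cy, a, b, phi = e
    dx, dy = x - cx, y - cy
    u = dx * math.cos(phi) + dy * math.sin(phi)
    v = -dx * math.sin(phi) + dy * math.cos(phi)
    return (u / a) ** 2 + (v / b) ** 2 - 1.0

def point_at(e, t):
    """Point on the border of ellipse e at parameter t."""
    cx, cy, a, b, phi = e
    u, v = a * math.cos(t), b * math.sin(t)
    return (cx + u * math.cos(phi) - v * math.sin(phi),
            cy + u * math.sin(phi) + v * math.cos(phi))

def intersections(e1, e2, samples=720, iters=60):
    """Crossing points of the borders of e1 and e2 (at most four)."""
    def g(t):
        return implicit(e2, *point_at(e1, t))
    step = 2 * math.pi / samples
    points = []
    for i in range(samples):
        lo, hi = i * step, (i + 1) * step
        g_lo, g_hi = g(lo), g(hi)
        if g_lo == 0.0:                 # sample landed exactly on a crossing
            points.append(point_at(e1, lo))
            continue
        if g_lo * g_hi >= 0:            # no sign change in this interval
            continue
        for _ in range(iters):          # bisection on the parameter t
            mid = 0.5 * (lo + hi)
            g_mid = g(mid)
            if (g_mid > 0) == (g_lo > 0):
                lo, g_lo = mid, g_mid
            else:
                hi = mid
        points.append(point_at(e1, 0.5 * (lo + hi)))
    return points

# Two identical ellipses at right angles cross on the 45-degree lines.
A = (0.0, 0.0, 2.0, 1.0, 0.0)
B = (0.0, 0.0, 2.0, 1.0, math.pi / 2)
for x, y in intersections(A, B):
    print(round(x, 6), round(y, 6))
```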

2. For each intersection (corner) point, calculate whether it is on, inside, or outside each ellipse. Usually the corner point lies on only two borders. In that case the point abuts four regions (or eight regions if the point is on three borders). Using the four combinations for the two borders, along with the inside-versus-outside indications for the other ellipses, associate each point with its adjacent regions. In general, then, each point will be represented by four entries, one for each adjacent region.
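The bookkeeping in this step might look like the following Python sketch. It is an illustration under my own assumptions, not the package's R code: ellipses are stored as `(cx, cy, a, b, phi)` tuples in a dictionary keyed by name, a region is represented by the set of ellipse names that contain it, and `tol` is an arbitrary tolerance for deciding that a point is "on" a border.

```python
import math
from itertools import product

def implicit(e, x, y):
    """Negative inside ellipse e, zero on its border, positive outside."""
    cx, cy, a, b, phi = e
    dx, dy = x - cx, y - cy
    u = dx * math.cos(phi) + dy * math.sin(phi)
    v = -dx * math.sin(phi) + dy * math.cos(phi)
    return (u / a) ** 2 + (v / b) ** 2 - 1.0

def adjacent_regions(point, ellipses, tol=1e-9):
    """Regions that touch a corner point, each given as a frozenset of the
    names of the ellipses that contain the region."""
    on, inside = [], set()
    for name, e in ellipses.items():
        f = implicit(e, *point)
        if abs(f) <= tol:
            on.append(name)         # the point lies on this border
        elif f < 0:
            inside.add(name)        # the point is strictly inside
    regions = []
    # Every in/out combination for the borders the point lies on:
    # 4 combinations for two borders, 8 for three.
    for combo in product([False, True], repeat=len(on)):
        chosen = {name for name, flag in zip(on, combo) if flag}
        regions.append(frozenset(inside | chosen))
    return regions

# Circles written as ellipses: A and B cross at (0.5, sqrt(3)/2),
# and the larger C contains that point.
ellipses = {
    "A": (0.0, 0.0, 1.0, 1.0, 0.0),
    "B": (1.0, 0.0, 1.0, 1.0, 0.0),
    "C": (0.5, 0.5, 2.0, 2.0, 0.0),
}
corner = (0.5, math.sqrt(3) / 2)
for region in adjacent_regions(corner, ellipses):
    print(sorted(region))
```

In this sketch an empty set would stand for the area outside every ellipse, which a plotting routine would normally skip.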

3. Group the points by adjacent region. Each point is now represented in several regions.

For each region compute the centroid of the corner points. If a region has two or more disconnected sections, pick one to represent the region and keep only the points that are corner points for that section.

Within each region order the points in a clockwise (or counter-clockwise) direction relative to the centroid for that region.
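Ordering the corner points around the centroid can be done by sorting on the polar angle. The Python sketch below is illustrative only (the function name `order_corners` is hypothetical), and it assumes the region is convex, or at least that every corner is visible from the centroid, so that the angular order matches the order along the perimeter.

```python
import math

def order_corners(points, clockwise=False):
    """Sort (x, y) corner points by angle around their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # atan2 increases counter-clockwise, from -pi to pi.
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return ordered[::-1] if clockwise else ordered

corners = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
print(order_corners(corners))
# → [(-1, -1), (1, -1), (1, 1), (-1, 1)]
```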

4.  The first arc on the region boundary is on the ellipse that is common to the first two corner points. Continue around the region perimeter using the ellipse that is common to adjacent corner points, moving to a new ellipse after each corner point.
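A sketch of this walk around the perimeter, in Python and under my own assumptions about the data layout: each ordered corner is a pair of a point and the set of ellipse names whose borders pass through it. The function name `boundary_arcs` is hypothetical. The sketch only names the ellipse that carries each arc; choosing which of the two arcs between the corner points to draw is left out.

```python
def boundary_arcs(corners):
    """corners: ordered list of (point, set of ellipse names through it).
    Returns a list of (start_point, end_point, ellipse_name), one per arc."""
    arcs = []
    previous = None
    n = len(corners)
    for i in range(n):
        p, on_p = corners[i]
        q, on_q = corners[(i + 1) % n]      # wrap around to close the loop
        shared = on_p & on_q
        if not shared:
            raise ValueError("adjacent corners share no ellipse")
        # Move to a new ellipse after each corner when there is a choice,
        # for example in a two-corner lens where both corners lie on A and B.
        candidates = (shared - {previous}) or shared
        name = sorted(candidates)[0]
        arcs.append((p, q, name))
        previous = name
    return arcs

# A three-sided region with corners on the pairs A/B, B/C and A/C.
triangle = [
    ((0.0, 1.0), {"A", "B"}),
    ((1.0, 0.0), {"B", "C"}),
    ((-1.0, 0.0), {"A", "C"}),
]
for start, end, name in boundary_arcs(triangle):
    print(start, "->", end, "along", name)
```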

Repeat the step for each region and color the enclosed area to create a colorful Venn plot.

The method above is best implemented in a layered fashion, coloring each ellipse, then the intersection of each pair of ellipses, then the intersection for each set of 3 ellipses, etc. The reasoning is that the intersection of convex figures such as ellipses is also convex. Knowing that the intersection is convex makes it easier to determine the boundaries and make the distinction between the interior versus exterior points making up the overlap.

Routines for plotting 2-, 3- and 4-item Venn diagrams with customizable colors for each intersection are available in the R package, colorfulVennPlot at

http://cran.r-project.org/web/packages/colorfulVennPlot/index.html

The package also contains the R source code.