Wednesday, June 14, 2017

Finding closest pair of points in two disjoint sets of points

The finding nearest pair of points in two disjoint sets of points (or Polygons made of points) can be found in O(nlogn) computational complexity (with O(n) memory complexity).

With ordinary tree or graph search, solving this is not possible because we do not know the starting and ending points in advance. The k-d binary search tree can used to solve this problem as it can be used to navigate inside the two dimensional space rather easily. 

This is infact a last resort to solve this problem if we do not make any assumptions about the distribution of the points. If we can make some assumptions about the points, e.g. if they are not nested, lie on either side of one set etc, then the problem becomes more simpler and can be solved with relative less complexity. 

K – Dimensional binary tree divides the dimensional space with each of its node with its X or Y (..) coordinate so while navigating it one doesn’t has to navigate through all of its nodes.

This is an example of the two sets of points:





















#include <iostream>

using namespace std;

int const k=2; // the number of dimensions
double min_distance = 10000; // set a large default value, in this example all distance will be shorter than this. 

double distance(int arr[], int arr2[])
{
 return sqrt(pow(arr2[0] - arr[0], 2) + pow(arr2[1] - arr[1], 2));
 
}

struct Node {
 int point[k];
 Node *left, *right;
 Node()
 {
  left = right = NULL;  

 }
};

// A method to create a node of K D tree
struct Node* newNode(int arr[])
{
 struct Node* temp = new Node;

 for (int i = 0; i<k; i++)
  temp->point[i] = arr[i];

 return temp;
}

Node * insertNode(Node * node, int arr[], int d)
{
 if (node == NULL)
  return newNode(arr);

 int dim = d%k;


 if (node->point[dim] > arr[dim])
 {
  

  node->left = insertNode(node->left, arr, dim + 1);
 }
 else
 {
 
  node->right = insertNode(node->right, arr, dim + 1);
 }

 return node;
}
Node * Nearest=NULL;

Node * FindnearestNode(Node * head1, int arr[], int d)
{
 // if empty tree, return
 if (head1 == NULL)
  return NULL;

 // check for each tree. 
   if (min_distance > distance(head1->point, arr))
 {
  min_distance = distance(head1->point, arr);
  Nearest = head1;
 }

 if (head1->left == NULL && head1->right == NULL)
  return head1;

 // findout current dimension, in this case it either x or y i.e. 0 or 1
 int dim = d%k;

 // navigate through the tree as if inserting to a new member (to remain to the nearest member in closeness). in the path for insert it will find the nearest member. 
 if (head1->right && head1->point[dim] < arr[dim])
  return FindnearestNode(head1->right, arr, d+1);
 else if(head1->left && head1->point[dim] > arr[dim] )
  return FindnearestNode(head1->left, arr, d+1);

 return Nearest;
}


int main()
{
 int const an = 10;
 int const bn = 10;

 int ax[an] = { 34,55,11,79,77,65,3,9,5,66 };
 int ay[an] = { 5, 6, 7, 9, 32,3,15,7,10,35 };

 int bx[bn] = { 5,35,4,41,32,64,41,54,87,3 };
 int by[bn] = { 23,33,17,15,32,22,33,23,21,32 };

  

 Node * head1=NULL;
 Node * head2 = NULL;


  
 double Final_Min_Distance = min_distance;

 // fill the K-D trees with the two dimensional data in two trees.
 for (int i = 0; i < an; i++)
 {
  int temp[k];
  temp[0] = ax[i];
  temp[1] = ay[i];

  head1=insertNode(head1, temp, 0);
  temp[0] = bx[i];
  temp[1] = by[i];

  head2=insertNode(head2, temp, 0);


 }
 Node * AnearB=NULL;

 Node * BnearA = NULL;



 min_distance = 1000;
   Final_Min_Distance = min_distance;
 for (int i = 0; i < an; i++)
 {
  int temp[k];
  temp[0] = bx[i];
  temp[1] = by[i];

  
  Node * Nearer2 = FindnearestNode(head1, temp, 0);
  if (Final_Min_Distance > min_distance)
  {
   BnearA = Nearer2;
   Final_Min_Distance = min_distance;
  }
  cout << " distance of B (" << temp[0] << "," << temp[1] << ") to nearest A (" << BnearA->point[0] << "," << BnearA->point[1] << ") distance:" << Final_Min_Distance << endl;
  min_distance = 1000;


 }
 cout << "Minimum Distance is " << Final_Min_Distance<<endl<<endl;


 min_distance = 1000;
 Final_Min_Distance = min_distance;
 for (int i = 0; i < an; i++)
 {
  int temp[k];
  temp[0] = ax[i];
  temp[1] = ay[i];


  Node * Nearer2 = FindnearestNode(head2, temp, 0);
  if (Final_Min_Distance > min_distance)
  {
   AnearB = Nearer2;
   Final_Min_Distance = min_distance;
  }
  cout << " distance of A (" << temp[0] << "," << temp[1] << ") to nearest B (" << AnearB->point[0] << "," << AnearB->point[1] << ") distance:" << Final_Min_Distance << endl;
  min_distance = 1000;


 }
 cout << "Minimum Distance is " << Final_Min_Distance;



 system("pause");

}

here is the result:




Computational Complexity to build the trees: O(n)
Memory Complexity to build the trees: O(n)
Computational Complexity to navigate through all members of set A or B: O(n)
Computational Complexity to search the tree to find the nearest neighbor points: O(logn)



So the final computational complexity would be : O(n)+O(n)+O(logn) = O(nlogn)

Saturday, September 08, 2012

SSAS : Save hours by automating the cube deployments!


Every now and then it is quite an overhead in large development environments to deploy the SSAS cubes at a large scale. Automating this process particularly helps when doing lots of changes in the SSDT (SQL Server data tools) in the development environment and when finished the changes then deploying the cubes in the UAT environment. Then after completion of next round of issues the cubes have to be deployed on the production server. With dozens of cubes, with each of it requiring deployment to two additional environments it is a considerable overhead to do it manually.

I have recently automated the deployment of a few dozen of cubes. There are several ways to do it, the below one is among the simpler ones. Basically it is about creating deployment profiles in SSDT and then invoking the SSDT via command line to build the cubes and use a SSAS utility (Microsoft.AnalysisServices.Deployment.exe )  to deploy the cubes.

The Microsoft.AnalysisServices.Deployment.exe utility can be found in this path:

Drive:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio

For creating the deployment profiles in SSDT, right click on the top line in the solution explorer, i.e. the solution and then Properties. Then click the “configuration manager” and then under ‘Active Solution configuration’ in the drop down, click “New” and give the configuration profile a name, e.g. SSASProject_Production. Now for the newly created profile, edit the deployment location in the properties page. It will change the deployment address only for that particular profile.

The data source connection properties have to be edited in the SSDT project file (found with the extension .dwproj , can be opened in notepad), it can be found inside the SSDT project folder. This file has a format like this and the location of the profile with data source connection can be found near to these kind tags around end of the file.

  <Name>SSAS Prod</Name>
      <Options>
        <ConnectionMappings>
          <ConfigurationSetting>

………   
All steps can be combined in a batch file.
----------------Batch file start ------------------------------------------------
 
ECHO ON
ECHO Build started . . .

REM  here is the location of your visual studio/SSDT exe, in this case its C, the project solution file location, the deployment profile name “SSAS Prod” here and the location of the build log file, it can be anywhere.

"C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\devenv.exe" "C:\***\****\***\***\******\SSAS_Project.sln" /build "SSAS Prod" /out "C:\ ***\****\***\***\******\\ssasbuild.log"

 ECHO Build completed . . .

 ECHO XMLA Script generation started 

REM now invoking the deployment utility. The first argument is location of the SSAS database file, can be fund inside bin folder of SSDT project with extension .asdat and the location of .xmla file which is to be generated here for deployment.

Microsoft.AnalysisServices.Deployment.exe "C:\ **\****\***\***\******\SSAS_Project.asdatabase" /d /o:"C:\ ***\****\***\***\******\SSAS_Deployment.xmla"


ECHO XMLA Script generation complete . . .

ECHO Deployment is starting now.......

REM Finally deploy the solution.

Microsoft.AnalysisServices.Deployment.exe "C:\ ***\****\***\***\******\SSAS_Project.asdatabase" /s:"C:\ ***\****\***\***\******\SSAS_Proejct\deployment.log"


 ECHO OFF

PAUSE Completed

----------------Batch file End ------------------------------------------------

 

1)      Create configuration profiles for each deployment environments

2)      Update the batch file with locations of each project or make one file for all of the projects/cube.

3)      See log for any deployment issues!

4)      process your cubes via SSIS or another method!

You are done!

Thursday, February 10, 2011

Junk Dimensions?!


Quite often SSAS/DWH designers face with the situation with several if not dozens of small what could be called small dimensions, e.g. Yes/No flags, status etc. To make each of them a separate dimension (say 20 different Yes/No flags dimensions) would simply clutter the data mart and eventually the SSAS cube. The convenient way in my opinion is to rather combine all of them in one dimension with all possible combinations (Cartesian product) from of them from Fact table in the combined dimensions.

So it could look like this:  

DimFlags(FlagID,Status1Flag, Status2Flag, Status3Flag, … etc)
 
 
 
 

Wednesday, September 03, 2008

Minimum sliding Window Problem can be solved in O(n) rather easily.

Several solutions to the minumum sliding window Problem actually create complexity  but in my opinion it can solved with rather ease while still maintaining the time and Memory complexity at O(n).

The idea is to basically create a temporary sum and Keep adding the new elements and subtracting the old elements as we scan the Array while keeping record of the last biggest sum window by storing its index.

So we need to scan the Array only once and without any additional array.


# include <iostream>
using namespace::std;

const int datalentgh=16;
int data[datalentgh]= {2, 3, 10, 12, 6, 2, 5, 1,9,8,1,5,3,8,9,9};

void main()
{
for(int i=0;i<datalentgh;i++)
cout<<data[i]<<" ";
// we create the variables for temporary sum, temporary starting index and for final starting index of the window and its sum.
int sum, tsum,start,tstart, k;
sum=tsum=start=tstart=0;
k=3;
for(int i=0; i < k ; i++)
sum+=data[i];
tsum=sum;
for(int i=k; i < datalentgh ; i++)
{
tsum= tsum - data[i-k];
tsum= tsum + data[i];
tstart=i-k;
if(tsum>sum)
{
sum=tsum;
start=tstart;
}



}

cout<<endl<<endl<<" Starting Index of Minimum Window is "<<start+1<<" and the number at that index is "<<data[start+1] <<" and sum is "<<sum<<endl;

system("pause");
}
 

Sunday, April 01, 2007

Addition without using any arthmetic operator

Often is this puzzle brought up to test someone's 'bits' knowledge but it is rather straight forward to add two numbers (whole numbers).

It works by same principle as we do it by Hand, i.e. add two numbers, Keep the carry and add it into the sum. In programming we can do it achieve the simple 'addition' via XOR operator i.e. ^ and the carry can be obtained by AND Operation on both numbers and giving them one left shift. While doing it via recurstivity it may can take more than one calls to the function .


int Addition(float a, float b)
{
if(a==0) return b; // check if first Digit is Zero, if yes then we are done
if(b==0) return a; // Same as above, check if second Digit is Zero we are done.
gcounter++; // a global Counter to see how many times was the Additiona function called.
float sum = a^b;
float carry = (a & b) <<1;

cout<<"SUM : "<<sum<<endl;
cout<<"carry : "<<carry<<endl;
return Addition(sum, carry);
}
The time and space complexity would approximately be log(n).