Quaternion

De WikiFractal
Révision datée du 15 juin 2019 à 10:15 par Ren39 (discussion | contributions) (Forme polaire)
(diff) ← Version précédente | Voir la version actuelle (diff) | Version suivante → (diff)
Aller à : navigation, rechercher



Introduction

Dans un premier temps, je poserais les bases des quaternions. Dans un deuxième temps, j’expliquerais les opération arithmétiques. Puis je présenterais une implémentation en VB.NET et en C/CUDA. Après nous travaillerons sur les fractals

Quaternions, les Bases

En mathématiques, un quaternion est un nombre dans un sens généralisé. Les quaternions englobent les nombres réels et complexes dans un système de nombres où la multiplication n’est plus une loi commutative. Les quaternions sont ainsi le premier exemple de nombres hypercomplexes.

Forme générale

L’ensemble des quaternions [math]\mathbb{H}[/math] peut être décrit comme l’algèbre associative unifère sur le corps des nombres réels [math]\mathbb{R}[/math] engendrée par trois éléments [math]i[/math], [math]j[/math] et [math]k[/math] satisfaisant les relations quaternioniques [math]i^2 = j^2 = k^2 = ijk = -1[/math].

Concrètement, tout quaternion [math]q[/math] s’écrit de manière unique sous la forme [math]q = w + xi + yj + zk[/math][math]w[/math], [math]x[/math], [math]y[/math], et [math]z[/math] sont des nombres réels et [math]i[/math], [math]j[/math] et [math]k[/math] sont trois symboles.

Les quaternions s’ajoutent et se multiplient comme d’autres nombres (associativité de la multiplication et de l’addition, distributivité de la multiplication sur l’addition, etc), en prenant garde à ne pas s’autoriser de changer l’ordre des facteurs dans un produit (la multiplication n’est pas commutative), sauf pour un facteur réel. Lorsque des produits des symboles [math]i[/math], [math]j[/math] et [math]k[/math] sont rencontrés, ils sont remplacés par leurs valeurs :

[math]\begin{aligned} {4} i^2 &= -1, &\qquad ij & = k, & \qquad ji & = -k, \\ j^2 &= -1, &\qquad jk & = i, & kj & = -i, \\ k^2 &= -1, &\qquad ki & = j, & ik & = -j, \end{aligned}[/math]

La formule [math]i^2 = j^2 = k^2 = ijk = -1[/math] condense toutes ces relations.

Quaternions, les opérations arithmétiques

Définition

On définit un quaternion tel que : [math]\begin{aligned} q=w+xi+yj+zk \label{DefQ}\end{aligned}[/math] Avec comme partie Réel : [math]\begin{aligned} Reel(q)=w\end{aligned}[/math] Avec comme partie Imaginaire : [math]\begin{aligned} Img(q)=xi+yj+zk\end{aligned}[/math]

Forme algébrique et polaire

On peut définir un quaternion sois de manière algébrique comme vue précédemment, mais aussi sous forme polaire. Comme les nombre complexe d’ailleurs. Afin de faciliter la compréhension, je vais faire un rappel sur les nombres complexes.

Rappel sur les nombre complexe

Soit un nombre complexe [math]z[/math] sous la forme algébrique :

[math]\begin{aligned} z=x+yi\end{aligned}[/math]

Soit un nombre complexe [math]z[/math] sous la forme polaire :

[math]\begin{aligned} z=r*cos(\theta)+r*sin(\theta)i\end{aligned}[/math]

Avec comme formule pour passer de l’un à l’autre :

[math]\begin{aligned} r= \sqrt{x^2+y^2}\end{aligned}[/math]

[math]\begin{aligned} \theta= {\begin{cases}\arccos \left({\tfrac {a}{r}}\right)&{\text{si }}b\geq 0\\-\arccos \left({\tfrac {a}{r}}\right)&{\text{si }}b\lt 0,\end{cases}} \end{aligned}[/math]

[math]\begin{aligned} \theta={\begin{cases}\arcsin \left({\tfrac {b}{r}}\right)&{\text{si }}a\geq 0\\\pi -\arcsin \left({\tfrac {b}{r}}\right)&{\text{si }}a\lt 0\end{cases}}\end{aligned}[/math]

[math]\begin{aligned} \theta= {\begin{cases}\arctan \left({\tfrac {b}{a}}\right)&{\text{si }}a\gt 0\\\arctan \left({\tfrac {b}{a}}\right)+\pi &{\text{si }}a\lt 0.\end{cases}}\end{aligned}[/math]

[math]\begin{aligned} x=r*cos(\theta)\end{aligned}[/math]

[math]\begin{aligned} y=r*sin(\theta)\end{aligned}[/math]

Forme algébrique

La forme algébrique [DefQ] simplifie les additions et soustractions, car c’est une opération de terme à terme.

Forme polaire

La forme algébrique [RefQP] simplifie les multiplications et les divisions, car c’est une opération séparer entre l’argument [math]\theta[/math] et la rayon [math]r[/math]. [math]\begin{aligned} q=r*cos(\theta) + r*sin(\theta)(cos(\Phi_x)i + cos(\Phi_y)j + cos(\Phi_z)k) \label{RefQP}\end{aligned}[/math] Soit : [math]\begin{aligned} w =r*cos(\theta)\end{aligned}[/math] [math]\begin{aligned} x =r*sin(\theta)cos(\Phi_x)i\end{aligned}[/math] [math]\begin{aligned} y =r*sin(\theta)cos(\Phi_y)j\end{aligned}[/math] [math]\begin{aligned} z =r*sin(\theta)cos(\Phi_z)k\end{aligned}[/math]

L’algorithme suivant permet passer de la forme algébrique à ma forme polaire : [math]\begin{aligned} r=sqrt(x^2+y^2+z^2+w^2)\end{aligned}[/math] [math]\begin{aligned} \theta = acos\left( \frac{w}{r} \right) \end{aligned}[/math] [math]\begin{aligned} B = sqrt\left( r^2 - w^2\right) \end{aligned}[/math] [math]\begin{aligned} \Phi_x = acos\left( \frac{x}{B} \right) \end{aligned}[/math] [math]\begin{aligned} \Phi_y = acos\left( \frac{y}{B} \right) \end{aligned}[/math] [math]\begin{aligned} \Phi_z = acos\left( \frac{z}{B} \right) \end{aligned}[/math]

Addition et soustraction

Quaternions, l’implémentation

Implémentation en scilab-6.0.1

clear;
clf();

function out=RandN(n)
    out=rand()*2*n-n;
endfunction

q=struct('x','y','z','w');
qu=struct('x','y','z','w');
qout=struct('x','y','z','w');
q_out2=struct('x','y','z','w');

q.x=RandN(4)
q.y=RandN(4)
q.z=RandN(4)
q.w=RandN(4)


A=sqrt(q.x*q.x+q.y*q.y+q.z*q.z+q.w*q.w)

qu.x=q.x/A
qu.y=q.y/A
qu.z=q.z/A
qu.w=q.w/A

theta = acos(q.w/A)

B=sqrt(A*A-q.w*q.w)

phi_x=acos(q.x/B)
phi_y=acos(q.y/B)
phi_z=acos(q.z/B)

qout.w=A*cos(theta)
qout.x=A*sin(theta)*cos(phi_x)
qout.y=A*sin(theta)*cos(phi_y)
qout.z=A*sin(theta)*cos(phi_z)


disp(qout.w-q.w)
disp(qout.x-q.x)
disp(qout.y-q.y)
disp(qout.z-q.z)


function q=MultQ(q2,q1)
    q=struct('x','y','z','w')
    //q.x =  q1.x * q2.w + q1.y * q2.z - q1.z * q2.y + q1.w * q2.x;
    //q.y = -q1.x * q2.z + q1.y * q2.w + q1.z * q2.x + q1.w * q2.y;
   // q.z =  q1.x * q2.y - q1.y * q2.x + q1.z * q2.w + q1.w * q2.z;
   //q.w = -q1.x * q2.x - q1.y * q2.y - q1.z * q2.z + q1.w * q2.w;
    
    q.w = q1.w * q2.w - q1.x * q2.x - q1.y * q2.y - q1.z * q2.z;
    q.x = q1.w * q2.x + q2.w * q1.x + q1.y * q2.z - q1.z * q2.y;
    q.y = q1.w * q2.y + q2.w * q1.y - q1.x * q2.z + q1.z * q2.x;
    q.z = q1.w * q2.z + q2.w * q1.z + q1.x * q2.y - q1.y * q2.x;
endfunction



function q_out=PowQ(q_in,power) 
    q_out=q_in;
    
    if power <= 0 then
        q_out.x=0;
        q_out.y=0;
        q_out.z=0;
        q_out.w=0;
    else
        for i=1:(power-1)
            q_out=MultQ(q_out,q_in);
        end
    end
endfunction
Power=5
q_outP=PowQ(q,Power) 

q_out2.w=A^Power*cos(theta*Power)
q_out2.x=A^Power*sin(theta*Power)*cos(phi_x)
q_out2.y=A^Power*sin(theta*Power)*cos(phi_y)
q_out2.z=A^Power*sin(theta*Power)*cos(phi_z)
disp("*****************")
disp(q_outP.x-q_out2.x)
disp(q_outP.y-q_out2.y)
disp(q_outP.z-q_out2.z)
disp(q_outP.w-q_out2.w)

disp("*****************")
q_out2.w=exp(log(A)*Power)*cos(theta*Power)
q_out2.x=exp(log(A)*Power)*sin(theta*Power)*cos(phi_x)
q_out2.y=exp(log(A)*Power)*sin(theta*Power)*cos(phi_y)
q_out2.z=exp(log(A)*Power)*sin(theta*Power)*cos(phi_z)


disp(q_outP.x-q_out2.x)
disp(q_outP.y-q_out2.y)
disp(q_outP.z-q_out2.z)
disp(q_outP.w-q_out2.w)

Implémentation en VB.net

Public Class Quaternion
#Region "Constructor"
    Public Sub New(xAs Double, y As Double, z As Double, w As Double)
        m_x = x
        m_y = y
        m_z = z
        m_w = w
    End Sub
#End Region

#Region "Properties"
    Private m_x As Double
    Public Property x() As Double
        Get
            Return m_x
        End Get
        Set(ByVal value As Double)
            m_x = value
        End Set
    End Property

    Private m_y As Double
    Public Property y() As Double
        Get
            Return m_y
        End Get
        Set(ByVal value As Double)
            m_y = value
        End Set
    End Property

    Private m_z As Double
    Public Property z() As Double
        Get
            Return m_z
        End Get
        Set(ByVal value As Double)
            m_z = value
        End Set
    End Property

    Private m_w As Double
    Public Property w() As Double
        Get
            Return m_w
        End Get
        Set(ByVal value As Double)
            m_w = value
        End Set
    End Property
#End Region
#Region "Operation"
    Public Sub normalise()
        Dim n = Me.length
        Me.x /= n
        Me.y /= n
        Me.z /= n
        Me.w /= n
    End Sub

    Public Sub add(s As Double)
        Me.x += s
        Me.y += s
        Me.z += s
        Me.w += s
    End Sub

    Public Sub multiplication(x2 As Double, y2 As Double, z2 As Double, w2 As Double)
        Dim x1 = m_x
        Dim y1 = m_y
        Dim z1 = m_z
        Dim w1 = m_w

        Me.x = x1 * w2 + y1 * z2 - z1 * y2 + w1 * x2
        Me.y = -x1 * z2 + y1 * w2 + z1 * x2 + w1 * y2
        Me.z = x1 * y2 - y1 * x2 + z1 * w2 + w1 * z2
        Me.w = -x1 * x2 - y1 * y2 - z1 * z2 + w1 * w2
    End Sub
    Public Sub Power(n As Integer)
        Dim a1 = Me.x
        Dim a2 = Me.y
        Dim a3 = Me.z
        Dim a4 = Me.w



        For i As Integer = 1 To (n - 1)
            Me.multiplication(a1, a2, a3, a4)
        Next


    End Sub
    Public Sub Power(pow As Double)
        Dim a1 = Me.x
        Dim a2 = Me.y
        Dim a3 = Me.z
        Dim a4 = Me.w

        Dim A As Double = length()
        Dim coef As Double = 1.0
        If A > 1 Then
            coef = 1.0
        ElseIf A <> 0 Then
            coef = 1 / A
        End If


        Me.x = Me.x * coef
        Me.y = Me.y * coef
        Me.z = Me.z * coef
        Me.w = Me.w * coef

        A = length()


        Dim theta As Double = Math.Acos(Me.w)
        Dim B As Double = Math.Sqrt(A * A - Me.w * Me.w)
        Dim phi_x As Double = Math.Acos(Me.x / B)
        Dim phi_y As Double = Math.Acos(Me.y / B)
        Dim phi_z As Double = Math.Acos(Me.z / B)

        Me.x = Math.Exp(pow * Math.Log(A / coef)) * Math.Sin(theta * pow) * Math.Cos(phi_x)
        Me.y = Math.Exp(pow * Math.Log(A / coef)) * Math.Sin(theta * pow) * Math.Cos(phi_y)
        Me.z = Math.Exp(pow * Math.Log(A / coef)) * Math.Sin(theta * pow) * Math.Cos(phi_z)
        Me.w = Math.Exp(pow * Math.Log(A / coef)) * Math.Cos(theta * pow)

    End Sub


    Public Sub add(Q As Quaternion)
        Me.x += Q.x
        Me.y += Q.x
        Me.z += Q.x
        Me.w += Q.x
    End Sub
#End Region

#Region "Function"
    Public Function length() As Double
        Return Math.Sqrt(x * x + y * y + z * z + w * w)
    End Function

    Public Function print() As String
        Return "[" + Math.Round(Me.length, 2).ToString + "] i*" +
            Math.Round(x, 2).ToString + " + j*" +
            Math.Round(y, 2).ToString + " + k*" +
            Math.Round(z, 2).ToString + " + " +
            Math.Round(w, 2).ToString
    End Function

#End Region
End Class

Implémentation en C/CUDA

En C/Cuda, il n’y a a pas la possibilité de de faire des classes, comme plus haut en VB.NET. Donc dans un premier temps on définit une structure :

typedef struct  struct_Q {
    float x;
    float y;
    float z;
    float w;
} struct_Q_T;

Puis on définit la fonction Norme :

__device__  float  Get_QNorm_DEVICE_(struct_Q_T Q)
{
    return sqrtf(Q.x*Q.x + Q.y*Q.y + Q.z*Q.z + Q.w*Q.w);
}

Puis on définit la fonction Power :

__device__ void Get_QPow(struct_Q_T *Q, float pow)
{
    float A = Get_QNorm(Q);
    float theta = 0.0f;
    float B = 0.0f;
    float R = 0.0f;
    if (pow > 0.0f && A>0.000001f)
    {
        float coef = 1.0f;
        if (A<1.0f)
        {
            //printf("%f *******\n", A);
            coef = 1 / A;
            Q->x *= coef;
            Q->y *= coef;
            Q->z *= coef;
            Q->z *= coef;

        }
        A = Get_QNorm(Q);
        //printf("%f +++++++++\n", A);
        theta = acosf(Q->w / A)*pow;
        B = sqrt(A*A - Q->w*Q->w);
        R = exp2f(logf(A / coef)* pow);
        Q->x = R*sinf(theta)*(Q->x / B);
        Q->y = R*sinf(theta)*(Q->y / B);
        Q->z = R*sinf(theta)*(Q->z / B);
        Q->z = R*cosf(theta);

    }
    else
    {
        //printf("%f --------\n", A);
        Q->w = 0.0f;
        Q->x = 0.0f;
        Q->y = 0.0f;
        Q->z = 0.0f;

    }
}

Fractals

Exemple

Algorithme

Algorithme Séquentiel

Quaternion Algo1.PNG

Algorithme Parallèle

Quaternion Algo2.PNG

Implémentation VB.NET

Implémentation C/CUDA

Le programme est composé de 6 parties principales

  • L’importation des librairies
  • La définition des structures
  • La définition des variables globales
  • La définition des fonctions Device GPU
  • La définition du programme Device GPU
  • La définition du programme Host CPU

L’importation des librairies

#include "cuda_runtime.h" //lib W10
#include "device_launch_parameters.h"//lib W10
#include <iostream> // prompt Output
#include <fstream> //File Output
#include <math.h> //lib mayh
#include <stdio.h> // lib stantard
#include <cuda_fp16.h> // lib CUDA

La définition des structures

// Pour X,Y,Z et W
typedef struct  struct_P_float {
    float start;
    float end;
    int NbPoints;
    float step;
} struct_P_float_T;


typedef struct  struct_P_Power {
    float value;
} struct_P_Power_T;

typedef struct  struct_P_Iter {
    int start;
    int end;
} struct_P_Iter_T;


typedef struct  struct_P_Rlimit {
    float value;
} struct_P_Rlimit_T;

typedef struct  struct_P_Simulation {
    //Quaternions
    struct_P_Power_T X;
    struct_P_float_T Y;
    struct_P_float_T Z;
    struct_P_float_T W;
    //Parametre Fixe
    struct_P_Iter_T Iter;
    float Rlimit;
    //Parametrer variable systematique
    float Power;
} struct_P_Simulation_T;

typedef struct  struct_Q {
    float x;
    float y;
    float z;
    float w;
} struct_Q_T;

La définition des variables globales

struct_P_Simulation_T *P_Simulation_DEVICE;
short *Tab_Iter;

La définition des fonctions Device GPU

__device__  void CreateQ_By_float(struct_Q_T *out, float x, float y, float z, float w)
{
    out->x = x;
    out->y = y;
    out->z = z;
    out->w = w;
}

__device__  float  Get_QNorm(struct_Q_T *Q)
{
    return sqrtf(Q->x*Q->x + Q->y*Q->y + Q->z*Q->z + Q->w*Q->w);
}

__device__ void Get_QPow(struct_Q_T *Q, float pow)
{
    float A = Get_QNorm(Q);
    float theta = 0.0f;
    float B = 0.0f;
    float R = 0.0f;
    if (pow > 0.0f && A>0.000001f)
    {
        float coef = 1.0f;
        if (A<1.0f)
        {
            //printf("%f *******\n", A);
            coef = 1 / A;
            Q->x *= coef;
            Q->y *= coef;
            Q->z *= coef;
            Q->z *= coef;

        }
        A = Get_QNorm(Q);
        //printf("%f +++++++++\n", A);
        theta = acosf(Q->w / A)*pow;
        B = sqrt(A*A - Q->w*Q->w);
        R = exp2f(logf(A / coef)* pow);
        Q->x = R*sinf(theta)*(Q->x / B);
        Q->y = R*sinf(theta)*(Q->y / B);
        Q->z = R*sinf(theta)*(Q->z / B);
        Q->z = R*cosf(theta);

    }
    else
    {
        //printf("%f --------\n", A);
        Q->w = 0.0f;
        Q->x = 0.0f;
        Q->y = 0.0f;
        Q->z = 0.0f;

    }
}

La définition du programme Device GPU

// CUDA kernel to Compute itermax of quaternion
__global__ void kernel( const struct_P_Simulation_T *P_Simulation, short *Tab_Iter)
{
    struct_Q_T Q_Current;
    float w, x, y, z;
    int iter = 0;
        //X
        x = P_Simulation->X.value;
        //Y
        y= ((float)blockIdx.y)*P_Simulation->Y.step + P_Simulation->Y.start;
        //Z
        z = ((float)blockIdx.x)*P_Simulation->Z.step + P_Simulation->Z.start;
        //W
        w = ((float)threadIdx.x)*P_Simulation->W.step + P_Simulation->W.start ;

        CreateQ_By_float(&Q_Current, x, y, z, w);

        for (iter = 0; iter <= P_Simulation->Iter.end; iter++)
        {
            Get_QPow(&Q_Current, P_Simulation->Power);
            Q_Current.x += x;
            Q_Current.y += y;
            Q_Current.z += z;
            Q_Current.w += w;

            if (Get_QNorm(&Q_Current) > P_Simulation->Rlimit)
                goto Fin;
        }
        Fin:
        if (iter > 0)
            iter--;
        Tab_Iter[(blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x] = (short)iter;
}

Le programme Device GPU est composé de 4 parties principales

  • Définition des variables

        struct_Q_T Q_Current;
        float w, x, y, z;
        int iter = 0;
  • Calcul du quaternion

    //X
            x = P_Simulation->X.value;
            //Y
            y= ((float)blockIdx.y)*P_Simulation->Y.step + P_Simulation->Y.start;
            //Z
            z = ((float)blockIdx.x)*P_Simulation->Z.step + P_Simulation->Z.start;
            //W
            w = ((float)threadIdx.x)*P_Simulation->W.step + P_Simulation->W.start ;
            CreateQ_By_float(&Q_Current, x, y, z, w);
  • Boucle du calcul de iter max

    for (iter = 0; iter <= P_Simulation->Iter.end; iter++)
            {
                Get_QPow(&Q_Current, P_Simulation->Power);
                Q_Current.x += x;
                Q_Current.y += y;
                Q_Current.z += z;
                Q_Current.w += w;
    
                if (Get_QNorm(&Q_Current) > P_Simulation->Rlimit)
                    goto Fin;
            }
            Fin:
            if (iter > 0)
                iter--;
  • Sauvegarde de la valeur trouvée

    Tab_Iter[(blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x] = (short)iter;

La définition du programme Host CPU

Le programme Host CPU est composé de 8 parties principales

  • Définition des paramètres de la fonction main
  • Définition des paramètres par défaut
  • Gestion des paramètres rentrés par ligne de commande
    • Détection si des paramètres sont rentrés
    • Analyse si nombre impaires d’arguments
    • Analyse si nombre impaires d’arguments
    • Analyse des erreurs de cohérence
  • Initialisation des fichiers
  • Initialisation des constantes
  • Initialisation des variables
  • Sélection du Device
  • Boucle sur les pas Master
    • Calcul des pas Master
    • Allocation en mémoire des variables CPU/GPU
    • Paramétrage de la simulation
    • Initialisation des Tableaux
    • Paramétrage du kernel GPU
    • Lancement du kernel GPU
    • Synchronisation des mémoires CPU/GPU
    • Analyse de la simulation
    • Sauvegarde des données dans les fichiers
      • Calcul du quaternion à partir de l’index
      • sauvegarde du point
    • sauvegarde du point
    • Libération de la mémoire
    • Reset du Device

Annexes

Programme C/CUDA complet

#include "cuda_runtime.h" //lib W10
#include "device_launch_parameters.h"//lib W10
#include <iostream> // prompt Output
#include <fstream> //File Output
#include <math.h> //lib mayh
#include <stdio.h> // lib stantard
#include <cuda_fp16.h> // lib CUDA

// Pour X,Y,Z et W
typedef struct  struct_P_float {
    float start;
    float end;
    int NbPoints;
    float step;
} struct_P_float_T;


typedef struct  struct_P_Power {
    float value;
} struct_P_Power_T;

typedef struct  struct_P_Iter {
    int start;
    int end;
} struct_P_Iter_T;


typedef struct  struct_P_Rlimit {
    float value;
} struct_P_Rlimit_T;

typedef struct  struct_P_Simulation {
    //Quaternions
    struct_P_Power_T X;
    struct_P_float_T Y;
    struct_P_float_T Z;
    struct_P_float_T W;
    //Parametre Fixe
    struct_P_Iter_T Iter;
    float Rlimit;
    //Parametrer variable systematique
    float Power;
} struct_P_Simulation_T;

typedef struct  struct_Q {
    float x;
    float y;
    float z;
    float w;
} struct_Q_T;

struct_P_Simulation_T *P_Simulation_DEVICE;
short *Tab_Iter;


__device__  void CreateQ_By_float(struct_Q_T *out, float x, float y, float z, float w)
{
    out->x = x;
    out->y = y;
    out->z = z;
    out->w = w;
}

__device__  float  Get_QNorm(struct_Q_T *Q)
{
    return sqrtf(Q->x*Q->x + Q->y*Q->y + Q->z*Q->z + Q->w*Q->w);
}

__device__ void Get_QPow(struct_Q_T *Q, float pow)
{
    float A = Get_QNorm(Q);
    float theta = 0.0f;
    float B = 0.0f;
    float R = 0.0f;
    if (pow > 0.0f && A>0.000001f)
    {
        float coef = 1.0f;
        if (A<1.0f)
        {
            //printf("%f *******\n", A);
            coef = 1 / A;
            Q->x *= coef;
            Q->y *= coef;
            Q->z *= coef;
            Q->z *= coef;

        }
        A = Get_QNorm(Q);
        //printf("%f +++++++++\n", A);
        theta = acosf(Q->w / A)*pow;
        B = sqrt(A*A - Q->w*Q->w);
        R = exp2f(logf(A / coef)* pow);
        Q->x = R*sinf(theta)*(Q->x / B);
        Q->y = R*sinf(theta)*(Q->y / B);
        Q->z = R*sinf(theta)*(Q->z / B);
        Q->z = R*cosf(theta);

    }
    else
    {
        //printf("%f --------\n", A);
        Q->w = 0.0f;
        Q->x = 0.0f;
        Q->y = 0.0f;
        Q->z = 0.0f;

    }
}

//
//__device__  void  Get_QAdd(struct_Q_T *Q1, struct_Q_T *Q2)
//{
//  Q1->x += Q2->x;
//  Q1->y += Q2->y;
//  Q1->z += Q2->z;
//  Q1->w += Q2->w;
//}





// CUDA kernel to Compute itermax of quaternion
__global__ void kernel( const struct_P_Simulation_T *P_Simulation, short *Tab_Iter)
{
    //int Tempindex = 0;
    struct_Q_T Q_Current;
    float w, x, y, z;
    int iter = 0;
        //X
        x = P_Simulation->X.value;
        //Y
        y= ((float)blockIdx.y)*P_Simulation->Y.step + P_Simulation->Y.start;
        //Z
        z = ((float)blockIdx.x)*P_Simulation->Z.step + P_Simulation->Z.start;
        //W
        w = ((float)threadIdx.x)*P_Simulation->W.step + P_Simulation->W.start ;

        CreateQ_By_float(&Q_Current, x, y, z, w);

        for (iter = 0; iter <= P_Simulation->Iter.end; iter++)
        {
            Get_QPow(&Q_Current, P_Simulation->Power);
            Q_Current.x += x;
            Q_Current.y += y;
            Q_Current.z += z;
            Q_Current.w += w;

            if (Get_QNorm(&Q_Current) > P_Simulation->Rlimit)
                goto Fin;
        }
        Fin:
        if (iter > 0)
            iter--;
        Tab_Iter[(blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x] = (short)iter;
}

int main(int argc, char *argv[])
{

    //#################### CONFIG par DEFAUT #################
    struct_P_float_T ParameterDelaults;
    // ******************** NbPoints **************
    ParameterDelaults.NbPoints = 10;
    char Str_NbPoints[] = "-NbPoints";
    //********************* Parameter y,z,w *************
    ParameterDelaults.start = -3.0f;
    char Str_Q_start[] = "-Q_start";
    ParameterDelaults.end = 3.0f;
    char Str_Q_end[] = "-Q_end";
    ParameterDelaults.step = (ParameterDelaults.end - ParameterDelaults.start) / ((float)ParameterDelaults.NbPoints - 1);
    char Str_Q_step[] = "-Q_step";
    //*********************  NameFile Output *************
    char NameFile[110];
    char NameFile_csv[110];
    char NameFile_histo[110];
    strcpy(NameFile, "OutputFile");
    char Str_NameFile[] = "-o";
    //*********************  X value *************
    float X = -0.3375f;
    char Str_X[] = "-X";

    float POWER = 2.0;
    char Str_Power[] = "-Power";

    char Str_H[] = "-h";
    char Str_Help[] = "--help";

    int itermax = 255;
    float Rmax = 4.0;


    if (argc > 1)
    {
        if (argc % 2 == 0)
        {
            
            for (int i = 1; i <= argc; i++)
            {
                if (strcmp(argv[i], Str_Help) == 0 || strcmp(argv[i], Str_H) == 0)
                {
                    std::cout << "Help :  \n";
                    std::cout << "      -NbPoints : numbers of points \n";
                    std::cout << "                  Ctrt 01 : if start = end , NbPoints must be equal to 1 \n";
                    std::cout << "                  Ctrt 02 : if start < end , NbPoints must be sup to 1 \n";
                    std::cout << "                  Ctrt 03 : NbPoints must be type int\n";
                    std::cout << "      -Q_start : value  of start \n";
                    std::cout << "                  Ctrt 03 :  start >= end \n";
                    std::cout << "                  Ctrt 03 : start must be type float\n";
                    std::cout << "      -Q_end : value  of end \n";
                    std::cout << "                  Ctrt 03 :  start >= end \n";
                    std::cout << "                  Ctrt 03 : end must be type float\n";
                    std::cout << "      -o : Output File \n";
                    std::cout << "                  Ctrt 05 :  len  must be inf 100 char \n";
                    std::cout << "      -X : value of x \n";
                    std::cout << "                  Ctrt 03 : x must be type float\n";
                    std::cout << "      -Power : value of Power \n";
                    std::cout << "                  Ctrt 03 : Power must be type float\n";
                    std::cout << "                  Ctrt 03 : Power must be sup 0.0 \n";
                    std::cout << "      -h / --help : show help \n";
                    std::cout << "      Example :\n";
                    std::cout << "               Programme.exe -X 0.3375 -Q_start -2.0 -Q_end 2.0 -NbPoints 4 -o FileOutput.csv \n";
                    return 0;
                }
            }
            std::cout << "Error 00 : Argument impaire" << "\n";
            return -1;
        }
        for (int i = 1; i < argc; i += 2)
        {
            std::cout <<"Analyse du couple d'arguments :  "<< argv[i] << " " << argv[i + 1] << "\n";
            if (strcmp(argv[i], Str_Q_start) == 0)
            {
                ParameterDelaults.start = (float)atof(argv[i + 1]);
                if (errno)
                {
                    std::cout << "Error 02 " << Str_Q_start << ": value is not float " << "\n";
                    return -1;
                }
            }
            else if ((strcmp(argv[i], Str_Q_end) == 0))
            {
                ParameterDelaults.end = (float)atof(argv[i + 1]);
                if (errno)
                {
                    std::cout << "Error 02 " << Str_Q_end << ": value is not type float " << "\n";
                    return -1;
                }
            }
            else if ((strcmp(argv[i], Str_X) == 0))
            {
                X = (float)atof(argv[i + 1]);
                if (errno)
                {
                    std::cout << "Error 02 " << Str_X << ": value is not type float " << "\n";
                    return -1;
                }
            }
            else if ((strcmp(argv[i], Str_NbPoints) == 0))
            {
                ParameterDelaults.NbPoints = atoi(argv[i + 1]);
                if (errno)
                {
                    std::cout << "Error 02 " << Str_NbPoints << ": value is not type int " << "\n";
                    return -1;
                }
            }
            else if ((strcmp(argv[i], Str_Power) == 0))
            {
                ParameterDelaults.NbPoints = atoi(argv[i + 1]);
                if (errno)
                {
                    std::cout << "Error 09 " << Str_Power << ": value is not type float " << "\n";
                    return -1;
                }
            }
            else if ((strcmp(argv[i], Str_NameFile) == 0))
            {
                if(strlen(argv[i + 1])<100)
                    strcpy(NameFile, argv[i + 1]);
                else
                {
                    std::cout << "Error 07 strlen fileOutput must be inf to 100 signe \n";
                    return -1;
                }
            }
            else
            {
                std::cout << "Warning 08  Arg not know : "<< argv[i] << " " << argv[i+1] << "\n";
            }
        }
    }
    if (POWER <=0.0f)
    {
        std::cout << "Error 10 POWER < 0.0f : " << POWER << "<" << 0.0f << "\n";
        return -1;
    }
    if (ParameterDelaults.end < ParameterDelaults.start)
    {
        std::cout << "Error 03 end < start : " << ParameterDelaults.end << "<"<< ParameterDelaults.start << "\n";
        return -1;
    }
    if (ParameterDelaults.end == ParameterDelaults.start && ParameterDelaults.NbPoints !=1)
    {
        std::cout << "Warning  04 end == start  and NbPoints !=1:  So NbPoints force to 1\n";
        ParameterDelaults.NbPoints = 1;
    }
    if(ParameterDelaults.NbPoints > 1)
    {
        ParameterDelaults.step = (ParameterDelaults.end - ParameterDelaults.start) / ((float)ParameterDelaults.NbPoints);
        if (ParameterDelaults.step < 0.0001f)
        {
            std::cout << "Error 05 step < 0.0001 : " << ParameterDelaults.end << "<" << 0.0001f << "\n";
            std::cout << "step must be sup at 0.0001 \n";
            return -1;
        }
    }
    else if(ParameterDelaults.end != ParameterDelaults.start)
    {
        std::cout << "Error 06 NbPoints < 1:  So NbPoints must be sup 0 \n";
        return -1;
    }
    /****** Print Parmaters Used ******/
    std::cout << "Parameters Current : " << "\n";
    std::cout << "      Q_start = " << ParameterDelaults.start << ", Q_end = " << ParameterDelaults.end << ", Q_Step = " << ParameterDelaults.step << ", Nbpoints = " << ParameterDelaults.NbPoints << "\n";
    std::cout << "      x_value = " << X << "\n";
    std::cout << "      FileOutput = " << NameFile << "\n";
    std::cout << "      IterMax_Max = " << itermax << ", Rmax = " << Rmax  << "\n";

    /********  Clear File ************/
    std::ofstream file;
    strcpy(NameFile_csv, NameFile);
    strcat(NameFile_csv, ".csv");
    file.open(NameFile_csv);
    file << "X;Y;Z;W;iter;\n";
    file.close();

    strcpy(NameFile_histo, NameFile);
    strcat(NameFile_histo, ".histo");
    file.open(NameFile_histo);
    file << "index;";
    for (int i = 0; i <= itermax; i++)
        file << i << ";";
    file << "\n";
    file.close();
    
    //#################### Constante(s) #################

    const int  maxMaster = ParameterDelaults.NbPoints * ParameterDelaults.NbPoints  * ParameterDelaults.NbPoints;
    const int NbPoints = 128;
    const int  maxMinor = NbPoints * NbPoints  * NbPoints;

    //#################### Variables(s) #################

    int Y = 0;
    int Z = 0;
    int W = 0;

    short Tab_Histo[300];
    int  Nbpoint_iter = 0;
    int  Pas = 0;
    int  Avancement = 0;

    cudaSetDevice(1);
    for(int index = 0; index < maxMaster; index++)
    {
        int indexTemp = index;

        Y = indexTemp / (ParameterDelaults.NbPoints * ParameterDelaults.NbPoints);
        indexTemp = indexTemp - (ParameterDelaults.NbPoints * ParameterDelaults.NbPoints)*Y;

        Z = indexTemp / (ParameterDelaults.NbPoints);
        indexTemp = indexTemp - (ParameterDelaults.NbPoints * Z);

        W = indexTemp;

        

        std::cout << "cudaMallocManaged Config  -->  Start" << "\n";
        // Allocate Unified Memory -- accessible from CPU or GPU
        cudaMallocManaged(&P_Simulation_DEVICE, sizeof(struct_P_Simulation_T));
        cudaMallocManaged(&Tab_Iter, maxMinor * sizeof(short));
        std::cout << "cudaMallocManaged Config  -->  End " << "\n";
        
        std::cout << "P_Simulation Config  -->  Start" << "\n";
        //Parametrage Iter
        P_Simulation_DEVICE->Iter.end = itermax;
        P_Simulation_DEVICE->Iter.start = 10;

        //Parametrage Power
        P_Simulation_DEVICE->Power = POWER;

        //Parametrage Rmax
        P_Simulation_DEVICE->Rlimit = Rmax;

        //Parametrage X
        P_Simulation_DEVICE->X.value = X;

        //Parametrage Y
        P_Simulation_DEVICE->Y.start = (float)Y*ParameterDelaults.step + ParameterDelaults.start;
        P_Simulation_DEVICE->Y.end = (float)(Y + 1)*ParameterDelaults.step + ParameterDelaults.start;
        P_Simulation_DEVICE->Y.NbPoints = NbPoints;
        P_Simulation_DEVICE->Y.step = (P_Simulation_DEVICE->Y.end - P_Simulation_DEVICE->Y.start) / (P_Simulation_DEVICE->Y.NbPoints - 1);

        //Parametrage Z
        P_Simulation_DEVICE->Z.start = (float)Z*ParameterDelaults.step + ParameterDelaults.start;
        P_Simulation_DEVICE->Z.end = (float)(Z + 1)*ParameterDelaults.step + ParameterDelaults.start;
        P_Simulation_DEVICE->Z.NbPoints = NbPoints;
        P_Simulation_DEVICE->Z.step = (P_Simulation_DEVICE->Z.end - P_Simulation_DEVICE->Z.start) / (P_Simulation_DEVICE->Z.NbPoints - 1);

        //Parametrage W
        P_Simulation_DEVICE->W.start = (float)W*ParameterDelaults.step + ParameterDelaults.start;
        P_Simulation_DEVICE->W.end = (float)(W + 1)*ParameterDelaults.step + ParameterDelaults.start;
        P_Simulation_DEVICE->W.NbPoints = NbPoints;
        P_Simulation_DEVICE->W.step = (P_Simulation_DEVICE->W.end - P_Simulation_DEVICE->W.start) / (P_Simulation_DEVICE->W.NbPoints - 1);
        std::cout << "P_Simulation Config  -->  End" << "\n";

        std::cout << "P_Simulation Config No "<< index << " sur  " << maxMaster << "\n";


        std::cout << "Tab_Iter and Tab_Histo Init  -->  Start" << "\n";
        for (int i = 0; i < maxMinor; i++) {
            Tab_Iter[i] = (short)0;
        }
        
        for (int i = 0; i <=itermax; i++)
            Tab_Histo[i] = 0;
        std::cout << "Tab_Iter and Tab_Histo Init -->  End" << "\n";


        std::cout << "Compude GPU -->  Start" << "\n";
        int NbThreadPerBlock = NbPoints;
        int NbBlockPerGrid = NbPoints;
        dim3 grid(NbBlockPerGrid, NbBlockPerGrid, 1);
        dim3 block(NbThreadPerBlock, 1, 1);
        kernel << <grid, block >> >(P_Simulation_DEVICE, Tab_Iter);
        std::cout << "Compude GPU -->  End" << "\n";

        std::cout << "cudaDeviceSynchronize-->  Start" << "\n";
        cudaDeviceSynchronize();
        std::cout << "cudaDeviceSynchronize -->  End" << "\n";
        
        std::cout << "Analyzer Simulation -->  Start" << "\n";
        Nbpoint_iter = 0;
        for ( int i = 0; i < maxMinor; i++)
        {
            if (Tab_Iter[i] > 0)
                Nbpoint_iter++;
            Tab_Histo[Tab_Iter[i]]++;
        }
        std::cout << "Nb point Nbpoint_iter = " << Nbpoint_iter << "\n";
        std::cout << "Soit  :  " << (float)(Nbpoint_iter * 10000 / maxMinor)/100.0f << "% \n";
        std::cout << "Analyzer Simulation -->  End" << "\n";

        std::cout << "Write Histogram -->  Start" << "\n";
        file.open(NameFile_histo, std::ofstream::out | std::ofstream::app);
        file << index <<";";
        for (int i = 0; i <= itermax; i++)
            file << Tab_Histo[i] << ";";
        file << "\n";
        file.close();
        std::cout << "Write Histogram -->  End" << "\n";

        std::cout << "Write csv -->  Start" << "\n";
        file.open(NameFile_csv, std::ofstream::out | std::ofstream::app);
        Pas = Nbpoint_iter / 100;
        Avancement = 0;
        for (int i = 0; i < maxMinor; i++)
        {
            int j = i;
            //X
            float x = P_Simulation_DEVICE->X.value;

            //Y
            float y = ((float)(j / (NbPoints*NbPoints))*P_Simulation_DEVICE->Y.step) + P_Simulation_DEVICE->Y.start;
            // on retranche 
            j -= (j / (NbPoints*NbPoints))*(NbPoints*NbPoints);

            //Z
            float z = (float)(j /NbPoints)*P_Simulation_DEVICE->Z.step + P_Simulation_DEVICE->Z.start;
            // on retranche Q2
            j -= (j / NbPoints)*NbPoints;

            //W
            //printf("index = %d  - Z Tempindex = %d \n", i, Tempindex);
            float w = (((float)j)*P_Simulation_DEVICE->W.step) + P_Simulation_DEVICE->W.start;

            short iter = Tab_Iter[i];
            if (iter > 10)
            {
                file << x << ";" << y << ";" << z << ";" << w << ";" << iter << ";\n";
                
                /*if (Avancement%Pas == 0)
                    std::cout << "Avancement de " << Avancement / (Nbpoint_iter / 100) << "%  Soit " << Avancement << " lignes sur " << Nbpoint_iter << " lignes \n";
                    */
                Avancement++;
            }
                
        }
        file.close();
        std::cout << "Write csv -->  End" << "\n";

        std::cout << "Clear Mem + Reste  -->  Start" << "\n";
        cudaFree(P_Simulation_DEVICE);
        cudaFree(Tab_Iter);
        //cudaDeviceReset();
        std::cout << "Clear Mem + Reste  -->  End" << "\n";
    }
    return 0;
}